Workload and Durability (Part 1)

by Robert Dudek
November 30, 2004

Much has been written on the subject of pitch counts. In some quarters, the notion that high pitch counts are dangerous to a pitcher’s health is an article of faith; the idea makes intuitive sense. There is just one problem — a lack of evidence in its favor.

Earlier this year, Rob Neyer and Bill James published the exceptional Neyer/James Guide to Pitchers — an encyclopedia of the pitching repertoire of nearly every significant pitcher in Major League and Negro League history. In one essay, titled “Abuse and Durability” (pp. 449-463), James runs a series of matched-pair studies, identifying the most similar non-abused pitchers to pitchers listed as “abused” in various editions of Baseball Prospectus based on the Pitcher Abuse Points (PAP) system devised by Rany Jazayerli and Keith Woolner. The results skew in one direction: the “abused” pitchers keep more of their value (on average) than comparable “non-abused” pitchers. That’s right — keep their value.

James concludes his essay speculating about what is behind the phenomenon:

Most injuries to pitchers are not the result of chronic overuse; some are, particularly to young pitchers, but most are not. They’re catastrophic events, just like a heart attack or a torn muscle. They happen suddenly, and they happen when a pitcher goes outside the envelope of his previous conditioning.

Backing away from the pitcher’s limits too far doesn’t make a pitcher less vulnerable; it makes him more vulnerable. And pushing the envelope, while it may lead to a catastrophic event, is more likely to enhance the pitcher’s durability than to destroy it.

And yet, questions linger. James himself notes that since power pitchers last longer (and tend to throw more pitches per inning) than finesse types, controlling for quality of pitcher isn’t sufficient to isolate the effect of high pitch counts. In addressing the issue of pitch count we must be sensitive to differences of pitcher type.

The quality of a matched-pair study depends on how similar your comparison groups are in all respects save for the one under study. On the other hand, pegging the similarity standard too high may lead to too few matches to tell us anything useful. A balance must be struck between sample size and degree of similarity.

Matched-Pair Workload Study #1

Starting with a large pool of players from which to match leads to more good matches. To that end, I settled on a pool of starting pitchers born after 1945 and before 1970. This 24-year period encompasses the baby boom and immediate post-boom generations. All but a handful of pitchers born before 1970 are either retired or no longer starting in the majors, so we don’t need to worry very much about incomplete data.

To start, we need to define heavy and moderate workloads for starting pitchers. A heavy workload was defined as exceeding 3,800 estimated pitches⁽¹⁾ in a given year; a moderate workload was defined as 3,000 to 3,600 estimated pitches. Because of the influence of pitch counts and the pervasiveness of the five-man rotation, very few pitchers have exceeded 3,800 pitches in recent years (a pitcher making 34 starts would need to average almost 112 pitches a start).
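The thresholds above can be written as a small classifier. This is only a sketch: the function name and the "unclassified" label for seasons that fall outside both bands are assumptions made for illustration, not part of the study.

```python
HEAVY_MIN = 3800                 # a heavy workload exceeds this many estimated pitches
MODERATE_RANGE = (3000, 3600)    # inclusive band for a moderate workload


def classify_workload(est_pitches):
    """Label a season by its estimated pitch total."""
    if est_pitches > HEAVY_MIN:
        return "heavy"
    if MODERATE_RANGE[0] <= est_pitches <= MODERATE_RANGE[1]:
        return "moderate"
    # Seasons below 3,000 or between 3,600 and 3,800 fit neither definition.
    return "unclassified"


# Average pitches per start needed to reach the heavy threshold in 34 starts.
pitches_per_start = HEAVY_MIN / 34

print(classify_workload(4012), classify_workload(3150))
print(round(pitches_per_start, 1))
```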

Group A pitchers were those who had at least one heavy workload season before age 28. Group B pitchers were those who never exceeded 3,600 estimated pitches in a year before age 28. Matches were based on highest similarity score, using single season to single season comparisons, and taking into account the following characteristics:

(1) Strikeouts per Opportunity [K/(BF-IW)]

(2) Non-Intentional Walks per Opportunity [(W-IW)/(BF-HBP-IW)]

(3) Earned Run Average [ER/IP*9]

(4) Year of Birth

(5) Age on July 1st⁽²⁾

(6) The matched pitchers must throw with the same hand
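The three rate statistics in the list translate directly into code. The sketch below uses illustrative function and parameter names (assumptions, not the author's), and it assumes innings pitched is supplied as a true decimal, so a box-score figure such as 216.2 (216 and two-thirds innings) would need converting first. The stat line in the usage example is hypothetical.

```python
def strikeout_rate(k, bf, iw):
    """Strikeouts per opportunity: K / (BF - IW)."""
    return k / (bf - iw)


def walk_rate(w, iw, bf, hbp):
    """Non-intentional walks per opportunity: (W - IW) / (BF - HBP - IW)."""
    return (w - iw) / (bf - hbp - iw)


def earned_run_average(er, ip):
    """Earned runs per nine innings: ER / IP * 9."""
    return er / ip * 9


# Hypothetical season: 1,010 batters faced, 150 K, 80 BB (10 intentional),
# no hit batsmen, 100 earned runs in 250 innings.
print(strikeout_rate(k=150, bf=1010, iw=10))
print(walk_rate(w=80, iw=10, bf=1010, hbp=0))
print(earned_run_average(er=100, ip=250.0))
```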

Here’s a hypothetical example of how similarity scores work in this study. Imagine two pitchers with identical ERAs, strikeout rates and walk rates. They were born in the same year and are the same age (to the day) in the seasons being compared. The Group A pitcher, however, throws 750 more estimated pitches than the Group B pitcher. The method treats this as a perfect match, worth 1,000 points, because the workload category is scored against a target gap of 750 pitches rather than zero. In actual cases, the differences in each category result in points deducted from 1,000; the higher the final similarity score, the greater the (statistical) similarity between the two pitchers.

The final requirement was that no Group B pitcher could be matched with more than one Group A pitcher; the match with the higher similarity score was given priority. Each matched season was designated Year Zero for that particular pitcher. A more detailed description of the comparison method⁽³⁾ can be found in the footnotes.
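One way to enforce the one-to-one requirement is a greedy pass over all candidate pairs, highest similarity score first. This is a sketch under assumptions: the article does not say what happens to a Group A pitcher who loses his preferred match, and here he simply falls back to his best remaining Group B candidate. The names `greedy_match` and `scores` and the pitchers A1, A2, B1, B2 are hypothetical.

```python
def greedy_match(scores):
    """Pair pitchers one-to-one, giving priority to higher similarity scores.

    scores: dict mapping (group_a_pitcher, group_b_pitcher) -> similarity score.
    Returns a list of (group_a_pitcher, group_b_pitcher, score) tuples.
    """
    used_a, used_b, pairs = set(), set(), []
    # Visit candidate pairs from most similar to least similar.
    for (a, b), score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if a in used_a or b in used_b:
            continue  # one of the two pitchers is already matched
        pairs.append((a, b, score))
        used_a.add(a)
        used_b.add(b)
    return pairs


# Hypothetical scores: both A1 and A2 prefer B1, but A1 is the closer match.
scores = {
    ("A1", "B1"): 970,
    ("A2", "B1"): 960,
    ("A2", "B2"): 940,
    ("A1", "B2"): 930,
}
print(greedy_match(scores))
```

In this example B1 goes to A1, and A2 is left with B2.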

Quality Control

Before we turn to the matched pairs, let’s consider what James calls “quality leakage.” James noted that in matched pair studies, there is a tendency for very good pitchers to be matched with lesser pitchers because the former are usually unique. James’ solution was to select pitchers for his “Group B” that were of slightly higher quality (more Win Shares) than his “Group A” pitchers so as to offset the leakage. I took a different approach: I disposed of the worst third (according to similarity score) of the matched pairs.

Of the 69 matched pairs, the 23 least similar pairs were removed from consideration. I believe this is sufficient to alleviate the worst effects of the quality leakage problem, while maintaining a sufficiently large sample. To illustrate, the worst “match” among the original 69 pairs was Nolan Ryan/David Cone. Ryan is nearly a generation older than Cone and walked and struck out batters at a greater rate as a young pitcher. Because they are so dissimilar, there is no reason to think that the Ryan/Cone match tells us anything about durability.
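The trimming step amounts to ranking the pairs by similarity score and cutting the bottom third. A minimal sketch, assuming each pair is stored as a (Group A pitcher, Group B pitcher, score) tuple; the function name and the placeholder data are hypothetical.

```python
def drop_least_similar(pairs):
    """Split pairs into (kept, dropped), dropping the least similar third.

    pairs: list of (group_a_pitcher, group_b_pitcher, similarity_score).
    """
    ranked = sorted(pairs, key=lambda p: p[2], reverse=True)
    n_drop = len(ranked) // 3          # 69 pairs -> 23 dropped
    n_keep = len(ranked) - n_drop      # 69 pairs -> 46 kept
    return ranked[:n_keep], ranked[n_keep:]


# Placeholder data: 69 pairs with made-up scores.
pairs = [(f"A{i}", f"B{i}", 900 + i) for i in range(69)]
kept, dropped = drop_least_similar(pairs)
print(len(kept), len(dropped))
```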

**Unmatched Group A pitchers**
Vida Blue (’71)	Ted Higuera (’86)	John Montefusco (’75)
Bert Blyleven (’73)	Catfish Hunter (’72)	Mike Mussina (’96)
Jim Clancy (’80)	Randy Jones (’76)	Gary Nolan (’70)
Joe Coleman (’74)	Clay Kirby (’71)	J.R. Richard (’76)
Ron Darling (’85)	Mark Langston (’87)	Nolan Ryan (’74)
Larry Dierker (’69)	Bill Lee (’73)	Frank Tanana (’76)
Dwight Gooden (’85)	Dennis Leonard (’77)	Fernando Valenzuela (’82)
Ron Guidry (’78)	Jon Matlack (’74)

The “cast-offs” were pooled to create a new group (Group C); I’ll consider them in Part 2 of this series. A few Hall of Fame-type pitchers from Group A made it into the study, most notably Roger Clemens and Greg Maddux. Should we exclude them as well? Arbitrarily removing “special arms” seems like a sensible approach, but it creates its own problems (which I will also consider in Part 2). Hand-picking which pairs stayed and which went was not the path I wanted to go down.

Without further ado, the 92 subjects of Study #1 are:

Group A Pitcher	Sim.	Group B Pitcher	—	Group A Pitcher	Sim.	Group B Pitcher
Len Barker(’80)	929	Jose Guzman(’88)		D.Lemanczyk(’77)	959	Bart Johnson(’76)
Bill Bonham(’74)	936	Ken Forsch(’73)		Greg Maddux(’91)	959	Andy Benes(’92)
Oil Can Boyd(’85)	954	John Burkett(’90)		Dennis Martinez(’79)	955	Bill Gullickson(’83)
Tom Bradley(’71)	927	Reggie Cleveland(’72)		Jack McDowell(’92)	957	S.Bankhead(’89)
Kevin Brown(’92)	955	Pedro Astacio(’96)		Doc Medich(’74)	967	Bob Moose(’73)
Tom Browning(’85)	972	Jamie Moyer(’88)		Mike Moore(’86)	984	Andy Hawkins(’86)
Ron Bryant(’73)	956	John Curtis(’73)		Jack Morris(’82)	966	Eric Show(’83)
Steve Busby(’74)	933	Gary Gentry(’69)		Mike Norris(’80)	925	Orel Hershiser(’85)
Roger Clemens(’87)	966	Erik Hanson(’90)		Melido Perez(’92)	962	Pete Harnisch(’93)
Jim Colborn(’73)	928	Dave Frost(’79)		Dan Petry(’83)	953	Jay Tibbs(’85)
Joe Decker(’74)	938	Buzz Capra(’74)		Rick Reuschel(’74)	949	Rick Langford(’77)
D.Eckersley(’78)	971	Scott Sanderson(’80)		Jerry Reuss(’73)	944	Bob Shirley(’77)
Cal Eldred(’93)	953	Ben McDonald(’92)		Steve Rogers(’77)	942	Burt Hooton(’77)
R.Erickson(’78)	950	Mark Lemongello(’78)		Bret Saberhagen(’88)	929	Frank Castillo(’92)
Alex Fernandez(’96)	956	Tommy Greene(’93)		Jim Slaton(’76)	953	Bob Forsch(’75)
Ed Figueroa(’76)	935	Alan Foster(’73)		John Smoltz(’93)	936	Kevin Appier(’95)
Mike Flanagan(’78)	942	Bob Ojeda(’84)		Mario Soto(’83)	953	Tim Belcher(’89)
W.Garland(’77)	962	Doyle Alexander(’77)		Paul Splittorff(’73)	941	John Candelaria(’80)
Ross Grimsley(’74)	935	Ken Brett(’73)		Dave Stieb(’83)	956	Charlie Lea(’83)
Mark Gubicza(’88)	967	Ken Hill(’92)		Rick Sutcliffe(’83)	953	Dave Stewart(’84)
Ed Halicki(’77)	953	Pete Vuckovich(’79)		Dick Tidrow(’73)	946	Glenn Abbott(’77)
Pat Hentgen(’96)	964	Ramon Martinez(’95)		Frank Viola(’86)	948	Britt Burns(’85)
Jim Hughes(’75)	938	Dave Freisleben(’74)		Mike Witt(’86)	936	Jose Rijo(’91)

The weighted average performance of the Group A pitchers was 17 wins, 13 losses, 3.52 ERA, 15.0% strikeout rate, 7.3% walk rate, 268.0 IP, and 4,038 estimated pitches.

The weighted average performance of the Group B pitchers was 13 wins, 11 losses, 3.54 ERA, 15.0% strikeout rate, 7.5% walk rate, 216.7 IP, and 3,268 estimated pitches.

The only significant statistical differences between the two groups in Year Zero are those related to workload. Aha, you might say — that’s only one season. Could the Group B pitchers be (in truth) inferior and their Year Zero performance merely a result of a preponderance of career years? Could there be differences in performance in the years leading up to the seasons in question? The numbers for the average Group A and Group B pitcher for the three years up to and including Year Zero …

**Year -2 to Year Zero**
	IP	Est. Pitches	ERA	K rate (%)	W rate (%)	Wins	Losses
Group A average	594.7	9008	3.54	15.3	7.6	36	30
Group B average	452.7	6833	3.56	15.2	7.4	27	24

… tell the same tale. Apart from workload indicators, the two groups appear to be a very good match.

Suppose you are the general manager of a baseball team and are considering acquiring one of two pitchers: a 25-year-old pitcher who threw 3,900 pitches in 2004 and a very similar pitcher who threw only 3,300. Your scouts don’t turn up any major differences between the two and their overall performance over the last three years has also been very similar. The one difference is that the first pitcher has been subjected to a significantly greater workload than the second pitcher. Who would you choose and why?

Is surviving the heavy workload a marker of greater durability, or instead does the greater “mileage” mean you’d be better off acquiring the “underused” pitcher? The answer … next week.

References & Resources
⁽¹⁾ Pitches thrown were estimated using the Extended Pitch Count Estimator developed by Tangotiger.

⁽²⁾ Age was calculated using exact date of birth as of July 1st of the year in question.

⁽³⁾ Similarity scores start at 1,000, and points are deducted for the difference between the two pitchers in each category. The weights were as follows: strikeout rate = 40 points; ERA = 40 points; age = 30 points; birth year = 30 points; walk rate = 20 points; estimated pitches = 20 points; total = 180. For every category except estimated pitches, the absolute difference between the two pitchers was multiplied by the assigned weight and divided by the standard error for that category, based on the population of 3,000+ pitch seasons in the pool. For estimated pitches, the deduction depends on how far the gap between the two pitchers departs from 750 pitches: the absolute difference between 750 and the actual gap was multiplied by the assigned weight and divided by the standard error.

Sample Calculation (the divisor in each line is the standard error)

Pat Hentgen (1996), born 1968: 16.1% K rate, 8.3% W rate, 3.22 ERA, 27.63 age, 4,012 estimated pitches
Ramon Martinez (1995), born 1968: 16.2% K rate, 9.0% W rate, 3.66 ERA, 27.28 age, 3,150 estimated pitches

Strikeout Points: abs(.161 - .162) × 40 / .0400 = 1.00*
Walk Points: abs(.083 - .090) × 20 / .0211 = 6.64*
ERA Points: abs(3.22 - 3.66) × 40 / 1.026 = 17.15*
Age Points: abs(27.63 - 27.28) × 30 / 1.814 = 5.79*
Year of Birth Points: abs(1968 - 1968) × 30 / 6.99 = 0.00*
Estimated Pitches Points: abs(750 - abs(4012 - 3150)) × 20 / 354.6 = 6.32*

Sum of Deductions: 1.00 + 6.64 + 17.15 + 5.79 + 0.00 + 6.32 = 36.90

Similarity Score = 1000 – 36.90 = 963.10 (rounded off to 963**)

** Because the inputs shown above are rounded, this calculation yields 963; the score computed from the unrounded figures is 964, as listed in the main text.
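The sample calculation can be reproduced in a few lines. This sketch takes the weights and standard errors from the footnote above; the dictionary keys and the function name are illustrative assumptions. Because it uses the rounded inputs as printed, it arrives at 963 rather than the 964 obtained from unrounded data.

```python
WEIGHTS = {"k_rate": 40, "bb_rate": 20, "era": 40,
           "age": 30, "birth_year": 30, "pitches": 20}
STD_ERR = {"k_rate": 0.0400, "bb_rate": 0.0211, "era": 1.026,
           "age": 1.814, "birth_year": 6.99, "pitches": 354.6}
TARGET_PITCH_GAP = 750  # a 750-pitch workload gap incurs no deduction


def similarity_score(a, b):
    """Return 1000 minus the weighted deductions between two pitcher seasons."""
    deductions = 0.0
    for cat in ("k_rate", "bb_rate", "era", "age", "birth_year"):
        deductions += abs(a[cat] - b[cat]) * WEIGHTS[cat] / STD_ERR[cat]
    # Workload is scored by how far the pitch gap departs from 750.
    gap = abs(a["pitches"] - b["pitches"])
    deductions += abs(TARGET_PITCH_GAP - gap) * WEIGHTS["pitches"] / STD_ERR["pitches"]
    return 1000 - deductions


hentgen_1996 = {"k_rate": 0.161, "bb_rate": 0.083, "era": 3.22,
                "age": 27.63, "birth_year": 1968, "pitches": 4012}
martinez_1995 = {"k_rate": 0.162, "bb_rate": 0.090, "era": 3.66,
                 "age": 27.28, "birth_year": 1968, "pitches": 3150}

score = similarity_score(hentgen_1996, martinez_1995)
print(round(score))  # 963
```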
