Pitchers and the Seven-Month Season

by Shane Tourtellotte
February 25, 2016

Andy Pettitte had six heavy-use postseasons in his career. (via Chris Ptacek)

At the end of my last article at THT, “The In-Season Aging Curve,” I indulged in some speculation about whether older pitchers’ skills eroded faster during the playing season than in the offseason. The data I used gave me no grounds for a conclusion either way. If the erosion did happen faster in-season, though, that would raise the unfortunate possibility that pitchers who had longer seasons, meaning those who pitched deep into the postseason, would be worn down by the grind and pitch worse the next season, and possibly beyond.

I teased that I might have more to say on the matter in months to come. Teasing isn’t really nice, so I got to work on the matter right away.

I wound up both narrowing and expanding the question I posed. I looked at just the following year after a heavy postseason workload, and I did not limit myself to older pitchers. This was probably a wise shift, since two of the biggest controversies surrounding pitcher workloads and the postseason in recent years have involved younger hurlers.

In 2012, Stephen Strasburg had just recovered from major arm surgery and a virtually lost 2011, but was pitching well. The Washington Nationals front office, though, was cautious with its young blue-chipper, and declared partway through the season that Strasburg would not be allowed to pitch past 160 to 180 innings. When Washington vaulted into playoff position ahead of expectations, management refused to alter its stance to make use of Strasburg in October. He was shut down after his Sept. 7 start, at 159.1 frames, and the Nationals lost the National League Division Series, three games to two, without him.

The front office was thinking ahead to the next four seasons that Strasburg would be pitching for them, trying to ensure that he’d be available and effective for what they anticipated would be several playoff runs and a chance at multiple rings. Instead, the following three seasons brought just one more postseason appearance, which also ended in the NLDS. Many Nationals fans, looking at very likely just one more season of Strasburg before he departs via free agency, have tormented themselves playing the “what if” game.

(For added perspective on the Strasburg affair, Jack Marshall’s article on baseball ethics in the 2013 THT Baseball Annual is recommended.)

Just last year, history began to repeat itself. Matt Harvey, a hot young pitcher who lost the previous season to Tommy John surgery, had a strong comeback year that helped push his New York Mets into serious contention. Then talk of limiting Harvey’s workload floated up, this time advanced by his agent, Scott Boras. A feeding frenzy ensued in New York media, with the result that Harvey kept pitching in September, and October, and November.

Many of you recall how that ended. I even did a little writing about it during some postseason moonlighting at FanGraphs. It may not have ended well, but Harvey’s pitching did help get the Mets into the World Series, and whatever price it exacts, the Mets will worry about later, though “later” will be upon them pretty soon now.

These aren’t ordinary pitcher-workload situations, bound up as they are with injury concerns. Still, they illustrate the trade-offs teams may fear, that riding a pitcher hard after Game 162 will leave him worse off for the season or seasons to come. Most teams, of course, will chase the glory and take the consequences.

It’s time to figure out what those consequences might be.

Ground Rules

I set a fairly high limit for what constitutes heavy postseason usage, requiring a pitcher to have at least 100 batters faced in the playoffs to qualify. I preferred batters faced to innings pitched for being a somewhat more precise measure of workload, though in retrospect I could have done better by using total pitches.

My time frame was 1996 to 2014. Before that, the 1994-95 strike shortened season lengths, keeping pitchers from a “full” workload; 2015 is unusable because we don’t know yet how pitchers did in the following season. Using these years and the batters-faced floor, I compiled 71 pitcher years that included heavy postseason workloads.

This was 52 individuals, some having multiple heavy-use postseasons. Andy Pettitte dominated with six, spread from 1996 to 2009. Roger Clemens had three, counting just the back half of his career. Also, every pitcher who qualified was a starter. Even with three-plus playoff rounds, modern usage patterns made it impossible for relievers to accrue enough batters faced. The closest approach was 70 by Francisco Rodriguez for the 2002 Anaheim Angels.
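The qualification rule above (seasons 1996 through 2014, at least 100 postseason batters faced) amounts to a simple filter. What follows is a minimal sketch, assuming a hypothetical record layout; the class and field names are illustrative, not taken from Baseball-Reference or any real dataset, and the third sample pitcher is invented.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical record layout; field names are illustrative only.
@dataclass
class PitcherSeason:
    name: str
    year: int
    postseason_bf: int  # batters faced in that year's postseason

FIRST_YEAR, LAST_YEAR = 1996, 2014  # study window described in the text
MIN_POSTSEASON_BF = 100             # "heavy use" means at least 100 batters faced

def heavy_postseason(seasons: List[PitcherSeason]) -> List[PitcherSeason]:
    """Keep pitcher-seasons inside the study window that reach the batters-faced floor."""
    return [
        s for s in seasons
        if FIRST_YEAR <= s.year <= LAST_YEAR and s.postseason_bf >= MIN_POSTSEASON_BF
    ]

sample = [
    PitcherSeason("Madison Bumgarner", 2014, 195),     # figure from the text
    PitcherSeason("Francisco Rodriguez", 2002, 70),    # figure from the text: closest reliever
    PitcherSeason("Hypothetical Starter", 2015, 120),  # invented: outside the window
]
qualifying = heavy_postseason(sample)
print([s.name for s in qualifying])  # ['Madison Bumgarner']
```

Using batters faced rather than innings keeps the floor tied to workload; swapping in total pitches, as the text suggests might have been better, would only change the field being compared.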

Once I had the seven-month pitchers, I needed the comps. I chose comparison pitchers from the same seasons, and required them to match the postseason workhorses in several categories. First, they had to pitch roughly the same number of innings in the regular season. I went with a five percent leeway to either side. Second, they had to be within one year of age of the subject. This was to keep differing levels of age-related decline (or improvement) from skewing the results.

Third, they had to be roughly equivalent in pitching quality. I used both ERA- and FIP- as measures, doing the study for each. The comp pitcher had to be within 10 points plus or minus of the subject pitcher to be counted. This was to prevent unequal levels of regression to the mean from warping things, likely to the subject’s detriment. (One assumes a pitcher getting heavy duty in October will have performed well that season, and thus would be due some negative regression.)

Fourth and last, the comp pitcher had to have missed the playoffs, or at the very least not pitched in them. It does no good to measure pitchers against a control group that has the same difference you’re trying to measure. This carved away a distressingly large portion of an already narrowing group, especially when good pitching teammates on a deep playoff team eliminated themselves. (Hello, late-1990s Atlanta Braves.) Oddly, Pettitte didn’t end up with this trouble.
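Taken together, the matching criteria can be expressed as a single predicate. This is a sketch under stated assumptions: the 5 percent innings window is assumed to be measured relative to the subject's innings, innings are assumed to be stored as true decimals rather than box-score notation, and all names and sample pitchers are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Season:
    name: str
    year: int
    age: int
    ip: float           # regular-season innings as a true decimal (200 1/3 -> 200.333)
    minus: float        # ERA- or FIP-, depending on which version of the study is run
    postseason_bf: int  # 0 if the pitcher missed the playoffs or did not pitch in them

def is_comp(subject: Season, cand: Season,
            ip_tol: float = 0.05, age_tol: int = 1, minus_tol: float = 10.0) -> bool:
    """Same season, innings within 5%, age within one year,
    ERA-/FIP- within 10 points, and no postseason work."""
    return (
        cand.name != subject.name
        and cand.year == subject.year
        and abs(cand.ip - subject.ip) <= ip_tol * subject.ip
        and abs(cand.age - subject.age) <= age_tol
        and abs(cand.minus - subject.minus) <= minus_tol
        and cand.postseason_bf == 0
    )

def find_comps(subject: Season, pool: List[Season]) -> List[Season]:
    return [c for c in pool if is_comp(subject, c)]

# Invented pitchers, for illustration only.
subject = Season("Workhorse", 2005, 28, 200.0, 85.0, 110)
pool = [
    Season("A", 2005, 29, 205.0, 90.0, 0),   # passes every test
    Season("B", 2005, 28, 215.0, 84.0, 0),   # innings off by 7.5%
    Season("C", 2005, 27, 198.0, 80.0, 25),  # pitched in October
    Season("D", 2005, 30, 200.0, 85.0, 0),   # two years older
    Season("E", 2006, 28, 200.0, 85.0, 0),   # different season
]
print([c.name for c in find_comps(subject, pool)])  # ['A']
```

The last condition is what shrinks the pool so sharply for deep playoff teams: a good teammate who also pitched in October fails it regardless of how well he matches otherwise.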

Before screening for playoff appearances, I had 44 pitchers with 104 comps for ERA-, and 40 pitchers with 105 comps for FIP-. Once I combed out potential comps who also pitched in October, those numbers dropped to 31 pitchers with 59 comps for ERA-, and 34 pitchers with 67 comps for FIP-. Workhorse Andy Pettitte managed to retain four years with comps (10 comps for ERA-, six for FIP-), the most of anyone, but Roger Clemens ended up with zero comps. Unique as the Rocket was, I ought not to have been surprised by that.

Once the comp groups were selected, I looked at performance. For both ERA- and FIP-, I counted how much the playoff pitchers improved or degraded the following season, doing likewise for their comps. I then measured those movements against each other, to see whether the seven-month pitchers had comparatively better or worse performances.

I lost a few more comps at this point, because one pitcher in the ERA- comp group and two in the FIP- comp group did not pitch in the following year. Two missed the year but returned, while the other never pitched in the majors again. I note that all the October pitchers pitched the following season. This is a suggestion, though little more, that postseason workhorses are selected for durability. (Or perhaps it’s just that the presumed workhorses who ended up breaking down didn’t reach my batters-faced threshold.)

Results

First, the performances of the ERA- group. (Remember that in a minus metric, smaller is better. Just like ERAs.)

HEAVY POSTSEASON PITCHERS vs. COMPS, BY ERA-

Pitchers	ERA-	Next Yr. ERA-	Difference
Seven-month	84.74	89.16	4.42
Six-month	87.84	103.98	16.14

The non-playoff pitchers had an ERA- fall-off 11.7 points greater than the seven-month pitchers. By this measure, the hard-worked pitchers held much more of their value the following season.

There is a complication, however. Some of the comps had highly ineffective (and generally short) follow-up seasons, five of them packing on at least 90 added points of ERA-. These may throw off the means, so I re-examined the numbers using medians. By that method, the comp pitchers’ median drop-off was 10 points rather than 16.1, and they declined 2.5 points more than their matching postseason pitchers instead of 11.7. The result is not as dramatic, but the seven-month pitchers still retain more of their effectiveness.
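Why a few collapses move the mean so much more than the median can be seen with a small invented example. The numbers below are made up for illustration and are not the study's data; positive values mean a pitcher got worse, since lower ERA- is better.

```python
from statistics import mean, median

# Invented next-year ERA- changes for a comp group (positive = worse).
# Two comps "crash" by 90+ points, as some real comps did.
changes = [-12, -5, 0, 3, 6, 8, 9, 14, 20, 95, 110]

print(round(mean(changes), 1))  # 22.5, dragged up by the two crashes
print(median(changes))          # 8, the middle value, unmoved by them
```

Dropping the two crashes would pull the mean down to about 5.4 while leaving the median nearly where it is, which is why reporting both, as the text does, gives a fuller picture than either alone.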

(I should observe that none of the postseason toilers had nearly as great a collapse the following season as some of the comps. In fact, the biggest decline in the ERA- group, Matt Cain 2012-13, is 40 points, smaller than the biggest ERA- improvement, 46 points by Al Leiter 1997-98.)

The FIP- scale was created to measure core ability and filter out the variances that can make ERA fluctuate. We would expect the numbers to behave more sedately here, and they do. (Numbers here do not add up precisely due to rounding.)

HEAVY POSTSEASON PITCHERS vs. COMPS, BY FIP-

Pitchers	FIP-	Next Yr. FIP-	Difference
Seven-month	88.75	90.89	2.14
Six-month	91.23	94.38	3.15

This time, the non-playoff pitchers dropped off by 1.02 FIP- points more than the playoff hurlers. That is a far smaller gap than with ERA-, but the direction remains the same: seven-month pitchers hold up better the next season.
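The Difference columns and the gaps quoted in the text can be rechecked directly from the two tables. This sketch only does the arithmetic on the published, rounded figures; the dictionary keys are labels chosen here, not anything from the study.

```python
# Values copied from the two tables above. Lower is better in a minus stat,
# so a positive change means the group got worse the following year.
TABLES = {
    "ERA-": {"seven-month": (84.74, 89.16), "six-month": (87.84, 103.98)},
    "FIP-": {"seven-month": (88.75, 90.89), "six-month": (91.23, 94.38)},
}

def changes_and_gap(table):
    """Return each group's next-year change and the comps' extra fall-off."""
    change = {group: nxt - cur for group, (cur, nxt) in table.items()}
    gap = change["six-month"] - change["seven-month"]
    return change, gap

for stat, table in TABLES.items():
    change, gap = changes_and_gap(table)
    print(stat, {g: round(c, 2) for g, c in change.items()}, round(gap, 2))
```

The ERA- gap comes out to 11.72, matching the 11.7 in the text. The FIP- gap computed from the rounded table values is 1.01 rather than the 1.02 quoted, consistent with the note that the FIP- figures do not add up precisely due to rounding.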

There was just one FIP- crash by comps, gaining 75 points the following season. The next nearest were in the 30s, balanced by an equal number of minus-30s. Still, I ran the medians for this group also. The comps’ median FIP- decline was two points instead of 3.15, and the median difference from their matching postseason pitchers was one point, effectively equal to 1.02. The result again holds.

(And once again, the October workhorses’ maximum improvement, 34 points by Leiter in ’97-’98 again, was bigger than their maximum decline, 27 points, posted by both Colby Lewis 2010-11 and Jarrod Washburn 2002-03.)

It’s interesting to observe that the ERA- figures for the playoff workhorses were significantly better than their FIP- numbers, and regressed more the following season. Pitchers are being judged by the more traditional measures of effectiveness, the ones more influenced by single-season factors, when being chosen for big playoff roles. No surprise, really, but it could be worth watching whether, as analytics penetrates deeper into a new generation of managers, they begin using different criteria.

By either ERA- or FIP-, pitchers shouldering a lot of postseason work do not suffer for it the next season. If anything, they have a somewhat stronger follow-up year than their comps did. I’ll make further observations on this after a quick digression.

Something in an Extra-Long?

Of all the pitchers in the 19-year span of this survey, the one who pitched the most in a single postseason came in the final year I covered. Madison Bumgarner in 2014 pitched 52.2 postseason innings, facing 195 batters, almost twice my cutoff value for a heavy playoff workload. If you’re wondering, it didn’t seem to hurt him. He produced an 88/88 in ERA-/FIP- in the 2014 regular season, and followed it up last year with an 81/79 mark.

Bumgarner had the most postseason work of any pitcher in the Wild Card era (I checked 1995 and 2015 to be sure), with six starts plus an extended relief appearance. Before the divisional round arrived, the most starts a pitcher could plausibly have in one postseason was five. Even without combing through all the playoffs in history, I am confident in saying Madison Bumgarner worked the heaviest postseason in the history of baseball.

This is an anecdotal point in favor of long postseasons being no threat to a pitcher, or even being beneficial. Of course, one data point is a lousy foundation on which to base such a conclusion. Expand it to the top 10 postseason workhorses I gathered (not all of whom had comps), and they average rises of 2.3 ERA- points in the following year (against 4.42 for the whole set) and 5.3 FIP- points (versus 2.14 for everyone). That’s entirely ambiguous, not really pointing anywhere. If you want to be mischievous and expand it to the top 15, it ends up a drop of 0.33 ERA- points and a rise of just 1.73 FIP- points.

The point of this was to look for any sign that the heaviest of heavy postseason workloads might produce an erosion of skills the next season. Not only is there no such sign, what little indication there is says instead that it provides the slightest of benefits, much the same as for the regular heavy cohort.
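The top-N check in this digression (rank the workhorses by postseason batters faced, then average the next-year change over the heaviest few) can be sketched as below. Only the first record uses figures from the text (Bumgarner’s 195 batters faced and his ERA- going from 88 to 81); the other records are invented, and a negative change here means improvement.

```python
from typing import List, Tuple

def top_n_mean_change(records: List[Tuple[int, float]], n: int) -> float:
    """records: (postseason batters faced, next-year change in ERA- or FIP-).
    Average the change over the n heaviest postseason workloads."""
    heaviest = sorted(records, key=lambda r: r[0], reverse=True)[:n]
    return sum(change for _, change in heaviest) / len(heaviest)

records = [
    (195, 81 - 88),  # Bumgarner 2014-15 ERA-, from the text
    (150, 3.0),      # the rest are invented
    (120, 10.0),
    (110, -2.0),
    (101, 6.0),
]
print(top_n_mean_change(records, 2))  # -2.0
print(top_n_mean_change(records, 3))  # 2.0
```

With this few pitchers, adding a single record can flip the sign of the average, which is the same instability the text points to when moving from the top 10 to the top 15.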

I do not take the extra-long over-performance at face value. I do not even assume that the better performance of the full group derives from the added work they got in the playoffs. It is possible instead that managers know, through the experience they have with the pitchers on their staffs, who can bear the extended haul better. Shuffling a No. 3 pitcher into the second postseason slot, the kind of thing we see a fair amount but don’t think about very much after the fact, could be the method of giving an extra start to somebody deemed able to handle it.

Whether the effect comes from managers’ discretion or from longer work actually strengthening the pitchers, it’s safe to say that a long postseason will not leave a pitcher gassed in the year that follows. Those Mets fans who worried about the effects of wringing every last inning from Matt Harvey (or Royals fans thinking of Edinson Volquez) can breathe easier. Not that either is guaranteed to avoid a down year; each will just be running the same risks all pitchers do.

References and Resources

Baseball-Reference for pitchers’ postseason workloads
FanGraphs for pitchers’ peripherals

A writer for The Hardball Times, Shane has been writing about baseball and science fiction since 1997. His stories have been translated into French, Russian and Japanese, and he was nominated for the 2002 Hugo Award.

14 Comments

Carl

9 years ago

Shane,

Interesting article, but a couple of issues with your methodology. Why use comp pitchers at all? By doing so, you get different ages, different types of pitchers (fastball-slider, soft-tosser, lefty-righty, etc.), different leagues, etc. I would think a far simpler methodology that would remove these error points would be to look at the same pitchers the following season vs. the regular season they just finished. Perhaps then group by the ages of the pitchers to see if pitchers were less burdened by extra work if they were, say, age 27+?

Also, a frequent concern of postseason burnout is that so many of the high-leverage innings come on short rest. Whether Seaver in the ’69 and ’73 postseasons, Guidry in ’78, Johnson in ’01, Schilling in ’04, or Sabathia in ’09, it seems (anecdotally) that power pitchers pitching on short rest in the postseason leave something on the mound. I recall when Torre had the Yankees pitchers show up a week late to spring training to offset the extra workload. Would have loved to see you take your analysis down this type of road.

Michael Bacon

9 years ago

What about combining the regular season and postseason workloads? It seems like how many pitches thrown during the whole season should be examined.
It would be interesting to examine a team that goes to the postseason regularly, like the Maddux, Glavine, and Smoltz Braves, for example. It was strange to see this headline from Smoltz, a pitcher who underwent elbow ligament replacement surgery: John Smoltz thinks coddling players will ‘cripple’ baseball (http://www.nydailynews.com/sports/baseball/john-smoltz-thinks-coddling-players-cripple-baseball-article-1.2537344)
What about studying pitchers from 1969-1992 as a control with which to contrast the pitchers of the ragin’ roid era?
Any “cab driver” or “barber” will tell you that a heavier workload, no matter when or where, will ultimately have a deleterious effect on a pitcher. Check out Dick Ellsworth with the early 60s Cubs, for example. He had never thrown over 200 innings until 1962 when he was 22 and hurled 209. The next season he threw 291 with a FIP of 2.63. In ’64 he completed 257 innings with a FIP of 4.04. He was never again the same pitcher he was when allowed to throw almost 300 innings.
Maybe MLB should talk to Bob Gibson, Steve Carlton, and Fergie Jenkins to ascertain how they were able to do it with only three days rest between starts “back in the day.” Maybe MLB should listen to a former pitcher who has studied the physics of pitching and has ideas outside of the “mainstream,” like Mike Marshall, who pitched in over 100 games and 200 innings in the 1974 regular season.
Check out what the pitching coach of the 1990-2000 Braves, Leo Mazzone, had to say: http://washington.cbslocal.com/2014/04/22/mazzone-throwing-more-often-with-less-exertion-key-to-avoiding-arm-injuries/

Luis Venitucci

9 years ago

Better pitchers will pitch more. Since they ARE better, the likelihood of a performance decrease is less as well it would seem. Good research though. Thank you.

Jeff Zimmerman

9 years ago

Agree 100% with the conclusions:

http://www.fangraphs.com/blogs/did-bumgarner-and-shields-throw-too-many-pitches/

MGL

9 years ago

“Third, they had to be roughly equivalent in pitching quality. I used both ERA- and FIP- as measures, doing the study for each. The comp pitcher had to be within 10 points plus or minus of the subject pitcher to be counted. This was to prevent unequal levels of regression to the mean from warping things, likely to the subject’s detriment. (One assumes a pitcher getting heavy duty in October will have performed well that season, and thus would be due some negative regression.)”

Love that you thought of the selective sampling/regression issues and love that you went with a “matching pairs” study. I love those!

Now back to reading the rest of the article!

MGL

9 years ago

Eh, not so thrilled anymore. I don’t think you can conclude ANY cause/effect relationships. I am surprised that the results are that close, but I am not surprised that they are in the direction you found (in favor of the workhorses).

(BTW, I am not a big fan of eliminating outliers like you did (for ERA-) in these kinds of studies. Most of the time, the mean is important and not the median.)

Even though you created a comp group which was good, there is almost no way around the most important selective sampling issue. That is, the pitchers who pitched in the post-season were “selected” for being healthy at the end of the season. The comps were not. It is likely that many of the comps were not healthy toward or at the end of the season and that is why they performed slightly worse the next year than the workhorses.

In fact, it is entirely possible that throwing those extra innings in the post-season DID create a worse performance the next year. You have NO way of knowing that without forcing half of them to not pitch the post-season (even though they were healthy enough to do so) and half of them to do so.

Let me give you an example of how the cause/effect relationship can be almost anything:

Let’s say that all your post-season pitchers were healthy enough to pitch in the post season which is presumably the case. Let’s say they pitched at an FIP- of 90 in the reg season and that their true talent was 92 and that is exactly what they are supposed to pitch in the next regular season. Now, let’s say that pitching in the post-season is a problem such that they pitch to a tune of 94 in the next season. IOW, they would have pitched at 92 had they not pitched in the post-season (and gotten worn out or possibly a little injured).

Now you have your comp group who are also at 90 for the reg season and their true talent is 92 also. But, some of these guys are worn out and injured already and we don’t know that, such that they could not have pitched a lot in the post-season even if they wanted to. That has to be the case. So they are going to pitch at 94 or 96 or 98 the next season collectively. The ones who could have pitched in the post-season had their teams made it would pitch at 92 (their true talent) the next year. The ones who are worn out or injured (BTW, pitchers who get hurt, other than TJ pitchers, pitch worse in the future) pitch at 100 or something like that, so overall they pitch at 94. Again, the ones who pitched in the post-season are presumably not worn out or injured.

Basically, those pitchers who would have been in your “post-season workhorse” sample, but for the fact that they were worn or injured and could not pitch in the post-season (like the aforementioned Strasburg) are eliminated from your post-season sample creating “hidden” biased sample as compared to your comps.

So again, given any hypothesis, the results could go any way because of this bias, so I don’t think that you can conclude anything in cause/effect.

One thing you might do to see if in fact my bias theory is true is to look at pitchers on the same playoff teams who had the same requisite stats but did not pitch in the playoffs and include them in your post-season sample! If there are none, then I am wrong. If there are some but they pitch just as well as the post-season guys, then I am also likely wrong…

McKay

9 years ago

Reply to MGL

MGL, you’re not wrong! At least with regard to causality and this design’s inability to assess it. No further evidence necessary.

That said, I agree that the matched-pair design is a strength. The data points this study produces are interesting if not causal.

The slight advantage observed for the playoff pitchers may be hinting at an underlying mechanism.

Shane, did you happen to run an analysis of variance? Less interested in the P values than the effect sizes… Judging based on means alone is… difficult.

Removal of outliers is perhaps a contentious issue, but I like seeing both analyses.

Overall, excellent work! Perhaps a safer conclusion is that the pitchers chosen to shoulder heavy playoff workloads have managed to maintain at least as much of their previous season performance as matched-controls.
