Jonathan Papelbon and Replacement Level

by David Gassko
September 18, 2006

Let’s talk about replacement level for a moment. Almost everything there is to know about replacement level is contained within this article, but I want to discuss one specific aspect. It’s called chaining, and it’s one of the most important and most overlooked concepts in baseball player evaluation (and, just to note, I’m as guilty of overlooking it as anyone).

Before we look at chaining, let’s remind ourselves why we want to use a replacement level baseline in the first place. Well, replacement level allows us to weigh a player’s playing time and his overall performance. It allows us to perform a cost/benefit analysis: How much is a player contributing over a free replacement?

To calculate a player’s contribution over replacement, baseball analysts generally take his performance, subtract what a replacement-level player would do, and multiply the difference by playing time. For example, a first baseman might create .25 runs per out over 450 outs, while a replacement-level first baseman would create .15 runs per out. So our first baseman is (.25 – .15)*450 = 45 runs above replacement.

But wait, let’s pause for a second. If our first baseman is out for the season, the replacement player will only replace him on the roster, not in the lineup. The best hitter on the bench will play instead of the first baseman, and the replacement player will replace that hitter. So we have to re-do our analysis.

Let’s say that the bench player creates .18 runs per out. He’ll replace our first baseman and create (.25 – .18)*450 = 32 fewer runs. Meanwhile, the replacement player will take over the bench player’s 75 outs and be (.18 – .15)*75 = 2 runs worse.

Overall, the team has just gotten 34 runs worse (that’s 32 + 2), not 45! And that, in a nutshell, is the concept of chaining.
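The first-base example can be sketched in a few lines of Python. This is only an illustration of the arithmetic above; the function and variable names are hypothetical, and the inputs are the made-up rates and outs from the example.

```python
def runs_above(rate, baseline_rate, outs):
    """Runs gained by a hitter over a baseline hitter across a number of outs."""
    return (rate - baseline_rate) * outs

# Runs created per out, taken from the example in the text.
starter, bench, replacement = 0.25, 0.18, 0.15
starter_outs, bench_outs = 450, 75

# Traditional approach: compare the starter directly to replacement level.
naive = runs_above(starter, replacement, starter_outs)

# Chaining: the bench player takes the starter's outs, and the
# replacement player takes the bench player's outs.
lineup_loss = runs_above(starter, bench, starter_outs)
bench_loss = runs_above(bench, replacement, bench_outs)
chained = lineup_loss + bench_loss

print(f"naive: {naive:.1f} runs, chained: {chained:.1f} runs")
```

Without rounding the two pieces first, the chained figure comes to 33.75 runs, which is the 34 quoted above.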

If you don’t like numbers, I’ll put that into words instead: When a player goes down, his replacement in the lineup is generally better than a replacement-level player. Thus, to calculate runs above replacement, it is incorrect to simply subtract a replacement-level baseline from a player’s performance and multiply that by playing time. By doing that, we end up overstating the player’s value.

So why am I telling you all of this? Couldn’t I have just provided that link (which you should really click on—it’s tough stuff but essential to understand) in the first paragraph and been done? Well, no.

The thing with chaining is that even if we don’t apply it to position player analysis, we’re not losing much because everyone is still being compared to the same baseline. The first baseman’s value might be overstated by about 10 runs, but so is everyone else’s. But when it comes to closers, we’re looking at a whole different ballgame.

In a recent article, Nate Silver of Baseball Prospectus weighed in on whether or not Jonathan Papelbon should be moved into the starting rotation next season. Using replacement level analysis, he concluded that Papelbon would have to post a 3.70 ERA in 200 innings as a starter to have the same value as he would with a 2.00 ERA in 75 innings as a closer.

How is that possible? Well, let’s go through the math. Nate assumed that a replacement-level reliever would post a 4.00 ERA, which would make Papelbon worth about 17 runs above replacement as a reliever. However, Nate noted, because a closer’s innings are worth so much more than an average inning (see Tom Tango’s articles on Leverage Index for the why), Papelbon is worth more like 29 runs above replacement.

If a replacement-level starter has a 5.00 ERA (players are about one run worse starting than they are in relief), then to be 29 runs above replacement in 200 innings, Papelbon would have to post about a 3.70 ERA, since (5.00 – 3.70)*200/9 = 29.
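For readers who want to check that arithmetic, here is a rough Python sketch of it. The function names are hypothetical, and the 1.75 leverage figure is the closer Leverage Index attributed to Nate’s model later in this article; everything else comes from the numbers quoted above.

```python
def pitcher_rar(era, replacement_era, innings, leverage=1.0):
    """Runs above replacement for a pitcher, optionally weighted by leverage."""
    return (replacement_era - era) * innings / 9 * leverage

def break_even_era(target_rar, replacement_era, innings):
    """ERA a pitcher needs to be worth target_rar over the given innings."""
    return replacement_era - target_rar * 9 / innings

# Papelbon as a closer: 2.00 ERA in 75 innings vs. a 4.00 replacement reliever.
closer_raw = pitcher_rar(2.00, 4.00, 75)
closer_leveraged = pitcher_rar(2.00, 4.00, 75, leverage=1.75)

# ERA he would need over 200 innings against a 5.00 replacement starter.
starter_era = break_even_era(closer_leveraged, 5.00, 200)

print(f"{closer_raw:.1f} raw, {closer_leveraged:.1f} leveraged, "
      f"break-even ERA {starter_era:.2f}")
```

The break-even figure works out to roughly 3.69, which rounds to the 3.70 in Nate’s conclusion.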

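Sounds good, right? Not if we remember chaining. Let’s look at this a different way. Here’s a model of an average six-pitcher bullpen, where RA is runs allowed per nine innings, LI is Leverage Index, IP is innings pitched and RAR is runs above replacement: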

RA	LI	IP	RAR
3.00	1.70	80	30
3.40	1.20	80	17
3.80	1.00	80	11
4.20	0.80	80	6
4.60	0.70	80	2
5.00	0.60	80	0

My model differs from Nate’s in a few ways, none of which are particularly substantial. I assume that the closer will have a 1.70 Leverage Index instead of 1.75. I assume that each reliever will pitch 80 innings, instead of 75. And I assume that the league RA is 5.00, with a replacement-level RA of 6.00 for starters and 5.00 for relievers. Using all these assumptions, we can calculate the runs above replacement for an average bullpen, which I’ve put in that last column. Our average bullpen will be 66 runs better than replacement level.
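The RAR column is consistent with the formula (replacement RA – RA) * LI * IP / 9. That formula is an assumption of the sketch below, inferred from the table rather than spelled out, and the names in the code are hypothetical.

```python
REPLACEMENT_RA = 5.00  # replacement-level RA for relievers in this model

# (RA, LI, IP) for each reliever, copied from the table above.
bullpen = [
    (3.00, 1.70, 80),
    (3.40, 1.20, 80),
    (3.80, 1.00, 80),
    (4.20, 0.80, 80),
    (4.60, 0.70, 80),
    (5.00, 0.60, 80),
]

def reliever_rar(ra, li, ip, replacement_ra=REPLACEMENT_RA):
    """Leverage-weighted runs above replacement for one reliever."""
    return (replacement_ra - ra) * li * ip / 9

rar_by_reliever = [reliever_rar(ra, li, ip) for ra, li, ip in bullpen]
bullpen_total = sum(rar_by_reliever)

for (ra, li, ip), rar in zip(bullpen, rar_by_reliever):
    print(f"{ra:.2f}\t{li:.2f}\t{ip}\t{rar:.0f}")
print(f"total: {bullpen_total:.0f}")
```

Rounded to whole runs, each row matches the table, and the total comes to 66.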

Now here’s what happens if you replace the closer with a replacement-level player:

RA	LI	IP	RAR
5.00	1.70	80	0
3.40	1.20	80	17
3.80	1.00	80	11
4.20	0.80	80	6
4.60	0.70	80	2
5.00	0.60	80	0

Suddenly, the bullpen is only 36 runs above replacement, which makes an average closer 30 runs better than replacement, since the only thing we’ve done is sub in a replacement-level player for him, and the bullpen has gotten 30 runs worse. Papelbon, who is better than your average closer, would come in at 38 runs above replacement if we assume that he would have a 2.50 RA as a closer, which is in line with Silver’s assumption (since I’ve set the league to be slightly higher-scoring and am using runs instead of earned runs).

But wait, don’t forget about chaining. What would actually happen is that the setup man, who is still a pretty good reliever, would slide into the closer’s slot, the next-best reliever would become the setup man, and so on. Here’s what the bullpen would actually look like if the closer were lost for the season:

RA	LI	IP	RAR
3.40	1.70	80	24
3.80	1.20	80	13
4.20	1.00	80	7
4.60	0.80	80	3
5.00	0.70	80	0
5.00	0.60	80	0

That’s 47 runs above replacement, or only 19 runs worse than the bullpen would be with the closer! So really, instead of being worth 30 runs above replacement, as a traditional analysis would tell us, your average closer is worth only 19 runs above replacement. And Papelbon is worth 27 runs over replacement, not 38.
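Here is one way to sketch the bullpen comparison in Python. It assumes the roles (LI and IP) stay fixed while pitchers move between them, and that RAR is (replacement RA – RA) * LI * IP / 9, which matches the tables but is my reading of them; the function names are hypothetical.

```python
REPLACEMENT_RA = 5.00

# (LI, IP) for each bullpen role, from closer down to the last man.
ROLES = [(1.70, 80), (1.20, 80), (1.00, 80), (0.80, 80), (0.70, 80), (0.60, 80)]

def bullpen_rar(ras):
    """Total RAR when the pitchers in ras fill ROLES in the order given."""
    return sum((REPLACEMENT_RA - ra) * li * ip / 9
               for ra, (li, ip) in zip(ras, ROLES))

def closer_value(ras, chained=True):
    """Runs the bullpen loses when its closer (ras[0]) is removed."""
    if chained:
        # Everyone moves up one role; the replacement takes the last slot.
        without_closer = ras[1:] + [REPLACEMENT_RA]
    else:
        # The replacement steps straight into the closer's role.
        without_closer = [REPLACEMENT_RA] + ras[1:]
    return bullpen_rar(ras) - bullpen_rar(without_closer)

average_pen = [3.00, 3.40, 3.80, 4.20, 4.60, 5.00]
papelbon_pen = [2.50] + average_pen[1:]  # Papelbon at a 2.50 RA as closer

for label, pen in [("average closer", average_pen), ("Papelbon", papelbon_pen)]:
    print(f"{label}: {closer_value(pen, chained=False):.0f} unchained, "
          f"{closer_value(pen, chained=True):.0f} chained")
```

Under those assumptions the sketch gives 30 and 19 runs for the average closer and 38 and 27 for Papelbon, in line with the figures above.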

So how does that impact our analysis? Well, let’s do the math. For a starter to be 27 runs better than replacement in 200 innings, in a league where the replacement level for starting pitchers is a 6.00 RA (and average is 5.00), he would have to have an RA of about 4.75. If we don’t account for chaining, we end up expecting an RA of 4.25. That’s a huge difference!
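The break-even RA can be backed out directly from the runs-above-replacement target. A minimal sketch, with a hypothetical function name and the 6.00 replacement RA and 200 innings from the paragraph above as defaults:

```python
def break_even_ra(target_rar, replacement_ra=6.00, innings=200):
    """RA a starter needs to be worth target_rar runs above replacement."""
    return replacement_ra - target_rar * 9 / innings

with_chaining = break_even_ra(27)     # closer valued at 27 runs
without_chaining = break_even_ra(38)  # closer valued at 38 runs

print(f"with chaining: {with_chaining:.2f}, "
      f"without chaining: {without_chaining:.2f}")
```

The exact results are closer to 4.79 and 4.29, so the 4.75 and 4.25 in the text are round numbers, but the half-run gap between the two answers is the same either way.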

In fact, a 4.75 RA is like a 4.30 ERA. A 4.25 RA is like a 3.80 ERA (similar to Nate’s answer, as you can see). A 4.30 ERA is just 5% better than average. A 3.80 ERA is 20% better.

So in other words, a great closer, one who posts an ERA that is half of the league average, is equivalent to a starter who is just 5% better than average!

Papelbon should be in the starting rotation. And in discussing replacement level, we should pay more attention to chaining.
