Criminals of WAR
I’m not going to single anyone out, since we’re all guilty of abusing FanGraphs’ Wins Above Replacement metric. But I’ve been seeing cases pop up where it’s getting out of hand. So I’ve set up a few guidelines for how to go about using WAR responsibly. Do not break these rules, or I may call you out.
1. Do not exclude baserunning from a position player’s WAR. I’m sure David Appelman will include baserunning in the next edition of WAR, since it’s so easy to calculate, but the numbers are already out there, so please take the time to go to BP, B-Ref, or BJOL to look up the numbers and tack them on.
2. Do not place undue trust in WAR for catchers. How much of a catcher’s value do you think is in his defense? I’ll give you a hint: it’s a lot. FanGraphs has unfortunately yet to make any effort to quantify this vital aspect of the game, other than through the positional adjustment. In fact, catchers should arguably be considered a separate group of players, with their own replacement level, and treated differently from all other position players.
3. Do not place undue trust in WAR for pitchers. First off, pitcher defense and hitting aren’t included. This should be righted ASAP. Then there are the more nuanced issues like how leverage is accounted for and the conversion of FIP to runs. Personally, I’d trust the calculations of David Gassko’s pitching runs created or StatCorner’s WAR well before I would FanGraphs’ WAR.
4. Do not cite WAR as a measure of skill. WAR measures production. FanGraphs has a lot more granular data if you’re trying to assess skill. And if you’re going to project WAR, regress each component individually. Also, players with negative WAR may still have value if they excel at a particular skill that can be leveraged.
5. Do not use the linear conversion of WAR to salary to determine what a team should be willing to pay a free agent. Every team has a different scale, depending on that team’s market and where the team is on the win curve. Few teams should pay $5 million for a single win.
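Rules 1 and 4 fit together in a small sketch: carry baserunning as its own component next to batting and fielding, then regress each component separately before converting runs to wins. Everything here is an illustrative assumption. The `k` constants (the sample size at which a component is treated as half signal, half noise) are made-up placeholders. The replacement and positional credit is passed in as a fixed, unregressed number. Runs are converted at a flat 10 runs per win.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed flat conversion ("divide by 10-ish"); real WAR may use a dynamic rate.
RUNS_PER_WIN = 10.0


@dataclass
class Component:
    """One piece of a position player's value, in runs above average."""
    observed_runs: float  # runs above average observed in this component
    sample: float         # playing time behind it (PA, innings, attempts, ...)
    k: float              # hypothetical constant: sample size at which reliability = 0.5

    def regressed(self, league_mean: float = 0.0) -> float:
        # Shrink toward the league mean; noisier components (larger k) shrink more.
        reliability = self.sample / (self.sample + self.k)
        return reliability * self.observed_runs + (1.0 - reliability) * league_mean


def projected_war(components: dict[str, Component], fixed_runs: float) -> float:
    """Regress each component separately, add the unregressed replacement and
    positional runs, then convert the total from runs to wins."""
    total_runs = sum(c.regressed() for c in components.values()) + fixed_runs
    return total_runs / RUNS_PER_WIN


if __name__ == "__main__":
    player = {
        "batting": Component(observed_runs=30.0, sample=600, k=600),
        "baserunning": Component(observed_runs=4.0, sample=600, k=1800),  # rule 1: don't leave it out
        "fielding": Component(observed_runs=15.0, sample=1300, k=3900),
    }
    print(projected_war(player, fixed_runs=25.0))
```

With these invented constants, fielding and baserunning are shrunk much harder than batting. That is the reason to regress components one at a time instead of regressing the WAR total.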
I’m sure there are other commandments I’m missing, so feel free to add your own.
But FIP is production too: strikeouts, walks and home runs allowed. It’s a subset of ERA and it’s more constant, or more predictable, than ERA, but it’s still production.
#5 is my personal favorite. I think the salary figures on FanGraphs are good guidelines, but that’s all they are. They’re the beginning of a good discussion about what a team should pay a player, not the end.
Regarding #5, Vince Gennaro’s Diamond Dollars explains brilliantly why different teams would pay different amounts depending on their current and expected win totals. People tend to forget (I’m a culprit as well) that the value of a marginal win varies from team to team, and even for the same team from year to year. A middle-rotation starter is not worth much to the Pirates, since he won’t do much to raise their playoff odds. A wild-card contender, on the other hand, may pay much more for the same pitcher, depending on how many (or how few) wins the team needs to make the playoffs.
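The win-curve argument can be made concrete with a toy model. The sketch assumes that playoff odds follow a logistic curve in a team’s projected wins. It then prices each added win by the change in playoff odds that win buys. The 90-win midpoint, the 3-win spread, and the $30M value of a playoff berth are hypothetical placeholders, not figures from Gennaro’s book.

```python
import math


def playoff_odds(wins: float, midpoint: float = 90.0, spread: float = 3.0) -> float:
    """Hypothetical logistic curve: 50% playoff odds at `midpoint` projected wins."""
    return 1.0 / (1.0 + math.exp(-(wins - midpoint) / spread))


def playoff_value_per_win(current_wins: float, added_wins: float,
                          berth_value: float = 30.0) -> float:
    """Average playoff-related value (in $M, assumed) of each win a signing adds."""
    gain = playoff_odds(current_wins + added_wins) - playoff_odds(current_wins)
    return gain * berth_value / added_wins


if __name__ == "__main__":
    for team_wins in (70, 80, 88, 95):
        print(team_wins, round(playoff_value_per_win(team_wins, 2), 2))
```

Under these made-up numbers, two wins barely move a 70-win team’s playoff odds, but they are worth far more to an 88-win team. A flat dollars-per-WAR rate ignores exactly this non-linearity. A fuller model would also credit the regular-season revenue value of wins, which this sketch leaves out.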
I don’t mean to get snippy, but this was a completely unsupported statement. tRA has been shown to be only barely better than FIP. To my knowledge, neither PRAA nor the tRA-based WAR on StatCorner adjusts relievers for leverage. As far as I know, they don’t have dynamic run converters for starting pitchers either. Also, none of the pitcher WAR calculations include defense for pitchers, which is going to be pretty minimal anyway. You can always look up a pitcher’s WAR on offense (which we do calculate) and add it to his total.
What exactly is there not to “trust” about pitcher WAR on FanGraphs, again? We have a really long series explaining exactly how WAR is calculated for pitchers. If you don’t like FIP being used in a WAR calculation, then you shouldn’t like tRA either. All in all, they’re pretty comparable. If you’re not going to trust one of them, you may as well not trust any of the others, because their methods of calculation are generally similar.
“Personally, I’d trust the calculations of David Gassko’s pitching runs created or StatCorner’s WAR well before I would FanGraphs’ WAR.” was the statement I was trying to quote in the post above.
Thank you ever so much for point number 5. Beyond the obvious point that the Yankees will pay more for one WAR than the Pirates, there is so much more to the economics. A decent shortstop or a middle-of-the-rotation starter is worth much more to the Twins, who may see him as the one piece they are missing. He is worth much less to the Royals, who need so much more before any single addition is even worth noticing. It makes sense for the Twins to overpay for that, but not the Royals.
@dkappelman: I think he’s referring to issues like the one raised in this article: http://mobile.beyondtheboxscore.com/2009/10/28/1104776/ricky-nolasco-4-war-or-1-war
Dave, thanks for commenting.
The entire way that we’re thinking about adjusting relievers for leverage might be flawed. The purpose is to find out the value of the pitcher, isolated from the context in which he pitches. The two best ways to do this are to either not account for leverage or to assign every single pitcher a “deserved” leverage index, including starters, based on the optimal average LI he should pitch in, independent of his actual LI. StatCorner doesn’t account for leverage, which I’m fine with, and PRC does pretty much what you need by adjusting the pitcher’s run environment.
I don’t see why you’d say that defense is pretty minimal for pitchers. I’d guess a good fielding pitcher is worth five runs a year and a bad one worth negative five. It all adds up.
I might be wrong about StatCorner’s WAR. I’ve never seen them write up their methodology for it, but I’ve been under the assumption that they use the regressed version of tRA rather than raw tRA. If my assumption is false, I stand corrected.
I understand you guys calculate a pitcher’s WAR on offense, and I should have mentioned that it’s available on FanGraphs. This was an indictment of people who cite WAR for pitchers without including offense, not of FanGraphs, which does have the data available.
The number one argument for the value of catcher defense is probably just the defensive spectrum. It’s clear that we haven’t got a handle on how to measure a catcher’s defensive production, but a lot of little clues are coming forward.
I particularly liked the study showing that Piazza was a plus defender at blocking pitches in the dirt. That let him recoup some of the defensive value he lost to his poor throwing. Stuff like that helps make clear why he was kept at catcher for so long.
Jeremy, thanks for clarifying.
Starting pitchers are not leverage-adjusted on FanGraphs because, except for some strange cases, they’re all going to have an average leverage of 1 anyway. So there’s really nothing to complain about here; they are completely context-neutral.
Relief pitchers use a regressed gmLI leverage adjustment, so it only accounts for the situations they were actually used in. I see what you’re saying about optimal average LI. But WAR, like you said, is not really predictive (with FIP maybe a little more so), so I’d say adjusting for leverage in the situations they actually pitched in does make sense. For what it’s worth, the adjustment is not huge. I think at most we’re applying a 1.5 LI adjustment once it’s regressed, because you’re not going to find any gmLI a whole lot greater than 2. On the other end, it really doesn’t go much below .75. This could certainly devalue, slightly, relievers who are good but not optimally used.
As for defense: these guys are really only out there for 200 innings. I agree with the -5 to +5 range, but that still makes up a relatively small part of a pitcher’s value. It’s not like position players, where how a player plays defense can drastically change his perceived value.
I’m not sure I see how using the regressed version of tRA versus the non-regressed version really changes the comparison to FIP; they’re both going to be similar. I think the regressed version of tRA is going to water down the home run impact a little more than FIP will, but I may be wrong about that. We also use FIP in the WAR calculations because it completely takes out the defense. That makes adding numbers up across entire teams work more cleanly, so we’re not double-counting defense somewhere.
Even in that Nolasco case, we say he’s at 4.2 and StatCorner says he’s at 3.5. If you just take it on runs (leaving out the dynamic win conversion), we think he’s at just about 3.6. Better pitchers will look even better in FanGraphs WAR because of the dynamic run-to-win converter, since they inherently lower the run environment. You can always just look at the runs and divide by 10-ish if you want to see what things would look like without it.
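The figures above (about 1.5 for a gmLI near 2, about .75 at the low end) are consistent with regressing gmLI halfway back to 1, that is, multiplying by (1 + gmLI) / 2. The sketch assumes exactly that form and applies it to a reliever’s runs above replacement. Treat it as a reconstruction from the numbers in this comment, not as FanGraphs’ published code. The flat 10 runs per win is also an assumption.

```python
RUNS_PER_WIN = 10.0  # assumed flat conversion for illustration


def leverage_multiplier(gm_li: float) -> float:
    """Regress a reliever's average entering leverage (gmLI) halfway toward 1.0 (assumed form)."""
    return (1.0 + gm_li) / 2.0


def reliever_war(runs_above_replacement: float, gm_li: float) -> float:
    """Scale a reliever's runs above replacement by regressed leverage, then convert to wins."""
    return runs_above_replacement * leverage_multiplier(gm_li) / RUNS_PER_WIN


if __name__ == "__main__":
    # The same 10 runs above replacement, earned as a closer vs. in low-leverage mop-up work
    print(reliever_war(10.0, 2.0))  # 1.5
    print(reliever_war(10.0, 0.5))  # 0.75
```

The halfway regression is what keeps the adjustment from being huge. A reliever with a raw gmLI of 2 gets credit for 1.5, not 2.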
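The sketch below shows why a dynamic converter rewards run-suppressing pitchers. It uses one common textbook way to make runs-per-win depend on the run environment. Take the slope of the Pythagenpat win estimator (exponent RPG^0.287) at a .500 record; that gives roughly 2 × RPG^0.713 runs per win. This is a swapped-in approach, not FanGraphs’ actual converter. Treating the environment as the pitcher’s RA9 plus a league-average offense is also a simplifying assumption.

```python
PYTHAGENPAT_Z = 0.287  # Pythagenpat exponent parameter (some versions use ~0.28-0.29)


def runs_per_win(pitcher_ra9: float, league_ra9: float) -> float:
    """Runs per win in a game environment of pitcher_ra9 allowed + league_ra9 scored.

    Slope of Pythagenpat at .500: RPW = 2 * RPG / RPG**z = 2 * RPG**(1 - z).
    """
    rpg = pitcher_ra9 + league_ra9  # total runs per nine innings, both teams
    return 2.0 * rpg ** (1.0 - PYTHAGENPAT_Z)


def runs_to_wins(runs: float, pitcher_ra9: float, league_ra9: float) -> float:
    """Convert runs above replacement to wins in the pitcher's own run environment."""
    return runs / runs_per_win(pitcher_ra9, league_ra9)


if __name__ == "__main__":
    runs_above_replacement = 36.0  # hypothetical pitcher
    print(round(runs_above_replacement / 10.0, 2))                   # flat "divide by 10-ish"
    print(round(runs_to_wins(runs_above_replacement, 3.5, 4.5), 2))  # dynamic, good pitcher
```

In a league-average environment this lands a little under 10 runs per win. For a pitcher who allows fewer runs, each run is worth more wins, so the same run total converts to more WAR.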
I understand starters have the same average LI. My point is that all pitchers are from the same group of players. They (starters and relievers) shouldn’t be treated differently. I’m not smart enough to come up with the correct metric, but I’d imagine the replacement level (or whatever you want to call it) would be fluid, based on the expected outs per outing, and the deserved leverage index would be fluid, based on the expected outs per outing as well as the pitcher’s run environment.
The regressed version of tRA tries to account for everything the pitcher controls, and dismiss everything he can’t control. That’s the purpose of WAR, no?
Statcorner calculates WAR using the straight park adjusted tRA. The regressed tRA is “just for show”.
I agree that PRC is probably the best, mainly because it uses a dynamic run estimator I believe.
Regressed tRA just regresses tRA toward league average based on sample size. So a 6.00 tRA will be around a 5.2 tRA*, or something.
Nick, I don’t know where you’re getting any of that. If you have an explanation of StatCorner’s pWAR, please pass that along. You’re off on your assessment of tRA*. It’s park adjusted and every component that goes into tRA is regressed individually. Homer Bailey had a tRA of 7.64 and tRA* of 5.03 while Miguel Batista had a tRA of 7.93 and tRA* of 6.58.
“My point is that all pitchers are from the same group of players. They (starters and relievers) shouldn’t be treated differently. I’m not smart enough to come up with the correct metric, but I’d imagine the replacement level (or whatever you want to call it) would be fluid, based on the expected outs per outing, and the deserved leverage index would be fluid, based on the expected outs per outing as well as the pitcher’s run environment.”
Well, back to the original point, your complaint about leverage not being applied properly or on a sliding scale. If you’re making the same argument about replacement level in general, then it seems we both agree there should be a leverage adjustment for relievers, and you’re just skeptical of the way it’s being applied in FanGraphs WAR?
At least we’re making some adjustment for leverage, and relievers are used somewhat properly, both leverage-wise and skill-wise, so it’s not like the system we use is going to be really out of whack. Sure, some relievers may be docked ever so slightly because they’re in the setup role instead of the closer role. I don’t think those differences are going to be material in the vast majority of cases.
I don’t see why we wouldn’t treat relievers and starters differently. They’re two different roles, and leverage is then applied to further define the role of the reliever.
I think all these systems have their merits and potential drawbacks, but I just thought it was particularly unfair of you to single out FanGraphs WAR for pitchers and say, more or less, that it’s untrustworthy.
Otherwise, I agree with everything you’re saying in the article. Like all these stats, none are perfect or should be used in a vacuum, but I guess if you’re going to pick one, you could do a lot worse than WAR.
“Like all these stats, none are perfect or should be used in a vacuum, but I guess if you’re going to pick one, you could do a lot worse than WAR.”
Well said. So many flame wars could be avoided if people would follow that simple logic.
Not to be a noob, but what is BJOL? And where are the baserunning stats at B-Ref?
Jeremy – I’m saying the version of WAR that’s shown at StatCorner is calculated using tRA, which is park-adjusted but NOT regressed. tRA* is the regressed version, and that’s not used for anything in particular. That’s why the statement I quoted from you above is confusing.
Firpo, Bill James Online, and if you go to a hitter’s page on Baseball Reference and scroll down, there’s a section with baserunning stats.
Dave, I appreciate you taking the time. I’m realizing I overstated my case when it came to pitcher WAR. The other points were all pretty much fact, while the arguments against pitcher WAR other than defense/fielding are based on theory.
However, I still don’t think you should treat relievers and starters differently because they’re from the same group of players. Think of it in terms of positional adjustments. A one inning pitcher (think of innings in terms of expected outs) gets a negative positional adjustment because of the lack of scarcity and lack of difficulty at the position. A six inning pitcher gets a higher positional adjustment. And it’s all fluid in between.
Same with leverage. A one-inning pitcher with a great FIP has a high deserved leverage index, but a six-inning pitcher with a great FIP should have a higher LI too, since theoretically he could be brought in in the fourth inning. Value should be independent of how a manager uses his players, and should be based only on what the player was able to control on the field.
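The “fluid in between” idea can be sketched as a linear interpolation on expected outs per outing. The same shape could serve for a deserved leverage index, just with different anchors. The endpoint values below (a reliever-like adjustment at 3 outs, a starter-like one at 18 outs) are placeholders invented for illustration. Nothing in this thread pins them down, and the true curve need not be linear.

```python
def fluid_adjustment(expected_outs: float,
                     short_outs: float = 3.0, short_value: float = -1.0,
                     long_outs: float = 18.0, long_value: float = 0.0) -> float:
    """Linearly interpolate a per-outing adjustment between a one-inning pitcher
    (short_outs) and a six-inning pitcher (long_outs).

    The anchor values are hypothetical placeholders (think runs per nine innings).
    """
    # Clamp to the range covered by the two anchors.
    outs = min(max(expected_outs, short_outs), long_outs)
    fraction = (outs - short_outs) / (long_outs - short_outs)
    return short_value + fraction * (long_value - short_value)


if __name__ == "__main__":
    for outs in (3, 6, 12, 18):
        print(outs, fluid_adjustment(outs))
```

A long reliever expected to get about 10 outs lands between the two anchors instead of being forced into either the starter bucket or the reliever bucket.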
WAR is the best out there, and we all know that. But some people are missing its limitations, like baserunning, catcher defense, pitcher hitting/fielding, quality of opposition, and these aspects of the game need to be mentioned.
Thanks for writing this. A lot of this needed to be said. It drives me nuts to see people say “So-and-so was worth $13.6M last year” as if that is an etched-in-stone fact. That is really sloppy thinking.
Also, to Dave Appelman: I read and appreciate FanGraphs, but I didn’t interpret the main thrust of this article as anti-FanGraphs so much as a plea for people to think more critically about, and be more careful with, what they read there.
Re: the point on catcher defense, Tango said that the difference between Piazza/Pudge (worst and best of all time) was only 20 runs. RJ Anderson cites it here ( http://www.beyondtheboxscore.com/2009/1/30/740437/rambling-on-catcher-defens ), but I don’t remember the original source nor do I have time to find it now. The point being, if the difference between the best and worst of all time is only 20 runs, chances are we’re not missing by a whole lot. We should be able to do pretty well as long as we make a mental adjustment based on what we know about the player, his age, his arm, etc.
Hear, hear. Took a stab at starting the salary discussion here:
http://saberrattling.wordpress.com/2009/11/12/money-matters-what-do-teams-pay-per-win/
Would appreciate input. Thanks!
I’m unsure of #5. I should probably read the book someone else mentioned before commenting, but oh well. Basically, we’re not concerned about the marginal value of a win to a specific team as much as what the market will pay for them. If the Pirates won’t pay ~$4.5M for a win…well, the Pirates won’t be landing any free agents, other teams that will pay more will get them. I guess I could see a price discrimination type supply and demand setup, though, where some teams can pay more and some teams can pay less, and with a limited number of available positions/roster spots, there will still be supply left over for the teams that pay less…
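The tension between team-specific value and the market price can be sketched as a simple ascending auction. Each team has its own maximum price per win. The player signs with the team that values him most, and the price settles near the second-highest valuation. The team names and dollar figures are hypothetical. Real free agency adds roster limits, multi-year deals, and imperfect information, all of which this sketch ignores.

```python
from __future__ import annotations


def free_agent_outcome(team_max_per_win: dict[str, float],
                       projected_war: float) -> tuple[str, float]:
    """Return (winning team, total price in $M) under an ascending-auction simplification."""
    if len(team_max_per_win) < 2:
        raise ValueError("need at least two bidding teams")
    ranked = sorted(team_max_per_win.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    clearing_rate = ranked[1][1]  # the winner only has to outbid the runner-up
    return winner, clearing_rate * projected_war


if __name__ == "__main__":
    bids = {"Team A": 6.0, "Team B": 4.5, "Team C": 2.0}  # hypothetical $M per win
    print(free_agent_outcome(bids, 3.0))  # ('Team A', 13.5)
```

In this toy version the low-valuation team (Team C) simply doesn’t sign the player, which is the commenter’s point. The price, however, is set by Team B’s valuation rather than by a league-wide constant, which is the article’s point.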
99 times out of 100, the stat nerd and the grizzled veteran manager will come to the same conclusion as to who are the best players. Where it really gets interesting is when you’re talking about the mediocre to worst (yet still essential) players. The problem then becomes which stat or set of stats to value.
1. wOBA/WAR does include SB and CS, which is what most people think of first when they think of baserunning. Subtler aspects, such as taking an extra base, are missed for now, but they tend not to amount to a large run value. I do recommend B-Ref, though, for XBT% (% extra base taken) and RS% (% runner scored). Also, in the situational stats, look at BRS% (% of base runners scored by the hitter) and advances relative to average for hints of other possible skills not captured by WAR.
2. Agree a useful measure of catcher defense is lacking, and thus it’s hard to produce a useful overall WAR. But as for using a separate replacement level, that is exactly what positional adjustments do. There is no need for any further adjustment there.
3. It would be nice to have defense and hitting all included in one place for pitchers. The other issues apply more to relievers than to SP. I always liked the BP expected runs report, which adjusts for things like inherited and bequeathed runners, for considering how truly effective a reliever was (in a context-dependent way).
4. WAR really seems to be mostly a measure of skill. Yes, it is based on data like actual HR, SO, BB, etc. But the measures emphasized are those which correspond most closely to the most important underlying skills (abilities to hit/avoid HR, get/avoid SO, get/avoid BB, etc.) It is not a projection system. But it’s designed to be maybe the best raw material for a good projection system.
For projection purposes, one of the biggest mistakes I see made is making too much of only one season’s WAR. If the object is to get at underlying skills, you want as large a sample as possible. The next biggest mistake I see made isn’t lack of regression, it’s lack of considering aging curves. If you have a large enough multi-year data sample, and apply appropriate aging curves, you really aren’t going to need a lot of regression for a decent projection. For players older than about 32 or younger than about 26, at least, these errors seem to be at least as significant as lack of regression. Smaller samples do require more regression, but they still aren’t that reliable even with the regression.
5. I think the salary conversions work very well as measures of a player’s baseball value, which really is about what he should be paid. Team-specific conditions tend not to change that much. Wins in baseball are a largely fungible commodity. Most players can help more than one team, and most teams have more than one place they can add wins. Players may get paid more in some regions, but that really only requires a regional cost-of-living adjustment. Likewise, I suppose a Canadian team might want to convert to Canadian dollars. Neither changes the underlying valuation.
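The projection recipe in point 4 calls for a multi-year sample, light regression, and an aging adjustment. The sketch below puts those three steps together. All of its constants are hypothetical placeholders chosen for illustration, not fitted values: the 5/4/3 season weights, the 600 PA of league-average ballast at 2 WAR per 600 PA, the peak age, and the one-year aging steps.

```python
from __future__ import annotations

# All constants below are hypothetical placeholders, not fitted values.
SEASON_WEIGHTS = (5.0, 4.0, 3.0)  # most recent season weighted most


def project_war_per_600(seasons: list[tuple[float, float]], age: int,
                        ballast_pa: float = 600.0, league_war_per_600: float = 2.0,
                        peak_age: int = 27, growth: float = 0.3,
                        decline: float = 0.5) -> float:
    """Project next season's WAR per 600 PA.

    seasons: (war, pa) pairs, most recent first; at most three are used.
    age: the player's age during the most recent season.
    """
    weighted_war = sum(w * war for w, (war, _) in zip(SEASON_WEIGHTS, seasons))
    weighted_pa = sum(w * pa for w, (_, pa) in zip(SEASON_WEIGHTS, seasons))
    # Regression: blend in `ballast_pa` plate appearances of league-average production.
    # The more (weighted) playing time we have, the less this ballast matters.
    ballast_war = league_war_per_600 * ballast_pa / 600.0
    projection = 600.0 * (weighted_war + ballast_war) / (weighted_pa + ballast_pa)
    # Aging: a flat one-year step up before the peak, down from the peak on (placeholder curve).
    projection += growth if age < peak_age else -decline
    return projection


if __name__ == "__main__":
    one_year = [(4.0, 600.0)]
    three_years = [(4.0, 600.0), (4.0, 600.0), (4.0, 600.0)]
    print(round(project_war_per_600(one_year, age=30), 2))
    print(round(project_war_per_600(three_years, age=30), 2))
```

The two cases use identical 4-WAR seasons, but the single season is pulled harder toward the assumed league average than the three-year sample. Both are then docked by the placeholder aging step for a 30-year-old.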