The continued misuse of FIP

by Derek Carty
June 11, 2010

About a year ago, I pleaded for baseball analysts (particularly fantasy analysts) to stop using FIP in forward-looking analysis (and then expanded upon my point a month later). In the year that has since passed, FIP has largely (and rightfully) been replaced by xFIP.

Despite the strides that have been made in the use of FIP and xFIP, you will still occasionally see them misunderstood or used incorrectly. I don’t pen articles every time this happens because I think that I (and others) have covered it pretty fully in the past. But a couple of readers pointed out an article published earlier this week at ESPN Fantasy, and since it has received some attention, I figured I would comment on it (Tom Tango also responded to it here).

In the article, A.J. Mass is critical of FIP and tries to show how it may not be all that it’s cracked up to be, but his methods are flawed:

Proponents of FIP would have us believe that if a pitcher’s ERA is far lower than his FIP, we should expect a regression the following season. Similarly, if a pitcher has a higher ERA than FIP, then he was probably more unlucky than anything else, and due for a bounce-back campaign. So how does that play so far in 2010? Let’s go to the leaderboard and see:

[2010 ERA Leaderboard graphic]

Certainly the season is not over yet, but even though every single one of these current ERA leaders who pitched in the majors last year has a lower ERA in 2010, only four were “predicted” to do so.

This analysis is riddled with selection bias. By looking only at the league leaders in ERA, you’re guaranteeing that the vast majority will have an ERA lower than their FIP. Why? Because they’re overperforming! They’re statistical outliers in a small sample size. Are we really expecting Jaime Garcia to post a 1.32 ERA or Ubaldo Jimenez to post an ERA under 1.00? I certainly hope not.
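
To see the selection effect in isolation, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration: the league size, the spread of true talent, and the amount of sampling noise in a partial season are round made-up numbers, not estimates from real data. Each simulated pitcher has a fixed "true" ERA, and his observed ERA is that talent plus random noise. Taking the ten lowest observed ERAs then shows how many of those leaders are beating their own talent level.

```python
import random


def simulate_leaderboard(n_pitchers=150, top_n=10, seed=0):
    """Return the top_n observed-ERA leaders and how many beat their talent.

    All numeric settings below are illustrative assumptions, not fitted values.
    """
    rng = random.Random(seed)
    pitchers = []
    for _ in range(n_pitchers):
        talent_era = rng.gauss(4.20, 0.60)   # assumed spread of true talent
        noise = rng.gauss(0.0, 1.20)         # assumed small-sample noise
        observed_era = talent_era + noise
        pitchers.append((observed_era, talent_era))

    # Sort by observed ERA (lowest first) and keep only the "leaderboard".
    leaders = sorted(pitchers)[:top_n]
    beating_talent = sum(1 for observed, talent in leaders if observed < talent)
    return leaders, beating_talent


if __name__ == "__main__":
    leaders, beating_talent = simulate_leaderboard()
    for observed, talent in leaders:
        print(f"observed ERA {observed:5.2f}   true-talent ERA {talent:5.2f}")
    print(f"{beating_talent} of {len(leaders)} leaders are beating their talent")
```

With settings like these, essentially every pitcher on the simulated leaderboard has an observed ERA below his true talent, even though the pool as a whole is neither lucky nor unlucky. The pattern comes purely from how the list was chosen.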

Nobody’s skills are that good, so how could we possibly expect FIP to say that they are? By no reasonable standard could we have predicted any of these pitchers to have an ERA as low as they currently do. Reject fielding independent pitching stats if you want (you’ll still be wrong, although you’re welcome to do so), but don’t do so on the basis that they can’t predict Livan Hernandez to post a 2.22 ERA through June 10 unless you can show me a method that can.

Also, as a minor point, we should note that FIP isn’t a projection. It’s not necessarily a predictive stat, though it is often treated as such because it is more predictive than ERA.

Mass continues:

I suppose one could argue that Livan Hernandez could well finish this year with an ERA of 4.00 and satisfy both the current prediction that he’s due for a regression this season, as well as the prediction that he would better his 5.44 ERA from 2009, and use that as “proof” that FIP works. It seems to me, however, that this particular use of FIP is misguided.

Let’s say a pitcher’s FIP does indicate that his ERA “should” be lower than it is, because his defense has let him down. Well, if his defense isn’t changing — meaning he’s going to continue to have the same basic starting lineup behind him for the rest of the season — then why should we expect a change in its impact on his ERA?

Because, like anything else, we’re looking at a finite sample. In two months of a season, we can’t say with absolute certainty how good (or bad) a particular defense is, and it certainly won’t manifest itself in 75 innings for a particular pitcher. Considering how unstable BABIP is (the primary way in which defense manifests itself in a pitcher’s line), expecting a pitcher’s BABIP through June 10 to match his BABIP at the end of the season is misguided.
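
A rough back-of-the-envelope check shows how noisy BABIP is over a sample this small. The sketch below treats every ball in play as an independent trial with the same chance of falling for a hit, which is a simplification, and it assumes about 225 balls in play for 75 innings and a .300 underlying BABIP. Both figures are illustrative assumptions, not numbers from the article.

```python
import math


def babip_standard_error(true_babip, balls_in_play):
    """Binomial standard error of an observed BABIP.

    Assumes each ball in play is an independent trial with the same
    hit probability, which is a simplification of real baseball.
    """
    return math.sqrt(true_babip * (1.0 - true_babip) / balls_in_play)


if __name__ == "__main__":
    true_babip = 0.300      # assumed underlying rate
    balls_in_play = 225     # assumed total for roughly 75 innings
    se = babip_standard_error(true_babip, balls_in_play)
    low, high = true_babip - 2 * se, true_babip + 2 * se
    print(f"standard error: {se:.3f}")
    print(f"two-standard-error range: {low:.3f} to {high:.3f}")
```

Under these assumptions one standard error is about 30 points of BABIP, so a pitcher with a true .300 mark could plausibly sit anywhere from roughly .240 to .360 at this point in the season on chance alone.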

This argument becomes even more absurd when you realize that pitchers on the same team rarely post identical BABIPs. By this article’s logic, though, they should, since they have the same defense behind them. That, however, is simply not the case. For example, Wandy Rodriguez currently has a .354 BABIP while teammate Roy Oswalt is rocking out to a .278 figure.

Also, this statement fails to realize that FIP does more than just strip out the effects of defense. It also helps to eliminate luck, simple random variation. There is simply too much that happens when a cylindrical bat meets a spherical ball that will land somewhere in a 100,000-square-foot playing area for BABIP to perfectly encapsulate the concept of “defense.” Fielding independent stats remove both the portion of BABIP that is defense and the portion that is luck.

Continuing on …

The following table [2010 FIP Trailers] shows us the pitchers who have the least control of their own fates. As such, they fall victim to bad breaks and balls hit just out of the reach of a diving outfielder far more than the previous list [2010 FIP Leaders]. Some of these guys may indeed simply be bad. Others, like Ian Kennedy and his .229 batting average against, might be in for a rude awakening as the summer drags on.

More important here than having “the least control of their own fates,” these pitchers are simply bad. They’re prone to “bad breaks” to the same extent that good pitchers are—it’s just that they allow more balls in play that they can get lucky or unlucky on. They don’t “fall victim to … balls hit just out of the reach of a diving outfielder” more than the pitchers with good FIPs. Once a ball is put in play, there is minimal difference between one allowed by a good pitcher and one allowed by a bad pitcher (as judged by FIP). It’s left up to the fielder and to chance in both cases.

Finally, because Mass uses FIP instead of xFIP, he draws an incorrect conclusion about Ian Kennedy—that he is pitching poorly. Sure, his 3.17 ERA is way too low, but his 4.30 xFIP is a half-run lower than his 4.80 FIP and plenty valuable in deep mixed and NL-only leagues.
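
For readers who want to see why the two numbers can differ, here is a minimal sketch of the commonly cited FIP and xFIP formulas. The only difference is that xFIP swaps a pitcher's actual home runs for the number he would be expected to allow given his fly balls and a league-average home-run-per-fly-ball rate. The league constant (3.10) and league HR/FB rate (0.105) used here are assumed placeholder values that change from year to year, and the stat line in the example is hypothetical, not Ian Kennedy's.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching.

    `constant` scales FIP to league ERA; 3.10 is an assumed placeholder.
    `ip` is innings pitched as a decimal (e.g. 80.0, not 80.1 notation).
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fly_balls, bb, hbp, k, ip, league_hr_per_fb=0.105, constant=3.10):
    """xFIP: FIP with actual home runs replaced by expected home runs.

    `league_hr_per_fb` is an assumed league-average rate, not a real figure.
    """
    expected_hr = fly_balls * league_hr_per_fb
    return fip(expected_hr, bb, hbp, k, ip, constant)


if __name__ == "__main__":
    # Hypothetical pitcher who has allowed more homers than his
    # fly-ball total would suggest.
    line = dict(bb=25, hbp=2, k=65, ip=80.0)
    print(f"FIP:  {fip(hr=12, **line):.2f}")
    print(f"xFIP: {xfip(fly_balls=90, **line):.2f}")
```

In this made-up line the pitcher's 12 home runs exceed the roughly 9.5 expected from 90 fly balls, so his xFIP comes out about four-tenths of a run below his FIP. A pitcher in that position looks worse by FIP than his underlying skills suggest.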

Concluding thoughts

That’s all for this week. I’m sure this is review for many of you, but I know that THT Fantasy has welcomed a lot of new readers since I last discussed FIP a year ago, so I thought it’d be a good idea to go over some of the misconceptions and misapplications of it. If you have any questions, as always, feel free to let me know.

13 Comments

Aaron

15 years ago

Nice stuff, Derek. I find a lot of people who are new to deeper analysis lean a bit too much on FIP. Usually it’s pretty easy to just let it go because at least they’re trying but for an article like the one you’ve critiqued it’s important to step in and make sure that people aren’t being misled by the misuse of the numbers.

Josh

ESPN’s fantasy analysis is a joke. Other than Cockcroft I don’t think I read anyone else’s articles for any purpose other than to learn what the “marks” are thinking.

Briks

Derek, I know you like LIPS ERA even better than xFIP. Is LIPS ERA still not publicly available at this point?

Fantasy Alpha

AJ Mass? He is why the statement “Don’t argue with idiots. They only bring you down to their level where you will be out-gunned” exists.

Jeffrey Gross

@Fantasy Alpha, I couldn’t have said it better

Scott

AJ Mass is bush league. He’s argumentative and rarely has a useful viewpoint. Good article.

Derek Ambrosino

Great piece, Derek. Take some hilarious egregious potshots and throw in some pedantic tangential metaphors (not an insult, btw) and we’d have a FJM resurrection on our hands.

BTW, who the hell is AJ Mass?

/Greg Maddux’d

But this is clearly one of those situations where an “expert” has just enough info/knowledge to mislead people, but not enough to draw an informed conclusion to contribute to a healthy public discussion. This might happen in baseball analysis even more than it does in politics.

Robert Boden

Further reason FIP/xFIP are flawed…having a high BABIP gives you more opportunities for strikeouts, which in turn gives you more strikeouts per inning, which helps your FIP/xFIP. That makes these not truly defense-independent stats. Players like Brandon Morrow and Justin Masterson, sporting massive BABIPs this year, are getting a nice little boost to their xFIP from it.

John Blocker

I’m LOST. I’ve followed baseball for 60 years but I have no idea what all of the letters mean to describe a statistic.

Brad Johnson

Derek,

People expect political “experts” to mislead them. They are less prepared for misleading baseball analysis. At least that’s my opinion.

philosofool

@Robert Boden

Don’t be silly. The correlation between pitcher BABIP and K/9 is zero. I just ran last season to make sure: r = .029. There is no correlation between BABIP and strikeout rates. (Remember, additional batters faced are also chances for more walks…)

Eric M. Van

@philosofool

That’s just plain wrong. You’re making the same mistake Voros did originally—just looking at one year.

In fact, there’s a profound correlation between team BABIP and K/BFP, just part of the mountain of evidence that differences in BABIP skill are real and far from insignificant.

Manny Delcarmen has a .163 BABIP this year and after having watched, scorecarded, and analyzed every PA I figure his true BABIP, with average defense and luck, to be somewhere around .250, which is massively better than league average—worth (for him) about 1.00 of ERA, in fact. BABIP has to be regressed a ton to the mean but the belief that it has to be regressed all the way is entirely unsupported. And FIP and xFIP are crucially important but they are not, in fact, the truth.

the Flint Bomber

I think Mass is pulling a free paycheck from ESPN, but I do like Brendan Roberts. I was disappointed when Cockcroft took over Hit Parade.
