More on standard deviation and ERA estimators

In October, I introduced an ERA estimator called predictive FIP (pFIP). The statistic was a modified version of the original Fielding Independent Pitching statistic (FIP) that was meant to predict future performance rather than describe past performance.

Predictive FIP was highly correlated with future ERA (or runs). Unfortunately, as I pointed out earlier this month, the spread of its projections for individual pitchers was much smaller than that of any other commonly accepted ERA estimator or projection system.

The pFIP equation, as it stood, led to ERA projections that were tightly centered on the mean ERA. This leads to a really high overall correlation, but fairly useless individual projections that are not close to a real reflection of a pitcher’s true talent level.

Instead of scrapping the metric completely, I decided to manually force the spread of projections to be larger.

This meant putting more weight on the measures of a pitcher’s skill (homers, walks, strikeouts) and in turn taking away from the component that regresses the projection to the mean, the constant term.

In the comments of my latest piece, MGL gave some great insight into the size of spread I should be looking for:

Honestly, the best thing to do is to do a rough estimate. For example, my experience in doing projections for 20-some odd years and in closely watching and analyzing baseball for almost 30 years is that true talent is right around league average plus or minus 1.5 runs, which happens to be a SD of true talent of around .5 runs! So you did actually get around the right answer—sort of by semi-accident though!

Thanks to MGL, I set out to create a pFIP equation whose ERA projections would have a spread similar to that of true talent in the population: a standard deviation of ~0.5.

I began with the original pFIP equation (ERA version):

pFIP = (18.5*HR + 6*BB – 8*K)/TBF + 4.75

At first, I simply guessed and checked by multiplying the different weights by random constants, to see what combination of weights would lead to a larger spread, yet still be fairly similar to pFIP’s original weights.
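To see why rescaling the weights changes the spread, here is a minimal Python sketch. The four stat lines are made up for illustration and the helper names are hypothetical; this is not the actual guess-and-check procedure, only the arithmetic behind it. Multiplying all three weights by the same constant multiplies the standard deviation of the projections by that constant, while the constant term only shifts the mean and has no effect on the spread.

```python
import statistics

# Hypothetical (HR, BB, K, TBF) lines, made up for illustration only.
PITCHERS = [
    (20, 50, 180, 800),
    (25, 60, 120, 750),
    (15, 40, 200, 820),
    (30, 70, 100, 700),
]


def projection(line, weights, constant):
    """(w_hr*HR + w_bb*BB - w_k*K) / TBF + constant for one stat line."""
    hr, bb, k, tbf = line
    w_hr, w_bb, w_k = weights
    return (w_hr * hr + w_bb * bb - w_k * k) / tbf + constant


def spread(weights, constant):
    """Population standard deviation of the projections across PITCHERS."""
    return statistics.pstdev([projection(p, weights, constant) for p in PITCHERS])


original_weights = (18.5, 6.0, 8.0)
scale = 1.25  # an arbitrary trial multiplier
scaled_weights = tuple(scale * w for w in original_weights)

base_spread = spread(original_weights, 4.75)
wider_spread = spread(scaled_weights, 4.75)      # equals scale * base_spread
shifted_spread = spread(original_weights, 4.60)  # same as base_spread

print(base_spread, wider_spread, shifted_spread)
```

Changing the weights by different amounts, rather than by one common multiplier, also changes the ranking of pitchers, which is why the correlation with future ERA has to be re-checked afterward.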

Quite interestingly, this is the new pFIP equation that I came up with:

pFIP = (20*HR + 10*BB – 10*K)/TBF + 4.60
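A minimal Python sketch of the two equations, for anyone who wants to compute them directly. The function names and the sample stat line are hypothetical; the weights and constants are the ones given above.

```python
def pfip(hr, bb, k, tbf, weights=(20.0, 10.0, 10.0), constant=4.60):
    """New pFIP: (20*HR + 10*BB - 10*K) / TBF + 4.60 by default."""
    if tbf <= 0:
        raise ValueError("tbf must be positive")
    w_hr, w_bb, w_k = weights
    return (w_hr * hr + w_bb * bb - w_k * k) / tbf + constant


def pfip_original(hr, bb, k, tbf):
    """Original pFIP: (18.5*HR + 6*BB - 8*K) / TBF + 4.75."""
    return pfip(hr, bb, k, tbf, weights=(18.5, 6.0, 8.0), constant=4.75)


# Made-up stat line: 20 HR, 50 BB, 180 K over 800 batters faced.
new_value = pfip(20, 50, 180, 800)
old_value = pfip_original(20, 50, 180, 800)
print(f"new pFIP: {new_value:.3f}, original pFIP: {old_value:.3f}")
```

For an above-average strikeout pitcher like this made-up one, the new equation lands further from the constant term than the original does, which is the intended effect of the heavier skill weights.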

This new equation is fairly similar to the previous one; however, the skill components get slightly more weight. What made this equation both interesting and ironic was an exchange that I had with Tom Tango, the inventor of the original FIP statistic.

When I first came up with the idea of pFIP, I emailed Tango to get his thoughts on the pursuit. Here was Tango’s original response:

(FIP’s) 13,3,2 (weights) are “descriptive,” if by that you mean it correlates Year T BB, SO, HR to Year T Runs.

If you wanted BB, SO, Hr to be “predictive” (correlating those components in year T to year T+1 runs), they would have different weights. My guess is that it would be something like 2,1,1 or maybe 3,1,1 for HR, SO, BB, respectively. This is because you would regress HR the most and K the least.

Tango essentially came up with the strongest version of the pFIP equation off the top of his head, months ago. It is amazing to me, after months of working with the statistic, that he understood what pFIP should look like almost instantaneously.

Another interesting note about this version of the metric is how similar it is to another powerful ERA estimator, kwERA:

pFIP = 4.60 + 10*(2*HR + BB – SO)/BF

kwERA = 5.40 – 12*(SO-BB)/BF
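Written in this form, the two estimators differ only in the constant, the multiplier and the home run term. A short Python sketch makes the comparison concrete; the function names and the stat line are hypothetical.

```python
def kwera(so, bb, bf):
    """kwERA = 5.40 - 12 * (SO - BB) / BF."""
    return 5.40 - 12.0 * (so - bb) / bf


def pfip(hr, bb, so, bf):
    """pFIP = 4.60 + 10 * (2*HR + BB - SO) / BF."""
    return 4.60 + 10.0 * (2 * hr + bb - so) / bf


# Made-up stat line: 20 HR, 50 BB, 180 SO over 800 batters faced.
kw = kwera(so=180, bb=50, bf=800)
pf = pfip(hr=20, bb=50, so=180, bf=800)

# Same pitcher with twice as many home runs: kwERA cannot see the change.
pf_more_hr = pfip(hr=40, bb=50, so=180, bf=800)

print(kw, pf, pf_more_hr)
```

Doubling the home runs leaves kwERA untouched but raises pFIP by 20*20/800 = 0.5 runs for this line.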

kwERA is elegant in that it is both simple and a very powerful predictor of future ERA. pFIP was originally modeled after kwERA, but was a more powerful predictor, because it included home runs within the equation.

This new pFIP equation is right in line with kwERA in that it is extremely simple, and more importantly now has a much wider spread.

Does this new equation still have the same strong predictive power?

In the piece that originally introduced pFIP, I showed that over the years 2004-2012, for pitchers who threw at least 120 innings in Year X and at least 100 innings in Year X+1, pFIP was more highly correlated with future ERA than other established estimators, such as kwERA, FIP, xFIP and SIERA.

My goal in this article is to attempt to duplicate those results and show that pFIP is still the strongest predictor, even with the wider spread of projections.

I modified the test slightly, though, by removing kwERA, adding ERA as a baseline, and lowering the minimum number of innings in Year X from 120 to 100.
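The sample construction can be sketched in Python as follows. The row layout (dicts with 'pitcher', 'year' and 'ip' keys) and the function name are assumptions for illustration; the real data came from FanGraphs, and its format is not shown here.

```python
def paired_seasons(rows, min_ip_x=100.0, min_ip_next=100.0):
    """Pair each pitcher's Year X row with his Year X+1 row.

    A pair is kept only if the pitcher threw at least `min_ip_x` innings
    in Year X and at least `min_ip_next` innings in Year X+1.
    """
    by_key = {(r["pitcher"], r["year"]): r for r in rows}
    pairs = []
    for pitcher, year in sorted(by_key):
        current = by_key[(pitcher, year)]
        following = by_key.get((pitcher, year + 1))
        if following is None:
            continue  # no consecutive season on record
        if current["ip"] >= min_ip_x and following["ip"] >= min_ip_next:
            pairs.append((current, following))
    return pairs


# Made-up rows, for illustration only.
rows = [
    {"pitcher": "A", "year": 2010, "ip": 120.0},
    {"pitcher": "A", "year": 2011, "ip": 105.0},
    {"pitcher": "A", "year": 2012, "ip": 90.0},
    {"pitcher": "B", "year": 2011, "ip": 150.0},
    {"pitcher": "B", "year": 2012, "ip": 160.0},
    {"pitcher": "C", "year": 2010, "ip": 95.0},
    {"pitcher": "C", "year": 2011, "ip": 200.0},
]
pairs = paired_seasons(rows)
print([(cur["pitcher"], cur["year"]) for cur, _ in pairs])
```

Each estimator is then computed from the Year X row and compared with the ERA in the Year X+1 row.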

Below, I display the results:

Predictor    Correlation (r)    STDEV
pFIP         0.447              0.516
SIERA        0.424              0.583
FIP          0.423              0.690
xFIP         0.418              0.572
ERA          0.367              0.858

pFIP was still very clearly the strongest predictor of (that is, the estimator most highly correlated with) next-season ERA.

I listed the standard deviation of each estimator to give an idea of the spread of each metric's estimates. pFIP's standard deviation is still the smallest of the predictors, which benefits its correlation, but at the same time it is now much closer to the others in terms of spread, and right around the 0.5 mark that I was shooting for.
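The two columns in the table can be computed with a few lines of Python. This is a sketch under stated assumptions: the function names and numbers are made up, and the sample standard deviation is used here, although the article does not say whether the sample or population version was used.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def evaluate(estimates, next_year_era):
    """Return (correlation with next-year ERA, spread of the estimates)."""
    return pearson_r(estimates, next_year_era), statistics.stdev(estimates)


# Made-up Year X estimates and Year X+1 ERAs, for illustration only.
estimates = [3.45, 4.10, 3.90, 4.60, 3.20]
next_era = [3.80, 4.00, 4.40, 4.30, 3.10]
r, spread = evaluate(estimates, next_era)
print(f"r = {r:.3f}, STDEV = {spread:.3f}")
```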

I was pleased to see these results, because after my last piece, I feared that pFIP’s strength may have rested entirely upon the fact that the spread of projections was extremely tight. However, these results seem to indicate that the methodology behind pFIP is both powerful and possibly a very good indicator of true talent level.

Is pFIP better than a projection system?

I’ve asked this question before and have changed my mind a few times on the subject, but I hope this will serve as a final-ish answer to that question.

In an earlier piece, I showed that for pitchers who threw at least 100 innings in 2011 and pitched in at least five games in 2012, pFIP had a higher correlation with 2012 ERA than the 2012 ZiPS projections.

That test very obviously had several limitations: it was a small sample, it used the old pFIP equation, and it compared pFIP to only one projection system. Thus, I decided to put this new formula to the test with a broader sample.

For the years 2010-12, I found a sample of pitchers (n=354) who threw at least 100 innings in the prior year and started at least five games in the next season. I tested pFIP against three projection systems: ZiPS, Bill James and Marcel.

The results were as follows:

Predictor     Correlation (r)    STDEV
ZiPS          0.386              0.630
Bill James    0.324              0.484
pFIP          0.314              0.517
Marcel        0.277              0.496

A good place to start when looking at these results is the standard deviation of each projection.

Bill James’ and Marcel’s spreads are similar to that of pFIP and right around our estimation of the “true talent” spread. It is surprising, at least to me, that ZiPS had the strongest correlation with ERA while also having the largest spread of projections. For all intents and purposes, that makes ZiPS the clear winner.

pFIP cleared the baseline set by the simple Marcel system, but was not as strong a predictor as the other two systems tested.

Based on these results, if one were to ask which ERA estimator projects future ERA the most successfully, I'd feel fairly safe saying pFIP.

However, if one really wanted to project future ERA with the most success, at this point my answer would be to use a projection system.

I created a Google Doc with pFIP’s 2013 ERA projections alongside the projections of four other systems (Marcel, Oliver, ZiPS and Steamer) for any pitcher who threw at least 100 innings in 2012.

For what it is worth, I listed the spread of each system's projections below:

Predictor    STDEV
pFIP         0.463
Marcel       0.495
Steamer      0.370
Oliver       0.473
ZiPS         0.623

References & Resources
All data comes courtesy of FanGraphs

11 years ago

Great job, Glenn.  You’ve done a really nice job of separating FIP and pFIP, and how one is descriptive and the other predictive. Maybe people will start using them that way.

Don’t feel bad. Tango has humbled us all at one point or another.

11 years ago

ZiPS and Marcel use multi-years of performance, whereas pFIP is intentionally only using one.

In order to improve the predictability of pFIP, simply use more years.  And once you do that, you’ll likely weight more recent seasons more.

And once you do that, you should approach ZiPS.

I’m also surprised you have Marcel as low as you did.

Glenn DuPaul
11 years ago


Even if people don’t use FIP and pFIP in the “descriptive”/“predictive” manner, hopefully some people will be more aware of the differences in weight when trying to predict instead of describe performance.


I was also surprised to see Marcel as low as it was, but that may have just been an issue with the sample. 

I’ve attempted to use three years of data with pFIP before and it really hasn’t added any predictive value.  Maybe the .5/.35/.15 weights that I was using were incorrect. I’m really not sure.

Thanks to you both for the comments though.

11 years ago

ZiPS and probably the other projection systems (other than Marcel) “know” whether certain pitchers will be changing home ballparks or leagues in the upcoming season and make an adjustment.  In scrolling through the Excel sheet, it seems like some of the larger discrepancies between ZiPS and pFIP involve those pitchers.