Another Look at DIPS

The discovery of Defense Independent Pitching Statistics (DIPS) theory by Voros McCracken dramatically changed the way we evaluate pitchers. DIPS theory rests on the premise that the stats a pitcher generates without the help of his fielders contain nearly all of the information needed to predict pitcher success; therefore, knowing a pitcher’s tendency to allow hits on balls in play tells us very little. The strong form of the argument (preventing hits on balls in play is not a skill) is not held by all, but the weak form (typically, pitchers have very little skill in preventing hits on balls in play) has strong support within the sabermetric community. I decided to take a closer look at DIPS, focusing on recent baseball history, and in this article I present what I found. Though many of my results mirror old findings, I did find a few interesting things along the way.

I’m going to separate my analysis into two parts. First, I analyze the predictive power of DIPS. Sabermetricians use DIPS to predict performances of players in the future and to judge players in the present. Forecasting systems rely heavily on DIPS for obvious reasons, and an in-season ERA predicted via DIPS metrics (such as FIP) lets us judge how well a pitcher is currently performing without the noise of ERA. Second, I dig a little deeper to examine how much control, if any, pitchers have over hits on balls in play.

Part I: Repeatable Performance as a Measure of Skill

Simple Correlations

What can we learn about a pitcher from his stats from year to year? My first step in answering this question was to determine the correlation of individual pitching metrics over time. The stronger the correlation, the more likely it is that the statistic is useful in predicting future performance. A repeated performance from season to season may also indicate some individual skill possessed by the pitcher in this area. So, I estimated the season-to-season correlations of ERA (park-corrected using 3-year pitcher park factors), strikeout-rate (K9), walk-rate (BB9), home run-rate (HR9), and batting average on balls in play (BABIP); K9, BB9, and HR9 are rates per 9 innings. My sample consisted of pitchers who threw more than 100 innings in each of two consecutive seasons from 1980-2004. This gave me over 500 pitchers and more than 2000 pitcher seasons to examine.
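The pairing step described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the study: the `PitcherSeason` record layout, the function names, and the sample rows are all assumptions made for the example. It relies on the fact that, for a regression with a single predictor, R2 equals the squared correlation coefficient.

```python
from collections import namedtuple

# Hypothetical record layout; the article does not describe its data files.
PitcherSeason = namedtuple("PitcherSeason", "pitcher year innings stat")


def consecutive_pairs(seasons, min_innings=100):
    """Pair each qualifying season with the same pitcher's next qualifying season."""
    by_key = {(s.pitcher, s.year): s for s in seasons if s.innings > min_innings}
    pairs = []
    for (pitcher, year), prev in by_key.items():
        cur = by_key.get((pitcher, year + 1))
        if cur is not None:
            pairs.append((prev.stat, cur.stat))
    return pairs


def r_squared(pairs):
    """R2 of a one-predictor least-squares fit (the squared Pearson correlation)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return (sxy * sxy) / (sxx * syy)


# Invented rows purely to show the mechanics; these are not real pitcher lines.
sample = [
    PitcherSeason("A", 2001, 180.0, 6.0),
    PitcherSeason("A", 2002, 175.0, 7.0),
    PitcherSeason("B", 2001, 150.0, 8.0),
    PitcherSeason("B", 2002, 160.0, 9.0),
    PitcherSeason("C", 2001, 80.0, 5.0),   # too few innings: excluded
    PitcherSeason("C", 2002, 140.0, 5.5),  # no qualifying prior season: unpaired
]
pairs = consecutive_pairs(sample)
print(len(pairs), r_squared(pairs))
```

Running the same two functions once per statistic (ERA, K9, BB9, HR9, BABIP) would give one R2 per scatter plot.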

The scatter plots below each include a regression line indicating the best fit of the impact of the previous year’s stat on the current year’s stat. The R2 tells us the percentage of the variance in the current year’s stat that can be explained by that same statistic from the previous year. The first stat is the most common one used to evaluate pitchers: ERA.

Figure: scatter plot of previous-season ERA vs. current-season ERA, with regression line (http://bradbury.sewanee.edu/tht/erapfr.png)

It’s easy to see why sabermetricians have been down on ERA for so long. Without the regression line it would be hard to tell if there was much of a relationship at all. We know that good pitchers tend to have good ERAs from year to year, and bad pitchers bad ones, but for nailing down precisely how good or bad a pitcher will be from year to year, ERA is not very helpful. The ERA from the previous season explained only about 13% of the variance of the following season’s ERA. Next, I moved on to the seeming source of the problem: BABIP.

Figure: scatter plot of previous-season BABIP vs. current-season BABIP, with regression line (http://bradbury.sewanee.edu/tht/babipr.png)

If pitchers don’t have much ability to prevent hits on balls in play, the correlation across seasons should be weak, and the R2 of 0.06 seems to support this central tenet of DIPS. From the season-to-season correlation, pitchers did not seem to have much special ability to prevent hits on balls in play. It’s worth noting that the reported R2 is generated from a bare-bones regression that does not control for some important omitted factors. One particular factor, the defense behind a pitcher who doesn’t change teams, is probably biasing that paltry 6% heavily upwards. Next, I examined the holy trinity of DIPS: strikeouts, walks, and home runs.

Figure: scatter plot of previous-season K9 vs. current-season K9, with regression line (http://bradbury.sewanee.edu/tht/k9r.png)

Everyone evaluates pitchers by strikeouts, and they should. No variable produced by pitchers had more constancy over time than the strikeout-rate. The strikeout-rate from the previous year explained over 60% of the variance of the following year’s strikeout-rate. What about walks?

Figure: scatter plot of previous-season BB9 vs. current-season BB9, with regression line (http://bradbury.sewanee.edu/tht/bb9r.png)

Though walks from season to season were not as predictable as strikeouts, the correlation was quite strong, with an R2 of 0.42. And homers?

Figure: scatter plot of previous-season HR9 vs. current-season HR9, with regression line (http://bradbury.sewanee.edu/tht/hr9r.png)

Well, the correlation for home runs was about half of what it was for walks, but the recent past revealed nearly twice as much about the future for home runs as it did for plain old ERA.

So far, DIPS theory seems to hold up quite well. In fact, compared to McCracken’s initial estimates, the correlations are quite similar. But, so what if some metrics are more correlated than others over time? Strikeout-rates may be strongly correlated from season-to-season, but what impact do they have on pitcher run prevention? Next, I wanted to see the impact of each metric on predicting ERA. Again, I found that DIPS theory contains some powerful truths.

Using DIPS to Predict ERA in the Present

In order to figure out the ability of DIPS to predict future ERA I needed to establish a baseline impact for these stats during a current season on the current ERA of the pitcher. I employed a linear regression technique to estimate the impact of different pitching statistics on ERA. This technique is designed to handle many problems of multiple observations of individuals over time — the regression results were estimated using random effects and corrected for first-order serial correlation.

To begin, I estimated the impact of the big-3 DIPS components (K9, BB9, HR9), plus the hit-batter-rate (HBP9) and BABIP, on ERA, while controlling for the defense of the team, the age of the pitcher, the league of the pitcher, and the season. I used the pitcher’s team seasonal BABIP to proxy defense, assumed the impact of age on ERA to be U-shaped (quadratic), and used indicator variables equal to 1 or 0 to identify the league and year in which the pitcher’s stats were posted. Here are the unit and percentage impacts of the variables on a pitcher’s ERA (again, park-corrected).

Table 1. The Impact of Current Season’s Peripheral Stats on ERA

Stat     Unit     Percentage   Significance
K9       -0.17    -0.24%       1%
BB9       0.30     0.23%       1%
HR9       1.42     0.32%       1%
HBP9      0.34     0.02%       1%
BABIP    18.56     1.31%       1%

R2 = 0.77

Significance is the level at which the estimate is statistically significant (5% or 1%).

From the unit impact we can see the effect of a one-unit change in the pitching statistic on ERA. In this sample, an increase of one strikeout per nine innings lowered a pitcher’s ERA by about 0.17. The percentage impact (or elasticity) tells us the percentage change of ERA in response to a 1% change in the statistic. So, a 1% increase in K9 lowered ERA by 0.24% (the percent changes are calculated at the average values of the statistic and the ERA). The percentage impact helps us judge the impact of the different metrics relative to their normal values. For example, the unit impact of every walk is nearly twice that of a strikeout; however, in terms of the average number of walks and strikeouts their percentage impacts on ERA are nearly identical.
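The link between the two columns can be made concrete with a short sketch. The percentage impact evaluated at the means is the unit impact multiplied by the ratio of the stat's average to the average ERA, so the two reported columns let us back out that ratio. The function names below are hypothetical, and the average ERA of 4.3 is an assumed round figure for illustration; the table does not report the sample means.

```python
def elasticity(unit_impact, mean_stat, mean_era):
    """Percent change in ERA for a 1% change in the stat, evaluated at the means."""
    return unit_impact * mean_stat / mean_era


def implied_mean_ratio(unit_impact, pct_impact):
    """Back out mean(stat) / mean(ERA) from a reported unit and percentage impact."""
    return pct_impact / unit_impact


# (unit impact, percentage impact) pairs as reported in Table 1.
TABLE_1 = {
    "K9": (-0.17, -0.24),
    "BB9": (0.30, 0.23),
    "HR9": (1.42, 0.32),
    "BABIP": (18.56, 1.31),
}

# ASSUMPTION: a sample-average ERA of about 4.3, used only to turn the
# implied ratios into rough stat averages. It is not a value from the table.
ASSUMED_MEAN_ERA = 4.3

for stat, (unit, pct) in TABLE_1.items():
    ratio = implied_mean_ratio(unit, pct)
    print(f"{stat}: implied sample mean of about {ratio * ASSUMED_MEAN_ERA:.3f}")
```

Under that assumption the implied averages come out near 6 strikeouts and a bit over 3 walks per nine innings, which is why a walk's larger unit impact and a strikeout's smaller one end up as similar percentage impacts.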

The variables in the regression, as a whole, explained 77% of the variance of pitcher ERAs, which is quite good. All of the estimated impacts were “statistically significant,” meaning that if a variable truly had no effect on ERA, an estimate this large would turn up by chance less than 5% of the time.

The thing I find most interesting about these estimates is how well they correlated with linear weight values for these events. Walks, HBPs, and home runs were in line with linear weights. Strikeouts were a little low, but close; however, this estimate was for earned run prevention. Strikeouts ought to have some additional run-suppressing impact on unearned runs that may account for the difference. The large impact of BABIP is quite interesting. A standard deviation increase in BABIP (.023) raised a pitcher’s ERA by about 0.43 (approximately 10% of the average ERA for the sample). If pitchers have little effect over balls in play, then a random fluctuation of BABIP can influence a pitcher’s ERA quite a bit. I’m quite happy with these estimates, and they provide a fine baseline to evaluate how well the DIPS statistics predict future run prevention.

Using DIPS to Predict ERA in the Future

Predicting an ERA from a pitcher’s statistics from the prior season was not much different from the exercise above. All of the estimates reported below were estimated using the previous season’s K9, BB9, HR9, and BABIP, while the defense, age, and league control variables were from the current year.

Table 2. The Impact of Previous Season’s Peripheral Stats on ERA

Stat     Unit     Percentage   Significance
K9       -0.18    -0.25%       1%
BB9       0.13     0.10%       1%
HR9       0.18     0.04%       1%
BABIP    -1.96    -0.14%       1%

R2 = 0.30

All of the reported estimates were statistically significant (HBP9 did not seem to be important, so I dropped it from the regression).

The result for strikeouts is quite interesting. The previous year’s strikeout-rate impacted the current season’s ERA about as much as the current season’s strikeout-rate (see Table 1 for comparison). This is not totally surprising, because strikeouts were strongly correlated from year-to-year, but I did not expect such a strong relationship. While walks and homers were important, they were not as consistent predictors of ERA as strikeouts.

BABIP had a statistically significant impact on ERA, but the effect was small and in the opposite direction from what one would expect. Rather than reflecting some inverse relationship between BABIP from year to year, this is more likely derived from a few extremely high and low BABIP seasons that typically regress to the mean the following year. When I estimated the model using a median regression technique that minimizes the impact of extreme values, BABIP was no longer statistically significant.

But how does the DIPS prediction stand up to a prediction based on the previous season’s ERA? Quite well.

Table 3. The Impact of Previous Season’s ERA on Current Season’s ERA

Stat       Unit     Percentage   Significance
Lag ERA    0.005    0.0045%      -

R2 = 0.18

A dash marks an estimate that was not flagged as statistically significant.

Using all of the same control variables as in the DIPS-only model above, I found a very weak relationship between the previous year’s ERA and ERA the following year. In reality the effect was nothing, since the estimate was not statistically different from zero. The R2 was well below that of the DIPS-only model (0.18 versus 0.30). But, I wanted to look a little deeper. Maybe, after controlling for the impact of the DIPS components, ERA contains some extra information about a pitcher’s future performance. Possibly, ERA from the previous year could capture some sort of clutch ability to prevent runs.

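Table 4. The Impact of Previous Season’s Peripheral Stats and ERA on the Current Season’s ERA

Stat     Unit     Percentage   Significance
ERA      -0.11    -0.11%       1%
K9       -0.19    -0.19%       1%
BB9       0.15     0.11%       1%
HR9       0.26     0.06%       1%
BABIP    -0.68    -0.05%       -

R2 = 0.29

A dash marks an estimate that was not flagged as statistically significant.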

When I included the lag of ERA in the full model with the DIPS variables and BABIP, ERA did become statistically significant … in the wrong direction. However, the impact was tiny; and, as it was for BABIP, it was probably the result of a few outlying extreme values regressing back to the mean.

So, when it came to predicting pitching success, DIPS contained the most consistent information. This is no shock. But, I’m not sure if these findings necessarily mean pitchers do not have control over balls in play. I’ll explain why in Part II.

Part II: Pitcher Control Over Hits on Balls in Play

Now that I have confirmed the predictive value of DIPS, there is another question to answer. Is it possible that pitchers do have the ability to affect hits on balls in play, but that this influence is so strongly correlated with the DIPS components that it is masked? Multiple regression analysis identifies a correlation between the predicting and predicted variables included in the model, but it does not tell us why. If a pitcher strikes out a lot of batters it does not necessarily mean that the corresponding effect on ERA comes solely through the direct impact of strikeouts. The correlation between strikeouts and ERA could reflect a pitcher’s ability to affect hits on balls in play in addition to the direct effect of limiting balls in play. If strikeout pitchers cause weak groundouts and walk-prone pitchers serve up more line drives, these factors will be captured in the weights assigned to strikeouts and walks in a multiple regression estimation.

For all practical purposes, this possibility is irrelevant — if DIPS tells us all we need to know about run prevention, it doesn’t matter why — but I wanted to see if it was true. First, I estimated the impact of the DIPS components on a pitcher’s BABIP in the same season.

Table 5. The Impact of Current Season’s Peripheral Stats on BABIP

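Stat     Unit       Percentage   Significance
K9       -0.0004    -0.0087%     -
BB9       0.0011     0.0114%     5%
HR9      -0.0014    -0.0044%     -
HBP9     -0.0005    -0.0004%     -

R2 = 0.28

A dash marks an estimate that was not flagged as statistically significant.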

It turns out that within the same season only walks were correlated with BABIP at a statistically significant level. However, the effect was tiny in both unit and percentage terms; every additional walk per nine innings increased the BABIP of a pitcher by about 0.001. Theoretically, I’m not too surprised by this. A pitcher who walks more batters is going to be more likely to place balls over the plate that can be hit hard, plus he may be behind in the count often and have to throw a meatball. But the effect was minuscule and largely irrelevant: using the Table 1 estimate of the impact of BABIP on ERA, the effect of each walk on ERA through BABIP was about 0.02 earned runs per game (0.0011 x 18.56 = 0.02).

What about the possibility of all of the DIPS from the previous year influencing the present year’s BABIP? Maybe, the previous year’s BABIP doesn’t give us much information about the following year’s BABIP, but the DIPS do because they are correlated with a pitcher’s ability to affect hits on balls in play. If this is the case, then it’s possible for a pitcher to control BABIP through his DIPS. And in a multivariate regression on ERA, like I ran in Part I, the effect would be captured by the DIPS variables. Therefore, I estimated the impact of the previous year’s pitching statistics on the following season’s BABIP.

Table 6. The Impact of Previous Season’s Peripheral Stats on BABIP

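Stat     Unit        Percentage   Significance
K9       -0.00172    -0.035%      1%
BB9       0.00013     0.001%      -
HR9      -0.00347    -0.011%      5%
BABIP    -0.04148    -0.041%      5%

R2 = 0.30

A dash marks an estimate that was not flagged as statistically significant.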

The results confirmed something startling in the magnitude and statistical significance of the predicting variables: differences in pitcher control over hits on balls in play were somewhat predictable from past performance. But, that information is not in the statistic we would think to look at first, BABIP. This ability has been hidden due to its correlation with DIPS metrics. It turns out that, in fact, the strikeout and home run rates were inversely related to BABIP in the following season. Though it’s not widely discussed, Voros McCracken also found correlations between both strikeouts and home runs with a pitcher’s future BABIP. And in his DIPS 2.0 article, he adjusted for pitcher influence in this area. On strikeouts he writes,

Looking at the numbers over and over and over again, it becomes clear that a pitchers strikeout rate during a single season is a bit better predictor of his hits in play the following year than his own hits per balls in play. This is there and it’s real.

And on home runs he states,

While a shaky relationship, it appears that the more Home Runs a pitcher gives up, the fewer hits per balls in play he gives up.

The effect for strikeouts seems a bit obvious. The fear of strikeouts possibly induces hitters to take weaker protective swings to stay alive, and thus yields softer hits that are more likely to result in outs. The effect of home runs seems a bit counterintuitive at first, but it’s capturing the effect of the ground-ball-to-fly-ball ratio. Pitchers who give up more fly balls are likely to give up home runs, but also produce more outs, as fly balls are more likely to yield outs than ground balls.

But just because something is statistically significant does not mean it is practically significant. Using the estimate of the impact of the predicting variables reported in Table 6 and my earlier estimate of the impact of BABIP on ERA (18.56) in Table 1, I was able to assign an earned run value to strikeout and home run prevention through balls in play. The effect of home runs on BABIP is very small (just as McCracken found) so I won’t discuss it further, but the impact of strikeouts was large enough to continue investigating. For every one-strikeout increase per game, the BABIP decreases by 0.00172. Multiply that by 18.56 and every strikeout is worth 0.03 earned runs per game. That’s small, right? Well, yes and no. Let’s use Randy Johnson’s 2004 as an example.

In 2004, Randy Johnson struck out 10.6 batters per game. According to the estimates above, this rate would lower Johnson’s expected 2005 BABIP by 0.018 (10.6 x 0.00172 = 0.018) and his ERA by 0.34 (18.56 x 0.018 = 0.34); an impact on earned run prevention equal to about two additional strikeouts (2 x 0.17 = 0.34) according to the estimate in Table 1. This translates to 9.3 earned runs saved on decreasing hits on balls in play over the number of innings he pitched in 2004 (0.34 x [245.67/9] = 9.3), which is 13% of his 2004 total of 71 earned runs.
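The chain of multiplications in the Johnson example can be written out as a short Python sketch. The two coefficients are the ones quoted from Tables 6 and 1; the function and constant names are made up for illustration. The code carries unrounded intermediate values, so it lands slightly below the 9.3 runs quoted above, which comes from rounding the ERA effect to 0.34 first.

```python
K9_TO_NEXT_BABIP = -0.00172  # Table 6: change in next-season BABIP per unit of K9
BABIP_TO_ERA = 18.56         # Table 1: change in ERA per unit of BABIP


def babip_channel(k9, innings):
    """Return (BABIP change, ERA change, earned runs saved) via balls in play."""
    babip_change = k9 * K9_TO_NEXT_BABIP
    era_change = babip_change * BABIP_TO_ERA
    runs_saved = -era_change * innings / 9
    return babip_change, era_change, runs_saved


# Inputs quoted in the text: 10.6 strikeouts per game, 245.67 innings, 71 earned runs.
babip_change, era_change, runs_saved = babip_channel(k9=10.6, innings=245.67)
share_of_earned_runs = runs_saved / 71

print(f"BABIP change: {babip_change:.3f}")
print(f"ERA change:   {era_change:.2f}")
print(f"Runs saved:   {runs_saved:.1f} ({share_of_earned_runs:.0%} of 71 earned runs)")
```

Calling `babip_channel` with another pitcher's strikeout rate and innings gives the same back-of-the-envelope estimate for him.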

A pitcher with the average strikeout-rate for the sample would gain the ERA benefit of about one extra strikeout per game through his effect on BABIP. This would lower his predicted ERA by 0.18 and result in 3.25 fewer earned runs a season (about 4%). Johnson saves approximately 6 earned runs per season more than the average strikeout pitcher through his ability to prevent hits on balls in play. That is a very real effect. So, why doesn’t BABIP correlate very well from year-to-year when strikeouts do? Well, there’s just a lot more noise generated by random bounces from year-to-year in BABIP than there is in strikeouts.

The effect that pitchers have over hits on balls in play is small compared to the effects of the other DIPS metrics; however, it is large enough to tell us that pitchers do have the ability to prevent hits on balls in play. So where does this leave us? Well, it turns out that though pitchers do seem to have the ability to prevent hits on balls in play, it does not alter the predictive element of DIPS theory one bit. Why not? Because that ability is captured in DIPS statistics. Those who are comfortable evaluating pitchers using DIPS can continue to feel comfortable doing so.

These findings also fit with some recent research by Tom Tippett. In “Can Pitchers Prevent Hits on Balls in Play?”, Tippett looks at a rather large sample of pitchers’ BABIP over their careers and finds that several pitchers seemed to have had a consistent impact over hits on balls in play. It would be an interesting project to see how much of this difference is predicted by pitcher strikeout rates.

Conclusion
In summary, DIPS is right. Knowing DIPS can tell you more about a pitcher’s future performance than his previous ERA. While pitchers may have some ability to prevent hits on balls in play, the effect is small. And any effect a pitcher does have is reflected within DIPS metrics. Other studies have shown that pitchers do seem to be able to influence certain hit-types, such as ground balls, fly balls, and line drives, but the effect of these tendencies on run-prevention is ambiguous. And it may be that these tendencies are correlated with DIPS. As new hit-type data comes in this year to The Hardball Times, hopefully, the impact of hit-type on run prevention will become clearer.

References & Resources
For a list of links relating to DIPS research on the Internet visit Jay Jaffe’s Defense Independent Pitching Statistics for 2004 page at Futility Infielder.

I wish to thank Studes and Greg Tamer for excellent suggestions.

