Moving past DIPS
by Colin Wyers
July 16, 2009

Defense Independent Pitching Stats (DIPS) theory basically states that pitchers have little control over what happens to balls that are put in play. Since Voros McCracken proposed it, there has been much controversy about the degree to which pitchers actually can control balls in play (BIP) by doing things like throwing knuckleballs and inducing pop-ups. However, less attention has been paid to how DIPS has been applied and translated into DIPS ERA and, later, FIP.

What the past decade of debate has shown is that there is actually very little variation among pitchers in their ability to control balls in play. For example, Tim Wakefield, the premier knuckleballer of our age, has a career BABIP of .275, compared to the average of (roughly) .290. In 8,933 career BIP, that works out to:

(.290 - .275) * 8933 = 133.995

That gives us about a quarter of a hit per game, or 7.9 hits a season, for a knuckleballer like Wakefield. For his career, if you estimate about .7 runs per hit, Wakefield’s ability to prevent hits on balls in play has lowered his ERA by roughly, oh, .30. (This is all back-of-the-envelope math meant mostly for illustration; we could obviously do a lot better than this, but it works for a rough estimate.)

Which, of course, isn’t nothing. But Wakefield is one of the most extreme examples of a pitcher being able to affect his BABIP. However, the variation between his ability to control BIP and the average pitcher’s could be small compared to issues in translating BABIP to runs.

Just along for the ride

It’s important to recognize what we bought into along with DIPS theory. McCracken at the time published a method of estimating a pitcher’s defense-independent ERA that reconstructed the pitcher’s batting line using the league-average BABIP in place of his own, and then used Extrapolated Runs to convert that into runs.
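The first step of that method, swapping a pitcher’s own BABIP for the league average, can be sketched with the Wakefield figures quoted earlier. This is only an illustration: the function names are hypothetical, the .7 runs per hit is the rough estimate used above, and the Extrapolated Runs conversion itself is not implemented here.

```python
# Figures quoted in the article for Tim Wakefield's career.
WAKEFIELD_BIP = 8933
WAKEFIELD_BABIP = 0.275
LEAGUE_BABIP = 0.290
RUNS_PER_HIT = 0.7  # the article's rough estimate, not a fitted weight


def hits_at_league_babip(bip: int, league_babip: float) -> float:
    """Hits on balls in play if the pitcher had a league-average BABIP."""
    return bip * league_babip


def hits_saved(bip: int, pitcher_babip: float, league_babip: float) -> float:
    """Hits prevented relative to a league-average BABIP on the same BIP."""
    return (league_babip - pitcher_babip) * bip


if __name__ == "__main__":
    saved = hits_saved(WAKEFIELD_BIP, WAKEFIELD_BABIP, LEAGUE_BABIP)
    print(f"Hits at league BABIP: {hits_at_league_babip(WAKEFIELD_BIP, LEAGUE_BABIP):.1f}")
    print(f"Hits saved:           {saved:.3f}")
    print(f"Runs saved (rough):   {saved * RUNS_PER_HIT:.1f}")
```

Up to floating-point rounding, the hits-saved figure matches the 133.995 worked out above; turning runs saved into an ERA difference would also require innings pitched, which are not given here.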
Voros and others have revised this work numerous times; the simplest and most popular implementation of a DIPS measure of estimated ERA is FIP.

What we all bought into along with McCracken’s theory on defense was, essentially, Bill James’ Component ERA, typically abbreviated as ERC. Almost every DIPS-like measure of performance has resorted to the use of some sort of component ERA to figure out a pitcher’s defense-independent performance. For some reason, most of the controversy surrounding the use of DIPS has focused on Voros’s conclusions on balls in play, when really it’s the use of component ERAs that warrants further examination.

So what problems have we (largely without realizing or considering it) brought into our DIPS analysis with component ERAs?

Linearity. Most (but not all) component ERAs are linear. (ERC itself isn’t, I should note.) FIP is certainly linear. What I mean is this: FIP treats, for instance, a home run allowed by Pedro Martinez as having the same run value as one allowed by Glendon Rusch. This simply isn’t the case; Pedro allows fewer baserunners and therefore fewer runs per home run. This artificially “caps” the high and low ends of the FIP range, making it narrower than the actual range of performance of major league pitchers.

Situational pitching. Guy bad at pitching out of the stretch? Not accounted for. Able to dial up his fastball a bit and get an extra strikeout in a crucial situation? Not included. In other words, component ERA measures treat pitchers as though they all approach situations exactly the same.

Sequencing. In real life, it matters if a guy gives up a walk before a homer or a homer before a walk; in component ERA measures they all look the same.

One could of course argue that there is an element of “luck” (or for you pedants out there, “observed variation around an estimated level of true talent performance”) to sequencing and situational pitching.
But that notion has essentially come along for the ride with DIPS theory, and there’s nothing in McCracken’s research to suggest that they’re any more subject to “luck” than strikeout rates or walk rates or home run rates.

Pitching to the situation

Let’s examine one aspect of pitching not addressed by DIPS theory and ignored by component ERAs: situational pitching. We’re going to study performance between 1989 and 1999, which is the longest period for which freely-available play-by-play data is detailed enough for this kind of study. Let’s look at the league averages to start:

RUNNERS       PA    BB     K    HR    GB    FB    LD    PU
EMPTY    1002156  0.09  0.16  0.03  0.45  0.25  0.21  0.09
FIRST     519630  0.08  0.15  0.03  0.47  0.24  0.20  0.08
LOAD       44236  0.07  0.17  0.03  0.45  0.26  0.20  0.09
SCORE     262142  0.17  0.16  0.02  0.48  0.24  0.20  0.09
ALL      1828164  0.10  0.16  0.02  0.46  0.25  0.20  0.09

For the sake of clarity I have compressed the 24 distinct base-out states into only four states, ignoring the number of outs completely. This is for illustrative purposes, and may not reflect the correct way to group these for substantive analysis. The first column represents the runners on base:

Empty means there are no baserunners.
Loaded means, well, the bases are loaded.
First indicates any situation where there is a runner on first except for when the bases are loaded.
Score indicates any other situation – that is, when there are runners in scoring position but first base is open.
All refers to all situations.

For these purposes I have included hit by pitch and intentional walks in BB; BB, K and HR are per plate appearance and the batted ball types (ground balls, fly balls, line drives and popups) are per batted ball.
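As a rough sketch of how per-situation rates like these can be compared to the all-situations rate, the snippet below divides each situation’s per-plate-appearance rate by the overall rate. The counts are hypothetical round numbers chosen for illustration, not the Retrosheet totals, and the function names are assumptions of this sketch. Only the per-PA events (BB, K, HR) are covered; the batted ball types would need batted balls, not plate appearances, as the denominator.

```python
from typing import Dict

EVENTS = ("BB", "K", "HR")

# HYPOTHETICAL counts for two base states (not the article's data).
COUNTS: Dict[str, Dict[str, int]] = {
    "EMPTY": {"PA": 1000, "BB": 80, "K": 160, "HR": 30},
    "SCORE": {"PA": 500, "BB": 100, "K": 80, "HR": 10},
}


def per_pa_rates(row: Dict[str, int]) -> Dict[str, float]:
    """Rate of each event per plate appearance for one row of counts."""
    return {event: row[event] / row["PA"] for event in EVENTS}


def total_row(counts: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Sum the counts across all base states (the 'ALL' row)."""
    keys = ("PA",) + EVENTS
    return {key: sum(row[key] for row in counts.values()) for key in keys}


def relative_rates(counts: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, float]]:
    """Each state's rate divided by the all-situations rate."""
    overall = per_pa_rates(total_row(counts))
    result = {}
    for state, row in counts.items():
        state_rates = per_pa_rates(row)
        result[state] = {e: state_rates[e] / overall[e] for e in EVENTS}
    return result


if __name__ == "__main__":
    for state, rel in relative_rates(COUNTS).items():
        print(state, {event: round(value, 2) for event, value in rel.items()})
```

Working from raw counts rather than from the two-decimal rates in the table avoids compounding rounding error, which matters most for rare events such as home runs.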
Now, in the interest of clarity, let’s look at the figures divided by the overall average; in other words, the relative difference between what a pitcher does in that situation compared to what he does in all situations:

RUNNERS       PA    BB     K    HR    GB    FB    LD    PU
EMPTY    1002156  0.91  1.03  1.04  0.98  1.02  1.01  1.01
FIRST     519630  0.82  0.93  1.03  1.02  0.97  0.99  0.98
LOAD       44236  0.78  1.05  1.05  0.97  1.04  1.01  1.06
SCORE     262142  1.74  1.01  0.79  1.03  0.96  0.97  1.00

That should make it clearer how pitchers (and the hitters they face, it should be noted) change approach based on the situation. Look at the dramatic differences in walk rate, for instance, especially with the bases loaded.

Now let’s look at some individual pitchers, to see how they might approach situations differently. I have put the actual rates for these pitchers in the “All” group; the rest of the figures are the pitcher’s performance relative to himself, not the league. That’s an important distinction. And again, this is only from 1989 to 1999. Also, a caution: this is exploratory analysis, just a casual stroll through the data. Please don’t become too attached to one particular data point or one particular pitcher.

Let’s start with Pedro Martinez:

RUNNERS    PA    BB     K    HR    GB    FB    LD    PU
EMPTY    3361  0.98  1.00  1.14  0.96  1.04  0.99  1.08
FIRST    1362  0.79  0.94  0.56  1.06  0.93  1.05  0.84
LOAD       85  1.11  1.34  2.42  1.02  0.78  1.59  0.40
SCORE     669  1.51  1.08  0.99  1.09  0.96  0.87  0.99
ALL      5477  0.08  0.28  0.02  0.43  0.26  0.20  0.11

There’s certainly a lot of variance here, compared to what we saw for the league average. Some of this is obviously noise, especially in the loaded group; there were only 85 plate appearances in the sample with the bags juiced. But there are still some interesting things going on here.

One thing to note is that Pedro seems to be getting a lot more grounders with runners on.
This is important; let’s compare the value of the various types of outs when there are runners on or not:

EVENT      EMPTY  MEN_ON
Strikeout  -0.18   -0.42
Flyout     -0.19   -0.34
Groundout  -0.19   -0.41
Lineout    -0.19   -0.38
Popup      -0.19   -0.42

With the bases empty, it simply doesn’t matter what sort of an out you get; when you start adding baserunners, it starts to matter a great deal what kind of outs you are getting. If Pedro is getting more of his ground balls with men on, then his ground balls are more valuable than the average pitcher’s because he is getting them at more opportune moments.

Now let’s compare Pedro to a very different sort of pitcher, a guy who doesn’t get a lot of strikeouts, like Kirk Rueter:

RUNNERS    PA    BB     K    HR    GB    FB    LD    PU
EMPTY    2215  0.82  0.98  0.99  0.97  1.02  1.05  0.99
FIRST    1045  0.88  1.01  1.29  1.03  1.01  0.95  0.92
LOAD       57  0.78  1.70  0.64  0.98  1.44  0.60  0.75
SCORE     446  2.23  0.96  0.41  1.09  0.82  0.89  1.29
ALL      3763  0.07  0.12  0.03  0.46  0.25  0.20  0.09

Again, we can’t conclude too much from sample data. But it certainly looks like Rueter likes to pitch around guys when he has runners on and first base open. And he goes looking for more groundouts and popups in those situations as well. As we just established, those outs are much more valuable than others when men are on base.

What we need to do now is answer two key questions:

How much of situational pitching is skill?
How can we use a pitcher’s situational tendencies to predict his ERA?

That, unfortunately, will have to wait.

References & Resources

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”.

I’ve studied the accuracy of various component ERA systems before, here and here.
Since I know I’ll be asked, here’s Greg Maddux:

RUNNERS     PA    BB     K    HR    GB    FB    LD    PU
EMPTY     6356  0.82  1.06  0.90  1.01  0.94  1.03  0.95
FIRST     2619  0.65  0.87  1.27  0.98  1.08  0.98  1.00
LOAD       150  0.72  1.00  1.71  0.91  1.23  0.82  1.85
SCORE     1438  2.47  0.99  0.89  0.98  1.11  0.93  1.16
ALL      10563  0.06  0.18  0.01  0.60  0.16  0.19  0.06