Predicting double play rate

by Dan Turkenkopf
July 16, 2009

It seems pretty intuitive, right? Big, slow sluggers will hit into more double plays than speedsters.

They hit the ball harder, so the ball gets to the infielders a lot faster. And they obviously don’t run as well, so they’re easier to double up. But sluggers also are less likely to put the ball in play and less likely to hit a ground ball when they do put the ball in play.

So is our theory really true? Do power hitters hit into more double plays than weaker hitters who might be faster?

Methodology

The approach I used is a pretty simple one. I figured the double play rate for every player season since 1954 that had more than 50 opportunities, where an opportunity is a plate appearance with a runner on first and fewer than two outs. This left me with 13,169 player seasons. For each of those player seasons, I determined isolated power (slugging percentage – batting average), isolated walk rate (on-base average – batting average) and speed score. I calculated speed score the same way Fangraphs does, using stolen base percentage, stolen base frequency, triple rate and runs scored percentage.
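As a rough sketch of the per-season quantities described above, the following Python computes double play rate, isolated power and isolated walk rate, and applies the 50-opportunity cutoff. The record layout and field names are hypothetical, and the sample season is made up for illustration. Speed score is left out because its exact component weights aren't given here.

```python
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    # Hypothetical record layout; the field names are illustrative only.
    gidp: int           # double plays grounded into
    opportunities: int  # PAs with a runner on first and fewer than two outs
    avg: float          # batting average
    obp: float          # on-base average
    slg: float          # slugging percentage


def gidp_rate(s: PlayerSeason) -> float:
    """Double plays per opportunity."""
    return s.gidp / s.opportunities


def iso_power(s: PlayerSeason) -> float:
    """Isolated power: slugging percentage minus batting average."""
    return s.slg - s.avg


def iso_walk(s: PlayerSeason) -> float:
    """Isolated walk rate: on-base average minus batting average."""
    return s.obp - s.avg


def qualifies(s: PlayerSeason, min_opportunities: int = 50) -> bool:
    """Keep only seasons with more than the minimum number of opportunities."""
    return s.opportunities > min_opportunities


# Made-up season, not a real player's line.
example = PlayerSeason(gidp=10, opportunities=100, avg=0.300, obp=0.400, slg=0.500)
if qualifies(example):
    print(gidp_rate(example), iso_power(example), iso_walk(example))
```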

Using isolated power (isoP), isolated walk rate (isoW), speed score and a dummy variable for handedness (0 for right handed, 1 for left handed and .5 for switch hitters) as my independent variables, I ran a linear regression against double play rate.
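For readers who want to try a similar fit, here is a dependency-free sketch of a least-squares regression on those four variables. It assumes a model with no intercept term, and it solves the normal equations directly. In practice a statistics library would be the usual choice. The rows and the "true" coefficients below are invented purely to show that the fit recovers them; they are not the article's data or results.

```python
HAND_CODE = {"R": 0.0, "S": 0.5, "L": 1.0}  # dummy coding described above


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_no_intercept(rows, y):
    """Least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up rows: (isoP, isoW, speed score, handedness code).
rows = [
    (0.250, 0.100, 5.0, HAND_CODE["R"]),
    (0.100, 0.060, 7.5, HAND_CODE["L"]),
    (0.180, 0.120, 3.0, HAND_CODE["S"]),
    (0.300, 0.150, 4.0, HAND_CODE["L"]),
    (0.080, 0.040, 6.0, HAND_CODE["R"]),
    (0.150, 0.090, 2.5, HAND_CODE["R"]),
]
true_coefs = [0.2, 0.5, 0.01, -0.02]  # arbitrary values for the demo
y = [sum(c * v for c, v in zip(true_coefs, r)) for r in rows]

coefs = fit_no_intercept(rows, y)
print(coefs)
```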

This approach has some possible concerns though. The most problematic is that handedness, isoP, isoW and speed score might not be completely independent.

In fact, only two of the relationships are correlated even somewhat strongly. Isolated power and isolated walk rate have a 0.31 correlation (where a value closer to 1 or -1 represents a stronger relationship), while batting hand and isolated walk rate have a 0.15 correlation. Apparently lefties have a higher isolated walk rate than do righties.

The full set of correlations is:

isoP / isoW	0.31
isoP / speed score	-0.02
isoW / speed score	-0.10
handedness / speed score	0.06
handedness / isoP	0.02
handedness / isoW	0.15
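A pairwise check like the one above can be sketched with a plain Pearson correlation. This assumes Pearson is the measure in question, which isn't stated outright. The column values below are made up for illustration and will not reproduce the figures in the table.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up columns, one value per player season.
columns = {
    "isoP": [0.250, 0.100, 0.180, 0.300, 0.080],
    "isoW": [0.100, 0.060, 0.120, 0.150, 0.040],
    "speed score": [5.0, 7.5, 3.0, 4.0, 6.0],
    "handedness": [0.0, 1.0, 0.5, 1.0, 0.0],
}

for a, b in combinations(columns, 2):
    print(f"{a} / {b}\t{pearson(columns[a], columns[b]):.2f}")
```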

Also, I’d really like to include ground ball/fly ball ratio, but I don’t have reliable data for the entire Retrosheet era. Finally, it’s entirely possible that the relationships between skills have changed over time, and we might be better served by looking at smaller time periods to find the proper equation.

Results

The equation to predict double play rate, or the chance of a batter grounding into a double play when faced with at least a runner on first and fewer than two outs, according to my regression is:

gidpRate = 0.215 * isoP + 0.529 * isoW + 0.009 * speed score – 0.015 * (0 if batter is right handed, 0.5 if batter is a switch hitter, 1 if batter is left handed).

All the variables are significant at the 99 percent confidence level, and the entire formula describes roughly 76 percent of the variation in double play rate.
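To make the equation above concrete, here is a small sketch that plugs a batter's numbers into it. The coefficients are the ones published above. The function names and the sample inputs are made up, and converting a rate into an expected double play count by multiplying by opportunities is an assumption about how the later tables were built.

```python
HAND_CODE = {"R": 0.0, "S": 0.5, "L": 1.0}  # right, switch, left


def predicted_gidp_rate(iso_p, iso_w, speed_score, bats):
    """Apply the regression equation from the article to one batter."""
    return (0.215 * iso_p
            + 0.529 * iso_w
            + 0.009 * speed_score
            - 0.015 * HAND_CODE[bats])


def expected_double_plays(rate, opportunities):
    """Assumed conversion: predicted rate times double play opportunities."""
    return rate * opportunities


# Made-up batter: isoP .200, isoW .100, speed score 5.0.
right = predicted_gidp_rate(0.200, 0.100, 5.0, "R")
left = predicted_gidp_rate(0.200, 0.100, 5.0, "L")
print(round(right, 4), round(left, 4))  # 0.1409 0.1259
print(round(expected_double_plays(right, 150), 1))
```

With everything else held equal, the only difference between the two predictions is the 0.015 handedness term.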

Interestingly enough, power, batting eye and speed all contribute positively to the rate of double plays. Being left handed is about the only advantage. Of course, since you rarely find players who are fast power hitters with good eyes, the formula probably doesn’t do a good job of capturing the real interaction between those skill sets.

Now that the hard work is out of the way, let’s look at the fun stuff.

Most expected double plays

Player	Season	Expected DPs
Barry Bonds	2001	37
Jeff Bagwell	1999	36
Barry Bonds	1998	35
Jeff Bagwell	1996	33
Jeff Bagwell	2000	32
Willie Mays	1962	32
Jeff Bagwell	2001	32
Sammy Sosa	2001	31
Alex Rodriguez	2007	31
Mark McGwire	1998	31
Larry Walker	1997	31

Wow, that’s definitely the Jeff Bagwell list. Basically, the better hitter you are, the more double plays you’re expected to hit into. This causes me some concern. More on that later.

Best at avoiding the double play

Player	Season	Expected DPs	Actual DPs	Delta
Barry Bonds	2001	37	5	32
Joe Morgan	1976	28	2	25
Sammy Sosa	2001	31	6	25
Joe Morgan	1975	27	3	24
Mickey Mantle	1955	28	4	24
Jimmy Wynn	1969	29	5	24
Mickey Mantle	1961	25	2	23
Barry Bonds	2004	28	5	23
Mickey Mantle	1956	27	4	23
Barry Bonds	2002	27	4	23

Not surprisingly, there are a lot of the players we expected the formula to handle poorly: those who have good eyes, good power and fairly good speed. At this point, I’m thinking I probably should have included contact rate in this regression as well, although that will likely raise problems because of its relationship to isolated power.

Now the worst seasons:

Worst at avoiding the double play

Player	Season	Expected DPs	Actual DPs	Delta
Brad Ausmus	2002	9	30	-21
John Bateman	1971	7	27	-20
A.J. Pierzynski	2004	7	27	-20
Jerry Adair	1969	5	24	-19
Miguel Tejada	2008	13	32	-19
Paul Konerko	2003	9	28	-19
Tony Armas	1983	12	31	-19
Sean Casey	2005	9	27	-18
Ken Reitz	1976	6	24	-18
Al Oliver	1984	5	23	-18
Ted Simmons	1973	11	29	-18
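The Delta column in the tables above is expected double plays minus actual double plays, so a positive number means the batter beat the prediction. A minimal sketch of that ranking, using four rows copied from the tables (the function and variable names are made up):

```python
# (player, season, expected DPs, actual DPs), copied from the tables above.
seasons = [
    ("Barry Bonds", 2001, 37, 5),
    ("Sammy Sosa", 2001, 31, 6),
    ("Brad Ausmus", 2002, 9, 30),
    ("Miguel Tejada", 2008, 13, 32),
]


def delta(expected, actual):
    """Positive: fewer double plays than predicted. Negative: more."""
    return expected - actual


# Best avoiders first, worst last.
ranked = sorted(seasons, key=lambda s: delta(s[2], s[3]), reverse=True)
for player, season, expected, actual in ranked:
    print(f"{player}\t{season}\t{delta(expected, actual)}")
```

Because the tables show rounded expected values, a delta recomputed this way can differ by one from a published figure.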

Looking at this list, I’m thinking speed isn’t being considered enough. Most of the players who miss on the low end are quite slow. Perhaps speed score isn’t the best way to estimate a player’s speed. It definitely doesn’t seem to follow a normal distribution, which might cause problems when using it as part of a regression.

The wrapup

I’ve got a lot of misgivings about the usefulness of these results. I think as they stand right now, they mostly prove the hypothesis that the stereotypical beer league softball player (a.k.a. the Moneyball player) is expected to hit into more double plays than the weaker hitting speedster.

The regression formula seems to go too far though. Those hitters who best combined speed, power and batting eye are predicted to hit into the most double plays. The top hitters in the game dominate both the list of most expected double plays, and the list of best at avoiding expected double plays. I’m thinking the regression equation misses most at the extremes, which calls into question its entire applicability.

At this point, I’m not sure what it’s really useful for, besides being a potentially interesting piece of data. If we’re trying to predict future double play rate, then we might be better off using the next season’s double play rate as our dependent variable. If we’re attempting to predict whether it makes sense to intentionally walk the current batter because we think the next one might hit into a double play, this calculation might help, but perhaps not as much as looking at his actual double play rate. It’s not a value measure, so it can’t be used looking backwards.

Perhaps the best we can hope for is that it sheds some light on the interaction between batter skills and that strength at the plate overcomes any negative that stems from grounding into double plays. Future work on the topic can better account for ground ball / fly ball tendencies and contact rate, as well as speed of the runners on the bases, which may provide a more accurate picture of how double plays really unfold.

11 Comments

Gilbert

16 years ago

Possibly some of the ISO effect has to do with managers, e.g. will they send the runner to avoid DP’s? Maybe less so with the big bopper at the plate. I would also guess that speed score has little to do with how fast the runner gets down the line; e.g. Bonds, Mantle and maybe Sosa can run better than their SB would indicate since their eras did not emphasize it, or they were too valuable to risk basepath injury.

Barry

16 years ago

Doesn’t the small, positive coefficient for speed score just suggest that speedsters hit into more DPs than you would expect from their power/walk numbers? In other words, if you ran the regression without speed score, you would typically under-predict the DP rates of the speedsters due to their low power and/or walk numbers… Another way to think about this is if the slowpokes do hit into more DPs than speedsters, the small coefficient multiplying a small speedscore won’t impact the prediction from isoP and isoW on the slowpokes, but it will add enough with a high speed score to counterbalance the low isoP and/or isoW numbers…

Just a thought…

Larry Mastel

16 years ago

Position in the batting order also plays into this. Look at notorious DP avoider Dick McAuliffe.
Go through his at-bats for a season. He was leading off a lot with no double play possible and otherwise seemed to be in a lot of situations where no double play was possible.

Pizza Cutter

16 years ago

Speedier players hit more ground balls… thus they’ll have more of a chance to ground into double plays on that account.

John Walsh

16 years ago

I’m not an expert on regressions, but this model just doesn’t seem to make sense. How can speed have a positive coefficient? This would mean that, all other things being equal, slow runners are better at avoiding the double play than fast runners. How can that be true, all other things being equal?

I think you could make a good model with just three independent variables: GB% (the percentage of plate appearances that result in a ground ball), speed score and handedness.

You can get a pretty good measure of GB% for all the retrosheet period by looking at outs (on balls in play) and whether an assist was credited on the play. There are a few unassisted groundouts, of course, and perhaps GB% on outs is not the same as on all BIP, but this will be a decent enough measure.

Since you are parsing the retrosheet data, you might as well just consider how a batter actually batted (L or R) for a particular plate appearance instead of adding switch-hitting to the handedness variable.

Brandon Tingley

16 years ago

Personally, I’d figure that a batter’s ground ball rate and the frequency with which he batted with players on first would be the most important factors. The guys who beat the predictions also hit a ton of fly balls and, in general, drew a lot of walks.

I imagine you’d get a better fit with less scatter just using those two variables, with the inclusion of speed score as a secondary effect tightening it up even further.

Dan Turkenkopf

16 years ago

@John,

I agree that the speed result is the most confusing to me. I think it might be because of a correlation between speed score and ground ball tendencies.

Ground ball rate is clearly necessary to include in this, although I’m worried about the interactions between ground ball rate and ISO.

John Walsh

16 years ago

@Dan,

Why don’t you drop ISO from the model? I don’t see how ISO should influence GDP rate except via its correlation with GB% and speed.

I suppose how hard GB are hit could also be a factor, but it’s probably minor, especially since there are competing aspects: hard hit GB get to the IF faster, increasing the chance of a GDP, but hard GB also get through the IF more often. In any case, power measures are correlated with GB% and that makes it hard to include it in the model, it seems to me.

Dan Turkenkopf

16 years ago

@Brandon

Since I was measuring double plays / opportunity, I don’t think that frequency batting with runners on first would have much of an impact.

Ground ball rate is, of course, a major factor that really should have been included.

Dan Turkenkopf

16 years ago

@John,

ISO was in the model because of the harder hit ball hypothesis. I was wondering if that counteracted the speed effects.

But the correlation between power and ground ball rate does make this problematic at best. I’ll see if I can figure out ground ball rates and re-run the regression.

I’m thinking that contact rate, ground ball rate and speed score might be more meaningful, although not what I was originally interested in finding.

ksw

16 years ago

nice basic research.
from this departure point, one could explore the following:
—lower strike out batters lead to more runners sent; fewer gidps
—high k rate people hit fewer balls, hence fewer gidp
—score at the time (defensive priorities)—, fewer gidp possibilities
—ground ball throwing ability of the pitcher to power hitters
—power hitters not being walked & then grounding into double play—power hitters walking & not getting chances to ground into double plays
intuitively, and the numbers bear me out—the highest gidp rates are faithfully mapped by a hitter with above average 2b, 3b, and homer rates; batting right handed; lower than expected k/ab given the perceived power; lower than expected w/tab given the perceived power level; batting third, fourth, or fifth, on a decent offensive team, but batting behind not great runners.
start from the observational, and draw a hypothesis from that.
use the numbers to test the hypothesis.
regards,
kevin
