Trust but Verify

Last week, I was playing around with creating a BABIP predictor that was simple, intuitive, and calculable with the information available in my spreadsheets. I also stumbled across some brilliant work done by Peter Bendix at THT some time back.

As an offshoot of the BABIP estimator, I decided to start with a baseline reading of how predictable BABIP and batted ball types are, given a decent sample. (I should note that this is BIS data.)

I took all players with at least 400 PA in each season from 2006-2009 (98 players, 392 seasons). I wanted to see how well a simple 5-3-2 weighting of their ’06-’08 BABIP and batted ball data would predict their ’09 BABIP/batted ball (not weighted for PA). The results:


A few things jumped out at me:

–I’m surprised by how well BABIP works at predicting itself. I would have expected that figure to sit much closer to the one for LD%. That’s somewhat discouraging for me, though… it’s not going to be easy to come up with a BABIP estimator that makes a huge difference in terms of projection accuracy.

–LD%… damn. A lot of this is surely attributable to the well-discussed variability and subjectivity involved in batted ball classification, but even absent that, I’d bet this is just a flaky skill that needs to be heavily regressed.

–I’m also a bit surprised by how strong of a figure we see for HR/FB. The variability of that figure is the basis for xFIP, and sure, on a seasonal level, it deserves to be factored out. But for an SP with 180+ IP in each of the past 3 seasons, a projection using his (park factored) HR/FB as a component in lieu of the league average rate would likely produce superior accuracy.

Also, I checked up on Dave Studeman’s quick-and-dirty method of BABIP prediction: LD% + .12. If you add .12 to the LD% predicted by the ’06-’08 LD%, it correlates at .376 with actual ’09 BABIP. In other words, you’re far better off using plain BABIP.
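For anyone who wants to replicate the test in something other than a spreadsheet, here is a rough sketch of the two steps described above: blend the three prior seasons 5-3-2, then correlate the blended figure with the following season. Everything in it is an assumption on my part for illustration: the function names are hypothetical, the sample rows are made-up numbers rather than the BIS data, and I'm assuming the heaviest weight goes on the most recent season.

```python
from math import sqrt

# Assumed order: most recent season first ('08, '07, '06), weighted 5-3-2.
WEIGHTS = (5, 3, 2)


def weighted_projection(recent_first):
    """Blend three seasons of a rate stat (not weighted for PA)."""
    if len(recent_first) != len(WEIGHTS):
        raise ValueError("expected one value per weight")
    total = sum(w * x for w, x in zip(WEIGHTS, recent_first))
    return total / sum(WEIGHTS)


def pearson_r(xs, ys):
    """Plain Pearson correlation (R, not R^2) between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ld_plus_12(ld_rate):
    """The quick-and-dirty estimate: line drive rate plus .12."""
    return ld_rate + 0.12


# Made-up rows, purely illustrative:
# ((BABIP '08, '07, '06), (LD% '08, '07, '06), actual '09 BABIP)
players = [
    ((0.310, 0.300, 0.320), (0.200, 0.190, 0.210), 0.305),
    ((0.280, 0.295, 0.270), (0.180, 0.185, 0.175), 0.290),
    ((0.340, 0.330, 0.350), (0.210, 0.220, 0.200), 0.325),
    ((0.300, 0.285, 0.310), (0.195, 0.170, 0.205), 0.285),
]

actual = [row[2] for row in players]
from_babip = [weighted_projection(row[0]) for row in players]
from_ld = [ld_plus_12(weighted_projection(row[1])) for row in players]

r_babip = pearson_r(from_babip, actual)  # weighted BABIP vs. actual '09
r_ld = pearson_r(from_ld, actual)        # weighted LD% + .12 vs. actual '09
```

Note that adding a constant like .12 doesn't change a correlation, so `r_ld` is really just measuring how well the weighted LD% tracks next-year BABIP.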

Detroit Michael
14 years ago

It sounds like for most of the article you are talking about batting statistics but then the penultimate paragraph switches to talk about pitchers.  I found it confusing.  I think the persistence from one year to the next of these statistics is much different if we’re talking about pitchers.

Derek Carty
14 years ago

Hey Adam,
I did a quick bit of research here about BABIP, xBABIP, and other estimators.  Similar findings – i.e. LD%+.120 is no good for forward-looking stuff.

Nick Steiner
14 years ago

You only tested hitters right?  That’s why you see a high correlation for HR/FB, it’s not really a “luck” stat for them.  If you tested pitchers, I bet you would find a much weaker relationship between year 1 and year 2. 

Also, are these R or R^2?

Adam Guttridge
14 years ago

R, Nick.

And yes, it would be interesting to test pitchers’ HR/FB. I perhaps ran a bit far with the assumption that it would be as predictable for pitchers as for hitters (like most stats, e.g. BABIP, are).

And Derek… that’s wonderful work. The problem with xBABIP and even qxBABIP is that, logistically, they’re awfully difficult to pull off with my sheets. I’ve had some success thus far with GB rate and a speed index. I’m optimistic I’ll be able to put together something that gets it into .7 territory.

Dave Studeman
14 years ago

I don’t know how many times I’ve said this, but my formula, basing BABIP off LD%, was never meant to be used to “predict” future BABIP.  It was meant as a way to judge how lucky/unlucky the hitter or pitcher was in retrospect only. I think I mentioned it just twice in articles and never used it to predict future BABIP.  Dutton and Bendix took it out of context in their study.

This kind of information was studied in detail by David Gassko and JC Bradbury in two old THT Annuals.  I highly recommend people read those.

Adam Guttridge
14 years ago

Sorry if I ended up mischaracterizing your use of the stat, Studes. Yes, I would imagine the Bendix article was my source for that.