Trust but Verify

by Adam Guttridge

October 21, 2009

Last week, I was playing around with creating a BABIP predictor that was simple, intuitive, and calculable with the information available in my spreadsheets. I also stumbled across some brilliant work done by Peter Bendix at THT some time back. As an offshoot of the BABIP estimator, I decided to start with a baseline reading of how predictable BABIP and batted ball types are, given a decent sample. (I should note that this is BIS data.)

I took all players with at least 400 PA in each season from 2006-2009 (98 players, 392 seasons). I wanted to see how well a simple 5-3-2 weighting of their ’06-’08 BABIP and batted ball data would predict their ’09 BABIP/batted ball (not weighted for PA).

The results:

A few things jumped out at me:

–I’m surprised by how well BABIP works at predicting itself. I would have thought we’d see that figure much more towards LD%. That’s somewhat discouraging for me, though… it’s not going to be easy to come up with a BABIP estimator that makes a huge difference in terms of projection accuracy.

–LD%… damn. A lot of this is surely attributable to the well-discussed variability and subjectivity involved in batted ball classification, but even absent that, I’d bet this is just a flaky skill that needs to be heavily regressed.

–I’m also a bit surprised by how strong a figure we see for HR/FB. The variability of that figure is the basis for xFIP, and sure, on a seasonal level, it deserves to be factored out. But for an SP with 180+ IP in each of the past 3 seasons, a projection using his (park factored) HR/FB as a component in lieu of the league average rate would likely produce superior accuracy.

Also, I checked up on Dave Studeman’s quick-and-dirty method of BABIP prediction: LD% + .12. If you add .12 to the LD% predicted by the ’06-’08 LD%, it correlates at .376 with actual ’09 BABIP. In other words, you’re far better off using plain BABIP.
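For anyone who wants to run the same kind of check on their own spreadsheet export, here is a rough Python sketch of the two predictors discussed above: the 5-3-2 weighted average and the LD% + .12 shortcut, each scored by its correlation with the following season's BABIP. Everything in it is an assumption for illustration. The function names are made up, the weight order assumes the most recent season ('08) gets the 5, and the three sample players are invented numbers, not BIS data, so the printed correlations mean nothing beyond showing the mechanics.

```python
from math import sqrt

# Assumed weight order: most recent season first ('08, '07, '06).
WEIGHTS = (5, 3, 2)


def weighted_532(y2008, y2007, y2006):
    """5-3-2 weighted average of three seasons of a rate stat (not PA-weighted)."""
    total = sum(WEIGHTS)
    return (WEIGHTS[0] * y2008 + WEIGHTS[1] * y2007 + WEIGHTS[2] * y2006) / total


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def studeman_babip(ld_rate):
    """Quick-and-dirty BABIP estimate: line drive rate (as a decimal) plus .12."""
    return ld_rate + 0.12


# Invented sample rows (NOT real data): each tuple is ('08, '07, '06).
players = [
    {"babip": (0.310, 0.295, 0.320), "ld": (0.20, 0.19, 0.21), "babip_2009": 0.305},
    {"babip": (0.280, 0.290, 0.275), "ld": (0.18, 0.20, 0.17), "babip_2009": 0.290},
    {"babip": (0.335, 0.340, 0.325), "ld": (0.22, 0.21, 0.23), "babip_2009": 0.328},
]

actual = [p["babip_2009"] for p in players]
from_babip = [weighted_532(*p["babip"]) for p in players]
from_ld = [studeman_babip(weighted_532(*p["ld"])) for p in players]

print("r, weighted BABIP vs. actual:", round(pearson_r(from_babip, actual), 3))
print("r, weighted LD% + .12 vs. actual:", round(pearson_r(from_ld, actual), 3))
```

With a real sample, you would swap the invented rows for one row per qualifying player (400+ PA in each of the four seasons) and compare the two correlations directly.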