The Reliability of Intrinsic Batted-Ball Statistics

by Glenn Healey
August 2, 2016

The success of a major league baseball team depends in part on its ability to predict the future performance of players. This has led to the development of forecasting systems that can inform personnel decisions which routinely result in player contracts worth tens of millions of dollars.

Most forecasting systems are based on a process that estimates a player’s current talent level and another process that predicts how that talent level will change in the future. The first process generates a set of statistics that represent various player attributes using weighted averages of past observations. These statistics are adjusted to provide the current talent estimate. The second process uses a model for how each statistic changes as a player ages. While most statistics tend to improve for young players and decline for older players, there are significant differences in the aging curves for different skills.

Due to the randomness associated with batted ball outcomes, the prediction of a player’s future results on batted balls is often cited as the biggest challenge for a forecasting system. Since about 70 percent of major league plate appearances result in a batted ball, the accuracy of these predictions is critical to system performance. We will investigate how data acquired by technologies like HITf/x and Statcast can be used to address this challenge. Our work follows several previous efforts that have demonstrated the value of HITf/x and Statcast for a range of applications.

Intrinsic Values

In a previous article we presented a method that assigns an intrinsic value to batted balls at contact. This approach separates the intrinsic value of a batted ball from its outcome and, in the process, removes the effects of factors such as the defense, the weather, the ballpark and random luck. The method uses machine learning techniques in conjunction with Sportvision HITf/x data to derive a continuous mapping from batted ball parameters to the probability of outcomes.

Today we’ll examine the use of this approach for forecasting. In case you missed the last article, here’s a quick summary. For a batted ball with an initial speed s, vertical angle v, and horizontal angle h, the system computes the probability of each batted ball outcome.

Suppose, for example, that a right-handed batter hits a line drive to the outfield at a vertical angle of v = +12 degrees and an initial speed of s = 97 miles per hour. The generated probabilities of a single, double, triple and out as a function of the horizontal angle h are:

where h = -45 degrees is the direction toward third base and h = +45 degrees is the direction toward first base. Most of these balls that are not hit near the foul lines result in singles, and maxima in the probability of an out occur for balls hit in the direction of an outfielder. The highest probabilities for doubles and triples occur for balls hit near the lines and between the outfielders.

For any batted ball vector (s,v,h), a weighted sum of the event probabilities can be used to compute the expected batting average, slugging percentage, and wOBA. Continuing with the example in the previous plot gives

In this work, we use the wOBA statistic due to its correlation with run value. If we consider all possible batted ball vectors, we obtain a wOBA cube that defines the mapping from (s,v,h) to expected wOBA.

The wOBA cube has a significant dependence on the handedness of the batter due to changes in fielder positioning, which leads us to build separate wOBA cubes for left- and right-handed batters. A batted ball with parameter vector (s,v,h) is assigned an intrinsic value given by the corresponding wOBA cube value. Batted balls may also be assigned an observed value given by the wOBA coefficient for the outcome of the batted ball.

We denote the average of a player’s intrinsic values on batted balls as I and the average of his observed wOBA on batted balls as O. The statistic O has also been referred to as “wOBA on contact” or wOBAcon. Similarly, we can compute I and O for the batted balls allowed by a pitcher. While I provides a context-invariant measure of contact quality, O depends on several factors that are beyond the control of the batter and the pitcher such as fielding, the weather, and the ballpark.
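As a rough illustration of the weighted sums and averages described above, here is a minimal Python sketch. The wOBA weights are assumed values chosen only to be on a realistic scale, the outcome probabilities are made up, and the function names are hypothetical. In the actual method the probabilities would come from the wOBA cube for a given (s,v,h), which is not reproduced here.

```python
# ASSUMED linear weights for illustration only; not taken from the article.
WOBA_WEIGHTS = {"single": 0.89, "double": 1.28, "triple": 1.64,
                "home_run": 2.14, "out": 0.0}
TOTAL_BASES = {"single": 1, "double": 2, "triple": 3, "home_run": 4, "out": 0}


def expected_stats(probs, weights=WOBA_WEIGHTS):
    """Return expected (batting average, slugging, wOBA) for one batted ball.

    probs maps each outcome to its probability; the values must sum to 1.
    """
    if abs(sum(probs.values()) - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    ba = sum(p for outcome, p in probs.items() if outcome != "out")
    slg = sum(TOTAL_BASES[outcome] * p for outcome, p in probs.items())
    woba = sum(weights[outcome] * p for outcome, p in probs.items())
    return ba, slg, woba


def intrinsic_and_observed(batted_balls, weights=WOBA_WEIGHTS):
    """Return (I, O) for a list of (probs, actual_outcome) pairs.

    I averages the expected wOBA at contact; O averages the wOBA weight
    of what actually happened.
    """
    n = len(batted_balls)
    i_stat = sum(expected_stats(probs, weights)[2] for probs, _ in batted_balls) / n
    o_stat = sum(weights[outcome] for _, outcome in batted_balls) / n
    return i_stat, o_stat


# Hypothetical probabilities for one line drive.
line_drive = {"single": 0.60, "double": 0.10, "triple": 0.02,
              "home_run": 0.0, "out": 0.28}
ba, slg, woba = expected_stats(line_drive)

# Two identical line drives, one caught and one falling for a single.
i_stat, o_stat = intrinsic_and_observed([(line_drive, "single"),
                                         (line_drive, "out")])
```

The two line drives get the same intrinsic value even though their outcomes differ, which is the separation of contact quality from results that the I statistic is meant to capture.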

Reliability

The reliability of a statistic is central to its value for forecasting. Reliability is defined as the ratio of the variance of true talent across players for the statistic to the variance of the observed values across players. While the observed values for player statistics are displayed prominently on the scoreboard every night, a player’s true talent level is unknowable to us mortals. Fortunately, people have invented creative ways to estimate reliability, which we will use in this work.

Let’s review some important properties of reliability. The observed value of a player statistic is influenced by both the player’s true talent and sources of random variation such as whether a bad hop led to a single or a strong wind helped a home run. As a result, the variance of observed values tends to be larger than the variance of true talent and reliability values are typically less than one.

Sample size matters. When we use a larger sample, the variance of the observed values tends to decrease. We know, for example, that there is a larger spread in player batting averages after one week of the season than at the All-Star break. Reliability increases with sample size.

Statistics are not equally reliable. Different statistics have different spreads of true talent and some statistics are more susceptible to random variation in their observed values than others. Thus, different statistics can have a different reliability even if the same sample size is used.

Russell Carleton introduced reliability analysis to the baseball world. His results showed that statistics for batters and pitchers that depend on the outcome of batted balls tend to have a lower reliability for a given sample size than statistics that do not. This occurs in part because variables such as the texture of the infield grass and the ambient weather conditions contribute variation to the observed value of statistics like batting average that depend on batted ball outcomes.

Carleton’s results have motivated the search for outcome-independent batted ball statistics that can provide a given reliability using smaller samples. Other things equal, smaller samples are preferred since larger samples, even if they are available, increase the likelihood of meaningful changes in the talent level of a player within the sample.

Estimating Reliability

Suppose that we are given a data set that contains information about N batted balls for each of P players where the true talent level of players is assumed constant over the data. Split-half methods are a popular way to estimate reliability. These methods partition the data into two halves where each half includes N/2 batted balls for each player. For a statistic S, let x_j denote the value of S for player j over his N/2 batted balls in Half 1 and let y_j denote the value of S for his N/2 batted balls in Half 2. The correlation coefficient r for the P points (x_j,y_j) is a split-half correlation for the data set and is an estimate for the reliability R(N/2) of statistic S for size N/2.
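The split-half procedure can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes each player's batted-ball values arrive as a chronological list of equal, even length, and it uses an odd/even partition, which is just one of many possible ways to form the halves. The function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def split_half_correlation(values_by_player):
    """Estimate R(N/2) from an odd/even split of each player's N values.

    values_by_player holds one list per player; each list contains the
    per-batted-ball values (for example, intrinsic wOBA values).
    """
    half1, half2 = [], []
    for values in values_by_player:
        if len(values) % 2 != 0:
            raise ValueError("each player needs an even number of batted balls")
        first = values[0::2]    # 1st, 3rd, 5th, ... batted balls
        second = values[1::2]   # 2nd, 4th, 6th, ... batted balls
        half1.append(sum(first) / len(first))    # x_j
        half2.append(sum(second) / len(second))  # y_j
    return pearson_r(half1, half2)


# Toy data: three hypothetical players with four batted balls each.
toy = [[0.2, 0.4, 0.2, 0.4],
       [0.3, 0.5, 0.3, 0.5],
       [0.5, 0.7, 0.5, 0.7]]
r = split_half_correlation(toy)
```

In the toy data the two halves line up perfectly, so r comes out at 1. Real batted-ball samples would scatter around the line and give a smaller value.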

A limitation of using split-half methods is that the estimated R(N/2) can change depending on how the data is partitioned into halves. An alternative approach is to compute Cronbach’s alpha (α), which is an estimate of reliability that does not require partitioning the data. Cronbach’s α gives an estimate of R(N) that is an approximation to the average of all possible split-half correlations that would be computed from a full data set with 2N batted balls for each player. Details on the computation of Cronbach’s α and other methods that we use in this work are available in an Appendix.
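For readers who want to experiment, here is a sketch of the standard textbook form of Cronbach's α, which treats each batted ball as an "item" and each player as a "subject." It is not necessarily the exact computation described in the Appendix, and the function name and data layout are assumptions.

```python
from statistics import pvariance


def cronbach_alpha(values_by_player):
    """Textbook Cronbach's alpha.

    values_by_player holds one list per player, each with the same number
    K of per-batted-ball values. The k-th entry of every list is treated
    as the k-th 'item'.
    """
    k = len(values_by_player[0])
    if k < 2 or any(len(v) != k for v in values_by_player):
        raise ValueError("need at least 2 items and equal-length lists")
    # Variance across players of each individual item.
    item_vars = [pvariance([player[i] for player in values_by_player])
                 for i in range(k)]
    # Variance across players of each player's total.
    total_var = pvariance([sum(player) for player in values_by_player])
    if total_var == 0:
        raise ValueError("player totals have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Perfectly consistent toy data gives alpha = 1.
alpha_perfect = cronbach_alpha([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
# Noisier toy data gives a smaller value.
alpha_noisy = cronbach_alpha([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
```

Because α depends only on the item variances and the variance of the totals, it does not change when the batted balls are reordered within the sample, unlike a single split-half correlation.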

We used Cronbach’s α to estimate reliability for the I and O statistics for batters. The analysis considered the 92 players who hit at least 400 batted balls that were tracked by HITf/x in 2014. For values of N ranging from 50 to 400 we computed α(N) for I and O using the first N tracked batted balls of 2014 for each of the 92 batters. The results are

By using measurements at contact, I is immune to the randomness inherent in batted ball outcomes. This contributes to the higher reliabilities for the intrinsic I statistic than for the outcome-based O statistic.

An α(N) curve is often summarized by the value of N for which the estimated reliability crosses 0.5. This value has an important interpretation, which we’ll discuss later, in the context of using reliability values during forecasting. In this case, α(N) reaches 0.5 at 107 batted balls for I and at 248 batted balls for O. This means that a split-half correlation for a batter’s I with 107 batted balls in each half is expected to give a correlation of 0.5.

We repeated the analysis for the 112 pitchers who allowed at least 400 batted balls that were tracked by HITf/x in 2014. Following McCracken’s revolutionary postulation that pitchers have little control over the outcome of batted balls that are not home runs, several researchers have shown that pitchers do have some influence on the expected result of balls in play. Nevertheless, there is a smaller variance in the ability to control contact across pitchers than across batters. Correspondingly, we see smaller α(N) values for the batted ball statistics for pitchers

The estimated reliability values are still relatively small for pitchers at N=400 where we run out of data. Extrapolation is a tricky business and there is no substitute for more data in situations like this. Nevertheless, we used a technique described in the Appendix that applies the Spearman-Brown prophecy formula to a set of points with the largest N values to extend the curves. The result is that the extrapolated R(N) reaches 0.5 at 838 batted balls for I and at 1,268 batted balls for O. This means that a split-half correlation for a pitcher’s I with 838 batted balls in each half is expected to give a correlation of 0.5.
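The Spearman-Brown prophecy formula itself is simple enough to sketch. The version below extrapolates from a single (N, reliability) point, which is a simplification: the Appendix technique fits a set of points with the largest N values, and that fitting step is not reproduced here. The function names and the example numbers are hypothetical.

```python
def spearman_brown(reliability, k):
    """Predicted reliability when the sample size is multiplied by k."""
    return k * reliability / (1 + (k - 1) * reliability)


def sample_size_for_reliability(reliability, n, target=0.5):
    """Sample size at which the predicted reliability reaches target,
    given a reliability estimate at sample size n.

    Obtained by solving the Spearman-Brown formula for k.
    """
    if not (0 < reliability < 1 and 0 < target < 1):
        raise ValueError("reliability and target must lie strictly between 0 and 1")
    k = target * (1 - reliability) / (reliability * (1 - target))
    return k * n


# Hypothetical example: a reliability of 0.25 at 400 batted balls
# extrapolates to 0.5 at three times that sample.
n_half = sample_size_for_reliability(0.25, 400)
check = spearman_brown(0.25, 3)
```

The extrapolation assumes that additional batted balls behave like the ones already observed, which is one reason extrapolated values deserve more caution than values measured directly.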

What is the Variance of True Talent in Quality of Contact?

α(N) can be used to calculate the spread of true talent in intrinsic quality of contact for batters and pitchers. We have known that batters have more control over contact, but the new measurement technologies allow us to assign precise values to the variances. The estimated standard deviations in true talent for the I statistic are 35 wOBA points over batters with at least 400 batted balls and 14 wOBA points over pitchers with at least 400 batted balls. Thus, batters have a standard deviation in true talent for controlling intrinsic quality of contact that is 2.5 times larger than for pitchers.

If you’ve made it this far, you’re probably fluent in wOBA. But if you’d like a translation, we can convert wOBA differences to runs by dividing by the wOBA scale factor and multiplying by the number of batted balls. The result is that one standard deviation in true talent for controlling intrinsic quality of contact equals 10.7 runs for batters and 4.3 runs for pitchers per 400 batted balls.
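The two conversions in this section can be written out as a short sketch. The first function follows from the definition of reliability as the ratio of true-talent variance to observed variance. The second applies the wOBA-to-runs conversion described above. The wOBA scale factor is not stated in the article, so the value used here is an assumption (a figure of about 1.3 is typical), and the function names are hypothetical.

```python
import math

WOBA_SCALE = 1.304  # ASSUMED scale factor; the article does not state the value used


def true_talent_sd(reliability, observed_sd):
    """Standard deviation of true talent implied by a reliability estimate.

    Reliability = var(true talent) / var(observed), so the true-talent
    standard deviation is sqrt(reliability) times the observed one.
    """
    return math.sqrt(reliability) * observed_sd


def woba_points_to_runs(woba_points, batted_balls, woba_scale=WOBA_SCALE):
    """Convert a wOBA difference in points (thousandths) to runs."""
    return (woba_points / 1000) / woba_scale * batted_balls


batter_runs = woba_points_to_runs(35, 400)
pitcher_runs = woba_points_to_runs(14, 400)
```

With the assumed scale factor, the 35-point and 14-point standard deviations come out to roughly 10.7 and 4.3 runs per 400 batted balls, in line with the figures quoted above.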

Forecasting

Let’s see how the new statistics can improve the accuracy of forecasts. A goal of forecasting is to use the measured value of a statistic for a player over a set of data to predict his performance for the statistic over unseen data. Suppose that we use N batted balls for each of P players to compute a split-half correlation for statistic S as described previously. For example, if we split the N=400 batted balls for the 92 batters in 2014 into the 200 even-numbered and 200 odd-numbered batted balls and compute the I statistic for each batter for each half we obtain the 92 (x_j,y_j) points

If these points lie exactly on a line, then we have a correlation coefficient of r=1 which would allow perfect prediction of a player’s y_j from his x_j. The points in our plot have a correlation coefficient of r = 0.630 which, as we’ve pointed out, is an approximation to the reliability R(200).

We can generate a prediction for y_j from x_j by using a linear regression model. The regression line shown above in red minimizes the sum of the squared differences e_j between the predicted y_j on the line for each x_j and the actual y_j. Therefore, we can define the best prediction for y_j given x_j as the height of the line when x = x_j.

The standard deviation of the differences e_j is a measure of the typical error in the prediction and is given by

σ_e = σ_y √(1 − r²)

where σ_y is the standard deviation of the y_j values and r is the correlation coefficient. This is equivalent to the well-known result that r² is the fraction of the total variance that is accounted for by the model.

The goal is small errors. We see that decreasing σ_y and increasing r both serve to decrease the error σ_e. As we might expect, σ_y is smaller for the I statistic than the O statistic for both batters and pitchers. We have also seen that α(N), which is the expected r over all split-half partitions of the data, is larger for I than for O once we have enough batted balls. Both of these factors contribute to smaller prediction errors for the I statistic. We show in the Appendix that σ_e is considerably smaller for I than for O as a function of N for both batters and pitchers. For the plot shown above, the standard deviation of the prediction errors is σ_e = 36 wOBA points.
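The relationship between the regression line, the correlation coefficient and the prediction error can be checked numerically. The sketch below fits a least-squares line to toy data, measures the spread of the residuals directly, and compares it with the standard identity that the residual standard deviation equals σ_y times the square root of (1 − r²). The data and function names are made up for illustration, and population (divide-by-n) standard deviations are assumed throughout.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residual_sd(xs, ys):
    """Standard deviation of the prediction errors e_j about the fitted line."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    # Least-squares residuals average to zero, so no mean is subtracted.
    return math.sqrt(sum(e * e for e in residuals) / len(residuals))


def error_sd_from_r(sigma_y, r):
    """Prediction error implied by sigma_y and the correlation r."""
    return sigma_y * math.sqrt(1 - r * r)


# Toy data with correlation 0.6 and a population SD of y equal to sqrt(1.25).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 1.0, 4.0, 3.0]
measured = residual_sd(xs, ys)
implied = error_sd_from_r(math.sqrt(1.25), 0.6)
```

The two numbers agree, which is the sense in which a smaller σ_y and a larger r each shrink the typical prediction error.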

If we repeat the process for the same 400 batted balls for the O statistic we obtain

These points are scattered farther from the regression line with a reduced correlation coefficient of r = 0.325 due largely to noise introduced by the dependence of O on batted ball outcomes. The predictions for y_j are significantly less accurate for this case with σ_e of 54 wOBA points.

Regression to the Mean

Regression to the mean is a standard computation used by forecasting systems. Under the often reasonable assumption that the means and variances of the x_j and y_j are the same, the equation for the regression line can be written

predicted y_j = α(N) x_j + (1 − α(N)) μ

where α(N) is Cronbach’s reliability estimate for a sample of size N and μ is the shared mean of the x_j and y_j. Thus, the predicted y_j is a weighted average of the observed x_j and the mean μ where the weighting depends on the reliability estimate α(N) for the statistic.

As the reliability α(N) increases, we place more faith in the observed x_j and regress less in the direction of the mean μ. When α(N) reaches 0.5, the observed x_j and the mean μ are weighted equally in the prediction. We estimated that for the I statistic this crossover point occurs at 107 batted balls for batters and at 838 batted balls for pitchers. For the O statistic, the crossover points occur later at 248 and 1,268 batted balls respectively.
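As a small illustration of this weighting, the sketch below regresses an observed value toward the mean for a given reliability. The function name and the sample numbers are hypothetical and are not taken from the 2014 data.

```python
def regress_to_mean(observed, mean, reliability):
    """Weighted average of an observed value and the population mean.

    reliability plays the role of alpha(N): 1 keeps the observed value
    as is, 0 falls back entirely on the mean.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return reliability * observed + (1.0 - reliability) * mean


# Hypothetical batter with an observed I of .450 against a mean of .370.
at_crossover = regress_to_mean(0.450, 0.370, 0.5)   # equal weights
more_data = regress_to_mean(0.450, 0.370, 0.8)      # trust the sample more
```

At a reliability of 0.5 the prediction lands exactly halfway between the observation and the mean, and it moves toward the observation as the reliability grows.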

Conclusion

We have analyzed the use of intrinsic batted ball statistics for forecasting. These statistics use HITf/x data to separate the value of a batted ball at contact from confounding factors such as the defense, weather and ballpark that can affect its observed outcome. These factors, in addition to contributing random variation to outcome-based statistics, can also contribute systematic bias, e.g. good fielders lowering a pitcher’s O statistic, which reduces the utility of outcome-based measures for player evaluation.

Our analysis considered the 92 batters and 112 pitchers with at least 400 batted balls that were tracked by HITf/x in 2014. We showed that the ability to control intrinsic quality of contact is a skill with a standard deviation of true talent of 35 wOBA points (or 10.7 runs per 400 batted balls) for batters and 14 wOBA points (or 4.3 runs per 400 batted balls) for pitchers. We estimated that the intrinsic batted ball statistic I reaches a reliability of 0.5 at 107 batted balls for batters and at 838 batted balls for pitchers. The corresponding values for the outcome-based O statistic are larger at 248 and 1,268. The study indicates that we can predict a player’s results for I with more accuracy than for O from a given sample of data.

The intrinsic batted ball statistics have the additional advantage of separating components of a player’s value that are intermingled using outcome-based batted ball statistics. Outcome-based batter descriptors such as the O measure, for example, are influenced by a player’s running speed in addition to his batting ability since faster runners are more likely to beat out infield hits or stretch singles into doubles. With the new approach, a model for a player’s offensive value can include a statistic that captures the intrinsic value of his batted balls and another statistic that captures his running speed. The generation of separate statistics to measure distinct skills benefits a forecasting system because these statistics may be regressed and projected individually using their distinct reliability values and aging curves.

While the methods developed in this work allow generation of context-invariant player models and projections, they also enable new ways to incorporate context into forecasts. Given the granularity of HITf/x and Statcast data and the capability of density estimation techniques, the methods could be adapted to design defensive metrics and ballpark models that are a function of the batted ball vector (s,v,h). This would allow a player’s collection of batted balls to be translated to a new environment. A forecasting system equipped with these models could accurately predict, for example, how a given batter might perform in a new ballpark or how a given pitcher might benefit from an improved infield defense.

Acknowledgment

I am grateful to Sportvision and MLB Advanced Media for providing the HITf/x data which made this work possible. I also thank Alan Nathan and Tom Tango for their help. I am happy to acknowledge the assistance of Qi Shi in the preparation of this document.

References & Resources

R. Arthur. (Apr. 13, 2016). The new science of hitting.
B. Baumer and A. Zimbalist. The Sabermetric Revolution: Assessing the growth of analytics in baseball. University of Pennsylvania Press, Philadelphia, 2014.
J.C. Bradbury. (May 24, 2005). Another look at DIPS.
J.C. Bradbury. Peak athletic performance and ageing: Evidence from baseball. Journal of Sports Sciences, 27(6):599–610, 2009.
W. Brown. Some experimental results in the correlation of mental abilities. British Journal of Psychology, 3(3):296–322, October 1910.
R. Carleton. (Apr. 20, 2011). 525,600 minutes: how do you measure a player in a year?
R. Carleton. (July 16, 2012). It’s a small sample size after all.
R. Carleton. (May 9, 2013). Should I worry about my favorite pitcher?
B. Cartwright. What ground balls can tell us about fly balls. The Hardball Times Baseball Annual, 2012, pages 249–254. ACTA Sports, Chicago, 2011.
B. Cartwright. Solving DIPS by deconstructing BABIP. SABR Analytics Conference Phoenix, AZ, March 2016.
L. Cronbach. Coefficient alpha and the internal structure of tests. Psychometrika, 16(3):297–334, 1951.
B. Efron and C. Morris. Stein’s paradox in statistics. Scientific American, 236(5):119–127, 1977.
M. Fast. (Nov. 16, 2011). Who controls how hard the ball is hit?
M. Fast. (Nov. 22, 2011). How does quality of contact relate to BABIP?
G. Healey. (Mar. 17, 2016). The intrinsic value of a batted ball.
G. Healey. Appendix to The Reliability of Intrinsic Batted Ball Statistics, 2016.
P. Jensen. (Jun. 30, 2009). Using HITf/x to measure skill.
M. Lichtman. (Feb. 29, 2004). DIPS revisited.
V. McCracken. (Jan. 23, 2001). Pitching and defense: How much control do hurlers have?
A. Nathan. (Apr. 6, 2016). Going deep on goin’ deep.
A. Nathan. (Dec. 24, 2015). Optimizing the swing, part deux: Paying homage to Teddy Ballgame.
N. Silver. Why was Kevin Maas a bust? Baseball Between the Numbers, pages 253–271. Basic Books, New York, 2006.
C. Spearman. Correlation calculated from faulty data. British Journal of Psychology, 3(3):271–295, October 1910.
M. Swartz. (Dec. 15, 2010). Ground-ballers: better than you think.
M. Swartz. (Mar. 17, 2010). Why SIERA doesn’t throw BABIP out with the bathwater.
T. Tango, M. Lichtman, and A. Dolphin. The Book: Playing the Percentages in Baseball. Potomac Books, Dulles, Va., 2007.
T. Tippett. (Jul. 21, 2003). Can pitchers prevent hits on balls in play?
wOBA and FIP constants.
R. Zeller and E. Carmines. Measurement in the Social Sciences: The Link Between Theory and Data. Cambridge University Press, 1980.

6 Comments

Liam

8 years ago

Love the simplicity of the probability distribution in the beginning with the horizontal launch angle as the variable component and the multiple result types being shown at once.

I would love to see a model that takes into account both vertical (y-axis) and horizontal (x-axis) launch angle keeping batted ball velocity constant and seeing how the probability of an out, single, double, etc. (z-axis) changes. Doing this for initial velocities of everything from, say 60 mph to 115 mph in increments of 5 mph would be very enlightening.

Then of course you could adjust for ball park to create an “ideal” launch angle and velocity for a given result. For example a scorcher of a line drive with Vo=115 mph with a low vertical launch angle and a near -45 degree horizontal launch angle will never leave the park at Fenway, but might result in a home run at other fields at a rate of maybe as high as 40%

Thanks for the data and great article,

LAF

Russell A Carleton

This is the most important (baseball-related) article that I have read all year.

Alan Nathan

Reply to Russell A Carleton

Hope you make it to Saberseminar. Glenn is on the speaking schedule.

not Bill Bean

For your odd/even charts, I’d expect a regression line with a slope closer to one. Also, what is the mean wOBA for your hitter and pitcher samples?

Janice

Interesting topics, thanks.

Lana

It is just a perfect topic for my sports essay. I was about to buy college research papers, but found your article, and now I have a muse t write it by myself. Thank you for that!
