Improving Projections with Exit Velocity
by William Sapolsky
May 2, 2016

When the topic is exit velocity, the conversation begins with Giancarlo Stanton. (via Arturo Pardavila III)

Batted-ball exit velocity is all the rage these days. With MLB’s rollout of Statcast last year, exit velocity and other batted-ball metrics have become part of the casual baseball fan’s vocabulary. Quite simply, it’s cool to be able to see how fast Giancarlo Stanton’s laser-shot home runs leave the bat. And to a sabermetrics geek, batted-ball data looks like the path forward in the everlasting quest for more accurate player analysis.

For the past seven months, Jared Cross and I have been working on a project to dig deeper into the Statcast data and put it to use. As the first fruits of our labor, we are releasing estimates of players’ average exit velocities for the 2012 through 2014 seasons. In addition, we are making a first attempt at adjusting players’ 2016 Steamer projections based on their 2015 average exit velocities. Not only are we excited by the potential uses of exit velocity information, but we believe its introduction marks the beginning of a new era in baseball projections.

Fundamentally, building an accurate projection system for hitters is about identifying skill and filtering out luck. In baseball, of course, there’s a lot of luck involved. We’ve all seen the scorching line drive that’s caught for an out, or the soft dribbler that rolls down the third base line for a double. The key is to focus on the process of hitting rather than the outcome, because the process is what the hitter can control. We care not about whether a player ends up with a hit, but about whether he is carrying out the process that is conducive to good hitting: hitting the ball hard and hitting it squarely. Batted-ball data lets us measure exactly that.
Batted-ball metrics are defense-independent, and they let us filter out the pesky batted-ball luck that we traditionally counteract by regressing hitters’ BABIP. We think we also can use them to better predict a hitter’s home run power. As you probably know, these kinds of luck work themselves out as the sample size grows. But home run rate doesn’t stabilize until about 170 plate appearances, and BABIP takes a whopping 820 balls in play. So traditional projection systems built on outcome stats need a large sample of data to be accurate. As a result, projections for hitters who have had only a few plate appearances, or whose skills have changed recently, are not going to be very well informed. The goal of incorporating batted-ball data is to reduce the minimum sample size needed to make an accurate estimate of a player’s ability.

While Statcast appears to be the future of batted-ball data and deserves credit for making our research possible, the system in its current form is not without its flaws. Probably the most publicized issue with Statcast has been gaps in the data: batted balls whose exit velocity, for whatever reason, is not recorded. These gaps are not random; some types of batted balls are more likely to be missed than others. Less well known, but perhaps more troubling, is the number of obviously bogus exit velocity readings scattered throughout the data. Take, for example, Noah Syndergaard’s first major league home run last year: Statcast recorded its exit velocity as 59 mph, which is obviously way off. Combine these bugs and holes with the fact that we have Statcast data for only a single full season, and it’s clear we needed a more robust data set. By the end of this season the story could be much different, as MLB Advanced Media is already improving the Statcast data. For now, we decided to get creative.
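Stabilization points like the 170 plate appearances and 820 balls in play above are typically used to regress a small-sample stat toward the league average. The following is a minimal sketch of that standard shrinkage. It assumes "stabilizes" means the split-half reliability reaches about 0.5 at that sample size. The function name and the .300 league-average BABIP are illustrative assumptions, not part of our method.

```python
def regress_to_mean(observed: float, n: float, league_mean: float, m: float) -> float:
    """Shrink an observed rate toward the league mean.

    m is the sample size at which the stat "stabilizes" (split-half
    reliability of about 0.5), so the league mean is weighted as if it
    were m extra observations.
    """
    if n < 0 or m <= 0:
        raise ValueError("n must be non-negative and m must be positive")
    weight = n / (n + m)  # share of the estimate that trusts the player's own sample
    return weight * observed + (1 - weight) * league_mean


# Hypothetical hitter: .360 BABIP on 410 balls in play (half of the 820-ball
# stabilization point), against an assumed league-average BABIP of .300.
estimate = regress_to_mean(0.360, 410, 0.300, 820)
print(f"{estimate:.3f}")  # 0.320
```

Because 410 balls in play is only a third of n + m, two-thirds of this hitter's estimate still comes from the league average. A metric that stabilizes in far fewer observations earns the same trust much sooner.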
Through FanGraphs, we were given access to two other useful sources of data: batted-ball distance from Baseball Info Solutions (BIS) and batted-ball hang time from Inside Edge. These data have very few missing points, and, luckily, they date all the way back to 2012. Their one disadvantage is that they exist only for line drives and fly balls, not for ground balls. The good news, though, is that we believe exit velocity on ground balls isn’t as valuable. In the 2015 Statcast data, players’ ground ball exit velocities fluctuated more than their line drive and fly ball exit velocities did. We ran an experiment in which we split the season into two essentially random halves (odd days and even days) and found that a player’s ground ball exit velocity was only half as predictive from half to half as his line drive and fly ball exit velocity. Because of this, we thought we could judge players almost as well using data for line drives and fly balls alone.

EVEN/ODD DAY EXIT VELOCITY CORRELATIONS
Group          E-GB Vel   O-GB Vel   E-notGB Vel
O-notGB Vel    0.26       0.18       0.68
E-notGB Vel    0.23       0.19
O-GB Vel       0.30

Notes:
1. E = even-day group, O = odd-day group; GB = ground balls, notGB = non-ground balls.
2. According to the 2015 Statcast data, ground balls are hit an average of six mph softer than non-ground balls, which means we overestimate players’ exit velocities by leaving out grounders. How much we overestimate depends on the individual player’s groundball rate, but for a league-average player, we think the overestimate is about two mph.

Here is a graph of hang time, distance, and exit velocity on 2015 line drives and fly balls for which we have Statcast readings. You can see that if you know a batted ball’s distance and hang time, you can estimate its velocity pretty easily. So we built a model that produces an estimate of a batted ball’s velocity from its hang time and distance.
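Our actual model is in the R code listed under References & Resources. The sketch below is not that model. It is a generic stand-in that assumes a least-squares fit of exit velocity on quadratic terms of distance and hang time. To mimic checking estimates against known Statcast readings, it scores the fit on a held-out half of the data. The feature set, function names, and synthetic data are all hypothetical.

```python
import numpy as np


def design_matrix(distance_ft, hang_time_s):
    """Quadratic features in distance and hang time (hypothetical feature set)."""
    d = np.asarray(distance_ft, dtype=float)
    t = np.asarray(hang_time_s, dtype=float)
    return np.column_stack([np.ones_like(d), d, t, d * t, d ** 2, t ** 2])


def fit_velocity_model(distance_ft, hang_time_s, exit_velo_mph):
    """Ordinary least-squares coefficients for exit velocity on the features."""
    X = design_matrix(distance_ft, hang_time_s)
    y = np.asarray(exit_velo_mph, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict_velocity(coef, distance_ft, hang_time_s):
    """Estimated exit velocity (mph) for each batted ball."""
    return design_matrix(distance_ft, hang_time_s) @ coef


# Synthetic stand-in for Statcast-tagged line drives and fly balls:
# a made-up relationship plus 2 mph of measurement noise.
rng = np.random.default_rng(0)
n = 2000
dist = rng.uniform(150, 420, n)
hang = rng.uniform(2.0, 6.0, n)
velo = 40 + 0.12 * dist + 2.5 * hang + rng.normal(0, 2.0, n)

# Fit on one half, then score on balls the model has not seen.
train, test = slice(0, n // 2), slice(n // 2, n)
coef = fit_velocity_model(dist[train], hang[train], velo[train])
mae = np.mean(np.abs(predict_velocity(coef, dist[test], hang[test]) - velo[test]))
print(f"held-out mean absolute error: {mae:.2f} mph")
```

With real data, the held-out error would also reflect park effects and bogus Statcast readings. That is why filtering suspect readings and adjusting for parks matter before fitting.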
Now we could plug in any batted ball dating back to 2012 for which hang time and distance were recorded and estimate its velocity. If you’re interested, you can see the R code we used to create the model here. Note: When creating our model, we removed batted balls whose Statcast readings we suspected were bogus, as identified by contradictions between the ball’s Statcast exit velocity and its BIS “hard”/“medium”/“soft” classification.

We were actually pretty impressed by the model’s accuracy. When we plugged in the hang times and distances of 2015 batted balls whose actual velocity we knew from Statcast, our model’s estimates were off by only about two mph on average. And remember how long traditional outcome stats took to stabilize? Using this newly derived exit velocity data, we found that a player’s average exit velocity on line drives and fly balls stabilized at roughly 20 batted balls, which a hitter would reach in about 50 plate appearances. Most importantly, by estimating exit velocity this way, we were able both to avoid the bugs associated with Statcast and to derive exit velocity data for seasons before Statcast existed.

We’re excited to share a table of each player’s average exit velocity for the 2012 through 2015 seasons, adjusted for park effects. (It’s important to correct for park effects when estimating velocity this way: a ball hit at a given velocity at Coors Field will travel farther and hang longer than a ball hit at the same velocity at, say, Minute Maid Park.)

You can also view the whole sheet here.

Having reliable exit velocity data going back to 2012 is pretty awesome in itself, but let’s be clear: the big question is whether it can help us evaluate players better. We had a feeling it might be useful, but we wanted to know for sure, so we compared players’ average velocities to both their actual stat lines and their Steamer projections.
We wanted to see not just whether players with higher velocities hit better in general, but whether they outperformed their projections. To start, we ran a regression of players’ Weighted On-Base Average (wOBA) on their average exit velocities from the previous season. (We matched wOBA from 2013, 2014, and 2015 with exit velocity from 2012, 2013, and 2014, respectively. There were 1,028 players in our sample, each with at least 50 fly balls or line drives in the prior year and at least 50 plate appearances in the projected year.)

PREDICTED WOBA FROM PRIOR YEAR EXIT VELOCITY
Term                        Coefficient (standard error)   p-value
Prior year exit velocity    0.0083 (0.0005)                3 × 10^-53

We found that for each mph a hitter’s exit velocity is above league average, we can expect him to put up about eight additional points of wOBA the next season, which is a big effect. This tells us that players with higher exit velocities are, in fact, hitting better overall. But we also wanted to know whether they were outperforming their projections, so we ran a similar regression with Steamer projections included as a variable.

PREDICTING WOBA USING STEAMER AND PREVIOUS SEASON EXIT VELOCITY
Term                        Coefficient (standard error)   p-value
Steamer                     0.754 (0.044)                  < 2 × 10^-16
Prior year exit velocity    0.0028 (0.0006)                3.6 × 10^-7
Intercept                   -0.176

When used in conjunction with Steamer, the effect of exit velocity was still significant, both statistically and practically. We can expect a hitter to outperform his Steamer projected wOBA by roughly three points for each additional mph of previous-season exit velocity. (We found a similar effect using either ZiPS projections or an average of ZiPS and Steamer.) Seeing the predictive value of previous-season velocity, we decided to make a table of exit velocity-adjusted Steamer wOBA projections for the 2016 season.
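The following sketch evaluates the fitted equation from the second regression to make it concrete. It assumes exit velocity enters in raw mph, since the -0.176 intercept only produces realistic wOBAs that way. It is not necessarily how the adjusted table was built. The function and constant names are hypothetical.

```python
# Fitted coefficients from the "Predicting wOBA Using Steamer and Previous
# Season Exit Velocity" regression. Assumption: velocity is in raw mph.
INTERCEPT = -0.176
STEAMER_COEF = 0.754
VELO_COEF = 0.0028  # wOBA per mph of prior-season exit velocity


def predicted_woba(steamer_woba: float, prior_exit_velo_mph: float) -> float:
    """Expected next-season wOBA from a Steamer projection and prior exit velocity."""
    return INTERCEPT + STEAMER_COEF * steamer_woba + VELO_COEF * prior_exit_velo_mph


# Two hypothetical hitters with the same .330 Steamer projection whose
# prior-season line drive/fly ball exit velocities differ by 3 mph.
gap = predicted_woba(0.330, 95.0) - predicted_woba(0.330, 92.0)
print(f"{gap:.4f}")  # 0.0084, roughly eight points of wOBA
```

The Steamer coefficient is 0.754 rather than 1. As a result, the full fitted equation is not the same as adding a flat velocity bump to an unchanged Steamer projection.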
This could be seen as a sort of “first taste” of what a batted-ball-based projection system for hitters could look like:

You can also view the whole sheet here.

While this rough adjustment should be an improvement over using Steamer alone, we want to caution that it’s probably not the best way to use exit velocity to adjust projections. Look, for example, at two players with similarly high 2015 exit velocities, David Ortiz and Miguel Sano. Ortiz has been an elite hitter for several seasons, and his exit velocity has been high throughout. So Steamer, using his stats from those seasons, gives him a projection that already reflects a hitter with high exit velocity. Sano, on the other hand, has played only part of a season in the major leagues, so his high exit velocity isn’t fully baked into his Steamer projection. Our adjustment gives Sano and Ortiz about the same increase in wOBA, but in reality, Sano probably deserves a bigger bump than Ortiz.

We also want to caution that a hitter may have an outstanding exit velocity and still be far from a perfect hitter. A good example is our 2015 velocity leader, Joey Gallo. We certainly would expect Gallo to do better than a similar player with a less impressive average exit velocity. But what matters is not just how hard you hit the ball, but how often you hit it. Gallo had a crazy 46 percent strikeout rate last season, so while we know he can hit the ball hard when he puts it in play, he won’t have many opportunities to do so unless he cuts down on the strikeouts.

Finally, it’s important to pay attention to batted-ball metrics besides exit velocity. Average vertical launch angle, whether measured precisely in degrees or coarsely in terms of fly ball, line drive, pop-up, and groundball rates, also plays a role.
Regardless of their velocity, balls hit straight up into the air or straight down into the ground usually result from poor contact and are unlikely to produce hits. In fact, ground balls as a category have been found to be the least valuable type of batted ball for run production. This works against a hitter like Pedro Alvarez, who, despite an impressive 2015 average exit velocity, had a high 52 percent groundball rate last season.

All in all, exit velocity is only one piece of the puzzle in evaluating hitters. It’s a big step forward, though, and more advances are on the way. We expect both the reliability and the scope of Statcast to improve from 2016 on, including the addition of vertical launch angle data on all batted balls. And we hope in the near future to release a more comprehensive batted-ball-based projection system, one that takes into account a range of factors such as launch angle, handedness, defensive shifts, and running speed. So stay tuned: there’s more batted-ball fun right around the corner, and we think the best is yet to come.

References & Resources
“Estimating Exit Velocity from Hang Time and Distance” R code
2012-2015 Avg. Velocity Data Google Sheet
Velocity-Adjusted Steamer wOBA Google Sheet
Paul Casella, MLB.com, “Statcast primer: Baseball will never be the same”
Tony Blengino, FanGraphs, “The Limitations Of The 2015 StatCast Data”
Neil Weinberg, FanGraphs, “Why We Care About BABIP”
FanGraphs Library, “Quality of Contact Stats”
Max Weinstein, The Hardball Times, “Exploring Batted Ball Run Values and Spray”