Using HITf/x to measure skill

by Peter Jensen
June 30, 2009

Ever watch a ballgame and see three fielders converge on a pop fly before it ends up dropping for a base hit? Did you think that batter didn’t deserve a hit? Or perhaps the second baseman dove to the shortstop side of second base to catch a screaming line drive and your first thought was “that hitter was robbed.” Well, HITf/x was designed for you, because we now can have measures of a hitter’s or a pitcher’s ability based not on the vagaries of the plays that the fielders did or did not make, but on the quality of the batter’s hit ball.

What is HITf/x and how can it help?

HITf/x uses the same camera-based technology and video footage that Sportvision uses with PITCHf/x to give accurate pitch speed and flight path information for MLB’s Gameday. As a result, the system is able to provide the same accuracy for the hit ball speed and the initial parameters of the ball’s flight path (the vertical and horizontal angles of the ball as it leaves the bat) as PITCHf/x provides for a pitched ball. I know that some of you are disappointed that HITf/x cannot also tell us accurately where the ball eventually lands and how long it takes to get there, but that would have required additional cameras to cover the entire field. Although coverage like that eventually will happen, the information that HITf/x can give us without additional cameras is very useful. Another benefit is that HITf/x can be calculated from the existing video already captured for PITCHf/x analysis over the last two and a half years, so we will have a usable database of information much more quickly. For now, we have HITf/x data only for most of the games in April of 2009, but that is enough to demonstrate its power.

The act of batting is a contest between the batter and the pitcher. The batter wants the outcome of the plate appearance to improve his team’s chances of winning and the pitcher wants the opposite.
In most cases, producing the most wins for the batter’s team means maximizing the number of runs his team will score during the inning. This is what Runs Created, Linear Weights and the other advanced batting metrics estimate. Until now, almost all of these metrics have used the run value of the event outcome (out, single, double, triple, home run, walk, etc.) to determine a hitter’s offensive contribution. But, as we have long known and as I demonstrated above, the event outcome is not always the best measure of a player’s skill. The data from HITf/x, including speed off the bat (SOB), vertical angle (VA), and horizontal angle (HA), describe the skill component of a batter’s hit ball outcomes better than event outcomes do. Harry Pavlidis recently took an early look at some of these parameters.

The formula

The methodology for a skill-based batting metric is relatively simple: use the usual linear weight values for the non-hit ball events (strikeouts, non-intentional walks, intentional walks and hit-by-pitches), but substitute the average outcome of a hit ball described by its SOB, VA and HA for all hit ball events. I call this metric SDBR, Skill Dependent Batting Runs. The formula is:

SDBR = K_LW + NIBB_LW + IBB_LW + HBP_LW + HIT_BALL_FX_LW

For the period 2005-2008, K_LW = -.29, NIBB_LW = .32 and HBP_LW = .34. The value usually given for an intentional walk has been .17 runs. Here I have calculated the IBB value by a different method (see Valuing the Intentional Walk) to give a value of .09 runs, which more accurately reflects the average number of runs that will score after an intentional walk.

The HIT_BALL_FX_LW was calculated by dividing the 14,625 non-bunt hit balls in the HITf/x database into 198 different bins based on each hit ball’s SOB, VA, and HA.
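As a rough illustration of the SDBR formula above, here is a minimal Python sketch. It assumes that each term in the formula stands for a player's event count multiplied by the quoted linear weight, and that HIT_BALL_FX_LW is the sum of the bin values of his hit balls; that reading, the function name `sdbr`, and the sample counts are assumptions made for illustration, not something taken from the spreadsheet.

```python
# Linear weights quoted in the article (run value per event, 2005-2008).
K_LW = -0.29     # strikeout
NIBB_LW = 0.32   # non-intentional walk
IBB_LW = 0.09    # intentional walk (the article's recalculated value)
HBP_LW = 0.34    # hit by pitch


def sdbr(strikeouts, nibb, ibb, hbp, hit_ball_values):
    """Skill Dependent Batting Runs for one player (illustrative reading).

    hit_ball_values holds one number per non-bunt hit ball: the average
    run value of the SOB/VA/HA bin that the ball falls in.
    """
    non_hit_ball_runs = (
        strikeouts * K_LW + nibb * NIBB_LW + ibb * IBB_LW + hbp * HBP_LW
    )
    return non_hit_ball_runs + sum(hit_ball_values)


# Hypothetical player: 20 K, 10 NIBB, 1 IBB, 1 HBP and five hit balls
# whose bin values are made up for the example.
example = sdbr(20, 10, 1, 1, [0.53, -0.14, 0.27, -0.03, 0.14])
print(round(example, 2))
```

Because the pitcher's side of the contest uses the same events, the same function could be applied to the plate appearances a pitcher faced.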

For speed off the bat, I defined the bins as 5 mph increments from 80 mph to 100 mph, plus a bin for all balls hit less than 80 mph and another for all balls hit above 100 mph.
For vertical angles, I used 5 degree increments from -5 degrees to 40 degrees, plus a bin for less than -5 and another for more than 40.
For horizontal angles, I used three bins: Pulled, Center and Opposite.

These are obviously arbitrary decisions. There will always be a tradeoff between having too many bins that may be measuring random variations rather than real data differences, and too few bins that miss statistically meaningful differences. Probably the most controversial decision I made is using only three bins for the horizontal angle. This may underestimate the ability of some batters to control the horizontal angle of their hit balls on the pulled side, because certain batters may be able to direct their hit balls into gaps. When we have more HITf/x data, and if further research proves this to be true, it may be necessary to incorporate more horizontal angle bins. We also may decide eventually to create separate bins for left-handed and right-handed batters. I opted for a more conservative approach at this time.

More HITf/x data also will stabilize the run values for each bin, which are calculated by averaging the linear weight run value of outs, double plays, reached-on-errors, fielder’s choices, infield singles, outfield singles, doubles, triples and home runs that occurred in each bin. The complete HIT_BALL_FX_LW table is included in a spreadsheet you can download at the end of the article. There are only 197 bins because one bin had no hit balls.

The results

Here’s an example of what you can see in the data. This is the average Linear Weight Value of all hit balls based on the speed of the ball and its horizontal angle off the bat.

Speed (mph)    Pull   Center   Opposite     TOT
<80           -0.14    -0.14      -0.10   -0.13
80 - 85       -0.05    -0.12      -0.09   -0.09
85 - 90       -0.03    -0.11      -0.07   -0.07
90 - 95        0.14    -0.03       0.00    0.04
95 - 100       0.27     0.11       0.18    0.18
>100           0.53     0.44       0.36    0.47
TOT            0.08     0.02      -0.02    0.03

As you can see (look in the total column and row), the value of a hit ball goes up as the speed off the bat increases. Also, pulled hits have more value than hits to the opposite field for all but the most slowly hit balls.

Over the course of two or three years, a batter’s SDBR will be very close to his traditional linear weight runs. The reason is that all those “robbed” base hits and the “gimme” base hits will cancel each other out in the larger sample size. The advantage of SDBR is that it should stabilize over a much smaller sample size than LW-based runs, possibly in as few as 200 to 300 plate appearances. We won’t know for sure until we have longer runs of HITf/x data, but if SDBR does stabilize more quickly, then it will provide a much more accurate basis for aging studies and player projections, and it will identify actual changes in a player’s skill level more quickly.

Another advantage is that the formula for calculating Skill-Dependent Pitching Runs (SDPR) is exactly the same as for SDBR, at least for starting pitchers. The reason is given in the first sentence of the third paragraph: “The act of batting is a contest between the batter and the pitcher.” When we define the result of that contest in a way that excludes the fielders’ contributions, as SDBR and SDPR do, then the runs that the batter receives when he wins the “contest” are exactly the same as the runs the pitcher loses, and vice versa. You probably recognize the similarity of the SDPR/SDBR formula to the more advanced formulas for pitching value that have been derived from Voros McCracken’s DIPS theory.
SDPR values a pitcher’s actual strikeouts, non-intentional walks, intentional walks, and HBPs just like DIPS, FIP, tRA, xFIP, and LIPS do. The difference between SDPR and those formulas is in how SDPR values hit balls in play and HRs. SDPR values both by linear weights determined from their initial HITf/x parameters. The other formulas use various methods based on event outcomes. The question of whether to give any predictive value to a pitcher’s balls in play has been controversial since Voros introduced the DIPS concept. When more data become available through HITf/x, SDPR should be able to provide a definitive answer to the controversy.

For relief pitchers, SDPR provides a good basis for projected value because it accurately defines a pitcher’s skill in runs. However, to project his overall future value to his team, it is necessary to adjust his SDPR to account for the leverage of the situations in which he will be used. This can be done by multiplying his SDPR by the average leverage of the role in which he will be used, as measured by Tom Tango’s Leverage Index.

In closing, here are a few lists made possible by the HITf/x data. These were April’s “luckiest” batters according to the HITf/x data. Note that the list includes some of the best groundball hitters in the majors.

First     Last       Diff
Carl      Crawford    5.9
Akinori   Iwamura     5.8
Kevin     Youkilis    5.8
Adam      Jones       5.7
Denard    Span        5.7
Nyjer     Morgan      5.6
Chris     Getz        5.1
Jason     Kubel       4.8
Chase     Utley       4.6
Brad      Hawpe       4.5

And here are the “unluckiest” major league batters.

First     Last       Diff
Brian     Giles      -9.7
J.J.      Hardy      -6.9
Carlos    Guillen    -6.4
Nelson    Cruz       -5.8
Grady     Sizemore   -5.6
Brandon   Phillips   -5.4
Adrian    Beltre     -4.7
Yunel     Escobar    -4.7
Randy     Winn       -4.7
Russell   Martin     -4.2

You knew Brian Giles wasn’t that bad, right? Now, onto the pitchers.
First up, the “luckiest” hurlers:

First     Last          Diff
Tim       Wakefield     -8.9
Glen      Perkins       -7.6
Ross      Ohlendorf     -7.1
John      Maine         -6.0
Jair      Jurrjens      -5.9
Joba      Chamberlain   -5.6
Kevin     Millwood      -5.6
James     Shields       -5.6
Koji      Uehara        -5.6
Zack      Greinke       -5.3

You can see DIPS theory at work here, as successful major league knuckleballers tend to have more favorable outcomes on batted balls. Finally, the “unlucky” ones.

First     Last          Diff
Vicente   Padilla        5.4
Brett     Myers          4.8
Jon       Lester         4.6
Mark      Hendrickson    4.3
Tim       Lincecum       4.0
Scott     Olsen          4.0
Aaron     Cook           3.9
Andy      Sonnanstine    3.9
Kevin     Slowey         3.8
Ian       Snell          3.2

I wouldn’t feel right if Ian Snell weren’t on this list.

References & Resources

SDPR and SDBR are included for players in this spreadsheet. The sum of a player’s K, NIBB, IBB and HBP linear weights is shown as his FIP_LW for both pitchers and batters. Also shown is a player’s EVENT_LW minus his HIT_BALL_FX_LW, which gives an indication of whether he was lucky or unlucky in April (positive numbers being lucky for batters and unlucky for pitchers). This is an extremely small sample and has not much more meaning than any other one-month sample. It is shown to illustrate the methodology only.

The information here was provided by Sportvision and MLB.com’s Gameday for research purposes only. Sportvision and MLB.com retain all copyright rights.
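
For readers who want to experiment with the methodology, the binning scheme and the Diff column described above might be sketched in Python as follows. This is only a sketch under stated assumptions: the function names, the handling of values that fall exactly on a bin edge, and the sample records are hypothetical, and the horizontal bin is taken as an already assigned label because the article does not give the angle cutoffs for Pulled, Center and Opposite.

```python
from collections import defaultdict

HA_BINS = ("Pulled", "Center", "Opposite")


def speed_bin(sob_mph):
    """Index 0..5 for <80, 80-85, 85-90, 90-95, 95-100, >100 mph.
    Assumption: a value exactly on an edge goes to the higher bin."""
    if sob_mph < 80:
        return 0
    if sob_mph >= 100:
        return 5
    return 1 + int((sob_mph - 80) // 5)


def vertical_bin(va_deg):
    """Index 0..10 for < -5, nine 5-degree steps from -5 to 40, and > 40."""
    if va_deg < -5:
        return 0
    if va_deg >= 40:
        return 10
    return 1 + int((va_deg + 5) // 5)


def bin_key(sob_mph, va_deg, ha_label):
    """Combine the three bin indices into one (speed, vertical, horizontal) key."""
    return (speed_bin(sob_mph), vertical_bin(va_deg), HA_BINS.index(ha_label))


def bin_run_values(hit_balls):
    """Average event linear weight per bin, i.e. a HIT_BALL_FX_LW table.
    hit_balls is an iterable of (sob, va, ha_label, event_lw) records."""
    totals = defaultdict(lambda: [0.0, 0])
    for sob, va, ha, event_lw in hit_balls:
        entry = totals[bin_key(sob, va, ha)]
        entry[0] += event_lw
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def luck_diff(player_balls, table):
    """EVENT_LW minus HIT_BALL_FX_LW, summed over a player's hit balls."""
    return sum(lw - table[bin_key(sob, va, ha)] for sob, va, ha, lw in player_balls)


# The scheme yields 6 speed x 11 vertical x 3 horizontal combinations.
all_keys = {
    bin_key(s, v, h)
    for s in range(60, 121)
    for v in range(-30, 61)
    for h in HA_BINS
}
print(len(all_keys))  # 198

# Made-up records; the event run values here are placeholders only.
league = [
    (102, 12, "Pulled", 0.47),
    (103, 13, "Pulled", -0.27),
    (75, -10, "Center", -0.27),
]
table = bin_run_values(league)
print(luck_diff([(102, 12, "Pulled", 0.47)], table))
```

A positive result means the player's actual outcomes beat the average for the balls he hit, which the article reads as lucky for a batter and unlucky for a pitcher.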