Using HITf/x to measure skill
Ever watch a ballgame and see three fielders converge on a pop fly, only to have it drop for a base hit? Did you think that batter didn’t deserve a hit? Or perhaps the second baseman dove to the shortstop side of second base to catch a screaming line drive and your first thought was “that hitter was robbed.” Well, HITf/x was designed for you, because we can now measure a hitter’s or a pitcher’s ability based not on the vagaries of the plays that the fielders did or did not make, but on the quality of the batter’s hit ball.
What is HITf/x and how can it help?
HITf/x uses the same camera-based technology and video footage that Sportvision uses with PITCHf/x to give accurate pitch speed and flight path information for MLB’s Gameday. As a result, the system is able to provide the same accuracy for the hit ball speed and the initial parameters of the ball’s flight path (the vertical and horizontal angles of the ball as it leaves the bat), as PITCHf/x provides for a pitched ball.
I know that some of you are disappointed that HITf/x cannot also tell us accurately where the ball eventually lands and how long it takes to get there, but that would have required additional cameras to cover the entire field. Although coverage like that eventually will happen, the additional information that HITf/x can give us without additional cameras is very useful.
Another benefit is that HITf/x can be calculated from the existing video already captured for PITCHf/x analysis over the last two and a half years, so we will have a usable database of information much more quickly. Right now, we have HITf/x data only for most of the games in April of 2009, but that is enough to demonstrate its power.
The act of batting is a contest between the batter and the pitcher. The batter wants the outcome of the plate appearance to improve his team’s chances of winning and the pitcher wants the opposite. In most cases, producing the most wins for the batter’s team means maximizing the number of runs his team will score during the inning. This is what Runs Created, Linear Weights and the other advanced batting metrics estimate.
Until now, almost all of these metrics have used the run value of the event outcome (out, single, double, triple, home run, walk, etc.) to determine a hitter’s offensive contribution. But, as we have long known and as I demonstrated above, the event outcome is not always the best measure of a player’s skill. The data from HITf/x, including speed off the bat (SOB), vertical angle (VA), and horizontal angle (HA), give us a better way than event outcomes to describe the skill component of the hit ball results of a batter’s plate appearances. Harry Pavlidis recently took an early look at some of these parameters.
The formula
The methodology for a skill-based batting metric is relatively simple: Use the usual linear weight values for the non-hit ball events (strikeouts, non-intentional walks, intentional walks and hit-by-pitches) but substitute the average outcome of a hit ball described by its SOB, VA and HA for all hit ball events. I call this metric SDBR, Skill-Dependent Batting Runs. The formula is:
SDBR = K_LW + NIBB_LW + IBB_LW + HBP_LW + HIT_BALL_FX_LW
For the period 2005-2008, K_LW = -.29, NIBB_LW = .32 and HBP_LW = .34. The value usually given for an intentional walk is .17 runs. Here I have calculated the IBB value by a different method (see Valuing the Intentional Walk), which gives a value of .09 runs that more accurately reflects the average number of runs that will score after an intentional walk.
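As a minimal sketch, the formula can be read as each event count multiplied by its per-event weight, plus the hit ball term. That reading, the function name `sdbr` and the sample counts below are assumptions for illustration; only the four weights come from the text above.

```python
# Per-event linear weights quoted above (2005-2008 values).
K_LW = -0.29     # strikeout
NIBB_LW = 0.32   # non-intentional walk
IBB_LW = 0.09    # intentional walk (the article's recalculated value)
HBP_LW = 0.34    # hit by pitch


def sdbr(strikeouts, nibb, ibb, hbp, hit_ball_fx_lw):
    """Return Skill-Dependent Batting Runs for one player.

    hit_ball_fx_lw is assumed to be the sum of the bin-average run values
    of the player's hit balls (the HIT_BALL_FX_LW term).
    """
    non_hit_ball = (
        strikeouts * K_LW + nibb * NIBB_LW + ibb * IBB_LW + hbp * HBP_LW
    )
    return non_hit_ball + hit_ball_fx_lw


# Hypothetical one-month line, not taken from the HITf/x data.
example = sdbr(strikeouts=20, nibb=10, ibb=1, hbp=1, hit_ball_fx_lw=2.5)
print(round(example, 2))
```

Because the same events are charged to the pitcher, the same function would serve for SDPR under this reading.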
The HIT_BALL_FX_LW was calculated by dividing the 14,625 non-bunt hit balls in the HITf/x Database into 198 different bins based on each hit ball’s SOB, VA, and HA.
For speed off the bat, I defined the bins as 5 mph increments from 80 mph to 100 mph plus a bin for all balls hit less than 80 mph and another for all balls hit above 100 mph.
For vertical angles I used 5 degree increments from -5 degrees to 40 degrees plus a bin for less than -5 and another for more than 40.
For horizontal angles I used three bins: Pulled, Center and Opposite.
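The binning scheme just described can be sketched in a few lines. The edge lists follow the text, but two things are assumptions: how a ball exactly on a boundary is assigned (here it goes to the higher bin), and the horizontal angle cutoffs, which are not given, so the sketch takes the Pulled/Center/Opposite label as an input rather than computing it. The name `bin_key` is hypothetical.

```python
import bisect

# Bin edges taken from the description above.
SOB_EDGES = [80, 85, 90, 95, 100]                   # mph, 6 speed bins
VA_EDGES = [-5, 0, 5, 10, 15, 20, 25, 30, 35, 40]   # degrees, 11 angle bins
HA_LABELS = ("Pulled", "Center", "Opposite")        # 3 direction bins


def bin_key(sob, va, ha_label):
    """Map one hit ball to a (speed bin, vertical bin, direction) key.

    Assumption: a value exactly on an edge falls into the higher bin.
    """
    if ha_label not in HA_LABELS:
        raise ValueError(f"unknown horizontal bin: {ha_label!r}")
    sob_bin = bisect.bisect_right(SOB_EDGES, sob)
    va_bin = bisect.bisect_right(VA_EDGES, va)
    return (sob_bin, va_bin, ha_label)


# Total number of possible bins: 6 * 11 * 3.
N_BINS = (len(SOB_EDGES) + 1) * (len(VA_EDGES) + 1) * len(HA_LABELS)

print(N_BINS)                       # 198
print(bin_key(97, 12, "Pulled"))    # (4, 4, 'Pulled')
```

The product of the three bin counts reproduces the 198 bins mentioned above, which is a useful check that the edges have been read correctly.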
These are obviously arbitrary decisions. There will always be a tradeoff between having too many bins that may be measuring random variations rather than real data differences, and too few bins that miss statistically meaningful differences. Probably the most controversial decision I made is using only three bins for the horizontal angle. This may underestimate the ability of some batters to control the horizontal angle of their hit balls on the pulled side because certain batters may be able to direct their hit balls into gaps. When we have more HITf/x data and if further research proves this to be true, then it may be necessary to incorporate more horizontal angle bins.
We also may decide eventually to create separate bins for left-handed and right-handed batters. I opted for a more conservative approach at this time. More HITf/x data also will stabilize the run values for each bin, which are calculated by averaging the linear weight run value of outs, double plays, reached-on-errors, fielder’s choices, infield singles, outfield singles, doubles, triples and home runs that occurred in each bin. The complete HIT_BALL_FX_LW table is included in a spreadsheet you can download at the end of the article. There are only 197 bins because one bin had no hit balls.
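The per-bin averaging described above might look like the following sketch. The event run values in `EVENT_LW` are placeholders chosen only so the example runs; they are not the linear weights used in the spreadsheet, and the bin keys "A" and "B" and the function names are hypothetical.

```python
from collections import defaultdict

# PLACEHOLDER run values for illustration only; not from the article.
EVENT_LW = {"out": -0.28, "single": 0.47, "double": 0.77, "home_run": 1.40}


def bin_run_values(hit_balls):
    """Average the event run values of all hit balls that share a bin.

    hit_balls is an iterable of (bin_key, event_name) pairs.
    """
    totals = defaultdict(lambda: [0.0, 0])  # bin -> [sum of LW, count]
    for key, event in hit_balls:
        totals[key][0] += EVENT_LW[event]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def hit_ball_fx_lw(player_bins, table):
    """Sum the bin-average run value over one player's hit balls."""
    return sum(table[key] for key in player_bins)


# Three league-wide hit balls spread over two hypothetical bins.
sample = [("A", "out"), ("A", "single"), ("B", "home_run")]
table = bin_run_values(sample)
player_total = hit_ball_fx_lw(["A", "A", "B"], table)
```

A bin with no hit balls never appears in the resulting table, which matches the note that one of the 198 bins is empty.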
The results
Here’s an example of what you can see in the data. This is the average Linear Weight Value of all hit balls based on the speed of the ball and its horizontal angle off the bat.
Speed (mph)   Pull    Center   Opposite   TOT
<80           -0.14   -0.14    -0.10      -0.13
80 - 85       -0.05   -0.12    -0.09      -0.09
85 - 90       -0.03   -0.11    -0.07      -0.07
90 - 95        0.14   -0.03     0.00       0.04
95 - 100       0.27    0.11     0.18       0.18
>100           0.53    0.44     0.36       0.47
TOT            0.08    0.02    -0.02       0.03
As you can see (look in the total column and row), the value of a hit ball goes up as the speed off the bat increases. Also, pulled hits have more value than hits to the opposite field for all but the most slowly hit balls.
Over the course of two or three years, a batter’s SDBR will be very close to his traditional linear weight runs. The reason is that all those “robbed” base hits and the “gimme” base hits will cancel each other out in the larger sample size. The advantage of SDBR is that it should stabilize over a much smaller sample size than LW-based runs—possibly in as few as 200 to 300 plate appearances. We won’t know for sure until we have longer runs of HITf/x data, but if SDBR does stabilize more quickly, then it will provide a much more accurate basis for aging studies and player projections, and it will identify actual changes in a player’s skill level more quickly.
Another advantage is that the formula for calculating Skill-Dependent Pitching Runs (SDPR) is exactly the same as for SDBR, at least for starting pitchers. The reason is given in the first sentence of the third paragraph: “The act of batting is a contest between the batter and the pitcher.” When we define the result of that contest in a way that excludes the fielders’ contributions, as SDBR and SDPR do, then the runs that the batter receives when he wins the “contest” are exactly the same as the runs the pitcher loses, and vice versa.
You probably recognize the similarity of the SDPR/SDBR formula to the more advanced formulas for pitching value that have been derived from Voros McCracken’s DIPS theory. SDPR values a pitcher’s actual strikeouts, non-intentional walks, intentional walks, and HBPs just like DIPS, FIP, tRA, xFIP, and LIPS do.
The difference between SDPR and those formulas is in how SDPR values hit balls in play and HRs. SDPR values both by their linear weights as determined by their initial HITf/x parameters. The other formulas use various methods based on event outcomes. The question of whether to give any predictive value to a pitcher’s balls in play has been controversial since Voros introduced the DIPS concept. When more data become available through HITf/x, SDPR should be able to provide a definitive answer to the controversy.
For relief pitchers, SDPR provides a good basis for projected value because it accurately defines a pitcher’s skill in runs. However, to project his overall future value to his team, it is necessary to adjust his SDPR to account for the leverage of the situations in which he will be used. This can be done by multiplying his SDPR by the average leverage value of the role in which he will be used using Tom Tango’s Leverage Index.
In closing, here are a few lists made possible by the HITf/x data. These were April’s “luckiest” batters according to the HITf/x data. Note that the list includes some of the best groundball hitters in the majors.
First     Last        Diff
Carl      Crawford    5.9
Akinori   Iwamura     5.8
Kevin     Youkilis    5.8
Adam      Jones       5.7
Denard    Span        5.7
Nyjer     Morgan      5.6
Chris     Getz        5.1
Jason     Kubel       4.8
Chase     Utley       4.6
Brad      Hawpe       4.5
And here are the “unluckiest” major league batters.
First     Last        Diff
Brian     Giles       -9.7
J.J.      Hardy       -6.9
Carlos    Guillen     -6.4
Nelson    Cruz        -5.8
Grady     Sizemore    -5.6
Brandon   Phillips    -5.4
Adrian    Beltre      -4.7
Yunel     Escobar     -4.7
Randy     Winn        -4.7
Russell   Martin      -4.2
You knew Brian Giles wasn’t that bad, right? Now, onto the pitchers. First up, the “luckiest” hurlers:
First     Last          Diff
Tim       Wakefield     -8.9
Glen      Perkins       -7.6
Ross      Ohlendorf     -7.1
John      Maine         -6.0
Jair      Jurrjens      -5.9
Joba      Chamberlain   -5.6
Kevin     Millwood      -5.6
James     Shields       -5.6
Koji      Uehara        -5.6
Zack      Greinke       -5.3
You can see DIPS theory at work here, as successful major league knuckleballers tend to have more favorable outcomes on batted balls. Finally, the “unlucky” ones.
First     Last          Diff
Vicente   Padilla       5.4
Brett     Myers         4.8
Jon       Lester        4.6
Mark      Hendrickson   4.3
Tim       Lincecum      4.0
Scott     Olsen         4.0
Aaron     Cook          3.9
Andy      Sonnanstine   3.9
Kevin     Slowey        3.8
Ian       Snell         3.2
I wouldn’t feel right if Ian Snell weren’t on this list.
References & Resources
SDPR and SDBR are included for players in this spreadsheet. The sum of a player’s K, NIBB, IBB and HBP linear weights is shown as his FIP_LW for both pitchers and batters. Also shown is a player’s EVENT_LW minus his HIT_BALL_FX_LW, which indicates whether he was lucky or unlucky in April (positive numbers being lucky for batters and unlucky for pitchers). This is an extremely small sample and has little more meaning than any other one-month sample. It is shown to illustrate the methodology only.
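For readers working with the spreadsheet, the luck figure and its sign convention can be sketched as below. The function names and the sample EVENT_LW and HIT_BALL_FX_LW inputs are hypothetical; only the subtraction and the sign convention come from the description above.

```python
def luck_diff(event_lw, hit_ball_fx_lw):
    """Runs credited by actual outcomes minus runs implied by the HITf/x bins."""
    return event_lw - hit_ball_fx_lw


def luck_label(diff, role):
    """Interpret the sign: positive is lucky for batters, unlucky for pitchers."""
    if role not in ("batter", "pitcher"):
        raise ValueError(f"unknown role: {role!r}")
    if diff == 0:
        return "neutral"
    positive_is_lucky = role == "batter"
    return "lucky" if (diff > 0) == positive_is_lucky else "unlucky"


# Hypothetical inputs, not values from the spreadsheet.
batter_diff = luck_diff(event_lw=10.0, hit_ball_fx_lw=4.0)
print(batter_diff, luck_label(batter_diff, "batter"))
print(luck_label(-8.9, "pitcher"))
```

A pitcher with a negative difference allowed fewer runs on hit balls than the quality of contact implied, which is why the "luckiest" pitchers carry negative numbers.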
The information here was provided by Sportvision and MLB.com’s Gameday for research purposes only. Sportvision and MLB.com retain all copyright rights.
Good stuff Peter.
Could you tell us your HA bins for pulled/center/opposite?
I would think that one key skill component missing from the equation is the batter’s speed. This might make speedsters luckier in your results since a grounder to the left side may result in a single instead of an out as it would for the average player.
JBrew – You are certainly correct that speed is a batter skill that needs to be accounted for. I considered it, but there were several reasons why I decided not to include it in this study. I was concerned about the play you described, the swinging bunt slow roller to the left side that a speedy batter beats out for a hit, but I was more concerned about the extra linear weight that the speedster generates on his hard hit balls by turning singles into doubles and doubles into triples. The reason that is more troubling to me is that a batter is trying to hit the ball hard at a vertical angle that will make it a hit rather than an out. Few batters are trying to hit a slow roller. So incorporating all the advantages of speed would have meant at least doubling the number of bins and there just wasn’t enough data to give reliable linear weights for that many bins.
Second, I don’t know of a good way of defining speed for batters. I think the speed scores that traditionally have been used have flaws and I haven’t studied the issue enough to come up with a better method.
But it is a good idea to include speed when we have enough data and a definition of speed in which we all can feel confident.
I always love it when we work to apply statistical analysis and physics to baseball. I keep waiting for GMs to use this data when evaluating free agents so they can see who had a great year because of legitimately better play and who happened to fall into a couple of lucky plays that lowered an ERA, added a win or enhanced an RBI total.
Along with Joe Posnanski’s article about the prime years of a player’s career, I can see this information being combined to enhance young player scouting techniques.
I’d be interested moving forward to see if there is not a way to analyze a young pitcher and his mechanics to predict injury rates. Measuring arm angle, arm speed, pitch speed, landing distance from the rubber etc. all should allow an overall predictor of who has mechanics that lend themselves to longevity (and hence who should be invested in).
I think a good definition of speed would be time to first base. The only fuzzy parts there are the relative speed of each of the paths (including wind) in different parks which is perhaps negligible on dry days. A separate rating would likely be needed for switch hitters.
Then a study using HITf/x would be able to determine more defined valuations for hitters (and pitchers) once we know the effect of speed on various batted ball types.
Great work. This is very important stuff.
About Tim Wakefield. There were 64 balls in play captured by the system. 52 outs, 9 singles, 2 doubles. The remaining one appears to have been a ROE. I’ll throw out the ROE since I’m not sure how you dealt with it. The AVG/SLG is then .175/.206. That’s unsustainable, even for him. The average for all pitchers is .323/.507. (Remember that this includes HR.)
The reason knuckleballers beat DIPS is that balls tend to be hit more weakly off them. (Right?) If that’s so, HITf/x should capture the effect. Seeing Wakefield at the top of the list makes me fear that the effect isn’t being fully captured. On the other hand, the .175/.206 line is reassuring: maybe he really did get that lucky.
Out of the luckiest hitters, 8 out of 10 are lefties. Of the unluckiest hitters, only 2 are lefties (2 switch hitters as well). Wouldn’t a left-handed batter have an intrinsic speed advantage with this type of analysis? One way to account for speed would be to see if left-handed batters as an aggregate had a higher SDBR and adjust the formula accordingly.
Regarding pulled hits having higher values:
Groundballs tend to be pulled while opposite field hits tend to be flyballs. The angle of the bat through the hitting zone causes this. You used separate bins for speed off the bat but not for launch angle. Grounders turn into base hits (and errors) at a higher rate than flyballs. The value of the opposite field hits is probably reduced due to a greater percentage of flyball outs.
stevebogus – Check the linked spreadsheet in the article. I have bins for every 5 degrees of vertical angle from -5 degrees to 40 degrees plus a bin for below -5 and above 40. And your hypothesis is also wrong. You are correct that a much larger percentage of balls that are pulled are ground balls. But BABIP on pulled ground balls is only .203, whereas on opposite field GBs it is .281. Overall, BABIP for pulled balls is .283 and for opposite field balls is .282. The greater linear weight value of pulled balls is almost entirely due to the disproportionate number of home runs that are pulled, 5 times the rate of opposite field HRs.