Fixing Batted-Ball Statistics with Statcast
Baseball analysts love to talk about pitchers’ (and sometimes even hitters’) batted-ball statistics: the percentage of balls in play that are line drives (LD%), fly balls (FB%), and ground balls (GB%). These statistics seem ideal because they appear less noisy than, for example, hits, since they aren’t affected by the positioning or ability of the fielders, the speed of the runner, ballpark dimensions, and so on. On top of that, they seem to be highly indicative of the outcome of the batted ball. Just look at this graph comparing the batting average of the three batted-ball types at each exit velocity.
Note: All data is from the 2015 season courtesy of Baseball Savant unless otherwise specified.
The curves are distinct, and they change in a seemingly reasonable, consistent manner. Consequently, many have thought batted-ball statistics can lead to further development of Voros McCracken’s DIPS Theory. Perhaps pitchers can control their batted-ball statistics, and that in turn can determine their batting average on balls in play. This is just one of the many theories on the effects of different batted-ball tendencies.
Additionally, this led to the reinforcement of a hierarchy that baseball players, coaches, and fans have recognized without ever looking at a spreadsheet. Line drives are the optimal hit for a batter, followed by fly balls, then ground balls, then pop-ups.
FanGraphs described this hierarchy in the following table used as part of their glossary definition of batted-ball statistics.
Type | AVG | ISO | wOBA |
GB | .239 | .020 | .220 |
LD | .685 | .190 | .684 |
FB | .207 | .378 | .335 |
The problem with all of this is the subjectivity with which batted balls are classified as fly balls or line drives. They are classified by an official scorer for MLB and by a stringer for companies like Baseball Info Solutions (BIS), who do so with the naked eye. Overall, they do a good job, but they’ve fallen victim to the following bias: We as baseball enthusiasts have been trained to believe that line drives are normally hits, which isn’t necessarily wrong. But this assumption has led to scorers/stringers inferring the converse—that hits are normally line drives. Basically, in some instances they’ve stopped measuring the actual batted-ball type to instead measure the outcome—which can lead to huge problems when using batted-ball statistics to glean insights.
Batted-Ball Classifications Don’t Measure Ball-in-Play Type
Let’s look again at that graph. I’ve taken away the groundball line because that’s not going to be as important.
What’s interesting about this graph is how similar the two curves are. The fly balls consistently yield a lower batting average, but the two curves track each other the whole way. This doesn’t necessarily mean that the batted balls were poorly classified, because the lower batting average could be a result of the different flight path. However, the graph below shows virtually no difference between the distances fly balls and line drives travel at most exit velocities.
I checked to make sure this wasn’t purely a result of physics, given that a ball’s trajectory affects its hit distance. Dr. Alan M. Nathan of the University of Illinois has a fantastic site titled The Physics of Baseball that includes a trajectory calculator. I plugged in different launch angles and exit velocities and found that across all exit velocities, balls hit at a high launch angle (i.e., fly balls) should theoretically land farther than balls hit at a low launch angle (i.e., line drives).
One easy way to examine the classification of batted balls from a more quantitative perspective is using launch angle. As part of their explanation of the launch angle statistic, MLB Advanced Media (MLBAM) gave their guidelines for how the two correlate:
Launch Angle | Batted Ball Type |
< 10˚ | Ground Ball |
10˚ – 25˚ | Line Drive |
25˚ – 50˚ | Fly Ball |
> 50˚ | Pop Fly |
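As a minimal sketch, the MLBAM bands above can be written as a simple lookup. The table doesn’t say which band a ball hit at exactly 10˚, 25˚, or 50˚ belongs to, so the function below assigns boundary values to the higher band; that choice and the function name are assumptions for illustration.

```python
def classify_by_launch_angle(launch_angle_deg: float) -> str:
    """Label a batted ball using MLBAM's launch-angle bands.

    Assumption: a ball exactly on a boundary (10, 25, or 50 degrees)
    goes to the higher band, since the published table leaves this open.
    """
    if launch_angle_deg < 10:
        return "ground ball"
    if launch_angle_deg < 25:
        return "line drive"
    if launch_angle_deg < 50:
        return "fly ball"
    return "pop fly"


# The three hits discussed later in this piece all fall in the same band.
print([classify_by_launch_angle(a) for a in (23.02, 23.35, 23.12)])
```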
This graph shows the number of balls scored as line drives (in red) and as fly balls (in blue) at each Launch Angle with the overlap in purple. The yellow vertical line marks a Launch Angle of 25˚, where batted balls theoretically should turn from line drives to fly balls.
The reason this graph points to flawed classifications is the lack of a clear line where the line drives stop and the fly balls begin. The balance shifts at 26˚, but not at all abruptly. This means a significant portion of batted balls were hit at the same angle, and thus were likely extremely similar, but were classified differently for no apparent reason other than the discretion of the official scorer. Some overlap between the red and blue would be understandable, but not like this, where for the middle three launch angles it’s practically fifty-fifty.
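One way to put a number on that overlap is to bucket balls by whole-degree launch angle and compute the share the scorer labeled a line drive in each bucket. The sketch below assumes a hypothetical list of (launch angle, scorer label) pairs; the sample is invented for illustration and is not Baseball Savant data.

```python
import math
from collections import Counter


def line_drive_share_by_angle(balls):
    """Map each whole-degree launch angle to the share of balls scored as line drives.

    `balls` is an iterable of (launch_angle_deg, label) pairs, where label is
    "line drive" or "fly ball" (hypothetical input format).
    """
    line_drives, totals = Counter(), Counter()
    for angle, label in balls:
        bucket = math.floor(angle)
        totals[bucket] += 1
        if label == "line drive":
            line_drives[bucket] += 1
    return {deg: line_drives[deg] / totals[deg] for deg in sorted(totals)}


def first_fly_ball_majority(shares):
    """Smallest launch angle at which fly balls outnumber line drives."""
    for deg, share in shares.items():
        if share < 0.5:
            return deg
    return None


# Toy sample, invented for illustration.
sample = [(24.3, "line drive"), (24.8, "line drive"), (24.6, "fly ball"),
          (25.2, "line drive"), (25.7, "fly ball"),
          (26.1, "fly ball"), (26.4, "fly ball"), (26.9, "line drive")]
shares = line_drive_share_by_angle(sample)
print(shares)                           # share of line drives per degree
print(first_fly_ball_majority(shares))  # 26
```

A clean classification would show the share dropping from near 1 to near 0 within a degree or two; a slow slide across several degrees is the pattern the graph shows.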
Because launch angle seems like a reasonable tool for classification, let’s see what happens when we reclassify everything based on its launch angle. I used MLBAM’s 25˚ cutoff to reproduce the two graphs we looked at earlier comparing batting averages and hit distances.
These are very different from what we saw using the official scorers’ classifications, so we’re clearly on to something.
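The reclassification itself is mechanical. Here is a sketch that relabels balls with a launch-angle cutoff and computes batting average per exit-velocity bin. The dictionary keys (`launch_angle`, `exit_velocity`, `is_hit`) and the 5 mph bins are hypothetical choices, not Baseball Savant’s column names, and balls outside MLBAM’s 10˚–50˚ line-drive and fly-ball bands are skipped.

```python
import math
from collections import defaultdict


def batting_average_by_exit_velocity(balls, cutoff_deg=25.0, bin_mph=5.0):
    """Relabel balls by launch angle and return batting average per (type, bin).

    Each ball is a dict with hypothetical keys 'launch_angle' (deg),
    'exit_velocity' (mph), and 'is_hit' (bool).
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for ball in balls:
        angle = ball["launch_angle"]
        if not 10.0 <= angle < 50.0:  # keep only the line-drive and fly-ball bands
            continue
        kind = "LD" if angle < cutoff_deg else "FB"
        bin_start = bin_mph * math.floor(ball["exit_velocity"] / bin_mph)
        counts[(kind, bin_start)] += 1
        hits[(kind, bin_start)] += int(ball["is_hit"])
    return {key: hits[key] / counts[key] for key in sorted(counts)}


sample = [
    {"launch_angle": 15.0, "exit_velocity": 97.0, "is_hit": True},
    {"launch_angle": 18.0, "exit_velocity": 96.0, "is_hit": False},
    {"launch_angle": 30.0, "exit_velocity": 98.0, "is_hit": False},
    {"launch_angle": 5.0, "exit_velocity": 99.0, "is_hit": True},  # ground ball, skipped
]
print(batting_average_by_exit_velocity(sample))
# {('FB', 95.0): 0.0, ('LD', 95.0): 0.5}
```

Averaging a distance field instead of `is_hit` would give the hit-distance version of the comparison.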
The next logical question is how to determine what’s a line drive and what’s a fly ball. Is a simple 25˚ cutoff the most accurate way?
Don’t Use Launch Angle, Use Arc Angle
Let’s start looking at this anecdotally. Below are videos of three different batted balls, and I want you to try to judge if they should be scored as fly balls or line drives.
Regardless of how exactly you classify them, it’s hard to dispute that the three batted balls are quite different from one another. They don’t just travel different distances; their trajectories are different. What’s practically inarguable is that they are not all equally line drive-like (or fly ball-like). The problem with using launch angle and a 25˚ cutoff is that it would regard each of those three batted balls as the same…exactly the same. I don’t just mean that they would all fall below the 25˚ threshold and be labeled line drives (which they would), but that each would be labeled exactly as much a line drive as the others, since their launch angles come in at 23.02˚ for the home run, 23.35˚ for the out to left field, and 23.12˚ for the single up the middle.
With that in mind, I want to give you a little more food for thought. To this point I’ve been a little harsh on the official scorers, but they are clearly quite skilled at what they do; I just think they’ve been given an impossible task. They aren’t simply flipping a coin up in the press box (although it can look that way when they call this a line drive). Consequently, their classifications can give valuable insight. Let’s have a look at which batted balls they score as fly balls.
What’s interesting, as I indicated with the black line, is how the minimum (non-outlier) launch angle of balls scored a fly ball steadily decreases as the exit velocity increases.
It’s now clear that we need something more complex than a 25˚ launch-angle cutoff to determine whether a batted ball is a line drive or a fly ball. That something is arc angle. The vertex of the arc angle is the ball’s highest point (its apex); one side of the angle runs from the apex back to where the ball made contact with the bat, and the other runs from the apex to where the ball lands on the ground.
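Given three points in the vertical plane of the ball’s flight, the arc angle is the angle at the apex between the two rays, which a dot product gives directly. This is a minimal sketch, assuming the points are (horizontal distance, height) pairs in feet; the coordinates in the example are invented, not measured from any hit in this piece.

```python
import math


def arc_angle(contact, apex, landing):
    """Angle in degrees at `apex` between the rays to `contact` and to `landing`.

    Each point is a (horizontal_distance, height) pair in the same units.
    180 degrees means a perfectly flat path; smaller means more arc.
    """
    ux, uy = contact[0] - apex[0], contact[1] - apex[1]
    vx, vy = landing[0] - apex[0], landing[1] - apex[1]
    cos_angle = (ux * vx + uy * vy) / (math.hypot(ux, uy) * math.hypot(vx, vy))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return math.degrees(math.acos(cos_angle))


# Hypothetical ball: contact 3 ft up, apex 100 ft higher at 200 ft out,
# landing back at contact height 400 ft out.
print(round(arc_angle((0, 3), (200, 103), (400, 3)), 2))  # 126.87
```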
Let’s have a look at arc angle in action. Remember those three hits from the beginning of this section that launch angle failed to distinguish between? Here’s how they stack up in terms of arc angle.
Hit Number | Arc Angle |
Hit 1 (Out to LF) | 143.95˚ |
Hit 2 (Gallo Moonshot) | 137.57˚ |
Hit 3 (The single) | 147.79˚ |
Below are the trajectory graphs from The Physics of Baseball:
These numbers show a lot of good signs for arc angle. Most importantly, there is a significant difference between the three hits’ arc angles, as there should be, since the videos show clearly different trajectories. The calculations also show that arc angle is not merely an inflated version of launch angle: ordered from smallest to largest launch angle, the hits went Hit 2, Hit 3, Hit 1, whereas ordered by arc angle they went Hit 2, Hit 1, Hit 3.
With arc angle, the greater the angle, the more line drive-like the ball; the smaller the angle, the more fly ball-like (180˚ would be perfectly flat, and 0˚ would be a pop-up to the catcher).
So according to arc angle, the single is the most line drive-ish, the out to left field is in the middle, and the home run is the most fly ball-ish. Most people I’ve talked to agree that the single is more of a line drive than the out to left field, but I’ve received mixed responses on the home run. You would expect the home run to be the one most overwhelmingly viewed as a fly ball, since its arc angle says as much. I think the main reason this isn’t the case is that we didn’t get to see its full flight path. The ball was stopped by the outfield bleachers, so we only saw its rocket ascent. Had we been able to see it fall back to the ground at a very sharp angle, it likely would have been easier to tell that it’s the most fly ball-like of the group.
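The two orderings, and the difference in spread, can be checked directly from the launch angles quoted earlier and the arc angles in the table above:

```python
# Launch angles as quoted in the text; arc angles from the table.
hits = {
    "Hit 1 (out to LF)": {"launch": 23.35, "arc": 143.95},
    "Hit 2 (Gallo moonshot)": {"launch": 23.02, "arc": 137.57},
    "Hit 3 (single)": {"launch": 23.12, "arc": 147.79},
}

by_launch = sorted(hits, key=lambda name: hits[name]["launch"])
by_arc = sorted(hits, key=lambda name: hits[name]["arc"])
launch_values = [h["launch"] for h in hits.values()]
arc_values = [h["arc"] for h in hits.values()]
launch_spread = max(launch_values) - min(launch_values)
arc_spread = max(arc_values) - min(arc_values)

print(by_launch)  # order: Hit 2, Hit 3, Hit 1
print(by_arc)     # order: Hit 2, Hit 1, Hit 3
print(f"launch-angle spread: {launch_spread:.2f} deg, arc-angle spread: {arc_spread:.2f} deg")
```

A 0.33˚ spread in launch angle versus a 10.22˚ spread in arc angle is the point: launch angle barely separates these three balls, while arc angle does.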
The last thing I want to do before I discuss next steps is talk about what arc angle says about official scorers’ classifications and batting average.
One of my justifications for using a classification method more complex than a 25˚ cutoff was a graph that showed how the minimum launch angle at which the official scorers would score a fly ball steadily decreased as the exit velocity increased. Now that we have a measure of how fly ball-like a given batted ball is, we can test this theory—that at a constant launch angle, a ball hit harder has a more fly ball-like trajectory.
In the graph below, each line is a certain launch angle (starting at 10˚ and increasing by 5˚ from top to bottom). What you see is that as the exit velocity increases (left to right), the arc angle steadily decreases, and decreasing arc angle means the ball is more of a fly ball. Or in fewer words, yes, the harder a ball is hit (given a constant launch angle) the more it looks like (and is) a fly ball.
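It’s worth noting why exit velocity matters at all. In a vacuum it wouldn’t: the apex sits at half the range R, so for launch angle θ and apex height H, tan(α/2) = (R/2)/H = 2/tan θ, and the arc angle α = 2·arctan(2/tan θ) is the same at every speed (about 156˚ for a 23˚ launch). The trend in the graph therefore comes from aerodynamics. The sketch below is a deliberately simplified, drag-only model (no spin-induced lift, no wind, an assumed drag coefficient of 0.35, and roughly sea-level air density), not Professor Nathan’s Trajectory Calculator, so its numbers won’t match the graph; it only illustrates that drag alone pushes arc angle down as exit velocity rises.

```python
import math

G = 9.81            # gravitational acceleration, m/s^2
MASS = 0.145        # baseball mass, kg (approximate)
RADIUS = 0.0366     # baseball radius, m (approximate)
RHO = 1.2           # air density, kg/m^3 (assumed, roughly sea level)
CD = 0.35           # drag coefficient (assumed constant; real values vary)
K = 0.5 * RHO * CD * math.pi * RADIUS ** 2 / MASS  # drag accel = K * speed * velocity
MPH_TO_MS = 0.44704


def vacuum_arc_angle(launch_deg):
    """Arc angle with no air: depends on launch angle only."""
    return math.degrees(2 * math.atan(2 / math.tan(math.radians(launch_deg))))


def drag_arc_angle(exit_mph, launch_deg, dt=1e-4):
    """Arc angle for a drag-only point-mass trajectory (no lift, no wind).

    Requires a positive launch angle. Contact and landing are both at
    height 0, so "landing" is where the ball returns to contact height.
    """
    theta = math.radians(launch_deg)
    speed0 = exit_mph * MPH_TO_MS
    vx, vy = speed0 * math.cos(theta), speed0 * math.sin(theta)
    x, y = 0.0, 0.0
    apex_x, apex_y = 0.0, 0.0
    while True:
        # Semi-implicit Euler step: update velocity, then position.
        speed = math.hypot(vx, vy)
        vx -= K * speed * vx * dt
        vy -= (G + K * speed * vy) * dt
        new_x, new_y = x + vx * dt, y + vy * dt
        if new_y > apex_y:
            apex_x, apex_y = new_x, new_y
        if new_y < 0.0:
            # Interpolate linearly to where the path crosses height 0.
            land_x = x + (new_x - x) * y / (y - new_y)
            break
        x, y = new_x, new_y
    # Angle at the apex between the rays to contact (0, 0) and landing (land_x, 0).
    ux, uy = -apex_x, -apex_y
    wx, wy = land_x - apex_x, -apex_y
    cos_angle = (ux * wx + uy * wy) / (math.hypot(ux, uy) * math.hypot(wx, wy))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


print(round(vacuum_arc_angle(23), 1))  # 156.0
for mph in (80, 95, 110):
    print(mph, round(drag_arc_angle(mph, 23), 1))
```

Adding backspin lift, as the real calculator does, changes the numbers, so treat these outputs as qualitative only.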
Let’s look at how official scoring compares with arc angles. Here’s a look at the number of batted balls scored a line drive or fly ball at each arc angle.
Two things are clear from this graph. First, arc angle is a good measure of whether a batted ball is a fly ball or a line drive: at each arc angle above 134˚ the majority of batted balls are line drives, and at arc angles below 134˚ the majority are fly balls. Second, a lot of fly balls are, in fact, being misclassified as line drives and vice versa. To investigate this, let’s look at the difference between how hits and outs are classified.
The arc angle at which hits start being called fly balls the majority of the time is 128˚, while this number is 137˚ for outs, indicating that the batted balls’ outcomes have a pretty clear impact on their classification. All that being said, at this point it matters less why scorers make certain classification errors more often. What’s more important is that naked-eye decisions by official scorers are not accurate enough, and arc angle can solve this problem.
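To reproduce the hits-versus-outs comparison, one could bucket scored balls by whole-degree arc angle, optionally keep only hits or only outs, and scan from the flattest bucket downward for the first one where fly balls are the majority. This is a sketch under assumptions: the tuple format is hypothetical, and the toy sample below is invented for illustration, not drawn from the 2015 data.

```python
import math
from collections import Counter


def fly_ball_majority_angle(balls, outcome_is_hit=None):
    """Largest whole-degree arc angle at which scorers call most balls fly balls.

    `balls` holds (arc_angle_deg, label, is_hit) tuples. Pass
    outcome_is_hit=True for hits only, False for outs only, None for all.
    """
    fly_balls, totals = Counter(), Counter()
    for arc, label, is_hit in balls:
        if outcome_is_hit is not None and is_hit != outcome_is_hit:
            continue
        bucket = math.floor(arc)
        totals[bucket] += 1
        fly_balls[bucket] += label == "fly ball"
    # Scan from flatter (larger arc angle) to loftier (smaller arc angle).
    for bucket in sorted(totals, reverse=True):
        if fly_balls[bucket] / totals[bucket] > 0.5:
            return bucket
    return None


sample = [  # invented for illustration
    (130.4, "line drive", True), (129.2, "line drive", True),
    (128.6, "fly ball", True), (128.1, "fly ball", True),
    (138.3, "line drive", False), (137.5, "fly ball", False),
    (137.2, "fly ball", False),
]
print(fly_ball_majority_angle(sample, outcome_is_hit=True))   # 128
print(fly_ball_majority_angle(sample, outcome_is_hit=False))  # 137
```

With real data a single noisy bucket can trip this scan, so requiring a few consecutive majority buckets would be a reasonable refinement.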
Unfortunately, I don’t have data on which batted balls Baseball Info Solutions calls line drives and which fly balls, so I can’t make a similar comparison. That being said, I’m led to believe they suffer from this same issue for two reasons.
First, the issue here isn’t that the scorers/stringers are inept; it’s that they’re human. FanGraphs even acknowledges this on its glossary page for batted-ball statistics (which draw on data from BIS). Second, on that same page they give what they calculated to be the batting averages for line drives and fly balls in play (i.e., excluding home runs) in the 2014 season. My data are from 2015, but batting average on balls in play (BABIP) changes very little from year to year, so I should be able to make a meaningful comparison using my own data and an arc angle cut point.
The batting average using an arc angle cut point depends (obviously) on what that cut point is. I tested 128˚ (the apparent cut point based on the graph of hits), 137˚ (the apparent cut point based on the graph of outs), and 135˚ (the apparent cut point based on the graph using all data).
Arc Angle Cut Point | Line Drives Batting Avg. | Fly Balls Batting Avg. |
Baseball Info Solutions | .685 | .207 |
128˚ | .643 | .042 |
135˚ | .653 | .078 |
137˚ | .658 | .094 |
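The logic behind each row of the table can be sketched as a function that splits balls at an arc-angle cut point and computes batting average on each side. Assumptions: each ball is a hypothetical (arc angle, result) pair; a ball exactly at the cut point counts as a line drive; and home runs are dropped to match the balls-in-play comparison described above. The sample is invented.

```python
def ld_fb_batting_average(balls, cut_deg):
    """Return (line-drive AVG, fly-ball AVG) for an arc-angle cut point.

    `balls` holds (arc_angle_deg, result) pairs, with result one of
    "out", "single", "double", "triple", "home_run". Arc angles at or
    above the cut are line drives; below it, fly balls. Home runs are skipped.
    """
    counts = {"LD": 0, "FB": 0}
    hits = {"LD": 0, "FB": 0}
    for arc, result in balls:
        if result == "home_run":
            continue
        kind = "LD" if arc >= cut_deg else "FB"
        counts[kind] += 1
        if result != "out":
            hits[kind] += 1
    return tuple(hits[k] / counts[k] if counts[k] else float("nan") for k in ("LD", "FB"))


sample = [(150, "single"), (145, "out"), (140, "double"), (136, "out"),
          (133, "out"), (130, "single"), (125, "out"), (120, "home_run")]
for cut in (128, 135, 137):
    ld_avg, fb_avg = ld_fb_batting_average(sample, cut)
    print(f"{cut} deg: LD {ld_avg:.3f}, FB {fb_avg:.3f}")
```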
Across all cut points we see a significant discrepancy between BIS’s batting averages and those found using arc angle. This now makes it abundantly clear that no matter who’s scoring or stringing, the task of classifying batted balls is simply too difficult for the naked eye alone, and it’s time we moved to using arc angle instead.
The whole point of batted-ball statistics is to get less noisy numbers that are still meaningful. So now that we have an objective measure of batted-ball type, let’s look at how it predicts batting average.
The overall upward trend makes sense because the more arc a ball has, the better the chance that a fielder is able to get under it. What might come as a surprise is the decreasing batting average seen across the last twenty or so arc angles. The short answer for why this happens? Those balls were hit so flat they were caught by infielders. (See the graph below.) The long answer? That gets into the classification of ground balls and will have to wait for another time.
Another interesting way to look at how arc angle corresponds with batting average is by breaking arc angle down into its two core variables: launch angle and exit velocity. Below is a contour plot showing how batting average and arc angle depend on launch angle and exit velocity. Each curve on the graph represents a constant arc angle. The leftmost curve is an arc angle of 170˚, and the values decrease by 10˚ from left to right. For convenience, I made the middle curve, representing a 140˚ arc angle, blue.
Where We Go From Here
The most obvious and important next step is to start collecting arc angle data just as we do with stats like launch angle and exit velocity. Once we have accurate arc angles for all batted balls, we can start researching aspects of arc angle. For example, we can see how arc angle correlates with stats like batting average, ISO, and wOBA. If we find a strong correlation, we can start looking at the stability of arc angle distributions for both hitters and pitchers, perhaps ultimately using arc angle for evaluation the way batted-ball statistics are used now.
Speaking of classifications, we’ll also need to decide whether arc angle should be used to sort batted balls into the traditional groundball, line drive, and flyball categories. On the one hand, this could be accomplished fairly easily by looking at video of measured batted balls and determining a cut point. On the other, perhaps having just three categories is overly simplistic. In an article for Beyond the Box Score, Chris Moran pointed out that “there are plenty of in-between hit types, and smashing all batted balls into one of three categories inevitably loses something in translation,” citing as an example a “ground ball” by Bryce Harper that went to the right field fence.
Amidst all these questions, what’s clear is that arc angle should give us stronger descriptive data that hopefully are predictive as well. What it will show us, nobody knows…yet.
References & Resources
- I could not possibly end this piece without thanking Professor Alan M. Nathan and Daren Willman. Without the help of data and trajectory graphs from The Physics of Baseball and Baseball Savant as well as comments and suggestions from Professor Nathan himself, this study would not have been nearly as complete or as precise as it is.
Outstanding work. Hey, Dewan, you paying attention?
Nice work, Micah. You mention toward the end “Another interesting way to look at how arc angle corresponds to batting average is by breaking arc angle down to its two core variables: launch angle and exit velocity.” That breakdown was something that was on my mind soon after you introduced arc angle for *classifying* batted balls. Aren’t those the only two variables affecting arc angle? And if so, aren’t they easier to measure than arc angle, since arc angle requires the launch and landing locations, the former being a couple feet off the ground and the latter sometimes being obscured by catches, the wall, or home run bleachers?
Granted, using this information to predict batting average would kind of make the classification a moot point, but I would then think that several other variables come into play (batted ball direction, ballpark dimensions, etc.). Still, I think classification has its uses in discussions and sports articles, much like “statistically significant” is spoken of instead of specific p-values.
Adam…Micah shows the relationship between arc angle and the combination of exit speed and launch angle in the article. The curves come from a calculation using my Trajectory Calculator, with “typical” input parameters for drag and lift, both of which depend on air density. If the actual parameters differ from the typical ones (and they often do), then the arc angle will come out different. I personally have not done a sensitivity study to figure out how much variation in the arc angle might be expected. It would be a good study to do. On the other hand, Statcast/Trackman has the ability to tell us all the information needed to find the arc angle, since the full trajectory is measured (usually). However, that information is not publicly available.
This is great, definitely shows that current classifications are inherently biased (to a large degree).
But I’m not convinced that arc angle is a better replacement. Arc angle depends inherently on the actual trajectory of a batted ball, and as Professor Nathan points out in an above comment, far more variables go into the trajectory, and thus arc angle, than just launch angle and exit velocity (atmospheric density, ball spin, wind, air currents due to the turbulent nature of a ballpark, etc.). For example, I bet you would find that arc angles are biased lower at Coors Field than at other parks (at least in the “center” of the data, which is most common and relevant), so I would argue that arc angle isn’t really a better replacement for human classification (a completely different sort of measurement error, but a measurement error all the same).
The whole idea of using batted ball statistics is to look at the hitter’s “contact quality” (http://www.fangraphs.com/blogs/team-ball-in-play-analysis-an-overview/), and in an ideal world, the contact quality is determined as soon as the ball is no longer in contact with the bat, independent of the *actual* trajectory and fielding outcome.
I suppose there are really two different factors involved. Contact quality numbers should be independent of the actual trajectory, but BIP classification should depend on the actual trajectory, since that’s what the fielder has to deal with. If we’re looking at hitters and how well they’re hitting, we should look only at launch angle and speed, and use something like the Trajectory Calculator to get a normalized trajectory/arc angle to classify the contact, while the BIP classification for play-by-plays should indeed use the actual arc angle.
Oh god the avatar.
Please don’t use email addresses to search for avatars/other personal information…. no reason whatsoever to do so, or at least no reason to display such things next to my comment.
Interesting thoughts. I agree that arc angle mops up a lot of variables, including some that the hitter cannot control. That being said, a baseball field is just about the furthest thing from a vacuum, and I think we need to just embrace that. Basically all measurements, whether we’re talking about launch angle, exit velocity, or pitch speed, are what did happen as opposed to what would have happened. Any adjustment for conditions typically happens retrospectively (like with wRC+).
This article seems like a backward idea to me. Finding a better way to classify line drives and fly balls using Statcast? The whole idea of Hit Fx, Field Fx and now Statcast was to have a way to describe hit balls numerically without grouping them in the existing line drive, fly ball, pop up arbitrary and subject to bias classifications. The only useful classification is that of ground ball and its definition needs to be strictly defined as a ball that hits the ground before it passes the nearest infielder.
There will always be a place for the terms ground ball, line drive, fly ball and popup in baseball when fans get together to describe the game winning hit they have just seen or an announcer describes the spectacular diving catch that ended a rally. But those descriptions should remain subjective. For baseball analysts those terms are already passe and will not likely have any use in the future. And, for the record, the MLBAM stringer (or the BIS video observer) makes the hit ball classifications not the official scorer.
Yes, as I point out at the end, three categories is overly simplistic, but
1. If you’re going to use GB%/LD%/FB% (which many still do) they might as well be accurate.
2. Trajectory deserves the same granular make-over that other facets of the game have received (route efficiency, exit velocity, max speed, etc.) and arc angle should be a good vehicle to do that. This would allow for the quantification of pitchers’ tendencies to give up different types of hits (as Tango/Lichtman/Dolphin were trying to do way back in ’07 in The Book by dividing pitchers into ground ball and fly ball groups). It could also give info on Ichiro-like players’ ability to slap hits, and more.
This piece started out as a look at a seemingly outdated piece of data (classifications) but arc angle’s ability to fix that is just the tip of the iceberg.
Trajectory deserves the same granular make-over that other facets of the game have received (route efficiency, exit velocity, max speed, etc.) and arc angle should be a good vehicle to do that.
The things that you mention above are all quantities that Statcast measures directly. Arc angle, at least as you have calculated it using Alan Nathan’s trajectory program, is not. Statcast does measure max ball height and hang time, and when max height is released to the public you could calculate an arc angle from that. But you could also calculate the ratio of hit ball distance to hit ball max height, which has previously been proposed as a non-subjective way to differentiate line drives from fly balls and is a much simpler concept. And when hang time becomes public, another simple metric would be distance divided by hang time. I am pretty sure distance divided by hang time is already being used by BIS to differentiate between liners, fliners, and fly balls.
Right on. The numbers I present in the piece should be very close, but from the beginning my plan was to calculate arc angle using the actual three points that make the angle (point of contact, apex, point of “landing”), where landing is at the same height as the height at contact.
Hence my second-to-last paragraph “The most obvious and important next step is to start collecting arc angle data just as we do with stats like launch angle and exit velocity.”
Then there’s me, that guy who understands but wants the data. I’m not in line with the thinking above; I’m 100% interested in your remark about a possible BA correlation.
yes, a more accurate “LD” allows us to only have to use LD/GB/FB instead of searching Statcast.
Most sites give the BIP classifications, but going forward, possibly, they will be based on arc angle rather than the official scorer.
I came looking for an exit velo to wOBA table, but this was very informative and, better yet, well written. Thank you.