The Lurking Error in Statcast Pitch Data

Velocity readings at Target Field decreased from 2016 to 2017 more than any other stadium. (via Michael Hicks)

For a decade, the baseball analytics community has been granted the keys to a tremendous resource: the pitch data from MLB.com’s Gameday application. A product of Sportvision’s PITCHf/x camera system through 2016, this repository provided pitch velocity, location and movement readings for the first time. Also included was a nine-parameter fit of acceleration, release point and velocity that enabled the full reconstruction of every pitch’s flight path.

As far forward as this pushed sabermetric research, this wellspring of data wasn’t without pitfalls. Park biases were known to warp the PITCHf/x readings. For an analysis to be unsullied by inconsistent data, offsets needed to be diagnosed and ameliorated.

In 2017, the status quo changed as a new source began generating the pitch data. Major League Baseball Advanced Media transitioned away from Sportvision’s cameras in favor of Trackman’s radar as part of its wider Statcast rollout. If you compare certain tech specs of the two systems, it’s easy to understand what made the switch appealing.

The PITCHf/x cameras began tracking each pitch at 50 feet from home plate (an underestimate of release distance), taking 20 images through the ball’s flight to find a best-fit trajectory. Trackman captures thousands of measurements per second, tracing a pitch’s entire journey from the pitcher’s hand to the catcher’s glove. With these cutting-edge radar systems in place at all 30 parks, one would think more precise pitch data would be reported, with fewer of the error issues presented by PITCHf/x.

But a month after the transition, one study showed the opposite. In April 2017, FiveThirtyEight’s Rob Arthur wrote that Statcast was having more trouble than PITCHf/x did accurately determining pitch location and movement. That analysis presented a good early look at the new system’s pitch-tracking problems, but there are more stones to turn over. What was the magnitude of the errors throughout the 2017 season? How accurate were the pitch velocities tracked by Statcast last year? Does a full-year look at the 2017 season reveal smaller biases than the April estimates? And to what extent did offsets change within each park?

Set-Up

Biases in pitch data often are identified via park-to-park differences. Full-year stadium offset estimates are a good starting point, and they’ll be included in this analysis. But we’ll dig deeper, since we know error varies within the parks themselves. Earlier in the PITCHf/x era, Mike Fast found bias figures shift throughout the season. Specifically, his work from 2011 showed that home stand-length error fixes captured the rolling changes to camera calibration while reducing volatile per-game estimates.

Using that takeaway, I calculated park error both per park-year and per park-home stand for the 2015–2017 seasons. That will allow us to juxtapose the accuracy of the 2015–2016 PITCHf/x data against that of 2017 Trackman readings.
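To make the per-home stand time frame concrete, here is a minimal sketch of how consecutive home games might be grouped into numbered home stands. The input shape (a chronologically sorted list of date and home/away pairs) and the function name are hypothetical stand-ins, not the format used in the analysis.

```python
from itertools import groupby

def label_home_stands(games):
    """Assign a home stand number to each home game.

    `games` is a chronologically sorted list of (date, is_home) tuples for
    one team; this input shape is a hypothetical stand-in for a schedule.
    Returns {date: home_stand_number} for home games only.
    """
    labels = {}
    stand = 0
    # groupby collapses consecutive games that share the same is_home flag
    for is_home, run in groupby(games, key=lambda g: g[1]):
        if not is_home:
            continue  # a road trip separates two home stands
        stand += 1
        for date, _ in run:
            labels[date] = stand
    return labels

schedule = [
    ("04-03", True), ("04-04", True), ("04-05", True),
    ("04-07", False), ("04-08", False),
    ("04-10", True), ("04-11", True),
]
home_stands = label_home_stands(schedule)
```

Keying on the date alone would merge the two games of a doubleheader, so a real schedule would likely need a game identifier instead.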

With the plan on time frames set, I found a pitcher’s average velocity, location, and dragless movement figures during each home stand. To maintain similar conditions, averages were broken up by batter handedness, ball-strike count, pitch type and temperature. (Values for several of those variables were binned to create more robust comparisons. Details on formulation are in the appendix.)

I matched those subsets with their full-year average baselines across all other major league parks and computed the differences between pairings. Matches needed at least four pitches in both the given subset and corresponding baseline to be included in the full bias calculations. The final error estimates were the average of the differentials, weighted by pitches thrown.
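The matching and weighting described above can be sketched roughly as follows for a single season of pitches. The dictionary keys, the function name and the choice to weight each pairing by the pitches thrown in the home stand subset are assumptions made for illustration; only the four-pitch minimum and the pitch-weighted average come from the text.

```python
from collections import defaultdict

MIN_PITCHES = 4  # minimum in both the subset and its baseline, per the text

def mean(values):
    return sum(values) / len(values)

def home_stand_bias(pitches, park, stand):
    """Pitch-weighted average difference between one home stand and the
    same pitcher/condition subsets thrown at all other parks.

    Each pitch is a dict with hypothetical keys: 'park', 'stand',
    'subset' (a tuple such as pitcher, pitch type, batter hand, count
    group, temperature bin) and 'value' (e.g. velocity in mph).
    """
    here = defaultdict(list)       # subset -> values in this home stand
    elsewhere = defaultdict(list)  # subset -> values at every other park
    for p in pitches:
        if p["park"] == park and p["stand"] == stand:
            here[p["subset"]].append(p["value"])
        elif p["park"] != park:
            elsewhere[p["subset"]].append(p["value"])
        # pitches from the park's other home stands are left out of both

    weighted_sum = 0.0
    matched = 0
    for subset, values in here.items():
        baseline = elsewhere.get(subset, [])
        if len(values) < MIN_PITCHES or len(baseline) < MIN_PITCHES:
            continue  # too few pitches on one side of the pairing
        diff = mean(values) - mean(baseline)
        weighted_sum += diff * len(values)  # weight by pitches thrown
        matched += len(values)
    return (weighted_sum / matched if matched else None), matched

# Toy data: one pitcher's four-seamers run 0.5 mph hot at a single park.
subset = ("pitcher_a", "FF", "R", "neutral", 70)
pitches = (
    [{"park": "MIN", "stand": 1, "subset": subset, "value": 93.0}] * 4
    + [{"park": "CLE", "stand": 1, "subset": subset, "value": 92.5}] * 4
)
bias, matched_pitches = home_stand_bias(pitches, "MIN", 1)
```

Returning the matched pitch count alongside the bias makes it easy to apply a minimum-sample filter later.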

Pitch Error Distributions and Averages

To assess the extent of per-home stand bias, we’ll examine smoothed density distributions of absolute error. Curves are separated by season and are composed of home stands with at least 700 matched pitches. To concurrently take stock of overall trends, the average yearly error is presented with color-coded dashed lines.
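For readers who want to reproduce this kind of curve, the sketch below filters home stands by the 700-pitch minimum and finds the apex of a smoothed density of absolute error. The Gaussian kernel, bandwidth, grid step and upper bound are illustrative assumptions; the article does not state which smoothing settings were used.

```python
import math

MIN_MATCHED = 700  # home stands need at least this many matched pitches

def gaussian_kde(samples, bandwidth):
    """Return a function giving a Gaussian kernel density estimate."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

def density_apex(stands, bandwidth=0.05, step=0.01, upper=1.5):
    """Locate the peak of the smoothed absolute-error curve.

    `stands` is a list of (bias, matched_pitches) pairs; the bandwidth,
    grid step and upper bound are illustrative choices.
    """
    abs_errors = [abs(b) for b, n in stands if n >= MIN_MATCHED]
    density = gaussian_kde(abs_errors, bandwidth)
    grid = [i * step for i in range(int(upper / step) + 1)]
    return max(grid, key=density)

# Toy home stands: the last one is dropped for having too few pitches.
stands = [(0.10, 900), (-0.10, 800), (0.10, 750), (0.90, 100)]
apex = density_apex(stands)
```

Taking the absolute value before smoothing means a park that reads hot and one that reads cold by the same amount land at the same point on the curve.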

First we’ll evaluate velocity. Did the switch to Trackman radar heighten the accuracy of pitch speed readings?

The answer is yes. The red curve’s apex—Statcast’s most common velocity offset—is at 0.07 mph. That diminutive value is less than half of the 2015 and 2016 distributions’ predominant bias figure. Notice also that the Statcast-era curve peaks much higher than the two PITCHf/x-era curves and doesn’t drift as far into elevated error rates.

Whereas the 2015 and 2016 velocity distributions routinely top offsets of 0.5 mph and reach 1.0 mph, home stand bias seldom topped 0.5 mph in 2017 and virtually never eclipsed 0.65 mph. Trackman is also the clear victor on an annual basis; the radar’s average velocity error (at 0.12 mph) is about one-third of the mean values produced by the PITCHf/x cameras.

In contrast with the distinct speed trends, the error rates for horizontal location are practically identical across the PITCHf/x and Trackman systems.


In every season, the most frequent home stand offset was ~0.35 inches, with error sometimes rolling past 1.5 inches and into three-inch territory. On a yearly basis, I find the average 2015–2017 biases are tightly clustered at a half-inch, which is in line with Sportvision’s own error estimates for its PITCHf/x system. Like Arthur’s April analysis, my full-year look shows Statcast was neither better nor worse than PITCHf/x at reporting horizontal location. Does a six-month window reveal stable error rates for the other component of location as well?

For vertical location, the 2017 distribution is the clear loser of the readings battle. It peaks at a half-inch of error, whereas a quarter-inch was the per-home stand norm in 2015 and 2016. For PITCHf/x, vertical location error almost never topped 2.5 inches; for Statcast, per-home stand biases reach as high as three inches.

Given the disparity in the distributions, it’s no surprise the average annual error rate swelled from 0.42 inches in both of PITCHf/x’s final two seasons to 0.72 inches in Trackman’s inaugural year. Those values mirror the April 2017 estimates from Arthur, so full-year radar data show no improvements in vertical location tracking. And as an important aside, the escalation of error on this front should elicit hesitation in fans and pundits who push for pitch-tracking technology to replace umpires at calling balls and strikes.

Let’s move on to movement, the area where Arthur uncovered Statcast’s greatest pitch-tracking difficulties.

For horizontal movement, the three distributions are rather disparate, with no prevailing trend. On a positive note for the radar, the 2017 curve’s apex at 0.34 inches rises above nearby high points for the PITCHf/x curves, indicating offsets under an inch became more frequent in 2017. Additionally, error rates of 1–2 inches grew rarer under Trackman.

But the radar’s advantages don’t hold across the entirety of the 2017 curve, as bias levels of two-plus inches became more common. Those heightened error rates hurt Statcast in the annual results, as the average 2017 park bias is 0.85 inches, which is slightly worse than the 2015 and 2016 averages.

The silver lining on Statcast’s annual error rate is that the rise from 2016 (+0.03 inches) is slimmer than the gain found by Arthur in April. This may be a signal that Statcast’s ability to track horizontal movement improved throughout the summer and fall. Does a full-year look also reveal lessened error in vertical break?

The answer is no. Even when evaluating full-year data, there’s a continuation of the vertical movement tracking struggles previously noted by Arthur. The red Statcast distribution is nearly bimodal, as bias in vertical break frequently hits 0.41 and 0.91 inches. The 2017 curve ultimately is much flatter than the pair of camera-tracking curves because vertical movement errors exceeding an inch became much more typical last year. The annualized bias figures don’t present a prettier picture for the radar. Whereas yearly vertical movement error averaged a half-inch during the PITCHf/x era, Trackman’s inception has brought an expansion to 0.79 inches.

Change in Per-Park Error

Let’s change our perspective and pin down how velocity, location and movement offsets have expanded or contracted at the main 30 (well, 31) major league parks. We’ll identify full-year absolute error differences between 2016 PITCHf/x and 2017 Statcast values at each park.

Positive differentials depict bigger biases and poorer pitch-tracking; these results are shaded red in the heat map below. Negative numbers indicate shrunken biases and improved tracking; these figures are colored green. To put all five pitch data points on the same scale, error rates are translated into z-scores—the differentials between a park’s readings and its “elsewhere” baseline, expressed in standard deviations.
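As a rough illustration of the heat map values, the sketch below divides each season's absolute park error by a standard deviation and takes the difference. The function name and the sample numbers are hypothetical, and the sketch assumes the same standard deviation applies to both seasons.

```python
def z_score_change(bias_2016, bias_2017, sd):
    """Change in absolute error, in standard deviations.

    Positive values mean the 2017 readings sit farther from the
    "elsewhere" baseline than the 2016 readings did.
    """
    return abs(bias_2017) / sd - abs(bias_2016) / sd

# Hypothetical park: velocity ran 0.6 mph hot in 2016 and 0.2 mph cold in
# 2017, with an assumed 1.0 mph standard deviation.
change = z_score_change(0.6, -0.2, 1.0)
```

Here the change comes out to roughly -0.4, which would be shaded green because the park moved closer to its baseline even though the sign of its error flipped.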

Green shading is prominent in the velocity column as Statcast’s apparent speed-reading upgrade materialized across the major leagues. Twenty-two parks had smaller offsets in 2017 than 2016, with five sites (Camden Yards, Progressive Field, the White Sox’ Guaranteed Rate Field, Safeco Field and the Rangers’ Globe Life Park) moving more than half a standard deviation closer to baseline velocities (improvements of ~0.7 mph). Target Field was the only park where velocity readings worsened by a quarter of a standard deviation.

The location columns are interesting. On one hand, we know from the previous section that since the Trackman transition, average horizontal location error has held steady at a half-inch and vertical location error has risen to nearly three-quarters of an inch. But after dividing the offsets by the standard deviations (SDs), the heat map z-score changes look quite muted.

In each of the past two seasons, the location SDs were very close (about 8.5 inches for horizontal and 9.0 inches for vertical) despite the system changeover. So even within a pitcher/pitch-type/batter-handedness/ball-strike count/temperature subset, pitch locations vary enough to dilute the z-scores on sub-inch errors.

The upshot is that all 60 location z-score differentials fit in a narrow band of +/–0.28. And even though vertical location z-scores worsened at 20 of the 30 stadiums, it’s only by a skinny average addition of 0.08 SDs.

The red and green colors are more vivid for movement because the SDs are just 1.6 inches in both dimensions. In terms of horizontal break, the chart shows large error swings. Readings in four parks moved at least 0.75 SDs closer to baseline values, with Busch Stadium and Tropicana Field improving by 1.0 SD apiece to lead the pack. On the opposite end, five sites worsened by at least 0.5 SD. Notables are Citi Field, Petco Park and AT&T Park, which all own z-scores that rose by at least 0.9 in the transition from PITCHf/x to Statcast.

Last but not least, the heat map corroborates the finding that vertical movement readings were adversely affected by the tracking-system switch. Twenty-one of the 30 parks earn red shading for producing more vertical break error in 2017 than 2016. Of the 21, seven locales had z-scores worsen by at least 0.5, with the biggest standout being Target Field’s additional 1.13 SDs of error.

Concluding Remarks

The big takeaway from this analysis is that the accuracy of vertical location and vertical movement readings has degraded since MLBAM phased out PITCHf/x in favor of Statcast for pitch tracking. On the brighter side, overall precision is nearly unchanged for horizontal location and horizontal movement reporting. And the sunniest outlook comes from velocity, the pitch realm where the Statcast transition has brought the biggest leap forward in accuracy.

With all that in mind, if you’re an analyst who wants to evaluate pitches, how should error be handled? As was necessary in the PITCHf/x era, park biases should be pinpointed and rectified. Certainly, the importance of this endeavor has increased for the two vertical pitch attributes, but that doesn’t mean the others are out of the woods either.

The density distributions affirm offsets can change significantly after road trips, even in the face of unchanged or diminished bias overall. Over some home stands, the radar last season reported velocity drops exceeding 0.5 mph and horizontal movement decreases exceeding two inches. Changes at those levels are enough to raise concern about a pitcher’s health and ability, even though culpability lies solely with faulty data.

Perhaps radar hardware fixes or internal data quality improvements will mitigate pitch error in the coming season. Or maybe Statcast’s troubles will prove inherent to the technology. Either way, it’s advisable for researchers to delve into the data and hunt for offsets before analyzing pitches. It ensures a pitcher evaluation isn’t marred by tracking biases that lurk behind the scenes.

Appendix: Pitch Type Categories and Technical Details

To compare specific pitches to each other, I categorized types as follows:

Pitch Type Bins

Category          Types included (with their abbreviations)
Four-seamers      Four-seam fastballs (FF), general fastballs (FA)
Sinkers           Two-seam fastballs (FT), sinkers (SI)
Cutters           Cutters (FC)
Curveballs        Curveballs (CU), knucklecurves (KC)
Sliders           Sliders (SL)
Offspeed pitches  Changeups (CH), splitters (FS), forkballs (FO), screwballs (SC)
Knuckleballs      Knuckleballs (KN)

Categorizing pitches was the first step in a procedure that involved more than relabeling, as I made my own pitch type predictions where the judgment of MLBAM’s algorithm appeared off target. I took the velocity and movement components of every pitch and found their z-score distance from the average of each weapon in that pitcher’s arsenal.

My prediction was the pitch type with the smallest combined z-score—meaning it was closest to the typical attributes of that type. I elected to apply my pitch type predictions conservatively—only when they were at least 1.5 standard deviations nearer to typical velocity and movement values than MLBAM’s label. In the end, pitch types were amended for 1.3 percent of pitches across the three years.
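The relabeling rule can be sketched as below. Summing absolute z-scores across velocity and the two movement axes is an assumption, since the exact way the components were combined is not stated; the feature names and the sample arsenal are likewise hypothetical. The sketch also assumes the MLBAM label is one of the types in the pitcher's arsenal.

```python
THRESHOLD = 1.5  # required z-score improvement over the MLBAM label

def combined_z(pitch, profile):
    """Sum of absolute z-scores across velocity and both movement axes.

    `profile` maps each feature to a (mean, sd) pair for one pitch type in
    a pitcher's arsenal. Summing absolute z-scores is an assumption; the
    article does not say how the components were combined.
    """
    return sum(abs(pitch[f] - m) / s for f, (m, s) in profile.items())

def relabel(pitch, mlbam_label, arsenal):
    """Return the nearest pitch type only if it beats the MLBAM label by
    at least THRESHOLD; otherwise keep the MLBAM label."""
    scores = {t: combined_z(pitch, prof) for t, prof in arsenal.items()}
    best = min(scores, key=scores.get)
    if scores[mlbam_label] - scores[best] >= THRESHOLD:
        return best
    return mlbam_label

# Hypothetical arsenal: (mean, sd) for velocity (mph) and movement (inches).
arsenal = {
    "FF": {"velo": (93.0, 1.0), "pfx_x": (-4.0, 1.5), "pfx_z": (9.0, 1.5)},
    "SI": {"velo": (92.0, 1.0), "pfx_x": (-8.5, 1.5), "pfx_z": (5.0, 1.5)},
}
sinker_like = {"velo": 92.2, "pfx_x": -8.0, "pfx_z": 5.5}
in_between = {"velo": 92.5, "pfx_x": -6.25, "pfx_z": 7.0}

new_label = relabel(sinker_like, "FF", arsenal)  # clearly nearer to SI
kept_label = relabel(in_between, "FF", arsenal)  # no clear winner
```

The threshold keeps the procedure conservative: a pitch sitting between two types stays with its original label rather than flipping on a small difference.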

Ball-strike counts were grouped by similarity into five situations: early in the plate appearance with a neutral count; still early but with the pitcher starting to fall behind; the pitcher ahead and primed to strike the hitter out; the pitcher far behind and in trouble; and a full count.

Binning also was used on temperatures, a factor that’s important for the velocity corrections. Fahrenheit readings from Gameday were partitioned into nine-degree bins around a center of 70 degrees.
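One way to implement that binning is sketched below. It assumes the central bin spans 65.5 to 74.5 degrees with an inclusive lower edge, which is a guess at the bin edges rather than a detail given in the text, and the function name is hypothetical.

```python
import math

BIN_WIDTH = 9    # degrees Fahrenheit, per the text
BIN_CENTER = 70  # the middle bin is centered here

def temperature_bin(temp_f):
    """Return the center of the nine-degree bin containing temp_f.

    Assumes the central bin spans 65.5 to 74.5 degrees, with the lower
    edge inclusive; the article does not spell out the edges.
    """
    offset = math.floor((temp_f - BIN_CENTER + BIN_WIDTH / 2) / BIN_WIDTH)
    return BIN_CENTER + offset * BIN_WIDTH

bins = [temperature_bin(t) for t in (52, 65, 66, 70, 74, 75)]
```

Labeling each bin by its center keeps the grouping key readable when it is combined with pitch type, count group and batter handedness.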

And lastly, corrections for the 2015 season were broken up by pre- and post-All-Star Game to account for changes sparked by the juiced ball, namely any differences in pitcher approach and the manner in which pitches travel.

Gerald Schifman is the lead researcher at Crain's New York Business and a writer at The Hardball Times. He previously worked in the New York Mets' baseball operations department and in Major League Baseball's publishing department. Follow him on Twitter @gschifman.
