The Physics of RoboUmp

Are human umpires likely to make fewer mistakes than RoboUmps? (via Keith Allison)

“It ain’t nothin’ ‘til I call it.” Bill Klem thus defined the role of the umpire. He would know, since he officiated major league baseball for 37 years and still holds the record for working 18 World Series.

There has been a lot of discussion lately about having balls and strikes called robotically using MLB’s Statcast technology. In response, Commissioner Rob Manfred stated, “In all candor, that technology has a larger margin of error than we see with human umpires” as reported by Patrick Saunders of The Denver Post.

That sounds like an invitation to think about the physics associated with Statcast to understand the potential sources of these errors. First, let’s be sure we understand what “error” means as physicists use the term.

Two Types of Error

In common language, an error is something that can be remedied. In scientific analysis, we call this type of error “systematic error.” In this case, the devices we are using do not report the correct values and, in principle, we can find the problem and correct it.

However, there is another type of error called “random error.” This kind of error is intrinsic to the measurement process. It can be minimized but never completely eliminated. Random error is associated with the fact that measurements are never perfectly reproducible.

Suppose you want to get a pitching machine to fire the ball right down main street. You immediately notice all the pitches are low and outside. You can probably adjust the machine to correct this systematic error. Now, the pitches are right down the middle, but they vary pitch-to-pitch by a few inches up, down, inside, or outside of the center. This random error can only be minimized, perhaps by using a better pitching machine, but it will never be eliminated completely.
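To make the distinction concrete, here is a minimal simulation of the pitching machine. The numbers are made up for illustration (6 inches of offset, 2 inches of scatter), and the function name is hypothetical. The offset plays the role of systematic error and the scatter plays the role of random error.

```python
import random
import statistics

def simulate_pitches(n, bias, sigma, seed=0):
    """Return n simulated horizontal pitch locations in inches.

    bias  -- systematic error: a constant offset from the target at x = 0
    sigma -- random error: standard deviation of the pitch-to-pitch scatter
    """
    rng = random.Random(seed)
    return [rng.gauss(bias, sigma) for _ in range(n)]

# Hypothetical machine: throws 6 inches outside with 2 inches of scatter.
before = simulate_pitches(10_000, bias=6.0, sigma=2.0)
# Re-aiming the machine removes the offset but leaves the scatter untouched.
after = simulate_pitches(10_000, bias=0.0, sigma=2.0)

for label, pitches in (("before adjustment", before), ("after adjustment", after)):
    print(f"{label}: mean = {statistics.fmean(pitches):+.2f} in, "
          f"std dev = {statistics.pstdev(pitches):.2f} in")
```

The adjustment moves the average back to the target, but the standard deviation stays put. Only a better machine, meaning a smaller sigma, would tighten the scatter.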

There is evidence to suggest Statcast suffers from demonstrable systematic errors. Rob Arthur of FiveThirtyEight wrote about them in April of 2017 in “Baseball’s New Pitch-Tracking System Is Just A Bit Outside.” He compared Statcast pitch locations this year with PITCHf/x pitch locations from previous years. He found the average systematic error in horizontal position across major league parks for Statcast was only slightly higher than for PITCHf/x – about 0.2 inches. However, the vertical systematic error increased from under half an inch to almost 0.75 inches.

One would suspect Statcast is working to fix these systematic errors, and the current errors should drop over time as happened with PITCHf/x. After all, systematic errors can be fixed with better calibration, data analysis, and measurement techniques.

Let’s go back to the properly adjusted pitching machine to deal with random errors. If one fired thousands of pitches and recorded the number of pitches as a function of their horizontal (x) positions, the result would likely look like the graph below.

The most likely position for a given pitch is right in the middle. However, due to random errors, there is an ever-decreasing chance of the pitch actually turning up farther and farther away from the center. This type of distribution is called a normal distribution, and it is a common way to deal with random errors because one can use this curve to estimate the probability of getting any given value (or range of values) for x.
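As a rough illustration of reading probabilities off a normal curve, the sketch below uses Python's standard-library NormalDist. The 2-inch scatter is an invented value for the pitching machine, not a measured one, and prob_between is a helper name chosen here.

```python
from statistics import NormalDist

def prob_between(lo, hi, center=0.0, sigma=1.0):
    """Probability that a normally distributed position falls between lo and hi."""
    dist = NormalDist(mu=center, sigma=sigma)
    return dist.cdf(hi) - dist.cdf(lo)

SIGMA = 2.0  # inches; a made-up scatter for the pitching machine

for k in (1, 2, 3):
    p = prob_between(-k * SIGMA, k * SIGMA, sigma=SIGMA)
    print(f"within {k * SIGMA:.0f} inches of center: {p:.1%}")
```

For any normal distribution, roughly 68 percent of the values land within one standard deviation of the center, about 95 percent within two, and about 99.7 percent within three.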

The key parameter describing the normal distribution is the width of the curve. It is related to the standard deviation. The standard deviation for Statcast pitch locations is not publicly available. So, I’ll have to make some estimates.

The Statcast Mistake Rate

The goal here is to use the normal distribution to estimate the mistake rate for ball and strike calls produced by a RoboUmp using Statcast data. So, imagine a pitch that actually crosses the plate at a horizontal position, x, as shown below. We’ll assume Statcast has some standard deviation, so it could report the position of the ball in different locations with probabilities given by the normal distribution.

In the sketch above, x = 0 is the center of home plate. The ball actually crosses the plate at a position x. The position labeled D is the edge of the strike zone that is equal to half the width of the plate plus half the diameter of the ball. The blue curve is the normal distribution of Statcast-reported positions for this event. Note there is some chance this strike will be reported as a ball because the distribution is non-zero at locations greater than D.

Using this idea for every possible actual position of the ball, one can find the probability Statcast will report the pitch incorrectly. Below is a graph of the mistake probability as a function of actual position for a random error standard deviation of 0.25 inches.

You can see the error probability is 0.5 if the edge of the ball aligns with the edge of home plate (x = D = 9.95 inches). That is, this location is a 50/50 call. Farther from the edge, the probability of a missed call drops and is essentially zero when a pitch lands an inch away from the edge. This curve includes both x < D being called a ball and x > D being called a strike.
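The mistake probability at a single location can be sketched in a few lines. This is my reading of the setup rather than the exact calculation behind the graph. It assumes a 17-inch plate and a ball about 2.9 inches across, which gives D = 9.95 inches, and it looks only at the one edge of the zone on the positive-x side. The function name is hypothetical.

```python
from statistics import NormalDist

PLATE_WIDTH = 17.0    # inches, width of home plate
BALL_DIAMETER = 2.9   # inches, approximate diameter of a baseball
D = PLATE_WIDTH / 2 + BALL_DIAMETER / 2   # edge of the zone, about 9.95 inches

def mistake_probability(x, sigma, edge=D):
    """Chance the reported position lands on the wrong side of the edge.

    x     -- actual horizontal position of the pitch in inches
    sigma -- assumed random-error standard deviation in inches
    """
    reported = NormalDist(mu=x, sigma=sigma)
    p_reported_inside = reported.cdf(edge)   # reported position is inside the edge
    if x <= edge:
        return 1.0 - p_reported_inside       # actual strike reported as a ball
    return p_reported_inside                 # actual ball reported as a strike

for offset in (0.0, 0.25, 0.5, 1.0):
    p = mistake_probability(D - offset, sigma=0.25)
    print(f"{offset:.2f} in inside the edge: mistake probability {p:.5f}")
```

With a standard deviation of 0.25 inches, a pitch one inch from the edge is four standard deviations away, so the chance of a missed call there is only a few in 100,000.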

If pitches were uniformly distributed across the strike zone, the total mistake rate could be found by just adding up these probabilities. However, we know pitchers try to keep the ball near the edges of the plate. If they are actually successful, the total mistake rate should increase because the ball is more often in the mistake-prone region.

The plot above is the probability of Statcast reporting a pitch from July of 2017 in the region between the center of home plate and 16 inches to the catcher’s right. You can see pitchers are only somewhat successful at keeping the ball near the edge of the strike zone. Combining the probability of a given pitch location with the probability of a missed call by Statcast as a function of the random error standard deviation results in the plot below.

I have always heard–but have no verification of the fact–that major league umps are expected to have less than a five percent error rate. I don’t know whether this means five percent of called pitches or five percent of all pitches, but I suspect the former. Anyway, this analysis shows that as long as Statcast has small systematic errors and random errors less than about 0.9 inches, it should be as good as umpires at calling inside or outside pitches.
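The weighting described in this section can be sketched as a simple numerical average. I don't reproduce the July 2017 pitch-location data here, so a uniform spread of pitches between the center of the plate and 16 inches out stands in as a placeholder. Any other density function can be passed in. The function names and the step count are choices made for this sketch, so its numbers will not match the plot exactly.

```python
from statistics import NormalDist

D = 9.95       # inches, edge of the zone (half the plate plus half the ball)
X_MAX = 16.0   # inches, outer limit of the region considered

def mistake_probability(x, sigma, edge=D):
    """Chance the reported position lands on the wrong side of the edge."""
    p_reported_inside = NormalDist(mu=x, sigma=sigma).cdf(edge)
    return 1.0 - p_reported_inside if x <= edge else p_reported_inside

def total_mistake_rate(sigma, density=None, x_max=X_MAX, steps=4000):
    """Average the mistake probability over pitch locations (midpoint rule).

    density -- relative likelihood of a pitch at position x; defaults to
               uniform, which is a placeholder and not real pitch data
    """
    if density is None:
        density = lambda x: 1.0
    dx = x_max / steps
    weighted_sum = 0.0
    total_weight = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx            # midpoint of the i-th slice
        w = density(x) * dx
        weighted_sum += w * mistake_probability(x, sigma)
        total_weight += w
    return weighted_sum / total_weight

for sigma in (0.25, 0.5, 1.0):
    print(f"sigma = {sigma:.2f} in: mistake rate {total_mistake_rate(sigma):.2%}")
```

Under the uniform placeholder, the mistake rate grows roughly in proportion to the standard deviation. A density that piles more pitches near the edge would push the rate higher for the same standard deviation.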

The Top and Bottom of the Zone

Now we should investigate high and low pitches. Ball and strike calls here are not as cut-and-dried. The horizontal piece of the strike zone is carefully and quantitatively defined by the width of home plate. The vertical strike zone is much more nebulous. The MLB definition of the strike zone states:

“The STRIKE ZONE is that area over home plate the upper limit of which is a horizontal line at the midpoint between the top of the shoulders and the top of the uniform pants, and the lower level is a line at the hollow beneath the kneecap. The Strike Zone shall be determined from the batter’s stance as the batter is prepared to swing at a pitched ball.”

It is accompanied by the sketch below:

This definition leaves plenty of room for interpretation as far as the vertical part of the zone is concerned. Many batters have a straight upward stance and move into a crouch only as they swing. Others start in a deep crouch and become more upright as they unload. Not to mention that the knee cap may be hard to spot if the batter wears loose pants.

PITCHf/x originally used poorly paid “stringers” to sit in a dark room under the stands and manually turn a dial to set the top and bottom of the zone on the video image of the batter. Saunders reports that Statcast uses the previous calls of major league umpires to build a database of the top and bottom of the strike zone for each hitter.

Isn’t that ironic? Until MLB comes up with a machine-comprehensible definition of the top and bottom of the strike zone, machines will need the assistance of humans to define the strike zone for the machines.

Other Issues

One obvious problem is, on occasion, Statcast simply misses a pitch or a hit. Although these incidents seem to be occurring less and less frequently, if it did happen, would the RoboUmp have to declare a “do-over?” Several times during the World Series telecast, the strike zone box disappeared. Of course, we don’t know if that was a Statcast failure or a production mistake.

I also noticed during the World Series on several occasions, the replay of a pitch showed the ball in a noticeably different position than the “live action” did. Again, it is not clear if Statcast is to blame or the problem was a production issue.

One last concern for using Statcast data to power a RoboUmp involves the time required to collect the video and radar data, process it into meaningful numbers, and transmit those values to a RoboUmp. When one watches a broadcast, it appears as though the system produces the results in real time. The speed and location of the pitch appear on your TV as the pitch is caught by the catcher. It is easy to forget that the broadcast has been delayed by a few seconds for the express purpose of adding those graphics.

The time for data processing and transmitting is not available publicly. However, I have noticed it takes at least one second, sometimes longer, for the pitch speed to be posted on the scoreboard in most ballparks. It is not clear if this data comes from Statcast or some radar gun positioned behind the plate. If it is from Statcast, it would be an estimate of the processing and transmission time needed to alert a RoboUmp.

Travis Sawchik has suggested that perhaps inside/outside calls could be made by the RoboUmp while high/low calls are made by the human umpire. So, when the game comes down to the winning run on second in the bottom of the ninth and the closer fires a two-strike pitch on the black, we should wait a second or two for the scoreboard to tell us whether the game is over. That can’t happen.

Of course, processing and transmission times may drop as Statcast improves, allowing more instantaneous pitch calls. Nonetheless, we’ll still have the random errors and the issues associated with the definitions of the top and bottom of the strike zone to address.

I guess we’ll leave the last words to Bill Klem, who once replied to a rookie pitcher complaining about the strike zone, “Son, when you pitch a strike, Mr. Hornsby will let you know.” The point is that, for now, as the Commish says, calling balls and strikes must remain a human endeavor.

David Kagan is a physics professor at CSU Chico, and the self-proclaimed "Einstein of the National Pastime." Visit his website, Major League Physics, and follow him on Twitter @DrBaseballPhD.
14 Comments
Jim
6 years ago

VERY interesting, David. Thank you.

Kristopher
6 years ago

David, this is a great piece. I think it’s also fun to study the systematic error of the umpires. As a physicist, you probably understand this much better than I, but I remember sitting around watching a ball game unable to grasp umpire position. They’d place themselves in a position that essentially guaranteed a poor optical angle. If you position yourself in “the slot,” you’re creating an angle, on purpose, that makes it impossible for the human eye to judge depth! Why would you ever do this?! I really knew nothing about umpiring at that point and my quest to learn why they’d do this seemingly stupid thing ended up being incredibly enjoyable. I’ve studied pitch data off and on for 10 years, and while I knew we had to adjust outwards for the lefty strike, I didn’t know that it was determined by the physics of umpiring.

Umpirebible.com has two articles on working the zone and calling balls and strikes. They take about five minutes to read, and I cannot believe I hadn’t yet read them. How could I be a baseball fan and not have understood the systematic error of umpiring?

Philosophically, I’m hesitant to call it a benefit, but removing context from any event before error is applied seems appropriate. Computers, unlike umps, would call the same pitches regardless of team or situation. Of course, when an error occurred in an important game, we’d have people screaming that the head office in New York programmed the roboump to favour the Yankees.

Jetsy Extrano
6 years ago
Reply to  Kristopher

Removing context is a tricky idea. As long as you have measurement error, you’re going to convolve your measurement with a Bayesian prior to get your estimate. You can choose a flat prior if you want, but that actually hurts your accuracy.

In game terms, umpires today call more strikes on hitters’ counts and more balls on pitchers’ counts. Maybe from Bayes, maybe from other motivations. But if you change that you change the game quite a bit. We’ll have significantly more K and BB and fewer balls in play.

Hank G.
6 years ago
Reply to  Jetsy Extrano

Not necessarily. Once the hitters and pitchers realize that they are not going to get a freebie on certain counts, they will adjust their behavior.

The thought that the umpire changes his zone based on the pitch count has always been an anathema to me. I understand that others apparently are not bothered by this. Do we as a group accept it because it’s always been that way? I am firmly in favor of getting the call right, if possible (and practicable).

An automated strike zone might be larger than the human umpires’. How many umpires will consistently call a ball that is one millimeter over the black a strike? Some pitchers (at least according to the announcers) can get the umpire to expand the strike zone on the outside, so maybe this wouldn’t be a problem. If it was, the strike zone could always be adjusted slightly (e.g., the ball has to be more than 50% over the plate to be a strike).

The Stranger
6 years ago

I’ve generally been against RoboUmp for a variety of reasons, but I hadn’t given much thought to the challenge of calling high/low strikes and how subjective the top and bottom of the zone really are. I can think of a couple workarounds for that, but I’m not convinced that they’re better than the current system.

As a fan, though, I can accept that human umpires make bad calls. Even if it’s pretty egregious, I can still recognize that these are humans doing the best they can (and that I wouldn’t do any better). If a computer botched a critical call, it would be much harder for me to accept.

Michael
6 years ago
Reply to  The Stranger

I agree with your last point. It seems from the info here that even if it improves, this technology is not as inherently perfect (i.e., random error) as people might assume a “robot” would be. I’m open to it, and it may be better than a human ump, especially with time, but there will be disputes seemingly no matter what, and it seems easier to accept human error for me for whatever reason.

DancingInPDX
6 years ago

Awesome article! One thing that is not clear to me is the degree to which “accuracy” is actually the heart of the problem, as opposed to “consistency”. I’d argue that hitters and pitchers alike can adjust to a K-zone that is slightly inaccurate (relative to the stated rules). What I sense is far more frustrating is a K-zone that moves around. So to me there are two important margins of error, where the margin of error for accuracy, once below a certain threshold (e.g., an inch), becomes acceptable, making the margin of error for consistency more important.

Regarding the top and bottom of the zone, that problem should be easily solved by taking measurements of each player prior to the season (presumably in less baggy clothing) and having those measurements saved and fed into the system – i.e., the system knows who’s batting (yes, that would involve human interaction with the system for pinch hitters, etc., but easily solved).

And as for the occasional mistakes made by the system, my assumption is that the system itself wouldn’t actually make the call. Instead the home plate umpire, who will still be there, is informed of the system’s decision by having his hand clicker vibrate, like a cell phone, at which point the ump makes the call as he does today (either agreeing with or overriding the system). But clearly that’s not practical if the lag issue cited can’t be resolved.

francis_soyer
6 years ago

They’ve been doing it in Tennis for years. Zero controversy.

As for hi-lo calls. Measure the players and have a strike zone based off of their heights.

It never made sense for a Rickey Henderson to crouch his way into the record books to begin with.

It’s about time the old hi-lo zones were abandoned, they were never consistent to begin with.

The Stranger
6 years ago
Reply to  francis_soyer

Honestly, you could make a reasonable case that the strike zone should just be the same height above the ground for all players regardless of height. It would be a fundamental change to the rules, but there’s an inherent fairness in making every batter responsible for hitting pitches in the same area. It’s not like short basketball players get to shoot at lower baskets.

Hank G.
6 years ago

It sounds as though they are very close to being able to use automated balls and strikes, if they can get the processing and transmitting times down to an acceptable level. The error rate seems to be better than many human umpires already. The vertical strike zone issue could be addressed by processing all players (say in spring training) and then keeping a database on each player and his vertical strike zone. Then the identity of the player would be entered and the system would know his strike zone. There would probably have to be a process where a player could ask to be tested again if his batting stance changed or even if he simply felt it was in error.

One advantage of automating the calls which you did not address (rightly because it was outside the purview of the article) is that with automation, even if it was no better than human umpires, it would be consistent. I would think that both superior hitters and pitchers would gain an advantage from knowing what the exact strike zone (within the margin of error) would be.

Hank G.
6 years ago
Reply to  Hank G.

Another side effect would be that catcher framing would no longer be a useful skill.

v2micca
6 years ago
Reply to  Hank G.

I am okay with that.

Marc Schneider
6 years ago
Reply to  Hank G.

If you did this, how would players argue with the ump and waste more time? The point about tennis is interesting; the advent of the challenge system has eliminated all the McEnroe-esque arguments and, to me, made it more enjoyable. The players are usually wrong in challenging a call, and not surprisingly. Even at my hacker level, it’s often hard to tell if a ball is in or out. I suspect it’s no different with baseball players.

v2micca
6 years ago
Reply to  Hank G.

Honestly, if the technology advances that far, they could simply re-calibrate the system before each game during the players’ batting practice.