An Exploration of MLB Umpires’ Strike Zones

Bill Miller’s strike zone is this wide. (via Eric Enfermero)

During Game Five of the 2017 World Series, home plate umpire Bill Miller became somewhat of a household name by calling strikes on several pitches that appeared to be outside the zone. The bad calls often favored the Astros, with Dodgers batters repeatedly looking miffed in an eventual 13-12 loss.

Ignoring the merits of each call, we’ll raise a different question concerning Miller’s performance: Should players and the media have been surprised?

In this article, we look at how to identify a strike zone for each umpire. That way, whenever you’re tuned into a game, you can check back to see if and where your team might get the benefit of a borderline call. Next, we’ll describe how the observed between-umpire variability in strike zone size is unlikely to be due to chance. Finally, we’ll look at last year’s numbers to see how each ump looked in 2017.

Turns out, in calling more strikes against the Dodgers and Astros, Miller wasn’t doing anything unusual. Our findings suggest he boasts one of the widest strike zones among major league umpires.

Measuring umpire performance

One common way of assessing umpire performance is to take a binary outcome variable (was a taken pitch called a strike or a ball?) and compare it to a binary explanatory variable (was a taken pitch actually a strike or a ball?). While this and other types of accuracy measures are informative (as in this FanGraphs article), they also lose information. Because they ignore exactly where each pitch crossed the plate, a pitch right down the middle, for example, is treated the same as one an inch inside the corner of the plate.
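
To make the information loss concrete, here is a minimal Python sketch of a binary accuracy measure. The rectangular zone boundaries, the function names and the four pitches are all made up for illustration; they are not taken from the data used in this article.

```python
# Hypothetical rectangular zone, in feet, centered on the plate (assumed values).
ZONE_LEFT, ZONE_RIGHT = -0.83, 0.83
ZONE_BOTTOM, ZONE_TOP = 1.5, 3.5

def in_zone(px, pz):
    """True if the pitch location falls inside the assumed rectangular zone."""
    return ZONE_LEFT <= px <= ZONE_RIGHT and ZONE_BOTTOM <= pz <= ZONE_TOP

def accuracy(pitches):
    """Share of taken pitches where the call matches the rectangular zone."""
    correct = sum(1 for px, pz, called_strike in pitches
                  if in_zone(px, pz) == called_strike)
    return correct / len(pitches)

# Made-up taken pitches: (horizontal ft, height ft, called strike?)
pitches = [
    (0.00, 2.5, True),   # right down the middle, called a strike
    (0.75, 2.5, True),   # about an inch inside the corner, called a strike
    (0.90, 2.5, True),   # just off the edge, called a strike (a miss)
    (1.50, 2.5, False),  # well outside, called a ball
]
print(accuracy(pitches))  # 0.75
```

The first two pitches each count as one correct call, even though one was an easy decision and the other a close one. That distinction is what a location-based model can recover.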

Fortunately, MLB provides the exact location of each taken pitch as it crossed the plate. Given that the strike zone—at least how it’s called by umpires—more closely resembles an oval than a rectangle, generalized additive models (for more on GAMs, see examples here, here, and here) are a recommended tool. GAMs are attractive for strike identification in that an analyst does not need to, a priori, identify the exact association between pitch location and strike likelihood and instead can let the data drive the most plausible relationship.

Our goal is to use GAMs to learn about each umpire. To start, we grabbed pitch-level data for the 2008-2016 regular seasons from Baseball Savant using the “baseballr” package in R. This was merged with umpire data (e.g., the umpire for each game) that was generously provided by Brian Mills. Next, we fit a GAM for each umpire to identify the likelihood of taken pitches being called a strike and estimated from this model the percent chance a taken pitch is called a strike at each location around the plate. Finally, we compared each umpire’s estimated zone with one estimated on all umpires across the major leagues to roughly identify where each umpire has called either fewer or more strikes.
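
The comparison step can be illustrated with a rough Python sketch. This is not the GAM fit described above: it swaps in a much cruder technique, the raw called-strike rate within square grid cells, purely to show the idea of subtracting a league-wide surface from an umpire's surface. The function names, the 0.25-foot cell size and the sample pitches are assumptions.

```python
from collections import defaultdict

def strike_rate_grid(pitches, cell=0.25):
    """Map each grid cell to the share of taken pitches called strikes there."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [called strikes, taken pitches]
    for px, pz, called_strike in pitches:
        key = (int(px // cell), int(pz // cell))
        counts[key][0] += int(called_strike)
        counts[key][1] += 1
    return {key: strikes / total for key, (strikes, total) in counts.items()}

def zone_difference(ump_pitches, league_pitches, cell=0.25):
    """Umpire strike rate minus league strike rate, for cells both have seen."""
    ump = strike_rate_grid(ump_pitches, cell)
    league = strike_rate_grid(league_pitches, cell)
    return {key: ump[key] - league[key] for key in ump if key in league}

# Made-up taken pitches: (horizontal ft, height ft, called strike?)
league = [(0.9, 2.5, True), (0.9, 2.5, False), (0.9, 2.5, False),
          (0.9, 2.5, False), (0.0, 2.5, True)]
ump = [(0.9, 2.5, True), (0.9, 2.5, False)]

print(zone_difference(ump, league))  # {(3, 10): 0.25}
```

A positive value marks a cell where the umpire calls more strikes than the league as a whole, and a negative value marks fewer. A smooth model such as a GAM serves the same purpose without the noise that raw cells pick up when few pitches land in them.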

Miller being Miller

Let’s start with the aforementioned Bill Miller.

Here’s a chart of how Miller’s strike zone compares to the major league average. Green portions of the graph reflect locations where Miller calls more strikes than a league-average ump, while the part in purple corresponds to fewer strikes. The strike zone viewpoint is that of the catcher; that is, it reflects what he is looking out at, with the umpire behind him, and is faceted to reflect both right-handed (R) and left-handed (L) batters.

Across nearly the entire fringe of the strike zone, Miller calls more strikes, up to 27 percentage points higher (shown in the dark green shades of the graph) than an average major league ump. High pitches, low pitches, inside pitches, outside pitches, pitches to left-handed hitters, and pitches to right-handed hitters—Miller is almost always calling more strikes.

From 2009 to 2016, Miller called an estimated 1,100 more strikes—roughly four per game—than the average umpire would have. Indeed, it seems his wide strike zone in the 2017 World Series was nothing but consistent with his past.

How about other umps?

A wide strike zone for Miller is one thing, but how do other umps compare?

Here’s a chart with nine selected umpires, chosen both for the uniqueness of their strike zone shapes and because each called at least 3,000 pitches during each year of our sample.


In the top row, Gerry Davis and Greg Gibson stand out as having two of the tightest strike zones in the game (mostly purple, or fewer strikes), and we include Fieldin Culbreth (top right graph) as an ump whose numbers are somewhat close to the average. Gerry Davis’ inclusion as the tightest ump over the last decade coincides with the fact that he was also considered to have the major leagues’ smallest zone way back in 2007.

In the middle row, with Joe West, ball locations to the catcher’s right side lead to fewer strikes (in purple), with more strikes to the catcher’s left side (in green). Interestingly, CB Bucknor’s strike zone is almost a reflection (across the middle of the plate) of West’s. Meanwhile, with Jerry Meals (middle row, middle column), the strike zone varies based on batter handedness, with Meals more apt to call a strike on the outside corner than the inside corner.

In the bottom row, Doug Eddings and Miller stand out with the two biggest strike zones.

Although we couldn’t fit every ump on the chart above, we made a gif using each ump who called at least 3,000 pitches during each season between 2008 and 2016 (there are 41 umps shown, arranged in order from least-to-most pitcher friendly).

What would these charts look like if all umps were equivalent?

In fitting a separate GAM for each umpire, one potential worry is that we’re overfitting the data, which could yield exaggerated signals that may not reflect each umpire’s true propensity to call strikes. Although statistical testing can help in this regard, it’s perhaps as pertinent to replicate the charts above, except this time assume umpires were randomly assigned to each pitch.

Here’s a figure that shows what our modeling of umpire strike zones would look like if there were truly no differences between umps. For this chart, pitches were randomly assigned to one of the nine umpires above, such that the overall sample size reflected the actual number of pitches they called.


If ball and strike calls were truly random among umpires, we’d see very little of the signal we actually observe. In fact, among pitches on the border of the strike zone, the standard deviation between umpire strike call rates is about nine times what would be observed due to chance alone.
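
One way to run this kind of chance comparison is a label shuffle, sketched below in Python. The exact calculation behind the figure above is not spelled out here, so treat this as an assumption-laden illustration: the function names, the three hypothetical umpires and their strike counts are invented.

```python
import random
import statistics

def strike_rate_sd(labels, calls):
    """Standard deviation of per-umpire called-strike rates."""
    totals = {}
    for ump, called_strike in zip(labels, calls):
        strikes, n = totals.get(ump, (0, 0))
        totals[ump] = (strikes + int(called_strike), n + 1)
    rates = [strikes / n for strikes, n in totals.values()]
    return statistics.pstdev(rates)

def sd_ratio(labels, calls, n_shuffles=200, seed=0):
    """Observed between-umpire SD divided by the average SD under shuffling."""
    rng = random.Random(seed)
    observed = strike_rate_sd(labels, calls)
    shuffled = list(labels)
    null_sds = []
    for _ in range(n_shuffles):
        # Shuffling keeps each umpire's pitch count but breaks any real
        # link between the umpire and the call.
        rng.shuffle(shuffled)
        null_sds.append(strike_rate_sd(shuffled, calls))
    return observed / statistics.mean(null_sds)

# Made-up borderline pitches: three hypothetical umpires, 500 pitches each,
# with called-strike rates of 30, 50 and 70 percent.
labels, calls = [], []
for ump, strikes in [("A", 150), ("B", 250), ("C", 350)]:
    labels += [ump] * 500
    calls += [True] * strikes + [False] * (500 - strikes)

print(sd_ratio(labels, calls))
```

A ratio near one would mean the spread between umpires is about what random assignment produces. A ratio well above one points to real differences between umpires.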

What’d the strike zone look like last year?

Between the 2016 and 2017 seasons, MLB’s pitch tracking software turned over from PITCHf/x to Trackman. As a result, it’s conceivable that a few related changes impacted the league’s overall strike zone. Alternatively, umps may have changed their behavior since the PITCHf/x era.

To take a more recent look at umpire strike zones, we looked within 2017 games called by each ump to see which ones tended to call more strikes and which tended to call more balls. (This is done by using each taken pitch in each game to determine the expected number of called strikes, which we compared to the actual number of called strikes.)
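
The parenthetical above can be sketched in a few lines of Python. The league-wide probability function here is a hypothetical stand-in with made-up numbers, not a fitted model, and the function names and sample game are likewise invented for illustration.

```python
def strikes_above_expected(game_pitches, league_prob):
    """Actual called strikes minus the league-expected count for one game."""
    expected = sum(league_prob(px, pz) for px, pz, _ in game_pitches)
    actual = sum(1 for _, _, called_strike in game_pitches if called_strike)
    return actual - expected

def toy_league_prob(px, pz):
    """Hypothetical stand-in for a fitted league-wide model (made-up numbers)."""
    if abs(px) <= 0.5 and 2.0 <= pz <= 3.0:
        return 1.0   # heart of the zone
    if abs(px) <= 1.0 and 1.5 <= pz <= 3.5:
        return 0.5   # borderline band
    return 0.0       # clearly outside

# Made-up taken pitches from one game: (horizontal ft, height ft, called strike?)
game = [
    (0.0, 2.5, True),
    (0.9, 2.5, True),
    (0.9, 3.2, True),
    (-0.8, 1.6, False),
    (1.6, 2.5, False),
]
print(strikes_above_expected(game, toy_league_prob))  # 0.5
```

Summing probabilities over every taken pitch gives the number of called strikes a league-average umpire would be expected to produce on the same pitches. That way a game is not penalized or rewarded for where the pitchers happened to throw.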

The following chart shows a boxplot of per-game deviations from the league average in strike calls for each ump who called at least 25 games in 2017. Umpires on the left of the chart (starting with Tom Woodring) mostly called games with fewer called strikes than expected, while umpires on the right (ending with Doug Eddings) generally called games with more called strikes than expected. Moving from Woodring to Eddings reflects a difference in medians of about 10 called strikes per game.
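
The ordering in a chart like this can be reproduced by taking each umpire's median per-game deviation, as in the hedged Python sketch below. The umpire names and deviations are placeholders, and `rank_umpires` is a hypothetical helper; only the 25-game cutoff comes from the text.

```python
import statistics
from collections import defaultdict

def rank_umpires(game_deviations, min_games=25):
    """Sort umpires by median per-game strikes above expected, fewest first."""
    by_ump = defaultdict(list)
    for ump, deviation in game_deviations:
        by_ump[ump].append(deviation)
    medians = {ump: statistics.median(devs)
               for ump, devs in by_ump.items() if len(devs) >= min_games}
    return sorted(medians.items(), key=lambda item: item[1])

# Made-up (umpire, strikes above expected) pairs, one per game.
games = ([("Ump A", -4.0)] * 25    # consistently fewer strikes than expected
         + [("Ump B", 3.0)] * 25   # consistently more strikes than expected
         + [("Ump C", 0.0)] * 10)  # too few games, so dropped

print(rank_umpires(games))  # [('Ump A', -4.0), ('Ump B', 3.0)]
```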

It’s also worth noting that all umps were associated with some games above and below expectations as far as strike calls. And most umps are fairly accurate; exactly half of the 76 umps shown were observed to have an average game-level zone within one strike of the major league average. Two specific umps stood out based on a combination of both strike zone accuracy and consistency from game to game: Laz Diaz and Gerry Davis, whose boxplots are lightly highlighted above.

Davis’ move to being one of the most consistent umps is quite interesting. Recall that in using games prior to 2017, we earlier had found him to have one of the league’s least favorable strike zones for pitchers. As one possibility, this seems to highlight that umps can change how they call strikes over time. As recently as the 2014 season, Davis was the second-most stingy umpire by this same approach before climbing closer to the average in both 2015 (fourth-fewest strikes) and 2016 (eighth-fewest strikes).

Conclusion

This article is a snapshot of how statistical modeling can teach us a bit about umpire-level strike zones. In turn, we identified significant between-umpire differences in strike zone sizes.

Two uses for this analysis stand out. First, pitchers and catchers would be well served to know, a priori, the locations where each umpire tends to call strikes or balls (with the caveat that umps can change their zone at any given moment if they feel like it). Teams may well be doing some of this research already behind closed doors. Second, as fans, perhaps an understanding of similar strike zone maps or charts can better prepare us for games like Miller’s in last year’s World Series.


Michael Lopez is an Assistant Professor of Statistics at Skidmore College. Sadie Lewis is a soon-to-be graduate of Skidmore College and a major in the Department of Mathematics and Statistics.
32 Comments
francis_soyer
6 years ago

Great analysis.

Automate the strike zone already – this is ridiculous. May as well get rid of the foul lines and poles while we’re giving the “human element” a leg up on readily available technology.

He called *more* strikes in a 13-12 game ? Imagine if he had a legitimate strike zone.

bjkell
6 years ago
Reply to  francis_soyer

I disagree with automating the strike zone. I think as long as an ump calls a game consistently for both teams that it is fair. I also think that differences in how umps call games is one of those small nuances in baseball that make the game so deep and interesting. Replacing that margin for error that comes from the human element would reduce the game to machine error which, in my opinion, is more frustrating than the human element to see make mistakes.

hopbitters
6 years ago
Reply to  bjkell

I like elements. Sometimes humans, too. So I guess I’m in the human element camp.

francis_soyer
6 years ago
Reply to  hopbitters

Let’s get rid of the goalposts in the NFL and let humans tell us if the kick is good.

hopbitters
6 years ago
Reply to  francis_soyer

Sarcasm aside, the difference in that scenario is one team is kicking and the other team has no reply to the outcome based on the referee’s call or anticipated call. On balls and strikes, it’s a multiple person interaction. The pitcher-catcher combination is trying to throw the pitch in a certain location and manner to effect a reaction from either the batter or the umpire, possibly both. The batter is trying to gauge, in very short order, if it’s a pitch he should swing at, based on his skills versus the pitcher, but also the umpire’s strike zone. It’s a complex interaction. So you think of it as adding additional accuracy (and you’re absolutely right in the regard), but I see it as removing a layer of complexity that I find interesting. People are different.

Allan Wood
6 years ago
Reply to  bjkell

Many people give the “as long as he’s consistent” line. But being consistently wrong is not what we want, either. I have been watching baseball for 45 years and I have yet to see an umpire be consistent from inning to inning, let alone game to game and month to month and season to season. I believe it is humanly impossible. Also, the strike zone is defined in the RULE BOOK. When people say this guy is a “pitcher’s umpire” or a “hitter’s umpire”, it’s also an admission that he’s unable to uphold the rules of the game. The number of overturned calls on not-so-close plays should also hasten the arrival of the robots!

jamesbrooks
6 years ago
Reply to  Allan Wood

You’ve watched baseball for 45 years and, yet, you’re progressive and not blindly rooted in tradition. Thank you sir, we need more like you.

Rotoholic
6 years ago
Reply to  bjkell

Humans are just poorly calibrated machines. It’s more frustrating because people have the expectation that machines are perfect.

sprenten
6 years ago
Reply to  bjkell

The best argument against automation is the cost to automate across all levels of play. The only level where automation is possible is the MLB level. That’s a huge adaptation to make as a player for both batters and pitchers. What the current automation allows is grading umpires. Pitches on the edge should have some consistency, but there needs to be little room for pitches clearly inside or outside of the zone being called incorrectly.

kenai kings
6 years ago
Reply to  francis_soyer

Miller calls on average FOUR more strikes per game… of roughly the 250 pitches per game.
Not so bad I say. If he represents the worst margin of error… less than 2%; then the system is in good shape.

kenai kings
6 years ago

Considering the number of pitches per AB on average to be 3+ and so many batters take the first strike. When you add up the ab’s for both teams… 54. My math suggests double that # to 100. As well, batter or pitcher friendly umps go both ways… the problem one are those not consistent.
On another note. Do any of your graphs recognize the depth of the strike zone? Given that home plate lies horizontal and most times we only consider the vertical plane… I wonder about the complete picture here?

GoNYGoNYGoGo
6 years ago

Just imagine the outcry if/when legal betting is allowed on MLB. Would be a scary thing.

The Umpires’ Union needs to work w MLB and the PA to allow some changes to how umpires get/keep assignments. Why is such a poor ball/strike umpire allowed to be behind home plate on such an important game? Why do some umpires who have trouble calling balls/strikes but who may be very good on other umpire duties keep rotating behind home plate? Why do some umpires who are just bad at their job overall (looking at you CB Bucknor and Angel Hernandez) stay in the majors?

francis_soyer
6 years ago
Reply to  GoNYGoNYGoGo

Legal betting is allowed on MLB.

Doug
6 years ago

One thing that stands out in Miller’s graph is that although he is overall quite pitcher-friendly, he doesn’t call the “lefty strike” to the extent that most do. Then I wonder if that’s a sufficiently large effect that it should influence lineup/rotation decisions. How far in advance do teams know about umpiring assignments?

Also, no surprise that Angel Hernandez’s pattern is just an ugly mishmash of clashing colors. Got to love all that purple in the middle of the zone. Man, does he need to retire.

John LaRue
6 years ago

Those individual strike zone charts by umpire are really beautiful data viz.

Iko29
6 years ago

This is great. I`ve always wondered how a bigger or smaller strike zone affects run scoring, if it does at all. It seems intuitive that a bigger strike zone helps the pitcher – I wonder if the data backs that up?

jsolid
6 years ago

This is awesome, can’t believe it wasn’t all in one place before.
One request, it would be helpful to see the plot of how often balls in each location are called strikes. Then we would know what the umps are being compared to. (I’ve seen that plot before, just can’t find it right now. Plus, you make such pretty pictures.)

ccctl
6 years ago

DL’d and split the gif, dumped to my phone so I can know when to expect a “wierd” strike zone while at the ballpark.

kenai kings
6 years ago

My observations this season are that many more high strikes are being called. That is good.

kenai kings
6 years ago

So. The conversation here seems to center on replace the umps. Go to an automated system.
Let’s imagine how that works out for the hitter. They miss the pitch by a couple of inches… oh no! No ump to ask, ‘where was that pitch?’. Likewise for the pitcher and catcher. “accurate or not” players like to know/feel how balls and strikes are being called.
AND, who’s to monitor the automation? MLB, back in NYC!! Heck, they have their hands full with replays.
I will always say…”People makes mistakes, computers screw things up.”

Eskuire
6 years ago

I expected Angel Hernandez up here to have just a giant black block between both batters box’. Only to see him not here. DISAPPOINT. (fantastic read by the way, enjoyed it immensely)

Doug
6 years ago
Reply to  Eskuire

He’s in the full series (animated GIF in the middle). And just as ugly as you’d expect.

sprenten
6 years ago

I have to imagine Hawk Harrelson reading this article and saying “How can we trust automation? I know these umpires aren’t calling this small of a zone against our boys.”.

incagold1121
6 years ago

The truth is, they could start using an Automated Strike Zone today and we would never notice unless they tell us. The zone would be set by the official rule book. The batter can still swing or not. If not, and it is deemed a strike, a ding is heard by the Home Plate Umpire in his ear and he calls it a strike. If not, he does not. What could possibly be wrong with this? You may say the computer is not 100% accurate, but it would be a lot more accurate than what we have. There should not be high or low or inside or outside strike zones according to the umpire. It should just be the right strike zone. And a consistent strike zone.

kenai kings
6 years ago
Reply to  incagold1121

The strike zone is different for each batter… top of knees to uniform letters for each batters typical stance. How can a computer eye-ball that?

channelclemente
6 years ago

IMO, it’s the nonlinear nature of the Game that makes it so unbelievably interesting. I can’t help but think an automated strike zone is tantamount to cutting ones nose off to spite one’s face.

Mac
6 years ago

Fantastic stuff! An under-rated part of the game that has huge implications and should be talked about more.