Who watches the watchers?

Nothing will rile a fan up quicker than the notion that his team has been cheated out of a victory, a run, or the outcome of a single plate appearance through the fault of poor officiating. But how much of an effect can poor umpires have? Let’s focus in on one particular aspect of umpiring and look at how the home plate umpire calls his strike zone.

Before beginning, one caveat is that we absolutely cannot argue the umpire’s strike zone based upon what we see on television. Why not? Because of parallax. Because the center field camera used in TV broadcasts is neither in straightaway center nor at ground level, it cannot give an accurate impression of where the ball crosses the plate. It’s the same reason you can’t accurately read the speedometer of a car while in the passenger seat—in order to correctly judge the position of the needle, you need to be looking straight on.

We could, of course, look at ball tracking data, like that provided by PITCHf/x. But that only tells us whether a pitch is called a ball or strike. It doesn’t tell us whether balls and strikes are being called accurately, at least not initially. And we only have PITCHf/x data for the past two years, so it tells us nothing about baseball before 2007. So can we judge how well an umpire judges the strike zone without pitch tracking data? We can certainly try.

Unfortunately, we can’t go out there and perform controlled experiments, repeating the same at-bat over and over again with different umpires. So what we want to do is compare batter-pitcher matchups that are similar in every way except for the umpire behind the plate. For instance, we can look at all plate appearances between Keith Foulke and Michael Young in Arlington in 2008, and break down the results by who was umpiring. If we want to evaluate one particular umpire, we compare the results when he was behind the plate for that specific set of players (and environment) to the results for all other umpires who saw that same set of players. Analyst Tom Tango likes to call this a “with or without you” approach. In this case, we look at how a specific batter-pitcher matchup resolves itself with and without the umpire in question.

Given enough of these matched pairs over enough plate appearances, we can safely assume that the only change between the two sets of data is the umpire involved, and that any difference in walk rate between the two groups is due to the change in umpiring. That gives us a baseline for comparison, though this won’t tell us who is calling the “correct” strike zone unless the average umpire is calling the zone correctly, of course. But it will at least give us a yardstick against which we can measure umpires.
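To make the matched-pairs idea concrete, here is a minimal Python sketch of a "with or without you" walk-rate comparison. It is an illustration under stated assumptions, not the query actually used for this study: the record layout (dictionaries with `batter`, `pitcher`, `park`, `year`, `umpire`, and `event` keys), the `"BB"` event code, and the choice to weight each matchup by the umpire's own plate appearances are all hypothetical.

```python
from collections import defaultdict


def wowy_walk_diff(plate_appearances, umpire):
    """Compare an umpire's walk rate with the rate other umpires
    produced in the same batter/pitcher/park/year matchups.

    Returns (diff_per_pa, matched_pa). The weighting (by the umpire's
    own PA in each matchup) is an assumption for this sketch.
    """
    # matchup key -> [walks_with, pa_with, walks_without, pa_without]
    tallies = defaultdict(lambda: [0, 0, 0, 0])
    for pa in plate_appearances:
        key = (pa["batter"], pa["pitcher"], pa["park"], pa["year"])
        offset = 0 if pa["umpire"] == umpire else 2
        if pa["event"] == "BB":  # "BB" is a hypothetical event code
            tallies[key][offset] += 1
        tallies[key][offset + 1] += 1

    actual = expected = matched_pa = 0.0
    for bb_with, pa_with, bb_without, pa_without in tallies.values():
        if pa_with == 0 or pa_without == 0:
            continue  # not a matched pair: one side never saw this matchup
        actual += bb_with
        # Walks we would expect if this umpire behaved like the others.
        expected += pa_with * (bb_without / pa_without)
        matched_pa += pa_with

    if matched_pa == 0:
        return None, 0
    return (actual - expected) / matched_pa, int(matched_pa)


def make_pa(umpire, event):
    """Build one made-up plate appearance for a single fixed matchup."""
    return {"batter": "batter_a", "pitcher": "pitcher_b", "park": "ARL",
            "year": 2008, "umpire": umpire, "event": event}


# Invented sample data: ump_1 sees 2 PA, the other umpires see 4.
sample = [
    make_pa("ump_1", "BB"), make_pa("ump_1", "K"),
    make_pa("ump_2", "K"), make_pa("ump_2", "K"),
    make_pa("ump_3", "BB"), make_pa("ump_3", "1B"),
]
diff, matched = wowy_walk_diff(sample, "ump_1")
print(diff, matched)
```

The same function could be repeated for strikeouts, singles, and so on by changing the event being counted. With only a handful of plate appearances per matchup the result is mostly noise, which is why the comparison only becomes meaningful over a large number of matched pairs.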

(Now, thanks to the fine folks at Retrosheet.org, we have play-by-play data going all the way back to 1953. To keep results relevant to our modern offensive context (and to keep my computer from spewing smoke out of its fan), I limited the query set to 1993 and beyond.)

Looking at all umpires with at least 1,000 plate appearances observed in the study, here’s what we get for the greatest difference between umpires in rates per plate appearance:

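|        | BB_DIFF | SO_DIFF | 1B_DIFF | 2B_DIFF | 3B_DIFF | HR_DIFF |
|--------|---------|---------|---------|---------|---------|---------|
| Min.   | -0.02   | -0.03   | -0.02   | -0.02   | 0.00    | -0.01   |
| Max.   | 0.02    | 0.02    | 0.03    | 0.01    | 0.01    | 0.01    |
| Range  | 0.04    | 0.05    | 0.05    | 0.03    | 0.01    | 0.02    |
| Per650 | 28.6    | 33.2    | 31.0    | 18.3    | 6.1     | 16.0    |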

So what does this table mean? In the BB_DIFF column, the “Min.” figure means that the umpire least likely to call a walk called 0.02 fewer walks per plate appearance than average; the “Max.” figure means that the umpire most likely to call a walk called 0.02 more walks per plate appearance than average. “Range” is the difference between the min and the max, and the Per650 of 28.6 means that the ump with the largest strike zone would call roughly 29 fewer walks than the ump with the smallest strike zone over 650 plate appearances, which is the average number of PAs for an MLB starting player.
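The Per650 row is simply a per-plate-appearance rate difference scaled up to 650 plate appearances. The short Python sketch below (the function names are mine, chosen for illustration) shows that arithmetic, and also why multiplying the rounded table entries will not reproduce the Per650 row exactly: the rates are displayed to two decimal places, while the Per650 figures were presumably computed from unrounded values.

```python
PA_PER_SEASON = 650  # plate appearances for a typical full-time starter


def per_650(rate_diff, pa=PA_PER_SEASON):
    """Scale a per-PA rate difference to a season's worth of PA."""
    return rate_diff * pa


def implied_rate(per650_value, pa=PA_PER_SEASON):
    """Back out the per-PA rate difference behind a Per650 figure."""
    return per650_value / pa


# Walk range as displayed in the table (rounded to two decimals).
print(round(per_650(0.04), 1))

# Per-PA walk range implied by the published Per650 of 28.6.
print(round(implied_rate(28.6), 3))
```

The rounded range of 0.04 scales to 26 walks, while the published 28.6 corresponds to a range of about 0.044 per plate appearance, which rounds to the 0.04 shown.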

Now, bear in mind that these are the most drastic differences observed between umpires with significant time behind the dish; the difference between any two umpires is likely to be much smaller than that. We can measure the typical spread using the standard deviation: assuming that umpiring talent is normally distributed, roughly 68 percent of umpires will fall within one standard deviation of the average.

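|        | BB_DIFF | SO_DIFF | 1B_DIFF | 2B_DIFF | 3B_DIFF | HR_DIFF |
|--------|---------|---------|---------|---------|---------|---------|
| StdDev | 0.01    | 0.01    | 0.01    | 0.00    | 0.00    | 0.00    |
| Per650 | 4.6     | 6.1     | 5.2     | 3.1     | 1.0     | 2.7     |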

In other words, over 650 plate appearances, a typical umpire will be within 4.6 walks and 6.1 strikeouts of the average, plus or minus.

Now, it’s easy to conceive of how an umpire can affect the walk and strikeout rates of the hitters and pitchers involved, but what about the singles and home runs? It’s important to bear in mind that baseball players are not automatons, but thinking people who learn and adapt to their circumstances. If a hitter knows that a certain umpire calls a wider strike zone, he’ll swing at more outside pitches, which will lead to increased strikeouts, but also lead to weaker contact and fewer hits. Conversely, if a pitcher is facing an umpire with a smaller zone, he’ll be forced to stay closer to the heart of the zone and give hitters a meatier pitch to hit.

So which umpires come closest to calling the average strike zone, and which are furthest away? We can figure it out using the Pythagorean Theorem (the one dealing with triangles, not the one dealing with run differential). Essentially, we figure out the distance of each rate stat from the average, and then treat those distances as the sides of a triangle. (Okay, so an n-dimensional figure in space.) This is essentially the way PECOTA figures its similarity scores. For ease of use, the scores are placed on a scale of 100 to 0, where 100 is identical and 0 is not similar at all.
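As a rough illustration of that calculation, the Python sketch below combines six rate differences into a single straight-line (Euclidean) distance and then rescales it with the conversion formula given in the resources note at the end of this article, (1-x*10)*100. Whether the individual rates were weighted before being combined is not stated, so the unweighted sum of squares here is an assumption, and the example umpire's numbers are invented.

```python
import math


def distance_from_average(diffs):
    """Euclidean distance of an umpire's rate differences from zero.

    `diffs` holds the per-PA differences from average (BB, SO, 1B,
    2B, 3B, HR). Treating every component equally is an assumption.
    """
    return math.sqrt(sum(d * d for d in diffs))


def sim_score(distance):
    """Rescale a distance to the 100-to-0 similarity scale."""
    return (1 - distance * 10) * 100


# Invented umpire: slightly more walks, slightly fewer strikeouts.
example_diffs = (0.003, -0.004, 0.0, 0.0, 0.0, 0.0)
x = distance_from_average(example_diffs)
print(round(x, 4), round(sim_score(x), 1))
```

An umpire who matches the average in every category has a distance of zero and a score of 100. Because the scale is linear, any distance of 0.1 or more per plate appearance would score zero or below.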

Closest to the average zone:

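| Umpire           | PA    | Sim |
|------------------|-------|-----|
| Mike Winters     | 12459 | 96  |
| Dale Ford        | 4060  | 96  |
| Mike Everitt     | 8460  | 95  |
| Larry Vanover    | 10406 | 95  |
| Jim Joyce        | 10980 | 94  |
| Tim Welke        | 11608 | 93  |
| Ted Barrett      | 10364 | 93  |
| Chuck Meriwether | 12277 | 93  |
| Joe West         | 10270 | 93  |
| Mike Reilly      | 12292 | 93  |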

Farthest from the average zone:

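| Umpire         | PA   | Sim |
|----------------|------|-----|
| Greg Bonin     | 4749 | 74  |
| John McSherry  | 1665 | 74  |
| Larry McCoy    | 4056 | 73  |
| Adrian Johnson | 1173 | 71  |
| Scott Higgins  | 1003 | 70  |
| Mike Vanvleet  | 1381 | 69  |
| Matt Hollowell | 2917 | 69  |
| Jim Evans      | 4694 | 67  |
| Kevin Kelley   | 1355 | 66  |
| Vic Voltaggio  | 1197 | 66  |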

We should note that the umpires closest to the average zone had far more plate appearances observed than the umpires furthest away; over time, MLB seems to do a pretty good job of weeding out the umpires whose zones are out of step with the others.

One more thing for you to take from this article: like the weather and the park, the umpire is something both teams have to deal with. In a particular game, sure, it can make a difference between winning and losing for a team. But over the course of the season, well, it typically evens out.

References & Resources
The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at http://www.retrosheet.org.

The formula for converting the distance estimates to more palatable sim scores is:

(1-x*10)*100

This is based on trial and error, and isn’t supposed to represent anything particularly meaningful.

