MLB Umpires: 2016 Review, World Series Preview

by Jon Roegele
October 28, 2016

John Hirschbeck is the crew chief in his final World Series as an umpire. (via Keith Allison)

As part of the MLB-watching general public, one fault I believe we have is forgetting how exceptionally talented the people are who grace our screens on a daily basis. We are quick to write off the hitter mired in a 1-for-27 slump, or the reliever who gives up runs in four straight appearances. “Why is this guy on the team?” we hear ourselves asking. We forget how many levels of baseball this player has worked his way through, and how many thousands of other baseball players from Little League all the way through Triple-A he has beaten out just to make it onto your screen.

We notice these relative failures because we keep track of these players closely. Their stats can be found on any number of websites, broken down in almost any manner you’d ever like to see. We buy jerseys with their names on the back. Kids emulate them at the local park.

While the players are the people on the screen who are supposed to get all the attention, there are other people on that screen who also have risen through the ranks to make it to the highest possible level in their profession – the MLB level. These people are the umpires.

Umpires have a very, very difficult job. Even if we restrict the entirety of their jobs to only calling balls and strikes, the role is very challenging. Batters are different heights. Pitchers throw different types of pitches from all kinds of angles. Pitchers throw harder than ever. Catchers are angling to receive pitches in a manner that makes borderline pitches look more like strikes.

Home plate umpires have to make roughly 150 judgments per game in real time on pitches the general viewing audience gets to see painted on the screen overlaying a supposed strike zone grid. The mere fact that every strike zone grid I’ve seen on any broadcast is rectangular in shape tells me it does not represent a strike zone that any umpire in the league would call.

Much like the slumping players, we are quick to grow frustrated with umpires who appear to be failing. We all know the difference between a 1-1 pitch on the edge being given as a strike instead of a ball, putting the pitcher in the driver’s seat or letting the hitter sit on a fastball while ahead in the count.

While I think we all could learn to remember how difficult the jobs are of these people on our screens, there are a couple of differences when it comes to complaining about an umpire compared to a player. The first is that we believe umpires are supposed to be “invisible,” to do their jobs without drawing any attention. Some believe we could or even should utilize robots to replace their judgment. The second is that generally speaking, we don’t follow umpires on a day-to-day basis. We don’t know all of their names, we don’t look at their stats, and we don’t collect their baseball cards. You probably don’t remember who was umpiring behind home plate at the last game you went to see, but you probably do remember who the starting pitcher was for your favorite team that day.

All of this said, since the introduction of PITCHf/x in stadiums around the league a decade ago, every pitch location as it crosses home plate is tracked, and every umpire ball or strike decision is recorded. I have been monitoring and measuring the MLB called strike zone for a number of years now, so I know on aggregate how the strike zone is called in the majors. What I can do then is investigate how each individual umpire calls the strike zone as compared to the aggregate zone for the league.

The method of measuring individual umpires I used is taken from a suggestion by Tom Tango on his website last year. The idea is that many called pitches in a game don’t tell us much about an umpire’s strike zone. Pitches taken in the heart of the plate and pitches in the dirt always are called strikes and balls, respectively, by all umpires around the league. Where things get more interesting is where home plate umpires call pitches in areas where there is no consensus. Basically, around the edges of the strike zone.

Since I am using the aggregate MLB strike zone as the standard for this metric, I calculated an MLB-wide called-strike percentage over the entire regular season for each square inch above the front plane of home plate for both left-handed hitters and right-handed hitters. Umpire calls are given positive value for a call if it agrees with the majority based on the pitch location. The magnitude of that value depends on how likely a pitch in that location is called a strike. If on average, a pitch in a particular location is called a strike 60 percent of the time, then it is of course called a ball the other 40 percent of the time. If an umpire calls a pitch in that location a strike, he is awarded a positive score based on agreeing with the way that pitch was usually called around the league in that season.
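As a rough illustration of the aggregate step, here is a minimal Python sketch that bins called pitches into one-inch squares, separately by batter handedness, and computes the called-strike fraction in each square. The tuple layout and the function name are hypothetical; the article does not describe the actual data format or code used.

```python
import math
from collections import defaultdict

def called_strike_rates(pitches):
    """Map each (stand, x_bin, z_bin) cell to its called-strike fraction.

    `pitches` is an iterable of (stand, px_inches, pz_inches, is_strike)
    tuples. This layout is an assumption made for illustration only.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [called strikes, total calls]
    for stand, px, pz, is_strike in pitches:
        # One-inch square bins at the front plane of home plate.
        cell = (stand, math.floor(px), math.floor(pz))
        counts[cell][0] += int(is_strike)
        counts[cell][1] += 1
    return {cell: strikes / total for cell, (strikes, total) in counts.items()}

# Made-up sample pitches, not real PITCHf/x data.
sample = [
    ("R", 8.2, 30.5, True),
    ("R", 8.7, 30.1, True),
    ("R", 8.9, 30.9, False),
    ("L", 8.2, 30.5, False),
    ("R", -0.5, 20.0, True),
]
rates = called_strike_rates(sample)
```

Each value in `rates` plays the role of the location's league-wide called-strike percentage in the scoring described next.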

In the 60 percent example above, an umpire who called the pitch a strike would have (0.60 strike% – 0.40 ball%) * 0.40 ball% = 0.08 added to his expected call score. Had he called it a ball, he would have been docked (0.60 strike% – 0.40 ball%) * 0.60 strike% = 0.12 from his expected call score. This scheme weights calls in a reasonable manner based on how “easy” the call should have been, including attributing no value to calls made on pitches that are always called strikes or balls.

As Bryan Cole pointed out in the comments of this post, the formula works out to:

(2p – 1) * (c – p)

where p is the probability of the pitch being called a strike in the aggregate, and

c is the home plate umpire’s call (1 for strike, 0 for ball)
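The formula can be checked against the 60 percent example with a few lines of Python. The function name `call_score` is my own label for illustration, not something from the article.

```python
def call_score(p, c):
    """Score one call.

    p: league called-strike rate at the pitch location (0 to 1)
    c: the umpire's call, 1 for strike and 0 for ball
    """
    return (2 * p - 1) * (c - p)

print(round(call_score(0.60, 1), 2))  # 0.08, strike on a 60 percent pitch
print(round(call_score(0.60, 0), 2))  # -0.12, ball on a 60 percent pitch
print(call_score(1.0, 1))             # 0.0, a pitch every umpire calls a strike
```

A positive result means the call agreed with the league majority at that location, and a negative result means it disagreed. A pitch called a strike exactly half the time scores zero either way.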

Using this method, I separately summed each umpire’s expected score (agreeing calls) and unexpected score (disagreeing calls). I then took the ratio of expected to unexpected score and converted each ratio to a “plus” stat, Expected+, by dividing by the league average ratio. The ratio puts umpires with different numbers of opportunities on the same scale, and the “plus” stat puts scores on a scale where 100 is league average. Every point above or below 100 means the umpire’s calls behind the plate are one percent more or less expected than those of the average MLB umpire.
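A minimal sketch of that conversion follows. The league average ratio is passed in as a number because the article does not say whether it is an average of umpire ratios or a pooled league ratio. The function names and sample values are hypothetical.

```python
def expected_ratio(scores):
    """Ratio of summed agreeing-call value to summed disagreeing-call value.

    `scores` holds per-call values from the (2p - 1) * (c - p) formula.
    Raises ZeroDivisionError if the umpire made no disagreeing calls.
    """
    expected = sum(s for s in scores if s > 0)
    unexpected = -sum(s for s in scores if s < 0)
    return expected / unexpected

def plus_stat(ratio, league_ratio):
    """Rescale a ratio so that the league average lands at 100."""
    return 100 * ratio / league_ratio

# Made-up scores: three agreeing calls and one disagreeing call.
umpire_scores = [0.08, 0.08, 0.08, -0.12]
ratio = expected_ratio(umpire_scores)            # about 2.0
expected_plus = plus_stat(ratio, league_ratio=1.6)  # about 125
```

Under these invented numbers the umpire would rate about 25 percent more expected than average.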

It is important to understand what this metric is measuring. Note that I use the word expected. It is ascribing positive value to calls that agree with the league majority for a given pitch location and negative value otherwise. This does not necessarily mean the call is correct, based on the rule book strike zone, which means this does not necessarily identify the best home plate umpire. An umpire may have a unique strike zone he calls quite consistently, and there is certainly an argument to be made that being consistent in any one zone is fine. However, this metric rates umpires on how well their calls agree with what is being called around the league.

In my opinion, given that all home plate umpires are evaluated and provided with feedback after each game, over the course of an entire season the aggregate of all called pitches is a good proxy for the strike zone the league wishes to have called. I also believe there is value in understanding how much an umpire tends to differ from the league average as far as the calls he is making, in terms of how expected his calls are and how often his unexpected calls are strikes on pitches that are more commonly balls and vice versa.

I calculated these numbers for all umpires in both the 2015 and 2016 seasons. Taking the seventy umpires who worked the most behind home plate over both seasons, there was a correlation of 0.66 in the Expected+ scores between the seasons. This suggests calling balls and strikes per the typically called league zone has a sizable degree of skill that is repeatable between seasons for umpires.
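For readers who want to reproduce this kind of check, here is a small Pearson correlation sketch in Python. The article does not state which correlation measure was used, so Pearson is an assumption, and the paired scores below are invented rather than taken from the tables.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical Expected+ scores for the same five umpires in two seasons.
season_one = [120, 105, 98, 90, 84]
season_two = [115, 110, 95, 96, 80]
r = pearson(season_one, season_two)
```

A value near 1 would mean umpires keep their relative standing from one season to the next, and a value near 0 would mean the prior season tells us little.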

Here are the results of the 2016 season for Expected+ for all umpires.

Umpire Name	2016 Expected+
Jim Joyce	139
Mark Ripperger	135
Pat Hoberg	132
James Hoye	128
Chad Fairchild	127
Toby Basner	125
Mark Carlson	125
Ben May	124
Alan Porter	123
Greg Gibson	122
Bill Welke	121
Roberto Ortiz	118
Quinn Wolcott	116
D.J. Reyburn	115
Eric Cooper	114
Todd Tichenor	112
Adam Hamari	111
Mike Muchlinski	111
Sam Holbrook	111
Phil Cuzzi	110
Gabe Morales	110
Stu Scheurwater	110
David Rackley	110
Tony Randazzo	109
Marvin Hudson	108
Brian Knight	108
Chris Guccione	106
Jerry Meals	106
Jim Reynolds	106
Manny Gonzalez	105
Will Little	105
Mark Wegner	105
Alfonso Marquez	105
Cory Blaser	105
Jeff Kellogg	104
Sean Barber	104
John Tumpane	104
Brian Gorman	104
Paul Emmel	103
Mike Estabrook	102
Fieldin Culbreth	100
Brian O’Nora	98
Bill Miller	98
Ramon De Jesus	97
Chris Conroy	97
Doug Eddings	97
Joe West	97
Mike DiMuro	97
Ryan Blakney	97
Tripp Gibson	96
Tim Timmons	96
Scott Barry	95
Paul Nauert	95
Mike Everitt	94
Dan Bellino	94
Dan Iassogna	94
Jim Wolf	93
Chris Segal	93
Marty Foster	93
Dana DeMuth	92
Ted Barrett	92
Vic Carapazza	92
Gerry Davis	91
Chad Whitson	91
Gary Cederstrom	90
Adrian Johnson	90
Clint Fagan	89
Laz Diaz	89
Jeff Nelson	89
Jerry Layne	88
Mike Winters	88
Carlos Torres	88
Larry Vanover	87
CB Bucknor	86
Rob Drake	85
Jordan Baker	84
Lance Barksdale	84
Tom Woodring	84
Nic Lentz	83
Andy Fletcher	83
Ron Kulpa	83
Hunter Wendelstedt	82
Lance Barrett	82
Angel Hernandez	82
Kerwin Danley	82
Tom Hallion	81
Ed Hickox	81
John Hirschbeck	78
Bob Davidson	77
Dale Scott	72

Aside from this metric, I also drilled down into the unexpected calls made by each umpire to see the ratio of scores from pitches they called strikes when the league typically called the pitch a ball, and called balls when the league majority was a strike. This acts as somewhat of a proxy for strike zone size, as home plate umpires who call more unexpected strikes relative to balls than normal would tend to have what we perceive as a larger strike zone, and a smaller-than-average ratio would tend to indicate a smaller strike zone.
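One way to sketch this ratio in Python is to split the value of an umpire's disagreeing calls by whether the call was a strike or a ball. The `(p, c)` input layout and the function name are hypothetical, and dividing the result by a league ratio and multiplying by 100 would give the “plus” version.

```python
def unexpected_strike_ball_ratio(calls):
    """Value of unexpected strikes divided by value of unexpected balls.

    `calls` is an iterable of (p, c) pairs: p is the league called-strike
    rate at the location, c is 1 for a called strike and 0 for a ball.
    """
    strike_value = 0.0
    ball_value = 0.0
    for p, c in calls:
        score = (2 * p - 1) * (c - p)
        if score < 0:  # the call went against the league majority
            if c == 1:
                strike_value += -score
            else:
                ball_value += -score
    return strike_value / ball_value

# Made-up calls: two unexpected strikes, one unexpected ball, one expected strike.
sample_calls = [(0.3, 1), (0.2, 1), (0.7, 0), (0.9, 1)]
sb_ratio = unexpected_strike_ball_ratio(sample_calls)
```

A ratio above the league's would point to a larger perceived zone, and a ratio below it to a smaller one.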

Once again, I adjusted the ratios to a “plus” stat. Here are the unexpected strike-to-ball ratio scores, or Unexpected S:B+, for the 2016 season:

Umpire Name	2016 Unexpected S:B+
Bill Miller	231
Jim Wolf	214
Bob Davidson	184
Brian Gorman	175
Roberto Ortiz	172
Doug Eddings	166
Stu Scheurwater	166
Lance Barrett	164
Mike Estabrook	162
Chris Segal	157
Eric Cooper	148
Hunter Wendelstedt	146
CB Bucknor	137
Will Little	137
Ben May	135
Kerwin Danley	133
Mike Everitt	130
Tripp Gibson	128
Andy Fletcher	126
Ed Hickox	125
Ted Barrett	123
Nic Lentz	120
Ron Kulpa	119
Jeff Nelson	116
Fieldin Culbreth	115
Mike DiMuro	115
Dan Iassogna	114
Jerry Layne	113
Jeff Kellogg	111
John Hirschbeck	111
Quinn Wolcott	109
Cory Blaser	109
Tim Timmons	108
Phil Cuzzi	106
Marty Foster	106
Vic Carapazza	105
Toby Basner	105
Marvin Hudson	102
Jim Reynolds	101
Brian Knight	98
Carlos Torres	98
Adam Hamari	98
Adrian Johnson	98
Mark Ripperger	97
Tony Randazzo	95
Angel Hernandez	94
Brian O’Nora	92
Mike Winters	90
Rob Drake	86
Lance Barksdale	86
Paul Emmel	86
Laz Diaz	86
Chris Guccione	85
Gabe Morales	84
David Rackley	83
Gary Cederstrom	83
Jim Joyce	81
Dan Bellino	81
Jordan Baker	81
Chad Fairchild	81
Chris Conroy	80
John Tumpane	80
Dana DeMuth	78
Mike Muchlinski	75
Sean Barber	75
Dale Scott	72
Sam Holbrook	70
James Hoye	70
Ramon De Jesus	69
Clint Fagan	68
Alan Porter	67
Chad Whitson	66
Mark Wegner	66
Pat Hoberg	64
Ryan Blakney	64
Bill Welke	63
Jerry Meals	63
D.J. Reyburn	61
Todd Tichenor	61
Joe West	60
Paul Nauert	59
Tom Hallion	58
Greg Gibson	57
Gerry Davis	56
Scott Barry	54
Manny Gonzalez	54
Larry Vanover	53
Alfonso Marquez	52
Mark Carlson	49
Tom Woodring	29

Note that Tom Woodring did not work many games behind home plate, so his extremely low score here reflects a small sample size. Do not take this to mean Bill Miller’s strike zone is 131 percent larger than the league average! Obviously, this could not be the case. It means that on calls Miller makes that are counter to the league norm, he is much more likely to be calling strikes when most umpires call balls than balls when most umpires call strikes.

The correlation between 2015 and 2016 scores for the busiest seventy umpires was 0.69, meaning once again this is an aspect of game calling that umpires do seem to carry significantly from season-to-season.

There was a correlation of -0.30 between Expected+ and Unexpected S:B+ in 2016, meaning there was value in having a slightly smaller zone this year in trying to conform to the league majority. This correlation was only -0.11 in 2015. Umpires tend to make more unexpected calls by calling pitches that are typically called balls as strikes than the other way around, so umpires that are less susceptible to this pattern tend to have marginally higher Expected+ scores under this system.

The most interesting home plate umpire to me after undertaking this exercise is Mark Ripperger. His Expected+ score in 2015 of 156 was by far the highest of any umpire in either of the last two seasons, with a difference between his score and the second-place score greater than the difference between second place and thirty-third place that season. Ripperger followed that up with the second-highest Expected+ score in 2016. He seems to have an excellent grasp on the strike zone being called in the league right now.

Another fun exercise I tried was looking at the most expected called game of the 2016 regular season. The game that had the best pitch calling with respect to matching the league aggregate was almost a perfect game from Brian Knight! There was only one “unexpected” pitch call, which was a called ball on a pitch location that was called a strike 51 percent of the time over the course of the season.

2016 World Series

Here is a game-by-game view of the home plate umpires assigned for the World Series games this season based on the perspective offered from these metrics. You’ll notice these umpires do not all call the most expected strike zones out of the set of umpires working in 2016. As I mentioned earlier, umpires’ jobs are very difficult, and in this article we have only been examining them through the lens of pitch calling. They are also responsible for game management, calling plays at bases, fair/foul judgments, and much more. I would expect the league to base the selection process for umpires on their entire body of work and to include seniority and undoubtedly other factors when considering postseason assignments.

(Editor’s note: This article was written before the World Series began. We thought readers would be interested in knowing how all the scheduled home plate umpires rated.)

Game One: Larry Vanover
Expected+: 73rd (out of 90)
Unexpected S:B+: 87th (out of 90)
Vanover is at the extreme end among MLB umpires with respect to calling pitches typically called strikes as balls. His small zone tends to favor the hitter, and thus may be a challenge to navigate for Corey Kluber and Jon Lester. According to Baseball Prospectus, the strikeout-to-walk ratio in games he worked this season was 79th out of 90. Aside from a small zone, his calls did not line up as well with the expected MLB zone as most home plate umpires this season. Vanover worked Game Three of the NLDS between the Cubs and Giants.

Game Two: Chris Guccione
Expected+: 27th (out of 90)
Unexpected S:B+: 53rd (out of 90)
Guccione has a slightly smaller strike zone than average, as well, but he called a somewhat better-than-average expected zone. His results from 2015 were almost identical, so his calling pattern appears to be consistent. Guccione worked Game Two of the NLDS between the Dodgers and the Nationals.

Game Three: John Hirschbeck
Expected+: 88th (out of 90)
Unexpected S:B+: 30th (out of 90)
Set to retire after the season, Hirschbeck is the crew chief for the World Series. He seems to call one of the more unique zones, as his calls rated as one of the most unexpected in the game this season. He still calls one of the larger zones in the game, although it was less extreme this season than in 2015. Baseball Prospectus has his strikeout-to-walk ratio as 21st highest out of 90. This is his fifth World Series assignment and a nice way to complete his final season.

Game Four: Marvin Hudson
Expected+: 25th (out of 90)
Unexpected S:B+: 38th (out of 90)
Hudson rated higher than average in how well his pitch calling matched the typical league zone this season. His metrics are also very similar to the previous season, so he seems to have settled into a relatively consistent pattern of calling pitches. Hudson was behind the plate for Game Four of the NLDS when the Cubs knocked out the Giants, so John Lackey already has pitched to his zone in this postseason.

Game Five: Tony Randazzo
Expected+: 24th (out of 90)
Unexpected S:B+: 45th (out of 90)
Randazzo also called a zone that matched the aggregate zone quite closely. His unexpected calls tend to be extra strikes and extra balls near the league average ratio. Randazzo was the home plate umpire for Game Three of the ALDS between Cleveland and Boston, the Red Sox’s final game of the 2016 season, when Josh Tomlin made the start.

Game Six: Joe West
Expected+: 47th (out of 90)
Unexpected S:B+: 80th (out of 90)
West called more pitches unexpectedly as balls this season than most umpires, meaning his strike zone was smaller than most. Baseball Prospectus noted his strikeout-to-walk ratio as 71st out of 90. This is his sixth World Series assignment, making him and John Hirschbeck the most experienced World Series umpires working this year.

Game Seven: Sam Holbrook
Expected+: 19th (out of 90)
Unexpected S:B+: 67th (out of 90)
Holbrook starts the World Series as the replay umpire before moving onto the field for Game Three. His pitch calling rated well as far as calling to the typical MLB zone this season. Holbrook worked Game Three of the ALDS when the Blue Jays finished the sweep of the Rangers.

Enjoy the World Series everyone!

UPDATE: Here are the results for the first two games.

Game One:
Expected+: Top 68% of games from 2016
Unexpected S:B+: Top 58% of games from 2016

Game Two:
Expected+: Top 9% of games from 2016
Unexpected S:B+: Top 4% of games from 2016
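As the author explains in the comments, these percentages come from ordering every 2016 game by each metric and noting how far down the list the World Series game falls. A minimal Python sketch of that reading is below. The function name and the sample values are hypothetical, and treating higher values as the top of the list is an assumption.

```python
def percent_down_list(game_value, all_values):
    """Position of a game in a best-first ordering, as a percentage.

    Assumes higher values sort first. Tied games share the best position.
    """
    position = 1 + sum(1 for v in all_values if v > game_value)
    return 100 * position / len(all_values)

# Made-up per-game Expected+ values for five games.
game_scores = [150, 120, 100, 90, 80]
print(percent_down_list(90, game_scores))  # 80.0, fourth of five games
```

Read this way, “Top 9%” means the game sits near the head of the list and “Top 68%” means it sits about two-thirds of the way down.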

References and Resources

All data from Fangraphs unless otherwise noted.

Jon Roegele is a baseball analyst and writer for The Hardball Times. He was nominated for a SABR Analytics Conference Research Award in 2014 and 2015. Follow him on Twitter @MLBPlayerAnalys.

15 Comments

Alex

8 years ago

Thanks Jon, really interesting stuff.

Is there anything to the notion that umpires can be (subconsciously?) biased towards the home team?

So could you split the stats between calling for home team and away team to see which umpire sees the biggest change?

Cheers.

Jon Roegele

8 years ago

Reply to Alex

Yes I actually looked at home field advantage in this recent article: http://www.hardballtimes.com/the-2016-strike-zone/

There has been a small advantage for the home team in every year of the PITCHf/x era, and the magnitude of that advantage appears to grow significantly as game leverage increases.

I did not find any correlation from season-to-season for umpires showing tendencies toward more home field advantage than others, so I don’t think it is predictive in that way. I suspect it is just human nature and ends up being circumstantial.

Jim Anderson

8 years ago

Reply to Alex

The home plate ump tonight was clearly trying to cheat the Indians, he made 17 wrong calls on balls and strikes against the Indians. It was pathetic. I live in Chicago, at least I am honest.

Jarod Garza

8 years ago

Reply to Jim Anderson

Yes dude, he was trying to “cheat” the Indians…in a World Series game. Because that’s what these guys do…they go out there with complete disregard for their jobs in the grandest setting in baseball and intentionally “cheat” teams. In a given game they see close to 200 of the fastest and dirtiest moving pitches from best pitchers in the game and have to make judgement calls where a half inch is the difference between a ball and a strike. Had an off night, maybe, but he certainly was not trying to cheat anyone.

John Riley

8 years ago

Reply to Jarod Garza

Jarod, you should stick to coaching t-ball as you obviously are not a baseball guy, dude.

Todd Huff

8 years ago

Reply to Jarod Garza

1/2 inch off the plate? Wow, you must have pretty bad eyesight. Those pitches clearly all went against the Indians and was so one sided it was pathetic.

Dennis Bedard

8 years ago

The TV strike zone is a one dimensional picture frame that I believe is super imposed in front of the plate. However, the strike zone properly defined is three dimensional and any ball that enters the “box” is a strike. Thus, a breaking ball could look to be outside as it traverses the strike zone but then break into the zone behind the imaginary box. I always wondered if a pitcher could develop a pitch that was thrown high in the air and then dropped down into the strike zone at a perfect vertical angle. It would enter the strike zone at the top and then hit the center of home plate. This pitch would be impossible to hit. I remember Steven Talbot, a Yankee pitcher in the Horace Clarke era, throwing a blooper pitch that got a lot of laughs but obviously the pitch (and him) never made it.

Jon Roegele

8 years ago

Reply to Dennis Bedard

Yes you’re right it is a 3-D zone in reality, so “backdoor” strikes are possible. There is a good article on this subject here: http://www.hardballtimes.com/analyzing-the-strike-zone-as-a-three-dimensional-volume/

Dubslow

8 years ago

Do I interpret the Game Results right by saying that Game 1 was average or a touch better than average on both metrics, while Game 2 was a pretty bad outlier having a consistently smaller than typical zone?

Jon Roegele

8 years ago

Reply to Dubslow

Sorry I wrote the update in the middle of the night, so I get that may not be clear. What I did was order all games from the 2016 season based on expected zone and unexpected strike:ball ratio.

Game 1 was below average as far as matching the correct zone, as it slotted in 68% of the way down the list. The unexpected strike:ball ratio was also smaller than average. So both of these lined up with the 2016 numbers for this umpire.

Game 2 was much above average, as it appears less than 9% of the way down the list. This lined up with this umpire being above average for expected calls in 2016. The game was also very high on unexpected strike:ball ratio, which was abnormal.

Hopefully that makes more sense?

Gary Growe

8 years ago

Aren’t we talking about Steve Hamilton’s “Folly Floater” in the exchange re the 3-D strike zone?

Dennis Bedard

8 years ago

Ah yes! How could I confuse Hamilton for Talbot. It was Fred and not Steve Talbot. But Steve Hamilton was definitely the player who threw what purported to be a vertical strike.

Guy

8 years ago

Jon:
Great work, as usual. Is it possible to give us a sense of how large the difference is between high and low Unexpected S:B+ umpires, in terms of their average impact on a game? For example, could you tell us how many more/fewer strikes are called per game, and how R/G compare, between the top 30 and bottom 30 umps (or whatever grouping you think is appropriate)?

Kincaid

8 years ago

This is really interesting. Good work.

One thing to note is that the negative correlation between Expected+ and Unexpected S:B+ is probably expected and not necessarily a sign that a smaller zone correlates with conformity to the league-wide zone. That’s because Unexpected S:B+ uses a ratio with called balls as the denominator, which means it won’t give symmetrical scores for high-strike-call umps and high-ball-call umps.

For example, say the average ratio is 1:1 and you have two umpires, one of whom calls 100 strikes and 50 balls, and the other 50 strikes and 100 balls. The former will have a ratio of 2:1 for an Unexpected S:B+ of 200, while the latter will have a ratio of 1:2 for an Unexpected S:B+ of 50, which is closer to the average Unexpected S:B+ score than the first ump. Because both are equally extreme in their ball-strike tendencies but the high-strike-call ump has a more extreme Unexpected S:B+, you’ll probably get a negative correlation even in the absence of any relationship between strike zone size and Expected+.
