MLB Umpires: 2016 Review, World Series Preview
As part of the MLB-watching general public, one fault I believe we have is forgetting how exceptionally talented the people are who grace our screens on a daily basis. We are quick to write off the hitter mired in a 1-for-27 slump, or the reliever who gives up runs in four straight appearances. “Why is this guy on the team?” we hear ourselves asking. We forget how many levels of baseball exist that this player has worked his way through, how many thousands of other baseball players from Little League all the way through Triple-A he has beaten out just to make it onto your screen.
We notice these relative failures because we keep track of these players closely. Their stats can be found on any number of websites broken down in almost any manner you’d ever like to see. We buy jerseys with their names on the back. Kids emulate them at the local park.
While the players are the people on the screen who are supposed to get all the attention, there are other people on that screen who also have risen through the ranks to make it to the highest possible level in their profession – the MLB level. These people are the umpires.
Umpires have a very, very difficult job. Even if we restrict the entirety of their jobs to only calling balls and strikes, the role is very challenging. Batters are different heights. Pitchers throw different types of pitches from all kinds of angles. Pitchers throw harder than ever. Catchers are angling to receive pitches in a manner that makes borderline pitches look more like strikes.
Home plate umpires have to make roughly 150 judgments per game in real time on pitches the general viewing audience gets to see painted on the screen overlaying a supposed strike zone grid. The mere fact that every strike zone grid I’ve seen on any broadcast is rectangular in shape tells me it does not represent a strike zone that any umpire in the league would call.
Much like the slumping players, we are quick to grow frustrated with umpires who appear to be failing. We all know the difference between a 1-1 pitch on the edge being given as a strike instead of a ball, putting the pitcher in the driver’s seat or letting the hitter sit on a fastball while ahead in the count.
While I think we all could learn to remember how difficult the jobs are of these people on our screens, there are a couple of differences when it comes to complaining about an umpire compared to a player. The first is that we believe umpires are supposed to be “invisible,” to do their jobs without drawing any attention. Some believe we could or even should utilize robots to replace their judgment. The second is that generally speaking, we don’t follow umpires on a day-to-day basis. We don’t know all of their names, we don’t look at their stats, and we don’t collect their baseball cards. You probably don’t remember who was umpiring behind home plate at the last game you went to see, but you probably do remember who the starting pitcher was for your favorite team that day.
All of this said, since the introduction of PITCHf/x in stadiums around the league a decade ago, every pitch location as it crosses home plate is tracked, and every umpire ball or strike decision is recorded. I have been monitoring and measuring the MLB called strike zone for a number of years now, so I know on aggregate how the strike zone is called in the majors. What I can do then is investigate how each individual umpire calls the strike zone as compared to the aggregate zone for the league.
The method of measuring individual umpires I used is taken from a suggestion by Tom Tango on his website last year. The idea is that many called pitches in a game don’t tell us much about an umpire’s strike zone. Pitches taken in the heart of the plate and pitches in the dirt always are called strikes and balls, respectively, by all umpires around the league. Where things get more interesting is where home plate umpires call pitches in areas where there is no consensus. Basically, around the edges of the strike zone.
Since I am using the aggregate MLB strike zone as the standard for this metric, I calculated an MLB-wide called-strike percentage over the entire regular season for each square inch above the front plane of home plate for both left-handed hitters and right-handed hitters. Umpire calls are given positive value for a call if it agrees with the majority based on the pitch location. The magnitude of that value depends on how likely a pitch in that location is called a strike. If on average, a pitch in a particular location is called a strike 60 percent of the time, then it is of course called a ball the other 40 percent of the time. If an umpire calls a pitch in that location a strike, he is awarded a positive score based on agreeing with the way that pitch was usually called around the league in that season.
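The aggregate zone described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the tuple layout `(side, x_in, z_in, is_strike)`, the function name `strike_rates`, and the use of `floor` to form one-inch-square bins are all hypothetical choices, not the code actually used for this article.

```python
from collections import defaultdict
from math import floor


def strike_rates(pitches):
    """Map (batter_side, x_bin, z_bin) -> share of called pitches ruled strikes.

    Each pitch is a hypothetical tuple (side, x_in, z_in, is_strike), with the
    location given in inches where the ball crosses the front plane of the plate.
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [strikes, total called pitches]
    for side, x_in, z_in, is_strike in pitches:
        key = (side, floor(x_in), floor(z_in))  # assumed one-inch-square bins
        counts[key][0] += int(is_strike)
        counts[key][1] += 1
    return {key: strikes / total for key, (strikes, total) in counts.items()}


# Made-up pitches, for illustration only.
sample = [
    ("R", 9.2, 30.5, True),
    ("R", 9.7, 30.1, True),
    ("R", 9.4, 30.9, False),
    ("L", 9.4, 30.9, False),
]
rates = strike_rates(sample)
```

Keeping the batter's side in the bin key is what gives left-handed and right-handed hitters separate zones, as described above.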
In the example above, an umpire would receive (0.60 strike% – 0.40 ball%) * 0.40 ball% = 0.08 toward his expected call score. Had he called it a ball, he would have been docked (0.60 strike% – 0.40 ball%) * 0.60 strike% = 0.12 from his expected call score. This scheme weights calls in a reasonable manner based on how “easy” the call should have been, including attributing no value to calls made on pitches that are always called strikes or balls.
As Bryan Cole pointed out in the comments of this post, the formula works out to:
(2p – 1) * (c – p)
where p is the probability of the pitch being called a strike in the aggregate, and
c is the home plate umpire’s call (1 for strike, 0 for ball)
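As a quick check of the formula, here is a minimal Python sketch. The function name `call_score` and its argument names are my own for illustration; only the expression (2p – 1) * (c – p) comes from the text.

```python
def call_score(p, called_strike):
    """Score one call as (2p - 1) * (c - p).

    p is the league-wide probability that a pitch in this location is called
    a strike; called_strike is the umpire's actual call.
    """
    c = 1.0 if called_strike else 0.0
    return (2.0 * p - 1.0) * (c - p)


# The 60 percent location from the example above.
agree = call_score(0.60, True)      # strike call, matching the majority
disagree = call_score(0.60, False)  # ball call, against the majority
```

Up to floating-point rounding, `agree` and `disagree` come out to the 0.08 and -0.12 worked out above. The score is zero when p is 0 or 1, and also when p is exactly 0.5, where there is no majority to agree with.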
Using this method, I calculated separately the sum of each umpire’s expected score (agreeing calls) and their unexpected score (disagreeing calls). Once I had the expected and unexpected scores, I calculated a ratio of expected to unexpected scores and then converted each ratio to a “plus” stat, Expected+, by dividing by the league average ratio. The ratio puts umpires with different numbers of opportunities on the same scale, and then the “plus” stat puts scores on a scale where 100 is league average. Every point above or below 100 means the umpire’s calls behind the plate are one percent more or less expected than those of the average MLB umpire.
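A rough sketch of that conversion in Python follows. The helper names are hypothetical, and the league average ratio is passed in as a plain number because the text does not say whether it is an average of umpire ratios or a ratio of pooled scores.

```python
def expected_ratio(scores):
    """Ratio of summed agreeing scores to summed disagreeing scores.

    scores holds per-call values of (2p - 1) * (c - p); positive values are
    expected calls and negative values are unexpected calls.
    """
    expected = sum(s for s in scores if s > 0)
    unexpected = -sum(s for s in scores if s < 0)
    # Assumes at least one unexpected call; otherwise this divides by zero.
    return expected / unexpected


def expected_plus(umpire_ratio, league_ratio):
    """Scale a ratio so that 100 equals the league average."""
    return 100.0 * umpire_ratio / league_ratio


# Made-up per-call scores and a made-up league ratio, for illustration only.
ratio = expected_ratio([0.08, 0.08, 0.12, -0.12])
plus = expected_plus(ratio, 2.0)
```

With these placeholder numbers the umpire's ratio is a bit above the assumed league ratio, so the plus stat lands above 100.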
It is important to understand what this metric is measuring. Note that I use the word expected. It is ascribing positive value to calls that agree with the league majority for a given pitch location and negative value otherwise. This does not necessarily mean the call is correct, based on the rule book strike zone, which means this does not necessarily identify the best home plate umpire. An umpire may have a unique strike zone he calls quite consistently, and there is certainly an argument to be made that being consistent in any one zone is fine. However, this metric rates umpires on how well their calls agree with what is being called around the league.
In my opinion, given that all home plate umpires are evaluated and provided with feedback after each game, over the course of an entire season the aggregate of all called pitches is a good proxy for the strike zone the league wishes to have called. I also believe there is value in understanding how much an umpire tends to differ from the league average as far as the calls he is making, in terms of how expected his calls are and how often his unexpected calls are strikes on pitches that are more commonly balls and vice versa.
I calculated these numbers for all umpires in both the 2015 and 2016 seasons. Taking the seventy umpires who worked the most behind home plate over both seasons, there was a correlation of 0.66 in the Expected+ scores between the seasons. This suggests calling balls and strikes per the typically called league zone has a sizable degree of skill that is repeatable between seasons for umpires.
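For readers who want to run a similar season-to-season check on their own numbers, a plain Pearson correlation is enough. The sketch below assumes that is the kind of correlation meant here, and the two short lists are made-up placeholders, not the scores behind the 0.66 figure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical Expected+ scores for the same five umpires in two seasons.
season_one = [120.0, 105.0, 98.0, 90.0, 85.0]
season_two = [115.0, 110.0, 95.0, 97.0, 80.0]
r = pearson(season_one, season_two)
```

The two lists must be in the same umpire order, so that each pair of values belongs to one umpire.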
Here are the results of the 2016 season for Expected+ for all umpires.
Umpire Name | 2016 Expected+ |
---|---|
Jim Joyce | 139 |
Mark Ripperger | 135 |
Pat Hoberg | 132 |
James Hoye | 128 |
Chad Fairchild | 127 |
Toby Basner | 125 |
Mark Carlson | 125 |
Ben May | 124 |
Alan Porter | 123 |
Greg Gibson | 122 |
Bill Welke | 121 |
Roberto Ortiz | 118 |
Quinn Wolcott | 116 |
D.J. Reyburn | 115 |
Eric Cooper | 114 |
Todd Tichenor | 112 |
Adam Hamari | 111 |
Mike Muchlinski | 111 |
Sam Holbrook | 111 |
Phil Cuzzi | 110 |
Gabe Morales | 110 |
Stu Scheurwater | 110 |
David Rackley | 110 |
Tony Randazzo | 109 |
Marvin Hudson | 108 |
Brian Knight | 108 |
Chris Guccione | 106 |
Jerry Meals | 106 |
Jim Reynolds | 106 |
Manny Gonzalez | 105 |
Will Little | 105 |
Mark Wegner | 105 |
Alfonso Marquez | 105 |
Cory Blaser | 105 |
Jeff Kellogg | 104 |
Sean Barber | 104 |
John Tumpane | 104 |
Brian Gorman | 104 |
Paul Emmel | 103 |
Mike Estabrook | 102 |
Fieldin Culbreth | 100 |
Brian O’Nora | 98 |
Bill Miller | 98 |
Ramon De Jesus | 97 |
Chris Conroy | 97 |
Doug Eddings | 97 |
Joe West | 97 |
Mike DiMuro | 97 |
Ryan Blakney | 97 |
Tripp Gibson | 96 |
Tim Timmons | 96 |
Scott Barry | 95 |
Paul Nauert | 95 |
Mike Everitt | 94 |
Dan Bellino | 94 |
Dan Iassogna | 94 |
Jim Wolf | 93 |
Chris Segal | 93 |
Marty Foster | 93 |
Dana DeMuth | 92 |
Ted Barrett | 92 |
Vic Carapazza | 92 |
Gerry Davis | 91 |
Chad Whitson | 91 |
Gary Cederstrom | 90 |
Adrian Johnson | 90 |
Clint Fagan | 89 |
Laz Diaz | 89 |
Jeff Nelson | 89 |
Jerry Layne | 88 |
Mike Winters | 88 |
Carlos Torres | 88 |
Larry Vanover | 87 |
CB Bucknor | 86 |
Rob Drake | 85 |
Jordan Baker | 84 |
Lance Barksdale | 84 |
Tom Woodring | 84 |
Nic Lentz | 83 |
Andy Fletcher | 83 |
Ron Kulpa | 83 |
Hunter Wendelstedt | 82 |
Lance Barrett | 82 |
Angel Hernandez | 82 |
Kerwin Danley | 82 |
Tom Hallion | 81 |
Ed Hickox | 81 |
John Hirschbeck | 78 |
Bob Davidson | 77 |
Dale Scott | 72 |
Aside from this metric, I also drilled down into the unexpected calls made by each umpire to see the ratio of scores from pitches they called strikes when the league typically called the pitch a ball, and called balls when the league majority was a strike. This acts as somewhat of a proxy for strike zone size, as home plate umpires who call more unexpected strikes relative to balls than normal would tend to have what we perceive as a larger strike zone, and a smaller-than-average ratio would tend to indicate a smaller strike zone.
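One way to sketch this split in Python is shown below. The function name and the `(p, called_strike)` input layout are assumptions for illustration; the per-call score is the same (2p – 1) * (c – p) used for the expected score.

```python
def unexpected_sb_ratio(calls):
    """Ratio of unexpected-strike score to unexpected-ball score.

    calls is an iterable of (p, called_strike) pairs, where p is the league
    called-strike probability at the pitch location.
    """
    strike_side = 0.0  # called strikes where the league majority says ball
    ball_side = 0.0    # called balls where the league majority says strike
    for p, called_strike in calls:
        c = 1.0 if called_strike else 0.0
        score = (2.0 * p - 1.0) * (c - p)
        if score < 0:  # only calls that disagree with the majority count
            if called_strike:
                strike_side -= score
            else:
                ball_side -= score
    # Assumes at least one unexpected ball; otherwise this divides by zero.
    return strike_side / ball_side


# Made-up calls: one unexpected strike, one unexpected ball, one expected strike.
sb_ratio = unexpected_sb_ratio([(0.30, True), (0.60, False), (0.90, True)])
```

A ratio above the league's would point toward a larger perceived zone, and a ratio below it toward a smaller one.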
Once again, I adjusted the ratios to a “plus” stat. Here are the unexpected strike-to-ball ratio scores, or Unexpected S:B+, for the 2016 season:
Umpire Name | 2016 Unexpected S:B+ |
---|---|
Bill Miller | 231 |
Jim Wolf | 214 |
Bob Davidson | 184 |
Brian Gorman | 175 |
Roberto Ortiz | 172 |
Doug Eddings | 166 |
Stu Scheurwater | 166 |
Lance Barrett | 164 |
Mike Estabrook | 162 |
Chris Segal | 157 |
Eric Cooper | 148 |
Hunter Wendelstedt | 146 |
CB Bucknor | 137 |
Will Little | 137 |
Ben May | 135 |
Kerwin Danley | 133 |
Mike Everitt | 130 |
Tripp Gibson | 128 |
Andy Fletcher | 126 |
Ed Hickox | 125 |
Ted Barrett | 123 |
Nic Lentz | 120 |
Ron Kulpa | 119 |
Jeff Nelson | 116 |
Fieldin Culbreth | 115 |
Mike DiMuro | 115 |
Dan Iassogna | 114 |
Jerry Layne | 113 |
Jeff Kellogg | 111 |
John Hirschbeck | 111 |
Quinn Wolcott | 109 |
Cory Blaser | 109 |
Tim Timmons | 108 |
Phil Cuzzi | 106 |
Marty Foster | 106 |
Vic Carapazza | 105 |
Toby Basner | 105 |
Marvin Hudson | 102 |
Jim Reynolds | 101 |
Brian Knight | 98 |
Carlos Torres | 98 |
Adam Hamari | 98 |
Adrian Johnson | 98 |
Mark Ripperger | 97 |
Tony Randazzo | 95 |
Angel Hernandez | 94 |
Brian O’Nora | 92 |
Mike Winters | 90 |
Rob Drake | 86 |
Lance Barksdale | 86 |
Paul Emmel | 86 |
Laz Diaz | 86 |
Chris Guccione | 85 |
Gabe Morales | 84 |
David Rackley | 83 |
Gary Cederstrom | 83 |
Jim Joyce | 81 |
Dan Bellino | 81 |
Jordan Baker | 81 |
Chad Fairchild | 81 |
Chris Conroy | 80 |
John Tumpane | 80 |
Dana DeMuth | 78 |
Mike Muchlinski | 75 |
Sean Barber | 75 |
Dale Scott | 72 |
Sam Holbrook | 70 |
James Hoye | 70 |
Ramon De Jesus | 69 |
Clint Fagan | 68 |
Alan Porter | 67 |
Chad Whitson | 66 |
Mark Wegner | 66 |
Pat Hoberg | 64 |
Ryan Blakney | 64 |
Bill Welke | 63 |
Jerry Meals | 63 |
D.J. Reyburn | 61 |
Todd Tichenor | 61 |
Joe West | 60 |
Paul Nauert | 59 |
Tom Hallion | 58 |
Greg Gibson | 57 |
Gerry Davis | 56 |
Scott Barry | 54 |
Manny Gonzalez | 54 |
Larry Vanover | 53 |
Alfonso Marquez | 52 |
Mark Carlson | 49 |
Tom Woodring | 29 |
Note that Tom Woodring did not work many games behind home plate, so his extremely low score here reflects a small sample size. Do not take this to mean Bill Miller’s strike zone is 131 percent larger than the league average! Obviously, this could not be the case. This means that on calls Miller makes that are counter to the league norm, he is much more likely to be calling strikes when most umpires call balls than balls when most umpires call strikes.
The correlation between 2015 and 2016 scores for the busiest seventy umpires was 0.69, meaning once again this is an aspect of game calling that umpires do seem to carry significantly from season-to-season.
There was a correlation of -0.30 between Expected+ and Unexpected S:B+ in 2016, meaning there was value in having a slightly smaller zone this year in trying to conform to the league majority. This correlation was only -0.11 in 2015. Umpires tend to make more unexpected calls by calling pitches that are typically called balls as strikes than the other way around, so umpires that are less susceptible to this pattern tend to have marginally higher Expected+ scores under this system.
The most interesting home plate umpire to me after undertaking this exercise is Mark Ripperger. His Expected+ score in 2015 of 156 was by far the highest of any umpire in either of the last two seasons, with a difference between his score and the second-place score greater than the difference between second place and thirty-third place that season. Ripperger followed that up with the second-highest Expected+ score in 2016. He seems to have an excellent grasp on the strike zone being called in the league right now.
Another fun exercise I tried was looking at the most expected called game of the 2016 regular season. The game that had the best pitch calling with respect to matching the league aggregate was almost a perfect game from Brian Knight! There was only one “unexpected” pitch call, which was a called ball on a pitch location that was called a strike 51 percent of the time over the course of the season.
2016 World Series
Here is a game-by-game view of the home plate umpires assigned for the World Series games this season based on the perspective offered from these metrics. You’ll notice these umpires do not all call the most expected strike zones out of the set of umpires working in 2016. As I mentioned earlier, umpires’ jobs are very difficult, and in this article we have only been examining them through the lens of pitch calling. They are also responsible for game management, calling plays at bases, fair/foul judgments, and much more. I would expect the league would base the selection process for umpires on their entire body of work and include seniority and undoubtedly other factors when considering postseason assignments.
(Editor’s note: This article was written before the World Series began. We thought readers would be interested in knowing how all the scheduled home plate umpires rated.)
Game One: Larry Vanover
Expected+: 73rd (out of 90)
Unexpected S:B+: 87th (out of 90)
Vanover is at the extreme end among MLB umpires with respect to calling pitches typically called strikes as balls. His small zone tends to favor the hitter, and thus may be a challenge to navigate for Corey Kluber and Jon Lester. According to Baseball Prospectus, the strikeout-to-walk ratio in games he worked this season was 79th out of 90. Aside from a small zone, his calls did not line up as well with the expected MLB zone as most home plate umpires this season. Vanover worked Game Three of the NLDS between the Cubs and Giants.
Game Two: Chris Guccione
Expected+: 27th (out of 90)
Unexpected S:B+: 53rd (out of 90)
Guccione has a slightly smaller strike zone than average, as well, but he called a somewhat better-than-average expected zone. His results from 2015 were almost identical, so his calling pattern appears to be consistent. Guccione worked Game Two of the NLDS between the Dodgers and the Nationals.
Game Three: John Hirschbeck
Expected+: 88th (out of 90)
Unexpected S:B+: 30th (out of 90)
Set to retire after the season, Hirschbeck is the crew chief for the World Series. He seems to call one of the more unique zones, as his calls rated as one of the most unexpected in the game this season. He still calls one of the larger zones in the game, although it was less extreme this season than in 2015. Baseball Prospectus has his strikeout-to-walk ratio as 21st highest out of 90. This is his fifth World Series assignment and a nice way to complete his final season.
Game Four: Marvin Hudson
Expected+: 25th (out of 90)
Unexpected S:B+: 38th (out of 90)
Hudson rated higher than average in how closely his pitch calling matched the typical league zone this season. His metrics are also very similar to the previous season’s, so he seems to have settled into a relatively consistent pattern of calling pitches. Hudson was behind the plate for Game Four of the NLDS when the Cubs knocked out the Giants, so John Lackey already has pitched to his zone in this postseason.
Game Five: Tony Randazzo
Expected+: 24th (out of 90)
Unexpected S:B+: 45th (out of 90)
Randazzo also called a zone that matched the aggregate zone quite closely. His unexpected calls tend to be extra strikes and extra balls near the league average ratio. Randazzo was the home plate umpire for Game Three of the ALDS between Cleveland and Boston, the Red Sox’s final game of the 2016 season, when Josh Tomlin made the start.
Game Six: Joe West
Expected+: 47th (out of 90)
Unexpected S:B+: 80th (out of 90)
West called more pitches unexpectedly as balls this season than most umpires, meaning his strike zone was smaller than most. Baseball Prospectus noted his strikeout-to-walk ratio as 71st out of 90. This is his sixth World Series assignment, joining John Hirschbeck as the most experienced World Series umpires working this year.
Game Seven: Sam Holbrook
Expected+: 19th (out of 90)
Unexpected S:B+: 67th (out of 90)
Holbrook starts the World Series as the replay umpire before moving onto the field for Game Three. His pitch calling rated well as far as calling to the typical MLB zone this season. Holbrook worked Game Three of the ALDS when the Blue Jays finished the sweep of the Rangers.
Enjoy the World Series everyone!
UPDATE: Here are the results for the first two games.
Game One:
Expected+: Top 68% of games from 2016
Unexpected S:B+: Top 58% of games from 2016
Game Two:
Expected+: Top 9% of games from 2016
Unexpected S:B+: Top 4% of games from 2016
References and Resources
All data from Fangraphs unless otherwise noted.
Thanks Jon, really interesting stuff.
Is there anything to the notion that umpires can be (subconsciously?) biased towards the home team?
So could you split the stats between calling for home team and away team to see which umpire sees the biggest change?
Cheers.
Yes I actually looked at home field advantage in this recent article: http://www.hardballtimes.com/the-2016-strike-zone/
There has been a small advantage for the home team in every year of the PITCHf/x era, and the magnitude of that advantage appears to grow significantly as game leverage increases.
I did not find any correlation from season-to-season for umpires showing tendencies toward more home field advantage than others, so I don’t think it is predictive in that way. I suspect it is just human nature and ends up being circumstantial.
The home plate ump tonight was clearly trying to cheat the Indians, he made 17 wrong calls on balls and strikes against the Indians. It was pathetic. I live in Chicago, at least I am honest.
Yes dude, he was trying to “cheat” the Indians…in a World Series game. Because that’s what these guys do…they go out there with complete disregard for their jobs in the grandest setting in baseball and intentionally “cheat” teams. In a given game they see close to 200 of the fastest and dirtiest moving pitches from best pitchers in the game and have to make judgement calls where a half inch is the difference between a ball and a strike. Had an off night, maybe, but he certainly was not trying to cheat anyone.
Jarod, you should stick to coaching t-ball as you obviously are not a baseball guy, dude.
1/2 inch off the plate? Wow, you must have pretty bad eyesight. Those pitches clearly all went against the Indians and was so one sided it was pathetic.
The TV strike zone is a one dimensional picture frame that I believe is super imposed in front of the plate. However, the strike zone properly defined is three dimensional and any ball that enters the “box” is a strike. Thus, a breaking ball could look to be outside as it traverses the strike zone but then break into the zone behind the imaginary box. I always wondered if a pitcher could develop a pitch that was thrown high in the air and then dropped down into the strike zone at a perfect vertical angle. It would enter the strike zone at the top and then hit the center of home plate. This pitch would be impossible to hit. I remember Steven Talbot, a Yankee pitcher in the Horace Clarke era, throwing a blooper pitch that got a lot of laughs but obviously the pitch (and him) never made it.
Yes you’re right it is a 3-D zone in reality, so “backdoor” strikes are possible. There is a good article on this subject here: http://www.hardballtimes.com/analyzing-the-strike-zone-as-a-three-dimensional-volume/
Do I interpret the Game Results right by saying that Game 1 was average or a touch better than average on both metrics, while Game 2 was a pretty bad outlier having a consistently smaller than typical zone?
Sorry I wrote the update in the middle of the night, so I get that may not be clear. What I did was order all games from the 2016 season based on expected zone and unexpected strike:ball ratio.
Game 1 was below average as far as matching the correct zone, as it slotted in 68% of the way down the list. The unexpected strike:ball ratio was also smaller than average. So both of these lined up with the 2016 numbers for this umpire.
Game 2 was much above average, as it appears less than 9% of the way down the list. This lined up with this umpire being above average for expected calls in 2016. The game was also very high on unexpected strike:ball ratio, which was abnormal.
Hopefully that makes more sense?
Aren’t we talking about Steve Hamilton’s “Folly Floater” in the exchange re the 3-D strike zone?
Ah yes! How could I confuse Hamilton for Talbot. It was Fred and not Steve Talbot. But Steve Hamilton was definitely the player who threw what purported to be a vertical strike.
Jon:
Great work, as usual. Is it possible to give us a sense of how large the difference is between high and low Unexpected S:B+ umpires, in terms of their average impact on a game? For example, could you tell us how many more/fewer strikes are called per game, and how R/G compare, between the top 30 and bottom 30 umps (or whatever grouping you think is appropriate)?
This is really interesting. Good work.
One thing to note is that the negative correlation between Expected+ and Unexpected S:B+ is probably expected and not necessarily a sign that a smaller zone correlates with conformity to the league-wide zone. That’s because Unexpected S:B+ uses a ratio with called balls as the denominator, which means it won’t give symmetrical scores for high-strike-call umps and high-ball-call umps.
For example, say the average ratio is 1:1 and you have two umpires, one of whom calls 100 strikes and 50 balls, and the other 50 strikes and 100 balls. The former will have a ratio of 2:1 for an Unexpected S:B+ of 200, while the latter will have a ratio of 1:2 for an Unexpected S:B+ of 50, which is closer to the average Unexpected S:B+ score than the first ump. Because both are equally extreme in their ball-strike tendencies but the high-strike-call ump has a more extreme Unexpected S:B+, you’ll probably get a negative correlation even in the absence of any relationship between strike zone size and Expected+.