Umpire statistics

by Dan Brooks
April 3, 2012

We (Harry and I) have released PITCHf/x-based statistics for every umpire who has called a PITCHf/x-enabled game, provided that they called enough pitches for us to accurately represent their called strike zone. We sincerely appreciate you taking the time to read this post before using our umpiring data.

Being an umpire is hard. It might be one of the hardest skills in baseball, and it sure doesn’t pay $20 million a year.

Umpires are also damn good at what they do.

But at least in the public domain, there’s been little systematic survey of umpiring. There are several reasons. First, without proper instruction, people would naturally use the data in ways that create unnecessary controversy. No one wants to be at the center of a media scandal involving umpiring. And so, consider this your proper instruction: If you use these data to rip umpires, consider yourself an idiot.

It’s true, there are good umpires and there are better umpires, but we’re aiming to show you what umpiring really looks like, not what umpiring fails to do. We want to paint a picture of each umpire’s strengths and weaknesses, of their proclivities for calling particular pitches in particular ways. This is not an argument for computerized or mechanized strike zones; nothing could be further from the truth. Use these data wisely.

Second, umpiring is a difficult skill set to properly describe. To really do it well, you’ve got to have access to a nice database, have things properly classified, and have the right mathematical models to present the data. Here, we would like to extend a warm thanks to Dave Allen, who some five years ago, in a presentation at Sportvision’s Summit, showed us how to apply heat maps to PITCHf/x data using LOESS (locally weighted scatterplot smoothing) regression.

Third, defining the strike zone is notoriously difficult. The problem is that a single average strike zone fits poorly, because batters vary in height. However, the “easy” solution, which is to use the sz_top and sz_bot parameters from the Gameday data, isn’t really a solution at all, because those parameters (as Mike Fast has convincingly shown) vary too wildly between games to give a good estimate of batter height on a per-pitch basis. Here, we’ve chosen to use an equation that looks at the average sz_top scores and weights those by a player’s height.

The strike zone is also technically a three-dimensional volume, and we’ve chosen to define it as a two-dimensional plane at the front of home plate (as we have elsewhere on the site). We realize this introduces error, but the alternative is simply too difficult to represent graphically. So we hope you forgive us here, and understand that this may slightly bias results.
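To make the two-dimensional definition concrete, here is a minimal Python sketch of testing whether a pitch falls inside a rectangular zone at the front of the plate. It is an illustration only, not the exact rule behind our cards: the function name, the zone_top and zone_bottom arguments (stand-ins for whatever per-batter limits you choose), and the decision to ignore the radius of the ball are all assumptions. It also assumes px and pz are the horizontal and vertical pitch locations in feet, with px measured from the center of the plate.

```python
# Home plate is 17 inches wide; locations are assumed to be in feet.
PLATE_HALF_WIDTH_FT = 17.0 / 12.0 / 2.0


def in_zone(px, pz, zone_top, zone_bottom, half_width=PLATE_HALF_WIDTH_FT):
    """Return True if the pitch location (px, pz) lies inside the rectangle.

    px: horizontal location relative to the center of the plate (feet).
    pz: height above the ground (feet).
    zone_top, zone_bottom: hypothetical per-batter vertical limits (feet).
    The ball's radius is ignored here, which is a simplifying assumption.
    """
    inside_horizontally = abs(px) <= half_width
    inside_vertically = zone_bottom <= pz <= zone_top
    return inside_horizontally and inside_vertically
```

Comparing this geometric answer with the umpire’s actual call is what sorts each taken pitch into one of four outcomes: in the zone and called a strike, in the zone and called a ball, out of the zone and called a ball, or out of the zone and called a strike.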

We’re choosing to first present the data in two ways. The first is in a tabular form that reports not only hits (a pitch in the strike zone called a strike), misses (a strike called a ball), correct rejections (CRs, a ball called a ball) and false alarms (FAs, a ball called a strike), but also some psychometric measures of detection: d’ and c. Here’s the example from Angel Hernandez’s card:

While these last two require some explanation, the easiest way to think about them is that d’ represents discriminability (how well an umpire performs; larger is better) and that c represents how biased the umpire was in favor of hitters or pitchers (c<0 = pitcher friendly, c>0 = hitter friendly) on any particular pitch. These measures have not yet been re-normalized, but they will be, so that you get an idea of how friendly a particular umpire was relative to other umpires.
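For readers who want to see where numbers like these come from, here is a minimal Python sketch of the textbook signal detection formulas: d’ is z(hit rate) minus z(false alarm rate), and c is minus one half of their sum, where z is the inverse of the standard normal cumulative distribution. The function name and the 0.5 added to each count (a common way to keep rates away from exactly 0 or 1) are assumptions made for illustration, and this is not necessarily the exact calculation behind the cards.

```python
from statistics import NormalDist


def detection_measures(hits, misses, correct_rejections, false_alarms):
    """Return (d_prime, c) from the four call counts.

    hits: pitches in the zone called strikes
    misses: pitches in the zone called balls
    correct_rejections: pitches out of the zone called balls
    false_alarms: pitches out of the zone called strikes
    """
    # Assumption: add 0.5 to each cell so neither rate is exactly 0 or 1,
    # where the inverse normal CDF would be undefined.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit, z_fa = z(hit_rate), z(fa_rate)

    d_prime = z_hit - z_fa        # discriminability: larger is better
    c = -0.5 * (z_hit + z_fa)     # bias: negative leans toward calling strikes
    return d_prime, c
```

With this sign convention, an umpire who calls extra strikes (more hits and more false alarms) ends up with c below zero, which matches the pitcher-friendly reading above, and an umpire whose errors are perfectly balanced ends up with c equal to zero.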

The second is a LOESS Heat Map for each batter handedness split by pitch type. This will give you the ability to tab through and see the differences in the strike zones called by each umpire in a more graphical way. Here’s Angel Hernandez’s strike zone:

We hope you enjoy these statistics and use them responsibly. Be an educated, informed fan who recognizes that ripping an umpire because he missed a call is often a selfish act with little justification. Often, looking at the bigger pattern of data can give you more answers than a single pitch.

Please feel free to direct any feedback to Dan Brooks (@brooksbaseball) or Harry Pavlidis (@harrypav) on Twitter, or by commenting below.
