Do Catchers Have an ERA?

Editor’s Note: This article has been reprinted from The Hardball Times Baseball Annual 2011. You can purchase a copy at our Bookstore.

“Hell of a situation we got here. Two on, two out, your team down a run and you’ve got the chance to be the hero on national television… if you don’t blow it. Saw your wife last night. Great little dancer. That guy she was with? I’m sure he’s a close personal friend, but tell me, what was he doing with her panties on his head?”
(Batter pops up)
—Jake Taylor, Major League

How much can a catcher help his pitcher’s performance through game-calling, receiving and framing pitches? The conventional sabermetric wisdom, at least since Keith Woolner published “Field General or Backstop” for Baseball Prospectus in 1999, has been that catchers have no effect on pitcher performance, or at least so little that it is impossible to measure.

There are many who hold the opposite opinion, such as Craig Wright (whose article in this book two years ago highlighted Mike Piazza’s impact on pitcher ERA, which we’ll call “catcher ERA”). Japanese baseball teams, according to Wright, also believe that catcher ERA is an important statistic.

Baseball managers obviously believe that a catcher’s game calling is important. Consider Mike Scioscia, who plays Jeff Mathis and his woeful bat almost as often as he plays Mike Napoli, he of the .840 lifetime OPS. Unless you believe catchers have an impact on ERA, there is no reason to play Mathis as often as Scioscia does. I’ll return to that combination in a few minutes.

For the past 10 years or so I have subscribed to Woolner’s conclusions, but I can no longer do so. I believe his sample sizes were too small, and I think I’ve found a way to get around that and measure the unmeasurable. Before I get into the details, here are summaries of prior research on catcher game calling skill:

In The Diamond Appraised, Wright introduced Catcher ERA (CERA). Craig looked at matched innings, so if Mathis caught Ervin Santana for 150 innings, and Napoli caught him for 35, the Mathis innings would be pro-rated to 35. This was done for every pitcher on the staff, always using the lesser of the innings caught. He summed the pitcher/catcher pairs, and showed how many runs each catcher saved in comparison to other catchers on the staff. The chapter is well worth reading because, unlike me, Craig (employed by the Texas Rangers at the time and with access to the players themselves) investigates what the catchers were doing that helped or hurt their pitchers.

Framing a pitch is one thing catchers can do. With enough skill, they can catch a pitch a few inches off the strike zone in such a way as to make the umpire think it’s a strike. This is not easy, because if you are obvious about it the umpire might even give you fewer breaks. The key is to catch it so naturally that the umpire doesn’t think you’re trying to sell a strike; he simply thinks it is a strike. The less catchers have to move for a pitch, the more likely it will be called a strike: if the catcher doesn’t have to move, then the pitch was going where it was supposed to.

On low pitches, catchers try to catch the pitch without turning the glove down. According to Wright, holding the glove closer to the body can help a catcher—the umpire is less able to see the glove, but if the catcher can receive the ball without much movement a strike call is likely. A bad habit is to drop the glove—give a sign, move the glove down, and bring it back up to catch the ball. This has two bad effects—the pitcher loses his target and the umpire sees too much movement.

The question that Craig does not answer is how much of the observed difference between one catcher and another is skill and how much is just luck. If Santana pitches an eight-inning shutout one start, and the next time out gives up seven runs in four innings, how much of that can we really attribute to the catcher? If there were no difference at all between catchers, pitchers would still have good and bad starts. Statistical methods can help us estimate how much we can trust the observed data.

An issue with catcher ERA is that it is a total contribution statistic. Yadier Molina can help his CERA in many ways: (1) Calling for the right pitch to the right hitter; (2) Receiving a borderline 3-2 pitch in such a way that it looks like a strike to the umpire, getting his pitcher out of the inning instead of loading the bases for Joey Votto; or (3) Throwing out base stealers, picking them off and blocking bad pitches in the dirt.

This last aspect of catcher defense is easy to measure. In fact, we have a good handle on who the best catchers are in blocking balls and controlling the running game. What we want to find out is how much the catcher, through his interaction with the pitcher, can influence the outcome of the pitcher/batter matchup.

To do this, Woolner looked not at catcher ERA, but at batter OPS with and without the catcher, for each pitcher. He computed Z-scores (each difference expressed in standard deviations) to see if the observed differences in catcher/pitcher OPS were different from what one would expect by chance. He found that the distribution looks almost identical to a bell curve; in other words, the differences observed among catchers appeared to be random.

My Method

Instead of looking at batter OPS, I am going to focus on statistics that a pitcher can control. Strikeouts, walks and home runs are the standbys of Defense Independent Pitching (DIPS), and I’m adding pop flies and line drives to the mix, as classified by MLBAM scorers and available in the Retrosheet event files. From these statistics I create an estimate for runs allowed that may differ from actual runs allowed, but is less dependent on the support of the rest of the defense. To estimate pitcher runs:

First, find out how many at-bats do not result in a strikeout, home run, line drive or pop-up. Call this AB1. Second, plug AB1 and the other totals into the following formula for runs:

AB1 * 0.05 + HR * 1.4 + LD * 0.38 + BB * 0.33 + HBP * 0.345 - SO * 0.105 - Pop * 0.096
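As a rough sketch, the two steps above can be written as a small Python function. The function name, the argument names and the sample stat line are hypothetical, and the sketch assumes the event categories do not overlap (for example, that a home run is not also counted as a line drive).

```python
def estimated_runs(ab, so, hr, ld, pop, bb, hbp):
    """Estimate runs allowed from events the pitcher largely controls."""
    # AB1: at-bats that did not end in a strikeout, home run, line drive or pop-up
    ab1 = ab - so - hr - ld - pop
    return (ab1 * 0.05 + hr * 1.4 + ld * 0.38 + bb * 0.33
            + hbp * 0.345 - so * 0.105 - pop * 0.096)


# Hypothetical stat line, for illustration only (not real data)
runs = estimated_runs(ab=600, so=120, hr=15, ld=110, pop=40, bb=50, hbp=5)
print(round(runs, 1))
```

The weights are copied directly from the formula; everything else is scaffolding for the example.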

Now, use matched innings to see how each pitcher did with and without each catcher, and pro-rate the “with” and “without” stats to the lower of the plate appearance totals.

Going back to our example, say Santana pitches 200 innings and allows 80 runs, and Jeff Mathis is his catcher for 150 of those innings, allowing 51 runs. Without Mathis, Santana therefore allows 29 runs in 50 innings. Prorate to the lesser of innings (actually in the real model I use plate appearances instead of innings, but never mind that). What we wind up with is 50 prorated innings with Mathis and 17 runs allowed (51 runs times 50/150), and 50 innings without him and 29 runs, so Mathis saved Santana 12 runs.
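The Santana example can be checked with a few lines of Python. This is only a sketch of the pro-rating step; the function name is hypothetical, and it uses innings as in the example rather than plate appearances as in the real model.

```python
def matched_runs_saved(with_ip, with_runs, without_ip, without_runs):
    """Pro-rate both samples to the smaller one and return (matched size, runs saved)."""
    matched = min(with_ip, without_ip)
    with_prorated = with_runs * matched / with_ip
    without_prorated = without_runs * matched / without_ip
    # Positive means the pitcher allowed fewer runs with this catcher
    return matched, without_prorated - with_prorated


# Santana: 150 innings and 51 runs with Mathis, 50 innings and 29 runs without
matched, saved = matched_runs_saved(150, 51, 50, 29)
print(matched, saved)
```

Summing the second value over every pitcher a catcher worked with gives that catcher's total.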

Do this for every pitcher for whom the catcher is behind the plate, and total the runs saved. Do this for a multi-year period, splitting the results into even years (2004, 2006, 2008) and odd years (2003, 2005, 2007, 2009). If we are looking at a skill, then catchers who are good in even years should be good in odd years. Using multiple years grouped together in this way increases our sample size, and allows us to observe a correlation that was just too hard to detect when looking only at single consecutive years.
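A minimal sketch of the even/odd split might look like this. The data layout, the catcher labels and the numbers are all made up for illustration, and comparing runs saved per plate appearance (rather than raw totals) is an assumption on my part.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def split_half_rates(seasons, min_pa=2000):
    """seasons: {catcher: {year: (runs_saved, matched_pa)}}.

    Returns runs saved per PA in even years and in odd years, keeping only
    catchers with at least `min_pa` matched PA in both halves.
    """
    even_rates, odd_rates = [], []
    for by_year in seasons.values():
        totals = {0: [0.0, 0], 1: [0.0, 0]}  # parity -> [runs, pa]
        for year, (runs, pa) in by_year.items():
            totals[year % 2][0] += runs
            totals[year % 2][1] += pa
        if totals[0][1] >= min_pa and totals[1][1] >= min_pa:
            even_rates.append(totals[0][0] / totals[0][1])
            odd_rates.append(totals[1][0] / totals[1][1])
    return even_rates, odd_rates


# Hypothetical catchers A to D (not real data)
seasons = {
    "A": {2004: (5.0, 2500), 2005: (3.0, 2400)},
    "B": {2004: (-4.0, 2600), 2005: (-1.0, 2300)},
    "C": {2004: (1.0, 2200), 2005: (2.0, 2100)},
    "D": {2004: (2.0, 900), 2005: (1.0, 2500)},  # dropped: too few even-year PA
}
even, odd = split_half_rates(seasons)
print(len(even), round(pearson(even, odd), 2))
```

If game calling were pure luck, the correlation between the two halves would hover around zero.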

The Results

Looking at catchers who had at least 2,000 matched plate appearances in both odd and even years, I have 70 catchers, and find a correlation of 0.21. This group has an average of 4,343 plate appearances in each year category, which implies that you would have a correlation of 0.50 (where half of the observed difference is skill and half is luck) when you have 16,300 matched plate appearances, about three years of full-time catching.
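The 16,300 figure is consistent with a common rule of thumb, which I am assuming here rather than quoting from the study: if the correlation at n plate appearances is r = n / (n + k), then k is the sample size at which r reaches 0.50. A short sketch, with a hypothetical function name:

```python
def pa_for_half_skill(observed_r, avg_pa):
    """Solve r = n / (n + k) for k, the sample size where r would be 0.50."""
    return avg_pa * (1 - observed_r) / observed_r


k = pa_for_half_skill(0.21, 4343)
print(round(k, -2))  # rounded to the nearest hundred
```

With r = 0.21 and n = 4,343 this lands within a few dozen plate appearances of 16,300.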

So, to estimate the game-calling skill of a catcher, add 16,300 plate appearances of average performance to his matched plate appearances. This is what I’ve done in the results below. A catcher will see about 5,000 plate appearances in a typical season of 130 games caught, so I’ll display the results per 5,000 plate appearances.
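In code, adding 16,300 plate appearances of average (zero runs saved) performance simply shrinks a catcher's rate by PA / (PA + 16,300). The sketch below uses the unregressed figures of +13.9 for Mathis and -16.6 for Napoli quoted in the Napoli or Mathis? section, and it assumes those figures are also expressed per 5,000 plate appearances. The function and constant names are hypothetical.

```python
REGRESSION_PA = 16_300  # plate appearances of league-average performance to add


def regressed_runs_per_5000(raw_runs_per_5000, matched_pa, ballast=REGRESSION_PA):
    """Shrink an observed rate toward zero (average) by PA / (PA + ballast)."""
    return raw_runs_per_5000 * matched_pa / (matched_pa + ballast)


mathis = regressed_runs_per_5000(13.9, 6921)
napoli = regressed_runs_per_5000(-16.6, 9030)
print(round(mathis, 1), round(napoli, 1))
```

Under that assumption the results line up with the regressed values listed for the two catchers in the tables below.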

The best game-calling catchers, from 2003 to 2009:

Catcher                    PA        Regressed Runs/5000 PA
Pratt, Todd                5,588     14.3
Snyder, Chris              9,375     13.2
Burke, Jamie               4,078     13
Castro, Ramon              8,829     13
Molina, Jose               13,241    12.9
Saltalamacchia, Jarrod     6,009     12.9
Lopez, Javier              6,525     12.8
Paulino, Ronny             6,695     12.1
Laker, Tim                 3,158     10.9
Mirabelli, Doug            4,431     9.4

And these catchers were the worst:

Catcher                    PA        Regressed Runs/5000 PA
Navarro, Dioner            10,509    -9.3
Montero, Miguel            5,074     -10.5
Schneider, Brian           13,773    -11.5
Wilson, Vance              7,199     -11.7
Greene, Todd               5,516     -12.4
Johjima, Kenji             7,177     -12.4
Hall, Toby                 9,652     -14
Lieberthal, Mike           7,107     -14.8
Posada, Jorge              11,961    -14.8
Martinez, Victor           11,294    -15

Here’s a list of other notable catchers:

Catcher                    PA        Regressed Runs/5000 PA
Piazza, Mike               7,956     7.4
Rodriguez, Ivan            14,859    7.1
Kendall, Jason             10,226    4.8
Ausmus, Brad               7,788     4.3
Mathis, Jeff               6,921     4.1
Pierzynski, A.J.           11,154    2.4
Mauer, Joe                 11,063    -0.9
Molina, Yadier             10,360    -1.3
Varitek, Jason             9,102     -2.1
McCann, Brian              8,061     -3.1
Napoli, Mike               9,030     -5.9
Molina, Bengie             13,633    -7.6

Napoli or Mathis?

Let’s talk again about Mike Napoli and Jeff Mathis. When Bengie Molina left the Angels after the 2005 season, Mathis was handed the starting catcher job. After 12 games and batting only .103, Mathis was sent back to the minors and replaced by Napoli. Napoli homered off Justin Verlander in his first major league at bat, starting a two-month offensive tear. He finished the year in a slump, hitting .228 overall and striking out in a third of his at bats, but Napoli showed he could provide offensive value with his patience and power.

Since then, Napoli has played only a little more than half the time behind home plate, with Mathis reclaiming part of the job despite a .200/.277/.320 career batting line through 2009, and even worse numbers in 2010. It’s not just a matter of injuries and keeping a catcher fresh throughout the season. Mathis even starts playoff games—six out of 16 from 2007 to 2009, though I can’t complain about his 7-for-12, five-double showing against the Yankees last fall.

For their careers, Mathis is 30 batting runs below average per 500 plate appearances, and Napoli is 13 runs above average. On the easily measurable aspects of catcher defense (stolen bases and blocking wild pitches), Mathis is two runs below average per 1,200 innings caught, and Napoli is eight runs below average. They have virtually identical records against the stolen base, but Mathis does a better job at preventing wild pitches, and also picks off more runners from the bases than anyone not named Yadier Molina.

Add up offense and defense, and Napoli is 37 runs better than Mathis per 130 games caught. That is a lot of runs Mathis would have to save through game calling to justify playing ahead of Napoli. Looking at the regressed runs per season, Mathis is 10 runs better than Napoli. If you look at the unregressed total and give Mathis every benefit of the doubt (which is as wise as assuming that a rookie you’ve never seen before who comes up and hits .330 for two months is in fact a true .330 hitter, the next Wade Boggs) you’d have Mathis at +13.9 and Napoli at -16.6. That’s 30 runs there, and comes close to making up the difference in batting/fielding runs, but still falls a bit short.
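The ledger above can be tallied in a few lines. This sketch follows the article in treating 500 plate appearances, 1,200 innings caught, 130 games and 5,000 plate appearances behind the plate as roughly one season each, which is an approximation. The dictionary layout and function name are hypothetical.

```python
# Runs relative to average per season, taken from the text and tables above
napoli = {"batting": 13, "fielding": -8, "game_calling": -5.9}
mathis = {"batting": -30, "fielding": -2, "game_calling": 4.1}


def edge(a, b, keys):
    """Runs by which player a beats player b over the listed components."""
    return sum(a[k] - b[k] for k in keys)


bat_and_field = edge(napoli, mathis, ["batting", "fielding"])
with_game_calling = edge(napoli, mathis, ["batting", "fielding", "game_calling"])
print(bat_and_field, round(with_game_calling, 1))
```

Using the regressed game-calling numbers, Napoli keeps an edge of roughly 27 runs per season.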

That does represent a little double counting though, since Mathis and Napoli represent so much of each other’s “without” sample: the numbers say Mathis has been 13.9 runs better than the other catchers on the Angels staff, in other words 13.9 runs better than Napoli. There is almost no way you can make a reasonable case for Mathis to continue getting the playing time he has received. It is understandable that, given Napoli’s below-average defense (probably 15 runs below average all told), you want a better defensive catcher, but Mathis costs you too many runs with the bat to justify his superior defense. He’d have to be the best defensive catcher on the planet to overcome his bat. Overall (counting game calling, blocking and throwing), Mathis is merely a slightly above average defender.

2010 Data

There were six situations in 2010 in which a team split their catching duties between a catcher who has ranked well in this metric and one who has rated poorly. In the case of the Red Sox catchers, Varitek does not rate well, but is substantially ahead of Martinez. I’ll throw this in just to see how the ratings hold up outside of the years I used to test it.

I picked six pairs of catchers based on how they rated in this metric, before looking at any of their 2010 stats. Of the six pairs, in five cases the one pre-identified as the superior catcher had a lower catcher ERA than his teammate. I’ll wait until after the season to do a full comparison controlling for matched pitchers and using defense independent stats, but this is an indication that the patterns identified persist in 2010.

Team            Good Catcher    CERA    Bad Catcher    CERA
Angels          Mathis          4.04    Napoli         5.07
Blue Jays       J. Molina       3.16    Buck           4.42
Yankees         Cervelli        3.72    Posada         4.05
Red Sox         Varitek         4.01    Martinez       4.29
Reds            Hanigan         3.27    Hernandez      4.70
Diamondbacks    Snyder          5.39    Montero        4.53
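The five-of-six count can be reproduced from the table. The sketch below also adds a simple sign test, which is my own addition and not part of the original study, to show how often at least that many pairs would come out "right" if each were a coin flip.

```python
from math import comb

# (team, favored catcher, his CERA, other catcher, his CERA), copied from the table
pairs = [
    ("Angels", "Mathis", 4.04, "Napoli", 5.07),
    ("Blue Jays", "J. Molina", 3.16, "Buck", 4.42),
    ("Yankees", "Cervelli", 3.72, "Posada", 4.05),
    ("Red Sox", "Varitek", 4.01, "Martinez", 4.29),
    ("Reds", "Hanigan", 3.27, "Hernandez", 4.70),
    ("Diamondbacks", "Snyder", 5.39, "Montero", 4.53),
]

# Count the pairs where the pre-identified better catcher had the lower CERA
hits = sum(1 for _, _, good, _, bad in pairs if good < bad)
n = len(pairs)

# Chance of at least `hits` successes in `n` fair coin flips
p_at_least = sum(comb(n, k) for k in range(hits, n + 1)) / 2 ** n
print(hits, n, round(p_at_least, 3))
```

With only six pairs this is suggestive rather than conclusive, which is why a full matched comparison is still needed.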

Rookie Catchers: Are They at a Disadvantage?

Next, I looked at catcher/pitcher matchups for the catcher’s debut year, and compared the results to how that catcher/pitcher pair did in later years. There were 416 catcher/pitcher pairs, with more than 30,000 matched plate appearances, from 2003 to 2009. In their debut years, the catchers allowed 4.54 runs per game. In later years, they allowed 4.53, so there is apparently no disadvantage to having a rookie receiver.

This conflicts with what was found by Tom Hanrahan and published in SABR’s By the Numbers in November 2004, though that appears to be a result of different approaches to the study.

I tried another approach and found something that supports his study: I grouped the catchers by age, splitting them into approximate thirds. The first group was born before 1974, and were a run better than average per season. The catchers born between 1974 and 1979 were about average, and the catchers born after 1979 were 2.2 runs below average per 5,000 plate appearances. Game calling does appear to be a learned skill; catchers get better with experience.

Tom’s study was different from mine. He looked at pitchers who threw at least 100 innings during a catcher’s first season, and how those same pitchers did if they threw 100 innings in later years with that catcher as the starting catcher. From what I can tell, he looked at the pitcher’s season ERA, not the ERA with that specific catcher. His sample included 26 catchers and 90 pitcher/catcher pairs.

Hanrahan found that pitchers performed better when that catcher was a veteran than when he was a rookie. His study is interesting, and if I don’t get around to it myself I’d like to see it repeated with two changes: First, look at all rookie catchers, not a small sample (the 26 were from 1946 to 2003) and second, look at how these pitchers did while specifically pitching to that catcher.

Other Approaches

Dan Turkenkopf in April 2008 used the PITCHf/x data to see how often a pitch in a certain location was called a strike and how many extra strikes a catcher added. The effect was large, and he did find a strong correlation between performance in the first and second halves of the season. The study used data from the 2007 season.

Bill Letson added similar research in 2010. He used the PITCHf/x pitch location, batter height and handedness, and identity of the umpire to build a model that predicts the likelihood that a pitch will be a strike. Then he compared the catcher’s actual record of strikes to the predicted total. As with Turkenkopf, he found a large effect, though he left open the possibility that some of that is related to the pitcher.
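The comparison of actual strikes to a predicted total can be sketched as follows. This is not Letson's model: the strike probabilities are assumed to come from some model fitted elsewhere, and the function name and sample pitches are hypothetical.

```python
def extra_strikes(pitches):
    """pitches: list of (predicted_strike_probability, was_called_strike).

    Returns called strikes minus the number the model expected.
    """
    actual = sum(1 for _, called in pitches if called)
    expected = sum(prob for prob, _ in pitches)
    return actual - expected


# Four made-up taken pitches caught by one catcher
sample = [(0.9, True), (0.5, True), (0.2, False), (0.4, True)]
print(round(extra_strikes(sample), 2))
```

A positive total suggests the catcher is getting more calls than location, batter and umpire alone would predict, though some of that may belong to the pitcher.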

Studies like these are better for isolating one part of the catcher’s contribution: framing pitches and gaining strike calls through receiving abilities. They don’t tell the whole story though. Is the catcher calling for the right pitch in the right situation? Is he exploiting the hitter’s weaknesses? Utilizing the pitcher’s strengths? A results-based study like the one I have presented here will give a general sense of how well a catcher does at all of this, but can’t break it down into details to tell you what a catcher does well or poorly.

Warnings

I think this study shows that there are repeatable differences in how pitching staffs perform with different catchers. There is a skill involved in either proper catching technique or game calling that impacts the scoreboard.

However, this is not something that we can put a number on with reliability comparable to that of a hitter’s batting contribution. This is merely a rough estimate that requires some common sense.

In an ideal world for an analyst, every catcher in the league would get to work with every pitcher in the league for a few thousand innings. From that it would be very easy to determine which catchers were the best. In reality, we are not comparing a catcher to a league average, but to only the other catchers on his team.

If one catcher has a few veteran defensive specialists as his backups, and another is backed up by sluggers who just happen to wear the mask, the first catcher is going to look worse than the second even if they are in fact equal. A dose of common sense is required to interpret a catcher’s defensive record.

Another issue is that for whatever reason, a catcher who is good overall might not be the best catcher to work with a specific pitcher. The matched-inning samples for specific pitcher/catcher matchups aren’t big enough to reliably pick this up—that’s why Woolner wasn’t able to detect a skill. A manager would have to use his discretion to make that call.

Conclusion

Catchers have a significant impact on the performance of the pitchers they catch. We need several years of data to have a reliable measure of this effect. In the future, studies like mine will probably be obsolete, as the data available from PITCHf/x is more precise. Whatever the data source used, it is important that we recognize the value that outstanding game-calling catchers can provide, and that we understand the spread of talent in catcher skill so decisions can balance a catcher’s defensive and offensive ability.
