The color of clutch

by Tom M. Tango
February 20, 2009

I hope this is the last time I need to write an article devoted to clutch hitting.

The data

There have been many attempts to find evidence of clutch hitting. All of these attempts focus on the same basic principle: compare a player’s performance in timely situations to his overall performance, and determine if that difference is more than expected from random variation. This has been done in the following ways:

correlations of career performances in odd years to performances in even years
year-to-year correlations
distribution of differences compared to the binomial

In every case, the result is the same: yes, clutch hitting exists. There is no question: clutch hitting does exist. Indeed, as long as you make humans the central participants in contexts that change wildly, it will be a foregone conclusion that the results will not be completely random from our expectation of those participants. Therefore, that we find the existence of clutch hitting is not terribly exciting. It is expected. However, we haven’t established the degree to which it exists, nor have we established the likelihood that we can even find the thing that we know exists.

The test of clutch hitting with the most clarity for illustrative purposes was produced by Nate Silver in Between The Numbers (p. 29), using a method popularized by Keith Woolner: for each player, compare the gap in performances in clutch and non-clutch situations, and total it based on odd years and even years. The idea is that the average gap in the odd years should be roughly the same as the average gap in the even years, for each player. This method does a nice job of removing the age and aging bias. The result is a correlation of r=0.33. The number of PA required in the sample was a minimum of 2500 for each set of even and odd years. We can estimate the average size of each set to be PA = 3500. To get a correlation of r=0.33 with PA=3500, we can produce this equation:

   r=PA/(PA+7000)

This equation means that if you had 7000 PA in each sample, you would get a sample-to-sample correlation of r=.50. If you had PA=3500, then the correlation would be r=.33. For purposes of ballplayers, we usually just focus on a few years. After all, it doesn’t help us to know if Bobby Abreu is a clutch hitter at age 35. We want to know this early on. Realistically, you would want to compare a two-year sample to another two-year sample. That would mean each sample would have some 1000 or 1200 PA. And using our equation above, this would mean we’d get an r=.15.
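The way the correlation shrinks with sample size is easy to tabulate. Here is a minimal Python sketch of the equation above; the function name `sample_correlation` and the parameter name `ballast` are labels chosen for illustration, not terms from the original studies.

```python
def sample_correlation(pa, ballast=7000):
    """Expected sample-to-sample correlation: r = PA / (PA + ballast).

    The default 7000 is the constant from the equation above; "ballast"
    is just an illustrative name for it.
    """
    return pa / (pa + ballast)


# Sample sizes mentioned in the text: a two-year sample (1000 or 1200 PA),
# the estimated average set in the study (3500), and the r = .50 point (7000).
for pa in (1000, 1200, 3500, 7000):
    print(f"PA={pa:5d}  r={sample_correlation(pa):.3f}")
```

The two-year samples land in the .12 to .15 range, which is where the r=.15 figure comes from.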

What does this mean? Well, whatever results your analysis shows as to how much clutch the sample shows, our best estimate of the true rate would be 15 percent of the sample rate. So, if you have figured out that someone has a sample of +13 clutch runs per 600 PA in the clutch (and that is a very, very high figure), the regressed value would yield a +2 runs estimate as our true clutch talent. Other attempts, as documented in a chapter written by Andy Dolphin in The Book and on my site, yield a similar 2 run estimate. My equation was:

   r = clutchPAs / (clutchPAs + 1250)

And since clutchPAs is 20% of a player’s total PAs, this equation is the same as:

   r = PA / (PA + 6250)

For all intents and purposes, this equation is an almost perfect match to the equation derived from Woolner/Silver. Basically, if you want to find a player’s clutch talent level, you cannot look at his clutch numbers. The sample size simply cannot give you the certainty we need. Clearly, we need to get our noses out of our spreadsheets and watch a game.
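To see how close the two equations are, and what regression does to an observed clutch figure, here is a small Python sketch. The function names are illustrative, the 20% clutch share is the figure used above, and treating the regressed estimate as simply r times the sample value assumes the league-average clutch effect is zero.

```python
CLUTCH_SHARE = 0.20  # share of a player's PA that count as clutch, per the text


def r_woolner_silver(pa):
    """r = PA / (PA + 7000), the equation fitted to the odd/even-year study."""
    return pa / (pa + 7000)


def r_tango(pa):
    """r = clutchPAs / (clutchPAs + 1250), with clutchPAs = 20% of PA."""
    clutch_pa = CLUTCH_SHARE * pa
    return clutch_pa / (clutch_pa + 1250)


def regressed_clutch(sample_runs, r):
    """Regress an observed clutch figure toward zero by the factor r."""
    return r * sample_runs


# Compare the two equations at a few sample sizes.
for pa in (1200, 3500, 7000):
    print(f"PA={pa}: {r_woolner_silver(pa):.3f} vs {r_tango(pa):.3f}")

# The +13 runs example from the text, regressed with r = .15.
print(f"{regressed_clutch(13, 0.15):.2f} runs")
```

The two curves never differ by much at realistic sample sizes, and the +13 runs example shrinks to about 2 runs either way.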

Watching a game

Last year, I proposed The Great Clutch Project, which reads in part:

Certainly, we can and should accept that Clutch exists in some form and to some extent—not everything that happens is random variation spinning around a constant centered mean. Even so, there is a limit to how much a clutch skill can change your mean center point. No amount of Clutch will make anyone want to choose Marco Scutaro over Alex Rodriguez. Even if Scutaro is the clutchiest player ever, and A-Rod is the biggest choker ever, when a manager has A-Rod on deck and Scutaro on the bench, he is not going to call back A-Rod to put in Scutaro. It simply won’t happen.

So, even if we grant that the clutch skill exists, its practicality is limited to the extent that it can exist. No one believes that the clutch skill is big enough that he would really choose Scutaro over Rodriguez. Jeter over Rodriguez, though? Maybe.

So, the questions are: How big is the clutch skill; and, in practical purposes, how far can Clutch vault a player over a better hitter who doesn’t have as much?

Realizing that the numbers are of no help to me in determining who is a clutch hitter, I instead turned to the fan. After all, it is the fan that most believes in clutch hitting, and it is the fan who knows a clutch hitter when he sees one. So the project started:

The first task is to find such pairs of hitters for each team. It wasn’t easy. I polled the blogosphere and ended up with over 2,200 votes.

The fans on each team ended up picking a clutch hitter (best exemplified by Jeter, Dustin Pedroia and Placido Polanco), while I picked strictly by the numbers (Rodriguez, JD Drew, Curtis Granderson). I ended up with 36 Clutch players as voted by the Fans, and 36 better overall and less clutchy players, as selected by a forecasting system. Obvious picks that both sides wanted (e.g., Albert Pujols, Vladimir Guerrero, Chipper Jones) were discarded. The forecasting system estimated that, clutch aside, my hitters were .020 wOBA points better than those that the Fans selected. And so, we ended up with:

So, much like Ginger, my hitters have a sizeable advantage. You might think this is not fair, but in each and every case, the Fans preferred their choice to mine. It’s their bed, people. Except that, the Fans’ picks have some intangible quality, like Mary-Ann possesses. And the Fans believe that this intangible quality, this clutch factor, is enough to propel their picks to be at least equal to, if not better than, my picks when the game is on the line.

We have a situation here where both sides agree that, overall, my hitters are better. But, even given that, the Fans decided that their pick would perform better in clutch situations. (A clutch situation is where the Leverage Index is at least 2.0, which occurs roughly 10 percent of the time.)

The results

I called on David Appelman at Fangraphs to track the results for me. And he very generously did. First, let’s see how both groups did overall. My hitters had an 11 point advantage in OBP and 46 point advantage in SLG. Clearly my guys produced better, overall. In wOBA speak, this is roughly a 21 point advantage for my players. Indeed, this is pretty much exactly what the forecasting system expected. That is, before the season started, the forecasting system expected my guys to hit 20 points better than the Fans’ clutch players, overall. And they did.

But, how did both groups do with the game on the line? The first thing I noticed is that my guys got a lot of IBB. In order to be fair, I removed IBB from consideration when looking at OBP. So the results are as follows: my guys had a six point advantage in OBP and a 27 point advantage in SLG. In wOBA-speak, that translates to around a 12 point advantage for my team over the Fans’ team. So, I think we can say that, yes, the Fans did have some insight into picking clutch players, but it was nowhere near enough to overcome the talent gap I started with. That is, while we can accept that “Fans know clutch”, they don’t know the extent of clutch. That extent is roughly 10 wOBA points (which is 10 OBP points and roughly 15 SLG points).
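The arithmetic behind that estimate can be laid out as a rough Python sketch. The variable names are illustrative, and the inputs are just the rounded wOBA-point gaps quoted above, so the output is only as precise as those roundings.

```python
expected_gap = 20  # forecast overall wOBA-point edge for my hitters
overall_gap = 21   # observed overall edge
clutch_gap = 12    # observed edge in clutch situations (IBB removed)

# How much of the gap the Fans' picks closed with the game on the line,
# measured against the forecast and against the observed overall gap.
closed_vs_forecast = expected_gap - clutch_gap
closed_vs_observed = overall_gap - clutch_gap

print(closed_vs_forecast, closed_vs_observed)
```

Either baseline gives 8 or 9 points, which is the "roughly 10" above, and my hitters still come out ahead in the clutch.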

Is that a big deal? Well, it’s less than the platoon advantage, which is 20 wOBA points. So, when you give consideration to wanting a clutch hitter at-bat, you have to temper your enthusiasm with the understanding that that clutch skill is less than if you had a similar batter with the platoon advantage. No one is going to select Marco Scutaro over Alex Rodriguez. The two players must be pretty close to begin with in talent, before you go off having a preference for your clutch hitter over someone who is otherwise a better hitter.

Fan bias

One thing that was interesting is the kind of players Fans considered clutch. Overall, both our teams had a bit over 19,000 PA. Both had around 970 doubles and 80 triples. But my guys had almost 300 more home runs, and 600 fewer singles. My guys had 500 more walks and 1000 more strikeouts. As I noted in the summary to this project on my blog:

The guys they selected as clutch put the ball in play (excludes HR) 76 percent of the time, compared to my great hitters of 67 percent, in all situations. Those numbers dropped 2 percent points for both groups in clutch situations.

The selection criteria by the fans on this basis was nine standard deviations from the mean, showing a fantastically clear bias in this regard.

It’s very possible that to a fan, clutch is all about doing what Carlos Beltran didn’t do in his last at-bat against the Cards, when he took strike three.

The Fans have a clear bias as to what they think is clutch: put the g-dd-mn bat on the g-dd-mn ball. This bias is best exemplified by Reds fans, as I noted before the season started:

The Reds Fans detest their best hitter (Adam Dunn) so much that they actually selected four different hitters ahead of him. Every time I would check the results, a new leader would emerge. Ken Griffey Jr., Scott Hatteberg and Brandon Phillips each would have made a fine choice, but the task will be taken up by Edwin Encarnacion. (And Javy Valentin was just behind Dunn in fan appreciation.)

In the end, the Fans’ bias is the main insight we gain from this project. The other insight is that the extent of perceived clutch does not match the reality of the impact of clutch. The Fans wanted their clutch hitters batting, even if they were 20 points worse than my hitters. And they lost. But, they didn’t lose by 20 points, just by 10 points. Color me somewhat impressed.

Technical sticklers

For you party poopers, one standard deviation given 1900 PA is 12 wOBA points. So, the observed 10 point clutch skill that the Fans perceived won’t pass any statistical significance tests. The expectation is that if I were to rerun this project for the 2009 season (which I won’t), the Fans would not be so lucky. But, let’s not let this technical detail get in the way of the partial win for the Fans.
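For anyone who wants the significance check spelled out, here is a minimal Python sketch. It assumes the clutch gap is roughly normally distributed and takes the 12-point standard deviation above at face value; the function name and the choice of a one-sided test are illustrative.

```python
import math


def one_sided_p_value(observed, sd):
    """P(Z >= observed / sd) for a standard normal Z."""
    z = observed / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


z = 10 / 12  # observed 10-point effect over a 12-point standard deviation
p = one_sided_p_value(10, 12)
print(f"z = {z:.2f}, one-sided p = {p:.2f}")
```

The effect is less than one standard deviation from zero, nowhere near the usual thresholds for significance.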

Let’s let this clutch debate end today (please?), and simply agree that: a) yes, clutch exists, b) yes, fans can perceive clutch players, but c) the impact of clutch players is limited to less than the platoon advantage.
