A Short Digression into Log5

by Dan Fox
November 23, 2005

“Nellie was the toughest out for me. In 12 years I struck him out once, and I think the umpire blew the call.” – Whitey Ford, New York Yankees pitcher, on Nellie Fox

In my previous article exploring the significance of batter/pitcher matchups, I used the log5 method to calculate the expected average for each matchup. Additional discussion of the results from that article can be found on my blog.

For those who understandably lacked the patience to wade through that rather long tome, here is a brief recap. Bill James published the method in the 1981 Baseball Abstract to analyze how well one team should play against another. Tom Tippett at Diamond Mind has applied that usage of the method comprehensively to matchups between teams in an article.

When applied to batter/pitcher matchups, the formula takes the hitter’s average (BAVG), the pitcher’s average against (PAVG), and the league average (LgAVG) and derives what the hitter should hit against that pitcher. The entire formula is:

ExAvg = ((BAVG * PAVG) / LgAVG) / ((BAVG * PAVG) / LgAVG + ((1 - BAVG) * (1 - PAVG)) / (1 - LgAVG))
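As a minimal sketch, the formula translates directly into a few lines of Python. The function name and the input checks are my own additions for illustration; only the arithmetic comes from the formula above.

```python
def log5_expected_avg(bavg: float, pavg: float, lg_avg: float) -> float:
    """Expected batting average for a batter/pitcher matchup via log5.

    bavg   -- the hitter's batting average
    pavg   -- the pitcher's batting average against
    lg_avg -- the league batting average
    """
    # The formula divides by lg_avg and (1 - lg_avg), so rule out 0 and 1.
    if not 0.0 < lg_avg < 1.0:
        raise ValueError("league average must be strictly between 0 and 1")

    # Numerator term: (BAVG * PAVG) / LgAVG
    hit_term = (bavg * pavg) / lg_avg
    # Complementary term: ((1 - BAVG) * (1 - PAVG)) / (1 - LgAVG)
    out_term = ((1.0 - bavg) * (1.0 - pavg)) / (1.0 - lg_avg)

    denominator = hit_term + out_term
    if denominator == 0.0:
        # Only happens for degenerate inputs such as bavg=0 with pavg=1.
        raise ValueError("inputs leave the expected average undefined")
    return hit_term / denominator


if __name__ == "__main__":
    # Illustrative inputs, not taken from the study: a .300 hitter facing a
    # pitcher who allows a .250 average in a .265 league.
    print(round(log5_expected_avg(0.300, 0.250, 0.265), 3))
```

With those illustrative inputs the expected average comes out to about .284. A useful sanity check is that a league-average hitter facing a league-average pitcher gets the league average back.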

I also mentioned that Dan Levitt had written an excellent article back in 1999 for SABR’s By The Numbers newsletter that was reprinted at Baseball Think Factory.

In his article, Levitt looked at data from 1995 to see how well the formula predicts actual matchups. He broke the pitchers and hitters down into three groups (Good, Average, Poor), compared the actual results with the formula’s predictions, and found that the formula does a remarkably good job of predicting the result. For example, in the NL, average hitters facing average pitchers would have been expected to hit .247, and they actually hit .251.

Since that article covered just one year I thought I’d run a different test of log5, given that I had gone to the trouble of computing over 30,000 matchups for the period 2003-2005 for the previous article.

For my study I broke the results of the matchups into six categories by batting average and then calculated, for each range, the hitters’ overall average over the three-year period, the expected average using log5, and the actual average. What I found confirms Levitt’s conclusion: the log5 method works remarkably well. The results in table form follow:

 
    Range   Count Hit Avg   ExAvg  Actual    Diff   PctDiff
.000-.199     975   0.208   0.170   0.167   0.004      2%
.200-.249    7379   0.256   0.233   0.234  -0.001      1%
.250-.274    8502   0.268   0.262   0.267  -0.005      2%
.275-.299    7909   0.281   0.286   0.289  -0.003      1%
.300-.324    4175   0.292   0.310   0.312  -0.002      1%
.325-.454    1541   0.308   0.340   0.335   0.004      1%

In each category the difference amounted to about five points of batting average or less, which is at most a 2% difference. The results can be shown graphically as well.

From the graph you can see that when plotted against the hitter’s overall average over the three years, the actual and expected lines match up very well. Where they differ, you’ll notice that the log5 method overpredicts hitter performance a bit at the extremes and underpredicts it in the middle.
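For readers who want to try a similar comparison on their own data, here is a rough sketch of the bucketing step. It rests on several assumptions of mine rather than on the actual code behind the table: each matchup is a hypothetical (at_bats, hits, expected_avg) tuple, matchups are assigned to a range by their expected average, and the expected average for a range is weighted by at-bats.

```python
from bisect import bisect_right

# Range boundaries mirroring the table above; the last range is left open-ended.
EDGES = [0.200, 0.250, 0.275, 0.300, 0.325]
LABELS = [".000-.199", ".200-.249", ".250-.274",
          ".275-.299", ".300-.324", ".325+"]


def summarize_by_range(matchups):
    """Group matchups into ranges and compare expected with actual averages.

    matchups -- iterable of (at_bats, hits, expected_avg) tuples (hypothetical
                layout; expected_avg would come from the log5 formula).
    Returns {label: (expected, actual, diff)} with diff = expected - actual.
    """
    totals = {label: [0, 0, 0.0] for label in LABELS}
    for at_bats, hits, expected_avg in matchups:
        # Assumption: a matchup is bucketed by its expected average.
        bucket = totals[LABELS[bisect_right(EDGES, expected_avg)]]
        bucket[0] += at_bats
        bucket[1] += hits
        bucket[2] += expected_avg * at_bats  # at-bat weighted sum

    summary = {}
    for label, (at_bats, hits, weighted_expected) in totals.items():
        if at_bats == 0:
            continue  # skip ranges with no data
        expected = weighted_expected / at_bats
        actual = hits / at_bats
        summary[label] = (expected, actual, expected - actual)
    return summary


if __name__ == "__main__":
    # Made-up matchups purely to exercise the function.
    sample = [(10, 3, 0.280), (30, 9, 0.290), (20, 3, 0.180)]
    for label, (expected, actual, diff) in summarize_by_range(sample).items():
        print(f"{label}  ExAvg={expected:.3f}  Actual={actual:.3f}  Diff={diff:+.3f}")
```

A positive difference means the formula overpredicted the hitters in that range, and a negative one means it underpredicted them, which matches the sign convention of the Diff column in the table.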
