A Meander Through Hitters’ K Rates

One reason I love baseball is that every so often you read an astonishing factoid about some player that sets your mind abuzz with questions that you want, nay, need, answered. Not you? Right, I’ll check in with the men in white coats later, but right now your favorite baseball site needs me to finish and submit this article.

Factoid #1

A few weeks ago I was gabbing with a journalist buddy at USA Today, toing and froing over every baseball debate under the sun, when he slipped in this gem: “Did you know,” he asked, “that in 2005 Placido Polanco (now of the Detroit Tigers but also playing for the Phillies that year) notched up a paltry 25 strikeouts in 501 at-bats?” To save you whipping out the calculator, that is a strikeout/at-bat rate a shade under 5%!

I knew Polanco was stingy on strikeouts but I never knew how stingy. To put that into perspective, in 2005 Polanco struck out once every five games, or a little over once a week. Wow. How impressive is that?

Great question, if I do say so myself.

So, How Impressive is That?

Fortunately, with the plethora of baseball resources available on the Internet, it isn’t too difficult to check. Using the Lahman database it is easy enough to see how Polanco’s 2005 strikeout/at-bat rate compared to that of other hitters.

Strikeouts are a lot more common nowadays than they were in the past, so for the time being let’s restrict our data to the era of six-division play (actually I include 1993 too as it makes the data slightly more interesting). Failure to do this means we’d get legends like Al Spalding heading our list. Al, for those who don’t know, recorded zero strikeouts in 384 at-bats in 1874, which was a massive improvement on his troublesome 1873 season when he struck out once in 322 at-bats. Later on in the analysis we’ll come back and adjust for era.

Full name          Year    AB      K/AB
Tony Gwynn         1995    535     2.8%
Felix Fermin       1993    480     2.9%
Ozzie Smith        1993    545     3.3%
Tony Gwynn         1999    411     3.4%
Tony Gwynn         1996    451     3.8%
Tony Gwynn         1993    489     3.9%
Tony Gwynn         1998    461     3.9%
Tony Gwynn         1994    419     4.5%
Juan Pierre        2001    617     4.7%
Tony Gwynn         1997    592     4.7%
Ozzie Guillen      1997    490     4.9%
Placido Polanco    2005    501     5.0%
Lance Johnson      1995    607     5.1%
Juan Pierre        2004    678     5.2%
Gregg Jefferies    1996    404     5.2%
Juan Pierre        2003    668     5.2%
Gary DiSarcina     1997    549     5.3%
Jason Kendall      2002    545     5.3%

Yikes, that has whacked my baseball nose decidedly out of kilter. Polanco doesn’t even crack the top 10; he’s a cool 12th…but look at Tony Gwynn. Gwynn had a reputation for making contact and, let’s face it, he built a first-ballot Hall of Fame career on it, but the consistency with which he appears in the top 10 is nothing short of astonishing. Gwynn fills seven of the top 10 spots, out of 2,500 player seasons (with more than 400 AB) since 1993.
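For anyone who wants to build a ranking like this themselves, here is a minimal sketch in Python. It assumes rows shaped like the Lahman Batting table (player, yearID, AB, SO), which stores one row per team stint, so a traded player such as Polanco has to be summed across stints first. The season totals below come from this article; the way Polanco's 2005 line is split between his two stints is illustrative only, and the function name and parameters are my own.

```python
from collections import defaultdict

# (player, yearID, AB, SO) rows in the shape of the Lahman Batting table.
# Season totals are the ones quoted in the article; the split of Polanco's
# 2005 line across two stints is illustrative, not looked-up data.
ROWS = [
    ("Al Spalding", 1874, 384, 0),
    ("Al Spalding", 1873, 322, 1),
    ("Tony Gwynn", 1995, 535, 15),
    ("Placido Polanco", 2005, 158, 9),   # first stint (illustrative split)
    ("Placido Polanco", 2005, 343, 16),  # second stint (illustrative split)
]


def strikeout_rates(rows, first_year=1993, min_ab=400):
    """Return [(player, year, AB, K/AB)] sorted from lowest to highest K/AB."""
    totals = defaultdict(lambda: [0, 0])  # (player, year) -> [AB, SO]
    for player, year, ab, so in rows:
        totals[(player, year)][0] += ab
        totals[(player, year)][1] += so

    ranked = [
        (player, year, ab, so / ab)
        for (player, year), (ab, so) in totals.items()
        if year >= first_year and ab > min_ab  # "more than 400 AB"
    ]
    return sorted(ranked, key=lambda row: row[3])


for player, year, ab, rate in strikeout_rates(ROWS):
    print(f"{player:<16} {year}  {ab:>4}  {rate:.1%}")
```

Dropping `first_year` to 1870 and `min_ab` to 300 brings Spalding's seasons back to the top, which is exactly why the era filter is there.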

Factoids #2 and #3

Bored? Here is another factoid: Did you know that Gwynn only had one game in his career where he struck out three times? Yes, that’s one out of 2,440 games that he played. Incredible.

In 1995, his finest season, he struck out a paltry 15 times in 535 at-bats. That is one strikeout every 10 days. If you needed further convincing as to how good Gwynn’s career was, consider the following table that shows career strikeout/at-bat stats for seasons from 1993. I’ve restricted the data to 2,000 or more career at-bats, which is about four seasons’ worth of fullish playing time.

Name      AB (since 1993)   K/AB
Gwynn     3664              4.3%
Jefferies 3203              5.8%
Pierre    4110              6.1%
Johnson   3320              6.2%
Vina      4240              6.9%
Guillen   2845              7.1%
Polanco   3726              7.3%
Young     5987              7.6%
Lo Duca   3274              7.7%
DiSarcina 3112              7.8%
Grace     5258              7.9%
Eckstein  3338              8.5%
Kendall   5759              8.6%
Hall      2107              8.8%
Cora      3024              8.8%

As expected, Gwynn comes out comfortably on top with a strikeout rate that is 1.5 percentage points better than his nearest challenger, Gregg Jefferies. This is also one of those few categories that Juan Pierre dominates. Who’d have thought it, a list with Tony Gwynn and Juan Pierre in the top five … that’s a third factoid for you.

The Other Side of the Coin

Shooting down the career top 10 it is clear that these guys aren’t prolific sluggers. The old adage that good contact hitters strike out less appears to be true. What about the hitters at the other end of the list? Are these guys champion powerhouses?

Name              K/AB     BA        SLG       ISO
Bellhorn          34.3%    0.231     0.396     0.165
Dunn              32.7%    0.245     0.513     0.267
Hernandez         30.1%    0.254     0.422     0.168
Thome             30.0%    0.284     0.573     0.289
Wilkerson         29.9%    0.252     0.448     0.197
McGwire           29.3%    0.279     0.675     0.397
LaRue             29.0%    0.239     0.415     0.176
Burrell           28.8%    0.258     0.479     0.221
Lankford          28.3%    0.271     0.488     0.217
Wilson            27.9%    0.266     0.461     0.195
Abbott            27.9%    0.256     0.423     0.167
Canseco           27.7%    0.265     0.519     0.253
Becker            27.7%    0.256     0.372     0.116
Buhner            27.5%    0.258     0.512     0.254
Cameron           27.5%    0.252     0.447     0.195
Wilson            27.1%    0.264     0.471     0.207
Hundley           26.8%    0.239     0.464     0.225
Rodriguez         26.8%    0.261     0.489     0.228
Sexson            26.7%    0.269     0.526     0.257

Generally, yes. Sluggers like Mark McGwire and Adam Dunn populate this list so superficially our assertion appears correct. However, we all know that by choosing the appropriate parameters we can manipulate any data set to bend to our hypotheses, so let’s try to be a little more robust. Here are career strikeout/contact/power data for 50-player cohorts since 1993.

Cohort     K/AB      SLG       BA        ISO
0-50       9.6%      0.413     0.288     0.125
50-100     13.0%     0.439     0.287     0.152
100-150    14.6%     0.447     0.281     0.166
150-200    16.2%     0.436     0.278     0.158
200-250    18.0%     0.442     0.273     0.169
250-300    19.7%     0.464     0.274     0.190
300-350    22.7%     0.463     0.269     0.194
350-391    27.0%     0.492     0.265     0.227
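A cohort table like this one can be sketched as follows: sort hitters by K/AB, slice them into fixed-size groups and average each group. The sample uses six rows from the high-strikeout table above with a cohort size of three, purely to keep it short. The function name is hypothetical, and the plain unweighted mean is an assumption, since the article doesn't say whether cohort averages were weighted by at-bats.

```python
from statistics import mean

# (name, K/AB, BA, SLG) for six hitters from the high-strikeout table.
PLAYERS = [
    ("Bellhorn", 0.343, 0.231, 0.396),
    ("Dunn", 0.327, 0.245, 0.513),
    ("Hernandez", 0.301, 0.254, 0.422),
    ("Thome", 0.300, 0.284, 0.573),
    ("Wilkerson", 0.299, 0.252, 0.448),
    ("McGwire", 0.293, 0.279, 0.675),
]


def cohort_averages(players, size=50):
    """Sort by K/AB (lowest first) and average each block of `size` hitters."""
    ordered = sorted(players, key=lambda p: p[1])
    cohorts = []
    for start in range(0, len(ordered), size):
        chunk = ordered[start:start + size]
        ba = mean(p[2] for p in chunk)
        slg = mean(p[3] for p in chunk)
        cohorts.append({
            "cohort": f"{start}-{start + len(chunk)}",  # e.g. "350-391"
            "k_ab": mean(p[1] for p in chunk),
            "slg": slg,
            "ba": ba,
            "iso": slg - ba,  # isolated power = SLG minus BA
        })
    return cohorts


for c in cohort_averages(PLAYERS, size=3):
    print(f"{c['cohort']:<8} {c['k_ab']:.1%}  {c['slg']:.3f}  "
          f"{c['ba']:.3f}  {c['iso']:.3f}")
```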

We can see that at the extremes the contact hitters strike out less, and the sluggers aren’t shy about racking up some gaudy K/AB rates, but the middle ground is a touch murkier. Why might this be?

First, let’s step back and establish the statistical validity of the relationship between K/AB rate and power. The R between ISO and K/AB is 0.49, so a relationship definitely exists. In statistical speak, this means that for every standard deviation increase in K/AB, ISO rises by 0.49 standard deviations on average. What about the repeatability of K/AB rates…how much of a skill is it?

A year-to-year correlation between 2005 and 2006 K/AB for hitters with more than 300 at-bats gives an R squared of 0.73, suggesting that swinging and missing on strike three is mostly a repeatable skill. Compare that to a stat we know to be inherently lucky, such as line-drive percentage, where the R squared is 0.08.
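For readers who like to see the mechanics, here is a small sketch of how R and R squared are calculated, and why R can be read as "standard deviations of one stat per standard deviation of the other". It runs on the eight cohort averages from the table above, not on player-level data. Averages smooth out most of the noise, so expect a figure well above anything quoted for individual hitters; the point is only to show the calculation. The helper names are my own.

```python
from statistics import mean, pstdev


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the two standard deviations."""
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (pstdev(xs) * pstdev(ys))


def standardized_slope(xs, ys):
    """Least-squares slope of y on x after both are converted to Z-scores."""
    mx, my, sx, sy = mean(xs), mean(ys), pstdev(xs), pstdev(ys)
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)


# Cohort averages from the table above: K/AB (in percent) and ISO.
K_AB = [9.6, 13.0, 14.6, 16.2, 18.0, 19.7, 22.7, 27.0]
ISO = [0.125, 0.152, 0.166, 0.158, 0.169, 0.190, 0.194, 0.227]

r = pearson_r(K_AB, ISO)
print(f"R = {r:.2f}, R squared = {r * r:.2f}")
print(f"standardized slope = {standardized_slope(K_AB, ISO):.2f}")
```

The two printed values of R and the standardized slope agree, which is the "one standard deviation moves the other stat by R standard deviations" interpretation. A year-to-year check works the same way, with one season's K/AB as `xs` and the next season's as `ys`.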

However, if we look back to the ambiguous no man’s land (cohorts 100-250), we see that the average K/AB rates for the different player cohorts are huddled together—with less than 2% difference separating each cohort. That translates to only one additional strikeout every 10 games; add in the statistical noise (random variance) and it isn’t a surprise that the trend isn’t perfect.
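The strikeouts-per-game conversion depends on how many at-bats a hitter gets per game, which isn't stated above. A quick sketch, assuming somewhere between four and five at-bats a game for an everyday player (my assumption, as is the function name):

```python
def games_per_extra_strikeout(rate_gap, at_bats_per_game):
    """Games needed for a K/AB gap to add up to one extra strikeout."""
    return 1 / (rate_gap * at_bats_per_game)


# A two-percentage-point gap in K/AB at two assumed workloads.
for ab_per_game in (4, 5):
    games = games_per_extra_strikeout(0.02, ab_per_game)
    print(f"{ab_per_game} AB/game: one extra strikeout every {games:.1f} games")
```

At five at-bats a game the gap is worth one strikeout every 10 games; at four it stretches to one every 12 or so. Either way it is small enough to get lost in the noise.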


Right, let’s get back to business and try to work out just how good Tony Gwynn was at avoiding the big K.

Strikeout Rate Over Time

Tony Gwynn’s performance was pretty darn impressive, especially in an era of strikeout proliferation, but how impressive is it compared to other generations of hitters?

First, take a look at how K/AB has varied by decade in the bigs:

Decade     K/AB (Ave)
1870       4.3%
1880       5.6%
1890       4.7%
1900       N/A
1910       9.7%
1920       8.2%
1930       9.6%
1940       10.4%
1950       13.3%
1960       17.0%
1970       14.9%
1980       16.0%
1990       18.2%
2000       18.9%

Baseball has grown fonder of the clod-hopping slugger as opposed to the fleet-footed speedster, so K/AB rates have increased. Saying that, even in the late 19th century when strikeouts were anathema, Gwynn’s whiff rate would have been squarely in the top quartile when he was in his pomp. We have to go back to the 1870s when soft underarm tossing was the norm to observe K/AB rates on a par with Gwynn’s.

Adjusting for League Average

To pull together an all-time list we must adjust for context. I did this by working out the mean and standard deviation of K/AB for each year; I then ranked hitters on how many standard deviations they were from the mean of that year. That allows us to identify who has had the fewest adjusted strikeouts in a season.

Name                yearID   Z score
Tony     Gwynn      1998     -2.57
Bob      Lillis     1965     -2.44
Tony     Gwynn      1999     -2.42
Tony     Gwynn      1995     -2.40
Nellie   Fox        1962     -2.35
Tony     Gwynn      1997     -2.34
Rafael   Bournigal  1998     -2.32
Dave     Cash       1976     -2.32
Juan     Pierre     2001     -2.32
Ozzie    Guillen    1997     -2.31
Nellie   Fox        1960     -2.30
Nellie   Fox        1961     -2.28
Glenn    Beckert    1967     -2.27
Gregg    Jefferies  1998     -2.26
Don      Mueller    1956     -2.26
Nellie   Fox        1959     -2.25
Gary     DiSarcina  1997     -2.25
Felix    Fermin     1993     -2.24
Tony     Gwynn      1992     -2.24
Vic      Power      1958     -2.23
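The adjustment behind this table can be sketched in a few lines: group seasons by year, take each year's mean and standard deviation of K/AB, and express every hitter as a number of standard deviations from that year's mean. The rows below are hypothetical, invented purely to make the example run. Using the population standard deviation (`pstdev`) is an assumption, as the article doesn't say which flavor was used.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical (player, yearID, K/AB) rows, invented for illustration only.
SEASONS = [
    ("Hitter A", 2005, 0.05),
    ("Hitter B", 2005, 0.15),
    ("Hitter C", 2005, 0.20),
    ("Hitter D", 2005, 0.28),
    ("Hitter E", 2006, 0.06),
    ("Hitter F", 2006, 0.18),
    ("Hitter G", 2006, 0.30),
]


def z_scores_by_year(seasons):
    """Return [(player, year, Z)] sorted from most negative Z upward."""
    by_year = defaultdict(list)
    for _, year, rate in seasons:
        by_year[year].append(rate)

    # Mean and standard deviation of K/AB for each year.
    stats = {year: (mean(rates), pstdev(rates)) for year, rates in by_year.items()}

    scored = [
        (player, year, (rate - stats[year][0]) / stats[year][1])
        for player, year, rate in seasons
    ]
    return sorted(scored, key=lambda row: row[2])


for player, year, z in z_scores_by_year(SEASONS):
    print(f"{player:<10} {year}  {z:+.2f}")
```

The most negative scores, the best contact seasons relative to their own year, come out on top.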

The data show Tony Gwynn in a great light. He has four seasons in the top 10 and definitely appears to have a penchant for avoiding the strikeout. But…hang on just a cotton-picking minute. Budding statisticians may be slightly taken aback at the low Z-scores. Remember that, for a normal distribution, we’d expect a data point to land 2.5 or more standard deviations away from the mean only about 1% of the time through luck. Here nothing gets much beyond 2.5. Moreover, if we look at the other side of the distribution we see Z-scores of five and more. This indicates that the data are skewed.

Simply put, the distribution is not normal. Take 2006: the average K/AB was 18% and the standard deviation was 6%. A hitter can’t strike out fewer than zero times, so on the low-strikeout side a Z-score beyond three is impossible!

(Technical Note: Although Z-scores should only be used when applied to a normal distribution, as we don’t require any significance testing, we can still apply the concept here. As such be careful to note that a Z-score of three does not imply a 99% confidence level in this instance.)
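The ceiling in the 2006 example falls straight out of the Z-score formula. Writing the league mean as $\mu$ and the standard deviation as $\sigma$ (symbols introduced here for convenience), the best a hitter can possibly do is a K/AB of zero:

```latex
z_{\min} \;=\; \frac{0 - \mu}{\sigma} \;=\; -\frac{\mu}{\sigma}
        \;=\; -\frac{0.18}{0.06} \;=\; -3
```

There is no matching limit on the high-strikeout side, which is why the distribution has a long tail in one direction only.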

One check worth doing to ensure that our analysis is valid is to look at the maximum Z-score by year. If the standard deviation and mean interact in such a way that maximum Z-scores are higher as time goes on, then of course Gwynn will top the list.

Decade     Ave of max Z
1870       1.62
1880       2.15
1890       2.14
1900       N/A
1910       2.69
1920       2.07
1930       2.16
1940       2.17
1950       2.49
1960       2.73
1970       2.59
1980       2.70
1990       2.93
2000       3.01

Hmm…we see that maximum Z-scores have slowly been moving toward three in the last few decades, whereas in the early 20th century Z was closer to two. Another lens through which to look at this is proximity to the maximum Z-score in each year (partly correcting for the phenomenon we see above). Here is a list of batters ranked by how close they were to the maximum Z-score in that year.

Name              Year     Zdiff%   Z score
Joe Sewell        1932     0.08     -2.06
Joe Sewell        1925     0.10     -1.81
Joe Sewell        1933     0.10     -1.90
Joe Sewell        1929     0.10     -2.02
Joe Sewell        1930     0.11     -2.25
Don Mueller       1956     0.13     -2.45
Charlie Hollocher 1922     0.13     -1.69
Nellie Fox        1962     0.14     -2.41
Stuffy McInnis    1922     0.15     -1.66
Nellie Fox        1958     0.15     -2.23
Nellie Fox        1961     0.15     -2.25
Dave Cash         1976     0.15     -2.26
Red Schoendienst  1957     0.15     -2.11
Buck Jordan       1938     0.15     -1.65
Lloyd Waner       1936     0.16     -1.82
Stuffy McInnis    1924     0.16     -1.82
Joe Sewell        1926     0.16     -1.77
Nellie Fox        1959     0.16     -2.25
Bob Lillis        1965     0.16     -2.39
Emil Verban       1947     0.16     -1.88
Tony Gwynn        1995     0.17     -2.56
Dale Mitchell     1952     0.17     -1.83
Nellie Fox        1960     0.17     -2.23
Clint Courtney    1954     0.17     -2.18

Hall of Famer Joe Sewell tops our list. A quick glance at Baseball Reference tells us he was the “greatest contact hitter ever” and a look at his stats shows he had a quite remarkable career with the timber. Even in a low-whiff era he regularly struck out fewer than 10 times a season.

What about Tony Gwynn—where is he in our ranking? His great 1995 season appears at number 21, and that’s out of 15,000 player seasons; he appears three times in the top 50 and six times in the top 100. Even though he isn’t at the summit, that is nothing to sneeze at.
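The exact formula behind the Zdiff% column isn't spelled out above, so treat the following as one plausible reading rather than the method actually used. It assumes the "maximum Z-score" for a year is the score a hitter with zero strikeouts would earn, and that Zdiff% is the fraction of that maximum by which a hitter fell short. The league figures are the 2006 ones quoted earlier (mean 18%, standard deviation 6%); the 3% hitter is hypothetical.

```python
import math


def z_score(rate, league_mean, league_sd):
    """Standard deviations between a hitter's K/AB and the league mean."""
    return (rate - league_mean) / league_sd


def z_diff(rate, league_mean, league_sd):
    """Shortfall from the best attainable Z-score, as a fraction of it.

    Assumption: the best attainable score is that of a zero-strikeout season.
    """
    best = z_score(0.0, league_mean, league_sd)
    actual = z_score(rate, league_mean, league_sd)
    return (best - actual) / best


# 2006 league figures from the article (in percent); hypothetical 3% hitter.
LEAGUE_MEAN, LEAGUE_SD = 18.0, 6.0
print(f"Z = {z_score(3.0, LEAGUE_MEAN, LEAGUE_SD):.2f}")
print(f"Zdiff = {z_diff(3.0, LEAGUE_MEAN, LEAGUE_SD):.2f}")
```

Under this reading the standard deviation cancels out, and Zdiff% is simply a hitter's K/AB divided by the league average. That would explain why hitters from low-variance eras can climb this list despite modest raw Z-scores.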

Wrapping Up

Strikeouts are an inherently magical part of the game. We laud hurlers who can mow down more than 10 per nine innings without paying too much attention to hitters who constantly avoid the embarrassing swing and miss. And rightly so: a pitcher’s strikeout rate is far more indicative of his skill than a hitter’s K/AB rate is. That’s why we have Juan Pierre and Tony Gwynn at the top of the same list. Amen.

References & Resources
The Lahman database was used for all historical data. Also Baseball Reference was an invaluable resource (note to ed: can’t we embed that in this section as everyone seems to write it!)

