Season similarity scores
In 2001, Ichiro Suzuki showed America a new style of baseball, a speedy, high-average, low-walk style of play never seen since…
Wingo had an interesting career. An alumnus of Oglethorpe University, he appeared in 15 games for Philadelphia as a 21-year-old in 1919. He acquitted himself well, with an OPS+ of 115, but he would make his next appearances as an outfielder in 78 games for Detroit five years later. As a 27-year-old in 130 games, the most playing time he would ever see, he batted .370/.456/.527, good for an OPS+ of 150 and a 12th-place MVP finish.
Together, those three seasons make Wingo into Ichiro’s most similar age-27 player on the incomparable Baseball Reference. The problem is, the two players aren’t particularly similar:
Player         From  To    Yrs  G    AB   R    H    2B  3B  HR  RBI  BB  SO  BA    OBP   SLG   SB  CS  OPS+
Ichiro Suzuki  2001  2001  1    157  692  127  242  34  8   8   69   30  53  .350  .381  .457  56  14  126
Al Wingo       1919  1925  3    223  649  134  224  47  15  6   96   94  56  .345  .428  .492  16  18  136
In a season’s worth of at bats, Ichiro had 18 more hits, 13 fewer doubles, seven fewer triples, 64 fewer walks and 40 more stolen bases. The two players aren’t actually similar at all.
The alert reader probably saw this coming. Ichiro, after all, was an internationally famous superstar the first day he stepped onto a major league playing field. That day came when he was 27, because he had played in Japan for much of the previous decade. Paradoxically, any player truly comparable to Ichiro should have so much playing time by age 27 that his career numbers wouldn’t be comparable to Ichiro’s at all. It’s not really a coincidence that his most similar player at age 27 was an outfielder who had a career year in his first real opportunity for playing time at precisely the correct age.
And yet, the question remains. When was the last time America saw a player similar to the 2001 Ichiro? Has any player ever had a truly similar year?
Deconstructing Bill
Similarity scores were introduced by Bill James in The Politics of Glory, a book examining the Hall of Fame selection process. James sought to bring order to a common Hall of Fame argument: If Player A is similar to Player B, who is in the Hall of Fame, then Player A should also be elected. In a characteristically insightful approach, James realized that what was needed was a way to fairly compare a player to every other player, find the most similar players, and describe how similar they were. If you can say that Player A is similar to Players B, C, D and E, all of whom are in the Hall of Fame, you’re starting to make a very strong case for Player A’s election.
Aside from their original purpose, Similarity Scores give an element of vivid detail to baseball statistics. Whenever I want to learn about a player I’ve never heard of, the first thing I do is look at his list of most similar players. Finding someone I already know about makes the player I’m investigating come to life. The point of looking at Similarity Scores isn’t that the current system doesn’t work; it’s that the idea of Similarity Scores is such a good one that it’s worth improving as much as we can.
As employed on baseball-reference.com, similarity scores are calculated by starting at 1,000 points and subtracting…
One point for each difference of 20 games played.
One point for each difference of 75 at bats.
One point for each difference of 10 runs scored.
One point for each difference of 15 hits.
One point for each difference of 5 doubles.
One point for each difference of 4 triples.
One point for each difference of 2 home runs.
One point for each difference of 10 RBI.
One point for each difference of 25 walks.
One point for each difference of 150 strikeouts.
One point for each difference of 20 stolen bases.
One point for each difference of .001 in batting average.
One point for each difference of .002 in slugging percentage.
In addition, there’s a positional adjustment to account for players who spent their careers at different positions. In this essay, I will focus on batting similarity scores only.
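As a rough illustration, the point schedule above can be written as a short Python function. This is a minimal sketch, not the code used by any website: the names `JAMES_DIVISORS`, `james_penalty` and `similarity_score` are made up for this example, the positional adjustment is left out, and fractional points are kept (an assumption; a hand calculation might count only whole points). The two stat lines are the age-27 career totals for Ichiro and Wingo quoted earlier.

```python
# Divisors from the list above: one point per this much difference.
JAMES_DIVISORS = {
    "G": 20, "AB": 75, "R": 10, "H": 15, "2B": 5, "3B": 4, "HR": 2,
    "RBI": 10, "BB": 25, "SO": 150, "SB": 20, "BA": 0.001, "SLG": 0.002,
}

def james_penalty(a, b):
    """Total points subtracted when comparing batting lines a and b.

    Assumption: fractional points are kept rather than rounded down.
    """
    return sum(abs(a[stat] - b[stat]) / per_point
               for stat, per_point in JAMES_DIVISORS.items())

def similarity_score(a, b):
    """Batting-only score: start at 1,000 and subtract the penalty."""
    return 1000 - james_penalty(a, b)

ichiro = {"G": 157, "AB": 692, "R": 127, "H": 242, "2B": 34, "3B": 8,
          "HR": 8, "RBI": 69, "BB": 30, "SO": 53, "SB": 56,
          "BA": 0.350, "SLG": 0.457}
wingo = {"G": 223, "AB": 649, "R": 134, "H": 224, "2B": 47, "3B": 15,
         "HR": 6, "RBI": 96, "BB": 94, "SO": 56, "SB": 16,
         "BA": 0.345, "SLG": 0.492}

print(round(similarity_score(ichiro, wingo), 1))  # 959.1 under these assumptions
```

Because of the simplifications listed above, this figure need not match any published score. Note that 22.5 of the roughly 41 points subtracted come from batting average and slugging percentage alone.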
So what happens if we use James’ system, but look at individual seasons instead of entire careers? That’s easy enough to program. Instead of starting at 1,000 and subtracting points, though, let’s calculate a “similarity distance” by starting at zero and adding points according to James’ system. If we do this, the 10 most similar seasons to Ichiro’s 2001 are:
First    Last       Year  BB   1B   2B  3B  HR  SB   Outs  BA    OBP   SLG   LWRuns  SSDist
Ichiro   Suzuki     2001  30   192  34  8   8   56   450   .350  .377  .457  46.380  0.0
Sam      Rice       1930  55   158  35  13  1   13   386   .349  .404  .457  38.820  14.3
Eddie    Collins    1924  89   154  27  7   6   42   362   .349  .439  .455  57.320  16.4
Jack     Glasscock  1889  31   155  40  3   7   57   377   .352  .385  .467  46.430  18.0
Bill     Terry      1934  60   169  30  6   8   0    389   .354  .412  .464  39.230  19.4
Sam      Rice       1925  37   182  31  13  1   26   422   .350  .385  .442  35.580  19.9
Rod      Carew      1973  62   156  30  11  6   41   377   .350  .413  .471  51.850  20.0
Buddy    Myer       1935  96   163  36  11  5   7    401   .349  .437  .468  53.200  21.4
Eddie    Collins    1913  85   145  23  13  3   55   350   .345  .435  .453  58.010  22.2
Carson   Bigbee     1922  56   166  29  15  5   24   399   .350  .405  .471  45.930  22.3
Charlie  Jamieson   1923  80   172  36  12  2   18   422   .345  .417  .447  46.880  22.4
Kind of an unexciting list, isn’t it? None of these seasons leap out and strike you as a great match for Ichiro. Looking through the list, we see that Ichiro stole 56 bases in 2001. In what’s supposedly the most comparable season in history, Sam Rice stole only 13, and drew 55 walks compared to Ichiro’s 30! Looking through these columns, we can see that there’s a lot of variation in every column except two: All of the top 10 seasons are near-perfect matches in batting average and slugging percentage.
The problem here is that Similarity Scores are designed to compare long careers to one another—the kind of careers that might make it into a discussion about the Hall of Fame. For a career like that, it might be reasonable to give the same number of points to a single point of batting average as you do to five doubles. But over one season, five doubles is a lot, and a single point of batting average is nothing. For finding similar seasons, James’ system is unbalanced toward batting average and slugging percentage. It almost always will find seasons which are perfect matches in these categories, with large variations in all other criteria. If we want to devise a similarity score that works well for a single season, we’ll have to do something new.
What’s the point?
If we want to devise a new system for similarity scores, we have to look at the idea of a point with a critical eye. In the last section, we saw that James’ point system becomes unbalanced if we dramatically alter the length of the periods we’re comparing. Presumably, the same kinds of problems would arise if we were trying to find comparable players to a player whose career was very short. Can we develop a system that works for any length of time?
Another aspect to consider is that the field of sabermetrics is a lot larger now than it was when James devised his original system. There are many more people using sabermetrics to answer many more questions. It would be nice to have a system that connects to the rest of what we know about sabermetrics. It’s a less obvious problem than having bad matches for single seasons, but I have to admit that, as long as I’ve enjoyed using them, I have no idea what a point means in a Similarity Score. Are Similarity Scores consistent with the rest of sabermetrics?
At the level of a single batting event, they don’t match up very well. Using Pete Palmer’s linear weights formula, the runs created by a particular batter can be estimated by
Linear Weights runs = .47*1B + .78*2B + 1.09*3B + 1.4*HR + .33*(BB+HB) + .3*SB
                      - .52*CS - .26*(AB-H-GIDP) - .72*GIDP
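For readers who want to experiment, the formula above translates directly into a small Python function. This is only a sketch: the function name and argument names are invented for the example, hit-by-pitch defaults to zero because the tables in this article do not list it, and the Ichiro inputs are the 2001 counts quoted elsewhere in this article.

```python
def linear_weights_runs(singles, doubles, triples, hr, bb, sb, ab, hits,
                        hb=0, cs=0, gidp=0):
    """Linear weights runs, using the coefficients quoted above."""
    return (0.47 * singles + 0.78 * doubles + 1.09 * triples + 1.40 * hr
            + 0.33 * (bb + hb) + 0.30 * sb
            - 0.52 * cs - 0.26 * (ab - hits - gidp) - 0.72 * gidp)

# Ichiro Suzuki, 2001 (hit-by-pitch omitted, an assumption of this sketch)
full = linear_weights_runs(192, 34, 8, 8, 30, 56, 692, 242, cs=14, gidp=3)
basic = linear_weights_runs(192, 34, 8, 8, 30, 56, 692, 242)

print(round(full, 2), round(basic, 2))  # 37.72 46.38
```

The second figure, which leaves out caught stealing and double plays, is the one that corresponds to the LWRuns column in the tables of this article.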
We can now take the ratio of the run value of a single (.47 runs) and a double (.78 runs) to find that a single is roughly 60 percent as valuable as a double. But in James’ Similarity Scores formula, a single counts for 1/15 of a point, while a double counts for four times as much (1/5 point as an extra double plus 1/15 point as an extra hit).
Let’s make a table of the relative weights for the different batting events as compared to a single in Linear Weights vs Similarity scores:
Event  LW    SS
1B     1.00  1.00
2B     1.66  4.00
3B     2.31  4.75
HR     2.97  8.50
SB     0.64  0.75
CS     1.11  NA
GIDP   1.53  NA
out    0.55  0.2 (one plate appearance without a hit)
Pretty bad agreement!
Oddly, things become better if we compare the Similarity Scores ratio to the square of the linear weights ratio.
Event  LW^2  SS
1B     1.00  1.00
2B     2.76  4.00
3B     5.33  4.75
HR     8.82  8.50
SB     0.41  0.75
CS     1.23  NA
GIDP   2.34  NA
out    0.30  0.2
The agreement here is much better, but still not great. It appears that Similarity Scores match up with the square of the linear weights run value of particular offensive events. I suspect a lot of the disagreement comes from a desire on James’ part for a system that could be worked out easily by hand. In this age of ubiquitous computing, that’s no longer an important consideration.
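The ratios in the two tables above can be reproduced with a few lines of Python. This is a sketch under the same simplification the text uses: each extra-base hit is counted as its own category plus one extra hit, an out is counted as one extra at bat, and any effect on batting average or slugging percentage is ignored. The dictionary and function names are invented for the example.

```python
# Run values per event from the linear weights formula (sign ignored).
LW = {"1B": 0.47, "2B": 0.78, "3B": 1.09, "HR": 1.40, "SB": 0.30, "out": 0.26}

# Points per extra event in James' system. Every hit also counts as a hit
# (1/15 point); an out is treated as one extra at bat (1/75 point).
SS = {"1B": 1/15, "2B": 1/5 + 1/15, "3B": 1/4 + 1/15, "HR": 1/2 + 1/15,
      "SB": 1/20, "out": 1/75}

def relative_to_single(weights):
    """Express every weight as a multiple of the weight of a single."""
    return {event: w / weights["1B"] for event, w in weights.items()}

lw_rel = relative_to_single(LW)
ss_rel = relative_to_single(SS)

# Columns: event, LW ratio, squared LW ratio, Similarity Scores ratio
for event in LW:
    print(f"{event:4} {lw_rel[event]:5.2f} {lw_rel[event] ** 2:5.2f} "
          f"{ss_rel[event]:5.2f}")
```

The last digit of some linear weights ratios may differ slightly from the tables, which appear to have been built from rounded intermediate values.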
Run distance
We would like to construct a new system of Similarity Scores that weights different offensive events in a way that is consistent with Linear Weights. Ideally, we would like this system to be easily adjustable to the different offensive contexts seen at different points in the history of baseball. Fortunately, nothing could be easier. To find the distance between two points, you simply take the square of the difference in each dimension, add them up, and take the square root.
The only catch is that we have to use the same units for distance in every dimension we use in the calculation. It doesn’t make any sense to add inches to seconds, even if inches is a perfectly reasonable distance in space and seconds is perfectly reasonable distance in time. Similarly, it doesn’t really make any sense to add singles in one dimension to doubles in another. We’d like to use some common system of units in which both a single and a double can be expressed in a meaningful way. This is exactly what Linear Weights does.
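In symbols, the idea can be sketched as follows, where a_i and b_i are the two players’ counts in category i (walks, singles, doubles and so on) and w_i is the Linear Weights run value of one event in that category. The symbol names are chosen here for illustration only.

```latex
d(A, B) = \sqrt{\sum_{i} \bigl( w_i \, (a_i - b_i) \bigr)^{2}}
```

Multiplying each difference by w_i before squaring is what puts every dimension into the same unit, runs.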
If we use Linear Weights to convert Ichiro’s 2001 season to the number of runs he contributed with singles, doubles, etc., we find that he produced…
30*.33=9.9 runs from walks
192*.47=90.2 runs from singles
34*.78=26.5 runs from doubles
8*1.09=8.72 runs from triples
8*1.4=11.2 runs from home runs
56*.3=16.8 runs from stolen bases
On the negative side, he lost…
14*.52=7.28 runs from being caught stealing
53*.26=13.78 runs from strikeouts
3*.72=2.16 runs from grounding into double plays
394*.26=102 runs from all other outs
It’s now easy to calculate a “run distance” using the distance formula given above. Because strikeouts, caught stealing, and grounding into double plays were not official statistics for all of baseball history, we’ll leave those categories out of the calculation. Although it would be easy to adjust for different levels of scoring in different years, at the moment we’ll just use the linear weights formula given above.
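A minimal Python sketch of the run distance described here follows. It assumes the seven categories that appear in the tables of this article (walks, singles, doubles, triples, home runs, stolen bases and outs), with all outs valued at .26 runs; the names `RUN_VALUES` and `run_distance` are invented for the example. The second season used is Juan Pierre’s 2004 line.

```python
import math

# Run value of one event in each category, from the linear weights formula.
# Outs cost runs, but the sign disappears once differences are squared.
RUN_VALUES = {"BB": 0.33, "1B": 0.47, "2B": 0.78, "3B": 1.09,
              "HR": 1.40, "SB": 0.30, "Outs": 0.26}

def run_distance(a, b, run_values=RUN_VALUES):
    """Euclidean distance between two seasons, measured in runs."""
    return math.sqrt(sum((value * (a[cat] - b[cat])) ** 2
                         for cat, value in run_values.items()))

ichiro_2001 = {"BB": 30, "1B": 192, "2B": 34, "3B": 8,
               "HR": 8, "SB": 56, "Outs": 450}
pierre_2004 = {"BB": 45, "1B": 184, "2B": 22, "3B": 12,
               "HR": 3, "SB": 45, "Outs": 457}

print(round(run_distance(ichiro_2001, pierre_2004), 1))  # 14.4
```

Ranking every season in a database by this value against a target season, smallest first, gives a list of most similar seasons.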
We can now search for the player seasons with the smallest distance from Ichiro, as measured in runs. The new top 10 seasons are:
First    Last       Year  BB   1B   2B  3B  HR  SB   Outs  BA    OBP   SLG   LWRuns  SSDist
Ichiro   Suzuki     2001  30   192  34  8   8   56   450   .350  .377  .457  46.4    0.0
Juan     Pierre     2004  45   184  22  12  3   45   457   .326  .368  .407  30.5    14.4
Ichiro   Suzuki     2006  49   186  20  9   9   45   471   .322  .367  .416  32.6    14.5
Ralph    Garr       1971  30   180  24  6   9   30   420   .343  .372  .441  32.2    14.9
Willie   Wilson     1980  28   184  28  15  3   79   475   .326  .352  .421  38.3    15.3
Richie   Ashburn    1951  50   181  31  5   4   29   422   .344  .391  .426  35.8    15.4
Steve    Sax        1989  52   171  26  3   5   43   446   .315  .366  .387  25.0    15.9
Sam      Rice       1920  39   170  29  9   3   63   413   .338  .377  .428  40.9    16.7
Matty    Alou       1969  42   183  41  6   1   22   467   .331  .369  .411  25.0    17.0
Sam      Rice       1925  37   182  31  13  1   26   422   .350  .385  .442  35.6    17.1
Frankie  Frisch     1923  46   169  32  10  12  29   418   .348  .392  .485  47.3    17.8
These seasons match up much better with Ichiro’s 2001 than the earlier list. Ichiro himself even appears in a later incarnation. It’s a little distressing that most of these players drew more walks than Ichiro did in 2001, but that’s due more to Ichiro’s own unusualness than anything else: there just aren’t many 242-hit, 30-walk seasons to choose from. A related issue is that nearly all of these comparable seasons are distinctly worse than Ichiro’s 2001; this is more a list of “poor man’s Ichiro” seasons than true equals to Ichiro’s 2001.
Barry Bonds also had an unusual season in 2001:
First    Last       Year  BB   1B   2B  3B  HR  SB   Outs  BA    OBP   SLG   LWRuns  SSDist
Barry    Bonds      2001  177  49   32  2   73  13   320   .328  .510  .863  131.5   0.0
Mark     McGwire    1998  162  61   21  0   70  1    357   .299  .468  .753  104.0   16.1
Mark     McGwire    1999  133  58   21  1   65  0    376   .278  .425  .697  81.9    25.6
Babe     Ruth       1920  150  73   36  9   54  14   285   .376  .531  .849  127.4   32.6
Babe     Ruth       1927  137  95   29  8   60  7    348   .356  .486  .772  116.8   32.8
Babe     Ruth       1921  145  85   44  16  59  17   336   .378  .510  .846  139.9   33.5
Sammy    Sosa       2001  116  86   34  5   64  0    388   .328  .440  .737  99.4    34.8
Babe     Ruth       1928  137  82   29  8   54  4    363   .323  .461  .709  97.5    36.1
Mark     McGwire    1996  116  59   21  0   52  0    291   .312  .460  .731  79.5    38.0
Jim      Thome      2002  122  73   19  2   52  1    334   .304  .445  .677  77.8    38.1
Hank     Greenberg  1938  119  90   23  4   58  7    381   .315  .436  .684  88.1    38.6
No surprises here. There are no particularly good matches to a 73-home run season. Let’s look at some particularly famous or unusual seasons.
Babe Ruth’s 1927 is surprisingly untouched by the steroid era:
First    Last       Year  BB   1B   2B  3B  HR  SB   Outs  BA    OBP   SLG   LWRuns  SSDist
Babe     Ruth       1927  137  95   29  8   60  7    348   .356  .486  .772  116.8   0.0
Babe     Ruth       1928  137  82   29  8   54  4    363   .323  .461  .709  97.5    11.1
Hank     Greenberg  1938  119  90   23  4   58  7    381   .315  .436  .684  88.1    12.8
Jimmie   Foxx       1932  116  113  33  9   58  3    372   .364  .469  .749  112.3   13.4
Mickey   Mantle     1961  126  87   16  6   54  12   351   .317  .452  .687  89.4    14.4
Sammy    Sosa       2001  116  86   34  5   64  0    388   .328  .440  .737  99.4    15.4
Ralph    Kiner      1949  117  92   19  5   54  6    379   .310  .431  .658  81.0    15.9
Babe     Ruth       1921  145  85   44  16  59  17   336   .378  .510  .846  139.9   16.2
Babe     Ruth       1930  136  100  28  9   49  10   332   .359  .492  .732  108.8   16.2
Mickey   Mantle     1956  112  109  22  5   52  10   345   .353  .465  .705  96.9    16.7
Hack     Wilson     1930  105  111  35  6   56  3    377   .356  .454  .723  101.9   16.9
Ted Williams 1941, the last .400 season:
First    Last       Year  BB   1B   2B  3B  HR  SB   Outs  BA    OBP   SLG   LWRuns  SSDist
Ted      Williams   1941  147  112  33  3   37  2    271   .406  .551  .735  112.1   0.0
Mickey   Mantle     1957  146  105  28  6   34  16   301   .365  .515  .665  100.1   11.5
Ted      Williams   1957  119  96   28  1   38  0    257   .388  .523  .731  93.7    13.3
Ted      Williams   1942  145  111  34  5   36  3    336   .356  .496  .648  95.9    17.1
Babe     Ruth       1926  144  102  30  5   47  11   311   .372  .513  .737  112.6   18.6
Babe     Ruth       1932  130  97   13  5   41  2    301   .341  .487  .661  83.8    20.5
Ted      Williams   1946  156  93   37  8   38  0    338   .342  .496  .667  98.1    20.8
Babe     Ruth       1924  142  108  39  7   46  9    329   .378  .510  .739  117.2   20.9
Ted      Williams   1954  136  80   23  1   29  0    253   .345  .515  .635  76.3    21.3
Babe     Ruth       1923  170  106  45  13  41  17   317   .393  .542  .764  135.3   21.6
Jason    Giambi     2000  137  97   29  1   43  2    340   .333  .475  .647  86.9    21.6
Rickey Henderson 1982, 130 stolen bases:
First    Last       Year  BB   1B   2B  3B  HR  SB   Outs  BA    OBP   SLG   LWRuns  SSDist
Rickey   Henderson  1982  116  105  24  4   10  130  393   .267  .397  .383  61.5    0.0
Rickey   Henderson  1983  103  109  25  7   9   108  363   .292  .411  .421  63.0    11.8
Arlie    Latham     1891  74   108  20  10  7   87   388   .272  .361  .387  36.7    20.8
Jim      Fogarty    1887  82   83   26  12  8   102  366   .261  .366  .410  46.1    21.0
Hugh     Nicol      1887  86   81   18  2   1   138  373   .215  .335  .267  28.5    21.1
Rickey   Henderson  1980  117  144  22  4   9   100  412   .303  .418  .399  63.3    21.1
Rickey   Henderson  1988  82   131  30  2   6   93   385   .305  .395  .399  50.4    21.5
Billy    Hamilton   1889  87   129  17  12  3   111  373   .302  .399  .395  56.2    21.9
Hub      Collins    1890  85   100  32  7   3   85   368   .278  .382  .386  41.7    21.9
Tommy    Harper     1969  95   105  10  2   9   73   411   .235  .350  .311  18.3    22.1
Rickey   Henderson  1998  118  97   16  1   14  66   414   .236  .373  .347  29.9    22.2
I’ll bet you didn’t have Arlie Latham in the office pool. Hugh Nicol is a surprisingly good match in everything but home runs.
George Brett’s .390 season in 1980:
First    Last       Year  BB   1B   2B  3B  HR  SB   Outs  BA    OBP   SLG   LWRuns  SSDist
George   Brett      1980  58   109  33  9   24  15   274   .390  .460  .664  72.8    0.0
Harry    Heilmann   1922  58   104  27  10  21  8    293   .356  .429  .598  55.6    8.7
Bill     Dickey     1936  46   97   26  8   22  0    270   .362  .424  .617  50.4    10.4
Joe      DiMaggio   1939  52   108  32  6   30  3    286   .381  .444  .671  68.0    10.4
Goose    Goslin     1928  48   110  36  10  17  16   283   .379  .439  .614  61.5    10.9
Rogers   Hornsby    1923  55   104  32  10  17  3    261   .384  .455  .627  59.7    11.4
Mickey   Cochrane   1931  56   106  31  6   17  2    299   .349  .419  .553  45.7    13.0
Hal      Trosky     1939  52   90   31  4   25  2    298   .335  .404  .589  46.1    13.1
Mike     Sweeney    2002  61   104  31  1   24  9    311   .340  .415  .563  49.7    13.4
Moises   Alou       1994  42   85   31  5   22  7    279   .339  .399  .592  43.8    13.9
Rico     Carty      1964  43   96   28  4   22  1    305   .330  .388  .554  37.3    14.0
I’m fascinated that Mike Sweeney, universally described as the best hitter the Royals have had since George Brett, turned in a season so similar to Brett’s magnum opus.
For a season heavy in doubles, let’s take Todd Helton’s 2000:
First    Last       Year  BB   1B   2B  3B  HR  SB   Outs  BA    OBP   SLG   LWRuns  SSDist
Todd     Helton     2000  103  113  59  2   42  5    364   .372  .467  .698  101.0   0.0
Carlos   Delgado    2000  123  97   57  1   41  0    373   .345  .461  .664  92.2    10.7
Albert   Pujols     2003  79   117  51  1   43  5    379   .359  .434  .667  85.1    11.1
Hank     Greenberg  1940  93   96   50  8   41  6    378   .340  .432  .670  84.5    13.5
Frank    Thomas     2000  112  104  44  0   43  1    391   .328  .437  .625  79.0    14.9
Derrek   Lee        2005  85   100  50  3   46  15   395   .335  .418  .662  83.5    15.1
Albert   Pujols     2004  84   97   51  2   46  5    396   .331  .414  .657  78.2    15.3
Frank    Robinson   1962  76   116  51  2   39  18   401   .342  .415  .624  77.3    15.7
Lance    Berkman    2001  92   97   55  5   34  7    386   .331  .423  .621  73.6    15.8
Todd     Helton     2001  98   92   54  2   49  7    390   .336  .431  .685  89.2    16.0
Todd     Helton     2003  111  122  49  5   33  0    374   .359  .461  .630  86.6    16.3
Surprising how many of those seasons came between 2000 and 2005, isn’t it?
Alex Rodriguez’s best home run year brings back some memories for Seattle fans:
First    Last       Year  BB   1B   2B  3B  HR  SB   Outs  BA    OBP   SLG   LWRuns  SSDist
Alex     Rodriguez  2002  87   101  27  2   57  9    437   .300  .385  .623  68.3    0.0
Ken      Griffey    1997  76   92   34  3   56  15   423   .304  .382  .646  71.0    9.0
Ken      Griffey    1998  76   88   33  3   56  20   453   .284  .361  .611  62.1    10.2
Sammy    Sosa       1999  78   91   24  2   63  7    445   .288  .367  .635  64.0    10.6
Alex     Rodriguez  2001  75   114  34  1   52  18   431   .318  .390  .622  72.1    12.0
Johnny   Mize       1947  74   98   26  2   51  2    409   .302  .380  .614  58.6    12.2
Luis     Gonzalez   2001  100  98   36  7   57  1    411   .325  .420  .688  88.0    12.3
Ryan     Howard     2006  108  98   25  1   58  0    399   .313  .421  .659  79.8    12.7
Shawn    Green      2001  72   100  31  4   49  20   435   .297  .371  .598  60.8    13.3
George   Foster     1977  61   112  31  2   52  6    418   .320  .382  .631  65.1    13.6
Ken      Griffey    1999  91   96   26  3   48  24   433   .286  .379  .576  60.5    13.8
Conclusions
Finding similarities between different players is one of the most interesting aspects of sabermetrics, but it has been sorely neglected as an area of research. In this essay, I have tried to put the concept of player similarity on more solid ground by introducing the idea of “run distance” between two different statistical records.
One advantage of looking at player similarity in this new way is that problems which were previously very difficult to address now become simple. For instance, a common complaint about Similarity Scores is that a mediocre player in a high offense era can show a superficial similarity to a much better player in a low offense era. It’s not at all clear how this problem could be corrected using the traditional formula, but simply dividing the run value in each category by the number of runs per game scored in a particular park or league naturally produces a historically corrected Similarity Score. It would be similarly easy to construct a rate-based Similarity Score, where each category is divided by plate appearances, to account for seasons with differing amounts of playing time.
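Both adjustments mentioned above can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than a tested method: the function names are invented, plate appearances are approximated as walks plus hits plus outs, and the runs-per-game inputs (`rpg_a`, `rpg_b`) are values the reader would have to supply for each season’s league or park.

```python
import math

RUN_VALUES = {"BB": 0.33, "1B": 0.47, "2B": 0.78, "3B": 1.09,
              "HR": 1.40, "SB": 0.30, "Outs": 0.26}

def plate_appearances(season):
    # Assumption: PA is approximated as walks + hits + outs.
    return sum(season[k] for k in ("BB", "1B", "2B", "3B", "HR", "Outs"))

def adjusted_run_distance(a, b, rpg_a=1.0, rpg_b=1.0, per_pa=False):
    """Run distance with optional era and playing-time scaling.

    rpg_a, rpg_b: runs per game in each season's league or park
                  (hypothetical inputs; 1.0 means no era adjustment).
    per_pa:       if True, divide each category by plate appearances.
    """
    scale_a = rpg_a * (plate_appearances(a) if per_pa else 1)
    scale_b = rpg_b * (plate_appearances(b) if per_pa else 1)
    return math.sqrt(sum(
        (value * a[cat] / scale_a - value * b[cat] / scale_b) ** 2
        for cat, value in RUN_VALUES.items()))

ichiro_2001 = {"BB": 30, "1B": 192, "2B": 34, "3B": 8,
               "HR": 8, "SB": 56, "Outs": 450}
# A made-up season with exactly twice the counts, to show the rate version.
doubled = {cat: 2 * count for cat, count in ichiro_2001.items()}

print(adjusted_run_distance(ichiro_2001, doubled, per_pa=True))  # 0.0
```

With `per_pa=True` the doubled season is a perfect match, since only the rates are compared. With the default arguments the function reduces to the plain run distance used in the tables above.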
Improved Similarity Scores can help sharpen Hall of Fame debates by pointing out when a season is truly unique or comparable to the greats of the past. Mostly, though, I hope that the improved Similarity Scores presented in this article will add to the enjoyment of baseball statistics by pointing out the unexpected similarities and parallels in baseball history.