Is there a better way of comparing players between historical eras?
by Dan Lependorf, March 15, 2012

How do we compare players from different eras of baseball to each other? We can't just compare raw numbers, since baseball today is a wildly different game than baseball was in 1920. Was Barry Bonds' best season stronger than Babe Ruth's best? Who was more dominant in his time?

One of the best tools we have for these types of comparisons is the "plus" stat, like OPS+ or ERA+. But as useful as plus stats are, it seems to me that they're missing a small but important point: plus stats don't consider the width of the talent distribution in baseball, which has changed over time.

As a quick refresher, plus stats are calculated very simply—it's just the percentage points above or below the league average, with the average set at 100. And while this works quite well, simple percentage points don't give the full picture. An OPS+ of 150 is 50 percent better than the league average, but what does that mean? Is 50 percent historically great? Is it fairly commonplace? We can't tell from the number alone.

When I talk about the width of the talent distribution, I'm talking about the size of the spread of talent in baseball. To use an example, someone who grows to twice the national average height will be studied by doctors for years, but someone who earns twice the average annual salary is very easy to find. Both are twice their respective averages (the equivalent of 200 in plus stat form), and yet the two figures have completely different meanings. The spread of human height is relatively narrow, whereas the spread of income is far greater.

Baseball's talent distribution has had different widths throughout history, so if you're going to adjust a player's stat line to reflect the average of his time, why not adjust for the width as well? Adjusting for both brings us closer to the whole point of adjusting in the first place—we're attempting to compare two players based on how dominant they were in their respective eras.

A little math background: standard deviation is a statistical measurement of the width of a sample of data, in terms of "distance" from the average. To use my example from above, the standard deviation of human height is much smaller than the standard deviation of annual salaries. Simply put, a larger standard deviation means that data points are more likely to be found further from the average. A general rule of thumb is that around two-thirds of a sample falls within one standard deviation of the average, about 95 percent falls within two standard deviations, and the coverage climbs quickly the further out you go. For example, the scores on an IQ test are usually scaled so that the average is 100 and the standard deviation is 15. In other words, two-thirds of people have IQs between 85 and 115, and 95 percent of people fall between 70 and 130.

To bring this back to baseball, qualified batters in 2011 had an average wOBA of 0.342, with a standard deviation of 0.037. That means we can expect about two-thirds of qualified batters to have a wOBA between 0.305 and 0.379. For batters, wOBA gives us a total picture of a player's offensive output, and as you'd expect, the average wOBA for each year has changed fairly dramatically, especially with all of the rule changes in the early portion of baseball history.
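To make the arithmetic concrete, here is a minimal sketch in Python using the 2011 figures above. The 0.400 wOBA is a made-up example rather than any particular player, and the plus-style figure follows the simple percentage-of-average definition described earlier, not the full OPS+ formula:

```python
# League-wide figures for qualified batters in 2011, taken from the text above.
league_avg = 0.342
league_sd = 0.037

# A hypothetical batter's wOBA, for illustration only.
player_woba = 0.400

# Plus-style comparison: the player's wOBA as a percentage of the league
# average, with the average scaled to 100.
plus_style = 100 * player_woba / league_avg

# Z-score: how many standard deviations the player sits above the average.
z_score = (player_woba - league_avg) / league_sd

print(f"plus-style: {plus_style:.0f}")  # roughly 117, i.e. 17 percent above average
print(f"z-score: {z_score:.2f}")        # roughly 1.57
```

The same 17 percent edge would translate into a very different z-score in an era with a wider spread, which is the whole point of the adjustment.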
I plotted the average wOBA, along with one standard deviation above and below the average, for each year, to give a visual look at how the run environment of baseball has changed over time.

As you might expect, the width of the talent distribution in baseball has narrowed over the years. In the 1870s, one standard deviation comes out to roughly 20 percent of the average wOBA, and it's been a gradual decline to the 9 or 10 percent we see today. This means that regular plus stats undervalue modern players: being 50 percent better than the average in 2011 puts you ahead of far more of the league than being 50 percent better than the average in 1885 did. Interestingly, there's a spike in the standard deviation up to 12 percent in the late 1990s and early 2000s. Barry Bonds is probably singlehandedly responsible for a good chunk of that. More on that in a bit.

Once we have the average and standard deviation for each year, it's easy to see which single seasons were the most dominant of their time in terms of z-score, which is the number of standard deviations above or below the average (a rough sketch of the calculation is included at the end of this article). Compared to plus stats, this ranking gives a more accurate representation of how dominant a player was relative to his peers. A plus stat of 150 could be historically excellent or merely good, depending on the stat in question and the era. But a z-score is always consistent—a z-score of 3 is always fantastic, no matter the context.

Here's the top 10 in baseball history.

Player          Season   wOBA     Z-Score
Barry Bonds     2004     0.538    4.996
Babe Ruth       1920     0.600    4.536
Barry Bonds     2002     0.546    4.495
Babe Ruth       1921     0.578    4.257
Babe Ruth       1926     0.547    4.158
Ted Williams    1941     0.565    4.102
Barry Bonds     2001     0.539    4.089
Barry Bonds     2003     0.506    3.972
Babe Ruth       1923     0.566    3.844
George Brett    1980     0.479    3.860

Seem a little repetitive? That's right, eight of the best nine offensive seasons in baseball history came from the bats of either Babe Ruth or Barry Bonds. That 2004 season from Bonds is something we won't see again for a long, long time, if ever.

I've put together a full leaderboard of all qualified batters below. For my next article, I'll do the same for pitchers.

References & Resources
Full sortable leaderboard of all qualified seasons from 1871-2011 here. Be warned, it's on the larger side. You should be able to sort it if you go to list view. If someone wants to confirm this in the comments, that'd be great.

Some notes:

This would technically only apply if baseball talent followed a normal distribution, which it doesn't. But if we restrict the sample to qualified batters, a normal distribution becomes a decent approximation.

On the other side of that leaderboard, we've got White Sox shortstop Mike Caruso. With a z-score of a ridiculous -2.9 in 1999, he's the proud owner of the worst qualifying season in baseball history, aside from a handful of guys in the 1800s.

All data from FanGraphs. Sincerest apologies, Dave Appelman, for asking your servers for every season by every batter in the history of baseball as a CSV file.
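For anyone who wants to reproduce the z-score leaderboard, here is a rough sketch of the calculation. The file name and column names (Season, Name, wOBA) are placeholders for whatever your own FanGraphs export looks like, and the sample is assumed to be limited to qualified batters:

```python
import pandas as pd

# Placeholder file: one row per qualified batter-season, with columns
# "Season", "Name", and "wOBA". Adjust to match your own export.
df = pd.read_csv("qualified_batters_1871_2011.csv")

# League average and standard deviation of wOBA for each season,
# computed over qualified batters only.
season_stats = df.groupby("Season")["wOBA"].agg(["mean", "std"])

# Attach each season's figures to every batter-season and compute the
# z-score: standard deviations above or below that year's average.
df = df.join(season_stats, on="Season")
df["z_score"] = (df["wOBA"] - df["mean"]) / df["std"]

# The ten most dominant seasons relative to their own era.
print(df.sort_values("z_score", ascending=False)
        .head(10)[["Name", "Season", "wOBA", "z_score"]])
```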