Measuring the Change In League Quality (Part Two)
Last week, we looked at the changes in league difficulty over the past 135 years. We discovered that the quality of competition has improved consistently, though with some bumps, but that properly measuring league difficulty required some careful adjustments.
Today, we will look at the difficulty of individual leagues throughout professional baseball history, see how our adjustments affect individual players, and discuss another necessary adjustment.
Let’s start with the individual leagues. Last time, we concluded with the following graph, which shows the improvement in quality of competition over the years in professional baseball:

But, as last season showed, not every league is created equal. It would be unfair to last year’s American League players to apply the same adjustment to their statistics as to the numbers of 2006 National Leaguers. What we have to do is apply the same process to each league separately.
The effect is actually less apparent than we might have guessed:

We do find that the Union Association (1884) and the Federal League (1914-15) were worse than any other leagues in history, but even so, not by that much. If you click on the picture, you can see an enlarged version that demonstrates the small differences between the American and National Leagues.
For example, the American League was somewhat worse than the National League in 1945, but better for most of the 90s. The American Association’s decline in its last two years (1890-91) is also evident on this graph.
Most importantly, this information allows us to granularly adjust players’ statistics for league difficulty, and see the effect that our quality of competition adjustments will have.
So what happens? First let’s calculate runs above replacement for each player without any adjustments (including park adjustments, which won’t be included in any lists in this article):
First      Last           RAR
Ty         Cobb          1358
Ted        Williams      1310
Stan       Musial        1173
Hank       Aaron         1150
Tris       Speaker       1125
Willie     Mays          1109
Barry      Bonds         1109
Lou        Gehrig        1101
Rogers     Hornsby       1071
Mel        Ott           1050
(Note: Numbers through 2005. Numbers only counted from seasons in which the player did not appear as a pitcher, which is why Babe Ruth isn’t on the list.)
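For readers who want to follow along at home, here is a minimal sketch of how a runs-above-replacement figure like those above can be computed. The runs estimator and the replacement level (73% of the league-average rate) are assumptions of mine for illustration, not necessarily the exact definitions behind these numbers.

```python
def runs_above_replacement(player_runs, player_pa, league_runs_per_pa,
                           replacement_frac=0.73):
    """Player's runs created minus what a replacement-level hitter (a fixed
    fraction of the league-average rate -- an assumed convention here) would
    produce in the same number of plate appearances."""
    replacement_runs = replacement_frac * league_runs_per_pa * player_pa
    return player_runs - replacement_runs

# Example: a hitter who creates 120 runs in 650 PA in a league that
# averages 0.125 runs per PA comes out around 61 runs above replacement.
print(round(runs_above_replacement(120, 650, 0.125), 1))
```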
The list looks pretty reasonable, but is anyone bothered by the fact that nine of these players peaked between 1910 and 1970? That’s less than half of baseball history; even if we accept that the first 40 years of baseball history did not produce one of the 10 greatest hitters, could the last 35 have only produced one?
More specifically, are we really willing to accept that Ty Cobb was a more valuable hitter than Ted Williams? Tris Speaker over Willie Mays?
I think this pretty clearly demonstrates the need for some kind of era adjustment. So what happens if we adjust each player’s statistics? (Note: Statistics are scaled so that the all-time total of runs above replacement remains the same as it was before the adjustment.)
First      Last           RAR
Barry      Bonds         1431
Ted        Williams      1294
Hank       Aaron         1225
Willie     Mays          1161
Stan       Musial        1156
Rickey     Henderson     1146
Frank      Robinson      1103
Carl       Yastrzemski   1099
Rafael     Palmeiro      1077
Ty         Cobb          1066
The modern players jump up on the list, but not outrageously so. Only Palmeiro seems out of place, and I do have to remind you that, steroids or not, he is one of four players to collect 3,000 hits and 500 home runs. He would also probably drop out of the top 10 if we adjusted for park, replaced by Mickey Mantle.
Palmeiro also gains the most from the difficulty adjustment, to the tune of 400 runs. Cap Anson loses the most, more than 425 runs in all. That’s a huge adjustment! It’s the difference between first and 10th; it’s also the difference between 10th and 60th.
Based on that fact, you can see the importance of doing these adjustments right.
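For the curious, here is a minimal sketch of the rescaling described in the note above: multiply each season by its league-difficulty factor, then scale everything so the all-time total of runs above replacement matches the unadjusted total. The data structures and names are hypothetical, not the exact code behind these lists.

```python
def apply_difficulty_adjustment(seasons, difficulty):
    """seasons: list of (player, year, league, rar) tuples.
    difficulty: dict mapping (year, league) to a difficulty factor.
    Returns each player's adjusted career RAR, rescaled so the all-time
    total is the same as before the adjustment."""
    raw_total = sum(rar for _, _, _, rar in seasons)
    adjusted = [(player, rar * difficulty[(year, league)])
                for player, year, league, rar in seasons]
    scale = raw_total / sum(rar for _, rar in adjusted)

    careers = {}
    for player, rar in adjusted:
        careers[player] = careers.get(player, 0.0) + rar * scale
    return careers
```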
Age Effects
After last week’s article, I received a ton of comments on the method, and many useful suggestions. A running theme was that I had to adjust for age to properly do the league difficulty adjustments.
After all, what I’m looking at is how players do in one season compared to the year before. If they decline, we conclude that the new league has become more difficult than the old. But what if they decline because of age, not because the league has improved? My study would not have picked up on that; instead, that decline would register as an improvement in the quality of competition.
The problem is that the process we use to adjust for aging is the same process I employed to generate league adjustments. Think about it: To measure quality of competition, we looked at how hitters in one year performed compared to the year before. To measure aging, we look at how players at a given age perform compared to the year before. If we adjusted for age first, all of that year-to-year decline would be attributed to aging, and we would conclude that baseball players have not gotten any better since 1871!
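To make the circularity concrete, here is a rough sketch of the year-over-year comparison that drives the league adjustments: match each hitter’s season to his following season, pool the matched pairs by league, and read the change in their collective performance as the change in league difficulty. The pandas layout, column names, and runs-per-plate-appearance rate are my assumptions; the original study’s details may differ.

```python
import pandas as pd

def league_difficulty_changes(df, min_pa=200):
    """df columns: player, year, league, age, pa, runs (runs created, say).
    For each (year, league), pool hitters who also batted the previous year
    and compare their performance across the two seasons. A decline in the
    matched group's rate reads as an increase in league difficulty."""
    df = df[df["pa"] >= min_pa].copy()
    df["rate"] = df["runs"] / df["pa"]

    # Build (previous season, current season) pairs for each player.
    prev = df.rename(columns={"rate": "prev_rate", "pa": "prev_pa"})
    prev = prev.assign(year=prev["year"] + 1)[["player", "year",
                                               "prev_rate", "prev_pa"]]
    pairs = df.merge(prev, on=["player", "year"])

    def change(group):
        # Weight each matched player by the smaller of his two PA totals.
        weight = group[["pa", "prev_pa"]].min(axis=1)
        return ((group["rate"] * weight).sum()
                / (group["prev_rate"] * weight).sum())

    return pairs.groupby(["year", "league"]).apply(change)
```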
Unfortunately, the league difficulty adjustments are inextricably intertwined with the age adjustments, yet it is imperative that we separate the two. So what can we do?
What if we look at the distribution of playing time by age? After all, while major league teams make many stupid decisions, on the whole they should be operating in a reasonably efficient manner. That includes giving the optimal number of plate appearances to players of each age. As players improve, they should get more plate appearances, and vice versa.
So what does the distribution of plate appearances by age look like?

What you see is a nice bell curve with a slight right skew. But if you just look at ages 26-30, something amazing happens:

Players at 26 get just about the same number of plate appearances as 30-year-olds, and players at 27 get just about the same number of plate appearances as 29-year-olds. Overall, the age curve is almost exactly flat, with a slight progression to 28, and an equivalent regression after. Therefore, we can reasonably assume that there is no overall change in performance from 26 to 30.
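As a quick sanity check on that flatness claim, here is a small sketch (using the same hypothetical data frame, which is assumed to carry age and pa columns) that tabulates plate appearances by age and the share each age gets within the 26-30 window. If the five shares come out nearly equal, that supports treating ability as roughly constant across the window.

```python
import pandas as pd

def pa_by_age(df: pd.DataFrame):
    """Total plate appearances at each age, plus each age's share of the
    26-30 window. Roughly equal shares mean the curve is flat there."""
    totals = df.groupby("age")["pa"].sum()
    window = totals.loc[26:30]
    return totals, window / window.sum()
```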
So, if we redo the league adjustments, but concentrate only on 26- to 29-year-olds (in the first year, so they’ll be 27 to 30 in the second), we should be able to remove the aging issue from our adjustments.
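Continuing the hypothetical sketch from above, that restriction is just an age filter on the input. Limiting the pool to ages 26-30 means every matched pair has a first-year age of 26 to 29 and a second-year age of 27 to 30, since both seasons of a pair must survive the filter (the age column is, again, an assumption about the data).

```python
# Re-run the league difficulty comparison using only the age window where
# playing time -- and presumably true talent -- is roughly flat.
age_neutral = league_difficulty_changes(df[df["age"].between(26, 30)])
```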
So what happens? The following graph shows the league adjustments using just our age range, compared to the league adjustments using all players:

The differences are not huge, but they are important nonetheless. Overall, you can see that the old method tended to underrate players from earlier days. Now, for the sake of posterity, let’s redo the individual league adjustments:

(Note: Click on the picture to enlarge it.)
And now let’s apply them. Here are the top 10 hitters of all time, by adjusted runs above replacement:
First      Last           RAR
Barry      Bonds         1376
Ted        Williams      1321
Hank       Aaron         1234
Stan       Musial        1201
Willie     Mays          1174
Frank      Robinson      1110
Carl       Yastrzemski   1109
Rickey     Henderson     1104
Ty         Cobb          1104
Mickey     Mantle        1071
The earlier players gain a little, but more importantly, I think that this list would make complete sense to anyone.
It’s great when the numbers match up with intuition, isn’t it?