What’s past is prologue
by Colin Wyers
July 30, 2009

Roll up your sleeves; there’s going to be some math in this one. Let’s talk about regression to the mean.

True score theory

The concept of regression to the mean is rooted in what’s called true score theory. The theory states that a measurement consists of two components: an individual’s “true score,” and measurement error, sometimes called random error. (I’ll take this opportunity to note that not all measurement error is random; some is biased.) In sabermetrics, we have taken to calling a player’s underlying talent (if it could be observed without measurement error) “true talent.” We can express this using a mathematical equation:

Observed Performance = True Talent + Random Error + Bias

(From here on out, we’ll be ignoring bias, for the sake of clarity.)

Obviously a baseball player’s innate ability isn’t constant: He can be nursing a minor injury or learn better plate discipline. A lot of things can happen to change a player’s true talent level. Of course, the same can be said of taking a test, the typical use case of true score theory. A student can be well-rested one day and tired another day, for instance. When we refer to something as “true” we simply mean that it is repeatable under the same conditions.

What we notice when we observe something repeatedly, whether it’s baseball players or students, is that those who did better or worse than the mean (or average) tend to perform closer to the mean as we add more observations.

What it looks like in practice

Let’s take a look at all batters who had over 100 plate appearances in a season between 1993 and 2008. The league average on-base percentage (OBP) in this period was .356. What we are looking at here is how well a batter’s OBP in his first 100 plate appearances (PAs) of a season predicts his OBP over the rest of the season.
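Before turning to the real data, the true score idea can be sketched as a small simulation. This is only an illustration under stated assumptions: every player gets a fixed true OBP drawn from a normal distribution around the .356 league average, the 0.030 spread of true talent is a made-up figure, and each plate appearance is treated as an independent on-base trial. The function names are hypothetical.

```python
import random

LEAGUE_OBP = 0.356   # league average quoted in the article
TALENT_SD = 0.030    # ASSUMED spread of true talent, for illustration only


def simulate_obp(true_obp, pa, rng):
    """Observed OBP over `pa` plate appearances: true talent plus random error."""
    times_on_base = sum(rng.random() < true_obp for _ in range(pa))
    return times_on_base / pa


def simulate_players(n_players=2000, first_pa=100, rest_pa=400, seed=0):
    """Return (OBP in first PAs, OBP in remaining PAs) for each simulated player."""
    rng = random.Random(seed)
    players = []
    for _ in range(n_players):
        true_obp = rng.gauss(LEAGUE_OBP, TALENT_SD)
        start = simulate_obp(true_obp, first_pa, rng)
        rest = simulate_obp(true_obp, rest_pa, rng)
        players.append((start, rest))
    return players


def mean(values):
    return sum(values) / len(values)


players = simulate_players()

# Group players by how they started, then compare with how they finished.
cold = [(s, r) for s, r in players if s <= 0.300]
hot = [(s, r) for s, r in players if s >= 0.420]

cold_start, cold_rest = mean([s for s, _ in cold]), mean([r for _, r in cold])
hot_start, hot_rest = mean([s for s, _ in hot]), mean([r for _, r in hot])

print(f"cold starters: {cold_start:.3f} first 100 PAs, {cold_rest:.3f} afterwards")
print(f"hot starters:  {hot_start:.3f} first 100 PAs, {hot_rest:.3f} afterwards")
```

In this toy setup nobody's talent changes, yet both groups finish closer to the league average than they started, which is the pattern true score theory predicts.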
Here’s a look at how players who started differently fared over the rest of the season:

OBP_Start        PA    OBP_Rest
0.300        101992    0.341
0.320        145662    0.350
0.330        150128    0.350
0.340        165794    0.353
0.350        189717    0.357
0.360        188231    0.357
0.370        189845    0.360
0.380        169848    0.368
0.390        173844    0.365
0.400        164177    0.367
0.410        127379    0.374
0.420        123294    0.379

The first column is OBP in the first 100 PAs, the second column is the number of PAs that group of players had after that, and the third column is the average OBP of that group of players after the first 100 PAs. (On average, each player had 393 PAs after their first 100, or 493 PAs in the season.)

What this table shows us is that the concept of regression to the mean seems to do very well in predicting how groups of players will perform in the future.

Breaking it down

But do all individuals in the group regress to the mean? Let’s take a look at all players in our sample who put up a .320 OBP in their first 100 PAs. As a group, they put up a .350 OBP in the rest of the season. Pretty well regressed, right? But here’s a look at how the individual players in that group did:

[Figure: rest-of-season OBP of the individual players who started at .320]

While the group mean does move toward the overall mean, some players end up doing far worse than the mean, and some end up doing far better. We can look at this again, this time using a group that was above-average to begin with, say a .390 OBP:

[Figure: rest-of-season OBP of the individual players who started at .390]

Again, the group regresses to the mean: They put up a .365 OBP for the rest of the season. But again, some improved, and some did even worse than the mean.

Frankly, 100 PAs isn’t enough to tell us much of anything about a baseball player. The R-squared (in other words, the square of the correlation between OBP in the first 100 PAs and OBP rest-of-season) is only .105. We can look at a scatterplot with a regression line on it:

[Figure: scatterplot of OBP in the first 100 PAs against rest-of-season OBP, with regression line]

100 PAs of OBP simply doesn’t do a very good job of predicting future OBP.
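The group rows in the table earlier give a rough sense of how much of a hot or cold start carries forward. The sketch below fits a straight line through the twelve group averages, weighting each row by its PA count. Two caveats: it treats each OBP_Start label as the group's actual starting OBP, which is an assumption, and a line through group averages is not the player-level regression behind the .105 R-squared, so it says nothing about how widely individuals scatter. The helper names are hypothetical.

```python
# (OBP in first 100 PAs, PAs after the first 100, OBP after the first 100)
# Rows copied from the article's table.
GROUPS = [
    (0.300, 101992, 0.341),
    (0.320, 145662, 0.350),
    (0.330, 150128, 0.350),
    (0.340, 165794, 0.353),
    (0.350, 189717, 0.357),
    (0.360, 188231, 0.357),
    (0.370, 189845, 0.360),
    (0.380, 169848, 0.368),
    (0.390, 173844, 0.365),
    (0.400, 164177, 0.367),
    (0.410, 127379, 0.374),
    (0.420, 123294, 0.379),
]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_line_fit(xs, ys, weights):
    """Weighted least-squares fit of y = intercept + slope * x."""
    mean_x = weighted_mean(xs, weights)
    mean_y = weighted_mean(ys, weights)
    cov_xy = sum(w * (x - mean_x) * (y - mean_y)
                 for x, y, w in zip(xs, ys, weights))
    var_x = sum(w * (x - mean_x) ** 2 for x, w in zip(xs, weights))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


starts = [g[0] for g in GROUPS]
pas = [g[1] for g in GROUPS]
rests = [g[2] for g in GROUPS]

slope, intercept = weighted_line_fit(starts, rests, pas)
print(f"slope = {slope:.2f}, intercept = {intercept:.3f}")
```

On these rows the slope comes out a little under 0.3, so for the groups in the table, less than a third of the gap between a group's start and the average shows up over the rest of the season.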
The answer is simply to use more PAs. If Pujols puts up a .320 OBP in his first 100 PAs, the question of how much he will regress to the mean will be almost entirely overshadowed by the fact that he’s Albert Pujols and he is the best hitter on the face of God’s green earth.

Do we regress Pujols’ OBP?

And since he’s Albert Pujols, and we have over 5,800 PAs of .426 OBP from him, we don’t expect him to regress toward the mean anymore, right? Wrong!

Let’s rework our formula from the beginning. The variance (or spread) of OBP among baseball players in a given time can be broken down into:

Observed Variance = True Variance + Random Error

As the number of observations (in this case, plate appearances) goes up, random error goes down. In 10 PAs, one extra walk (or any other on-base event) is worth 100 points (.100) of OBP. In 1,000 PAs, it takes 100 extra walks to be worth the same 100 points of OBP. Large errors become harder to come by as the number of observations goes up. But so does everything else: both kinds of variance decrease as our number of observations goes up.

When we say that Pujols regresses to the mean, we don’t mean that he’s getting worse. We simply mean that, because we have a limited number of observations, there is a certain amount of random variance that we have to account for. Typically for a very large number of PAs it’s a small amount of regression (probably not noticeable if you’re rounding to three significant digits), and since his peers are typically expected to regress by the same amount it doesn’t change our relative opinion of him.

A side note about selective sampling

A funny thing happened on the way to the 101st plate appearance: Players who did very poorly tended not to make it that far, while players who did very well tended to get more PAs. There is a selective sampling bias here, in that teams allocate playing time based upon those first 100 PAs (as unpredictive as we assert that they are).
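How noisy is a 100-PA sample, and how much does the noise shrink by the time a hitter has thousands of PAs? A rough sketch follows, assuming each plate appearance is an independent on-base trial with a fixed probability. That binomial model is an assumption the article does not spell out, and the function name is hypothetical.

```python
import math

LEAGUE_OBP = 0.356  # league average quoted in the article


def random_error_sd(true_obp, pa):
    """Standard deviation of observed OBP around true OBP after `pa` PAs,
    ASSUMING independent plate appearances (a binomial model)."""
    return math.sqrt(true_obp * (1.0 - true_obp) / pa)


for pa in (10, 100, 1000, 5800):
    sd = random_error_sd(LEAGUE_OBP, pa)
    print(f"{pa:>5} PAs: random error of about {sd:.3f} OBP")
```

Under this model the random error is close to 50 points of OBP at 100 PAs and only about 6 points at 5,800 PAs. That is small, but it is not zero, which is why even a very long track record still gets regressed a little.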
Those players who do poorly and still receive playing time are generally either those who have done well in the past or those who teams think are better players for reasons unrelated to performance to date, like scouting.

References & Resources

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”.

Here’s a great article about regression to the mean that goes into the history of the phrase. Another great resource on the topic is here (even though it focuses on basketball, not baseball). And of course I’ve written previously on the topic.

The Sage Encyclopedia entry on reliability was a great help in preparing this article. Graphs were created using gretl.

All results were weighted by the number of plate appearances in the second sample; in other words, a player with 300 PAs was counted twice as many times as a player with 150 PAs.
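As a small illustration of that weighting, here is a sketch of a PA-weighted average. The two players and their OBP figures are invented for the example, and the function name is hypothetical.

```python
def weighted_mean(values, weights):
    """Average of `values` where each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Two made-up players: rest-of-season OBP and rest-of-season PAs.
obps = [0.400, 0.340]
pas = [300, 150]

unweighted = sum(obps) / len(obps)
weighted = weighted_mean(obps, pas)

print(f"unweighted: {unweighted:.3f}, PA-weighted: {weighted:.3f}")
```

The player with 300 PAs pulls the weighted figure toward his own OBP, exactly as if he had been counted twice for every one time the 150-PA player was counted.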