How accurately can we estimate a hitter’s runs? (Part 2)
Last week I presented a method for estimating the error around an individual hitter’s linear weights. This week we’re going to refine the model.
To review, the basic idea was that the error in a team's linear weights estimate of runs is equal to the sum of the errors of its individual players, or more precisely the sum of the error terms for each plate appearance. So by looking at the team error, we could estimate the typical (absolute) size of the error for each player, based upon his PA.
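The premise that per-PA errors add up can be checked with a quick simulation. This is a minimal sketch, assuming each plate appearance contributes an independent, normally distributed error with a made-up standard deviation of 0.25 runs (a hypothetical value, not one from the article). Under that assumption the error on the season total grows with the square root of the number of PA, not linearly.

```python
import math
import random


def simulate_total_error_rms(per_pa_sd: float, n_pa: int, n_seasons: int, seed: int = 0) -> float:
    """Simulate n_seasons totals, each the sum of n_pa independent per-PA errors
    drawn from a normal distribution with mean 0 and SD per_pa_sd, and return
    the root mean square of those season totals."""
    rng = random.Random(seed)
    totals = [
        sum(rng.gauss(0.0, per_pa_sd) for _ in range(n_pa))
        for _ in range(n_seasons)
    ]
    return math.sqrt(sum(t * t for t in totals) / n_seasons)


PER_PA_SD = 0.25  # HYPOTHETICAL per-PA error in runs, for illustration only

for n_pa in (100, 400, 1600):
    simulated = simulate_total_error_rms(PER_PA_SD, n_pa, n_seasons=1000)
    predicted = PER_PA_SD * math.sqrt(n_pa)  # square-root scaling
    print(f"{n_pa:>5} PA: simulated {simulated:.2f}, sqrt-rule {predicted:.2f}")
```

Quadrupling the PA should roughly double the total error, which is why a hitter's error can be backed out of the team error at all.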
A revision to the quick and dirty formula
Let's go back and reconsider that formula. Reader Keith Karcher was kind enough to send an email that clarified a few issues for me, and so I'm going to revise the formula based upon his input. (Should I continue to get things wrong, that is my fault and not his.)
The formula was based upon the root mean square error (RMSE) of linear weights for the typical team. RMSE is one way to evaluate an estimator, but in this case it's the wrong one. Here we should use the mean absolute error (MAE) instead.
Why?
Because what we’re really after is the error for that particular team. But since we’re looking for an expedient way to do things, we instead want to use the average error at the team level as a proxy for that. MAE gives us that, whereas RMSE gives us the standard deviation. (This method gives results much closer to the more complicated method that I’ll be presenting a little later on in the article.)
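To see how the two measures differ, here is a small sketch computing both on a handful of hypothetical team residuals (actual runs minus the linear weights estimate; the numbers are invented for illustration). MAE is the average size of a miss; RMSE squares each miss before averaging, so it is never smaller than MAE and gets pulled up by the occasional big miss.

```python
import math


def mae(errors):
    """Mean absolute error: the average size of a miss."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root mean square error: squares each miss, so large misses count extra."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# HYPOTHETICAL team residuals in runs (actual minus linear weights estimate).
residuals = [12.0, -25.0, 3.0, -8.0, 30.0, -14.0]
print(round(mae(residuals), 2), round(rmse(residuals), 2))  # 15.33 17.97
```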
The MAE for the time period we're using (1993-2008), given the linear weights we're testing, is 17.37 runs. That gives us an x variable of .22, down from .286 last week. For a hitter with 650 PA, the method presented before overestimated the error in our linear weights calculations by about a run and a half.
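Here is a sketch of where a value like .22 could come from. It assumes, as a reconstruction rather than something stated in this installment, that last week's formula spreads the team error over plate appearances by the square root of PA, so that x = MAE / sqrt(team PA); the figure of 6,200 PA per team-season is likewise an assumption. The per-player error is then x times the square root of the hitter's PA.

```python
import math

TEAM_MAE = 17.37  # team-level MAE of linear weights, 1993-2008 (from the article)
TEAM_PA = 6200    # ASSUMPTION: rough plate appearances per team-season
X_OLD = 0.286     # last week's RMSE-based value (from the article)


def per_pa_error(team_error: float, team_pa: int) -> float:
    """Hypothetical helper: per-PA error, assuming team error grows with sqrt(PA)."""
    return team_error / math.sqrt(team_pa)


def player_error(x: float, pa: int) -> float:
    """Expected size of a hitter's error in runs, assuming sqrt(PA) scaling."""
    return x * math.sqrt(pa)


x_new = per_pa_error(TEAM_MAE, TEAM_PA)
print(round(x_new, 2))  # 0.22

# How much the old value overstated the error for a 650-PA hitter.
gap = player_error(X_OLD, 650) - player_error(round(x_new, 2), 650)
print(round(gap, 2))  # 1.68
```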
And as commenter “carl” pointed out, this is going to overstate the total amount of error when we’re comparing two players from the same team. Typically, in that case, the errors will have a (slight) tendency to point in the same direction, and therefore our uncertainty will be lower than predicted by the formula. I don’t have a good idea how to compensate for this, unfortunately.
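The direction of carl's point can be written down with the standard formula for the standard error of a difference between correlated quantities, which subtracts a 2 * rho * SE1 * SE2 term. This sketch only shows the direction of the effect: the correlation rho between teammates' errors is exactly the unknown quantity, so the 0.1 used below is purely hypothetical.

```python
import math


def se_of_difference(se1: float, se2: float, rho: float = 0.0) -> float:
    """Standard error of (player 1 - player 2) when their errors have correlation rho.

    rho = 0 is the independent case. A positive rho (errors tending to point
    the same way, as for teammates) shrinks the result.
    """
    return math.sqrt(se1**2 + se2**2 - 2.0 * rho * se1 * se2)


print(round(se_of_difference(5.0, 5.0), 2))           # independent errors
print(round(se_of_difference(5.0, 5.0, rho=0.1), 2))  # HYPOTHETICAL teammate correlation
```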
A somewhat more rigorous version
As I alluded to last week, batting events do not have the same amount of variance in their contribution to run scoring. Walks, for instance, are far more consistent in their contributions to runs than home runs. But how to measure this?
Most empirical linear weights are derived using the change in run expectancy, in other words the change in the average number of runs that score given the situation before and after the plate appearance. So I looked at the root mean square error between the average change in run expectancy for each event (in other words, the linear weights value of that event) and the change in run expectancy on each individual play.
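The per-event calculation can be sketched as follows, assuming the play-by-play data has already been reduced to (event, change in run expectancy) pairs. The function name and the tiny input list are hypothetical; the logic follows the paragraph above: the mean change for each event is its linear weight, and the error is the root mean square deviation of individual plays from that mean.

```python
import math
from collections import defaultdict


def event_errors(plays):
    """Group plays by event type and summarize each group.

    plays: iterable of (event, delta_re) pairs, where delta_re is the change
    in run expectancy on one play.
    Returns {event: (linear_weight, rmse)}: the mean delta_re for that event,
    and the root mean square deviation of individual plays from that mean.
    """
    by_event = defaultdict(list)
    for event, delta in plays:
        by_event[event].append(delta)

    summary = {}
    for event, deltas in by_event.items():
        weight = sum(deltas) / len(deltas)
        rmse = math.sqrt(sum((d - weight) ** 2 for d in deltas) / len(deltas))
        summary[event] = (weight, rmse)
    return summary


# HYPOTHETICAL plays, invented for illustration.
plays = [("BB", 0.30), ("BB", 0.40), ("HR", 1.0), ("HR", 2.0), ("HR", 3.0)]
for event, (weight, rmse) in sorted(event_errors(plays).items()):
    print(f"{event}: linear weight {weight:.2f}, error {rmse:.2f}")
```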
Event   Error (RMSE, runs)
Out     0.21
K       0.17
SB      0.11
CS      0.28
BB      0.18
IBB     0.08
HBP     0.19
1B      0.33
2B      0.46
3B      0.53
HR      0.54
Avg     0.24

Some things to take away from the chart:
- Power hitters are far more "inconsistent" than other kinds of hitters. Doubles, triples and home runs have by far the highest error.
- Walks are very consistent, showing very little error compared to other events. Outs are also very consistent. So, all else being equal, players with either very high or very low on-base percentages would tend to have less error in their linear weights estimates.
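One way to turn the table into a player-specific error, sketched below, is to assume each event's error is independent of the others, so that variances add: total error = sqrt(sum of count * error^2) over the player's events. This illustrates the idea in the list above rather than reproducing a method given in the article, and both stat lines are invented 600-PA examples.

```python
import math

# Per-event error (RMSE, runs) from the table above.
EVENT_ERROR = {
    "Out": 0.21, "K": 0.17, "SB": 0.11, "CS": 0.28, "BB": 0.18, "IBB": 0.08,
    "HBP": 0.19, "1B": 0.33, "2B": 0.46, "3B": 0.53, "HR": 0.54,
}


def player_error_from_events(counts):
    """ASSUMPTION: each event's error is independent, so variances add.

    counts: {event: number of times it occurred}.
    Returns sqrt(sum over events of count * error**2), in runs.
    """
    return math.sqrt(sum(n * EVENT_ERROR[event] ** 2 for event, n in counts.items()))


# HYPOTHETICAL 600-PA stat lines.
slugger = {"Out": 300, "K": 130, "BB": 60, "1B": 60, "2B": 25, "3B": 2, "HR": 23}
walker = {"Out": 330, "K": 60, "BB": 100, "HBP": 10, "1B": 90, "2B": 8, "HR": 2}

print(f"slugger: {player_error_from_events(slugger):.2f} runs")
print(f"walker:  {player_error_from_events(walker):.2f} runs")
```

As the list suggests, the power-heavy line comes out with the larger error.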
One might also note that we now have a third value for the standard error per plate appearance, although the two values presented this week (.22 and .24) are closer to each other than either is to the value presented last week (.286). Still, in 650 PA that's a difference of about half a run between this week's two methods. This is going to require further investigation.
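The half-run figure follows from scaling each per-PA value by the square root of 650 PA, assuming the same square-root scaling as the quick formula. A quick check of all three values:

```python
import math

PA = 650


def error_at(x: float, pa: int = PA) -> float:
    """Expected size of a hitter's error in runs, assuming sqrt(PA) scaling."""
    return x * math.sqrt(pa)


# Per-PA values from the article: MAE-based (this week), per-event table
# average (this week), RMSE-based (last week).
for label, x in (("MAE-based", 0.22), ("per-event average", 0.24), ("last week", 0.286)):
    print(f"{label}: {error_at(x):.2f} runs")

gap = error_at(0.24) - error_at(0.22)
print(f"gap between this week's two values: {gap:.2f} runs")
```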
Putting it together
Let's switch back to the simpler formula for the time being and look at how to apply it to two actual hitters. We'll look at Joe Mauer and Mark Teixeira, considering only hitting performance as measured by FanGraphs. We'll use the stats listed under the Value heading, specifically Batting and Replacement, to give us Batting Runs Above Replacement. (Why replacement? I'll let my earlier self speak for me, in my series on player value: parts one, two and three. Short version: runs above average would give Mauer an undue advantage because of the amount of playing time he missed compared to Teixeira.)
(We are of course sneaking in some confounding effects here: the uncertainty of our park factors and the uncertainty of our estimate of the replacement level. For the time being, we’ll set that aside, but it bears remembering.)
Tex has 50.9 BRAR in 556 PA, while Mauer has 60.7 BRAR in 390 PA. That's a standard error of 5.1 runs for Teixeira and 4.3 for Mauer. We can represent that visually if we assume a normal distribution for actual run production:
Note the area of overlap; there is in fact a chance that Teixeira has outhit Mauer in that time. But how much of a chance?
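The two standard errors can be reproduced, approximately, by multiplying x by the square root of each player's PA. This sketch assumes x = .22 and that square-root scaling; with the rounded x it gives about 5.2 runs for Teixeira rather than 5.1, a gap presumably due to rounding x.

```python
import math

X = 0.22  # per-PA error, MAE-based value (rounded, as in the article)

# (BRAR, PA) from the article.
players = {"Teixeira": (50.9, 556), "Mauer": (60.7, 390)}

standard_errors = {}
for name, (brar, pa) in players.items():
    standard_errors[name] = X * math.sqrt(pa)  # assumes sqrt(PA) scaling
    print(f"{name}: {brar} BRAR, standard error about {standard_errors[name]:.1f} runs")
```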
Remember that the standard error of the difference between two values is the square root of the sum of the squared standard errors (assuming the two errors are independent), or:
SQRT(SD_P1^2 + SD_P2^2)
That gives us 6.7 for Teixeira and Mauer.
This is where I screwed up last week: we don't really care about the odds that Mauer is better than his LWTS say or that Tex is worse than his LWTS say. All we need to know is whether the difference between the two is greater than our standard error.
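In code, the check described here is a comparison of the difference against the combined standard error. This sketch uses the figures from the article and, like the formula above, treats the two players' errors as independent (they play for different teams).

```python
import math

SE_TEX, SE_MAUER = 5.1, 4.3        # standard errors from the article (runs)
BRAR_TEX, BRAR_MAUER = 50.9, 60.7  # Batting Runs Above Replacement from the article

# Errors treated as independent, so the squared standard errors add.
se_diff = math.sqrt(SE_TEX**2 + SE_MAUER**2)
diff = BRAR_MAUER - BRAR_TEX

print(f"difference {diff:.1f} runs, standard error of the difference {se_diff:.1f} runs")
print("difference larger than one standard error:", diff > se_diff)
```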
And in this case it is, by a difference of 9.8 runs. That means we're more than 68 percent confident that Mauer has hit better than Teixeira. (How confident? Divide 9.8 by 6.7 to get the z-score, 1.46. Excel's NORMSDIST function then tells us that we're 93 percent confident. As the pedants in the audience will surely note, this "is not significant for p<0.05," and they are correct. But choosing 0.05 rather than 0.07 as the cutoff is rather arbitrary; we're pretty darn confident either way.)
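Outside Excel, the same z-score and one-sided confidence can be computed with Python's standard library, where `statistics.NormalDist().cdf` plays the role of NORMSDIST. A sketch using the article's rounded figures:

```python
from statistics import NormalDist

diff, se_diff = 9.8, 6.7  # rounded figures from the article (runs)

z = diff / se_diff
confidence = NormalDist().cdf(z)  # standard normal CDF, like Excel's NORMSDIST(z)
p_one_sided = 1 - confidence

print(f"z = {z:.2f}, confidence = {confidence:.0%}, one-sided p = {p_one_sided:.2f}")
```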
References & Resources
The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”.
Again, everything presented here assumes that the linear weights are unbiased. Not all linear weights are unbiased – VORP, for instance, rates the walk too low and the home run too high.
FanGraphs also presents another set of values, RE24, which should be more accurate (as defined here) than traditional linear weights. How much more accurate, I couldn't say.