How to measure a player’s value (Part 2)

As the title implies, there was a Part 1. It’s advisable to read it first.

But a quick little summary couldn’t hurt, and what’s quicker than a one-line summary?

A player’s value is essentially an average team’s runs or wins with that player, minus their runs or wins without that player.

But despite what you might read on the back of a bubblegum card, it’s not always simple to determine how many runs or wins an individual player is responsible for—baseball, after all, is a team sport. It’s not fruitless or impossible to try—baseball is one of the most richly documented sporting experiences, and so it’s possible for us to do a very good job of this. But it does require rolling up our sleeves a bit.

This is supposed to be a survey, not an exhaustive look at these subjects. This week we’ll be long on explanation, short on technical detail. Next week there’ll be all the numbers you could hope for.

Valuing runs

Just go ahead and picture this. At the start of the inning, the pitcher lets a breaking ball get away from him and hits the batter. You now have a runner on first, no outs.

The next batter strikes a solid single into left, advancing the runner to third. Runners on first and third, no outs.

The next batter skies one deep to left, but it’s easily playable; the runner tags up and scores. Runner on first, one out.

According to the instructions given to the official scorer, the first hitter is awarded a run scored. The third hitter is awarded a run batted in. The second hitter is not credited for the run at all—this in spite of the fact that he was clearly the most valuable player in the entire sequence of events.

It’s one example, sure, and seemingly rather contrived as well. But the larger point I want to make is that Runs Scored and Runs Batted In are accounting methods; the official scorer is directed to assign (almost) every run to two batters, and he does so. You will sometimes hear people refer to a player’s R/RBI as “real runs,” compared to those fake runs that we sabermetricians apparently are discussing. But there’s nothing particularly compelling or persuasive about the methods used to assign runs to players by the official scorer.

Or, to put it another way: if a batter hits a triple, followed by a batter who hits a single, they each get equal credit for the run that results. Same holds true if a batter hits a single, followed by a batter who hits a triple. But it’s patently obvious that the triple is more valuable than the single.

In short, the official scorer’s simple assignment of responsibility for a run is a very poor model for the way teams actually score runs. It doesn’t tell us as much as it purports to about a player’s hitting value—just because some people mistake these runs for actual team runs doesn’t make it so.

So if we truly want to develop an estimate of how to apportion team runs among players, we want a better model than the two-base cricket model that governs Runs Scored and Runs Batted In. This is also useful in comparing two hitters from different teams on an even playing field; we know that the same hitter will have much different R/RBI if he plays for a team like the 2005 Yankees than if he plays for the 2008 Nationals, even if he performs exactly the same at the plate—there will be more runners ahead of him to knock in on the Yankees and more players behind him driving him in.

We have two kinds of models: dynamic and linear. Dynamic models work very well on entire contexts, like a team’s overall performance or a pitcher’s performance. They do not work very well on individual hitters, because a single hitter only controls one-ninth of his context. (And a hitter’s performance does not interact with itself: if a hitter walks, he cannot then go to the plate and hit a home run to drive himself in.) Linear models hold the environment constant, and thus work well in estimating a hitter’s contribution to a given context. They work less well for evaluating pitchers—a home run against C.C. Sabathia results in fewer runs on average than a home run against Jason Marquis, because a pitcher like Marquis is simply more likely to have runners on base when a home run occurs.

For a dynamic run estimator, BaseRuns is the most accurate (and most versatile). The formula for BaseRuns is:

A*B/(B + C) + D


Where A is the number of baserunners, B is the “advancement factor,” C is the number of outs, and D is the number of home runs. A simple version of BaseRuns would use the following:

A = H + W – HR

B = (1.4*TB – .6*H – 3*HR + .1*W)*1.02

C = AB – H

More complicated (and thus more accurate) BaseRuns formulas are available.
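The simple version above can be sketched in code. This is a minimal Python rendering using the A, B, C, and D terms exactly as defined above (a sketch only; the constants come straight from the simple formula, not from a fitted season):

```python
def base_runs(ab, h, tb, hr, w):
    """Simple BaseRuns estimate: A*B/(B + C) + D."""
    a = h + w - hr                                       # A: baserunners
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * w) * 1.02  # B: advancement factor
    c = ab - h                                           # C: outs
    d = hr                                               # D: home runs always score
    return a * b / (b + c) + d
```

Feeding it plausible team-season totals (around 5,500 at-bats, 1,450 hits) produces a run estimate in the normal team range.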

When it comes to linear run estimators, there are many, many linear weights formulas, nearly too many to name (and many that have no name at all). For a player value system, it’s probably best to use custom linear weights based upon the season. The key thing to pay attention to is the baseline—all dynamic run estimators give you absolute runs, but some linear run estimators will give you runs above average instead.

You do not have to use the most complicated or most accurate run estimator available to you—at the extremes of accuracy you are fighting over minuscule differences, probably well within the level of uncertainty you should have about these models. It is, however, important to use the least biased run estimator you have available. Runs Created, for instance, will typically produce a low run value for the walk and an inflated run value for the home run. This will overvalue hitters with high home run rates and low walk rates.
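To make the linear approach concrete, here is a sketch using hypothetical linear weights. The values below are rough, modern-era, above-average weights chosen for illustration; a real value system would derive its weights from the season in question:

```python
# Illustrative linear weights, baselined to runs above average.
# These are hypothetical round numbers, not any season's actual values.
WEIGHTS = {"1B": 0.47, "2B": 0.78, "3B": 1.09, "HR": 1.40, "BB": 0.31, "OUT": -0.27}

def lw_runs_above_average(singles, doubles, triples, hr, bb, outs):
    """Sum each event count times its run weight."""
    events = {"1B": singles, "2B": doubles, "3B": triples,
              "HR": hr, "BB": bb, "OUT": outs}
    return sum(WEIGHTS[e] * n for e, n in events.items())
```

Because these weights are baselined to average, the result is runs above average, not absolute runs—exactly the baseline distinction noted above.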

If we have more detailed data, we can also look at baserunning beyond basestealing: Who goes first to third on a single? Who goes first to home on a double? This is a secondary skill—the best baserunner is not better than the worst baserunner to nearly the same extent that the best hitter is better than the worst hitter. But if we have that data, it helps give a better picture of a player’s contribution to team wins.

Measuring playing time

Measuring playing time is one of those things that can sometimes sound simpler than it really is. When looking at a total value metric, it’s very important for us to understand what the unit of playing time is.

Let’s start off by looking at playing time at the team level. At the team level, we find that the fundamental measure of time is the out. So long as a team has outs remaining on offense, they can still score runs; so long as a team has outs remaining on defense, they still are responsible for preventing runs.

We seem to understand this when it comes to pitchers and fielders, although sometimes instead of outs we use games or innings, both of which are functionally equivalent to outs. (One game equals nine innings equals 27 outs.)

But when it comes to batting, we have a tendency to instead measure playing time in plate appearances or (shudder) at-bats. The trouble is that plate appearances are not fixed—every time a player makes an out, he is denying a plate appearance to one of his teammates. Two players with otherwise equivalent production in the same number of plate appearances are not equally valuable if they used different numbers of outs during the course of their PAs.

Most run estimators provide results per out, not per plate appearance. (A linear run estimator baselined to runs above average uses plate appearances as the unit of playing time.) Which you get really does not matter, so long as you pair it with the correct unit of playing time. If your run estimator is giving you runs per out, the appropriate comparison is to players with the same number of batting outs, not the same number of plate appearances.

Since many people have a hard time making the adjustment to considering outs as a unit of playing time for hitters, you can instead convert runs per out into runs per plate appearance. The exact method depends upon the run estimator used.
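As a sketch of why the unit matters, assume a hypothetical league rate of 0.18 runs per out. Two hitters with identical runs created in identical plate appearances separate once the outs they consumed are counted:

```python
# Illustrative league scoring rate; a real system would compute this
# from the season's actual league totals.
LG_RUNS_PER_OUT = 0.18

def runs_above_average(runs_created, outs):
    """Runs above an average hitter consuming the same number of outs."""
    return runs_created - LG_RUNS_PER_OUT * outs
```

With 80 runs created, a hitter who used 400 outs comes out roughly five runs ahead of one who used 430, even if both had the same number of plate appearances.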

Park factors

It’s one of the more conventional pieces of sabermetric wisdom: a player’s contribution in runs should be separated from the effect of his home park.

Where it gets sticky is the level of detail one is willing to go to in order to do so. The basic argument is between simple, run-based park factors and more detailed component park factors.

For a value metric, the run-based park factor is probably more appropriate. The reason we want to park adjust a player’s performance in a value metric is because the value of a run is distorted based upon the environment; a run in Coors is simply less valuable than a run in Petco. But if a player is hitting more doubles than the typical hitter because he is especially well-suited to his particular home park, those extra doubles are providing real value to his team.
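A run-based park adjustment can be sketched very simply. This assumes a hypothetical `park_factor` expressed relative to league average (1.00 is neutral), and averages it with a neutral road environment since only half a player’s games come at home:

```python
def park_adjust(runs, park_factor):
    """Deflate (or inflate) a run total for the player's home park.

    park_factor: home park's run environment relative to league,
    e.g. 1.20 for a strong hitter's park. Half the schedule is
    assumed to be played in a neutral road environment.
    """
    effective = (park_factor + 1.0) / 2.0
    return runs / effective
```

So 100 runs produced with a 1.20 home park deflate to about 91 park-adjusted runs, while a neutral park leaves the total unchanged.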

Now, in some contexts, it may be appropriate to use component park factors, but this doesn’t make the component park factors "more accurate" in assessing a player’s value, at least when it comes to explaining team wins.

Fielding

Measuring a position player’s offensive value only gives you half a picture; the other half is his fielding prowess.

Fielding value is more difficult to measure because, while there is only one batter, there are nine fielders on defense for every play. It is pretty easy to discern who made a specific play, but often difficult to discern who should be responsible for a ball when no play is made. Various defensive metrics attempt to assign that responsibility for balls in play. Generally speaking, the more detailed the underlying dataset, the better the results.

It should be noted here that the primary defensive skill is fielding a batted ball for an out. That is by far the most important thing a fielder does, and it’s also the skill with the largest differentiation in talent between fielders. And so this is the skill that most fielding metrics measure. Other things, such as turning the double play, catching throws at first base and outfielders’ throwing arms, are of secondary importance.

(The great, shining exception is catcher defense, where skill at converting a batted ball into an out is a secondary concern at most.)

There are two presentations of defensive metrics: skill, which is generally represented as plays per chances; and value, generally represented as plays or runs saved above the average player at the position. Be careful to know whether your defensive metric comes in plays or runs—the Fielding Bible, for instance, uses plays, while UZR uses runs. Runs is more useful for our purposes, because then it can be directly compared to offensive production.
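Converting a plays-based metric into runs is straightforward if you accept an average run value per play. The 0.8 runs per play below is an illustrative figure (roughly the cost of a hit that becomes an out), not any system’s official constant:

```python
# Illustrative average run value of turning a hit into an out.
RUNS_PER_PLAY = 0.8

def plays_to_runs(plays_above_average):
    """Convert plays saved above average into runs saved above average."""
    return plays_above_average * RUNS_PER_PLAY
```

A fielder at +10 plays would come out around +8 runs, which can then be added directly to his offensive runs.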

Now here’s the tricky part. Some folks are naturally inclined to ask, "Isn’t a player who is +10 on offense and +10 on defense equally valuable from a team perspective, regardless of position? They both create the same number of runs for the team, after all."

This is entirely wrong, and it comes from the simple error of mistaking a model of reality for actual reality.

Remember: average in anything, whether it’s hits, plays, runs, or wins, is simply a statement of central tendency. Typically we use average—and to be clear here I really should be saying mean—because it’s simple to compute, convenient to use and is commonly understood.

But that doesn’t make it particularly meaningful. If instead we used the median or the mode we’d get different results, and none would be any more right than the others.

More to the point, we shouldn’t misconstrue certain features of our mathematical models as having a particular meaning when they are a product not of the data (or the underlying truth) but of the tools we used to build the model. To assert that the average defensive shortstop and the average defensive first baseman are equally valuable on defense means favoring math over truth.

Or, to illustrate: Does it make sense to only compare a hitter to other hitters in his batting order spot? If one hitter is +10 relative to guys who bat fourth, and another is +10 relative to guys who bat eighth, are they equally valuable?

In order to appropriately compare two players, we want to know their fielding contributions relative to all players, not just to players who play the same positions. This means that we want to adjust their value based upon their fielding position. We do this by figuring out the relative difference in value between positions.
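A sketch of how a positional adjustment enters the total. The adjustment values below are hypothetical, chosen only to be in the neighborhood of commonly cited per-season figures; real systems derive them empirically (for instance, from players who switch positions):

```python
# Hypothetical positional adjustments, in runs per full season,
# relative to an average player at an average position.
POS_ADJ = {"C": 12.5, "SS": 7.5, "2B": 2.5, "3B": 2.5, "CF": 2.5,
           "LF": -7.5, "RF": -7.5, "1B": -12.5, "DH": -17.5}

def total_value(batting_raa, fielding_raa, position):
    """Runs above average, comparable across all positions."""
    return batting_raa + fielding_raa + POS_ADJ[position]
```

Under these assumptions, the +10/+10 shortstop and the +10/+10 first baseman from the question above come out some 20 runs apart once position is accounted for.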

Once we’ve done this, there is absolutely no need to adjust offensive performance based upon position. After all, a home run by a first baseman isn’t any less valuable than a home run by a shortstop—given the same base-out situation, either player will drive in the same number of runs with a home run.


If we’ve assigned credit for fielding batted balls for outs, then we have an issue when it comes to assessing the value of a pitcher. We’ve already handed out credit for a large portion of "his" outs. This raises the question of whether or not we should be assigning credit for fielding outs to pitchers at all. It is generally (but by no means universally) accepted among the analytic community that in fact we shouldn’t.

In that event, what is needed is a model that produces an estimate of a pitcher’s run prevention ability given a league-average defense. These are generally called Defense Independent Pitching Statistics, or DIPS.

One word of caution – most DIPS and DIPS-like formulas are based upon a linear model of run scoring. Take FIP, for instance. The basic formula:

(13*HR + 3*BB – 2*K)/IP + C

Where C is a league constant (roughly 3.2) that puts FIP on the same scale as ERA.
Remember what we said about pitchers earlier: a walk is less valuable against a better pitcher, because the batter is more likely to be stranded on base. A linear formula like FIP does not model that reality. It is usually a small difference, but one worth noting.
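For concreteness, here is a sketch of FIP in its common simple form, (13*HR + 3*BB – 2*K)/IP plus a league constant (walks only here; many implementations fold hit batters into the walk term, and the constant varies by season):

```python
def fip(hr, bb, k, ip, league_constant=3.2):
    """Fielding Independent Pitching, simple form.

    Note the fixed linear weights: a walk "costs" the same against
    every pitcher, which is exactly the small distortion a dynamic
    model would capture and this linear one does not.
    """
    return (13 * hr + 3 * bb - 2 * k) / ip + league_constant
```

A pitcher with 20 home runs, 60 walks and 180 strikeouts in 200 innings comes out around 3.6 under these assumptions.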

Up next

Please ignore last week’s schedule—next week, we’ll actually go step by step in applying these ideas to some actual baseball players.

References & Resources

Almost everything of note I’ve had to say on the topic of run estimation, at least as it relates to this discussion, appears in the Hardball Times 2009 Annual. If you have a further interest in the topic, please, go there.

For further reading on a lot of topics, check out either Patriot’s website or Tango’s wiki.

Here’s an overview of THT’s fielding metrics. You can also take a look at Sean Smith’s fantastic TotalZone system, which covers the entire Retrosheet era.

DIPS is the rather brilliant brainchild of Voros McCracken. It’s probably the most controversial – and in my mind, the most important – finding of sabermetrics in the past decade. For a non-linear DIPS model, look at David Gassko’s LIPS, my BsRA or McCracken’s DIPS Base Runs.
