How to measure a player’s value (Part 3)
I have defined what I mean by player value and why I believe that in parts One and Two. Now it’s time to get down to brass tacks, to apply principles to practice and see how they look in action.
This is not the most in-depth or accurate method of calculating a player’s value. The goal is to present a process that lends itself well to explanation. That said, I believe the values I will be presenting will hold up well against most of the advanced uberstats that have been made publicly available.
Certain assumptions will be presented, particularly when it comes to the replacement level baseline and the adjustment between positions. When reading through the explanations, please try not to get too wrapped up in the specific details. Consider them a starting point for discussion, neither more nor less. Once you’ve made it through the examples and understand how it all fits together, we can discuss challenging those assumptions.
All player data is drawn from the Stats section of THT.
Offense
For estimating a player’s value on offense, a linear system works best. What linear systems tell us is how many runs that player contributes to an average team. Therefore, linear weights values are not guaranteed to add up to team totals on teams that are especially good or poor on offense.
There are many linear weights formulas, of varying quality. Since a linear weights formula is baselined against the production of an average team, the best linear weights formula to use is one that’s specific to the context under study. For our purposes, I’ve prepared a linear weights formula for 2008 based on play-by-play data. It’s specifically tuned to the data on our basic hitting pages. The definition of out is AB - H - K. (The values below are rounded to three decimal places, and therefore might not reconcile precisely.)
| Out | K | BB | HBP | H | 2B | 3B | HR | SB | CS |
|---|---|---|---|---|---|---|---|---|---|
| -0.278 | -0.280 | 0.308 | 0.329 | 0.451 | 0.290 | 0.569 | 0.950 | 0.150 | -0.467 |

These are runs above or below an average player’s. These should be similar to the batting values you see listed on Fangraphs, or the Batting Runs on Baseball Reference.
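To make the arithmetic concrete, here is a minimal sketch of applying the weights to a batting line. It assumes the usual sign convention (outs, strikeouts and caught stealing cost runs) and that H counts every hit, so the 2B, 3B and HR weights are bonuses stacked on top of the H weight. The function name and the sample stat line are hypothetical, not taken from the spreadsheet.

```python
# 2008 linear weights (runs relative to average), as listed in the article.
# Assumption: Out, K and CS are negative; 2B/3B/HR are added on top of H.
WEIGHTS = {
    "Out": -0.278, "K": -0.280, "BB": 0.308, "HBP": 0.329, "H": 0.451,
    "2B": 0.290, "3B": 0.569, "HR": 0.950, "SB": 0.150, "CS": -0.467,
}


def batting_runs(line):
    """Return runs above average for a dict of counting stats."""
    events = dict(line)
    # The article defines an out as AB - H - K (strikeouts are weighted separately).
    events["Out"] = line["AB"] - line["H"] - line["K"]
    return sum(WEIGHTS[name] * events.get(name, 0) for name in WEIGHTS)


# Hypothetical stat line, for illustration only.
example = {"AB": 500, "H": 150, "2B": 30, "3B": 3, "HR": 20,
           "BB": 60, "HBP": 5, "K": 90, "SB": 10, "CS": 4}
print(round(batting_runs(example), 1))
```

Because the weights are rounded to three decimals, results from a sketch like this will differ slightly from the spreadsheet’s values.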
Now to park adjust. As discussed last week, a simple run-based park value formula is appropriate for a value metric. But we do have an issue. Our standard run-based park factors are typically applied as VALUE / PF. This works well with a lower bound of zero, but our linear weights are relative to average, so negative values are in fact rather common. So what do we do?
What’s good for the goose should be good for the gander: If we’re using a linear run estimator, might it not be prudent to also use a linear park factor? What we simply do is take the average runs per plate appearance—roughly .122 in 2008—and park adjust that, and take the difference. That provides us with a linear factor to use, like so:
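| TEAM | PF | PF_PA |
|---|---|---|
| ARI | 1.05 | -0.006 |
| ATL | 1.00 | 0.000 |
| BAL | 1.01 | -0.001 |
| BOS | 1.04 | -0.005 |
| CHA | 1.04 | -0.005 |
| CHN | 1.04 | -0.005 |
| CIN | 1.02 | -0.002 |
| CLE | 1.00 | 0.000 |
| COL | 1.09 | -0.011 |
| DET | 1.00 | 0.000 |
| FLA | 0.98 | 0.002 |
| HOU | 0.99 | 0.001 |
| KC | 1.00 | 0.000 |
| LAA | 0.98 | 0.002 |
| LAN | 0.99 | 0.001 |
| MIL | 1.00 | 0.000 |
| MIN | 1.00 | 0.000 |
| NYA | 1.00 | 0.000 |
| NYN | 0.97 | 0.004 |
| OAK | 0.98 | 0.002 |
| PHI | 1.02 | -0.002 |
| PIT | 0.98 | 0.002 |
| SD | 0.92 | 0.010 |
| SEA | 0.97 | 0.004 |
| SF | 1.01 | -0.001 |
| STL | 0.98 | 0.002 |
| TB | 0.99 | 0.001 |
| TEX | 1.03 | -0.004 |
| TOR | 1.02 | -0.002 |
| WAS | 1.01 | -0.001 |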

Simply multiply that by the number of plate appearances and add it to the linear weights values and you have your park adjustment. (For those accustomed to seeing park factors relative to 100 instead of one, remember to simply divide your park factor by 100 when using it.)
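A rough sketch of the linear park factor follows. It takes the description literally: park-adjust the league runs per plate appearance, subtract the unadjusted figure, and scale by plate appearances. The function names are hypothetical, and the sign convention (negative for hitter-friendly parks) is my reading of "add it to the linear weights values." Since the published park factors are rounded to two decimals, the output will only approximately match the PF_PA column.

```python
LEAGUE_RUNS_PER_PA = 0.122  # approximate 2008 average quoted in the article


def park_runs_per_pa(pf, league_rpa=LEAGUE_RUNS_PER_PA):
    """Linear park factor: park-adjusted league R/PA minus unadjusted R/PA.

    pf is expressed relative to 1.00 (divide by 100 first if yours is
    relative to 100).
    """
    return league_rpa / pf - league_rpa


def park_adjusted_runs(batting_runs, pa, pf):
    """Add the per-PA park correction, scaled by plate appearances."""
    return batting_runs + park_runs_per_pa(pf) * pa


# Hypothetical hitter: 20 runs above average in 600 PA.
print(round(park_adjusted_runs(20.0, 600, 1.05), 1))  # hitter-friendly park
print(round(park_adjusted_runs(20.0, 600, 0.92), 1))  # pitcher-friendly park
```

A hitter in a run-inflating park loses a few runs and one in a run-suppressing park gains a few, which is the behavior a divide-by-PF adjustment cannot give you once values go negative.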
And that’s really all there is to it. You have, simply enough, a measure of a player’s offensive value relative to average. We don’t want to adjust for position or replacement level yet—those both come later.
Defense
This is harder than measuring offense. This is also more controversial than measuring offense, and less accurate than measuring offense.
Please do not get bent around the axle about any of this. There is cause for reasonable people to have reasonable disagreements about this. And we can have those differences and discuss those differences reasonably.
But on the whole, any model based upon play-by-play data is going to get more right than it gets wrong. So we should be in the clear as long as we remember some of our basic principles:
 We can live with a certain amount of inaccuracy in our models, so long as we take pains to avoid bias, and
 We understand the limits of our models and take care to moderate our claims accordingly.
If you have two players who are five plays apart in fielding in a single season of a zone-based metric, with similar playing time at the same position, it makes little sense to declare too strongly that one was better than the other. And remember, there is no magic number for when a number becomes “reliable”; it’s not an on-off switch. Take everything with a grain of salt.
Start with the Revised Zone Rating figures published here on THT. What we have is a measure of playing time (BIZ, or balls in zone) and of plays made (Plays and OOZ, or out of zone plays). What we want is to baseline these against average, convert plays to runs, and then to adjust for the difference between positions.
Converting RZR to plays above or below average is simple. I use the formula:
BIZ * (PlayerRZR – LeagueRZR) + Innings * (PlayerOOZ/PlayerInn – LeagueOOZ/LeagueInn)
(You will occasionally see people use BIZ as a unit of playing time when measuring out of zone chances, but that doesn’t quite work the way you’d expect; there is a practical limit to the number of balls in play, and so at the team level what you tend to see is that the more balls in zone there are, the fewer out of zone chances there are. Using innings as the denominator is an imperfect solution to this issue.)
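As a minimal sketch, the formula above can be written as a small function. It relies on RZR being plays made in zone divided by BIZ, which lets both terms be expanded so that no division is needed. The function name and the sample shortstop are hypothetical; the league figures are the shortstop values from the positional table in this section.

```python
def plays_above_average(biz, plays, ooz, innings,
                        league_rzr, league_ooz_per_inning):
    """Plays made above (or below) a league-average fielder.

    Equivalent to:
      BIZ * (PlayerRZR - LeagueRZR)
      + Innings * (PlayerOOZ/PlayerInn - LeagueOOZ/LeagueInn)
    with PlayerRZR = plays / BIZ, expanded to avoid dividing by zero.
    """
    in_zone = plays - biz * league_rzr
    out_of_zone = ooz - innings * league_ooz_per_inning
    return in_zone + out_of_zone


# Hypothetical shortstop: 400 BIZ, 340 plays in zone, 55 OOZ plays, 1300 innings.
paa = plays_above_average(400, 340, 55, 1300,
                          league_rzr=0.828403,
                          league_ooz_per_inning=0.037678)
print(round(paa, 1))
```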
From there, it’s a simple matter to use a constant to convert plays to runs. Then you adjust for position, prorating out the difference between positions based upon playing time. The positional adjustment should be based on the relative difficulty between the positions on defense, which we can measure to some extent by looking at players who play multiple positions. Here’s the full nitty-gritty:
| Pos | RZR | OOZ_INN | Run/Play | BIZ/30 | PAdj |
|---|---|---|---|---|---|
| 1B | 0.739422 | 0.030305 | 0.798 | 219 | -12.5 |
| 2B | 0.821831 | 0.026007 | 0.754 | 426 | 2.5 |
| 3B | 0.696536 | 0.038408 | 0.8 | 354.1333 | 2.5 |
| CF | 0.921688 | 0.067393 | 0.842 | 349.0333 | 2.5 |
| LF | 0.882837 | 0.04453 | 0.831 | 275.4 | -7.5 |
| RF | 0.899098 | 0.048531 | 0.843 | 292.0333 | -7.5 |
| SS | 0.828403 | 0.037678 | 0.753 | 424.8333 | 7.5 |

Once a player’s positional rating is figured, I simply prorate out the positional adjustment based upon playing time. To measure playing time, I use the number of BIZ chances relative to the league average. (Sometimes they’ll be prorated by plate appearances instead—I prefer to use a measure of defensive playing time instead, but that should be a largely pedantic point for the majority of players.)
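The last two steps (plays to runs, then the prorated positional adjustment) might look like the sketch below. It assumes the BIZ/30 column is the league’s BIZ per team, so that a player’s BIZ divided by that figure is his share of a full-time job, and it assumes the positional adjustments are negative for 1B, LF and RF. The function name is hypothetical and the exact proration in the spreadsheet may differ.

```python
# pos: (runs per play, league BIZ per team, positional adjustment in runs)
# Assumption: 1B, LF and RF carry negative adjustments.
POSITIONS = {
    "1B": (0.798, 219.0, -12.5),
    "2B": (0.754, 426.0, 2.5),
    "3B": (0.800, 354.1333, 2.5),
    "CF": (0.842, 349.0333, 2.5),
    "LF": (0.831, 275.4, -7.5),
    "RF": (0.843, 292.0333, -7.5),
    "SS": (0.753, 424.8333, 7.5),
}


def fielding_runs(position, plays_above_avg, biz):
    """Convert plays above average to runs and add the prorated position adjustment."""
    runs_per_play, biz_per_team, pos_adj = POSITIONS[position]
    playing_time = biz / biz_per_team  # 1.0 is roughly one full-time fielder
    return plays_above_avg * runs_per_play + pos_adj * playing_time


# Hypothetical shortstop: 14.657 plays above average on 400 BIZ.
print(round(fielding_runs("SS", 14.657, 400), 1))
```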
Catchers are a different ball of wax—ability to turn a ball in play into an out is largely irrelevant to their defensive value on the field. Now, if there’s a reliable measure of a catcher’s game calling ability in a single season, I’ve yet to see it. But what we can measure is a catcher’s impact on the running game and a measure of how well he blocks pitches. That data is also available right here on THT.
We can measure a catcher’s value in controlling the running game using the same weights we used for stolen bases and caught stealing to evaluate hitters. (Simply reverse the sign; what’s positive for a runner is negative for a catcher.) But we also need to factor in the number of attempts—a catcher with a strong arm and a good reputation simply won’t see many attempts against him, while a weak armed catcher may see a lot of attempts against him. To account for this, we multiply stolen base attempts above or below average by 0.086. Blocked pitches are handled similarly: 0.232 for each wild pitch or passed ball blocked above average.
For catchers, I use innings played as the unit of playing time, and use a positional adjustment of +12.5. For designated hitters, I use plate appearances as the unit of playing time, and use a positional adjustment of -17.5; some would argue that the defensive value of a DH should be zero, not negative, but that’s not an apples-to-apples comparison with their peers. If a DH could play the field and provide some sort of value, it’s very likely that his team would use him in such a fashion. As it stands, they can’t, and that makes a player who can hit and field simply more valuable to the team.
I don’t pretend any of that is definitive or even state of the art; systems like UZR, PMR or the Fielding Bible Plus/Minus system are superior to this approach. When it comes to presenting the values and thinking through their meaning, however, I do prefer having the input components of a system like RZR to look at, tear apart and put back together again.
Putting it together
We have offense, and we have defense. What now?
Here’s the step where we want to convert runs into wins. This is where many a brilliant sabermetrician has become shipwrecked. We do not want to end up with half-flooded engines and radios, with a half-buried bow. So we are going to proceed with extreme caution. In order to convert runs to wins in a sensical fashion, we need to use marginal runs; in short, the question is how many additional runs result in one additional win? You may scoff, but that word additional has caused us to lose many more good men than necessary.
The question of where to set that margin is a rather controversial one, but the most important guiding principle is that it is very dangerous to set the margin too low. Let’s use a practical example. Baseball Prospectus’ Wins Above Replacement Player is explicitly set to a baseline somewhere around a .125-.150 team win percentage. And yet if you add all the wins together, you end up with 2567.9 marginal wins for 2008. This is, quite frankly, absurd; in the actual 2008 AL and NL there were a combined 2428 wins. It’s impossible for marginal wins to exceed total wins.
But that is where you end up if you set the margin too low. A typical rule of thumb for converting runs to wins is to use 10 runs per win. But try applying that to the average team, which scored roughly 735 runs and allowed 735 runs. We know that a team that scores as many runs as it allows in the course of a season is right around a .500 team, or 81 wins in a 162-game season. But applying our runs-to-wins converter to all runs scored (735 / 10) gives us 73.5 wins creditable to the offense alone!
So we need to set our margin high enough to where our runstowins conversion reconciles with team wins. We already have offense and defense measured relative to average; why not use that?
Here’s where we run into a problem. A player who, combining offense and defense, is precisely 0 is then worth 0 wins. And that’s true regardless of whether they play two games or ten or 162. In order to know a player’s overall value with an average baseline, we have to know both his run value and his playing time. This can be inconvenient and unwieldy.
So what we ideally want is a baseline low enough to capture the value of simply being on the field and contributing, but high enough to actually capture real contributions to team wins. The compromise between the two typically goes under the name “replacement level,” and is not so much one baseline but any number of baselines in the range between roughly a .290 and a .350 win percentage. For the time being, let’s simply say that a replacement level position player contributes -20 runs compared to average per 700 plate appearances. So, add 20 runs per 700 plate appearances (prorated based upon playing time) to a player’s totals.
One more thing: the NL and the AL are not, and have not been, equal leagues for quite some time. (Don’t believe me? Check the interleague records.) To compensate for that, add five runs per 700 plate appearances to players in the American League. Then, to convert runs to wins, simply divide the total by 10 (you can improve upon this, but it’s easy to remember and it works well enough for now).
So now, without further ado, the top 10 position players, 2008, according to WAR:
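Putting those steps in one place, here is a minimal sketch of the position player calculation. The function name is hypothetical. The first example uses offense and defense figures resembling the Pujols row in the table that follows; the 641 plate appearances are my assumption, chosen to be consistent with the listed RepBonus.

```python
REPLACEMENT_RUNS_PER_700_PA = 20.0  # replacement level is 20 runs below average
AL_BONUS_RUNS_PER_700_PA = 5.0      # league-quality adjustment for AL players
RUNS_PER_WIN = 10.0                 # simple rule of thumb used in the article


def position_player_war(offense, defense, pa, league):
    """Wins above replacement from runs above average plus the playing-time bonus."""
    bonus_rate = REPLACEMENT_RUNS_PER_700_PA
    if league == "AL":
        bonus_rate += AL_BONUS_RUNS_PER_700_PA
    rep_bonus = bonus_rate * pa / 700.0
    total_runs = offense + defense + rep_bonus
    return total_runs / RUNS_PER_WIN


# NL example: 73.3 offense, 16.7 defense, 641 PA (PA is an assumption).
print(round(position_player_war(73.3, 16.7, 641, "NL"), 1))
# AL example with the extra five runs per 700 PA (745 PA is an assumption).
print(round(position_player_war(30.1, 8.1, 745, "AL"), 1))
```

Note that the replacement bonus depends only on playing time, so an exactly average player still accumulates value the more he plays.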
| Last | First | Offense | RepBonus | Defense | Total | WAR |
|---|---|---|---|---|---|---|
| Pujols | Albert | 73.3 | 18.3 | 16.7 | 108.4 | 10.8 |
| Jones | Chipper | 51.5 | 15.3 | 18.1 | 84.9 | 8.5 |
| Utley | Chase | 33.6 | 20.2 | 30.0 | 83.7 | 8.4 |
| Berkman | Lance | 49.5 | 19.0 | 12.6 | 81.0 | 8.1 |
| Rodriguez | Alex | 39.0 | 21.2 | 14.3 | 74.4 | 7.4 |
| Teixeira | Mark | 45.5 | 21.2 | 7.2 | 74.0 | 7.4 |
| Ramirez | Hanley | 40.8 | 19.8 | 9.6 | 70.2 | 7.0 |
| Wright | David A | 41.0 | 21.0 | 7.3 | 69.4 | 6.9 |
| Beltran | Carlos | 30.4 | 20.2 | 16.8 | 67.4 | 6.7 |
| Sizemore | Grady | 30.1 | 26.6 | 8.1 | 64.8 | 6.5 |

And the bottom ten:
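| Last | First | Offense | RepBonus | Defense | Total | WAR |
|---|---|---|---|---|---|---|
| Wilkerson | Brad | -13.9 | 11.0 | -9.4 | -12.3 | -1.2 |
| Balentien | Wladimir R | -16.1 | 9.3 | -6.0 | -12.9 | -1.3 |
| Jacobs | Mike | 5.4 | 14.8 | -31.5 | -11.3 | -1.1 |
| Matthews Jr. | Gary | -12.6 | 17.0 | -16.4 | -11.9 | -1.2 |
| Patterson | Corey | -29.4 | 11.2 | 4.7 | -13.5 | -1.4 |
| Lamb | Mike | -15.6 | 9.6 | -8.5 | -14.4 | -1.4 |
| Francoeur | Jeff B | -26.0 | 18.7 | -4.6 | -11.9 | -1.2 |
| Pena | Tony F | -30.4 | 8.4 | 6.2 | -15.8 | -1.6 |
| Guillen | Jose | -9.7 | 22.6 | -28.3 | -15.4 | -1.5 |
| Gload | Ross | -15.4 | 14.9 | -20.1 | -20.6 | -2.1 |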

Check out all those Royals! Now it’s time to turn our attentions to…
Pitching
As discussed last week, we want to view pitchers separately from their defense. Thus, we need to use a model that analyzes a pitcher’s own contributions. These models are not perfect, because no model is perfect. We should always be trying to improve these models.
At THT we have Fielding Independent Pitching, a pretty simple model but a very effective one. I have a hangup about applying linear models to pitching, however, as discussed last week, and thus use a dynamic FIP based on BaseRuns. The difference in most cases is likely pedantic, but for extremely good or bad pitchers it will matter.
If you want to use FIP instead (or any metric scaled to ERA), you first need to convert to RA; an unearned run can lose you a game as easily as an earned run. As a rule of thumb, divide ERA (or anything scaled to look like ERA) by .92 before using it in a player value metric.
Once we have a pitcher’s RA, we also have that pitcher’s runs per game if he were to pitch a whole game. From there, we can compute his win percentage compared to a league-average pitcher: what percent of his games would he win if he pitched all nine innings, assuming his team scored an average number of runs? To figure it out, we can use the Pythagorean win expectation. (Instead of park-adjusting the pitcher’s performance line, at this step I park-adjust the average runs per game.)
Then, we want to compare his production to the win percentage of a replacement-level pitcher. The replacement level of a pitcher depends on his role – a relief pitcher is easier to replace than a starting pitcher. And again, there’s a league quality difference.
There’s also the question of a relief pitcher’s leverage to consider: A fireman who comes in and pitches the tough outs is more valuable than a middle reliever, or a mop-up guy who is sent in once the game is essentially already out of hand. For that, we need to figure a pitcher’s leverage bonus: essentially, how many extra wins does his leverage add?
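Here is a minimal sketch of that step, assuming the Pythagenpat form of the Pythagorean expectation (named in the references) with a 0.287 power for the exponent. That power is a commonly used value and is my assumption, not necessarily the one in the spreadsheet. The function names are hypothetical; pass in a park-adjusted league average if you want the park adjustment described above.

```python
def era_scale_to_ra(era_like):
    """Convert ERA (or an ERA-scaled metric such as FIP) to a runs-allowed scale."""
    return era_like / 0.92


def pitcher_win_pct(pitcher_ra, league_ra, power=0.287):
    """Win percentage of an average-offense team behind this pitcher for nine innings.

    Pythagenpat: the exponent grows with the run environment,
    x = (runs scored + runs allowed per game) ** power.
    """
    x = (league_ra + pitcher_ra) ** power
    scored = league_ra ** x    # team scores at the league-average rate
    allowed = pitcher_ra ** x  # team allows runs at the pitcher's rate
    return scored / (scored + allowed)


# Hypothetical pitcher with a 3.18 RA in a league averaging 4.72 runs per game.
print(round(pitcher_win_pct(3.18, 4.72), 3))
# A league-average pitcher comes out at exactly .500.
print(pitcher_win_pct(4.72, 4.72))
```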
So to figure WAR, we use:
(WinPct – ReplacementWinPct) * IP/9
And add:
(WinPct – LevWinPct) * Lev * IP/9
The values I use for those constants:
| Lg | Start | Relief | Lev | RA |
|---|---|---|---|---|
| AL | 0.37 | 0.46 | 0.57 | 4.72 |
| NL | 0.39 | 0.48 | 0.57 | 4.66 |
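As a rough sketch, the first term of the pitcher calculation can be coded directly from these constants, reading it as the gap between the pitcher’s win percentage and the replacement win percentage for his league and role, times his innings expressed in nine-inning games. The function name is hypothetical. The leverage term is deliberately left out, because the article does not spell out its inputs in enough detail for me to reproduce it with confidence.

```python
# Replacement-level win percentages by league and role, plus league RA.
CONSTANTS = {
    "AL": {"start": 0.37, "relief": 0.46, "ra": 4.72},
    "NL": {"start": 0.39, "relief": 0.48, "ra": 4.66},
}


def pitcher_war_base(win_pct, innings, league, role):
    """(WinPct - ReplacementWinPct) * IP/9, before any leverage bonus."""
    replacement_win_pct = CONSTANTS[league][role]
    return (win_pct - replacement_win_pct) * innings / 9.0


# Hypothetical league-average AL starter (.500 win pct) over 180 innings.
print(round(pitcher_war_base(0.500, 180, "AL", "start"), 1))
# Hypothetical NL reliever with the same win pct over 72 innings.
print(round(pitcher_war_base(0.500, 72, "NL", "relief"), 2))
```

The higher replacement baseline for relievers is what makes the same win percentage worth less per inning out of the bullpen.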

And now, the top ten pitchers by WAR:
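| Last | First | BsRA | WAR |
|---|---|---|---|
| Halladay | Roy | 3.18 | 7.8 |
| Sabathia | CC | 6.35 | 7.5 |
| Lee | Cliff | 2.99 | 7.4 |
| Lincecum | Tim | 3.02 | 6.9 |
| Haren | Dan | 3.27 | 6.3 |
| Webb | Brandon | 3.54 | 5.8 |
| Santana | Ervin R | 3.50 | 5.6 |
| Mussina | Mike | 3.46 | 5.4 |
| Lowe | Derek | 3.34 | 5.3 |
| Danks | John W | 3.61 | 5.3 |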

Doc is sadly overlooked sometimes, isn’t he? And the bottom ten:
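| Last | First | BsRA | WAR |
|---|---|---|---|
| Gagne | Eric | 5.85 | -1.4 |
| Walker | Jamie | 6.54 | -1.4 |
| Manning | Charlie | 6.13 | -1.4 |
| Pinto | Renyel | 5.33 | -1.4 |
| Hansen | Craig R | 13.53 | -1.5 |
| Borkowski | Dave R | 6.48 | -1.5 |
| Villarreal | Oscar | 6.88 | -1.8 |
| Speier | Justin | 5.76 | -1.8 |
| Batista | Miguel | 6.78 | -1.9 |
| Heilman | Aaron | 5.57 | -2.1 |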

What’s worse than being a gascan? Being a gascan with a high leverage.
Wrapping it all up
Despite how it may feel, I have tried to be brief about these issues while still presenting enough to give an idea of what methods I use and, more importantly, why. I have also cribbed a lot—there are plenty of smart people out there and whenever possible I try to use their ideas rather than my own. There is a lot of material in the references down there that I heartily recommend, and I owe a debt to a lot of people, specifically folks like Tom Tango, Justin Inaz, Patriot and Sean Smith.
Now, like I promised above, here’s your opportunity to question the assumptions listed above. Provided is the complete spreadsheet I used to calculate all the values above. What do you get? Well, you get the full ratings – offense, defense, catching and pitching, plus WAR, for all players as well as team totals for 2008.
In addition, there’s a tab called “Assumptions.” That’s where I stored all the constants in the fancy charts from the article. Don’t like one of my assumptions? Play around with it. Substitute your own values. Substitute someone else’s values. See what happens.
But if you have questions (like, say, why Alexei Ramirez only rates out at barely above replacement level, or why NL MVP runner-up Ryan Howard only rates 117th place in position player WAR), just check the Assumptions tab. Tell me what assumption you disagree with and why. Or tell me why RZR underrates their fielding. Or whatnot.
Just please, don’t come to the table assuming that you tell the model what to think about certain players, not the other way around. The model has limits, of course—they all do. But on the whole they work pretty well in explaining how real baseball teams win real games. And a model that doesn’t challenge our preconceptions is just as useful as no model at all.
References & Resources
Adapted from Tom Tango’s Wins Above Replacement methodology.
The information used to calculate linear weights values was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”.
Park factors adapted from work by Patriot. Conversion to linear park factors inspired by a conversation with Tom Tango.
Method of adapting RZR to a plus/minus format adapted from work by Justin Inaz and Chris Dial. Catcher defense ratings adapted from work by Sean Smith. Tom Tango explains his positional adjustments.
Method for estimating reliever leverage courtesy of Justin Inaz. Method for estimating starter/reliever usage based upon games started courtesy of Tom Tango. Fielding Independent BaseRuns is my own work, based upon David Smyth’s BaseRuns and Tom Tango’s FIP. Pitcher win percentages derived from Pythagenpat.