Ghosts in the outfield

by Michael Humphreys
August 24, 2007

As some of you know, I’ve been trying to sell a book in which I use DRA (Defensive Regression Analysis) to rate the greatest fielders of all time. For people to have confidence in the historical ratings, I felt it only fair to show that for recent seasons for which ratings based on Zone Data were available, DRA provided ratings consistent with these state-of-the-art estimates of “objective truth.”

I had published two separate studies showing that DRA had at least a .70 correlation with the best known published ratings based on Zone Data (UZR) and nearly exactly the same standard deviation in runs-saved/allowed per fielder. To the best of my knowledge, DRA was the first and remains the only fielder evaluation system based solely on publicly available data to match Zone Data this well.

When Plus/Minus was published to great acclaim last year, I figured I would have to show that DRA also matched well with the newest system based on Zone Data. I did some very laborious calculations, and satisfied myself that DRA had a .8 correlation and nearly exact standard deviation match with Plus/Minus for middle infielders. At third, the correlation was only .5, but the average overall in the infield was still over .7.

What drove me to write this article were some wacky results in the outfield.

After running some numbers, it became clear to me that the “net plays” reported by The Fielding Bible (the number of plays each outfielder made above or below what the average outfielder at that position would have made) were so compressed as to be virtually zero. In other words, there was virtually no measurable, and certainly no statistically significant, variation in estimates of net plays, aside from a couple of players who were simply playing out of position (e.g., Bernie and Junior in center).

It was subsequently revealed that when John Dewan calculated net plays, he had collected his data into “buckets” defined by vector, distance and trajectory that were too precise, so that the sample size of each “bucket” was too small to yield statistically significant results. John has changed his calculation methodology to increase the sample size of each bucket of data, resulting in ratings with a typical degree of variation, but unfortunately he has not reported revised results, other than for a few players.

However, Tangotiger’s blog linked to Wharton statistics Professor Shane Jensen’s new BIS-based SAFE system, which might be particularly well adapted to deal with missing data points, for technical reasons that would take a while to explain.

Shane’s SAFE website currently discusses his methodology in general terms, but the kernel smoothers and numerical integration techniques he uses are not yet disclosed. Nevertheless, his general description has attracted the attention of other academics in statistics and will be the subject of an upcoming presentation at the 2007 New England Symposium on Statistics in Sports at Harvard this fall.

Given all of the above, I decided to do a SAFE-UZR test based on a comparison of ratings of all outfielders with at least two full-time seasons during 2003-05. (A full-time season is a season of 130 games or more at one position for one team.) I wanted only full-time seasons because I wanted to test serial correlations from year to year, and so sufficiently large within-year samples for each player were needed. I also wanted a manageable sample size for the study. Doing this kind of thing requires a painful amount of copying and pasting of numbers from and to various spreadsheets.

Here are the SAFE, UZR and DRA ratings, denominated in terms of runs saved per 1,450 innings played during 2003-05, including only full-time seasons for each player.

Database:                     BIS    STATS    Trad'l
Rating:                      SAFE     UZR       DRA

Outfielder           Pos
Adam Dunn            L       -14        -7      -29
Andruw Jones         C        12        -3        9
Bobby Abreu          R       -13         3       -4
Brian Giles          R         3        25        2
Carlos Lee           L         2        -3        8
Carl Crawford        L        19        12       17
Carlos Beltran       C         3        20       12
Gary Sheffield       R       -15       -12       -3
Ichiro Suzuki        R        10         6       11
Jermaine Dye         R         2         8       -6
Jim Edmonds          C        -5        -8       16
Johnny Damon         C        -6         1       10
Jose Cruz            R         4         8        1
Juan Encarnacion     R        -9        14      -12
Juan Pierre          C         1         5      -13
Luis Gonzalez        L         3        -5       -3
Manny Ramirez        L       -26       -36      -28
Mark Kotsay          C        -5        -3        2
Marquis Grissom      C        -8        -8        1
Mike Cameron         C        11        12       34
Moises Alou          L        -5         3       -9
Pat Burrell          L        -6        -4      -14
Shawn Green          R        -5        -9       -1
Vernon Wells         C         1        12      -12

                             SAFE     UZR       DRA
Average                       -2         1        0
Standard Deviation            10        13       14
Correlation w/ SAFE                   0.65     0.68
Correlation w/ UZR                             0.42

Well, at least the correlation between UZR and SAFE was better than .5. Nevertheless, it was a surprisingly weak result. Given that in prior years DRA had achieved a .71-.77 correlation with UZR, it seemed reasonable to expect a substantially stronger correlation between two state-of-the-art systems based on almost unbelievably detailed data designed to pin down the ‘truth’.

I wasn’t too concerned that SAFE had a lower standard deviation; part of Shane’s method discounts plays made in zones where plays are relatively rarely made. Although I currently do not believe that’s the right thing to do, at least the results could be explained. I felt great that DRA matched with SAFE as well as UZR did. But given DRA’s prior “success,” I felt bad about the poor match between DRA and UZR, which is still the most widely known system.
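For readers who want to check summary lines like these against their own spreadsheets, here is a minimal Python sketch of the two statistics involved. It assumes a plain Pearson correlation and a sample standard deviation (n - 1 denominator); the table does not say which standard deviation convention was used, so treat that as an assumption. The function names are mine, and the five ratings are simply the first five rows of the table, used for illustration.

```python
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def std_dev(xs):
    """Sample standard deviation (n - 1 denominator; an assumption)."""
    m = mean(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of ratings."""
    if len(xs) != len(ys):
        raise ValueError("rating lists must be the same length")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# First five rows of the table (Dunn, Jones, Abreu, Giles, Lee), illustration only.
safe = [-14, 12, -13, 3, 2]
uzr = [-7, -3, 3, 25, -3]

print("correlation:", round(pearson(safe, uzr), 2))
print("SAFE std dev:", round(std_dev(safe), 1))
```

Running the same two functions over all 24 rows is the kind of calculation behind the summary lines, though rounding in the published ratings means the results may differ slightly.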

Then it occurred to me just how differently each of the various rating systems uses Zone Data. UZR does a lot more than merely count net plays: it takes into account not only distance, “slice” and trajectory (popup, flyball, line drive), but also how hard the ball was hit, batter-handedness, the expected run-value for balls in play (BIP) in each “bucket,” park factors, and even a “ball-hogging” adjustment (explained below) that hadn’t been used when I had previously tested DRA against UZR.

BIS Zone Rating doesn’t take any of these other factors into account. SAFE’s methodology is also clearly different from UZR’s, and in particular, does not include park effects. BIS-based PMR (Probabilistic Model of Range), in its published form on David Pinto’s website, Baseball Musings, has its own park factors. It was all apples-to-oranges-to-bananas-to-spinach. True, BIS-based Plus/Minus and STATS-based UZR matched well in the infield, and the infield calculations also use many parameters, but “depth” and “trajectory” are largely irrelevant there, park factors are much less extreme, ball-hogging on groundballs is much less of a factor, and so forth. The outfield was the problem.

What was needed was a comparison of BIS and STATS outfielder ratings based on calculation methodologies that were as consistent as possible. Fortunately for all of us, David Pinto and Mitchel Lichtman diligently ran the numbers and graciously shared their output.

BIS and STATS outfielder ratings using consistent methodologies 2003-05

David, Mitchel and I decided to reduce zone ratings back to the bare essentials—counts of BIP defined by distance, slice and trajectory—and compare the results based on BIS and STATS data. We were careful to collect counts into “buckets” that were larger than those used by John for Plus/Minus. Parameters for speed, batter-handedness and park were left out. Speed is in some sense redundant—the distance, slice and trajectory of a BIP hit into the air largely determine the speed, or how hard the ball is hit.

Batter-handedness might be worth adding later, but we were seeking simplicity. Park factors are notoriously difficult to do correctly, even in the case of well-known and reliable offensive models. So the numbers you’re about to see don’t take park effects into account (neither does DRA, except for Coors, but no Coors fielder is in the study). We also didn’t bother calculating separate estimates of runs saved per type of play.

We just tried to answer the simple question: How many more or fewer batted balls did this outfielder catch compared to the average outfielder at his position, given the trajectory, direction and depth of all BIP hit anywhere near him?
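To make that question concrete, here is a rough Python sketch of a bucket-based net-plays count. It is not David’s or Mitchel’s actual code; the data layout (a list of `(bucket, caught)` pairs, with each bucket a `(distance, slice, trajectory)` tuple) and the function name are assumptions made for illustration. It also assumes the positional baseline is supplied separately.

```python
from collections import defaultdict


def net_plays(player_balls, position_balls):
    """Plays made above or below average for one outfielder.

    player_balls / position_balls: iterables of (bucket, caught), where
    bucket is a hashable (distance, slice, trajectory) tuple and caught
    is True if the ball was turned into an out. position_balls is the
    baseline for all outfielders at the same position (hypothetical layout).
    """
    chances = defaultdict(int)
    outs = defaultdict(int)
    for bucket, caught in position_balls:
        chances[bucket] += 1
        outs[bucket] += int(caught)

    total = 0.0
    for bucket, caught in player_balls:
        if chances.get(bucket, 0) == 0:
            continue  # no baseline for this bucket, so skip the ball
        expected = outs[bucket] / chances[bucket]  # average out rate
        total += int(caught) - expected
    return total


# Hypothetical bucket: medium-depth fly balls to the left-center gap.
gap_fly = ("medium", "left-center", "fly")
baseline = [(gap_fly, True)] * 5 + [(gap_fly, False)] * 5  # 50% out rate
player = [(gap_fly, True)] * 3 + [(gap_fly, False)]        # 3 of 4 caught
print(net_plays(player, baseline))
```

Making the buckets larger, as we did, simply means using coarser labels for distance, slice and trajectory, so that each bucket’s average out rate rests on more balls.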

We also left out the adjustment for ball-hogging used by Plus/Minus and the latest version of UZR in “shared zones.” Assume a pop-up is hit to short center field, and that, on average, it drops in for a hit 10% of the time, and otherwise is equally likely to be caught by the center fielder (30%), shortstop (30%) or second baseman (30%). Under Plus/Minus or regular UZR, if the ball is caught, the player who catches it gets credit for .1 hit saved (the 10% chance the ball would otherwise have dropped), and no other player is affected. If the ball is not caught, the center fielder, shortstop and second baseman are each charged .3 hits allowed. That way, a “ball-hog” who took all these easy chances wouldn’t be overrated, nor his teammates underrated.
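The bookkeeping in that pop-up example can be sketched in a few lines of Python. This only reproduces the example as described here; it is not the actual Plus/Minus or UZR code, and the function name and dictionary layout are assumptions.

```python
def shared_zone_credit(catch_shares, caught_by=None):
    """Hits saved (+) or allowed (-) per fielder for one ball in a shared zone.

    catch_shares: dict of position -> share of such balls that position
    catches on average (the shares sum to less than 1; the rest are hits).
    caught_by: position that caught the ball, or None if it dropped.
    """
    hit_rate = 1.0 - sum(catch_shares.values())
    if caught_by is not None:
        # Only the catcher is credited, with the chance the ball would drop.
        return {pos: (hit_rate if pos == caught_by else 0.0)
                for pos in catch_shares}
    # Ball dropped: each fielder is charged his usual share of the catches.
    return {pos: -share for pos, share in catch_shares.items()}


shares = {"CF": 0.3, "SS": 0.3, "2B": 0.3}
print(shared_zone_credit(shares, caught_by="CF"))
print(shared_zone_credit(shares, caught_by=None))
```

A fielder who grabs every one of these pop-ups collects only the small hit-rate credit each time, which is the whole point of the adjustment.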

Under the posited example, this kind of approach makes a great deal of sense. However, it turns out that there aren’t any shared zones among outfielders in which (1) it is highly likely the ball will be caught and (2) the out-conversion rates for neighboring outfielders are at all similar. (BIP shallow enough to be fielded by infielders were not included in the new UZR outfielder ratings.)

Thus the underlying motivation for this type of calculation doesn’t apply in the outfield. Instead, what happens when this approach is used is that good outfielders “subsidize” poor outfielders sharing a zone. It’s simpler and more informative to compare each outfielder’s out-conversion rates for a particular distance, slice and trajectory against the average rate for his position, without regard to what his neighboring teammate does.

Here are the combined Simplified-PMR and Simplified-UZR ratings for all full-time seasons in 2003-05, denominated in terms of runs saved per 1,450 innings played, assuming each net play saves .8 runs on average (.525 for the hit, which is likely to be a single worth .45 or a double worth about .75; .275 for the out).

                              BIS       STATS     Trad'l     Avg S-PMR
Outfielder                   S-PMR      S-UZR       DRA       & S-UZR
Adam Dunn                      -18        -4        -29           -11
Andruw Jones                    19        -2          9             9
Bobby Abreu                     -3         1         -4            -1
Brian Giles                     -9        18          2             5
Carlos Lee                       2         2          8             2
Carl Crawford                   42        14         17            28
Carlos Beltran                   9        14         12            11
Gary Sheffield                 -13       -15         -3           -14
Ichiro Suzuki                   22         5         11            13
Jermaine Dye                     1         8         -6             5
Jim Edmonds                      8        -2         16             3
Johnny Damon                    -6         0         10            -3
Jose Cruz                        8         2          1             5
Juan Encarnacion                 7        15        -12            11
Juan Pierre                     -1        -1        -13            -1
Luis Gonzalez                   14         8         -3            11
Manny Ramirez                  -27       -46        -28           -36
Mark Kotsay                      1       -19          2            -9
Marquis Grissom                -18        -6          1           -12
Mike Cameron                    28        21         34            24
Moises Alou                     -6         2         -9            -2
Pat Burrell                      8        -3        -14             3
Shawn Green                      1        -6         -1            -3
Vernon Wells                    -6         6        -12             0

Avg                              3         1          0             2
Std                             15        14         14            13

S-PMR v. S-UZR correlation:             0.60
S-PMR v. DRA correlation:               0.69
S-UZR v. DRA correlation:               0.50
SUPA v. DRA correlation:                0.68
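For anyone wanting to reproduce the scale of the ratings in the table, the conversion from net plays to runs saved per 1,450 innings is simple arithmetic. The sketch below uses the run values stated above (.525 for the hit plus .275 for the out); the function name and the sample inputs are hypothetical.

```python
RUNS_PER_HIT_PREVENTED = 0.525   # average value of the hit taken away
RUNS_PER_OUT_RECORDED = 0.275    # average value of the out gained
RUNS_PER_NET_PLAY = RUNS_PER_HIT_PREVENTED + RUNS_PER_OUT_RECORDED  # about .8
FULL_SEASON_INNINGS = 1450


def runs_saved_per_1450(net_plays, innings):
    """Scale a net-plays total to runs saved per 1,450 innings."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return net_plays * RUNS_PER_NET_PLAY * FULL_SEASON_INNINGS / innings


# Hypothetical outfielder: +20 net plays over 2,900 innings (two full seasons).
print(round(runs_saved_per_1450(20, 2900), 1))
```

In that made-up case, 20 net plays are worth about 16 runs over two seasons, or about 8 runs per 1,450 innings.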

The correlation between S-PMR and S-UZR, .60, is actually lower than the SAFE/UZR correlation, .65, though the standard deviation match is better. For some reason, S-UZR matched better with SAFE above (correlation .70) than with S-PMR, even though we were trying to calculate S-PMR the same way as S-UZR. So it seems that stripping away the UZR park factors might be enough to get a BIS-based (SAFE) and STATS-based system (original UZR) to the threshold of “strong” correlation, though, again, the standard deviations are significantly different.

The average of the correlations of DRA with S-PMR and S-UZR is .60. Once again, I was pleased to see that DRA matched with BIS as well as STATS does, but disappointed with the low correlation with STATS.

Not surprisingly, ratings based on the same data sets have higher correlations: S-PMR has a .85 correlation with SAFE and S-UZR has a .89 correlation with UZR.

The best efforts here came from (1) a consultant who has been paid by a major league team to run precisely these kinds of numbers (Mitchel), (2) a Harvard Ph.D. in statistics (Shane) and (3) a Harvard grad and programmer with 20 years of experience (David). When I saw that, despite those efforts, it was just barely possible for BIS and STATS outfielder ratings to reach the threshold of a strong correlation, and not yet possible to achieve both a strong correlation and a standard deviation match (when in prior studies DRA had done so with UZR), I was ready to give up on Zone Data altogether.

Then it occurred to me to look at a simple average of S-PMR and S-UZR, on the theory that each is at least independently trying to answer the same question. Maybe the sum could be greater than the parts, in a wisdom-of-crowds way.

It is hardly scientific to say so, but it appears to me that an average of these simplified forms of PMR and UZR yields significantly better ratings than either one considered alone—in fact, the combined ratings look virtually perfect, except that in one or two extreme cases some park effects should be taken into account.
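The averaging itself is as simple as it sounds. Here is a small Python sketch using three rows from the table above; the function name is mine, and the inputs are the rounded published ratings, so a result can differ by a run from the table’s average column, which was presumably computed before rounding.

```python
def simple_average(s_pmr, s_uzr):
    """Simple average of the Simplified-PMR and Simplified-UZR ratings."""
    return (s_pmr + s_uzr) / 2


# (S-PMR, S-UZR) pairs copied from the table above.
ratings = {
    "Carl Crawford": (42, 14),
    "Manny Ramirez": (-27, -46),
    "Vernon Wells": (-6, 6),
}

for name, (s_pmr, s_uzr) in ratings.items():
    print(f"{name}: {simple_average(s_pmr, s_uzr):+.1f}")
```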

To get some qualitative support for the “Simplified UZR-PMR Average” (“SUPA”), let’s glance at some comments from The Fielding Bible. Rather than repeat commentary that simply regurgitates numerical findings, I’ve generally tried to find more nuanced and subjective comments.

Adam Dunn LF (SUPA –11; DRA –29) …(H)as been a first baseman playing left.

Andruw Jones CF (SUPA +9; DRA +9) He gets amazing jumps on the ball, which can overcome any slight loss of range (due to bulking up) in recent years.

Bobby Abreu RF (SUPA –1; DRA –4) Possesses good speed (but) . . . . is a very conservative defender. He has been accused of having lapses in concentration, fear of going for balls or running into walls on the warning track, and just not giving a maximum effort in the field.

Carl Crawford LF (SUPA +28; DRA +17) The best defensive left-fielder in baseball today. He possesses world-class speed and routinely turns gap hits into outs.

Carlos Beltran CF (SUPA +11; DRA +12) A leg injury seemed to hinder his range but also his aggressive nature; he did not have his usual [e.g., 2003] explosive burst in 2005.

Gary Sheffield RF (SUPA –14; DRA –3) (D)oesn’t offer much range and plays very conservatively . . . . As one scout says, he “turns outs into outs.”

Ichiro! RF (SUPA +13; DRA +11) (P)robably the finest defensive player in the game. He combines tremendous speed and range with the most feared throwing arm in the game. However, his game showed some decline in 2005.

Juan Encarnacion RF (SUPA +11; DRA –12) (H)as the skills, but not the consistency. He has good speed and often makes spectacular catches, but then turns around and botches routine plays.

Mark Kotsay CF (SUPA –9; DRA +2) (C)ombines good range with good reads on the ball (and) good routes.

Mike Cameron CF (SUPA +24; DRA +34) (W)as a Gold Glove center fielder who reluctantly slid over to right after the signing of Carlos Beltran.

Moises Alou LF (SUPA –2; DRA –9) (M)erely an adequate defender. A history of nagging injuries has really hurt his speed and range over the years but he handles balls he can get to.

Vernon Wells CF (SUPA 0; DRA –12) (A) superb defensive center fielder. He is very smooth in the field, has good instincts and is particularly adept at going back on the wall.

Based on Fielding Bible commentary, both SUPA and DRA seem to do reasonably well. The only clear miss by DRA is Vernon Wells. SUPA might have Kotsay wrong. Juan Encarnacion’s rating should probably be an average of the SUPA and DRA ratings. In general, a three-way average of S-PMR, S-UZR and DRA might be best of all. I would encourage readers to read all of The Fielding Bible essays regarding the fielders in the sample, not just those listed above.

It appears that player ratings are very consistent year-to-year under the simplified calculation methodologies (and DRA); the correlations are always better than .50, and often much higher. (Year by year numbers are shown immediately below.) This is approximately the same as the year-to-year correlation for batting performance based solely on BIP, that is, batting outs other than strikeouts, singles, doubles and triples. In other words, a player’s fielding BIP performance is about as persistent from year to year (based on skill rather than luck) as his batting BIP performance.

For some reason, the correlation between 2003 and 2004 ratings for the same players was extremely low for SAFE (.20) and UZR (.04), though the 2004/05 correlations were .80. It could just be a small sample size issue, or perhaps the extra parameters add to year-to-year noise. For example, the “ball-hogging” adjustments in UZR might cause player ratings to vary with the quality of their teammates, for reasons explained above.
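A year-to-year check of this kind comes down to pairing each player’s rating in one season with his rating in the next and correlating the pairs. The Python sketch below assumes the ratings are held in dictionaries keyed by player name, which is my own layout rather than anyone’s actual spreadsheet; the ratings in the example are made up.

```python
from math import sqrt


def year_to_year_correlation(year_a, year_b):
    """Pearson correlation of ratings for players rated in both seasons.

    year_a, year_b: dict of player name -> rating (hypothetical layout).
    Players missing from either season are dropped.
    """
    common = sorted(set(year_a) & set(year_b))
    if len(common) < 3:
        raise ValueError("need at least three players in both seasons")
    xs = [year_a[p] for p in common]
    ys = [year_b[p] for p in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("ratings must vary within each season")
    return sxy / sqrt(sxx * syy)


# Made-up ratings: player D has no full-time second season and is dropped.
season_1 = {"A": 10, "B": -5, "C": 2, "D": 20}
season_2 = {"A": 8, "B": -9, "C": 4}
print(round(year_to_year_correlation(season_1, season_2), 2))
```

With only two dozen outfielders, and fewer still who were full-time in both seasons, a single odd year can swing the result a great deal, which is one reason the small-sample explanation is plausible.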

On Monday, I’ll lay out the specific outfield results for all systems for each of the past three years, and I’ll also reveal the actual outfield formulas for Defensive Regression Analysis.
