Defensive Regression Analysis: Complete Series
Editor’s Note: Last year, Michael Humphreys introduced a revolutionary new fielding statistic, called Defensive Regression Analysis (DRA), which represents an entirely new way of thinking about fielding stats. DRA uses stats that are available throughout baseball history so it can be used to evaluate fielders of any era. We consider it a significant improvement over fielding Win Shares.
The original DRA article submitted to Baseball Primer is now available as a PDF. The Web Archive also has the original Primer articles, with the correct formatting: Parts One, Two, and Three.
In this series of three articles, Michael will explain DRA, use it to evaluate major league fielders from 2001-2003, and compare it to zone-based systems such as Zone Rating and Ultimate Zone Rating in order to verify its accuracy.
I. Introduction and Summary of Results
A. General Introduction to Fielding Stats
Alan Schwarz says in his wonderful new book, The Numbers Game: Baseball’s Lifelong Fascination with Statistics, that “in some ways, fielding is baseball statistics’ holy grail.”
Baseball analysts have been rising to the challenge, and the quality of fielding information available to fans has never been higher, particularly for the 2001-03 seasons.
Back in the 1980s, Dick Cramer helped create the first and most widely used proprietary record of the number of batted balls hit reasonably close to each fielding position, and the percentage rate at which individual fielders turn such batted balls into outs. The resulting Zone Ratings (ZR) eventually found their way to the public (though not the underlying data, except at a price and with licensing restrictions); during 2004, ESPN posted 2001-03 ZR on the Internet for all major league players.
Beginning in 2001, Mitchel Lichtman purchased zone data even more detailed than what is used for publicly posted ZR, converted it into runs saved or allowed (“runs saved”) ratings, and published his results (Ultimate Zone Ratings, or UZR) for the 2001 through 2003 seasons (as well as the 1999 and 2000 seasons) at Baseball Think Factory (BTF) and on Tangotiger’s site.
Tom Tippett at Diamond Mind (DM) has posted “Gold Glove reviews” for the 2001-2003 seasons, in which he uses high-quality zone data, traditional data and even videotapes of performance to provide thoughtful verbal evaluations of the best fielders and a few notably bad ones.
In 2002, Bill James published his latest fielding evaluation system (Fielding Win Shares, or FWS), which, when introduced, was the best publicly disclosed and reproducible method for rating fielders throughout major league history. Last year, Studes here at The Hardball Times began posting FWS, and Total Baseball updated Pete Palmer’s Fielding Linear Weights (FLW) to reflect many of the ideas of FWS. At this point, FLW is probably more accurate than FWS at most positions, but only because it has incorporated Bill’s ideas.
In 2003, David Pinto introduced a new system (a Probabilistic Model of Range, or PMR) based on proprietary play-by-play data similar to (yet clearly different from) zone data, and has published ratings for 2003 and 2004.
Sometime in the past few years (I’m not sure when), Baseball Prospectus, on its website, posted individual runs saved ratings throughout major league history, based on Clay Davenport’s Davenport Fielding Translation (DFT) system, which converts traditional fielding statistics into runs saved ratings, but which was never fully disclosed (at least in a reproducible way), and which is currently described in only general terms. As I will explain below, I believe DFT currently provides the most accurate fielding evaluations throughout major league history. I also believe that DFT has probably been improved by incorporating Bill James’ ideas.
B. Introduction of DRA in Late 2003
In November 2003, I published a 3-part article (Parts One, Two, and Three) at BTF (then called Baseball Primer) introducing a new pitching and fielding evaluation system I had developed. I called the system Defensive Regression Analysis (DRA) because it is based on the statistical technique known as “regression analysis.” (I apologize for proliferating acronyms and abbreviations, but they do save space and time.) Regression analysis has long been used to generate equations for evaluating batters similar to Pete Palmer’s Batting Linear Weights equations, but it had never before been used (or at least not so comprehensively) to evaluate pitching and fielding. Fans who might be put off by statistical jargon can think of it as Defensive Runs Analysis. The acronym rhymes with ERA, so that should make it easier to remember.
DRA is the first pitching and fielding evaluation model that systematically works through and determines the statistically significant relationships between traditional, publicly available pitching and fielding statistics and the actual number of runs allowed by a team. DRA yields formulas, just as simple as the well-known one-line FLW equations in Total Baseball, which enable us to estimate the number of runs saved or allowed by pitchers and fielders (a) relative to the league average and (b) independently of each other. DRA is designed to be fundamentally accountable: the pitcher and fielder ratings add up to a team DRA rating (i.e., an estimate of the number of runs the team should have allowed). Such estimates are as accurate as, or more accurate than, Batting Linear Weights or Runs Created estimates of the number of runs a team should score.
This article will try to show, through a careful consideration of the published 2001-03 results of a multitude of fielding evaluation systems using proprietary zone-type data, that DRA has essentially solved the problem of evaluating fielding using traditional fielding statistics. The two simplest criteria for determining the accuracy of a fielding evaluation system that uses only traditional statistics are (a) its correlation with the best zone-based systems (i.e., how well it estimates the relative quality of fielders) and (b) how close the standard deviation in its runs-saved ratings is to the standard deviation of the best zone-based ratings (i.e., how well it estimates the absolute impact of fielders). DRA ratings at all positions (other than first base, for reasons which I’ll explain) have an overall 0.8 correlation with — and almost exactly the same standard deviation as — ratings derived from the best proprietary zone-based data.
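Both criteria are easy to compute. The sketch below is only an illustration of the two tests, not DRA itself. It assumes Pearson’s correlation for criterion (a) and the sample standard deviation (what Excel’s STDEV computes) for criterion (b), and it uses the 13 second basemen from the summary chart in section I.C.

```python
from math import sqrt
from statistics import mean, stdev

# 2001-03 average runs-saved ratings for the 13 second basemen in the
# summary chart (Alomar through Young, in chart order).
UZR_2B = [-13, -4, 14, 0, 11, 21, 7, -20, -4, 1, 4, -9, -11]
DRA_2B = [-13, 2, 9, 1, 4, 15, 4, -22, -15, 0, -10, -9, -8]


def pearson(xs, ys):
    """Pearson correlation: how well one system ranks fielders relative to another."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def accuracy_criteria(system, reference):
    """Return (a) the correlation with the reference ratings and
    (b) the ratio of sample standard deviations (1.0 = same scale)."""
    return pearson(system, reference), stdev(system) / stdev(reference)


r, scale = accuracy_criteria(DRA_2B, UZR_2B)
print(f"correlation with UZR: {r:.2f}, std-dev ratio: {scale:.2f}")
```

A ratio well below 1.0 means a system’s ratings are more compressed than the reference’s; ZR’s 7-run standard deviation against UZR’s 12 in the summary chart is an example.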
Based on its correlation and standard deviation output, DRA is significantly more accurate than FWS and FLW, and meaningfully (not terrifically, but meaningfully) more accurate than ZR (which is based on proprietary data) and DFT (which is based on a proprietary methodology). The DRA methodology is proprietary now as well, but the general principles and most of the techniques used in DRA are disclosed in the November 2003 article, and sometime this summer, I plan to complete a draft of a book that will reveal the method and formulas in complete detail. (This article is a first draft of one of the technical appendices; I fully intend to make the main part of the book more accessible, but I trust that The Hardball Times readership will appreciate the detail here. I hope you will find this the most careful assessment yet published of any fielding systems.)
When DRA becomes “open source,” baseball fans will not only have a way of generating good historical ratings for themselves, but also a tool that, combined with ZR output posted on the ESPN website, will enable them to produce with minimal effort contemporary fielder runs-saved ratings having close to a 0.9 correlation with proprietary zone-based ratings. Fans will be back to having information nearly as good as what the teams use. (In the meantime, this article actually provides the first objective evidence that DFT ratings available for free online are usually quite good, particularly at the most important positions: shortstop, second base, and centerfield.)
DRA will be (as DFT is today) an especially valuable resource for fans as alternative sources of information disappear. Two of the best sources of fielding information will apparently not be publicly available for seasons after 2003. Mitchel Lichtman has been hired by the St. Louis Cardinals and will no longer be publishing UZR. It appears that Tom Tippett won’t be publishing a Gold Glove essay for 2004; he used to get them out by the December after each season, but no 2004 essay has appeared as of February 2005.
C. Summary of Results of DRA Test for 2001-03
This article is being published in three parts. So that you won’t be in complete suspense, I’ll now post the summary results of the “test” of DRA against UZR, as well as corresponding results for ZR and DFT. Part 2 of this article will provide background as to the whys and hows of the UZR, DRA, DFT and ZR test. Part 3 will provide the complete results and some brief further explanations regarding individual players.
The chart below shows the average UZR, DRA, ZR and DFT runs-saved ratings over the 2001-03 time period for players who played at least 130 games at a single position in at least two of those seasons (without splitting seasons between teams) (a “Full Season”). Catchers are not included because I did not have a complete sample of UZR ratings for them. The DFT ratings are the RAA2 ratings available online for the player when playing his main position. ZR has been converted into runs saved by calculating marginal plays made and multiplying that value by the sum of the approximate value of the out created (about 0.3) and the hit saved (about 0.5 to 0.6, depending on the position). If a player played only two Full Seasons, the average rating shown is for only those two Full Seasons; a 3 in the “Yrs” column indicates that the rating is based on three Full Seasons. As will be explained later, the sample does not include players whose DRA rating differs from UZR by more than one UZR standard deviation if non-UZR zone-type sources (DM, PMR, ZR) and other reliable sources of information agree more with DRA than with UZR. I’ll explain each of these cases in Part 3.
Also included is “DRAZR”, a weighted combination of the two ratings (.67*DRA + .67*ZR), which shows a very, very good match with audited UZR. Once DRA is open source, fans will easily be able to reproduce “DRAZRs” (rhymes with “razors,” as they’re so sharp; and “lasers,” because they’re so precise) and evaluate contemporary fielders with confidence. DRAZRs work because DRA and ZR measure different things with different data, so they complement each other.
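The ZR-to-runs conversion described above can be sketched in a few lines. The text does not spell out how “marginal plays made” is computed, so the sketch assumes it is (player ZR minus league-average ZR at the position) times the balls in the player’s zone. The 0.55 hit value is simply the midpoint of the stated 0.5 to 0.6 range, and the fielder in the usage line is hypothetical.

```python
OUT_VALUE = 0.3           # approximate run value of the out created (per the text)
DEFAULT_HIT_VALUE = 0.55  # assumed midpoint of the 0.5-0.6 range; varies by position


def zr_runs_saved(zr, league_zr, balls_in_zone,
                  hit_value=DEFAULT_HIT_VALUE, out_value=OUT_VALUE):
    """Convert a Zone Rating into approximate runs saved.

    Assumption: marginal plays = (ZR - league-average ZR) * balls in zone.
    Each marginal play is credited with the out created plus the hit saved.
    """
    marginal_plays = (zr - league_zr) * balls_in_zone
    return marginal_plays * (out_value + hit_value)


# Hypothetical fielder: ZR .850 vs. a league .830 on 400 balls in zone,
# i.e. 8 marginal plays at 0.85 runs each.
print(round(zr_runs_saved(0.850, 0.830, 400), 1))
```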
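Once DRA and ZR are both expressed in runs, a DRAZR is simple arithmetic. The sketch below applies the .67 weights given above to rounded chart values. Because the chart was built from unrounded ratings, a DRAZR recomputed this way can occasionally differ by a run (Jeter’s comes out -27 rather than the chart’s -28).

```python
def drazr(dra_runs, zr_runs, w_dra=0.67, w_zr=0.67):
    """Combine DRA and ZR runs-saved ratings with the weights given in the text.

    The weights sum to 1.34, so this is a scaled sum of the two ratings
    rather than a true average.
    """
    return w_dra * dra_runs + w_zr * zr_runs


# (DRA, ZR) pairs taken from the summary chart
for name, dra, zr in [("Aurilia", -9, 5), ("Cabrera", 13, 5), ("Cameron", 24, 10)]:
    print(name, round(drazr(dra, zr)))
```

These three round to the chart’s -3, 12 and 23.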
Average, standard deviation, and correlation numbers are provided at the bottom of the chart, broken down into three categories: (a) all positions except catcher (“3456789”), (b) all positions excluding first base (“456789”), and (c) all positions excluding first and right field (“45678”).
Pos Yrs Last              UZR  DRA   ZR  DFT  DRAZR
6       Aurilia             1   -9    5    5     -3
6   3   Cabrera            11   13    5   13     12
6       Cruz               -5  -10    3   -4     -4
6       Furcal              7    5   -5   -4      0
6       Garciaparra         9   12   -3   -7      6
6       Gonzalez, A.        2    2    3   -6      4
6   3   Gonzalez, A.S.      8    8    5   10      9
6       Guillen             3   -3    1  -11     -1
6       Hernandez          16   10    8    7     12
6       Jeter             -25  -22  -19  -20    -28
6       Ordonez            -1   -1    4    6      2
6   3   Renteria            7    7    5   -3      8
6   3   Rodriguez           9   -5   12   10      4
6   3   Tejada             -1    6   -8  -11     -1
6       Vizquel             8   -4   -4    9     -5
6       Wilson             -8    3   -1    2      2

Pos Yrs Last              UZR  DRA   ZR  DFT  DRAZR
4       Alomar            -13  -13   -9    2    -15
4   3   Anderson           -4    2    8   -9      7
4   3   Boone              14    9    4    7      9
4   3   Castillo            0    1    6   -2      5
4       Grudzielanek       11    4    5   -2      6
4   3   Kennedy            21   15   13   11     19
4       Kent                7    4    2   12      4
4       Rivas             -20  -22  -14  -19    -24
4   3   Soriano            -4  -15   -8  -11    -15
4       Vidro               1    0   -5   -5     -3
4       Vina                4  -10    4   -6     -5
4       Walker             -9   -9   -5  -12     -9
4       Young             -11   -8    5   -3     -1

Pos Yrs Last              UZR  DRA   ZR  DFT  DRAZR
5       Alfonzo            -3    2    2    0      3
5       Batista            -3    3   -4   10      0
5       Beltre             16    1    5  -10      4
5       Castilla            1   -4    2   -8     -1
5   3   Chavez             17   12    8   12     13
5       Glaus             -10   -2   -1   -4     -1
5   3   Koskie             11    9    7   11     11
5       Lowell             -6   -3   -2   14     -3
5       Rolen              18   13    6   10     13
5       Ventura            19   10    7    3     12

Pos Yrs Last              UZR  DRA   ZR  DFT  DRAZR
8   3   Beltran             6    6    8    9      9
8   3   Cameron            28   24   10   12     23
8       Damon              13   -1    5    4      3
8       Edmonds            -4    5    4   10      6
8       Erstad             42   36   13   20     32
8   3   Hunter              8    0    4    3      2
8   3   Jones              15   24   -1   17     15
8       Wells              -2  -14    5    0     -6
8       Williams          -20  -11  -14  -11    -17

Pos Yrs Last              UZR  DRA   ZR  DFT  DRAZR
7   3   Anderson           -2    4    6    6      6
7       Bonds              -8   -6   -7   -4     -8
7   3   Burrell           -11  -12    2    0     -7
7   3   Gonzalez           11    3    5    5      6
7       Jones, C.           2   -5    2   -6     -2
7       Jones, J.          14   12    9    5     13
7   3   Lee                 4    0   10   -3      7

Pos Yrs Last              UZR  DRA   ZR  DFT  DRAZR
9   3   Abreu              -7   -4    5   -6      0
9   3   Green             -13    9    5    6      9
9       Guerrero           16    0    4   -6      3
9   3   Ordonez            -7    1    5    2      4
9   3   Sosa               -6   -2   -1   -4     -2
9   3   Suzuki              7   15   -2   12      9

Pos Yrs Last              UZR  DRA   ZR  DFT  DRAZR
3   3   Bagwell             7    0   -6   -5     -3
3       Casey               5  -11   14   -3      2
3   3   Delgado            -2    1   -1   -3      0
3   3   Helton             22   11    7   16     12
3       Konerko           -10   -6   -3  -11     -6
3   3   Lee, D.             9    8   12    7     13
3   3   Lee, T.            11    8   13    8     14
3       Martinez           14    5    8    9      8
3   3   Mientkiewicz       10    6    7   12      8
3   3   Olerud              0    8    2    2      7
3   3   Sexson             -5    9    1   15      6
3       Thome             -14  -10    5   -6     -3
3       Young              10    0   12    5      8

Positions 3456789         UZR  DRA   ZR  DFT  DRAZR
Avg                         3    2    3    2      3
Std                        12   10    7    9     10
Correl w/ UZR            1.00 0.76 0.64 0.60   0.82

Positions 456789          UZR  DRA   ZR  DFT  DRAZR
Avg                         3    2    2    1      2
Std                        12   11    7    9     10
Correl w/ UZR            1.00 0.79 0.67 0.59   0.84

Positions 45678           UZR  DRA   ZR  DFT  DRAZR
Avg                         4    1    2    1      2
Std                        12   11    7    9     11
Correl w/ UZR            1.00 0.84 0.72 0.64   0.89
It’s pretty clear that DRA is very, very accurate at all positions except first base and right field. Perhaps larger samples will improve the situation in right field. My only concern is that the DRA average is lower than the UZR average, though DRA, ZR and DFT are all closer to each other than to UZR. This reflects the effect of outliers; the median ratings are closer together:
Medians                   UZR  DRA   ZR  DFT  DRAZR
3456789                     3    2    4    2      4
456789                      2    1    4    2      3
45678                       3    2    4    2      3
During the 2001-03 period, the best fielders at each position, taking into account per-season ratings and the ability to play three Full Seasons, were:
Shortstop: Orlando Cabrera (whom saber-savvy Boston picked up for defense). Second: Adam Kennedy. Third: Eric Chavez; perhaps Scott Rolen, who had a great 2004 after an OK 2003. Center: Mike Cameron, perhaps Andruw Jones. Right: Ichiro! (DM commentary is closer to the DRA +15 rating). Left: Jacque Jones. First Base: Todd Helton.
And their DRA ratings are all fundamentally in agreement with the UZR/DM consensus.
II. Explanation of DRA Testing Methodology
A. DRA Test Results for 1999-2001
The November 2003 DRA article on Baseball Primer described in general terms how DRA was developed, but did not include the actual formulas or all of the ideas necessary to replicate the system, because I was considering selling DRA to a team for minor league player evaluation. For a variety of personal reasons, I have now decided instead to publish a book containing the formulas, a complete explanation of their derivation, ratings of all of the best fielders in baseball history, and new (some will probably find them radical) time-line adjustments for changing talent pools.
As the DRA article did not provide explicit formulas, I thought the best way to promote interest in the system would be to show how its results compared to results from what I consider to be the best system, UZR, and to report historical DRA results I had already prepared for 1974-2001. The UZR-DRA comparison included all players who had played at least two full seasons (at least 130 games) at one position (without splitting a season between teams) (“Full Seasons”) anytime between 1999 and 2001. Average per-player DRA ratings over that time period had an approximately 0.7 correlation with corresponding UZR ratings, and approximately the same “scale” (as measured by standard deviation of ratings) as UZR.
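The “Full Season” sampling rule can be written as a small filter that averages each player’s qualifying seasons. The record layout (player, year, position, games, teams, rating) and the sample players are hypothetical, and “teams == 1” stands in for the no-split-season condition.

```python
from collections import defaultdict
from statistics import mean

MIN_GAMES = 130  # games at a single position needed for a Full Season


def full_season_averages(seasons, min_full_seasons=2):
    """Average each player's ratings over his Full Seasons at a position.

    seasons: iterable of (player, year, position, games, teams, rating).
    A season counts only if the player logged at least MIN_GAMES at the
    position for a single team. Players with fewer than min_full_seasons
    qualifying seasons are left out of the sample.
    Returns {(player, position): (average rating, number of Full Seasons)}.
    """
    qualifying = defaultdict(list)
    for player, _year, position, games, teams, rating in seasons:
        if games >= MIN_GAMES and teams == 1:
            qualifying[(player, position)].append(rating)
    return {
        key: (mean(ratings), len(ratings))
        for key, ratings in qualifying.items()
        if len(ratings) >= min_full_seasons
    }


# Hypothetical records: Player A has two Full Seasons (his third is too short);
# Player B split 2000 between two teams, leaving him only one Full Season.
seasons = [
    ("Player A", 1999, "SS", 150, 1, 10),
    ("Player A", 2000, "SS", 140, 1, 4),
    ("Player A", 2001, "SS", 100, 1, -20),
    ("Player B", 1999, "CF", 155, 1, 5),
    ("Player B", 2000, "CF", 135, 2, 30),
]
print(full_season_averages(seasons))
```

The second element of each result plays the role of the “Yrs” column in the charts.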
Although I believed UZR was the best reference point for testing DRA, I knew it wasn’t perfect, for reasons I’ll explain below. When I adjusted UZR ratings to reflect zone-based DM commentary during that period (DM actually has more detailed individual fielder evaluations, including non-Gold-Glove-quality fielders, for the 1999-2000 seasons than for the 2001-2003 seasons), the correlation rose to slightly over 0.8. A BTF poster called AED (a Ph.D. who said he had also developed a regression-based fielding model, though he has not published it, and I do not believe it provides ratings integrated with pitcher ratings, or ratings denominated in runs) explained that the appropriate thing to do if UZR ratings differed from DM evaluations was just to delete the UZR (and corresponding DRA) ratings from the sample. When I did so, the correlation was still slightly greater than 0.8.
After publishing this study, I came across two new pieces of information that were encouraging. First, Tangotiger wrote that there is so much noise in even the best proprietary zone-based fielding data that one really needs at least two years of full-time UZR ratings to get a reliable rating. So my approach of comparing the average of two Full Seasons (or, if available, three Full Seasons) of UZR and DRA ratings over 1999-2001 made sense. Second, Ken Ross, baseball fan and former President of the Mathematical Association of America, was, in his own words, “fearless” enough to actually provide a “rough” estimate of what a “strong” positive correlation should in general be: 0.7 (A Mathematician at the Ballpark: Odds and Probabilities for Baseball Fans, 127, Pi Press, 2004). So I felt more comfortable saying that DRA had a strong correlation (at least 0.7) with UZR ratings (and probably a 0.8 correlation with “correct” UZR ratings) derived from an appropriate sample (players with at least two Full Seasons of ratings).
B. Motivation for New DRA Test
As I began compiling 1893-2003 Retrosheet data for the historical ratings section of my book, I thought I should try one more time to see if I could improve DRA, which seemed to have some difficulties rating third basemen and right fielders in the 2001-2003 study. I theorized that the problem was that traditional baseball statistics don’t provide ground out and fly out data for left-handed and right-handed pitchers separately, but that there were a few potential indirect approaches to addressing the problem. (ESPN may report actual ground ball and fly ball data for contemporary pitchers, so perfect lefty ground ball/fly ball and righty ground ball/fly ball calculations can be done for recent seasons, but the goal of DRA has always been to provide a system that works throughout major league history, as well as for minor league and non-U.S. prospects.)
Furthermore, close analysis of 1993-2003 Retrosheet data suggested that the indirect effect of left-handed pitching had drastically declined after 1995 in the National League, reflecting the increased use of LOOGies and ROOGies, as well as the shortage of utility and platoon players, so the left-handed pitching adjustment (independent of ground ball / fly ball adjustments) should have been different in 1999-2001. (As you probably know, left-handed pitching doesn’t “cause” more ground outs to the left side and more fly outs to the right side; rather, if an opposing team has enough players to platoon, left-handed pitchers will face more right-handed batters, who will generate more ground outs to the left side and fly outs to the right side than left-handed batters, regardless of the handedness of the pitcher.)
I made some interesting discoveries, which I’ll discuss in the book, including a few that seem to have improved ratings at third base, but there are still lingering problems in right field (though the same is true for ZR and DFT, which seem to do even worse there). At first base, the fundamental problem for all but zone-based systems, identified a few years ago by Charlie Saeger and Bill James, is that we not only don’t know the number of fielding opportunities at first base (batted balls fieldable at first — that problem is true at all positions for non-zone-based systems); we don’t even know precisely the number of batted balls fielded by first basemen, still less the number of ground balls fielded by first basemen (pop-ups and most fly outs caught by infielders are usually discretionary chances). We can make estimates, and they’re getting better, but non-zone-based first base ratings are probably only really reliable over time periods longer than two or three years.
C. Accuracy and Reliability of UZR
UZR is the best system out there, and Mitchel Lichtman deserves the thanks of all baseball fans for paying the multi-thousand-dollar price for the necessary data and generating zone-based run-savings ratings when no one else in baseball was willing to. However, UZR ratings are derived from extremely large and complicated data sets, and Mitchel was not then part of an organization that could provide auditing and de-bugging back-up. So it’s not surprising that there have been errors in the past in how the UZR data has been processed. (I know; I helped Mitchel fix a formula that had almost eliminated the year-to-year consistency of per-player UZR ratings.) Even now, a few individual UZR ratings are clearly wrong. I am certain that I have made errors in compiling and analyzing my data; we’re all human, and we’re not getting paid to do this.
A few examples might illustrate the need to double-check UZR. As mentioned above, DM uses zone data similar to UZR’s and actually provides (in its team essays) commentary for most full-time fielders, whether good, bad or indifferent, for the 1999 and 2000 seasons, whereas the 2001-03 “Gold Glove” essays primarily address only the best fielders. So there is a large sample of 1999-2000 full-time fielders who have both a UZR rating and a highly specific DM evaluation based primarily on high-quality zone data.
The 1999 UZR rating for shortstop Rey Ordonez is +39 runs. That would translate into about 55 plays above average. Yet in its detailed evaluation of Rey Ordonez’s 1999 season, DM says, “Error totals aren’t usually a good indication of fielding prowess, but the four errors charged against Ordonez were impressive nonetheless.” Not a word about what would be possibly the greatest range performance at shortstop in baseball history, if UZR were correct. And DM acknowledges elsewhere that marginal plays measured using the best zone data can be quite high:
In a typical season, the top fielders at each position make 25-30 more plays than the average. Exceptional fielders have posted marks as high as 40-60 net plays, but those are fairly uncommon. Recent examples include Darin Erstad in 2002, Scott Rolen just about every year, and Andruw Jones in his better seasons. The worst fielders tend to be in the minus 25-40 range. DM, “Evaluating Defense.”
The 2001 UZR rating for Ordonez, albeit after an injury-shortened 2000, is –6. DM (correctly) identifies Rey Sanchez as the best shortstop in 1999 (UZR +31, DRA +33).
The 2000 UZR rating for centerfielder Doug Glanville is -32 runs: a truly catastrophic performance. This is what DM says about that season: “Glanville’s speed is still his best asset. He is a very good base stealer and has been a strong performer in center field, though 2000 was far from his best year defensively. Glanville made just four errors in 2000 and threw out nine runners.” It is true that speed alone won’t make a good outfielder (Lou Brock was a poor outfielder and Tim Raines nothing special). But an established “strong” (presumably well-above-average) performer in centerfield whose stolen base attempts and success rate have not declined will not cost his team 32 runs, even if his performance is “far from his best.” The 2000 Glanville DRA rating (not including arm) is -3. Another quick example in centerfield: the 2000 Gerald Williams UZR rating is -42. Yes, minus forty-two runs, close to sixty plays below average. DM says, “Playing average-at-best defense.” DRA rating: +3.
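The run-to-play translations in these examples imply roughly 0.7 runs per marginal play (39 runs is “about 55 plays”; 42 runs is “close to sixty”). That ratio is an inference from the examples, and it is lower than the 0.8 to 0.9 runs per play used for the ZR conversion in section I.C, so treat it as an assumption. The sketch also checks the implied plays against the range DM reports in the passage quoted above.

```python
RUNS_PER_PLAY = 0.7   # assumption: implied by the examples (39 runs ~ 55 plays, 42 runs ~ 60 plays)
DM_BEST_PLAYS = 60    # DM: exceptional fielders have posted marks as high as 40-60 net plays
DM_WORST_PLAYS = -40  # DM: the worst fielders tend to be in the minus 25-40 range


def runs_to_plays(runs, runs_per_play=RUNS_PER_PLAY):
    """Translate a runs-saved rating into net plays above (+) or below (-) average."""
    return runs / runs_per_play


def outside_dm_range(runs):
    """True if the implied net plays fall outside the range DM reports seeing."""
    plays = runs_to_plays(runs)
    return plays > DM_BEST_PLAYS or plays < DM_WORST_PLAYS


for label, runs in [("Ordonez 1999", 39), ("Glanville 2000", -32), ("G. Williams 2000", -42)]:
    print(f"{label}: {runs:+d} runs = {runs_to_plays(runs):+.0f} plays; "
          f"outside DM range: {outside_dm_range(runs)}")
```

Ordonez’s +39 passes this crude check; the case against it rests on DM’s specific commentary, not on the size of the number alone.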
A close review of UZR ratings and detailed DM commentary for 1999 and 2000 suggests that about a third of the single-season centerfielder assessments (I haven’t checked left and right field) and about a quarter of single-season infielder assessments are clearly inconsistent, meaning that UZR and DM are effectively well more than 10 runs apart: UZR gives a poor rating (say, -15 or -25) whereas DM says the player was average or only subpar, or UZR gives a Gold-Glove-type rating (+15 to +25 or more) and DM had nothing positive to say. Remember: DM looks at the same kind of zone data used by UZR. Yet Tangotiger was onto something when he suggested using at least two years of UZR ratings, because after two years, UZR is (at least as far as I can tell) very, very close to the broad consensus of other zone-like evaluations, such as DM, PMR or ZR, about 90% of the time.
D. Procedure for Determining Data Points to Delete
In spite of these difficulties, I still must say (for at least the third time) that UZR is the best 2001-03 resource for fans. It just has to be considered with care. In trying to determine when a UZR (and corresponding DRA, DFT and ZR) rating needed to be deleted from the sample, I finally settled upon the following principles, based on AED’s advice:
(1) If a UZR rating differs from DRA by more than one (UZR) standard deviation (12 runs), the rating is flagged.
(2) If non-UZR zone-based assessments agree with DRA more than with UZR, the player’s rating is deleted from the sample, otherwise it remains.
(3) In evaluating the non-UZR zone-based assessments, I consider the following sources of information, in descending order of reliability and relevance: (a) DM “Gold Glove” essays for 2001-03, (b) DM individual player evaluations in 1999 and 2000, (c) PMR ratings in 2003 and 2004, (d) ZR in 2001-03, and, as a last resort if no other information is available (e) anything I can think of, including rates or changes in rates of hitting triples, attempting steals, or getting caught stealing.
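Principles (1) and (2) reduce to a simple decision once the judgment call in principle (3) has been made. In the sketch below that judgment is supplied by hand as a boolean; the examples use rounded shortstop figures and the verdicts discussed under Shortstop in section III. Because the article applies the 12-run test to unrounded values, borderline cases computed from rounded chart figures may not match its flags.

```python
UZR_SD = 12  # one UZR standard deviation, in runs


def review(uzr, dra, sources_favor_dra):
    """Apply the flag-then-delete rule to one player's average ratings.

    sources_favor_dra: the principle (3) judgment, i.e. whether non-UZR
    zone-based evidence (DM, PMR, ZR, ...) agrees more with DRA than with UZR.
    """
    if abs(uzr - dra) <= UZR_SD:
        return "keep"            # not flagged
    if sources_favor_dra:
        return "delete"          # flagged, and other evidence backs DRA
    return "keep (flagged)"      # flagged, but UZR stands


players = {
    # name: (UZR, DRA, non-UZR sources favor DRA?)
    "Clayton": (15, -4, True),
    "Guzman": (-16, -2, True),
    "Rollins": (-11, 3, True),
    "Rodriguez": (9, -5, False),
    "Jeter": (-25, -22, False),  # not flagged, so the judgment is never consulted
}
for name, (uzr, dra, favor) in players.items():
    print(name, review(uzr, dra, favor))
```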
This will all become clearer as we get down to cases. The important things to bear in mind are:
(1) All results are disclosed, including the ones I believe should be deleted.
(2) All deletions are explained.
(3) I considered and analyzed many “rules” for deletion, including some that might be considered more rigorous by a trained statistician, but I wanted a procedure that would require minimal explanation for purposes of this article (I may, depending on the patience of editors, provide a more rigorous but far more elaborate procedure for the book).
(4) Under every deletion methodology I considered, the overall DRA correlation with UZR (excluding first base) was always between 0.69 and 0.79; DFT’s was always between 0.54 and 0.66; ZR’s was always right around 0.67. In every case, DRA had a higher correlation than ZR and DFT. For example, when I followed the AED algorithm, but applied it to DFT to delete UZR ratings inconsistent with DFT and other zone-based systems without regard to DRA, the overall DFT correlation did not increase meaningfully (it rose only to 0.60), and though the DRA correlation did decline (to 0.73), it was still clearly higher than DFT’s.
(5) I have probably spent more time studying all full-time player UZR, DFT, ZR, PMR and DM evaluations from 1999-2004 than anyone else, certainly any sane person, including the creators/authors of each of those systems. My firm conviction is that the DRA “correlation” with the “truth” is closer to 0.8 than 0.7. If you’re inclined to disagree, all the information I have relied upon is at your disposal, so you can make your own assessment. What is undeniable, however, is that DRA’s standard deviation numbers, as you will see, are clearly more in line with UZR than those of any other system (though still slightly more conservative than UZR), which leads me to my last point.
(6) FWS and FLW are not included in the comparison, because they are less accurate than DFT on inspection. FLW outfield ratings, as a casual look through Total Baseball will reveal, are at least 50% too “compressed” compared with infielder ratings, which are getting better but still effectively double-count double plays and errors and overemphasize putouts. All FWS ratings are compressed about 50% too much. In addition, the outfielder ratings have a poor correlation with UZR, based on a 2003 study available at Raindrops.
III. Complete DRA Test Results for 2001-03
So that you won’t be overwhelmed by a mass of numbers, I’ll show results separately position-by-position, explain the deletions and make other comments below. The “check” comment indicates a greater-than-12-run difference between UZR and DRA (as calculated by the Excel spreadsheet, without rounding); the word “dra” indicates that non-UZR zone-based information supports DRA more than UZR; “pk” flags a park effect in Colorado, which DM says depresses out-conversion rates at each outfield position by about 24 plays per season. (DRA currently has no park factors, and I personally believe they’re not worth the trouble in evaluating fielders except in extreme cases, such as left field in Fenway.)
A. Shortstop
Pos Yrs Last              UZR  DRA   ZR  DFT  Note
6       Aurilia             1   -9    5    5
6   3   Cabrera            11   13    5   13
6       Clayton            15   -4    4    5  check dra
6       Cruz               -5  -10    3   -4
6       Furcal              7    5   -5   -4
6       Garciaparra         9   12   -3   -7
6       Gonzalez, A.        2    2    3   -6
6   3   Gonzalez, A.S.      8    8    5   10
6       Guillen             3   -3    1  -11
6       Guzman            -16   -2  -12   -5  check dra
6       Hernandez          16   10    8    7
6       Jeter             -25  -22  -19  -20
6       Ordonez            -1   -1    4    6
6   3   Renteria            7    7    5   -3
6   3   Rodriguez           9   -5   12   10
6   3   Rollins           -11    3    2   -1  check dra
6   3   Tejada             -1    6   -8  -11
6       Vizquel             8   -4   -4    9
6       Wilson             -8    3   -1    2

Avg                         1    0    0    0
Std                        11    9    7    9
Corr w/ UZR              1.00 0.58 0.69 0.62
UZR rates Royce Clayton as one of the two best shortstops of the period. He is never mentioned in 2001-03 DM Gold Glove reviews. The negative inference from just that data source is that he was either average or below. DM specifically says in its team evaluation in 1999 and 2000 that Royce was average. DRA, DFT and ZR are all closer to this evaluation.
Guzman is trickier to evaluate. He is the worst shortstop under UZR (other than every sabermetrician’s “favorite” fielder, Jeter). He’s not mentioned in the 2001-03 Gold Glove essays, but all we can conclude from that is that he was average or below. DM says nothing about Guzman in 2000, but in 1999 they describe him as “spectacular” but “inconsistent,” due to his “botching” the easy plays and making too many errors. His error rates actually went down during the 2001-03 period, and his stolen base data doesn’t suggest a decrease in his speed. As reported by Raindrops, PMR gives him a 2003 per-162-game rating of -6; that’s a little difficult to interpret, because it includes fielding fly balls, which UZR, ZR, DM (and DRA) ignore. (I don’t know about DFT.) Guzman’s 2004 PMR ground ball fielding rating (which is what UZR and DRA focus on) is only -5 plays, or -3 runs, in a season in which PMR shows league-average shortstops underperforming multi-year baseline data, so Guzman was average or perhaps even slightly above-average in 2004. ZR probably rates him low because the ZR “zones” exclude a lot of the field, so that a low-range, sure-handed fielder will look better than a rangier fielder who “botches” the routine plays. I believe the evidence overall suggests that Guzman was probably a little below average in 2001-03, but not a downright poor fielder.
The UZR A-Rod rating is more accurate than the DRA rating. Without going into all the detail, DM consistently identifies A-Rod as someone with average-to-slightly-better-than-average range and good hands. (He does have notably good error rates.) My best guess as to how DM evaluates range and surehandedness is that range includes reaching a batted ball, even if you drop it for an error. Thus, a fielder such as A-Rod with slightly better than average range and great hands will make several plays more than average overall. This is reflected in his high ZR. Solid, sure-handed fielders have higher relative ZR than UZR (or DRA).
Jimmy Rollins is never mentioned by DM, but has a +3 runs-per-162-game PMR rating, and is -4 plays under PMR in 2004 (separate ground ball rating unavailable). ZR rates him +2 for his Full Seasons in 2001-02. I think the non-UZR consensus is closer to DRA than UZR.
I have a theory that the UZR mismatches in the infield stem from how UZR accounts for errors. UZR tracks errors and values them separately from plays made. Tangotiger reports on his website that allowing a runner to reach base on an error is basically no worse than not making the play at all, so errors should simply be treated the same as plays not made. My best guess is that UZR somehow gives errors too much weight, so that sure-handed infielders rate higher and error-prone infielders rate lower. Recall the +39 Rey Ordonez rating for 1999, a season in which he made only four errors.
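The effect of charging errors separately can be illustrated with a small hypothetical sketch. It is not UZR’s actual formula, which the text does not give: the two fielders, the league rates and the extra per-error charge are all invented for illustration, and the 0.75 runs-per-play conversion is only an assumption loosely based on the “+40 plays, or about +30 runs” equivalence used for Rolen in the third-base discussion.

```python
from dataclasses import dataclass

# Assumption: about 0.75 runs per marginal play (roughly +40 plays ~ +30 runs).
RUNS_PER_PLAY = 0.75


@dataclass
class Fielder:
    name: str
    opportunities: int  # hypothetical zone-identified chances
    outs: int           # chances converted into outs
    errors: int         # chances reached but botched (part of the non-outs)


def runs_errors_as_misses(f: Fielder, league_out_rate: float) -> float:
    """Treat an error exactly like any other play not made."""
    expected_outs = f.opportunities * league_out_rate
    return (f.outs - expected_outs) * RUNS_PER_PLAY


def runs_with_error_charge(f: Fielder, league_out_rate: float,
                           league_error_rate: float, charge_per_error: float) -> float:
    """Same baseline, plus a hypothetical extra charge (or credit) for errors
    above (or below) the league rate, i.e. errors counted a second time."""
    base = runs_errors_as_misses(f, league_out_rate)
    excess_errors = f.errors - f.opportunities * league_error_rate
    return base - excess_errors * charge_per_error


# Two hypothetical infielders who convert exactly the same number of chances.
sure_handed = Fielder("sure-handed", opportunities=400, outs=360, errors=4)
error_prone = Fielder("error-prone", opportunities=400, outs=360, errors=16)

for f in (sure_handed, error_prone):
    plain = runs_errors_as_misses(f, league_out_rate=0.88)
    charged = runs_with_error_charge(f, 0.88, league_error_rate=0.025, charge_per_error=0.5)
    print(f"{f.name:12s} errors-as-misses: {plain:+.1f}  with error charge: {charged:+.1f}")
```

Treating errors as plays not made, both fielders come out at +6.0 runs, because they made the same number of outs. Once errors carry an extra charge, the sure-handed fielder rises to +9.0 and the error-prone one falls to +3.0: the kind of spread the theory above attributes to UZR’s infield ratings.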
If you delete Clayton, Guzman and Rollins from the sample, you get the following summary results at shortstop (edited sample size of 16 players):
Shortstop (16)        UZR   DRA   ZR    DFT
Avg                   2     1     1     0
Std                   10    9     7     9
corr w/UZR            1.00  0.75  0.69  0.65
DFT does well, in some ways better than ZR: its correlation with UZR is a bit lower (0.65 vs. 0.69), but its standard deviation (9) is closer to UZR’s (10).
B. Second Base
Pos  Yrs  Player        UZR   DRA   ZR    DFT   Note
4         Alomar        -13   -13   -9    2
4    3    Anderson      -4    2     8     -9
4         Biggio        -1    -18   -13   -14   check dra
4    3    Boone         14    9     4     7
4    3    Castillo      0     1     6     -2
4         Grudzielanek  11    4     5     -2
4    3    Kennedy       21    15    13    11
4         Kent          7     4     2     12
4         Rivas         -20   -22   -14   -19
4    3    Soriano       -4    -15   -8    -11
4         Vidro         1     0     -5    -5
4         Vina          4     -10   4     -6
4         Walker        -9    -9    -5    -12
4         Young         -11   -8    5     -3
Avg                     0     -4    0     -4
Std                     11    11    8     9
corr w/UZR              1.00  0.83  0.66  0.71
The decision to delete Biggio is difficult. Let’s look at the complete 2001 DM analysis regarding Biggio:
This former Gold Glover missed the last two months of the 2000 season with a knee injury that required surgery. In January, his general manager warned that Biggio’s range and baserunning ability would most likely be limited, especially early in the year. Those comments proved to be accurate, as Biggio’s range was far below its previous level and he stole only seven bases, down from 50 only three years ago. His baserunning instincts are still good, so he was a little above average in that regard, but nowhere near the Excellent level he sustained before he hurt his knee. DM, “2001 Gold Glove Review.”
I’m not certain whether “Excellent” refers to his baserunning or his fielding, but grammatically it refers to baserunning, and DM provides such ratings (“Excellent,” “Very Good,” etc.) for baserunning separately from hitting and fielding. DM says Biggio had a “classic year” either in 1999 or 2000, without mentioning fielding in particular, so I have no particular DM information about what Biggio’s “previous [pre-2001] level” of fielding performance was.
Just for the sake of argument, let’s assume DM rated Biggio’s fielding as “Excellent” at one point during his career. One of the difficulties of relying on DM is that they seem to think overall fielder ratings should balance range against sure-handedness; one of their essays asks whether one should prefer ranginess or sure-handedness. No balancing act is appropriate: you simply need to look at the number of plays made given zone-identified opportunities. When I look to DM for evidence about UZR, I focus on the range information and then adjust up or down for the (usually) relatively tiny variance in error rates. Biggio always had low error rates at second, so perhaps an Excellent rating could reflect DM’s tendency to value sure-handedness for its own sake.
Anyway, my November 2003 DRA article has a long discussion about Biggio, who has poor DRA ratings during his early seasons after making the historic (first ever?) shift from catcher to second base, eventually has a good DRA rating or two, and then has terrible DRA ratings after his injury in 2000. I also quote Biggio’s admission that switching from catcher to second was the “hardest thing” he ever did in his life. My take is that with his speed (and perhaps agility) gone, he could no longer field second base effectively. Even his ZR numbers (which, remember, overemphasize sure-handedness and underemphasize range) are quite low. Deleting Biggio from the sample helps DRA, but also ZR and DFT.
DM identifies Vina in 2001 as the best National League second baseman, with “above average” range. In 2002 they say his range declined “noticeably” to “near average.” That suggests roughly +10 to +15 runs in 2001 and +5 in 2002, which would average out to somewhere between +5 and +10. UZR is clearly right here.
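The multi-season estimates in this section (Vina here, and Rolen and Finley later on) are simple averages of per-season judgments, each of which is a range. A small sketch of that arithmetic, assuming every season is weighted equally; the text does not say how uneven or partial seasons would be weighted.

```python
def average_range(seasons: list[tuple[float, float]]) -> tuple[float, float]:
    """Average per-season (low, high) run estimates into a multi-season range,
    weighting every season equally (an assumption, not the author's stated rule)."""
    n = len(seasons)
    low = sum(lo for lo, _ in seasons) / n
    high = sum(hi for _, hi in seasons) / n
    return low, high


# Vina: +10 to +15 in 2001, about +5 in 2002 (a point estimate is a zero-width range).
print(average_range([(10, 15), (5, 5)]))
```

The result, +7.5 to +10, falls inside the “between +5 and +10” range given above. Applied to Finley’s estimated -15, +10 and 0, the same function gives about -1.7 runs, consistent with “between -5 and 0.”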
Second base summary results, with Biggio deleted.
Second Base (13)      UZR   DRA   ZR    DFT
Avg                   0     -3    1     -3
Std                   11    11    8     9
corr w/UZR            1.00  0.88  0.73  0.74
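The Avg, Std and corr w/UZR rows in these summary tables can be recomputed from the player tables. Here is a minimal sketch using the second-base ratings above with Biggio removed. Because the table entries are rounded, recomputed figures can differ slightly from the published summary (the recomputed ZR average is about 0.5 against a published 1, for example). The text also does not say which standard-deviation convention it uses; this sketch assumes the sample form (`statistics.stdev`).

```python
import math
import statistics

# Second-base ratings from the table above, Biggio removed: (UZR, DRA, ZR, DFT).
SECOND_BASE = {
    "Alomar":       (-13, -13, -9, 2),
    "Anderson":     (-4, 2, 8, -9),
    "Boone":        (14, 9, 4, 7),
    "Castillo":     (0, 1, 6, -2),
    "Grudzielanek": (11, 4, 5, -2),
    "Kennedy":      (21, 15, 13, 11),
    "Kent":         (7, 4, 2, 12),
    "Rivas":        (-20, -22, -14, -19),
    "Soriano":      (-4, -15, -8, -11),
    "Vidro":        (1, 0, -5, -5),
    "Vina":         (4, -10, 4, -6),
    "Walker":       (-9, -9, -5, -12),
    "Young":        (-11, -8, 5, -3),
}
SYSTEMS = ("UZR", "DRA", "ZR", "DFT")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def summarize(table):
    """Return {system: (avg, std, corr with UZR)} for a player -> ratings table."""
    columns = dict(zip(SYSTEMS, zip(*table.values())))  # transpose rows into columns
    uzr = columns["UZR"]
    return {
        name: (statistics.fmean(col), statistics.stdev(col), pearson(uzr, col))
        for name, col in columns.items()
    }


for name, (avg, std, corr) in summarize(SECOND_BASE).items():
    print(f"{name:4s} avg {avg:+5.1f}  std {std:5.1f}  corr w/UZR {corr:.2f}")
```

Run on these thirteen players, the sketch reproduces the published 0.88 DRA correlation; the other figures may be off by a unit or so because of rounding in the table.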
C. Third Base
Pos  Yrs  Player        UZR   DRA   ZR    DFT   Note
5         Alfonzo       -3    2     2     0
5         Batista       -3    3     -4    10
5         Bell          28    10    9     7     check dra
5         Beltre        16    1     5     -10   check
5         Castilla      1     -4    2     -8
5    3    Chavez        17    12    8     12
5         Cirillo       25    9     9     12    check dra
5         Glaus         -10   -2    -1    -4
5    3    Koskie        11    9     7     11
5         Lowell        -6    -3    -2    14
5         Ramirez       -25   0     -5    -9    check dra
5         Rolen         18    13    6     10
5         Ventura       19    10    7     3
Avg                     7     5     3     4
Std                     15    6     5     9
corr w/UZR              1.00  0.77  0.94  0.43
During his Full Seasons in 2001 and 2002, David Bell was probably about as good as Scott Rolen was, on average, during his 2001 and 2003 Full Seasons: about +15 runs per season. He was not 10 to 15 runs better than Rolen. DM says Rolen was “amazing” in 2001, and elsewhere says he was approximately +40 plays, or about +30 runs, which I would consider the absolute upper bound for single-season performance at third. DM notes that Rolen fell to just about average performance in 2003. That implies an average rating of about +15; UZR gives him +18. DM never describes Bell in those terms, but does say he was second or third best in the American League in 2001 and second best in 2002. He’s clearly a +10 to +15 guy.
A similar problem occurs with Cirillo. DM says he was the second or third best in the National League in 2001; UZR gives him a +35 runs saved rating. DM doesn’t mention Cirillo at all in the 2002 Gold Glove essay. Assuming he was average then, he looks to be about +5, maybe +10 on average over those two seasons.
UZR is probably right about Beltre, and his rating is the only infield DRA rating I’m really unhappy about. DM does not mention Beltre in the 2002 Gold Glove essay, but it does consider him for its “Gold Glove” for 2003 due to his “good” range. His fielding drew the “very good” and “excellent” adjectives in 1999 and 2000.
Ramirez is probably not a good fielder, but he is almost certainly not the disaster that UZR says he is. There is no DM commentary, but his 2003 PMR (at a 162-game pace) is only -6 runs. His ZR over his 2001-02 Full Seasons is -5. His provisional 2004 PMR is about +4 runs, but that is against a base that seems to rate most third basemen in 2004 as above average against a multi-year baseline.
Deleting Bell, Cirillo and Ramirez, here are the summary results at third:
Third Base (10)       UZR   DRA   ZR    DFT
Avg                   6     4     3     4
Std                   11    6     4     9
corr w/UZR            1.00  0.80  0.89  0.11
DRA’s standard deviation is a little low, but ZR’s is even smaller.
D. Center Field
Pos  Yrs  Player        UZR   DRA   ZR    DFT   Note
8    3    Beltran       6     6     8     9
8    3    Cameron       28    24    10    12
8         Damon         13    -1    5     4     check
8         Edmonds       -4    5     4     10
8         Erstad        42    36    13    20
8    3    Finley        -16   -3    -3    2     check dra
8    3    Hunter        8     0     4     3
8    3    Jones         15    24    -1    17
8    3    Pierre        -1    -4    2     -2    pk
8         Wells         -2    -14   5     0     check
8         Williams      -20   -11   -14   -11
8         Wilson        -4    -26   -9    -6    check pk
Avg                     5     3     2     5
Std                     17    18    8     9
corr w/UZR              1.00  0.81  0.78  0.81
DM describes Damon as the fourth-best American League centerfielder in 2002, and in 2003 as “close” to being as good as Torii Hunter, whom it describes as one of the best centerfielders in either league. UZR is right; DRA isn’t.
UZR gives Finley a -40 runs allowed rating in 2003. Minus forty runs. Though DM concedes that “age caught up with” Finley in 2003, it gives no indication that he was anything close to being that bad. DM rates his range “above average” in 2002, ignores him in its 2001 Gold Glove essay, and says he was “basically average” in 1999-2000. PMR gives him below-average ratings in 2003 (-13 runs per 162 games) and approximately -13 runs as well in 2004. My interpretation of all of this information is that Finley was probably about -15 runs in 2003, perhaps +10 runs in 2002 and +0 in 2001, so between -5 and 0 for 2001-03. I believe that DRA, ZR and DFT are closer to DM’s perception.
The Pierre and Wilson ratings are shown, but I will delete both of them because of the Denver park effect described above. DM doesn’t discuss Wells in 2002, but says he was “close” to being as good as Hunter in 2003, so DRA is clearly wrong.
Erstad’s UZR and DRA ratings are quite close, so Erstad is not flagged under the “AED” test, but it’s a fair question whether both ratings are so screwy that they should be deleted. DM describes Erstad as its “Gold Glove” centerfielder in 2001 and “Defensive Player of the Year” in 2002. Furthermore, based on the DM quotation in Part II.C referred to in the Rey Ordonez discussion, Erstad may in fact have made close to 60 marginal plays in 2002. Arguably that would result in a +30 average runs-saved rating over two years. Since DRA isn’t too far from that, I left it in.
My best guess for the sometimes extreme UZR ratings in the outfield is that the park effects used under UZR may be excessive.
Deleting Finley, Pierre and Wilson, we get the following summary results:
Center Field (9)      UZR   DRA   ZR    DFT
Avg                   10    8     4     7
Std                   18    17    8     9
corr w/UZR            1.00  0.86  0.78  0.83
E. Corner Outfielders
Left field and right field are combined in this chart, because the sample of players who played full-time at just one of those positions is so small.
Pos  Yrs  Player        UZR   DRA   ZR    DFT   Note
9    3    Abreu         -7    -4    5     -6
9         Burnitz       -15   0     -4    1     check dra
9    3    Green         -13   9     5     6     check
9         Guerrero      16    0     4     -6    check
9    3    Ordonez       -7    1     5     2
9    3    Sosa          -6    -2    -1    -4
9    3    Suzuki        7     15    -2    12
7    3    Anderson      -2    4     6     6
7         Bonds         -8    -6    -7    -4
7    3    Burrell       -11   -12   2     0
7    3    Gonzalez      11    3     5     5
7         Jones, C.     2     -5    2     -6
7         Jones, J.     14    12    9     5
7    3    Lee           4     0     10    -3
Avg                     -1    1     3     1
Std                     10    7     5     6
corr w/UZR              1.00  0.42  0.45  0.12
Once again, right field gives DRA (but also ZR and DFT) trouble. Shawn Green is the only player, in either the 1999-2001 or the 2001-03 DRA test, whom DRA rates as clearly above average but who was pretty clearly below average. DM says there is a slight park effect favoring outfielders at Dodger Stadium, but I don’t think that can explain the error. All other “misses” generally involve failing to identify a good fielder by rating him only average. (No good fielder is rated poorly by DRA in either test.) Green actually won a “real” Gold Glove in 1999 without DM second-guessing it, yet (A) DM never mentions Green in its 2001-03 Gold Glove essays, (B) his PMR per-162-game rating for 2003 is -23 runs, and (C) the Dodgers moved him to first base in 2004. UZR is probably right about this one.
According to DM, Guerrero had “great range” in 2001 but wasn’t mentioned thereafter, as injuries reduced his range. Even so, the UZR rating is closer to the mark than the DRA rating.
Jeromy Burnitz is another tricky case, similar to Biggio, whose deletion from the sample someone could take issue with. DM has nothing to say about him. Raindrops does not provide a PMR rating for 2003. But Burnitz’ team moved him to center for about half his playing time in 2004. He was well below average there (probably about -10 runs for less than half a season’s worth of play) and about -7 runs in right field (also part time), but both those outcomes may be due to the unusual Denver park effect and are in any event derived from samples far too small to draw any conclusions from. Burnitz maintained average or well-above-average Range Factors (ugh) during every one of his five Full Seasons in right field (including four Full Seasons with Milwaukee and one Full Season with the New York Mets), and actually played centerfield for 20 or more games for three seasons with three teams (Milwaukee, LA and Denver). That doesn’t sound like a right fielder who costs his team 15 runs per season.
Deleting Burnitz (and leaving the misses in):
Corner Outfield (13)  UZR   DRA   ZR    DFT
Avg                   0     1     3     1
Std                   10    7     4     6
corr w/UZR            1.00  0.44  0.35  0.13
DRA doesn’t do too well, but the problem is limited to right field, and in any case DRA clearly outperforms ZR and DFT. By the way, the UZR ratings in the outfield do not include “Arm Ratings.” I thought DFT ratings might include the effect of outfielder assists, but the UZR/DFT correlation did not improve when I added Arm Ratings to UZR.
F. First Base
Pos  Yrs  Player        UZR   DRA   ZR    DFT   Note
3    3    Bagwell       7     0     -6    -5
3         Casey         5     -11   14    -3    check
3    3    Delgado       -2    1     -1    -3
3    3    Helton        22    11    7     16
3         Konerko       -10   -6    -3    -11
3    3    Lee, D.       9     8     12    7
3    3    Lee, T.       11    8     13    8
3         Martinez      14    5     8     9
3    3    Mientkiewicz  10    6     7     12
3    3    Olerud        0     8     2     2
3    3    Sexson        -5    9     1     15    check
3         Thome         -14   -10   5     -6
3         Young         10    0     12    5
Avg                     4     2     6     4
Std                     10    7     6     8
corr w/UZR              1.00  0.56  0.50  0.64
Not much to say here, except that DM, PMR and ZR are clearly more consistent with UZR than DRA in evaluating Casey and Sexson. As mentioned before, the problem is that we can’t know how many ground balls are fielded by first basemen. In the November 2003 article I argued that Bill James was wrong, and that estimating unassisted putouts by first basemen is not worth the effort, because too many pop-ups and short fly outs pollute the total. I’ve since come around to Bill’s point of view and adapted his methods to DRA.
References & Resources
I’d like to thank Dick Cramer for his support in the past, Mitchel Lichtman for creating UZR, and the baseball analyst Tangotiger for making detailed UZR output available in a convenient form. I’d especially like to thank the folks at Retrosheet:
The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711.
There is one more absolutely necessary acknowledgement: my own fallibility. In creating DRA and tracking the results of other fielding systems, I had to do a tremendous amount of cutting and pasting and hand-coding of data. I have done my best, but I’m sure there are some errors, though I don’t believe any of them are significant.
I look forward to hearing from you. Don’t hesitate to e-mail with questions, criticisms and corrections.