Measuring defense for players back to 1956 (Part 2)

In January, I introduced the TotalZone system. Using data available from Retrosheet, I was able to create quasi-play-by-play defensive ratings for all players from 1956 on.

To measure a player’s defensive performance, you need to know how many plays he made, how many opportunities he had, and something about the context. Traditional statistics already estimate the number of plays made: putouts and assists do that. Retrosheet data allow better estimates, particularly for infielders. They also improve on traditional defensive statistics in estimating a player’s opportunities, as I showed in Part 1.

What’s New

I received great feedback on the article, and some suggestions on how to improve it. Here are some that have been incorporated:

1. The TotalZone stat did not account for double plays. After revision it still doesn’t, because I see that as a separate skill. Double plays are an important part of infield defense, and I have created a separate estimate of infielder double play runs to supplement the TotalZone numbers.

2. In the initial dataset, hits were charged based on batter out percentages with a groundball/flyball adjustment for the pitcher. Pitcher handedness was not considered, but it should be. This is especially important for estimating hit distribution for switch hitters. When he was batting left handed, 17 percent of Mickey Mantle’s outs were made by the first baseman. When he was batting right handed, less than 1 percent of his outs went to first. Pitcher handedness is now a consideration for the system; i.e., Mantle vs. left and Mantle vs. right are considered two different hitters when I estimate where a hit was likely to land.
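As a rough illustration of the handedness split described in item 2 (a sketch under stated assumptions, not the actual TotalZone code), the snippet below keeps a separate out distribution for each batter against left-handed and right-handed pitchers, then charges a hit fractionally using the matching split. The function names, the input layout and the counts are all hypothetical; the counts are made up to resemble the Mantle example.

```python
from collections import Counter, defaultdict


def out_shares_by_split(out_events):
    """Return {(batter, pitcher_hand): {position: share of outs}}.

    out_events is an iterable of (batter, pitcher_hand, fielder_position)
    tuples, one per out. This input layout is hypothetical.
    """
    counts = defaultdict(Counter)
    for batter, pitcher_hand, position in out_events:
        counts[(batter, pitcher_hand)][position] += 1

    shares = {}
    for split, by_position in counts.items():
        total = sum(by_position.values())
        shares[split] = {pos: n / total for pos, n in by_position.items()}
    return shares


def charge_hit(shares, batter, pitcher_hand):
    """Split one hit among fielders using the matching split's out shares."""
    return dict(shares[(batter, pitcher_hand)])


# Made-up counts shaped like the Mantle example: batting left-handed
# (against right-handed pitchers) 17 percent of outs go to first base;
# batting right-handed (against left-handers) fewer than 1 percent do.
events = (
    [("mantle", "R", "1B")] * 17 + [("mantle", "R", "other")] * 83
    + [("mantle", "L", "1B")] * 1 + [("mantle", "L", "other")] * 199
)
shares = out_shares_by_split(events)
print(charge_hit(shares, "mantle", "R")["1B"])  # share charged to 1B vs. RHP
print(charge_hit(shares, "mantle", "L")["1B"])  # share charged to 1B vs. LHP
```

Keying the distribution on the (batter, pitcher hand) pair is what makes a switch hitter behave like two different hitters when a hit is charged.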

3. I have added data for 1987 to 2007. My original plan was to publish data only for the years for which we are missing detailed play-by-play data, because if you can use detailed play-by-play data, you should. I still agree with that, but for the sake of continuity I’ll calculate the results for everybody using the same system.

4. In the first article, Carl Yastrzemski’s rating was very high, and while he was a great fielder who won multiple Gold Gloves, it was suggested that the park factor may be overly generous for that field. In all of baseball’s defensive park adjustments, nothing stands out like that Fenway Park wall. In fact, if the Green Monster were torn down tomorrow (especially in light of the recent humidor-influenced seasons in Coors), we probably could get away with not park adjusting any defensive stats.

What I did find was that my Fenway adjustments were indeed too generous because I forgot to do something. I didn’t account for the road park factor—the Red Sox were playing in road parks that were much easier than average, because none of the road parks included Fenway. This has been corrected, and while Yaz still rates as a great left fielder, his rating is not quite as high.

In addition, I have added detail to the park factors. In the first data run, they could have been more accurately described as “franchise factors,” which is okay for a team that played in one park for 1956-1986, but I made sure to start a new park factor every time a team moved to a new park. I also considered Yankee Stadium as a new park after the mid-1970s remodeling, though I did not account for every time a team changed some detail of its ballpark. That would be more work than I can handle, and would greatly reduce my sample size on multi-year park factors. The Yankee Stadium situation was extreme enough (greatly reduced Death Valley) for me to consider it a new park.

5. Retrosheet is missing data for some years, mostly in the National League in 1969 and 1970. The missing games have a disproportionate effect on just a few teams, such as the Braves, Pirates and Astros. I did not properly account for these plays, as I was not giving fielders any credit for making outs (hard to do when you don’t know who fielded it) but still charging them for the hits, as I had the pitcher and batter information.

I decided the easiest way to deal with the data was to assign fractional outs to the fielders for those games with missing plays. Fielders will be treated as essentially average for those games, doing no harm to their overall record. In the first run, Hall of Merit outfielder Jimmy Wynn is shown with a -26 rating for 1969. After the fix, he’s at -9 for that year.

6. I tabulated the data and league averages for infielders separately for plays with a runner on first and plays with first base unoccupied, because this makes a difference in positioning. The difference is most notable at first base, since the fielder must hold the runner on. It also has an impact on middle infielders, as they play closer to second base, in double play position. I doubt that it impacts third basemen or outfielders much, but the same split data will be available for all positions anyway. You can decide for yourself if it’s meaningful.

7. I had the Jim Palmer problem. The Orioles of the late 1960s and early 1970s had incredible defensive numbers. The teams allowed far fewer hits on balls in play than an average team. If we give all this credit to Brooks Robinson, Paul Blair, Mark Belanger and Bobby Grich, then we have to take some credit away from Palmer. Could it be that it was the other way around, that Palmer’s great pitching made those fielders look better by getting batters to hit into easy outs?

It’s not an easy problem to solve, but I made a stab at it. I looked at how many balls in play a pitcher allowed, relative to his teammates, over his career, and regressed this amount by 50 percent when a pitcher has 3,550 balls in play allowed. I regress more for pitchers with fewer balls in play, and less for those with more.

Palmer was indeed a skilled pitcher, and after regression, he should be expected to allow only 95.4 percent as many hits on balls in play as the average pitcher, assuming an average defense behind him. This is an outstanding figure, one of the best of all time, and nearly on the level of the career knuckleballers. To adjust for this, when I charge the fractional hits to each fielder, for every hit Palmer allows, 1/.954 or 1.048 hits are charged and split among his fielders. For a pitcher who is the equivalent of batting practice, fewer than 1.00 hits will be charged to the fielders every time he gives up a hit.
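The arithmetic in the Palmer adjustment can be sketched as follows. The article gives only one point on the regression curve (50 percent at 3,550 balls in play, more regression below that, less above), so the weight BIP / (BIP + 3,550) used here is an assumption: it is a common shrinkage form that happens to match that description, not a formula confirmed by the text. The function names are hypothetical.

```python
PRIOR_BIP = 3550  # balls in play at which the article regresses 50 percent


def regressed_ratio(observed_ratio, balls_in_play, prior_bip=PRIOR_BIP):
    """Shrink a pitcher's hits-on-balls-in-play ratio toward 1.0 (average).

    ASSUMPTION: weight = BIP / (BIP + prior_bip). It equals 0.5 at 3,550
    balls in play and regresses more heavily for smaller samples.
    """
    weight = balls_in_play / (balls_in_play + prior_bip)
    return 1.0 + weight * (observed_ratio - 1.0)


def hits_charged_per_hit(skill_ratio):
    """Hits charged to the fielders for each hit the pitcher allows."""
    return 1.0 / skill_ratio


# Palmer's regressed figure from the article is 0.954.
print(round(hits_charged_per_hit(0.954), 3))  # 1.048

# A hypothetical pitcher 10 percent better than average over exactly
# 3,550 balls in play is regressed halfway back toward average.
print(regressed_ratio(0.90, 3550))
```

A ratio above 1.00 (a hittable pitcher) gives a charge below one hit per hit allowed, which is the batting practice case described above.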

8. Another suggestion was to look at the assumption that hits are distributed by batters in a similar pattern to their out distribution. How much the estimates are off can be measured by looking at seasons with complete hit location data (1993-1998).

We know that ground balls turn into hits at a greater rate than fly balls, so my hit estimates in this system will be lower than the actual number that went through the infield. It’s not a big deal, though, because the underestimate applies to all infielders—their relative position to the league average would not change.

A bigger deal is that fly balls that are pulled are much more likely to become hits than those hit to the opposite field. I tried to make an adjustment by reducing the hit percentage charged relative to outs to the opposite field, with an increase in hit charge percentage for pulled balls.

A simple example: If a right-handed batter hit 50 percent of his outfield outs to left and 50 percent to right, I would charge 60 percent of hits to left and 40 percent to right. Those numbers are merely for illustration, and vary from batter to batter. I scrapped the idea because it did not improve the correlations with more advanced fielding measures (and in fact lowered them), and a few of the results did not pass the smell test. In the end I’m using the same estimated hit distribution as in the original data. It may be possible to improve on the estimations, but I wasn’t able to do it.
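For readers who want to try the pull adjustment themselves, here is a minimal sketch of the 50/50 to 60/40 example. It is one way to get those numbers, not necessarily how the scrapped adjustment was computed: each field's out share is multiplied by a weight and the results are renormalized. The multipliers 1.2 and 0.8 are hypothetical values chosen only to reproduce the illustration.

```python
def reweight_hit_shares(out_shares, multipliers):
    """Scale each field's out share by a multiplier, then renormalize.

    Both arguments are dicts keyed by field. The multipliers are
    hypothetical; values above 1.0 shift hits toward that field.
    """
    weighted = {
        field: share * multipliers[field]
        for field, share in out_shares.items()
    }
    total = sum(weighted.values())
    return {field: value / total for field, value in weighted.items()}


# Right-handed batter: left field is the pull side.
out_shares = {"LF": 0.5, "RF": 0.5}
multipliers = {"LF": 1.2, "RF": 0.8}  # illustrative only
hit_shares = reweight_hit_shares(out_shares, multipliers)
print(hit_shares)  # roughly 60 percent to left, 40 percent to right
```

The renormalization step matters for batters whose outs are not split evenly, since the weighted shares would otherwise no longer add up to one hit.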

Does it improve the accuracy?

Sadly, I can’t find any evidence that it does, when testing against 2003-2006 data. The correlations to UZR improve slightly at three positions, and decline slightly at four. I get the same result for correlation with the Fans’ Scouting Report. While I can’t prove statistically that this is any better or worse than the initial dataset, the adjustments I’ve made seem logical and reasonable, so I prefer to use the model that accounts for more information.

At this point I don’t think any more fine-tuning is going to make enough of a difference to justify the time it takes to run this, although perhaps a new approach that I haven’t thought of would give more accurate results.

A bonus: double plays

The double play ratings are much simpler; there are no complex guesstimations of where hits went. Double play opportunities are ground balls, fielded by an infielder, with a runner on first and less than two out. I am not concerned with ground ball hits in these situations, as I’m not trying to re-measure range here, only double play turning.

I take the number of double plays turned in these situations and compare it to an average that is based on which infielder fielded the ball, the year, and the handedness of the batter.

The run value I used is 0.44, which is similar to the value of a caught stealing. An extra out is recorded, and a runner is taken off base. This is the value of a double play above and beyond the value of getting one out. Credit for the double play is shared equally between the player who starts it and the one who turns it. Middle infielders split credit on double plays started by the second baseman or shortstop. The third baseman and second baseman split credit on balls hit to third, and the shortstop and first baseman split credit when the first baseman starts the double play.
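The double play calculation described above can be sketched in a few lines. The 0.44 run value and the equal split between starter and pivot come from the text; the function names, the bucket layout and every number in the example are hypothetical, and the league rates are invented for illustration.

```python
DP_RUN_VALUE = 0.44  # runs per double play beyond getting one out


def expected_double_plays(opportunities, league_rates):
    """Sum league-average double plays over each opportunity bucket.

    Both dicts are keyed by (fielder_position, year, batter_hand), the
    three factors the average is based on. The layout is hypothetical.
    """
    return sum(league_rates[bucket] * n for bucket, n in opportunities.items())


def double_play_runs(actual_dps, opportunities, league_rates):
    """Runs above or below average from turning double plays."""
    expected = expected_double_plays(opportunities, league_rates)
    return (actual_dps - expected) * DP_RUN_VALUE


def split_credit(runs, starter, pivot):
    """Share the runs equally between the starter and the pivot man."""
    return {starter: runs / 2, pivot: runs / 2}


# Invented example: grounders fielded by a shortstop in double play
# situations, split by batter handedness.
opportunities = {("SS", 1975, "R"): 60, ("SS", 1975, "L"): 40}
league_rates = {("SS", 1975, "R"): 0.50, ("SS", 1975, "L"): 0.40}

runs = double_play_runs(52, opportunities, league_rates)
print(runs)  # 52 actual vs. 46 expected, times 0.44
print(split_credit(runs, "shortstop", "second baseman"))
```

In a full implementation the credit would be split play by play, since the pivot man changes with who starts the double play; the single split here is a simplification.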

On to the new results

First base: Keith Hernandez (+110) and George Scott (+85) still top the list. Mark Grace (+70), John Olerud and Todd Helton (+69) are recent additions. Mo Vaughn (-69) takes over as the worst defensive first baseman in more than 50 years. Dick Stuart is still worse than Vaughn on a per-season level.

Second base: Frank White (+113) still comes in just a bit ahead of Bill Mazeroski (+110). Mazeroski picks up another 40 runs for his double play turning. Juan Samuel had the worst second base rating (-68) and was also 25 runs below average in turning double plays.

Shortstop: Mark Belanger takes a slight hit (see the Palmer problem above) but is still No. 1 (+232) followed by Ozzie Smith (+167) now that we have his whole career. Ozzie’s TotalZone ratings for later in his career are good but not as outstanding as his zone ratings. Luis Aparicio (+143) is mostly unchanged. Derek Jeter (-92) is now at the bottom, followed by Rafael Ramirez (-66). Ozzie picks up another 23 runs from turning and starting double plays, second only to Cal Ripken (+38). Ripken was +54 in TotalZone.

Third base: Brooks Robinson still dominates. His TotalZone drops from +299 to +269, but he also leads all third basemen with a +26 double play rating. The next best are Buddy Bell (+165), Clete Boyer (+149), Terry Pendleton (+138), Mike Schmidt (+136) and Graig Nettles (+136). Scott Rolen (+98) is well on his way toward joining this group. Dean Palmer (-96) takes over as the worst career defensive third baseman.

Left field: Yastrzemski is still the No. 1 player, but the corrected park factors drop him down significantly, from +115 to +81. The next best left fielders are between +79 and +68: Jose Cruz, Willie Wilson, Roy White, Rickey Henderson and Barry Bonds. At the bottom, Gary Matthews Sr. (-86) falls just below Luzinski (-78). Adam Dunn (-53) and Pat Burrell (-55) are working on catching them. Philadelphia obviously views left field as the NL designated hitter spot.

Center field: Blair (+140) still tops the list, followed by Curt Flood (+115) and Willie Mays (+109). Mays might actually be the real leader; we’ll find out as Retrosheet adds back seasons. If Willie was +10-15 per year for 1951, 1954 and 1955 it’s going to be close. Andruw Jones (+102) has joined them in the +100 club. Matty Alou (-85 before) is now -45. I think he was a victim of missing plays. Rick Monday is at -84, but has been surpassed as the greatest center field defensive liability by Ken Griffey Jr. (-94).

Right field: Roberto Clemente leads at +120, and this doesn’t even include his arm rating. In second is Sammy Sosa (+114). These two are followed by Al Kaline (+80), Brian Jordan (+77), Jesse Barfield (+72), Hank Aaron (+68) and Larry Walker (+60). Danny Tartabull (-96) played right field poorly for a few years, and then was moved to a position that better suited his skills, designated hitter.

Sosa was a surprise. I was initially prepared to apologize for the rating. I do not claim to have invented the perfect system. I encourage everyone to use the more precise systems that take full advantage of as much information as possible in rating modern (1987 and later) players. While most of the players at the top of my list were acknowledged to be great defenders, Sammy does not have that reputation.

Sosa, however, was a very underrated defender. From 2000 to 2005, his UZR rating is +19 runs. My estimate has him at +23 for those years. Mitchel Lichtman provided me with Sosa’s UZR ratings per 150 games for 1993 on. They are very close to my figures. Converting the per 150 games figures to actual runs, Sosa is about +69 from 1993 to 1999 in UZR, and +74 in TotalZone! We even get the same defensive peak for Sammy, 1994 to 1996. His UZR for those three years is about +18 per 150 games, and by TotalZone it’s about +20.

More data:

I will provide an updated spreadsheet with all revised ratings from 1956 to 2007, except for 1999, and the double play ratings. In addition, I am working with Sean Forman to add these ratings to his site. When the data are ready you can access these ratings from player pages, and also see the split data, such as home and away fielding. As another bonus, Sean has obtained play-by-play data for the 1999 season, so you will be able to see the ratings from that year on his site as well. Download the spreadsheet here.

References & Resources
The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Road, Newark, Del. 19711.
