On Players’ and Coaches’ Skepticism of Defensive Metrics
 
Kevin Kiermaier is one of MLB’s best defenders, whether he believes the metrics or not. (via Arturo Pardavila III)
Kevin Pillar led major league center fielders during the 2016 season in defensive runs saved (DRS), Ultimate Zone Rating per 150 games (UZR/150), out of zone plays, all the things that are supposed to tell you in 2016 who’s the best in the field. Advanced defensive metrics are the reason that a former 32nd-round draft pick with a .688 career OPS is regarded, rightfully, as a star.
The proliferation of defensive numbers has made it easier to concretely communicate just how good a player like Pillar is, rather than just saying, “yeah, he’s good out there, but he still doesn’t hit enough.” Because of the defensive aspects incorporated in Wins Above Replacement (WAR), we know that Pillar is a three-win player, and quite valuable to a Blue Jays team that’s made back-to-back trips to the American League Championship Series. The numbers love Pillar.
“I don’t love them back,” Pillar says. “I don’t get them. I don’t get any of the defensive stats that they’re throwing out there except for good play. The eye test. I believe in the eye test. I think a good outfielder or a good defender, you can see with your eyes. You don’t need numbers to tell you how good they are.”
It’s not a new concept that professional baseball players would be averse to fancy stats, and if you talk to enough outfielders about DRS and UZR (Ultimate Zone Rating), there are some common phrases that pop up, starting with an assertion that a player’s only responsibility is to contribute on the field.
“They just put me in the lineup and I play,” says Nationals left fielder Jayson Werth. “I do know that the UZR stat’s convoluted because it’s dependent on a person to collect the data. Then, you know, WAR’s affected by UZR as well – one version, depends on where you get it, I guess. So, I don’t put a lot of weight in UZR and I don’t think it’s that accurate of a stat.”
Because the stats are flawed, players will say that they don’t follow the numbers, but then often have specific examples to cite of players who are done wrong by the numbers.
“I’ve seen the defensive statistics that say that Hechy (Adeiny Hechavarria) is not a good shortstop, and that’s just stupid,” says Marlins outfielder Christian Yelich. “You can’t say that and have any sort of credibility at the same time. I think a lot of sabermetrics and a lot of the numbers don’t tell the whole story. You’ve got to watch the game, as well. You can’t just look at a sheet of paper, look at what it says, and say, ‘This guy’s good, that guy’s not good,’ just based on looking at paper.”
Sticking up for teammates is commonplace, but there are also rivals who are viewed as benchmarks at their positions, and it can be jarring to see them rated poorly by difficult-to-grasp metrics.
“I feel like it can be skewed at times, because there are some guys who don’t fare well with all those [numbers] and I know they’re better outfielders than what these certain metrics say they are,” says Kevin Kiermaier of the Rays, who became aware of DRS when it was constantly pointed out to him in 2015 that his total was soaring to a record. “So, some of the stuff, I don’t really buy into too much, and I don’t know how I feel about it, to be honest. Some things can be accurate, but other parts, you say, how is Adam Jones not in the top five center fielders in baseball? I watch him play 19 times a year, and I think he’s a stud. It just doesn’t make me a firm believer in all the research and the data that’s provided.”
Jones is an interesting case because, when asked about the best defensive outfielders he sees, Boston’s Jackie Bradley Jr. – no slouch himself – names the Orioles’ stalwart first. It’s not a case of recency bias, either, because at the time he’s asked the question, it has been nearly a month since the Red Sox last saw Baltimore.
“I get the info on that stuff, and I see some where they might rate so-and-so better than another guy and it shocks me, because we play these guys and I’ll just go, ‘There’s no way in hell that’s the case,’ in my mind,” Toronto manager John Gibbons says. “I don’t know how they factor all of it in. I know there’s some things that they calculate. I know, myself, and the coaching staff, being baseball guys for so long, we’ve got a pretty good idea of who’s good out there and who your better players are. So, we look at it, but take it with a grain of salt, too, sometimes.”
It’s good to know when numbers don’t line up with what “baseball guys” with “a pretty good idea of who’s good out there” are seeing. The rise of metrics does not mean that the eye test is to be disregarded entirely, because just as there are things that human observation can miss, there are things that the numbers can miss. A fielder’s decision-making process, for instance, might not show up in his defensive metrics.
“Ball in the gap with a man on first, should you throw to third to try to get a guy, or try to keep a double play in order?” says Boston’s Mookie Betts. “I don’t know if that kind of thing goes into statistics, but pitchers, people in the game, know that’s the game. You keep the double play in order, and the pitcher’s one pitch away from getting out of an inning. Those type of things are more important than some of the statistical stuff.”
It does, however, go both ways. Take the case of Jones and his four Gold Gloves. Jones tallied a -10 DRS in 2016, and ranked 15th in FanGraphs’ defensive rating among 17 qualifying center fielders. The advanced stats haven’t always been friendly to Jones, which points to an overall conflict between analytics and observation, but the fact that the 2016 numbers showed a dramatic slip should raise a red flag. Jones turned 31 in August, and the suggestion that he’s lost a step would come from not only his fielding numbers slipping, but the fact that he’s basically stopped stealing bases and has, over the past few years, taken far fewer extra bases as a runner.
Jones has been a really good center fielder in his career. Gold Gloves are notoriously unreliable as an indicator of defensive ability, but you don’t win four of them by accident. There comes a point, though, when a player can be established in hearts and minds as something different from what he is becoming or has become in the actual field of play. That’s where statistics can have a kind of utility that an eye test for outfielders will have a harder time catching, because of all the variables that go into outfield play. At the same time, it cuts the other way.
“Here’s where I’m into the numbers – it’s for the guy that’s not so obvious, the guy that hasn’t made his mark yet,” says Cubs manager Joe Maddon. “If you can accumulate enough information about that particular guy, and you can tell me why he’s going to be good in advance of him being good, that’s where I like the number. But I already know who’s good. You know who’s good, he knows who’s good, we all know who’s good. To what extent he’s good, maybe this gives you a little bit greater indicator of that, but a lot of the sabermetric numbers for me are about acquisitions – they’re acquisitional tools that I think unearth the guy that’s been hitting a little bit and hasn’t gotten the big play yet.
“Velocity off the bat is an example of that. A guy’s been hitting into some bad luck, and we take a chance based on the fact that we like this number – although his numbers have not been that good – you can project upon a guy like that.”
Exit velocity for hitters is collected by Statcast, and baseball’s video system has plenty of defensive goodies, too. When the data collected from pinpoint video can be complete and turned into statistics that we don’t yet have, there then may be greater acceptance of defensive metrics among on-field personnel. Kiermaier pointed to a play on which George Springer of the Astros robbed a grand slam but didn’t get four defensive runs saved. There will always be naysayers, but Werth likes that Statcast has baked-in objectivity, while Pillar sees the system grading players for the right things.
“I think Statcast is awesome,” Pillar says. “It’s not a perfect science, but it gives you an idea of a guy’s ability to use instincts and read. Obviously, that’s judged by first-step quickness. Route efficiency is great to tell you how well guys run to the ball. I think those are useful statistics. … I think that’s some of the most important things for being a good outfielder – your ability to get a read, and your ability to run the route. Top speed, obviously, is important, but if I have a quicker first step and run a better route, I’m going to cover more ground than the guy who’s faster. So, those numbers are leading toward better defensive metrics.”
In the meantime, the baseball world works with the numbers it has, because they’re not meaningless. Better defense means more outs, and more outs are good. Defensive runs saved might not be perfect, but six of the top nine teams in the category in 2016 made the playoffs, including the Cubs with 82 – 31 more than AL-leading Houston, a team that missed the postseason by only five games.
So, it’s worth taking seriously. The smartest people and teams in baseball want as much information as they can get, whether it’s spit out by computers or scratched out by pencil on paper. The point is to get as full a picture as possible in order to run your team to the best of its capability.
“There’s obviously some validity to outfield metrics,” says Dodgers manager Dave Roberts, himself a former outfielder. “A little bit of positioning, initial positioning, plays into that, where until you can get exactly the positioning on every ball hit to the outfield, to get the exact kind of metrics. I think it’s a good baseline, but I definitely don’t think it tells the whole story for me.”
Positioning is important enough to Roberts and the Dodgers that on a May trip to New York, there was a controversy about Los Angeles using a laser rangefinder to determine the best spots for outfielders, then marking the field.
“When I come up to hit, guys are standing basically where I hit the ball, my hot spots, and that makes it more difficult to hit,” says Giants center fielder Denard Span. “Defensively, they put you in better positions. … The game has changed. From my rookie year to now, there was no such thing as UZR. Then the WAR started. People weren’t sold on it. Now you see teams swearing by it, and guys are getting paid by their WAR and not their actual numbers like before. I don’t get it, but I don’t get paid to get it.”
Jason Heyward also doesn’t get paid to get it. He gets paid to get the ball, something he did so well that he was a win and a half above replacement level in 2016 despite hitting .230/.306/.325 with a career-low seven home runs.
We know that the numbers say a lot about how high the quality of Heyward’s defense is. What we don’t know is the extent to which a player of Heyward’s caliber influences his teammates’ performances. Dexter Fowler clocked in at -12 DRS as a center fielder in 2015 while primarily playing alongside Jorge Soler and Chris Coghlan. With Soler and Heyward as his primary wingmen in 2016, Fowler was up to +1. While that’s an interesting case, Peter Bourjos went from +4 with the 2014 Cardinals to -7 DRS in 2015, when Heyward played alongside him.
It’s tough to say exactly what the impact level is for each of the variables that swing these numbers from year to year, but what is constant is the idea of trying to get better, however better might be measured.
“Everybody’s talked about Dexter, his numbers being below average,” Maddon said in spring training, before Fowler put up his first positive DRS season since 2010. “I think a lot of that has to do with him playing more shallow, as opposed to deep, as an example. Of course, if you play deeper, he’ll get to more balls over his head, and I think the numbers will come back up, as an example. So, some of that might be positioning. Some of that would be that he feels more comfortable in and we’ve got to get him more deep. I would bet, and I think his numbers did come back up last year as we got him to play more deeply.”
Positioning is a tricky thing, because you’re not going to be able to fulfill the desire expressed by veteran righty Mat Latos: “I would like them to play exactly where the ball’s going to be hit. Normally, that’s where I’d like them to play.” But there’s a give and take to it. A shallow center fielder would figure to save more singles, while a deep position would guard against doubles and triples.
“I’m more of a deep guy, but I’m trying to play more shallow to take away the line-drive base hit, take away the balls that fall in front of us,” says Adam Eaton. “So, I’m trying to make an adjustment there. I think most outfielders are more comfortable keeping everything in front of them, but you want to take away those base hits, help your pitcher out, and if a ball’s hit over your head, you tip your hat.”
You’ve got Eaton coming in, Fowler going out, and both outfielders putting up significantly better defensive numbers in 2016 than they did in 2015 – Eaton in right field after moving over from center, more of a challenge than you might think, because as long-time outfielder Shane Victorino says, “Center field is the easiest read.”
It just goes to show that there’s no set answer on positioning, which by necessity has to factor in the type of contact a team’s pitchers might give up, how much range the infielders have to track down pop-ups, and on and on. It’s not hard to see why baseball has taken so long to even start to quantify defense in a way more advanced than counting plays made and errors, and why it’s not as simple as saying a grand slam-saving catch is worth four defensive runs saved – no matter how intuitive that notion might be.
“I don’t buy any of that crap,” Eaton says. “I think it’s all worthless. I think baseball, you play with your eyes. You’ve played with your eyes for the last 135 years, and now all of a sudden, they want to create some jobs, so we’ve got all these numbers.”
Rest assured, baseball statistics are not made up as a way to create jobs. The point of player evaluation, whether the way it’s been done for more than a century, or in a way that’s newer, is to gain a competitive advantage in building a team.
“You know what?” Fowler says. “This whole metrics stuff is skewed. If anybody asks, just ask the pitcher, ‘Hey, who do you want out there?’ That’s what they need to do, is start asking pitchers.”
Or, ask the outfielders themselves. Says Fowler: “I’m actually really good at it.” Fowler and his now-former manager are not alone in that belief.
“Adam Jones has won consecutive Gold Gloves, before Kevin [Kiermaier] last year, and Kevin Pillar is real good,” Bradley says. “Jacoby [Ellsbury] is really good as well. You have Dexter Fowler who’s also pretty good, Leonys Martin. I could go on and on. The reason why they’re playing is because they play at such high levels.”
The numbers are all over the place with Bradley’s list, but that’s not a bad thing. When you can match superior performance on the eye test to superior stats, you know that someone is indeed playing at such a high level. As far as center fielders go, that’s Pillar and Kiermaier. Not that they care what the numbers say.
“I’m just trying to make plays for my team, my pitchers, and those numbers will be calculated somehow,” Kiermaier says. “I take that information with a grain of salt. I know my defense is elite, and I couldn’t care less what the numbers say about it, to be honest.”
References & Resources
- FanGraphs
- Baseball-Reference

Nice piece. It is a real problem how few non-routine plays outfielders get in a given season, because those few plays make or break their defensive stats.
I like Yelich’s point about robbing a potential GS not being 4 defensive runs saved. A model where the likely runs created by each opportunity (if missed) are summed up and compared to the expected runs saved by the plays actually made would be interesting.
There’s also a hand/eye coordination aspect that gets underrated by metrics which focus mostly on range or player footspeed/acceleration. There was an excellent piece on Odubel Herrera at FG by Eno Sarris yesterday chronicling several plays where he gets there but just can’t get his glove on the ball. Perhaps it is in this aspect that Adam Jones is exceptional but getting underrated by the numbers as measured/intuited.
I’ve gotta say that the “why isn’t it 4 DRS” comment struck me as silly because it is a criticism that shows the guy didn’t understand what the stat is supposed to say. It is comparing to an average fielder’s runs saved. If an average fielder *never* makes the play, then great, have your 4 runs. Otherwise, it won’t be worth 4 runs since some of the time an average outfielder will rob a home run…and there are some home runs that average major leaguers will almost always rob if they are hit high, etc.
It’s more than that. The stat is supposed to be context-independent, which means robbing a hitter of a GS is worth no more than robbing a hitter of a solo shot—just as hitting a GS produces no more offensive value than hitting a solo shot. A fielder doesn’t control how many men are on base when he makes that catch, just as a hitter doesn’t when he hits a HR.
A HR on average is worth about 1.4 runs, a value that varies a little from season to season. If an OF robs a batter of a HR–no matter how many men are on base–he starts with saving 1.4 runs. How much he actually gets credit for is then determined by how likely it’s considered an average fielder would have made that catch.
I should add that “average” actually comes into play twice in this calculation. When it’s said that a HR is worth 1.4 runs, that is runs above average, i.e., runs above what a batter on average would do in a random PA. Then, using that value, one has to determine how often that catch is made on average.
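The arithmetic described in the comments above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1.4-run value for a home run is the commenter's figure, while the function name `catch_credit`, the catch probabilities, and the simple formula (run value times the share of the time an average fielder would not make the play) are hypothetical and are not the actual DRS or UZR implementation.

```python
HR_RUN_VALUE = 1.4  # commenter's approximate figure; varies by season


def catch_credit(run_value, p_average_makes_play):
    """Runs credited for making a play, relative to an average fielder.

    Assumed formula (illustrative only): the fielder gets the run value of
    the hit he prevented, scaled by how often an average fielder would NOT
    have made the same play.
    """
    if not 0.0 <= p_average_makes_play <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return run_value * (1.0 - p_average_makes_play)


# Hypothetical probabilities that an average outfielder makes the catch.
never_caught = catch_credit(HR_RUN_VALUE, 0.0)       # nobody else gets there
sometimes_caught = catch_credit(HR_RUN_VALUE, 0.25)  # a tough but makeable play
usually_caught = catch_credit(HR_RUN_VALUE, 0.9)     # a high, catchable fly

print(round(never_caught, 2), round(sometimes_caught, 2), round(usually_caught, 2))
```

Note that the number of runners on base never enters the function, which is the context-independence the thread describes: a robbed grand slam and a robbed solo shot start from the same run value.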
It seems silly to even assign value based on the offensive value of the HR. Robbing a HR is great, but it isn’t necessarily more impressive than any other OF play that required good read/speed/jump even if just to turn a shallow foul pop-up with nobody on base into an out. Furthermore, some parks (Wrigley) or segments of OFs (Green Monster) completely take away the OF’s ability to rob a HR.
It’s actually a proven fact that the more runners on base the harder it is to catch a hard hit ball above the fence line. 😉
“I don’t get them. I don’t get any of the defensive stats that they’re throwing out there except for good play. The eye test. I believe in the eye test. I think a good outfielder or a good defender, you can see with your eyes. You don’t need numbers to tell you how good they are.”
But isn’t that like me saying, “I don’t need 0-60 or quarter-mile times to tell if a car is fast”? Sure, run a Corvette Stingray and a Dodge Charger SRT392 side-by-side and everyone can tell the Vette is faster.
The former gets to 60 in about 3.7 seconds and finishes the quarter mile in 12.0 seconds; the latter takes 4.2 seconds and 12.6 seconds, respectively.
But if you aren’t seeing them head-to-head, I doubt there are too many who’d be able to tell the difference. Even swap in the much slower Charger R/T [5.1s and 13.6s] and without them side-by-side [or even the same day or same conditions] I’m guessing 99% of the population wouldn’t be able to tell the difference.
The eye test apparently can’t tell that Rafael Palmeiro isn’t a Gold Glover at first base in a season where he was primarily a DH…
Very interesting piece that captures the tension between the eye test vs. sabermetrics. For me, the two most telling phrases in this article are: “the stats are flawed” and “statistics can have a kind of utility”; what they say to me is: the metrics are just one of many tools to measure baseball performance, and they’re not perfect. I have always had a serious problem with privileging flawed analytics over “baseball guys” who “have a pretty good idea of who’s good out there” (not to mention an issue with sabermetricians putting scare quotes around those two phrases). As the author suggests here, you need both in order to succeed.
That’s true right now, at the current state of defensive stats. It may or may not be true in the future, depending on how much defensive stats improve.
For comparison, look at offensive stats, particularly just batting. One really doesn’t need an eye test there, the stats are good enough to evaluate a hitter quite precisely, given a large enough sample size. If defensive stats were ever to become as accurate and reliable as offensive stats, then they would and should replace the eye test.
Only in MLB can we see awards for defense be based almost entirely on offense, and yes, you can win Gold Gloves by accident, because they aren’t based on defense; they are based on offensive totals.
I’m not fond of the way Baseball Info Solutions does its Defensive Runs Saved evaluations, mainly because I don’t think they have enough eyes on each play. I asked John Dewan (head of BIS) how many people he had evaluate each play, and he barked, “Fifty!” and walked away.
I think the big disconnect for the players and fans when judging defense is that the baseline for CF or SS is ridiculously high. Almost everyone passes the eye test. The stats are trying to separate great from very good and the dynamic range is narrow. The best CF is truly special, while the 15th best is still very very good at playing CF. Combine that lack of understanding with the fact that few opportunities dictate the stat outcome and you have your ammunition to hate UZR or DRS.
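The point about few opportunities driving the outcome can be made concrete with a rough binomial sketch. Everything numeric here is an assumption for illustration, not real data: 40 non-routine chances in a season, a 50% catch rate for an average fielder, and 0.8 runs per play are hypothetical inputs, and the model treats every chance as independent and equally difficult, which real batted balls are not.

```python
import math


def luck_sd_runs(n_chances, p_average_makes_play, runs_per_play):
    """Standard deviation, in runs, of an exactly average fielder's
    runs-saved total that comes from binomial luck alone.

    Assumes independent chances of identical difficulty (a simplification).
    """
    variance_plays = n_chances * p_average_makes_play * (1.0 - p_average_makes_play)
    return runs_per_play * math.sqrt(variance_plays)


# Hypothetical season: 40 coin-flip chances, each worth 0.8 runs.
spread = luck_sd_runs(n_chances=40, p_average_makes_play=0.5, runs_per_play=0.8)
print(round(spread, 2))
```

Under these made-up inputs, luck alone produces a spread of roughly two and a half runs for a fielder whose true skill is exactly average, which is the same order of magnitude as the gap the stats are trying to measure between a very good and a great defender.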
MGL himself went on a solid rant against his own method (UZR) a few months ago:
https://mglbaseball.com/2016/03/04/how-important-is-bayes-in-advanced-defensive-metrics/
“I like Yelich’s point about robbing a potential GS not being 4 defensive runs saved. A model where the likely runs created by each opportunity (if missed) are summed up and compared to the expected runs saved by the plays actually made would be interesting.”
UZR and DRS and most of the defensive metrics based on PBP data are designed to estimate “context-neutral” fielding skill just like metrics like wOBA (or RC) and FIP. In other words, you don’t get more credit for catching a ball with the bases loaded than if there were no one on base. It doesn’t have to be that way but it is.
The reason it is that way is because these metrics are designed to be used to isolate the value of a player’s skill and we know from extensive research that players don’t vary much at all in their abilities to catch balls or hit balls, depending on how many runners are on base (or score, etc.). So giving a player credit for 4 runs when he saves a GS HR (assuming that the average fielder NEVER makes the play) is just not the way it is done because it will introduce a lot of noise to the metric, at least relative to the question of skill. It’s not “noise” if your question is different.
If you wanted to use these metrics for awards or “pats on the back,” then MAYBE you would design them to include context (like who is on base or even the score – since a game-saving HR catch is worth more “pats on the back” than a HR catch on a 16-0 game), but as I said, that’s not generally what we design these metrics for. WPA is a good offensive stat if you want to add in a little context. We could do something similar with defense.
To be honest we (at least I do) mostly design these metrics so we can use them for projections. And including context that is not related to skill is not a good way to do that since we know from extensive research that performance that includes most of that context (like runners on base) is not repeatable.
For example if player A played exactly the same defense as player B, but player A made his catches with lots of runners on base and with the game on the line, and player B didn’t, you might want to create a metric that gave player A more credit – I have no problem with that. But you have to tell me what you are using that metric for. As I said, if you want to use that metric to describe skill or you want to use it to make inferences about future defensive performance, you are much better off not including context. Player B in that example will likely perform exactly as well as player A in the future, so including context in a metric would give you a terrible projection for those players relative to each other.
So it’s not “bad” or “good” whether and how much you want to include context that is not related to a player’s skill. It depends on what you want to use those metrics for and what questions you want to answer with their assistance. Most of the time, the inventors of these metrics are trying to answer questions about skill and not trying to represent or reflect “pat on the back” type value, so we usually choose to remove as much context as possible.
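The player A versus player B example can be sketched as a small comparison of a context-neutral tally with a context-weighted one. This is a hedged illustration only: the `Catch` record, both scoring functions, and all the probabilities are hypothetical, the 1.4-run home run value is the figure quoted earlier in the thread, and neither function is the real UZR or DRS calculation.

```python
from dataclasses import dataclass

HR_RUN_VALUE = 1.4  # average run value of a home run, as quoted in the thread


@dataclass
class Catch:
    p_average_makes_play: float  # hypothetical chance an average fielder makes it
    runners_on: int              # base runners when the ball was hit


def context_neutral(catches):
    """Credit every robbed homer with the same average run value."""
    return sum(HR_RUN_VALUE * (1 - c.p_average_makes_play) for c in catches)


def context_weighted(catches):
    """Hypothetical 'pat on the back' metric: credit the runs that would
    actually have scored (the batter plus every runner on base)."""
    return sum((1 + c.runners_on) * (1 - c.p_average_makes_play) for c in catches)


# Identical plays; only the base-runner situation differs.
player_a = [Catch(0.5, 3), Catch(0.5, 2)]
player_b = [Catch(0.5, 0), Catch(0.5, 0)]

for name, catches in (("A", player_a), ("B", player_b)):
    print(name, round(context_neutral(catches), 2), round(context_weighted(catches), 2))
```

The context-neutral totals come out identical for both players, while the context-weighted totals diverge purely because of who happened to be on base. That divergence is the extra "noise" the comment warns about when the goal is to describe skill or project future performance.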
Getting back to the gist of the article, we’ve said a million times (give or take about a million) that the best evaluations of anything are some combination of a “metric” and “scouting” or “observation.” Both are based on data and data is data. The important thing is simply to make sure that your data is as accurate as possible and that it is used and interpreted as well as possible given exactly what you are trying to answer. Choosing one over the other is a false dichotomy. All of the good teams use both. And they all recognize that both can and do have weaknesses and biases and the key is to figure out what they are and how to account for them as best as possible.
What this article illustrates about players is what we already know. Players don’t understand how metrics are created, what they mean, or how they are used in the grand scheme of evaluating players and making decisions. Some more than others, but on the whole they are ill-informed. Which is to be expected. Like anything else, it takes aptitude, study, desire and practice to be fluent in sabermetrics. That is not the players’ domain, nor should it be.
They do of course elicit some degree of foolishness in opining about something they know little to nothing about, but then again, that kind of foolishness is shared by most people and it is not exactly an important trait or one that is selected for, in order to be a good baseball player.
I don’t really get the point of the article. Your “sample size” is limited to quotes you choose to publish from a small number of coaches and players. But it is your article. I say of course they prefer the eye test. They’re jocks, for crying out loud! It’s not a bad thing or a good thing, it’s just a thing. When or if UZR fits into the working man’s vocabulary is the day that it will be applied to a man’s pay scale. And I don’t believe WAR will be considered a common man’s stat until people get a better layman’s grasp of defensive metrics. In other words, it needs to be simple to understand and use.
If you really want to sell someone skeptical, like me, on the value of defensive metrics you would really need to show me so I can verify that it makes sense. Show me the video for all the plays that a defender is involved in for a week and show me how the metric values each of those. If the value does not seem obvious, explain why it is different. Then compare it to someone else at the position and show me why that player is worth more or less than the other one.
The thing about offensive statistics is I can understand it intuitively. A single is a single. A home run is a home run. Yes, there can be scoring anomalies where there is a debate on a hit or error. Park factors come into play. But most of the counting stats here are objective and easy to understand. Fielding metrics do not have that advantage. Since I don’t understand the logic of fielding metrics all that well, it brings the metrics into question. How much is science and how much is magic?
The fact that different metrics can, at times, produce wildly different results does not help either.
In some ways, DRS is its own worst enemy because it hasn’t really solved the scaling problem. Nobody doubts that Mookie Betts is an exceptional right fielder, but only John Dewan and his employees believe that he really reduced the Red Sox’ runs-against by 32 as against an average performer. That damages the credibility of what seems to be a pretty sound methodology.
In the stat community we tend to downplay the manager, but in fact positioning is an aspect of coaching. We’ve all heard about Andrew McCutchen and how the Pirates positioned him a certain way this year, which led to poor metrics. I think next-level stats will need to take this into account. If he was positioned poorly, that is on the coaching staff and should be used to evaluate THEM. Just like you don’t blame a great hitter who is directed to sacrifice with a man on first and one out. I think coaching staff evaluations in the future will help us pinpoint whether managers and coaches put their players in the best positions to succeed – whether it is bullpen usage, positioning, platooning, etc.
I don’t really care what players think about advanced stats because many, if not most, buy into the “he just knows how to win” meme.