Comparing Rookies to Veterans with Empirical Bayesian Analysis

by Jimbob Sweatypants

It’s no surprise Aaron Judge’s barrel rate is higher than Miguel Andujar’s.

Yankees fans, how much would you bet that Miguel Andujar is a better hitter than AL Rookie of the Year Aaron Judge? Why are you laughing? If you go by his rate of Barrels per plate appearance, Andujar is the superior slugger. His career Barrel rate of 25 percent far exceeds Judge’s 11.9 percent Barrel rate. Why not bet on Andujar?

Because Andujar has a mere eight big-league plate appearances and Judge has 773, that’s why. Andujar’s smaller sample size means we can’t trust his on-field performance as much as we can Judge’s. In fact, you may already have said out loud while reading this that eight PA is such a small sample as to be absolutely meaningless.

Perhaps. But talent evaluators, fans, and analysts face this problem all the time. If it isn’t about Andujar, it’s about another high-profile rookie like Cody Bellinger, Rhys Hoskins or Matt Olson. How are the Cubs supposed to judge Ian Happ’s talents, or the Orioles Trey Mancini’s?

Most fans and analysts will say you can’t draw any conclusions about these players’ true talents until they reach some sample size of PA. Let’s say it’s 350 PA, something like half a season at the plate. This seems reasonable. Now we can analyze Judge, Happ and Mancini. But we’re left with a few problems:

We still can’t analyze promising hitters like Hoskins and Olson.
Our study is biased. Good players play more often, so studying players who exceed some PA threshold means we’re likely studying good players, or at least players with good on-field results.
Even within our sample, comparing Happ’s 413 PA to Judge’s 773 seems less than ideal. That’s quite a gap in PA.
Judging these players by their on-field results ignores the probability they’d achieve these results in the first place.

What do I mean by that last bit? Even though we’ve observed Andujar with a Barrel rate of 25 percent, no player with at least 650 PA in a season has recorded a single-season Barrel rate higher than 12.8 percent (Judge, 2017). Over a three-year period of at least 1,800 PA, the highest observed Barrel rate belongs to Nelson Cruz at 9.6 percent.

These facts mean Andujar is unlikely to sustain a 25 percent rate for 650 PA and even less likely to sustain it for 1,800. Despite observing his on-field Barrel rate of 25 percent, we should be confident his true-talent rate is lower.
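To put a number on that intuition, here is a minimal sketch that treats each plate appearance as an independent coin flip with a fixed Barrel probability. That binomial assumption and the helper name prob_at_least are mine, not part of the original analysis. It asks how often hitters with various assumed true-talent rates would produce two or more Barrels in eight PA.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Chance of two or more Barrels in eight PA at several assumed true-talent rates.
# 0.128 is the best single-season rate cited above; 0.03 is an illustrative
# ordinary rate; 0.25 is Andujar's observed rate.
for rate in (0.03, 0.128, 0.25):
    p = prob_at_least(2, 8, rate)
    print(f"true rate {rate:.3f}: P(2+ Barrels in 8 PA) = {p:.3f}")
```

Under these assumptions, a hitter with a 12.8 percent true rate would show two or more Barrels in eight PA about 27 percent of the time, and even a 3 percent hitter would do it about 2 percent of the time. Eight PA simply can’t tell those hitters apart.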

That’s a lot of problems to encounter. How do you compare players with different amounts of playing time to each other? How do you account for the rarity of a feat when analyzing a player’s on-field accomplishments? How do you estimate a player’s true talent based on what he did on the field?

An answer lies in empirical Bayesian analysis. This involves three steps:

Use data on hand to build a model that establishes prior beliefs about a player.
Observe the player.
Use Bayes’ theorem to define a probability distribution that combines our prior beliefs with the observations we make. Because this distribution comes after our analysis, we call it a posterior distribution. At first, this distribution will resemble the model. Over time, as a player demonstrates on-field results, this distribution will take the shape of the player’s true talent.
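The three steps above can be sketched with a beta prior and binomial observations, the setup used in Robinson’s book. The prior parameters below are hypothetical placeholders rather than the fitted values behind this article, and for simplicity the sketch gives both players the same prior.

```python
def update_beta(alpha, beta, barrels, pa):
    """Conjugate update: Beta(alpha, beta) prior + binomial data -> Beta posterior."""
    return alpha + barrels, beta + (pa - barrels)


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Step 1: a hypothetical prior with a mean Barrel rate of 2 percent.
prior = (1.0, 49.0)

# Step 2: the observations quoted in this article (Barrels, PA).
observed = {"Andujar": (2, 8), "Judge": (92, 773)}

# Step 3: combine prior and observations into a posterior for each player.
for name, (barrels, pa) in observed.items():
    posterior = update_beta(*prior, barrels, pa)
    print(f"{name}: raw {barrels / pa:.3f}, posterior mean {beta_mean(*posterior):.3f}")
```

With only eight PA, Andujar’s estimate stays close to the prior. Judge’s 773 PA overwhelm the prior, so his estimate lands near his observed rate.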

Because we’re talking in probabilities, and because we’re using a model, we can compare players to each other whether they’ve stepped to the plate three times or 3,000. I prefer Bayes’ theorem because it operates the way humans operate. We all have some kind of prior belief about a player’s abilities, and as a player accumulates plate appearances, we all continually update those beliefs. Over time, we use these beliefs to become confident about who a player “is.” And whether we know it or not, we all have some range of uncertainty around our beliefs and predictions. Empirical Bayesian analysis quantifies this process and gives it structure.

This article uses empirical Bayesian analysis to compare rookies not only to each other but also to established major leaguers. Note that I won’t go into any mathematical details. For those specifics, I encourage you to read David Robinson’s Introduction to Empirical Bayes: Examples from Baseball Statistics, from which I’ve adapted the methods and examples used here.

Before we get to the process, let’s talk about Barrels.

Measuring Hitting Talent with Barrels

For a hitter, what’s the best outcome of a plate appearance he can influence the most? Something like, “Hit the ball hard and on a line.” That’s what a Barrel is, a ball that, based on its exit velocity and launch angle off the bat, has the following characteristics:

Results in a hit more than half the time
Results in a slugging percentage of at least 1.500

Not all Barrels are hits, but that’s because parks are different sizes, wind changes direction, and defenders patrol the field. Should we penalize hitters for hitting a ball at 105 mph but right at the shortstop or into a stiff breeze blowing in? I don’t think so, and I think many baseball fans and analysts would agree.

But we can’t just count the number of Barrels for each hitter, because hitters have different amounts of playing time. We need a rate metric. The Statcast leaderboard provides two to choose from: Barrels/PA and Barrels/Batted Ball Event (BBE). Each answers a different question:

Barrels/PA answers the question: When this hitter steps to the plate, what are the chances this plate appearance will result in a Barrel?
Barrels/BBE answers the question: When this hitter makes contact, what are the chances that contact will result in a Barrel?

If we use the latter stat, we’re left with other questions: How often does this hitter swing, and how often does this hitter make contact? Barrels/PA renders these questions irrelevant, letting us compare strikeout-prone sluggers with contact hitters and everyone in between.
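A small sketch makes the difference concrete. The two hitters and all of their counts are invented for illustration, and barrel_rates is a hypothetical helper.

```python
def barrel_rates(barrels, pa, bbe):
    """Return (Barrels/PA, Barrels/BBE) for one hitter."""
    return barrels / pa, barrels / bbe


# Two made-up hitters with 600 PA each. The slugger strikes out and walks
# more, so fewer of his PA end in a batted ball event (BBE).
slugger = barrel_rates(barrels=30, pa=600, bbe=300)
contact_hitter = barrel_rates(barrels=40, pa=600, bbe=400)

print(f"slugger:        {slugger[0]:.3f} per PA, {slugger[1]:.3f} per BBE")
print(f"contact hitter: {contact_hitter[0]:.3f} per PA, {contact_hitter[1]:.3f} per BBE")
```

Both hitters Barrel the ball on 10 percent of their batted balls, but the contact hitter does it more often per trip to the plate. Barrels/PA captures that without a separate look at swing and contact rates.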

Andujar vs. Judge: Defining Prior Expectations

Let’s return to our core problem, comparing Miguel Andujar to Aaron Judge based on their observed Barrel rates. To do this, we must build a model. Managers control players’ playing time, and managers are more likely to give playing time to players who exhibit a high Barrel rate. Thus, a good prior guess about Barrel rate must incorporate playing time in the form of PA.

To define priors, I used beta-binomial regression involving the log of the player’s total PA. The result is a bunch of probability distributions, each one telling us the player’s expected Barrel rate if we knew only his PA. The following graph shows the distributions at 10, 100, and 1,000 PA.

The distributions move to the right, showing that the more PA a player has accrued, the higher his likely Barrel rate is. Takeaways from this graph:

If we know only that a hitter has 10 PA, we should expect that hitter to Barrel .001 balls per PA. That’s where the probability distribution peaks—a Barrel rate of 0.1 percent.
If we know a hitter has 100 PA, we should expect a 1.5 percent Barrel rate from that guy.
If a hitter’s good enough to reach the 1,000 PA threshold, we should expect he has a 2.9 percent Barrel rate.

This graph shows that managers do indeed allocate more playing time to better hitters, which is why each player needs his own prior guess.
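Here is a rough sketch of how a prior can depend on PA, following the mean-and-dispersion parameterization Robinson uses for beta-binomial regression. The coefficients MU_0, MU_1 and SIGMA are illustrative values I picked to land in a plausible range. They are not the fitted values behind the graph.

```python
import math

# Hypothetical regression coefficients (assumptions, not fitted values).
MU_0 = 0.006    # intercept of the prior mean
MU_1 = 0.0058   # change in prior mean per unit of log(PA)
SIGMA = 0.018   # dispersion; smaller values give a tighter prior


def prior_for_pa(pa):
    """Return (alpha, beta) of the prior for a hitter with the given PA."""
    mu = MU_0 + MU_1 * math.log(pa)
    return mu / SIGMA, (1 - mu) / SIGMA


for pa in (10, 100, 1000):
    alpha, beta = prior_for_pa(pa)
    prior_mean = alpha / (alpha + beta)
    print(f"{pa:>5} PA: alpha={alpha:.2f}, beta={beta:.2f}, prior mean={prior_mean:.4f}")
```

In this parameterization alpha + beta always equals 1 / SIGMA, so every hitter’s prior carries the same weight and only its center moves with playing time.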

Observing Results

This part is easy. I already mentioned Andujar’s two Barrels in eight PA and Judge’s 92 Barrels in 773 PA. The data are available on the Statcast leaderboard. Done!

Computing Posterior Distributions

Let’s start with Andujar. The following graph shows two distributions.

The prior distribution, in red, is our prior expectation of his Barrel rate, given his playing time.
The posterior distribution, in blue, is our best guess of his true-talent Barrel rate right now, given both our prior expectations and the two Barrels in eight PA we’ve observed from him.

Since Andujar has only eight PA, his prior distribution looks a lot like the one for a player with 10 PA, shown in the graph above this one. Having observed two barrels in eight PA, his posterior distribution has the following characteristics:

It ranges from about 0.01 Barrels/PA to 0.09, showing the range of probable Barrel rates from about one percent to nine percent. For context: a one-percent Barrel rate is like Erick Aybar; a nine-percent Barrel rate is like Mike Trout.
The probability density at a nine-percent Barrel rate is quite low, indicating we shouldn’t bet a lot of money that Andujar hits like the best all-around player in baseball.
It peaks at 0.034, or a Barrel rate of 3.4 percent. This peak is Andujar’s most likely true-talent Barrel rate.
Its density is lower than the density of his prior curve. This lower height is expected, since we have many PA from players in the prior distribution, but only eight PA from Andujar to form the posterior distribution.

Andujar may not sustain a 25-percent Barrel rate, but his estimated true-talent rate of 3.4 percent is good. Having run the analysis above for 854 hitters, this rate ranks at the 57th percentile. Yankees fans should feel optimistic about his future.

Now let’s compare him to his teammate:

Judge’s estimated Barrel rate is 11.1 percent, more than triple his teammate’s 3.4 percent. We’re now very sure he’s the better hitter.

Think about what we’ve done here. We’ve taken a player with eight career PA and not only estimated his true talent, but also meaningfully compared him to a player with almost 100 times as much playing time. Makes you re-think the whole “wait until 350 PA” approach, doesn’t it?

Credible Intervals

When characterizing the posterior distributions above, I used only their peak values. But examining the entire curve is useful, because hitters’ true talents lie across a range of possibilities. Bayesians speak of the 95-percent credible interval, or the interval in which we’re 95 percent certain the true value lies.

The following graph is the same as the one above but with 95 percent credible intervals added:

Andujar’s 95 percent credible interval lies between 0.007 and 0.08. Judge’s 95 percent credible interval lies between 0.09 and 0.13. Judge’s credible interval is narrower because he has nearly 100 times the PA Andujar does. The larger number of PA means we’re more certain of his true talent than we are of Andujar’s.
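For readers who want to compute such an interval themselves, here is a minimal standard-library sketch that finds an equal-tailed interval by accumulating the beta density on a fine grid. The posterior parameters are hypothetical stand-ins, so the output will not match the intervals quoted above exactly.

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def credible_interval(a, b, level=0.95, steps=20_000):
    """Equal-tailed interval from a midpoint-rule approximation of the CDF."""
    lo_target = (1 - level) / 2
    hi_target = 1 - lo_target
    width = 1.0 / steps
    cdf, lower, upper = 0.0, None, None
    for i in range(steps):
        x = (i + 0.5) * width
        cdf += beta_pdf(x, a, b) * width
        if lower is None and cdf >= lo_target:
            lower = x
        if cdf >= hi_target:
            upper = x
            break
    return lower, upper


# Hypothetical posteriors: a small-sample hitter and a large-sample hitter.
andujar_ci = credible_interval(3.0, 60.0)
judge_ci = credible_interval(94.0, 734.0)
print(f"small sample: {andujar_ci[0]:.3f} to {andujar_ci[1]:.3f}")
print(f"large sample: {judge_ci[0]:.3f} to {judge_ci[1]:.3f}")
```

The large-sample posterior yields the narrower interval, which is the pattern described above.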

When comparing many players to each other, graphs like the above get crowded. We express our best guess of players’ abilities in the following plots.

Each dot represents the peak of each player’s Barrel rate distribution, each bar the 95-percent credible interval.

Comparing the 2017 Rookies

We’re ready to compare all rookies’ Barrel rates, despite their wide disparity in PA. Here are the top 25 rookies, with at least three PA in 2017, by Barrel rate:

Judge leads the pack here, followed by White Sox slugger Matt Davidson. Seeing Davidson here allows me to bring up a good point: Barrel rate isn’t everything. Davidson hits the ball hard but this year posted a poor 4.3 percent walk rate and a sky-high 37.2 percent K rate. The power on contact is there, but the guy needs to work at recognizing pitches.

Rounding out the top three is fellow Matt, Matt Olson of the Oakland A’s. Olson took the baseball world by storm in the second half, bashing 24 home runs in 219 trips to the plate. Unlike Davidson, Olson’s 10.2 percent walk rate and 27.8 percent K rate indicate a good understanding of the strike zone. A’s fans should be pleased to have not only him, but also Matt Chapman and Chad Pinder, who also appear on this list.

The rest of this leaderboard includes the guys you’d expect to see. Ian Happ and Trey Mancini appear, followed by NL Rookie of the Year Cody Bellinger. Further down the list, Cardinals fans can take heart in seeing Jose Martinez and Paul DeJong. And Rockies fans may not have been excited about Tom Murphy’s .077 wOBA this year, but Statcast thinks he has above-average bat-to-ball skills.

Is Aaron Judge a Better Hitter than Matt Davidson?

In the graph above, Matt Davidson’s 95-percent credible interval overlaps Judge’s. This overlap means there’s a non-zero probability the two hitters have an equivalent true-talent Barrel rate. What’s the probability Judge’s Barrel rate exceeds Davidson’s?

Empirical Bayesian analysis answers this question with an A/B test. We’ll show the two players’ distributions and calculate the probability that values of Judge’s curve are higher than those of Davidson’s.

The following graph unflattens each distribution and labels the areas of interest:

There is virtually no chance Davidson has the higher Barrel rate, a small chance the two are effectively the same, and a large chance Judge’s Barrel rate is higher. We can measure that last portion in two ways and see if they agree.

Simulation

I drew a million values from both players’ distributions, which is similar to simulating a million seasons for each. In 97.3 percent of the simulations, Judge had the higher Barrel rate. That’s a strong vote in favor of Judge being the better hitter.
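A minimal sketch of that simulation, using only Python’s standard library. The two posteriors are hypothetical beta distributions, not the ones fitted for this article, and the sketch draws 100,000 pairs instead of a million to keep it quick.

```python
import random


def simulate_prob_greater(post_a, post_b, draws=100_000, seed=0):
    """Estimate P(rate_a > rate_b) by sampling both beta posteriors."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(*post_a) > rng.betavariate(*post_b) for _ in range(draws)
    )
    return wins / draws


# Hypothetical posteriors as (alpha, beta) pairs.
judge = (94.0, 734.0)
davidson = (40.0, 420.0)

p = simulate_prob_greater(judge, davidson)
print(f"P(Judge > Davidson) is roughly {p:.3f}")
```

Raising draws tightens the estimate at the cost of run time.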

Numerical Integration

When you multiply the two posterior distributions together, they form a joint distribution that you can imagine as a density cloud. The following graph shows the majority of this cloud favors Judge:

Darker red indicates higher probability. How much of the cloud favors Judge? Using numerical integration, the answer is 97.1 percent. That’s close enough to our simulated result for me to be confident Judge’s true-talent Barrel rate is higher than Davidson’s.
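The same question can be answered on a grid. This sketch, again with hypothetical posteriors, adds up the joint density over the region where the first player’s rate exceeds the second’s. It uses the shortcut that, for each value x of the first rate, the mass of the second rate below x is just its cumulative distribution at x.

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def integrate_prob_greater(post_a, post_b, steps=20_000):
    """Approximate P(rate_a > rate_b) with a midpoint rule on [0, 1]."""
    width = 1.0 / steps
    cdf_b = 0.0   # running mass of B strictly below the current cell
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * width
        mass_a = beta_pdf(x, *post_a) * width
        mass_b = beta_pdf(x, *post_b) * width
        # Within the current cell, count half of B's mass as below A.
        total += mass_a * (cdf_b + 0.5 * mass_b)
        cdf_b += mass_b
    return total


# Hypothetical posteriors as (alpha, beta) pairs.
judge = (94.0, 734.0)
davidson = (40.0, 420.0)
print(f"P(Judge > Davidson) is roughly {integrate_prob_greater(judge, davidson):.3f}")
```

Because no random draws are involved, this result is repeatable and makes a handy cross-check on the simulated one.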

Comparing Rookies to Veterans

We’ve compared a player with 773 career PA (Judge) to one with eight career PA (Andujar) and another with 443 career PA (Davidson). Clearly, playing time isn’t a limiting factor with empirical Bayesian analysis. What’s stopping us from comparing rookies with veterans who have 2,000 PA between 2015 and 2017? Nothing.

The following graph shows the top estimated Barrel rates of all hitters who’ve recorded at least three plate appearances from 2015 to 2017.

With an estimated Barrel rate of 11.1 percent, Judge may be not only the best-hitting rookie, but also the best-hitting active player. The guy to whom he’s often compared, Giancarlo Stanton, ranks second in estimated Barrel rate with 10.8 percent. Remember that we’re using Barrels/PA, which accounts for the fact that these guys strike out and/or walk a lot.

The leaderboard shows other familiar names. J.D. Martinez is still underrated, in my opinion, despite hitting Barrels nearly as often as Stanton and Judge. These three are the top tier of pure hitting talent right now.

Other names in this list remind us Barrel rate isn’t everything. Chris Carter, Pedro Alvarez, and Ryan Howard can’t crack the major leagues anymore. Chris Davis and Kyle Schwarber go through long stretches of futility. But overall, this is a list of the best pure hitters in baseball.

Barrel Rate Derby: Stanton vs. Judge

Prior to the 2017 Home Run Derby, fans looked forward to a Stanton vs. Judge showdown. Judge won, but when the game’s played between the lines, the question remains: Who is the better hitter? Who hits the ball hard, and on a line, more frequently?

As shown above, we can approximate this answer by using an A/B test. Let’s show the hitters’ Barrel rate distributions.

Judge faces stiffer competition here than he did with Davidson. A large portion of his Barrel rate distribution overlaps with Stanton’s. Simulating a million seasons of each player reveals that 60 percent of Judge’s seasons exceed Stanton’s. We’re less sure about this matchup than we are about Judge vs. Davidson, but the odds still favor the Yankee.

Numerical integration gives a similar result, as 58 percent of the joint distribution favors Judge.

Pretty impressive for a rookie, no? Right now, Judge is the slight favorite as the better hitter. And to think, without empirical Bayesian analysis, we’d be left saying things like, “You can’t compare a rookie to an eight-year veteran!”

Conclusion

This analysis is more descriptive than it is predictive. It answers the question, “Given our previous expectations about each hitter, and given what we’ve seen them do, whose true-talent Barrel rate is the highest?” This is a great starting point for a 2018 prediction, but it isn’t one. For example, this analysis doesn’t factor in age.

But while this analysis makes no predictions, we can update players’ posterior distributions as the 2018 season unfolds. If Stanton starts the season with zero Barrels in his first five PA, or if Judge starts the season with three in his first 50, we can use these new data points as signal. Empirical Bayesian analysis properly weights these PA against the hundreds or thousands each player already has provided.
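As a sketch of that updating, yesterday’s posterior simply becomes today’s prior. The end-of-2017 parameters below are hypothetical placeholders. The early-2018 lines are the what-if scenarios from the paragraph above.

```python
def update(alpha, beta, barrels, pa):
    """Fold new plate appearances into an existing Beta(alpha, beta) belief."""
    return alpha + barrels, beta + pa - barrels


def mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical posteriors at the end of 2017, as (alpha, beta) pairs.
posteriors_2017 = {"Stanton": (190.0, 1600.0), "Judge": (94.0, 734.0)}

# What-if starts to 2018, as (Barrels, PA).
early_2018 = {"Stanton": (0, 5), "Judge": (3, 50)}

posteriors_2018 = {}
for name, (alpha, beta) in posteriors_2017.items():
    barrels, pa = early_2018[name]
    posteriors_2018[name] = update(alpha, beta, barrels, pa)
    before = mean(alpha, beta)
    after = mean(*posteriors_2018[name])
    print(f"{name}: {before:.4f} -> {after:.4f}")
```

The estimates barely move, because a handful of new PA counts for little against the hundreds or thousands already in the books. Updating in several small steps gives the same result as one update with the combined totals.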

Sorry, Miguel Andujar. You outperformed Aaron Judge this season, but the AL Rookie of the Year is the better hitter. The good news is that 2018 brings a new chance to prove yourself. Every time you step to the plate, Bayes’ theorem will be there, helping us understand your true talents.

References & Resources

MLB.com Glossary, “Barrel”
Wikipedia, “Bayes’ theorem”
David Robinson, Introduction to Empirical Bayes: Examples from Baseball Statistics
Baseball Savant, Statcast Leaderboard
Grant Brisbee, SB Nation, “Giancarlo Stanton and Aaron Judge is the Home Run Derby final the universe deserves”
