Predicting Today’s Hall of Fame Voting Results

Mariano Rivera is projected to be a near-unanimous inductee to the National Baseball Hall of Fame. (via slgckgc)

This year’s Hall of Fame ballot is an emotional one. Edgar Martinez is in his final year of eligibility. He’s this year’s version of Tim Raines and Bert Blyleven: players whose vote totals started out small but who slowly gained vote share and were elected toward the end of their time on the ballot. Fueling the emotion is the fact that this is Edgar’s last chance; if he doesn’t make it in this year, he’ll have to wait until the next committee cycle.

Emotion also surrounds legendary Yankees closer Mariano Rivera. Rivera checks all the boxes: all-time leader in saves, dominant closer, stellar postseason numbers, 13-time All-Star, five World Series rings, played for only one team his entire career…you name it. He’s a class act all the way, one of the most beloved and respected players ever. And that’s even before you get to the fact that he did it all with just one pitch he happened to discover while fooling around in the bullpen. (Which just confirms we know absolutely nothing about baseball.) The question is not whether he’ll be inducted but whether he’ll be inducted unanimously.

But Roy Halladay’s name stirs the most sentiment on this ballot. In a relatively short but high-peak career, Halladay won two Cy Young Awards, pitched a perfect game, and threw a postseason no-hitter. Fans knew him as one of the best pitchers of his era, a focused, intense workhorse. But he could also connect with them as people and had a great sense of humor. He likely would have been inducted on his merits alone, but his death in November of 2017 increased the emotion around his candidacy. Premature deaths have a way of connecting directly to our emotional cores as human beings; Halladay’s certainly did.

Will any of these guys make it in? And what about guys like Barry Bonds and Roger Clemens, historically dominant players whose legacies many believe are tainted by performance-enhancing drugs? What about all-around great Scott Rolen or the idiosyncratic Manny Ramirez? Will voters continue to penalize Larry Walker for his time spent in Coors Field?

We have some clues as to what will happen today. Ryan Thibodaux and his fine team of folks at bbhoftracker.com scour the Internet for Hall of Fame ballots and post the results on their site for all to see. The problem is that through Monday, only 217 ballots are known, approximately 53 percent of the total. Basing conclusions on this limited data set is fraught with peril.

Or is it? Bayesian analysis provides a way to estimate results based on incomplete data sets like these. The technique combines observed data with prior expectations to produce a posterior probability distribution. As you observe more data, you update the posterior distribution. In this way, predicting Hall of Fame voting results is like watching a player hit—you start with how good you think he is and update that expectation as you watch him perform at the plate.

Let’s use Walker as an example. At the time of this writing, he’s tracking at 65.4 percent. How likely is it he will finish with this vote share?

The first thing you need as a Bayesian is a sense of your prior expectations. Many techniques exist to model prior expectations, but for this task I used beta-binomial regression with empirical data. From the perspective of a player, Hall of Fame voting is like taking a test with over 400 “yes/no” questions. Each player goes to each voter and asks, “Would you vote for me?”

This voting is reasonably independent in the statistical sense; that is, one voter’s answer doesn’t really affect another’s. I’m sure voters talk to each other and influence each other; we’re all connected these days. But voters have to make up their own minds about whom they support and whom they don’t.    
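To make the "test with yes/no questions" framing concrete, here is a minimal sketch of the binomial model it implies, assuming each voter says "yes" independently with the same probability. It uses the tally reported later in this article (142 votes on 217 known ballots); the function name and the candidate vote shares are illustrative choices, not part of the original analysis.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k 'yes' votes from n independent voters,
    each voting 'yes' with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Publicly known ballots and Walker's votes on them (figures from the article).
known_ballots = 217
yes_votes = 142

# How plausible is the observed tally under a few candidate "true" vote shares?
# The candidate shares below are arbitrary illustration points.
for share in (0.34, 0.50, 0.654, 0.75):
    likelihood = binomial_pmf(yes_votes, known_ballots, share)
    print(f"share={share:.3f}  likelihood={likelihood:.3e}")
```

The likelihood is largest near the observed 65.4 percent and falls off quickly on either side, which is the "observed data" half of the Bayesian calculation.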

For players returning to the Hall of Fame ballot, I found their vote share in a given year relates heavily to their vote share the year before:

This relationship holds true whether the player is a position player, starting pitcher, or reliever.

I was surprised at this simplicity until I dug further. It turns out voters, as a group, don’t change their minds much year to year:

So I used beta-binomial regression with Walker’s 2018 vote share to produce the following graph of our prior expectations of his 2019 vote share:

The lines at 0.05 and 0.75 show the elimination and induction thresholds, respectively. Recall that Walker garnered 34.1 percent of the vote last year. So it’s not surprising the curve for our prior expectations peaks at 0.334 or 33.4 percent. If we had no more information, we’d expect Walker to end up with this vote share in 2019.
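As a rough illustration of what such a prior looks like, the sketch below builds a plain Beta distribution that peaks at 0.334. This is a stand-in, not the fitted beta-binomial regression: the prior's strength (`PSEUDO_BALLOTS`) is an assumed value chosen only for illustration, and the helper names are hypothetical.

```python
from math import lgamma, log, exp

def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(x) + (b - 1) * log(1 - x))

def beta_from_mode(mode, pseudo_ballots):
    """Beta parameters whose density peaks at `mode`.
    Larger `pseudo_ballots` means a narrower, more confident prior."""
    a = 1 + mode * pseudo_ballots
    b = 1 + (1 - mode) * pseudo_ballots
    return a, b

PRIOR_MODE = 0.334    # peak of the prior, as reported in the article
PSEUDO_BALLOTS = 40   # ASSUMPTION: illustrative prior strength, not the author's fitted value

a, b = beta_from_mode(PRIOR_MODE, PSEUDO_BALLOTS)

# Evaluate the density on a grid of vote shares and locate its peak.
grid = [i / 1000 for i in range(1, 1000)]
densities = [beta_pdf(x, a, b) for x in grid]
peak = grid[densities.index(max(densities))]
print(f"Beta({a:.2f}, {b:.2f}) prior peaks at {peak:.3f}")
```

With a prior this wide, both the 5 percent and 75 percent thresholds sit far out in the tails, which matches the picture described above.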

But we do have more information! Through 217 ballots, Walker has tallied 142 votes. I combined this information with the prior distribution to produce Walker’s posterior distribution. The following graph shows both distributions:

The posterior distribution shows the range of probable vote share results, given both our prior expectations and the information we’ve observed. It ranges from 0.508 to 0.694; Walker’s estimated vote share lies where the distribution peaks, at 0.604, or 60.4 percent of the vote.
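One common way to combine a prior with yes/no data is the conjugate Beta update, sketched below. It is a simplified stand-in for the article's model: the prior peak (0.334) and the tally (142 of 217) come from the text, but the prior strength of 40 "pseudo-ballots" is an assumption, picked so the result lands near the reported 60.4 percent.

```python
def beta_mode(a, b):
    """Peak of a Beta(a, b) density (valid when a > 1 and b > 1)."""
    return (a - 1) / (a + b - 2)

# Prior: peak taken from the article; strength is an ASSUMPTION for illustration.
PRIOR_MODE = 0.334
PSEUDO_BALLOTS = 40  # hypothetical "ballots' worth" of prior information
prior_a = 1 + PRIOR_MODE * PSEUDO_BALLOTS
prior_b = 1 + (1 - PRIOR_MODE) * PSEUDO_BALLOTS

# Observed data from the tracker: 142 "yes" votes on 217 known ballots.
yes_votes = 142
known_ballots = 217
no_votes = known_ballots - yes_votes

# Conjugate update: add "yes" votes to a and "no" votes to b.
post_a = prior_a + yes_votes
post_b = prior_b + no_votes

observed_share = yes_votes / known_ballots
posterior_mode = beta_mode(post_a, post_b)

print(f"prior peak:     {PRIOR_MODE:.3f}")
print(f"observed share: {observed_share:.3f}")
print(f"posterior peak: {posterior_mode:.3f}")
```

The posterior peak sits between the prior peak and the observed share, pulled most of the way toward the data because 217 real ballots outweigh the assumed 40 pseudo-ballots.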

This graph shows another benefit of Bayesian analysis. Incorporating our prior expectations helps account for the fact that, in the past, unrevealed ballots have hurt Walker. Unrevealed ballots in 2018 pulled Walker’s vote share down from 37.5 percent to 34.1. Basing our prior expectations on last year’s final results prevents us from being overly optimistic about Walker’s final vote share this year.

The following graph puts Walker’s chances in the context of other returning players’:

Showing posterior distributions for all 35 players would get crowded and confusing. So after modeling the chances for all returning players in addition to first-year ones (using their Hall of Fame Monitor scores listed on baseball-reference.com), I collapsed everyone’s curves into horizontal bars.

The points represent the most likely vote shares; the bars’ ends represent the 95 percent credible intervals.
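For readers who want to see how a curve collapses into a bar, here is a minimal sketch that computes an equal-tailed 95 percent credible interval by grid approximation, using only the standard library. The posterior parameters are hypothetical: they correspond to a Beta prior peaking at 0.334 with an assumed strength of 40 pseudo-ballots, updated with Walker's 142 "yes" and 75 "no" votes, so the numbers will not match the article's model exactly.

```python
from math import lgamma, log, exp

def beta_log_pdf(x, a, b):
    """Log density of Beta(a, b) at x, for 0 < x < 1."""
    return (lgamma(a + b) - lgamma(a) - lgamma(b)
            + (a - 1) * log(x) + (b - 1) * log(1 - x))

def credible_interval(a, b, level=0.95, steps=20000):
    """Equal-tailed credible interval for Beta(a, b) via grid approximation."""
    xs = [(i + 0.5) / steps for i in range(steps)]  # midpoints of equal-width bins
    weights = [exp(beta_log_pdf(x, a, b)) for x in xs]
    total = sum(weights)
    lower_target = (1 - level) / 2
    upper_target = 1 - lower_target
    cumulative = 0.0
    lower = upper = None
    for x, w in zip(xs, weights):
        cumulative += w / total
        if lower is None and cumulative >= lower_target:
            lower = x
        if cumulative >= upper_target:
            upper = x
            break
    return lower, upper

# ASSUMPTION: illustrative posterior, not the article's fitted distribution.
POST_A, POST_B = 156.36, 102.64

low, high = credible_interval(POST_A, POST_B)
print(f"95% credible interval: {low:.3f} to {high:.3f}")
```

The two returned values play the role of a bar's ends; the point in the middle of each bar is the peak of the same distribution.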

In a surprise to no one, Rivera looks like a lock for induction. To the joy of many, Halladay, Martinez, and even Mike Mussina look like sure bets to get inducted. There’s a small chance Clemens, Bonds, and Curt Schilling get in, too. 

But how small a chance? Bayesian analysis gives us a range of probable vote shares, not just a single one, so I calculated each player’s chance of induction and elimination. I defined the chance of induction as the posterior probability that a player’s vote share exceeds 75 percent, the threshold for Hall of Fame induction. I used a similar method, with the five percent threshold, to estimate the chance a player would be eliminated.
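The tail-probability idea can be sketched in a few lines: add up the posterior mass above 0.75 for induction and below 0.05 for elimination. The example below does this by grid approximation with the standard library. The posterior parameters are hypothetical (a Beta prior peaking at 0.334 with an assumed strength of 40 pseudo-ballots, updated with Walker's 142 "yes" and 75 "no" votes), so treat the output as illustrative rather than a reproduction of the table.

```python
from math import lgamma, log, exp

def beta_log_pdf(x, a, b):
    """Log density of Beta(a, b) at x, for 0 < x < 1."""
    return (lgamma(a + b) - lgamma(a) - lgamma(b)
            + (a - 1) * log(x) + (b - 1) * log(1 - x))

def tail_probabilities(a, b, induct=0.75, eliminate=0.05, steps=20000):
    """Return (P(share >= induct), P(share < eliminate)) for Beta(a, b),
    approximated on a grid of bin midpoints."""
    xs = [(i + 0.5) / steps for i in range(steps)]
    weights = [exp(beta_log_pdf(x, a, b)) for x in xs]
    total = sum(weights)
    p_induct = sum(w for x, w in zip(xs, weights) if x >= induct) / total
    p_eliminate = sum(w for x, w in zip(xs, weights) if x < eliminate) / total
    return p_induct, p_eliminate

# ASSUMPTION: illustrative posterior, not the article's fitted distribution.
POST_A, POST_B = 156.36, 102.64

chance_in, chance_out = tail_probabilities(POST_A, POST_B)
print(f"chance of induction:   {100 * chance_in:.1f}%")
print(f"chance of elimination: {100 * chance_out:.1f}%")
```

Because almost none of this posterior lies above 0.75 or below 0.05, both chances round to zero, in line with Walker's row in the table.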

Projected 2019 Hall of Fame Voting

Name | Estimated Vote Share (%) | Chance of Induction (%) | Chance of Elimination (%)
Mariano Rivera | 99.8 | 100 | 0
Roy Halladay | 92.3 | 100 | 0
Edgar Martinez | 88.7 | 100 | 0
Mike Mussina | 79.8 | 96.9 | 0
Roger Clemens | 70.5 | 5.3 | 0
Barry Bonds | 69.1 | 1.8 | 0
Curt Schilling | 68 | 0.6 | 0
Larry Walker | 60.4 | 0 | 0
Omar Vizquel | 37.3 | 0 | 0
Fred McGriff | 35.5 | 0 | 0
Manny Ramirez | 23.7 | 0 | 0
Todd Helton | 18.9 | 0 | 0
Scott Rolen | 18.9 | 0 | 0
Jeff Kent | 16.6 | 0 | 0
Billy Wagner | 15.4 | 0 | 0
Gary Sheffield | 13.1 | 0 | 0
Sammy Sosa | 10.9 | 0 | 0
Andruw Jones | 8.4 | 0 | 0.9
Andy Pettitte | 6.9 | 0 | 10.7
Michael Young | 1.8 | 0 | 99.5
Lance Berkman | 0.3 | 0 | 100
Miguel Tejada | 0.3 | 0 | 100
Roy Oswalt | 0.1 | 0 | 100
Juan Pierre | 0.1 | 0 | 100
Jason Bay | 0.1 | 0 | 100
Freddy Garcia | 0.1 | 0 | 100
Travis Hafner | 0.1 | 0 | 100
Derek Lowe | 0.1 | 0 | 100
Placido Polanco | 0.1 | 0 | 100
Vernon Wells | 0.1 | 0 | 100
Kevin Youkilis | 0.1 | 0 | 100
Rick Ankiel | 0 | 0 | 100
Jon Garland | 0 | 0 | 100
Ted Lilly | 0 | 0 | 100
Darren Oliver | 0 | 0 | 100

SOURCE: baseball-reference, bbhoftracker.com

(Note that none of the chances are actually 100 percent or zero percent; I’ve rounded here to simplify the presentation. Also note this doesn’t account for the fact that McGriff will be eliminated due to his time being up, not his vote share falling below five percent.)

Bonds, Clemens, and Schilling probably won’t make it this year, but they’ll come close enough that induction in 2020 or 2021 is a good bet. Of course, next year’s ballot includes Derek Jeter, who is a lock for induction. His presence will take up a check box that counts toward the 10-player limit, and his squeaky-clean image could subconsciously give voters qualms about voting for Bonds, Clemens, or Schilling.  

Walker is a longer shot since he has only one year left after this one. But if he ends up in the 60 percent range this year, a bump to 75 percent next year isn’t unprecedented, especially with the changing voter mix. It could happen.

But barring any major surprises, Martinez can relax. His induction will please fans who’ve argued his merits for years. The same goes for anyone wondering about Rivera, as if there were ever any doubt he would be inducted on the first ballot. The only question now is whether he will get in unanimously or, failing that, beat Ken Griffey Jr.’s record-setting vote share of 99.3 percent. (Based on the above analysis, he has a nine percent chance of doing so.) Halladay’s likely induction will warm the hearts of many, including perhaps the man who taught him his cutter. The outpouring of grief over his death should find a home in Cooperstown.

References & Resources

  • Thanks to the folks behind bbhoftracker.com for working hard to provide us with all this data.


Ryan enjoys characterizing that elusive line between luck and skill in baseball. For more, subscribe to his articles and follow him on Twitter.
Eric Robinson
Member

Good job!

Fredchuckdave
Member

Bainesian

knebelski
Member

Wow! You nailed it.