‘Unskewing’ the Polls in the Hall of Fame Election
by Nathaniel Rakich
January 4, 2017

[Photo: Vladimir Guerrero is right on the cusp of being inducted in his first year of eligibility. (via Keith Allison)]

By now, you’ve probably heard of Ryan Thibodaux’s BBHOF Tracker. The Tracker (and its predecessor at Baseball Think Factory, the Gizmo) has changed the way fans follow the annual election to the Baseball Hall of Fame. Instead of guessing at the final vote totals from the imperfect baseline of the previous year’s results, Thibodaux compiles actual preliminary results from this year by scouring Twitter for voters who have already shared their ballots publicly. In essence, the Tracker is a real-time exit poll of the Hall of Fame electorate: the 400-plus eligible writers of the Baseball Writers’ Association of America (BBWAA).

However, as you may have heard somewhere recently, polls can be wrong. The BBHOF Tracker is pointedly not meant to be taken as a replica of the eventual results; it is merely a snapshot in time of one fraction of the electorate. To make actual predictions, you need to pull a Nate Silver: build a model that smooths out the polls’ inherent error and separates the signal from the noise.

For five years running now, I’ve used such a model to predict Hall of Fame election results based on Thibodaux’s “polling.” Last year, the model predicted every candidate’s vote total to within 3.5 percentage points; its average error was only 1.5 points. Most importantly, it filtered out many of the red herrings in the raw polling data, where historical errors of eight or more points are not uncommon. This year, I’m bringing the same model (explained in excruciating detail below) to bear on the 2017 election. And its preliminary forecast is great news for those who believe the 10-vote limit has created a harmful backlog of deserving players on the ballot.
The 163 ballots collected in the BBHOF Tracker as of Jan. 3 augur a record-tying year for the Hall. The numbers can and will change, but as of this writing, the model forecasts that five players, the biggest class since the very first one in 1936, will be elected to the Baseball Hall of Fame: Jeff Bagwell, Tim Raines, Iván Rodríguez, Trevor Hoffman and Vladimir Guerrero. Almost as notably, Edgar Martínez, Barry Bonds and Roger Clemens are projected to jump roughly 20 points from their 2016 support, putting them in line for eventual election. Meanwhile, Curt Schilling’s support will slip noticeably, and Lee Smith will drop off the ballot after his 15th and final year of eligibility. Although Jorge Posada is a close call, the model currently does not expect any serious candidate to fall beneath the five percent support threshold.

The model operates on a simple premise: certain players consistently over- or underperform their polls. Voters who reveal their ballots in advance are a self-selected group: the BBWAA’s forward thinkers, who believe in transparency and are active on social media, where they often share their ballots. These same traits tend to go along with a liberal approach to Hall of Fame voting: the use of advanced stats, a forgiving stance on performance-enhancing drugs (PEDs) and a preference for a big Hall over a small one. As a result, public ballots tend to overestimate support for players like Raines and Mike Mussina, whose Hall of Fame cases are best appreciated sabermetrically. They also overstate support for the most infamous villains of the steroid era, especially Bonds and Clemens. At the same time, public ballots undershoot the final vote totals of candidates whose cases rely on narratives or traditional statistics such as saves.
In recent years, relief pitchers like Hoffman and Smith have seen the biggest gains from public to private ballots. Here is the full list of last year’s shifts, comparing the ballots made public before the results were announced with the private ballots and with the final tallies:

2016 HALL OF FAME BALLOT NUMERICAL SHIFTS

| Player | Public Ballots | Private Ballots | Final Results | Priv – Pub (pts) | Final – Pub (pts) |
|---|---|---|---|---|---|
| Ken Griffey Jr. | 100.0% | 98.7% | 99.3% | -1.3 | -0.7 |
| Mike Piazza | 86.4% | 79.7% | 83.0% | -6.7 | -3.4 |
| Jeff Bagwell | 77.5% | 66.1% | 71.6% | -11.4 | -5.9 |
| Tim Raines | 75.6% | 64.3% | 69.8% | -11.3 | -5.8 |
| Trevor Hoffman | 62.9% | 71.4% | 67.3% | 8.5 | 4.4 |
| Curt Schilling | 60.1% | 44.9% | 52.3% | -15.2 | -7.8 |
| Roger Clemens | 51.2% | 39.6% | 45.2% | -11.6 | -6.0 |
| Barry Bonds | 51.6% | 37.4% | 44.3% | -14.2 | -7.3 |
| Edgar Martínez | 46.9% | 40.1% | 43.4% | -6.8 | -3.5 |
| Mike Mussina | 50.2% | 36.1% | 43.0% | -14.1 | -7.2 |
| Alan Trammell | 44.1% | 37.9% | 40.9% | -6.2 | -3.2 |
| Lee Smith | 28.2% | 39.6% | 34.1% | 11.4 | 5.9 |
| Fred McGriff | 18.8% | 22.9% | 20.9% | 4.1 | 2.1 |
| Jeff Kent | 17.8% | 15.4% | 16.6% | -2.4 | -1.2 |
| Larry Walker | 14.1% | 16.7% | 15.5% | 2.6 | 1.4 |
| Mark McGwire | 12.2% | 12.3% | 12.3% | 0.1 | 0.1 |
| Gary Sheffield | 11.3% | 11.9% | 11.6% | 0.6 | 0.3 |
| Billy Wagner | 8.9% | 11.9% | 10.5% | 3.0 | 1.6 |
| Sammy Sosa | 7.5% | 6.6% | 7.0% | -0.9 | -0.5 |
| Jim Edmonds | 2.8% | 2.2% | 2.5% | -0.6 | -0.3 |
| Nomar Garciaparra | 0.5% | 3.1% | 1.8% | 2.6 | 1.3 |
| Raw Ballots Cast | 213 | 227 | 440 | | |

All the model needs to do, then, is estimate how much each candidate will rise or fall (usually fall) in private balloting. These changes turn out to be fairly consistent from year to year. The model takes a straight average of each candidate’s private-minus-public differential (the Priv – Pub column above) over the past three elections to calculate a polling adjustment factor. (If the candidate has been on the ballot for only one or two years, it simply averages those one or two years.) The model then adds that adjustment, positive or negative, to the player’s current share of public ballots to estimate his performance on private ballots, i.e., those yet to be revealed.
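The adjustment step is simple enough to sketch in a few lines. The Python below is a minimal illustration under stated assumptions, not the model’s actual code: the function names are hypothetical, each candidate’s deltas are assumed to be listed oldest to newest in percentage points (the same quantity as the Priv – Pub column), and clamping the estimate to the 0 to 100 range is my own added safeguard rather than part of the method described above.

```python
from statistics import mean


def adjustment_factor(deltas: list[float]) -> float:
    """Straight average of a candidate's private-minus-public shifts
    (percentage points), using at most the three most recent elections.

    Hypothetical helper; `deltas` is assumed to be ordered oldest to newest.
    """
    if not deltas:
        raise ValueError("no voting history; a first-year candidate needs another method")
    # With only one or two years of history, this averages what exists.
    return mean(deltas[-3:])


def estimate_private_share(public_share: float, deltas: list[float]) -> float:
    """Shift the current public-ballot share (percent) by the adjustment factor.

    Clamping to [0, 100] is an assumption added here, not part of the model.
    """
    estimate = public_share + adjustment_factor(deltas)
    return min(100.0, max(0.0, estimate))


if __name__ == "__main__":
    # Hoffman has one prior election: +8.5 points on private ballots in 2016.
    print(adjustment_factor([8.5]))  # 8.5
    # Hypothetical four-year history; only the last three (-12, -15, -6) count.
    print(adjustment_factor([-9.0, -12.0, -15.0, -6.0]))  # -11.0
    # Hypothetical 2017 public share of 70.0% for a candidate with Hoffman's history.
    print(estimate_private_share(70.0, [8.5]))  # 78.5
```

With a single year on the ballot, Hoffman’s factor is just his 2016 shift; for a longer-tenured candidate, anything older than the last three elections drops out of the average.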
Those two performances, public and private, are then combined, weighted by the number of ballots in each group, to arrive at a projected overall vote total. For example, on Jan. 3, the 163 known public ballots were combined with the model’s projections for an estimated 272 yet-to-be-revealed private ballots to produce the forecast. (Based on last year’s turnout of 440 voters, the knowledge that a certain number of voters were “purged” this year and his expectation of several first-time voters, Thibodaux anticipates that 435 ballots will be cast in this year’s election.) As a result, the forecast currently leans heavily on the model’s predictions for private ballots, but as more ballots are made public, it will become more accurate.

Unfortunately, this method doesn’t work for players making their first appearances on the Hall of Fame ballot. This year, that list includes two serious threats to be inducted, Rodríguez and Guerrero, as well as two more debatable candidates, Manny Ramírez and Posada. Because these “rookies” have no voting history of their own, the model finds “veteran” candidates whose votes correlate well with the rookies’ and adjusts the rookies’ exit polls proportionally.

For example, in the public ballots so far, Posada’s results correlate most strongly with Smith’s. To estimate Posada’s support on private ballots, the model assumes that the catcher’s strong performance with known pro-Smith voters and weak performance with known anti-Smith voters carry over to private ballots. Of course, as we learned above, the ratio of pro-Smith to anti-Smith ballots differs between private and public ballots, so Posada will rise or fall in tandem. In this case, he should rise: just as Smith is expected to gain ground when private ballots are revealed, so too should Posada, ever so slightly. By contrast, Ramírez loses ground by this method.
It should come as no surprise that his strongest correlations are with the similarly controversial Bonds and Clemens; almost everyone who has voted for Ramírez so far has also voted for Bonds and Clemens. Since past experience with those two has shown that PED users are unpopular with private voters, we can be confident that Ramírez will drop by a handful of points in private balloting.

Rodríguez also correlates fairly well with Bonds and Clemens, but there’s another candidate from past Hall of Fame elections who’s an even better match. Like Rodríguez, Mike Piazza was a catcher well above his position’s standards for enshrinement but dogged by the shadow of PED accusations; it’s little surprise that his and Rodríguez’s supporters strongly overlap. Rather than settle for a less robust correlation, I opted to treat Rodríguez as if he were simply Piazza reincarnated on the ballot: Rodríguez’s −8.5-point adjustment is what Piazza’s would have been had Piazza not been elected last year.

Finally, Guerrero is this year’s trickiest call. Not only is he the player closest to the 75 percent threshold for induction, but he isn’t a clear statistical doppelganger for any other candidate on the ballot. His strongest correlation is with Hoffman and his old-school supporters, but it’s not as dramatic as the other rookies’: 81 percent of Hoffman voters so far have opted for Guerrero, but so have 67 percent of non-Hoffman voters. That currently works out to a slender 1.2-point gain for Guerrero on private ballots, with a large margin of error. For the candidate who generates the most suspense this year, that’s not ideal.

With luck, time will clear things up. Based on past experience, Thibodaux expects to add 50 or more ballots to his Tracker before the full results are officially announced on MLB Network on the evening of Jan. 18.
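The rookie reweighting and the final ballot-weighted blend can be sketched together. This is a hedged illustration, not the model’s actual code. The function names are hypothetical, shares are in percent, and `p_with`/`p_without` are the fractions of public ballots naming the rookie among voters who did and did not back a single anchor veteran, as in the Guerrero and Hoffman example. Because the anchor’s public share cancels out, the rookie’s expected shift reduces to (p_with − p_without) times the anchor’s adjustment. With Guerrero’s 81 and 67 percent and Hoffman’s one-year adjustment of +8.5 points from the 2016 table, that is 0.14 × 8.5 ≈ 1.2 points, consistent with the figure quoted above. The blend is checked against Hoffman’s 2016 row, where the final result is by definition the ballot-weighted mix of public and private support.

```python
def rookie_share(p_with: float, p_without: float, anchor_share: float) -> float:
    """Rookie's expected share (percent), given the anchor veteran's share.

    p_with / p_without: fraction of ballots naming the rookie among voters
    who did / did not vote for the anchor, as measured on public ballots.
    """
    return p_with * anchor_share + p_without * (100.0 - anchor_share)


def rookie_private_shift(p_with: float, p_without: float,
                         anchor_adjustment: float) -> float:
    """Expected private-minus-public change for the rookie, in points.

    Equals rookie_share(anchor_public + adj) - rookie_share(anchor_public);
    the anchor's public share cancels out of the difference.
    """
    return (p_with - p_without) * anchor_adjustment


def blended_share(public_share: float, public_n: int,
                  private_share: float, private_n: int) -> float:
    """Ballot-weighted average of public and (estimated) private support."""
    total = public_n + private_n
    return (public_share * public_n + private_share * private_n) / total


if __name__ == "__main__":
    # Guerrero vs. Hoffman: 81% of Hoffman voters, 67% of non-Hoffman voters;
    # Hoffman's adjustment is his single 2016 shift of +8.5 points.
    print(round(rookie_private_shift(0.81, 0.67, 8.5), 1))  # 1.2

    # 2016 check: Hoffman drew 62.9% of 213 public ballots and 71.4% of
    # 227 private ones; the blend reproduces his final 67.3%.
    print(round(blended_share(62.9, 213, 71.4, 227), 1))  # 67.3

    # 2017 weights: 163 public ballots so far out of an expected 435.
    private_n = 435 - 163
    print(private_n)  # 272
```

The same blend applies to veterans and rookies alike; only the way the private-ballot share is estimated differs.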
I’ll be updating my projections in real time as more ballots become known, and you can follow along on Google Drive or just by revisiting this article as the election draws near.

References & Resources
Ryan Thibodaux, “BBHOF Tracker”
Nathaniel Rakich, “Baseball Hall of Fame Exit Polling”
Baseball Think Factory, “The 2015 HOF Ballot Collecting Gizmo!”
Carl Bialik and Harry Enten, FiveThirtyEight, “The Polls Missed Trump. We Asked Pollsters Why.”
Nathaniel Rakich, Baseballot, “How Accurate are Exit Polls—of the Hall of Fame?”
Nathaniel Rakich, Baseballot, “Unskewed Polls: Hall of Fame Edition”
Nathaniel Rakich, Baseballot, “‘Unskewing’ Polls of the 2015 Baseball Hall of Fame”
Nathaniel Rakich, Baseballot, “How to Interpret the Polls of the 2016 Baseball Hall of Fame Election”
Nathaniel Rakich, Baseballot, “What We Learned From This Year’s Hall of Fame Results”
lone1c, Over the Monster, “‘The Rule of 10’: Why Hall of Fame voting is broken and unfair”
Baseball Hall of Fame, “Class of 1936”