Searching for Biases in the First Round of the Draft
Major League Baseball’s amateur draft is something of a crapshoot. Even in the early portion of the first round, where teams reap significant value from their picks on average, busts aren’t uncommon. Delmon Young, Matt Bush, Bryan Bullington, Matt Hobgood. The list goes on. Meanwhile, Mike Trout fell all the way to the 25th overall pick, and quickly blossomed into a generational talent.
Some degree of volatility is expected due to the sheer difficulty of what teams are tasked with doing. Figuring out how good a 21-year-old college kid will be at age 25 is a tall order, and doing so for a high schooler is an even taller one. Amateur players are inherently risky assets, which almost certainly helps explain the unevenness of first-round draft returns.
But I’d posit that at least some of that unevenness isn’t purely the result of random chance. Some of it likely comes down to individual cases: More thorough scouting of the New Jersey area might have predicted Mike Trout’s star potential, for example. But perhaps there’s also something that runs deeper than that. Perhaps there are systemic biases in the way teams evaluate first-round talent, causing certain types of players to be overvalued or undervalued in the first round.
As you probably guessed, I did some math to search for these biases. If you’re not interested in reading about the nitty-gritty and would rather just read my conclusions, feel free to skip to the paragraph that starts with “That last paragraph was a bit wonky.” Everyone else, let’s get nerdy.
I ran some regressions to identify possible biases. My data set includes all players drafted within the first 30 picks from 2002-2009, which I split into hitters and pitchers. I excluded draftees who did not sign. It’s still a little early to know what to make of players drafted in 2010 and later, especially for some of the high school draftees like 2010 No. 2 overall pick Jameson Taillon, who just made his major league debut this week.
My dependent variable was WAR over a player’s first four years of team control. For players who haven’t exhausted their first four team-control years yet, I filled in the remaining years using rest-of-season (RoS) 2016 projections from the FanGraphs depth charts. Most of the players I had to use projections for weren’t good enough for it to make a noticeable difference: Only six from this category were projected for more than 1 WAR this year.
To start, I included a variable for a player’s draft selection in my regression to act as a proxy for his perceived value. It isn’t uncommon for a player to fall a few picks in the first round of the draft for signability reasons, which muddies the calculus a bit. But by and large, the spot at which a player is drafted correlates strongly with his perceived value.
From that baseline, I tested out the following variables in my regressions: a player’s handedness, his height, and whether he was drafted out of high school or college. For hitters, I also tested defensive position at the time of the draft. If teams are acting optimally, the variables pertaining to a player — his background, handedness, height and position — would not turn up statistically significant. After controlling for draft position, a college draftee would be no more or less likely to achieve big league success than a high school draftee. Nor would a left-handed pitcher compared to a righty, or a short player compared to a taller one.
On the pitching side, this is exactly what I found. Nothing came up significant. None of the characteristics I looked at — handedness, educational attainment or height — appeared to be over- or under-valued in the first round. Teams seem to be acting optimally.
Things looked much more interesting on the hitting side, however. The data suggest teams haven’t been valuing all demographics appropriately in the draft.
Since I’m a good boy who tries to avoid overfitting my statistical models, I partitioned my data set into two separate pieces before I got started: one included draftees from 2002-2005 and the other included draftees from 2006-2009. Within both subsets, I found a similar-looking interaction between a hitter’s draft selection and whether he was drafted out of high school or college. Here are the resulting coefficients R spit out when I applied these variables to the full data set: 2002-2009.
Variable | Coefficient | P-Value |
Intercept | 15.564 | 0.00 |
Log(Pick) | -4.172 | 0.00 |
College Pitcher | -6.373 | 0.07 |
High School Hitter | -10.575 | 0.00 |
High School Pitcher | -10.593 | 0.05 |
Log(Pick) * College Pitcher | 2.025 | 0.14 |
Log(Pick) * High School Hitter | 3.916 | 0.01 |
Log(Pick) * High School Pitcher | 4.068 | 0.05 |
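For anyone who wants the model written out, here is one reading of that table as an equation. It's a sketch under two assumptions the table doesn't state outright: that Log is the natural logarithm, and that college hitters are the baseline group (they're the only category without a row of their own). The symbols β, γ and δ are just labels introduced here for the intercept and slope, the group offsets, and the group-by-pick interactions.

```latex
\begin{align*}
\widehat{\mathrm{WAR}} &= \beta_0 + \beta_1 \ln(\mathrm{pick}) + \gamma_g + \delta_g \ln(\mathrm{pick}),
  \qquad \gamma_g = \delta_g = 0 \text{ for college hitters} \\[4pt]
\text{College hitter:}\quad \widehat{\mathrm{WAR}} &= 15.564 - 4.172\,\ln(\mathrm{pick}) \\
\text{High school hitter:}\quad \widehat{\mathrm{WAR}} &= (15.564 - 10.575) + (-4.172 + 3.916)\,\ln(\mathrm{pick}) \\
  &= 4.989 - 0.256\,\ln(\mathrm{pick}) \\[4pt]
\text{Gap (college minus high school):}\quad \Delta &= 10.575 - 3.916\,\ln(\mathrm{pick}) \\
\Delta = 0 \;&\Longrightarrow\; \mathrm{pick} = e^{10.575/3.916} \approx 14.9
\end{align*}
```

Under those assumptions, the fitted college-hitter edge is about 10.6 WAR at the first pick, shrinks as the pick number grows, and reaches zero around the 15th pick, after which the fitted high school line sits slightly higher.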
That last paragraph was a bit wonky, and the interpretation of the regression coefficients is the opposite of straightforward. But the main takeaway is this: In the early part of the first round, college hitters tend to outperform high school hitters by a substantial margin. A visual might help make this clear.
Here’s what it looks like when I also include pitchers in the regression.
The high school versus college trend holds for pitchers as well, though the gap wasn’t large enough to trip the “statistically significant” alarm when I looked exclusively at pitchers. That doesn’t necessarily mean the high school versus college disparity doesn’t also exist for pitchers. The data just aren’t as convincing as they are on the hitting side, where the effect is more pronounced. The relative lack of high school pitchers selected in the first round (less than 17 percent of my data set) might explain why nothing super-substantial turned up.
Those graphs and equations are cool and all, but I’ve barely named any of the players who made them look the way they do. Let’s change that. The table below lists the high school hitters selected with the first 10 picks in the first round from 2002-2009.
Year | Pick | Team | Name | WAR in First Four Years |
2005 | 1 | Diamondbacks | Justin Upton | 13.8 |
2003 | 1 | Devil Rays | Delmon Young | 0.2 |
2008 | 1 | Rays | Tim Beckham | 0.0 |
2004 | 1 | Padres | Matt Bush | 0.0 |
2002 | 2 | Devil Rays | Melvin Upton Jr. | 15.3 |
2007 | 2 | Royals | Mike Moustakas | 9.3 |
2008 | 3 | Royals | Eric Hosmer | 6.1 |
2007 | 3 | Cubs | Josh Vitters | 0.0 |
2009 | 3 | Padres | Donavan Tate | 0.0 |
2003 | 5 | Royals | Chris Lubanski | 0.0 |
2008 | 6 | Marlins | Kyle Skipworth | 0.0 |
2003 | 6 | Cubs | Ryan Harvey | 0.0 |
2002 | 7 | Brewers | Prince Fielder | 12.8 |
2002 | 8 | Tigers | Scott Moore | 0.0 |
2004 | 9 | Rockies | Chris Nelson | 0.0 |
2006 | 9 | Orioles | Billy Rowell | 0.0 |
2005 | 10 | Tigers | Cameron Maybin | 7.8 |
2003 | 10 | Rockies | Ian Stewart | 2.5 |
Median | 4.0 | | | 0.0 |
Average | 4.8 | | | 3.8 |
An awful lot of zeroes in there. To name names: Delmon Young, Tim Beckham, Matt Bush, Josh Vitters, Donavan Tate, Chris Lubanski, Kyle Skipworth, Ryan Harvey, Scott Moore, Chris Nelson, Billy Rowell. That’s 11 high school hitters selected with single-digit picks over an eight-year span who were essentially useless.
The list of college hitters looks noticeably better.
Year | Pick | Team | Name | WAR in First Four Years |
2005 | 2 | Royals | Alex Gordon | 11.1 |
2009 | 2 | Mariners | Dustin Ackley | 7.0 |
2003 | 2 | Brewers | Rickie Weeks | 6.8 |
2008 | 2 | Pirates | Pedro Alvarez | 5.9 |
2006 | 3 | Devil Rays | Evan Longoria | 28.7 |
2005 | 3 | Mariners | Jeff Clement | 0.0 |
2005 | 4 | Nationals | Ryan Zimmerman | 17.6 |
2009 | 4 | Pirates | Tony Sanchez | 0.1 |
2008 | 5 | Giants | Buster Posey | 23.7 |
2005 | 5 | Brewers | Ryan Braun | 23.0 |
2007 | 5 | Orioles | Matt Wieters | 14.2 |
2005 | 7 | Rockies | Troy Tulowitzki | 16.0 |
2003 | 7 | Orioles | Nick Markakis | 14.2 |
2008 | 7 | Reds | Yonder Alonso | 3.8 |
2007 | 7 | Brewers | Matt LaPorta | 0.0 |
2006 | 8 | Reds | Drew Stubbs | 9.2 |
2008 | 8 | White Sox | Gordon Beckham | 5.7 |
2008 | 10 | Astros | Jason Castro | 6.3 |
2002 | 10 | Rangers | Drew Meyer | 0.0 |
Median | 5.0 | | | 7.0 |
Average | 5.3 | | | 10.2 |
Evan Longoria, Ryan Zimmerman, Buster Posey, Ryan Braun, Troy Tulowitzki. Several hitters from this group were (or still are) among the best players in baseball. Even the “flops” weren’t complete zeroes in most cases: Dustin Ackley and Rickie Weeks look like stars next to Delmon Young and Josh Vitters. You don’t need a fancy regression model to see the difference between these two lists.
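To put rough numbers on that eyeball test, here is a minimal Python sketch that summarizes the two hitter tables above. The WAR values are copied straight from those tables; the 1.0 WAR cutoff for counting a "bust" is an arbitrary assumption made for this illustration, not something from the regression, and `summarize` is just a helper name invented here.

```python
from statistics import mean, median

# WAR over the first four team-control years, copied from the two hitter
# tables above (top-10 picks, 2002-2009), in table order.
hs_hitters = [13.8, 0.2, 0.0, 0.0, 15.3, 9.3, 6.1, 0.0, 0.0,
              0.0, 0.0, 0.0, 12.8, 0.0, 0.0, 0.0, 7.8, 2.5]
college_hitters = [11.1, 7.0, 6.8, 5.9, 28.7, 0.0, 17.6, 0.1, 23.7, 23.0,
                   14.2, 16.0, 14.2, 3.8, 0.0, 9.2, 5.7, 6.3, 0.0]

BUST_CUTOFF = 1.0  # assumption: a "bust" is anyone under 1 WAR (arbitrary cutoff)


def summarize(war):
    """Return (count, median, mean rounded to 0.1, number under the cutoff)."""
    busts = sum(1 for w in war if w < BUST_CUTOFF)
    return len(war), median(war), round(mean(war), 1), busts


for label, war in [("High school hitters", hs_hitters),
                   ("College hitters", college_hitters)]:
    n, med, avg, busts = summarize(war)
    print(f"{label}: n={n}, median={med}, average={avg}, "
          f"under {BUST_CUTOFF} WAR: {busts}")
```

The medians and averages match the summary rows in the tables, and the bust count makes the contrast plain: 11 of 18 high school hitters fall under the cutoff, against 4 of 19 college hitters.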
Here are the high school pitchers.
Year | Pick | Team | Name | WAR in First Four Years |
2002 | 3 | Reds | Chris Gruler | 0.0 |
2002 | 4 | Orioles | Adam Loewen | 1.4 |
2004 | 5 | Brewers | Mark Rogers | 0.9 |
2002 | 5 | Expos | Clint Everts | 0.0 |
2009 | 5 | Orioles | Matt Hobgood | 0.0 |
2002 | 6 | Royals | Zack Greinke | 10.2 |
2009 | 6 | Giants | Zack Wheeler | 5.0* |
2006 | 7 | Dodgers | Clayton Kershaw | 23.7 |
2004 | 7 | Reds | Homer Bailey | 7.6 |
2003 | 9 | Rangers | John Danks | 12.5 |
2007 | 9 | Diamondbacks | Jarrod Parker | 5.0 |
2009 | 9 | Tigers | Jacob Turner | 0.6 |
2007 | 10 | Giants | Madison Bumgarner | 17.4 |
Median | 6.0 | | | 5.0 |
Average | 6.5 | | | 6.5 |
And here are the college pitchers.
Year | Pick | Team | Name | WAR in First Four Years |
2007 | 1 | Devil Rays | David Price | 19.5 |
2009 | 1 | Nationals | Stephen Strasburg | 15.3 |
2006 | 1 | Royals | Luke Hochevar | 6.6 |
2002 | 1 | Pirates | Bryan Bullington | 0.0 |
2004 | 2 | Tigers | Justin Verlander | 17.3 |
2006 | 2 | Rockies | Greg Reynolds | 0.0 |
2004 | 3 | Mets | Philip Humber | 2.6 |
2003 | 3 | Tigers | Kyle Sleeth | 0.0 |
2004 | 4 | Devil Rays | Jeff Niemann | 6.5 |
2008 | 4 | Orioles | Brian Matusz | 4.9 |
2003 | 4 | Padres | Tim Stauffer | 3.4 |
2007 | 4 | Pirates | Daniel Moskos | 0.2 |
2006 | 4 | Pirates | Brad Lincoln | 0.1 |
2006 | 5 | Mariners | Brandon Morrow | 7.7 |
2005 | 6 | Blue Jays | Ricky Romero | 8.6 |
2007 | 6 | Nationals | Ross Detwiler | 4.0 |
2006 | 6 | Tigers | Andrew Miller | 2.4 |
2004 | 6 | Indians | Jeremy Sowers | 2.2 |
2009 | 7 | Braves | Mike Minor | 6.9 |
2003 | 8 | Pirates | Paul Maholm | 9.6 |
2009 | 8 | Reds | Mike Leake | 5.7 |
2007 | 8 | Rockies | Casey Weathers | 0.0 |
2005 | 8 | Devil Rays | Wade Townsend | 0.0 |
2002 | 9 | Rockies | Jeff Francis | 10.8 |
2005 | 9 | Mets | Mike Pelfrey | 8.9 |
2006 | 10 | Giants | Tim Lincecum | 25.9 |
2004 | 10 | Rangers | Thomas Diamond | 0.0 |
Median | 5.0 | | | 4.9 |
Average | 5.2 | | | 6.3 |
On the whole, the list of college pitchers does not look any better than the list of high school arms, though, at the very top, David Price, Stephen Strasburg and Justin Verlander blow away the high school kids who were selected with the first few picks.
The far left-hand side of the above graph is the real story here, though what’s happening on the right side is also noteworthy. It seems the script flips towards the end of the first round: High school picks turn out better than college picks. I’m hesitant to say there’s much to this trend. We’re talking a difference of just a couple of WAR on average over several years, which isn’t enough to get too worked up about, especially in a small sample. Furthermore, some of this can be explained by the beautiful outlier that is Mike Trout. Take him out of the mix, and the lines for hitters move much closer together. What jumps out to me is that high schoolers taken at the beginning of the first round haven’t fared much better than their counterparts taken toward the end. This suggests the gap between the elite high school players and second-tier high school players might not be as large as the industry perceives it to be.
One of the more annoying quirks of analyzing prospects is that you have to wait a few years to really know how they turn out. Due to this limitation, my analysis looks exclusively at players who were drafted several years ago. This means that any bias that existed toward high school players vis-à-vis college players may have already been corrected.
Anecdotally, it seems the gap has narrowed, particularly due to better results on the high school side. Bryce Harper, Manny Machado, Carlos Correa and Francisco Lindor were all effectively drafted out of high school, and have already blossomed into stars. At the same time, though, Bubba Starling, Michael Choice, Courtney Hawkins and Dylan Bundy have failed hard.
Additionally, there are players who have had disappointing starts to their careers and are beginning to teeter on the edge of the failed prospect graveyard. This group includes Byron Buxton, Delino DeShields, Alex Jackson, Archie Bradley, Max Fried, Tyler Kolek and Jameson Taillon. It’s tough to say anything definitive about the recent drafts without knowing what will become of the Buxtons and Jacksons. But unlike with the 2002-2009 group, the list of failures doesn’t completely overwhelm the list of successes.
With all this in mind, let’s consider what it all might mean for this year’s crop of draftees. I’ve organized the top 15 from Keith Law’s recent ranking of draft prospects in the table below.
Rank | Name | Position | Type |
1 | Corey Ray | OF | College |
2 | Jason Groome | LHP | High School |
3 | Delvin Perez | SS | High School |
4 | Mickey Moniak | OF | High School |
5 | A.J. Puk | LHP | College |
6 | Braxton Garrett | LHP | High School |
7 | Blake Rutherford | OF | High School |
8 | Kyle Lewis | OF | College |
9 | Matt Manning | RHP | High School |
10 | Nick Senzel | 3B | College |
11 | Nolan Jones | SS | High School |
12 | Joey Wentz | LHP | High School |
13 | Riley Pint | RHP | High School |
14 | Ian Anderson | RHP | High School |
15 | Forrest Whitley | RHP | High School |
Eleven of the top 15 prospects in the table are high schoolers (or high school-aged players), the prospect archetype that has been most prone to failure in the past. Only three of the top 15 are college hitters, the archetype that’s been most successful.
Using nothing but their rank on this list (which I’m using as a proxy for draft slot) and their player type (high school hitter, college hitter, high school pitcher or college pitcher), let’s see what my math suggests these players will do over their first four years of team control. To be perfectly clear, these “projections” don’t take stats, scouting or any other knowledge into account. They’re dumb, terrible projections that are dumb and terrible on purpose. They’re just meant to demonstrate the magnitude of the varying production for each demographic of draftee.
Rank | Name | Position | Type | WAR in First Four Years |
1 | Corey Ray | OF | College | 15.6 |
2 | Jason Groome | LHP | High School | 4.9 |
3 | Delvin Perez | SS | High School | 4.7 |
4 | Mickey Moniak | OF | High School | 4.6 |
5 | A.J. Puk | LHP | College | 5.7 |
6 | Braxton Garrett | LHP | High School | 4.8 |
7 | Blake Rutherford | OF | High School | 4.5 |
8 | Kyle Lewis | OF | College | 6.9 |
9 | Matt Manning | RHP | High School | 4.7 |
10 | Nick Senzel | 3B | College | 6.0 |
11 | Nolan Jones | SS | High School | 4.4 |
12 | Joey Wentz | LHP | High School | 4.7 |
13 | Riley Pint | RHP | High School | 4.7 |
14 | Ian Anderson | RHP | High School | 4.7 |
15 | Forrest Whitley | RHP | High School | 4.7 |
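For the curious, the table above can be reproduced from the regression coefficients with a few lines of Python. This is a sketch, assuming that Log is the natural logarithm and that college hitters are the baseline group; `projected_war`, `GROUP_TERMS` and the group labels are names made up for this illustration.

```python
import math

# Coefficients from the regression table earlier in the article.
INTERCEPT = 15.564
LOG_PICK = -4.172

# (offset, extra log-pick slope) per group. College hitters are assumed to be
# the baseline group, so both of their adjustments are zero.
GROUP_TERMS = {
    "college_hitter": (0.0, 0.0),
    "college_pitcher": (-6.373, 2.025),
    "hs_hitter": (-10.575, 3.916),
    "hs_pitcher": (-10.593, 4.068),
}


def projected_war(pick, group):
    """Fitted WAR over the first four team-control years (natural log assumed)."""
    offset, slope_adj = GROUP_TERMS[group]
    return INTERCEPT + offset + (LOG_PICK + slope_adj) * math.log(pick)


# Rank on Keith Law's board stands in for draft slot, as in the table above.
board = [
    (1, "Corey Ray", "college_hitter"),
    (2, "Jason Groome", "hs_pitcher"),
    (3, "Delvin Perez", "hs_hitter"),
    (4, "Mickey Moniak", "hs_hitter"),
    (5, "A.J. Puk", "college_pitcher"),
    (6, "Braxton Garrett", "hs_pitcher"),
    (7, "Blake Rutherford", "hs_hitter"),
    (8, "Kyle Lewis", "college_hitter"),
    (9, "Matt Manning", "hs_pitcher"),
    (10, "Nick Senzel", "college_hitter"),
    (11, "Nolan Jones", "hs_hitter"),
    (12, "Joey Wentz", "hs_pitcher"),
    (13, "Riley Pint", "hs_pitcher"),
    (14, "Ian Anderson", "hs_pitcher"),
    (15, "Forrest Whitley", "hs_pitcher"),
]

for rank, name, group in board:
    print(f"{rank:>2} {name:<18} {projected_war(rank, group):.1f}")
```

The nearly flat 4.4 to 4.9 range for the high schoolers comes straight from the interaction terms, which almost cancel the Log(Pick) slope for those two groups.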
This study had its flaws: My sample size was small and most of the data I used were several years old. A fixed number of players are drafted in the first round each year, which makes it difficult to do a rigorous analysis without analyzing decades of data. And since big league front offices are getting smarter and smarter, they’re probably not making the same mistakes their forefathers did 15 years ago. If I were to condense my findings into one sentence, that sentence would be closer to “In the recent past, college hitters (and likely pitchers too, though the evidence isn’t as strong) have been undervalued relative to their high school counterparts in the first few picks of the draft” than “College players are better bets than high school players in the first few picks of the draft.” Still, I think these findings are both recent enough and significant enough to keep in mind when teams make their picks today.
None of this is to say teams should avoid drafting high school players in the early portion of the first round. It would have been idiotic for the Nationals to pass on Bryce Harper just because Delmon Young and Matt Bush turned out worse than Evan Longoria. At any given pick, it’s entirely possible that a high schooler truly is the best player available. Although the history of high school hitters isn’t pretty, Delvin Perez and Mickey Moniak might actually be the best players available when their names are called on draft day. However, the Cubs thought the same about Josh Vitters. Ditto the Royals with Chris Lubanski, the Orioles with Billy Rowell and Matt Hobgood and the Rays with Tim Beckham and Delmon Young. An outsized share of teams that took high school players — particularly high school hitters — with their single-digit first round picks in the recent past ultimately regretted their decision, and it’s happened often enough that it probably isn’t a coincidence.
References and Resources
- Keith Law, ESPN, “MLB draft Big Board: Ray tops list, Groome next up”
Would be interesting to see if age is a significant predictor among first-round HS hitters. Overall, love the concept here.
Nice work! This appears to dovetail with what we know regarding pitchers having much less of an aging curve during their 20s than hitters do. So college hitters would be farther along in their development, theoretically easier to project, and closer to their peak than high school hitters.
Nice work. This refines previous work that implies teams are taking too many pitchers too high, and would get more payback from hitters — it’s specifically college hitters they’d get more from, this says.
Can the data set support splitting hitters by side of the defensive spectrum? I feel like taking 1B high is a disaster…
What are the curves you show for wins versus draft position, are those a model you fitted? It looks like a pretty low-parameter model; I’d be curious to see a LOESS smoothing for comparison.
Someone at BP did a similar study and found that it wasn’t even necessarily a split between HS/College as much as it was being young for whatever level the player is at. For example, a top HS prospect who is 19 when drafted has statistically done far worse than a top HS prospect who’s drafted at 17.
Wouldn’t bonus paid be a better “value” proxy than draft position? Or try that and get back to us! See how close the results are/aren’t.