Searching for Biases in the First Round of the Draft

by Chris Mitchell
June 9, 2016

The numbers heavily favor Corey Ray being productive early in his career. (via Univ. of Louisville Sports Info.)

Major League Baseball’s amateur draft is something of a crapshoot. Even in the early portion of the first round, where teams reap significant value from their picks on average, busts aren’t uncommon. Delmon Young, Matt Bush, Bryan Bullington, Matt Hobgood. The list goes on. Meanwhile, Mike Trout fell all the way to the 25th overall pick, and quickly blossomed into a generational talent.

Some degree of volatility is expected due to the sheer difficulty of what teams are tasked with doing. Figuring out how good a 21-year-old college kid will be at age 25 is a tall order, and doing so for a high schooler is taller still. Amateur players are inherently risky assets, which almost certainly helps explain the unevenness of first-round draft returns.

But I’d posit that at least some of that unevenness isn’t purely the result of random chance. Some of it likely comes down to individual cases: More thorough scouting of the New Jersey area might have predicted Mike Trout’s star potential, for example. But perhaps there’s also something that runs deeper than that. Perhaps there are systemic biases in the way teams evaluate first-round talent, causing certain types of players to be overvalued or undervalued in the first round.

As you probably guessed, I did some math to search for these biases. If you’re not interested in reading about the nitty-gritty and would rather just read my conclusions, feel free to skip to the paragraph that starts with “That last paragraph was a bit wonky.” Everyone else, let’s get nerdy.

I ran some regressions to identify possible biases. My data set includes all players drafted within the first 30 picks from 2002-2009, which I split into hitters and pitchers. I excluded draftees who did not sign. It’s still a little early to know what to make of players drafted in 2010 and later, especially for some of the high school draftees like 2010 No. 2 overall pick Jameson Taillon, who just made his major league debut this week.

My dependent variable was WAR over a player’s first four years of team control. For players who haven’t completed their first four team control years yet, I filled in the remaining years using rest-of-season (RoS) 2016 projections from the FanGraphs depth charts. Most of the players I had to use projections for weren’t good enough for it to make a noticeable difference: Only six from this category were projected for more than 1 WAR this year.

To start, I included a variable for a player’s draft selection in my regression to act as a proxy for his perceived value. It isn’t uncommon for a player to fall a few picks in the first round of the draft for signability reasons, which muddies the calculus a bit. But by and large, the spot at which a player is drafted correlates strongly with his perceived value.

From that baseline, I tested out the following variables in my regressions: a player’s handedness, his height, and whether he was drafted out of high school or college. For hitters, I also tested defensive position at the time of the draft. If teams are acting optimally, the variables pertaining to a player — his background, handedness, height and position — would not turn up statistically significant. After controlling for draft position, a college draftee would be no more or less likely to achieve big league success than a high school draftee. Nor would a left-handed pitcher compared to a righty, or a short player compared to a taller one.

On the pitching side, this is exactly what I found. Nothing came up significant. None of the characteristics I looked at — handedness, educational attainment or height — appeared to be over- or under-valued in the first round. Teams seem to be acting optimally.

Things looked much more interesting on the hitting side, however. The data suggest teams haven’t been valuing all demographics appropriately in the draft.

Since I’m a good boy who tries to avoid overfitting my statistical models, I partitioned my data set into two separate pieces before I got started: one included draftees from 2002-2005 and the other included draftees from 2006-2009. Within both subsets, I found a similar-looking interaction between a hitter’s draft selection and whether he was drafted out of high school or college. Here are the resulting coefficients R spit out when I applied these variables to the full data set: 2002-2009.

REGRESSION COEFFICIENTS PREDICTING WAR FOR DRAFTEES

Variable	Coefficient	P-Value
Intercept	15.564	0.00
Log(Pick)	-4.172	0.00
College Pitcher	-6.373	0.07
High School Hitter	-10.575	0.00
High School Pitcher	-10.593	0.05
Log(Pick) * College Pitcher	2.025	0.14
Log(Pick) * High School Hitter	3.916	0.01
Log(Pick) * High School Pitcher	4.068	0.05

That last paragraph was a bit wonky, and the interpretation of the regression coefficients is the opposite of straightforward. But the main takeaway is this: In the early part of the first round, college hitters tend to outperform high school hitters by a substantial margin. A visual might help make this clear.
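The same point can be read straight off the coefficients by collapsing them into one intercept and one slope per player type. This is a rough sketch, not the original R code: it assumes college hitters are the baseline category (they have no row of their own in the table) and that Log(Pick) is a natural log. The names `OFFSETS` and `line_for` are made up for illustration.

```python
# Coefficients as published in the table above (rounded to three decimals).
# Assumptions: college hitter is the baseline category, Log(Pick) is a natural log.
INTERCEPT = 15.564
LOG_PICK = -4.172

# Each player type shifts the baseline intercept and the baseline slope.
OFFSETS = {
    "College Hitter": (0.0, 0.0),
    "College Pitcher": (-6.373, 2.025),
    "High School Hitter": (-10.575, 3.916),
    "High School Pitcher": (-10.593, 4.068),
}


def line_for(player_type):
    """Return (expected WAR at pick 1, change in WAR per unit of ln(pick))."""
    d_intercept, d_slope = OFFSETS[player_type]
    return INTERCEPT + d_intercept, LOG_PICK + d_slope


for player_type in OFFSETS:
    intercept, slope = line_for(player_type)
    print(f"{player_type:20s} pick 1: {intercept:6.2f}   slope: {slope:6.2f}")
```

Under those assumptions, a college hitter starts near 15.6 expected WAR at the first pick and drops steeply from there, while a high school hitter starts near 5.0 and barely moves as the pick number rises.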

Here’s what it looks like when I also include pitchers in the regression.

The high school versus college trend holds for pitchers as well, though the gap wasn’t large enough to trip the “statistically significant” alarm when I looked exclusively at pitchers. That doesn’t necessarily mean the high school versus college disparity doesn’t also exist for pitchers. The data just aren’t as convincing as they are on the hitting side, where the effect is more pronounced. The relative lack of high school pitchers selected in the first round (less than 17 percent of my data set) might explain why nothing super-substantial turned up.

Those graphs and equations are cool and all, but I’ve barely named any of the players who made them look the way they do. Let’s change that. The table below lists the high school hitters selected with the first 10 picks in the first round from 2002-2009.

HIGH SCHOOL HITTERS DRAFTED IN THE EARLY FIRST ROUND, 2002-2009

Year	Pick	Team	Name	WAR in First Four Years
2005	1	Diamondbacks	Justin Upton	13.8
2003	1	Devil Rays	Delmon Young	0.2
2008	1	Rays	Tim Beckham	0.0
2004	1	Padres	Matt Bush	0.0
2002	2	Devil Rays	Melvin Upton Jr.	15.3
2007	2	Royals	Mike Moustakas	9.3
2008	3	Royals	Eric Hosmer	6.1
2007	3	Cubs	Josh Vitters	0.0
2009	3	Padres	Donavan Tate	0.0
2003	5	Royals	Chris Lubanski	0.0
2008	6	Marlins	Kyle Skipworth	0.0
2003	6	Cubs	Ryan Harvey	0.0
2002	7	Brewers	Prince Fielder	12.8
2002	8	Tigers	Scott Moore	0.0
2004	9	Rockies	Chris Nelson	0.0
2006	9	Orioles	Billy Rowell	0.0
2005	10	Tigers	Cameron Maybin	7.8
2003	10	Rockies	Ian Stewart	2.5
Median	4.0			0.0
Average	4.8			3.8

An awful lot of zeroes in there. To name names: Delmon Young, Tim Beckham, Matt Bush, Josh Vitters, Donavan Tate, Chris Lubanski, Kyle Skipworth, Ryan Harvey, Scott Moore, Chris Nelson, Billy Rowell. That’s 11 high school hitters selected in the single-digits over an eight-year span who were essentially useless.

The list of college hitters looks noticeably better.

COLLEGE HITTERS DRAFTED IN THE EARLY FIRST ROUND, 2002-2009

Year	Pick	Team	Name	WAR in First Four Years
2005	2	Royals	Alex Gordon	11.1
2009	2	Mariners	Dustin Ackley	7.0
2003	2	Brewers	Rickie Weeks	6.8
2008	2	Pirates	Pedro Alvarez	5.9
2006	3	Devil Rays	Evan Longoria	28.7
2005	3	Mariners	Jeff Clement	0.0
2005	4	Nationals	Ryan Zimmerman	17.6
2009	4	Pirates	Tony Sanchez	0.1
2008	5	Giants	Buster Posey	23.7
2005	5	Brewers	Ryan Braun	23.0
2007	5	Orioles	Matt Wieters	14.2
2005	7	Rockies	Troy Tulowitzki	16.0
2003	7	Orioles	Nick Markakis	14.2
2008	7	Reds	Yonder Alonso	3.8
2007	7	Brewers	Matt LaPorta	0.0
2006	8	Reds	Drew Stubbs	9.2
2008	8	White Sox	Gordon Beckham	5.7
2008	10	Astros	Jason Castro	6.3
2002	10	Rangers	Drew Meyer	0.0
Median	5.0			7.0
Average	5.3			10.2

Evan Longoria, Ryan Zimmerman, Buster Posey, Ryan Braun, Troy Tulowitzki. Several hitters from this group were (or still are) among the best players in baseball. Even the “flops” weren’t complete zeroes in most cases: Dustin Ackley and Rickie Weeks look like stars next to Delmon Young and Josh Vitters. You don’t need a fancy regression model to see the difference between these two lists.
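A quick tally of the two hitter tables puts numbers on that difference. The WAR values below are copied from the tables above; the one-win cutoff for a “bust” is an arbitrary threshold chosen for this illustration, not something from the regression.

```python
from statistics import mean, median

# WAR in first four years, copied row by row from the two hitter tables above.
hs_hitters = [13.8, 0.2, 0.0, 0.0, 15.3, 9.3, 6.1, 0.0, 0.0,
              0.0, 0.0, 0.0, 12.8, 0.0, 0.0, 0.0, 7.8, 2.5]
college_hitters = [11.1, 7.0, 6.8, 5.9, 28.7, 0.0, 17.6, 0.1, 23.7, 23.0,
                   14.2, 16.0, 14.2, 3.8, 0.0, 9.2, 5.7, 6.3, 0.0]

BUST_CUTOFF = 1.0  # illustrative threshold: under one win in four years


def summarize(war):
    """Return the count, median, mean and number of sub-cutoff players."""
    return {
        "n": len(war),
        "median": median(war),
        "mean": round(mean(war), 1),
        "under_one_win": sum(1 for w in war if w < BUST_CUTOFF),
    }


print("High school hitters:", summarize(hs_hitters))
print("College hitters:    ", summarize(college_hitters))
```

By this count, 11 of the 18 high school hitters finished under one win, compared with four of the 19 college hitters.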

Here are the high school pitchers.

HIGH SCHOOL PITCHERS DRAFTED IN THE EARLY FIRST ROUND, 2002-2009

Year	Pick	Team	Name	WAR in First Four Years
2002	3	Reds	Chris Gruler	0.0
2002	4	Orioles	Adam Loewen	1.4
2004	5	Brewers	Mark Rogers	0.9
2002	5	Expos	Clint Everts	0.0
2009	5	Orioles	Matt Hobgood	0.0
2002	6	Royals	Zack Greinke	10.2
2009	6	Giants	Zack Wheeler	5.0*
2006	7	Dodgers	Clayton Kershaw	23.7
2004	7	Reds	Homer Bailey	7.6
2003	9	Rangers	John Danks	12.5
2007	9	Diamondbacks	Jarrod Parker	5.0
2009	9	Tigers	Jacob Turner	0.6
2007	10	Giants	Madison Bumgarner	17.4
Median	6.0			5.0
Average	6.5			6.5

*Estimated based on projections

And here are the college pitchers.

COLLEGE PITCHERS DRAFTED IN THE EARLY FIRST ROUND, 2002-2009

Year	Pick	Team	Name	WAR in First Four Years
2007	1	Devil Rays	David Price	19.5
2009	1	Nationals	Stephen Strasburg	15.3
2006	1	Royals	Luke Hochevar	6.6
2002	1	Pirates	Bryan Bullington	0.0
2004	2	Tigers	Justin Verlander	17.3
2006	2	Rockies	Greg Reynolds	0.0
2004	3	Mets	Philip Humber	2.6
2003	3	Tigers	Kyle Sleeth	0.0
2004	4	Devil Rays	Jeff Niemann	6.5
2008	4	Orioles	Brian Matusz	4.9
2003	4	Padres	Tim Stauffer	3.4
2007	4	Pirates	Daniel Moskos	0.2
2006	4	Pirates	Brad Lincoln	0.1
2006	5	Mariners	Brandon Morrow	7.7
2005	6	Blue Jays	Ricky Romero	8.6
2007	6	Nationals	Ross Detwiler	4.0
2006	6	Tigers	Andrew Miller	2.4
2004	6	Indians	Jeremy Sowers	2.2
2009	7	Braves	Mike Minor	6.9
2003	8	Pirates	Paul Maholm	9.6
2009	8	Reds	Mike Leake	5.7
2007	8	Rockies	Casey Weathers	0.0
2005	8	Devil Rays	Wade Townsend	0.0
2002	9	Rockies	Jeff Francis	10.8
2005	9	Mets	Mike Pelfrey	8.9
2006	10	Giants	Tim Lincecum	25.9
2004	10	Rangers	Thomas Diamond	0.0
Median	5.0			4.9
Average	5.2			6.3

On the whole, the list of college pitchers does not look any better than the list of high school arms, though, at the very top, David Price, Stephen Strasburg and Justin Verlander blow away the high school kids who were selected with the first few picks.

The far left-hand side of the above graph is the real story here, though what’s happening on the right side is also noteworthy. It seems the script flips towards the end of the first round: High school picks turn out better than college picks. I’m hesitant to say there’s much to this trend. We’re talking a difference of just a couple of WAR on average over several years, which isn’t enough to get too worked up about, especially in a small sample. Furthermore, some of this can be explained by the beautiful outlier that is Mike Trout. Take him out of the mix, and the lines for hitters move much closer together. What jumps out to me is that high schoolers taken at the beginning of the first round haven’t fared much better than their counterparts taken toward the end. This suggests the gap between the elite high school players and second-tier high school players might not be as large as the industry perceives it to be.
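For a rough sense of where the script flips, the fitted lines can be set equal to each other and solved for the pick number. This sketch uses the rounded coefficients from the regression table and again assumes a college hitter baseline and a natural log, so the crossover points are approximate. The helper names are hypothetical.

```python
import math

# Published (rounded) coefficients. Assumptions: college hitter is the
# baseline category and Log(Pick) is a natural log.
INTERCEPT, LOG_PICK = 15.564, -4.172
SHIFTS = {  # (intercept shift, slope shift) relative to the baseline
    "College Hitter": (0.0, 0.0),
    "College Pitcher": (-6.373, 2.025),
    "High School Hitter": (-10.575, 3.916),
    "High School Pitcher": (-10.593, 4.068),
}


def expected_war(pick, player_type):
    """Fitted WAR over the first four team-control years."""
    d_int, d_slope = SHIFTS[player_type]
    return INTERCEPT + d_int + (LOG_PICK + d_slope) * math.log(pick)


def crossover_pick(type_a, type_b):
    """Pick at which the two fitted lines predict the same WAR.

    Setting the two predictions equal cancels the shared baseline terms,
    leaving ln(pick) = (a_int - b_int) / (b_slope - a_slope).
    """
    a_int, a_slope = SHIFTS[type_a]
    b_int, b_slope = SHIFTS[type_b]
    return math.exp((a_int - b_int) / (b_slope - a_slope))


hitters_cross = crossover_pick("College Hitter", "High School Hitter")
pitchers_cross = crossover_pick("College Pitcher", "High School Pitcher")
print(f"Hitter lines cross near pick {hitters_cross:.1f}")
print(f"Pitcher lines cross near pick {pitchers_cross:.1f}")

hs_first = expected_war(1, "High School Hitter")
hs_last = expected_war(30, "High School Hitter")
print(f"High school hitter, pick 1 vs. pick 30: {hs_first:.1f} vs. {hs_last:.1f}")
```

Under these assumptions the hitter lines cross near pick 15 and the pitcher lines near pick 8, and the fitted high school hitter line drops by less than one win between the first pick and the 30th.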

One of the more annoying quirks of analyzing prospects is that you have to wait a few years to really know how they turn out. Due to this limitation, my analysis looks exclusively at players who were drafted several years ago. This means that any bias that existed toward high school players vis-à-vis college players may have already been corrected.

Anecdotally, it seems the gap has narrowed, particularly due to better results on the high school side. Bryce Harper, Manny Machado, Carlos Correa and Francisco Lindor were all effectively drafted out of high school, and have already blossomed into stars. At the same time, though, Bubba Starling, Michael Choice, Courtney Hawkins and Dylan Bundy have failed hard.

Additionally, there are players who have had disappointing starts to their careers and are beginning to teeter on the fence of the failed prospect graveyard. This group includes Byron Buxton, Delino DeShields, Alex Jackson, Archie Bradley, Max Fried, Tyler Kolek and Jameson Taillon. It’s tough to say anything definitive about the recent drafts without knowing what will become of the Buxtons and Jacksons. But unlike with the 2002-2009 group, the list of failures doesn’t completely overwhelm the list of successes.

With all this in mind, let’s consider what it all might mean for this year’s crop of draftees. I’ve organized the top 15 from Keith Law’s recent ranking of draft prospects in the table below.

TOP 15 2016 DRAFT PROSPECTS

Rank	Name	Position	Type
1	Corey Ray	OF	College
2	Jason Groome	LHP	High School
3	Delvin Perez	SS	High School
4	Mickey Moniak	OF	High School
5	A.J. Puk	LHP	College
6	Braxton Garrett	LHP	High School
7	Blake Rutherford	OF	High School
8	Kyle Lewis	OF	College
9	Matt Manning	RHP	High School
10	Nick Senzel	3B	College
11	Nolan Jones	SS	High School
12	Joey Wentz	LHP	High School
13	Riley Pint	RHP	High School
14	Ian Anderson	RHP	High School
15	Forrest Whitley	RHP	High School

SOURCE: Keith Law

Eleven of the top 15 prospects are high schoolers (or high school-aged players), the prospect archetype that has been most prone to failure in the past. Only three of the top 15 are college hitters, the archetype that’s been most successful.

Using nothing but their rank on this list (which I’m using as a proxy for draft slot) and their player type (high school hitter, college hitter, high school pitcher or college pitcher), let’s see what my math suggests these players will do over their first four years of team control. To be perfectly clear, these “projections” don’t take stats, scouting or any other knowledge into account. They’re dumb, terrible projections that are dumb and terrible on purpose. They’re just meant to demonstrate the magnitude of the varying production for each demographic of draftee.

DRAFT PROSPECTS’ EXPECTED WAR BASED ON PLAYER TYPE

Rank	Name	Position	Type	WAR in First Four Years
1	Corey Ray	OF	College	15.6
2	Jason Groome	LHP	High School	4.9
3	Delvin Perez	SS	High School	4.7
4	Mickey Moniak	OF	High School	4.6
5	A.J. Puk	LHP	College	5.7
6	Braxton Garrett	LHP	High School	4.8
7	Blake Rutherford	OF	High School	4.5
8	Kyle Lewis	OF	College	6.9
9	Matt Manning	RHP	High School	4.7
10	Nick Senzel	3B	College	6.0
11	Nolan Jones	SS	High School	4.4
12	Joey Wentz	LHP	High School	4.7
13	Riley Pint	RHP	High School	4.7
14	Ian Anderson	RHP	High School	4.7
15	Forrest Whitley	RHP	High School	4.7
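For anyone who wants to check the arithmetic, the table can be rebuilt from the regression coefficients listed earlier. The sketch below assumes college hitters are the baseline category and that Log(Pick) is a natural log. Both assumptions are inferred from the published numbers rather than stated in the text, and the function and variable names are hypothetical.

```python
import math

# Published (rounded) coefficients. Assumptions: college hitter is the
# baseline category and Log(Pick) is a natural log.
INTERCEPT, LOG_PICK = 15.564, -4.172
SHIFTS = {  # (intercept shift, slope shift) relative to the baseline
    "College Hitter": (0.0, 0.0),
    "College Pitcher": (-6.373, 2.025),
    "High School Hitter": (-10.575, 3.916),
    "High School Pitcher": (-10.593, 4.068),
}

# Rank and player type for the top 15, taken from the table above.
PROSPECTS = [
    (1, "Corey Ray", "College Hitter"),
    (2, "Jason Groome", "High School Pitcher"),
    (3, "Delvin Perez", "High School Hitter"),
    (4, "Mickey Moniak", "High School Hitter"),
    (5, "A.J. Puk", "College Pitcher"),
    (6, "Braxton Garrett", "High School Pitcher"),
    (7, "Blake Rutherford", "High School Hitter"),
    (8, "Kyle Lewis", "College Hitter"),
    (9, "Matt Manning", "High School Pitcher"),
    (10, "Nick Senzel", "College Hitter"),
    (11, "Nolan Jones", "High School Hitter"),
    (12, "Joey Wentz", "High School Pitcher"),
    (13, "Riley Pint", "High School Pitcher"),
    (14, "Ian Anderson", "High School Pitcher"),
    (15, "Forrest Whitley", "High School Pitcher"),
]


def expected_war(rank, player_type):
    """Fitted WAR over the first four team-control years, using rank as the pick."""
    d_int, d_slope = SHIFTS[player_type]
    return INTERCEPT + d_int + (LOG_PICK + d_slope) * math.log(rank)


projections = [round(expected_war(rank, kind), 1) for rank, _, kind in PROSPECTS]
for (rank, name, kind), war in zip(PROSPECTS, projections):
    print(f"{rank:2d}  {name:17s} {kind:20s} {war:5.1f}")
```

With those two assumptions, the rounded output matches the WAR column in the table for all 15 players.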

This study had its flaws: My sample size was small and most of the data I used were several years old. A fixed number of players are drafted in the first round each year, which makes it difficult to do a rigorous analysis without analyzing decades of data. And since big league front offices are getting smarter and smarter, they’re probably not making the same mistakes their forefathers did 15 years ago. If I were to condense my findings into one sentence, that sentence would be closer to “In the recent past, college hitters (and likely pitchers too, though the evidence isn’t as strong) have been undervalued relative to their high school counterparts in the first few picks of the draft” than “College players are better bets than high school players in the first few picks of the draft.” Still, I think these findings are both recent enough and significant enough to keep in mind when teams make their picks today.

None of this is to say teams should avoid drafting high school players in the early portion of the first round. It would have been idiotic for the Nationals to pass on Bryce Harper just because Delmon Young and Matt Bush turned out worse than Evan Longoria. At any given pick, it’s entirely possible that a high schooler truly is the best player available. Although the history of high school hitters isn’t pretty, Delvin Perez and Mickey Moniak might actually be the best players available when their names are called on draft day. However, the Cubs thought the same about Josh Vitters. Ditto the Royals with Chris Lubanski, the Orioles with Billy Rowell and Matt Hobgood and the Rays with Tim Beckham and Delmon Young. An outsized share of teams that took high school players — particularly high school hitters — with their single-digit first round picks in the recent past ultimately regretted their decision, and it’s happened often enough that it probably isn’t a coincidence.

References and Resources

Keith Law, ESPN, “MLB draft Big Board: Ray tops list, Groome next up”

5 Comments

evo34

9 years ago

Would be interesting to see if age is a significant predictor among first-round HS hitters. Overall, love the concept here.

jdbolick

Nice work! This appears to dovetail with what we know regarding pitchers having much less of an aging curve during their 20s than hitters do. So college hitters would be farther along in their development, theoretically easier to project, and closer to their peak than high school hitters.

Jetsy Extrano

Nice work. This refines previous work that implies teams are taking too many pitchers too high, and would get more payback from hitters — it’s specifically college hitters they’d get more from, this says.

Can the data set support splitting hitters by side of the defensive spectrum? I feel like taking 1B high is a disaster…

What are the curves you show for wins versus draft position, are those a model you fitted? It looks like a pretty low-parameter model; I’d be curious to see a LOESS smoothing for comparison.

SideshowRaheem

Someone at BP did a similar study and found that it wasn’t even necessarily a split between HS/College as much as it was being young for whatever level the player is at. For example, a top HS prospect who is 19 when drafted has statistically done far worse than a top HS prospect who’s drafted at 17.

gary

Wouldn’t bonus paid be a better “value” proxy than draft position? Or try that and get back to us! See how close the results are/aren’t.
