Geographic Biases in the MLB Draft

by Dan Meyer and Alex Smith
December 3, 2014

Stephen Strasburg actually comes from the undervalued four-year category. (via Cathy T)

Historical data on the MLB Rule 4 draft makes it clear that the most domestic talent comes from the states of California, Texas, and Florida. This is without question. However, the purpose of this report is to examine where the best MLB talent comes from and if there are any inherent geographic biases in the way teams have drafted. Further, are there specific groups and types of players who are continually undervalued?

To begin, we divided players into six unique categories:

- Four-year college pitchers
- Junior college pitchers
- High school pitchers
- Four-year college position players
- Junior college position players
- High school position players

One note on these categories. When we say four-year college, we don’t mean the player in question had to attend all four years. We’re simply distinguishing between four-year colleges and junior colleges.

Using MLB draft data provided by Baseball-Reference, we broke down each of the above groups by state and found the total numbers of players selected, their average draft position, and whether or not they made it to the majors. From there, we calculated the percentage of players from each state that reached the majors, each state’s percent share of total drafted players across all states, and each state’s percent share of total major leaguers across all states.

The variable we will key in on is the difference between the percent share of players drafted and the percent share of major leaguers. In a world where teams draft perfectly, this difference would be zero. That is, in each state players of a certain subgroup are drafted in the same proportion that they are represented in the big leagues. Any deviation from zero represents an inefficiency.

For example, of all four-year college pitchers drafted, three percent were selected from South Carolina schools.
We also find three percent when we examine South Carolina’s share of all four-year college pitchers who made the majors. Thus, South Carolina four-year pitchers have a “difference” of zero percent, and we conclude that these pitchers are being properly evaluated.

On the other hand, California high school position players represented 23.3 percent of all high school position players drafted. Yet, 32.1 percent of all high school position players who were drafted and made it to the majors were from California. This whopping difference of -8.8 percent indicates that California high school position players historically have been severely undervalued.

Our first hypothesis was that MLB teams were not drafting enough players from the heavily populated baseball areas or baseball power states. To test this, we examined the differences between the percent share of players drafted and percent share of major leaguers for each state in all six subgroups. We then found the correlation between total players drafted from each state and that difference. A negative correlation would imply that a large negative difference, like the one we saw with California high school position players, is associated with a large number of draftees from that state.

As the graph shows, there is in fact a strong negative correlation between total players drafted from a given state and the difference between the state’s share of total drafted players and its share of total drafted major leaguers. The largest negative correlation (-0.89) falls with high school position players, in large part due to California (though the correlation remains strong even with California removed).
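The arithmetic behind the difference and the correlation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the study: the function names `share_difference` and `pearson` are hypothetical, and the input is limited to the five high school position player rows published in the tables later in this article, so the resulting correlation is not comparable to the -0.89 computed across all states.

```python
import math


def share_difference(draft_share, mlb_share):
    """Draft share minus MLB share, both given as percentages."""
    return draft_share - mlb_share


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Assumption: only the five all-time high school position player rows that
# appear in the article's tables, not the full state-by-state data set.
# Columns: (state, players drafted, draft share %, MLB share %)
HS_POSITION_ROWS = [
    ("CA", 2258, 23.3, 32.1),
    ("FL", 1074, 11.1, 12.5),
    ("MD", 122, 1.3, 0.4),
    ("CO", 116, 1.2, 0.3),
    ("NJ", 231, 2.4, 1.3),
]

drafted = [row[1] for row in HS_POSITION_ROWS]
# Differences are recomputed from the rounded shares, so a row can differ
# from the published "Difference" column by a tenth of a point.
differences = [share_difference(row[2], row[3]) for row in HS_POSITION_ROWS]
r = pearson(drafted, differences)

for (state, count, _, _), diff in zip(HS_POSITION_ROWS, differences):
    print(f"{state}: drafted={count}, difference={diff:+.1f}")
print(f"correlation across these five rows only: {r:.2f}")
```

With these rows the coefficient comes out negative: the states with the most draftees (California, Florida) also carry the most negative differences. Running the same two functions over every state in a subgroup would be the natural next step, but that full data set is not reproduced here.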
There could be several underlying reasons for the undervaluing of players from states with high MLB draftee numbers, including:

- Teams overrate tools on high school position players from non-baseball power states
- Teams underrate the value of having consistent at-bats against quality competition
- Teams underrate the value of better hitting instruction in baseball power states
- Teams do not want to invest too heavily in one region
- Scouts are inefficiently allocated throughout the country

Following the conclusion that teams were not drafting enough players from baseball power states, we decided to narrow our focus to recent drafts. Was this merely a historical trend that teams had improved upon over time? To accomplish this we performed the same study, but only for players drafted between the years 2004 and 2008.

This graph shows that teams have been eliminating much of the geographic bias, particularly with junior college players and high school pitchers. However, there is still a significant amount of geographic bias in how high school position players are drafted (-0.66). Over this period California high school position players also still had a significant difference of -4.8 percent.

We originally thought this bias existed primarily because of the overvaluing of the tools of high school position players from non-baseball power states. It must be considered, though, that while a plus-tools player from a non-baseball state might be less likely to make the major leagues, he may be more likely to become an impact major league player than a player with below-average tools from a baseball power state.

To look into this, we performed the same analysis, but instead of examining whether or not a player simply made it to the big leagues, we looked at whether or not they remained in The Show as a regular. This time, difference represents the percent share of players drafted from a specific state minus the percent share of MLB regulars from a specific state.
For the purpose of this study, a regular was a player who played four full seasons (150 games x four for position players and 25 appearances x four for pitchers).

When only MLB regulars are considered, the negative correlations for most of the groups sharply tilt towards zero. However, there is still a large geographic bias against baseball power states when it comes to high school position players.

Once we established this bias existed, we wanted to see what types of players from which states were specifically being over- or undervalued. The tables below show the five most over- and undervalued types of players at each level since the creation of the draft (ADP is average draft position).

Undervalued Four-Year

| Position | State | # Drafted | ADP | # Making MLB | % Made It | Draft Share | MLB Share | Difference |
|---|---|---|---|---|---|---|---|---|
| Pos Player | CA | 2027 | 503.0 | 305 | 15.1% | 16.8% | 20.1% | -3.3% |
| Pos Player | AZ | 356 | 404.6 | 91 | 25.6% | 3.0% | 6.0% | -3.0% |
| Pitcher | CA | 1618 | 506.0 | 278 | 17.2% | 14.7% | 16.5% | -1.8% |
| Pitcher | TX | 840 | 516.6 | 153 | 18.2% | 7.6% | 9.1% | -1.4% |
| Pitcher | LA | 392 | 525.8 | 76 | 19.4% | 3.6% | 4% | -0.9% |

Overvalued Four-Year

| Position | State | # Drafted | ADP | # Making MLB | % Made It | Draft Share | MLB Share | Difference |
|---|---|---|---|---|---|---|---|---|
| Pitcher | OR | 157 | 470.4 | 12 | 7.6% | 1.4% | 0.7% | 0.7% |
| Pitcher | PA | 271 | 572.0 | 29 | 10.7% | 2.5% | 1.7% | 0.7% |
| Pitcher | NY | 325 | 563.1 | 37 | 11.4% | 2.9% | 2.2% | 0.8% |
| Pos Player | NC | 509 | 557.5 | 45 | 8.8% | 4.2% | 3.0% | 1.3% |
| Pos Player | NY | 319 | 585.0 | 12 | 3.8% | 2.7% | 0.8% | 1.9% |

Undervalued High School

| Position | State | # Drafted | ADP | # Making MLB | % Made It | Draft Share | MLB Share | Difference |
|---|---|---|---|---|---|---|---|---|
| Pos Player | CA | 2258 | 571.1 | 296 | 13.1% | 23.3% | 32.1% | -8.8% |
| Pitcher | CA | 1478 | 596.5 | 186 | 12.6% | 19.2% | 21.3% | -2.0% |
| Pitcher | TX | 600 | 620.4 | 83 | 13.8% | 7.8% | 9.5% | -1.7% |
| Pos Player | FL | 1074 | 679.1 | 115 | 10.7% | 11.1% | 12.5% | -1.4% |
| Pitcher | LA | 118 | 617.6 | 19 | 16.1% | 1.5% | 2.2% | -0.6% |

Overvalued High School

| Position | State | # Drafted | ADP | # Making MLB | % Made It | Draft Share | MLB Share | Difference |
|---|---|---|---|---|---|---|---|---|
| Pos Player | MD | 122 | 672.1 | 4 | 3.3% | 1.3% | 0.4% | 0.8% |
| Pitcher | OK | 169 | 586.4 | 12 | 7.1% | 2.2% | 1.4% | 0.8% |
| Pos Player | CO | 116 | 855.7 | 3 | 2.6% | 1.2% | 0.3% | 0.9% |
| Pos Player | NJ | 231 | 567.2 | 12 | 5.2% | 2.4% | 1.3% | 1.1% |
| Pitcher | Canada | 184 | 888.0 | 11 | 6.0% | 2.4% | 1.3% | 1.1% |

Undervalued Junior College

| Position | State | # Drafted | ADP | # Making MLB | % Made It | Draft Share | MLB Share | Difference |
|---|---|---|---|---|---|---|---|---|
| Pitcher | CA | 1436 | 534.1 | 136 | 9.5% | 30% | 34% | -5% |
| Pos Player | CA | 1711 | 486.7 | 131 | 7.7% | 36.0% | 40.4% | -4.5% |
| Pos Player | FL | 745 | 565.3 | 60 | 8.1% | 15.7% | 18.5% | -2.9% |
| Pos Player | GA | 153 | 501.3 | 17 | 11.1% | 3.2% | 5.2% | -2.0% |
| Pitcher | CO | 45 | 548.9 | 8 | 17.8% | 1% | 2% | -1% |

Overvalued Junior College

| Position | State | # Drafted | ADP | # Making MLB | % Made It | Draft Share | MLB Share | Difference |
|---|---|---|---|---|---|---|---|---|
| Pitcher | GA | 124 | 540.1 | 4 | 3.2% | 3% | 1% | 2% |
| Pos Player | TN | 88 | 514.8 | 0 | 0.0% | 1.8% | 0.0% | 1.8% |
| Pitcher | WA | 212 | 540.4 | 9 | 4.3% | 4% | 2% | 2% |
| Pos Player | AZ | 232 | 567.1 | 9 | 3.9% | 4.9% | 2.8% | 2.1% |
| Pos Player | WA | 153 | 487.9 | 3 | 2.0% | 3.2% | 0.9% | 2.3% |

These tables clearly illustrate that the most overvalued groups tend to come from non-baseball power states such as New York, Pennsylvania, Washington, Maryland, Colorado, and Oregon, while the undervalued groups tend to come from the baseball power states of California, Texas, and Florida. Arizona junior colleges are the one major exception to this rule.

Also notable is that teams are most efficient when drafting players from four-year colleges. The range from most overvalued to most undervalued for four-year college players was only 5.2 percent, much lower than the range for junior college players of 7.3 percent, and high school players of 9.9 percent.

We mentioned earlier that teams are getting better at eliminating the bias, but as our chart showed, there still was significant correlation between share difference and the number of players drafted from that state. The sample sizes are smaller, so there is more variability, but here are the top five most over- and undervalued types of players at each level from 2004 to 2008 (junior college is excluded because the sample size was too small).
Undervalued Four-Year

| Position | State | # Drafted | ADP | # Making MLB | % Made It | Draft Share | MLB Share | Difference |
|---|---|---|---|---|---|---|---|---|
| Pos Player | CA | 249 | 570.6 | 46 | 18.5% | 16.3% | 20.4% | -4.1% |
| Pos Player | AZ | 35 | 354.9 | 13 | 37.1% | 2.3% | 5.8% | -3.5% |
| Pitcher | CA | 255 | 546.6 | 51 | 20.0% | 15.5% | 18.5% | -3.1% |
| Pitcher | OH | 41 | 618.2 | 15 | 36.6% | 2.5% | 5.5% | -3.0% |
| Pos Player | TN | 56 | 576.3 | 12 | 21.4% | 3.7% | 5.3% | -1.7% |

Overvalued Four-Year

| Position | State | # Drafted | ADP | # Making MLB | % Made It | Draft Share | MLB Share | Difference |
|---|---|---|---|---|---|---|---|---|
| Pitcher | FL | 109 | 674.1 | 15 | 13.8% | 6.6% | 5.5% | 1.2% |
| Pitcher | OK | 49 | 694.3 | 5 | 10.2% | 3.0% | 1.8% | 1.2% |
| Pos Player | KY | 25 | 617.6 | 1 | 4.0% | 1.6% | 0.4% | 1.2% |
| Pitcher | PA | 41 | 703.5 | 1 | 2.4% | 2.5% | 0.4% | 2.1% |
| Pos Player | NY | 40 | 805.9 | 0 | 0.0% | 2.6% | 0.0% | 2.6% |

Undervalued High School

| Position | State | # Drafted | ADP | # Making MLB | % Made It | Draft Share | MLB Share | Difference |
|---|---|---|---|---|---|---|---|---|
| Pitcher | TX | 69 | 667.0 | 17 | 24.6% | 9.7% | 15.6% | -5.9% |
| Pos Player | CA | 149 | 752.8 | 21 | 14.1% | 18.3% | 23.1% | -4.8% |
| Pos Player | FL | 112 | 729.5 | 16 | 14.3% | 13.8% | 17.6% | -3.8% |
| Pitcher | CA | 117 | 724.0 | 21 | 18.0% | 16.5% | 19.3% | -2.8% |
| Pitcher | NJ | 23 | 909.6 | 6 | 26.1% | 3.2% | 5.5% | -2.3% |

Overvalued High School

| Position | State | # Drafted | ADP | # Making MLB | % Made It | Draft Share | MLB Share | Difference |
|---|---|---|---|---|---|---|---|---|
| Pitcher | PA | 13 | 856.5 | 0 | 0.0% | 1.8% | 0.0% | 1.8% |
| Pos Player | PR | 52 | 641.9 | 4 | 7.7% | 6.4% | 4.4% | 2.0% |
| Pitcher | WA | 22 | 872.5 | 1 | 4.6% | 3.1% | 0.9% | 2.2% |
| Pitcher | OK | 16 | 947.8 | 0 | 0.0% | 2.3% | 0.0% | 2.3% |
| Pitcher | Canada | 30 | 833.4 | 2 | 6.7% | 4.2% | 1.8% | 2.4% |

There is slightly more variation, with Florida four-year pitchers swinging the other way; however, the recent data remain fairly consistent with the overall findings. Again, the most overvalued groups tend to be northern or non-baseball power states, while the most undervalued are still year-round baseball power states.

Moving forward, it’s important to consider the nature of the amateur evaluation scene and how national showcase tournaments and events are organized with this data in mind.
For example, in the Area Code Games and East Coast Professional Showcase (two of the most prominent events on the summer circuit for high school players), individuals are selected by scouts to participate on teams based on their region, with only one team representing each region. However, if significantly more players are drafted from regions such as Southern California, Texas, and Florida, and more of those players are likely to be major leaguers and impact major leaguers, teams would find further value if a greater percentage of participants in these events came from those regions. In a given year, there likely are dozens of players from baseball power regions who are more talented than their counterparts from non-baseball power regions, but who lose the opportunity to compete in these events, in front of hundreds of scouts, due to the location of their hometown.

If national showcase events over-represent players from non-baseball power states and leave potential draftees and MLB regulars off the roster, can these players left home at least expect to be noticed by their area scouts?

Through Baseball America’s scouting directory, we have mapped the locations of most teams’ area scouts versus the locations of successful and unsuccessful draftees. A dot represents the presence of a scout or player hailing from that town. Therefore, if several players are coming out of one school, it will still show up as one dot.

The concentration of players and scouts located in Southern California makes the map harder to interpret, but from the looks of it, scouts are well deployed throughout the country. Many scouts are prepared to watch and evaluate the volume of players in the power states, while fewer are in the non-baseball power states and regions like the Northeast. California, Texas, and Florida are being undervalued, but it does not appear to be due to a lack of eyes on them.
If teams are properly allocating their scouts, then something else must be accounting for the under-appreciation of these players. As outlined above, it could be due to scouts underrating the value of facing quality competition and the lasting effects that has on players as they turn pro and work through the minor leagues. Perhaps it is these players’ more advanced approach and feel for the game that allows them to reach the majors at a higher rate. There is no doubt scouts take these things into account in their evaluations, making the true reason for this imbalance difficult to pin down.

Regardless, we have shown that there is a real, persistent bias in the draft that undervalues players from baseball power states and overvalues players from non-baseball power states. Though more scouts would help (we’re in favor of creating more baseball jobs), based on the distribution of scouts and players throughout the country we can begin to rule out lack of manpower as a reason for letting this talent go unidentified. It might be as simple as not wanting to lean too much on one region. Nevertheless, a shift needs to take place, as the imbalance is too large to ignore.

References and Resources

- Percent “difference” reflects (Drafted State/Total Drafted) - (Majors State/Total Majors)
- Draft data courtesy of Baseball-Reference
- Scouting locations courtesy of Baseball America