The young and the aging

As soon as a player starts racking up numbers in the big leagues, we know a lot about where his performance might go from there. Researchers have worked hard to understand how players develop at that level, and rightfully so: You can’t make respectable projections without at least some approximate knowledge in that department.

What’s more difficult, and has accordingly received less attention, is how players develop before they hit the majors. For minor leaguers, we can use major league equivalencies (MLEs) and approximate, but the wide range of run environments and competition levels in the minors means that we need to make more and more assumptions. The results are ever more approximate.

This is one of the few areas where working with college data might actually be a plus. Usually, the paucity of available stats, along with the vast number of players who will never go pro, means that college-level analysis is daunting. Here it solves some problems. We don’t need to worry about players changing leagues, or the biases involved in focusing only on players who reached a certain level. Many college players give us two, three, even four years of performance in the same park against the same competition.

College and the minors

What college data can give us, then, is approximate development patterns for ages 19-22. Not every college player is the same age, of course, but since date of birth is hard to come by for many players, it’s more practical to assume that freshmen are 19, sophomores are 20, and so on. Athletes tend to be a bit old for their grade, and baseball is played in the spring semester, so it’s a reasonable assumption.

Naturally, we need to use extreme caution in applying these development patterns outside of college ball. If we look at only those players who end up in the pros, those who go to college and those who don’t profile very differently. A very polished 18-year-old may not go to college at all, while an 18-year-old who hasn’t convinced scouts of his ability to play is much more likely to start work on a degree.

Even if we are limited to using college results for college players, there’s a lot to be gained. Let’s turn to the numbers and see what they have to tell us.

Preliminary aging patterns

To generate year-to-year multipliers, I identified all the players over the last few years who amassed at least 150 at-bats in consecutive seasons. I assigned each pair of seasons to one of three groups: Freshman/Sophomore, Sophomore/Junior, or Junior/Senior. I then adjusted the numbers for park and strength of schedule. Both factors can vary a bit from year to year for a given school, and the adjustment also allows us to include transfers.

Finally, for each of the three groups, I averaged every pertinent pair of seasons, weighting for playing time. For instance, a player who had 250 at-bats in each season is weighted more heavily than one who had 170 and 220. Across all of Division One college baseball, here’s what I came up with:

Pair      Players    $H  $D+T   $HR   $BB    $K   $SB $SB/ATT  
Fr -> So      335  1.05  1.26  1.73  1.19  0.98  1.34    1.11  
So -> Jr      612  1.04  1.18  1.70  1.18  1.01  1.32    1.09  
Jr -> Sr      746  1.03  1.18  1.57  1.18  1.01  1.30    1.07

For a small table, there’s a lot of information here. Each entry is a multiplier representing the average increase in a given statistical category. For instance, the average freshman with 150 or more at-bats got 5 percent more hits (per at-bat) as a sophomore. A few more observations:

  • “Players” refers to the number of pairs of seasons I was able to use. The small number of Fr/So pairs is no surprise, since relatively few freshmen are given the opportunity to start.
  • It’s also as expected that while most stats increase over time, extra-base hits increase more than hits, and home runs even more. Not only are players naturally getting bigger and stronger, but many factors increase their ability and motivation to get bigger. Better nutrition advice is available, high-tech college gyms are at hand, and coaches and peer groups expect players to take advantage of them.
  • I expected that we’d see more improvement in strikeout rate. As it is, strikeout rate barely changes. It’s possible that, as players get stronger and learn to swing for the fences, their batting eye improves but they accept a higher number of swings and misses. Perhaps the two effects roughly cancel out.
  • $SB refers to stolen bases per time on base. Thus, since the average player gets on base more often as he ages, he steals even more than these multipliers suggest. Compared to the increase in SB success rate, it’s clear that players run a lot more as they get older. This may have more to do with coaching (and a coach’s confidence in his players) than pure skill.
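
For readers who want to try this on their own data, here is a minimal sketch of how a multiplier like the ones in the table could be computed. It is not necessarily the exact method used above. The weighting scheme (the harmonic mean of the two at-bat totals) and the names `SeasonPair`, `pair_weight` and `multiplier` are assumptions made for illustration. The sketch also assumes the stat counts have already been adjusted for park and strength of schedule.

```python
from dataclasses import dataclass

MIN_AB = 150  # playing-time cutoff described in the text


@dataclass
class SeasonPair:
    ab1: int      # at-bats in the earlier season
    stat1: float  # adjusted count of one stat (e.g. HR) in the earlier season
    ab2: int      # at-bats in the later season
    stat2: float  # adjusted count of the same stat in the later season


def pair_weight(ab1, ab2):
    # Assumption: weight each pair by the harmonic mean of its two AB totals,
    # so 250/250 counts for more than 170/220. Other schemes would also work.
    return 2 * ab1 * ab2 / (ab1 + ab2)


def multiplier(pairs):
    # Ratio of the weighted later-season rate to the weighted earlier-season rate.
    qualified = [p for p in pairs if p.ab1 >= MIN_AB and p.ab2 >= MIN_AB]
    later = sum(pair_weight(p.ab1, p.ab2) * p.stat2 / p.ab2 for p in qualified)
    earlier = sum(pair_weight(p.ab1, p.ab2) * p.stat1 / p.ab1 for p in qualified)
    return later / earlier


# Made-up season pairs; the third is dropped for falling short of 150 AB.
pairs = [
    SeasonPair(250, 10, 250, 15),
    SeasonPair(170, 5, 220, 11),
    SeasonPair(100, 2, 200, 10),
]
print(round(multiplier(pairs), 2))
```

Dividing the two weighted sums, rather than averaging each player’s individual ratio, avoids dividing by zero for a hitter with no home runs in his first season.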

Elite conferences

But wait—maybe we can do better. As I noted above, we can make some assumptions about a player simply based on whether he goes pro or goes to college. They might not hold in all instances, but for a project like development patterns, they may be very pertinent.

What if the same is true for different segments of Division One? Certainly it seems wrong to treat heavily recruited hitters who end up at LSU the same way we treat the guys who end up starting at Iona or Alcorn State. Intuitively, the gap between high-profile programs and other programs resembles the gap between 19-22-year-olds in college and 19-22-year-olds in the minors.

Most of us care more about those higher-profile players, so let’s focus on them. Using the conference strength data I shared about a year ago, I’ve arbitrarily divided D-1 into “elite conferences” (the 82 teams in the ACC, SEC, Pac-10, WCC, Big 12, Big East, Big West and Colonial USA) and everybody else. It’s not a perfect distinction, but for projects that are explicitly geared toward identifying or analyzing draft-worthy talent, focusing on “elite” conferences seems more appropriate.

With that in mind, let’s look at the same table, only for players in these elite conferences:

Pair      Players    $H  $D+T   $HR   $BB    $K   $SB  $SB/CS  
Fr -> So      133  1.02  1.30  1.68  1.18  1.00  1.49    1.10  
So -> Jr      221  1.01  1.15  1.62  1.18  1.03  1.25    1.06  
Jr -> Sr      188  1.01  1.10  1.57  1.11  1.02  1.35    1.12

With just a few exceptions, there’s less improvement from year to year for players in elite conferences than for the average D-1 player. This makes sense. In general, the more impressive the player in prime recruiting years, the less room for improvement. The typical stud hitter who heads to Rice or Florida State has already received years of high-quality coaching, while someone at a second-tier school may be hearing helpful tips for the first time.

Perhaps most marked in this second table is the difference between Soph/Jr and Jr/Sr improvements. Keep in mind that the amateur draft has a lot to say about which juniors stick around for a senior year. Hundreds of players are plucked from the college ranks each year, a disproportionate number of them juniors from elite conferences. Thus, there are fewer pro-level prospects in the Jr/Sr pool than in the Soph/Jr pool.

Illustration: Zach Cox

The numbers we’ve seen so far are awfully abstract. Let’s look at a few illustrations to get a firmer grasp of what these multipliers mean.

Let’s start with a 2010 sophomore. Zach Cox of Arkansas State has only one year of college experience, but he’s draft-eligible this year, making his spring campaign particularly closely watched. The Red Wolves play in the Sun Belt conference, outside of my “elite” group. We could make an argument that a premium talent like Cox should be treated differently, but for today, let’s use the multipliers for all of Division One.

Here are Cox’s numbers from 2009, along with “projected” 2010 numbers, using only the raw ’09 stats and the generic multipliers:

Year   AB   H  2B  3B  HR  BB  SO    AVG    OBP    SLG  
2009  199  53  15   2  13  20  65  0.266  0.345  0.558  
2010  199  56  19   3  22  24  64  0.281  0.370  0.739

Before we get too excited, remember that this is not a projection; it’s just an illustration. If we were to boldly claim that we believe Cox will post numbers like this, we’d be assuming not only that he has something like an “average” profile for a college sophomore, but also that his 2009 numbers represented his actual talent level, uninfluenced by too much luck.

Again, it’s worth noting the tiny observed change from year to year in strikeout rate. The knock on Cox is just that. As you can see from his ’09 stats, he K’d about one in three trips to the plate. If his freshman-to-sophomore transition works in the generic manner, he’ll whiff just as often this year. Of course, Cox isn’t generic, so we’ll have to wait and see if he can improve his contact rate.
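
As a rough sketch of the arithmetic behind the 2010 line above: multiply each 2009 counting stat by the matching Fr -> So multiplier from the all-Division One table, hold at-bats constant, and round. The names below (`FR_SO_ALL_D1`, `apply_multipliers`) are made up for illustration, and applying the $D+T multiplier to doubles and triples separately is an assumption that happens to match the table. OBP is left out because it needs hit-by-pitches and sacrifice flies, which the table doesn’t show.

```python
import math

# Fr -> So multipliers from the all-Division One table.
# Assumption: the $D+T multiplier is applied to doubles and triples separately.
FR_SO_ALL_D1 = {"H": 1.05, "2B": 1.26, "3B": 1.26, "HR": 1.73, "BB": 1.19, "SO": 0.98}


def round_half_up(x):
    # Conventional rounding (Python's built-in round() rounds halves to even).
    return math.floor(x + 0.5)


def apply_multipliers(line, multipliers):
    # Hold at-bats constant and scale each counting stat.
    out = {"AB": line["AB"]}
    for stat, m in multipliers.items():
        out[stat] = round_half_up(line[stat] * m)
    return out


def avg(line):
    return line["H"] / line["AB"]


def slg(line):
    total_bases = line["H"] + line["2B"] + 2 * line["3B"] + 3 * line["HR"]
    return total_bases / line["AB"]


cox_2009 = {"AB": 199, "H": 53, "2B": 15, "3B": 2, "HR": 13, "BB": 20, "SO": 65}
cox_2010 = apply_multipliers(cox_2009, FR_SO_ALL_D1)

print(cox_2010)
print(f"AVG {avg(cox_2010):.3f}  SLG {slg(cox_2010):.3f}")
```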

Illustration: Christian Colon

Let’s move on to a junior. Fullerton shortstop Christian Colon is perhaps the most highly touted junior for this year’s draft, making him worthy of our attention.

In this case, I can show you two years of his stats, again along with what his 2010 numbers would look like if he ages like the generic college junior.

Year   AB   H  2B  3B  HR  BB  SO  SB  CS    AVG    OBP    SLG  
2008  243  80  12   2   4  19  25  13   4  0.329  0.406  0.444  
2009  255  91  16   2   8  24  24  15   7  0.357  0.442  0.529  
2010  255  92  18   2  13  28  25  20   8  0.361  0.452  0.600

The typical Soph/Jr improvement, especially for a player in an elite conference, isn’t as striking as the Fr/Soph jump, so this doesn’t exactly show Colon becoming the next Matt LaPorta. But if he follows the generic path, his already superlative OBP will climb even higher, and his stolen base rate will creep past 70 percent.
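
The stolen-base figures take one more step, since $SB is a rate per time on base rather than per at-bat. The sketch below reproduces Colon’s 2010 SB and CS under two assumptions that are mine and not stated above: times on base is approximated as H + BB (hit-by-pitches aren’t in the table), and the last column of the elite table is treated as a multiplier on stolen-base success rate.

```python
import math


def round_half_up(x):
    return math.floor(x + 0.5)


# So -> Jr multipliers from the elite-conference table.
SB_PER_TIME_ON_BASE = 1.25
SB_SUCCESS_RATE = 1.06  # assumption: the last column multiplies success rate

# Colon's 2009 line, plus the generic 2010 hits and walks shown in the table.
h09, bb09, sb09, cs09 = 91, 24, 15, 7
h10, bb10 = 92, 28

# Assumption: times on base is approximated as H + BB (no HBP data).
sb_rate_09 = sb09 / (h09 + bb09)
sb10 = round_half_up(sb_rate_09 * SB_PER_TIME_ON_BASE * (h10 + bb10))

# Scale the success rate, then back out attempts and caught stealing.
success_09 = sb09 / (sb09 + cs09)
success_10 = success_09 * SB_SUCCESS_RATE
attempts10 = round_half_up(sb10 / success_10)
cs10 = attempts10 - sb10

print(sb10, cs10, f"{sb10 / attempts10:.1%}")
```

Under those assumptions the sketch lands on 20 steals in 28 attempts, which matches the table and puts the success rate just over 70 percent.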

Illustration: Blake Dean

To round out the set, let’s look at one more. Blake Dean, first baseman at Louisiana State, was selected in the 10th round by the Twins last year, but opted to return to school. Given the plateau his numbers hit from his sophomore to junior year, it’s understandable for him to think he could put together a big senior campaign and do better in his second try at the draft.

Here are Dean’s college stats, along with a 2010 line generated from his ’09 performance and the typical Jr/Sr improvement observed in elite conferences:

Year   AB   H  2B  3B  HR  BB  SO    AVG    OBP    SLG  
2007  206  65  12   3   7  20  25  0.316  0.366  0.505  
2008  269  95  18   3  20  35  46  0.353  0.432  0.665  
2009  259  85  18   0  17  50  37  0.328  0.432  0.595  
2010  259  86  20   0  27  56  38  0.332  0.445  0.722

Once again, this isn’t a projection. But in this case, given a sophomore season in which Dean might have outperformed his skill level and a junior year when he didn’t, the numbers seem plausible.

There are a lot of directions to go from here. “Elite” conferences aren’t the only way we can break down aging patterns. In the tried-and-true path of projectors throughout history, we could look at aging patterns by position, by body type, or any number of other variables.

Combined with appropriate regression to the mean (and maybe some summer league stats thrown in for good measure), we might just have ourselves the beginning of a projection system for college players.

eric errickson

Neat idea and nice work. However, you are missing an important part of the data, as most of these players, particularly those who are elite, also play ball in the summer at a variety of programs. It seems to me that if you could follow all of a young player’s work, the data would be better.

Jeff Sackmann

Well, this is embarrassing: Zack Cox plays for Arkansas, not Arkansas State.  Using the “elite” multipliers doesn’t work out that differently, though.

Jeff Sackmann

Using summer league data would be nice, but it is really, really complicated.

First, there’s the wood-bat translation aspect.

Second, there are about a dozen summer leagues that elite players go to, and the difficulty level is different for each one.  In order to incorporate that data and have any faith in the results, we’d need to be confident in our wood-bat translations, AND confident in our knowledge of the level of competition in each of those leagues relative to D1 as a whole.

Mike Rogers

Jeff, do you mean Conference USA when you say “Colonial USA” or the Colonial Athletic Conference? I’m assuming Conference USA. If not, I need to add the CAA to my college hitters spreadsheet.

Mike Rogers

Also, Dean took a step back last year, but I still think he was overall very, very close to his sophomore year due to an increased BB% of about 4% and a decrease in his K% by about 3%. I actually have his wRC+ (the way fangraphs calculates it) as being 125 his sophomore year and 119 his junior year.

Adam Foster

Hey Jeff,

It seems like this study would be stronger if you used plate appearances over at-bats, especially given that you’re weighting playing time.


Jeff Sackmann

Mike – yes, Conference USA.

Adam – I ran it a few different ways, and the difference in results is tiny.

Adam Foster

Why did you decide to go with at-bats over plate appearances, Jeff?