KATOH: Forecasting Major League Hitting with Minor League Stats
During the summer I crunched some numbers in the FanGraphs Community section aiming to figure out how a minor league hitter’s age and minor league stats can predict his future in the major leagues. I named my methodology KATOH after Yankees prospect Gosuke Katoh, who spent all of the 2014 season playing second base for the Class-A Charleston RiverDogs of the South Atlantic League. At the time, Katoh was running a strikeout rate close to 40 percent, but at just 19 years old, he was two or three years younger than most of his competition.
Katoh’s situation caused me to realize that I had no idea what was truly important in evaluating the performance of a player like him. Should I be worried that he’s striking out four out of every 10 times? Does his respectable 12 percent walk rate make his strikeouts less of a problem? Should I ignore the stats altogether and just give him the benefit of the doubt for even playing in full-season ball as a teenager? I hadn’t the foggiest.
To get a better sense of what mattered most, I turned to the reams of minor league data available through Baseball-Reference. Using these data, I ran some probit regressions, which tell us how a variety of inputs can predict the likelihood of an event that has two possible outcomes. For example, it might give the probability of a prospect’s making it to the majors based on his age and league-adjusted strikeout percentage, walk percentage, isolated slugging, batting average on balls in play, and frequency of stolen base attempts.
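As a rough illustration of how a probit model turns inputs into a probability, here is a minimal Python sketch. The coefficients, intercept, and stat lines are invented for the example and are not KATOH’s fitted values; the inputs are also raw rather than league-adjusted, purely to keep the example short.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_probability(intercept: float, coefs: dict, stats: dict) -> float:
    """Probit link: P(outcome) = Phi(intercept + sum of coef * stat)."""
    z = intercept + sum(coefs[name] * stats[name] for name in coefs)
    return normal_cdf(z)

# Hypothetical coefficients, made up for illustration only.
HYPOTHETICAL_INTERCEPT = 6.0
HYPOTHETICAL_COEFS = {"age": -0.30, "k_pct": -0.04, "bb_pct": 0.02, "iso": 3.0}

# Two hypothetical hitters with identical stat lines, three years apart in age.
young = {"age": 19, "k_pct": 25.0, "bb_pct": 10.0, "iso": 0.150}
older = dict(young, age=22)

p_young = probit_probability(HYPOTHETICAL_INTERCEPT, HYPOTHETICAL_COEFS, young)
p_older = probit_probability(HYPOTHETICAL_INTERCEPT, HYPOTHETICAL_COEFS, older)
print(f"age 19: {p_young:.2f}, age 22: {p_older:.2f}")
```

With a negative age coefficient, the same stat line yields a lower probability for the older player, which is the kind of age effect the regressions are meant to pick up.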
For the first iteration of KATOH, I looked exclusively at minor league hitters and estimated the probability they would play in the majors. This time around, I’ve expanded the system to slap a probability on a wider variety of outcomes–namely WAR thresholds–that a player might achieve through age 28. The thresholds I chose were loosely based on the 20-80 scale for overall value laid out by Kiley McDaniel.
Career Value Through Age 28
WAR Threshold | Future Value |
At least one game in the majors | 35 |
>4 WAR | 40 |
>6 WAR | 45 |
>8 WAR | 50 |
>10 WAR | 55 |
>12 WAR | 60 |
>16 WAR | 65 |
I didn’t go any higher than 16 WAR, as the samples of players who met any higher thresholds were generally too small to do much with. To keep things relatively simple, and to allow for apples-to-apples comparisons across different WAR thresholds, I stuck with the same group of variables within each minor league level. So for each level, I included as many variables as I could while keeping the models statistically significant across most performance thresholds. I also used these probabilities to estimate a player’s expected WAR total through age 28.
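The article does not spell out how the threshold probabilities were combined into an expected WAR figure. One plausible approach, sketched below in Python purely as an assumption, is to difference adjacent threshold probabilities into bucket probabilities and weight each bucket by a representative WAR value. The `BUCKET_WAR` values are hypothetical guesses, not numbers from the analysis, so the output should not be expected to match the published projections.

```python
# Thresholds from the Career Value table; 0.0 stands in for
# "at least one game in the majors".
THRESHOLDS = [0.0, 4.0, 6.0, 8.0, 10.0, 12.0, 16.0]

# Hypothetical representative WAR for a player who clears threshold i
# but not threshold i + 1. Illustrative guesses only.
BUCKET_WAR = [1.0, 5.0, 7.0, 9.0, 11.0, 14.0, 22.0]

def expected_war(exceed_probs):
    """exceed_probs[i] is the probability of clearing THRESHOLDS[i]."""
    if len(exceed_probs) != len(THRESHOLDS):
        raise ValueError("need one probability per threshold")
    total = 0.0
    for i, p in enumerate(exceed_probs):
        p_next = exceed_probs[i + 1] if i + 1 < len(exceed_probs) else 0.0
        # Separately fitted models may not be perfectly monotone,
        # so a negative bucket probability is floored at zero.
        bucket_prob = max(p - p_next, 0.0)
        total += bucket_prob * BUCKET_WAR[i]
    return total

# A made-up set of threshold probabilities for one hitter.
example = [0.94, 0.73, 0.64, 0.56, 0.50, 0.48, 0.38]
print(round(expected_war(example), 2))
```

The top bucket dominates the total, which is why the choice of a representative value for players above 16 WAR matters far more than the choices for the lower buckets.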
The table below gives a summary of which stats proved to be significant at each minor league level. This analysis includes minor league data going back to 1990, the first year in which full-season A-ball was broken up into Class-A (A) and Class-A Advanced (A+). R+ refers to the advanced rookie leagues (the Appalachian and Pioneer Leagues), while R- includes the Arizona and Gulf Coast Leagues.
Significant Statistics by Level
Level | Age | BB% | K% | ISO | BABIP | SB% | Age² | K%² |
AAA | Yes | Yes | Yes | Yes | Yes | Yes | Yes | |
AA | Yes | Yes | Yes | Yes | Yes | Yes | ||
A+ | Yes | Yes | Yes | Yes | ||||
A | Yes | Yes | Yes | Yes | Yes | |||
A- | Yes | Yes | Yes | Yes | Yes | |||
R+ | Yes | Yes | Yes | Yes | Yes | Yes | ||
R- | Yes | Yes | Yes |
*SB% = (SB+CS) / (Singles + Walks + HBP)
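The footnote’s formula can be written as a small Python helper. This is a sketch of the raw calculation only; the league adjustment applied in the analysis is not shown, and the function name and sample line are hypothetical.

```python
def sb_attempt_rate(sb, cs, singles, walks, hbp):
    """Stolen base attempts per time reaching first base:
    (SB + CS) / (Singles + Walks + HBP)."""
    opportunities = singles + walks + hbp
    if opportunities == 0:
        return 0.0  # no times on first, so no attempt rate to speak of
    return (sb + cs) / opportunities

# Hypothetical season line: 20 SB, 5 CS, 80 singles, 40 walks, 5 HBP.
rate = sb_attempt_rate(sb=20, cs=5, singles=80, walks=40, hbp=5)
print(f"{rate:.1%}")
```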
I think the biggest takeaway here is that walk rate doesn’t matter very much at the lower levels of the minors. In fact, it’s not predictive at all for players in rookie ball or Low-A. And even as high as Double-A, a one percent change in strikeout rate affects a player’s projection by about 1.5 times as much as a one percent change in walk rate.
Intuitively, this makes sense. A hitter doesn’t need to be particularly good at hitting to run a high walk rate in the low minors, as pitchers at these levels often have little idea where the ball’s going. As a result, batters can get away with taking an ultra-passive approach in the hopes they’ll see four balls before they see three strikes. That strategy might work in Rookie ball or A-ball, but it can lose its effectiveness in the upper levels, where pitchers have a better handle on their control.
To be clear, this isn’t to say that a high-walk prospect is no more likely to make it than a low-walk prospect. Walk rate generally correlates with future success, but only because it’s collinear with ISO. Simply put, players who hit for power also tend to walk a lot, but it’s the power–rather than the walks–that predicts big league performance.
Trying to predict what any prospect will do in the majors is a fool’s errand, but it gets even more foolish with every step you take down the minor league ladder. A hitter who’s in Double- or Triple-A is facing somewhat advanced pitching, so his performance can give us at least some sense of how he’ll fare against big league pitching. For players in rookie ball, however, the stats tell us very little. Most of these players are teenagers, the vast majority of whom will never even sniff the majors, and the few that will make it are still a good four or five years away.
As a result, KATOH is somewhat wishy-washy on players in the low minors. Even in just deciding whether a player will crack the majors, most of the projections fall somewhere between one percent and 15 percent.
Unsurprisingly, KATOH isn’t great at predicting the successes of these low minors players. Considering all hitters with a KATOH projection since 1990, the table below shows the average residual–the difference between KATOH’s prediction and what actually happened (either zero or 100 percent)–divided by KATOH’s average prediction at each level. The greener the box, the better job KATOH did of guessing right. The system does pretty well with hitters in the high minors but has a really tough time with guys not yet in full-season ball, especially when it tries to do more than predict if a player will make it to the majors.
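The accuracy measure described above can be sketched in Python as follows. The text does not say whether the residuals are signed or absolute; this sketch assumes absolute residuals, and the sample predictions and outcomes are invented.

```python
def relative_error(predictions, outcomes):
    """Mean absolute residual divided by the mean prediction.

    predictions: projected probabilities between 0 and 1
    outcomes: 1 if the player cleared the threshold, else 0
    """
    if not predictions or len(predictions) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    n = len(predictions)
    mean_abs_residual = sum(abs(p - o) for p, o in zip(predictions, outcomes)) / n
    mean_prediction = sum(predictions) / n
    return mean_abs_residual / mean_prediction

# Invented examples: confident projections in the high minors,
# small and tightly bunched projections in rookie ball.
high_minors = relative_error([0.90, 0.80, 0.95, 0.70], [1, 1, 1, 0])
low_minors = relative_error([0.05, 0.10, 0.15, 0.02], [0, 0, 1, 0])
print(high_minors < low_minors)
```

Because the denominator is the average prediction, a single miss on a low-probability player inflates the ratio sharply, which is consistent with the weaker results reported for the lowest levels.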
By no means should a methodology like this replace the scouting aspect of prospect evaluation. As Dayn Perry famously wrote way back in 2003, choosing between stats and scouting is like choosing between beer and tacos. In other words, it’s a choice nobody should have to make, since both are great and are perhaps even better when consumed together.
We all know there’s more to a player’s potential than his stat line, especially for minor leaguers; in many cases, a good scouting report can be worth a dog’s age of statistical regressions. KATOH has no idea if a hitter possesses traits like bat speed, a loose swing, or a feel for the bat head if these skills aren’t translating into on-field performance, which makes it prone to being low on toolsy players who are still learning the intricacies of hitting. What KATOH does do, however, is tackle prospect evaluation from a 100 percent objective point of view, which I think can be useful in identifying statistical factors that may have been overlooked by traditional prospect evaluations.
Consider Yankees prospect Aaron Judge as an example. Judge, a 6-foot-7 outfielder, turned in an impressive .308/.419/.486 campaign between Low- and High-A last year, which vaulted him onto the prospect radar and earned him a spot on Keith Law’s midseason top 50 prospect list. KATOH isn’t sold. The system pegged Judge’s odds of playing in the majors at just 54 percent and gave him a measly 13 percent chance of accumulating more than four WAR through his age 28 season.
Regardless of what KATOH says, I don’t actually think Judge has just a 50/50 shot of cracking the big leagues over the next six years. Professional scouts, who actually have watched him play, think he has the tools to be a middle-of-the-order power threat, and I don’t doubt they know what they’re doing. I do, however, think KATOH might be on to something and that some evaluators may be putting a little too much stock in Judge’s seemingly impressive A-ball numbers.
For one thing, KATOH dings Judge for being a 22-year-old in A-ball, which makes him older than much of his competition. Believe it or not, he’s also older than Bryce Harper and Manny Machado, who already are starting to feel like established big leaguers at this point. But even after accounting for his age, a .905 OPS is nothing to sneeze at.
But it’s how Judge arrived at that .905 OPS that’s reason for concern. He hit for a solid .308 average with a modest amount of power, but a big chunk of his value came from his impressive 16 percent walk rate. Walks are cool and all, but as I showed earlier, the data suggest that walk rates mean next to nothing for hitters playing in the lower minors. Take away Judge’s walks, and his stat line suddenly looks like that of a nondescript minor leaguer with something of a strikeout problem.
The biggest flaw with KATOH is that it doesn’t consider defense. If an elite defensive shortstop and a lumbering first baseman had the same batting line, they would receive the same probabilities, which obviously doesn’t seem right. In an attempt to close this gap a bit, I developed some rules of thumb to apply to players at each position.
First, I took all hitters who played in the majors and had a KATOH projection from 1990 or later and assigned them to the position at which they played the most innings through age 28. From there, I looked to see how players from each position performed against their projections. Considering only major leaguers created some selection bias: A player who has made it to the majors is more likely than a randomly selected player to surpass any WAR threshold. So to tease out this factor, I adjusted the data in the table below to refer to each position’s performance relative to the average major leaguer.
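One way to read the adjustment described above is as a ratio of ratios: how often players at a position cleared a threshold relative to their average projection, divided by the same figure for all major leaguers in the sample. The Python sketch below follows that reading, which is an assumption about the method rather than a description of it; the function name and sample rows are hypothetical.

```python
from collections import defaultdict

def position_factors(players):
    """players: (position, projected_prob, cleared) tuples, major leaguers only.

    Returns, for each position, the actual-to-projected hit ratio divided
    by the same ratio for the whole sample. Values above 1 suggest the
    projections ran low for that position.
    """
    def hit_ratio(rows):
        actual = sum(1 for _, _, cleared in rows if cleared) / len(rows)
        projected = sum(prob for _, prob, _ in rows) / len(rows)
        return actual / projected

    rows = list(players)
    by_position = defaultdict(list)
    for row in rows:
        by_position[row[0]].append(row)

    baseline = hit_ratio(rows)  # strips out the majors-only selection effect
    return {pos: hit_ratio(group) / baseline for pos, group in by_position.items()}

# Invented sample: identical projections, different outcomes by position.
sample = [
    ("SS", 0.2, True), ("SS", 0.2, True), ("SS", 0.2, False), ("SS", 0.2, False),
    ("1B", 0.2, True), ("1B", 0.2, False), ("1B", 0.2, False), ("1B", 0.2, False),
]
factors = position_factors(sample)
print({pos: round(value, 2) for pos, value in factors.items()})
```

Dividing by the all-player baseline is what keeps the factors centered near 1 even though every player in the sample already reached the majors.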
Unsurprisingly, KATOH tends to underrate players who man premium defensive positions–like catcher, center field, and shortstop–whose offensive abilities may not be the most valuable part of their game. Keep in mind that this table refers to a player’s position in the major leagues, and not his current position. For example, a prospect playing shortstop in Rookie ball may be a second or third baseman long-term, so applying the shortstop adjustment to him may not be appropriate. This also doesn’t account for a player’s defensive ability compared to others at his respective position, so feel free to hedge up or down as you see fit.
Enough talk. Let’s apply all these models to current prospects and their 2014 stats. Without further ado, here are the players whose 2014 seasons give them the highest expected WAR through age 28 (minimum 200 plate appearances).
Highest Expected WAR Through Age 28, Based on 2014 Stats
Player | Age | Org | ’14 Level | MLB | >4 WAR | >6 WAR | >8 WAR | >10 WAR | >12 WAR | >16 WAR | WAR thru age 28 |
Mookie Betts | 21 | Red Sox | AA/AAA | 100% | 93% | 89% | 87% | 83% | 83% | 83% | 21.6 |
Joc Pederson | 22 | Dodgers | AAA | 100% | 81% | 81% | 79% | 74% | 66% | 65% | 18.3 |
Jose Ramirez | 21 | Indians | AAA | 100% | 83% | 74% | 66% | 58% | 58% | 55% | 16.3 |
Kris Bryant | 22 | Cubs | AA/AAA | 99% | 74% | 74% | 72% | 65% | 57% | 53% | 16.0 |
Jorge Soler | 22 | Cubs | R-/AA/AAA | 88% | 72% | 72% | 70% | 66% | 62% | 51% | 15.6 |
Gregory Polanco | 22 | Pirates | AAA | 99% | 72% | 66% | 61% | 49% | 49% | 49% | 14.6 |
Addison Russell | 20 | Athletics | A+/AA | 94% | 73% | 64% | 56% | 50% | 48% | 38% | 13.1 |
Arismendy Alcantara | 22 | Cubs | AAA | 99% | 60% | 56% | 49% | 40% | 40% | 40% | 12.3 |
Jon Singleton | 22 | Astros | AAA | 98% | 66% | 66% | 62% | 56% | 45% | 25% | 11.7 |
Joey Gallo | 20 | Rangers | A+/AA | 92% | 54% | 54% | 53% | 48% | 43% | 29% | 11.0 |
Alex Verdugo | 18 | Dodgers | R-/R+ | 82% | 55% | 44% | 41% | 41% | 40% | 34% | 10.8 |
Ozhaino Albies | 17 | Braves | R-/R+ | 80% | 53% | 52% | 44% | 37% | 36% | 34% | 10.6 |
Gleyber Torres | 17 | Cubs | R-/A- | 72% | 45% | 40% | 40% | 40% | 39% | 38% | 10.5 |
Willy Adames | 18 | Rays | A | 90% | 56% | 51% | 47% | 40% | 40% | 26% | 10.3 |
Dilson Herrera | 20 | Mets | A+/AA | 75% | 46% | 42% | 40% | 36% | 35% | 31% | 9.7 |
Wendell Rijo | 18 | Red Sox | A | 87% | 53% | 46% | 42% | 37% | 36% | 24% | 9.5 |
Ryan McMahon | 19 | Rockies | A | 87% | 50% | 45% | 42% | 36% | 36% | 23% | 9.3 |
Marcus Semien | 23 | A’s | AAA | 97% | 59% | 57% | 50% | 41% | 32% | 14% | 9.2 |
Alex Palma | 18 | Yankees | R- | 78% | 47% | 37% | 34% | 34% | 33% | 28% | 9.1 |
Nomar Mazara | 19 | Rangers | A/AA | 84% | 51% | 44% | 40% | 36% | 34% | 23% | 9.1 |
I’m currently working on a similar analysis for pitchers, which I anticipate will be more of a challenge. Any pitcher could go down with a career-altering injury at any time, which makes pitching prospects much more volatile than those of the hitting variety. As the saying goes: “There’s no such thing as a pitching prospect.” Throw in that there are no historical velocity data on minor leaguers, and creating statistical projections for pitching prospects almost feels like a waste of time. Almost.
Come what may, I’m going to take my best stab at it, so keep an eye out for that article in the next month or two. If you’re interested, I’ve also created a Google spreadsheet that includes projections for every player who logged at least one plate appearance in the minor leagues last year. Just keep in mind that the projections don’t mean much for players who have only a few plate appearances. Take these with a grain of salt the same way you would a player’s batting average or wOBA through only a handful of games. And if you’re really bored, I also made a separate spreadsheet containing calculated projections for historical seasons.
Chris – excellent analysis. I’ve been looking forward to this since your FanGraphs community page articles, and I’m really looking forward to checking out your spreadsheets when I get home tonight.
Is there any difference in how well KATOH performs for guys drafted out of college vs. drafted out of high school vs. international FA signees?
This might be part of why KATOH gave a low rating on Aaron Judge – even though he was 22 years old, it was just his first year since being drafted out of college.
I’ve always wondered about guys’ numbers vs. top college competition (ACC, SEC, etc.) as compared to, say, the Sally League. Are these guys not facing tougher competition until they reach Double-A?
Historically it’s been a mixed bag, partly because of the impact of switching to wooden bats in the pros. Some college sluggers had a much worse quality of contact even in the low minors, while others made the transition without losing much statistically.
I’d have to guess that the average level of competition at even the lowest minor-league level is higher than the best of college ball. For every Carlos Rodon anchoring a college pitching staff, there are several guys who won’t even get a sniff of an independent-league tryout. I do think there’s some value to be gleaned from college stats, but they may have less correlation to MLB success than the qualitative items that a scout would use.
Good point on Judge. KATOH might be a little hard on him considering it’s his first year since being drafted and he’s still adjusting to wooden bats. At the same time though, I’d argue that he was playing against some decent competition in college, and still wasn’t overly dominant (aside from the walks) in the low minors.
I agree with TZ’s comments about wooden bats and college stats.
I would also add that success in the Cape Cod Summer League, a rare (only?) place where the hitters use wooden bats, does not seem to translate. The Giants famously love to draft hitters who do well there, but the vast majority of them have fizzled out for one reason or another.
And college stats, yeah, huge difference, I remember the Giants drafted this catcher who hit .300 with some HR power, but all the rating services didn’t think much of him, and lo and behold, he struggled offensively in the minors and I don’t think he’s even made it up to the majors for a cup yet. And Gary Brown OUTHIT Evan Longoria in the same college conference (later though) and yet has struggled mightily in the upper minors. If that’s not a sign that college stats don’t matter as much as scouting for skills…
And it’s even worse for high school stats. Long ago, early in my understanding of amateurs, the Giants drafted this slick-fielding HS SS, and he hit a whole bunch of homers, so I was salivating. He ended up being a slick-fielding SS with not much power (Royce Clayton, if I remember right).
And great article, Chris, really enjoyed it. And thanks for sharing the spreadsheet.
Thanks, tz! You’ve been a loyal commenter since the beginning, and I appreciate your input.
Really enjoyed the depth of this article. Excellent work. Look forward to more.
Chris, great analysis. I’m working on something very similar to this (with some major distinctions) for my Honors Capstone at Syracuse University, and would love to pick your brain with regards to what you wrote.
Couple questions:
1) What are K2% and Age2?
2) Would minor league stats such as WRC+ be useful at all for this kind of analysis? Seems like if you’re using a stat that compares how a hitter is performing relative to other hitters at that same level that should have a fairly potent predictive power.
This is a good point. Did you adjust for league and such? Playing in the PCL is different than the International League, same with the Florida State League versus the CAL League.
1) Those are (K)^2 and (Age)^2.
2) All of the stats I used are league-adjusted (but not park-adjusted). Sorry, I should have mentioned that.
Great stuff!
Interesting read. A few comments/questions.
First, I agree with Carson above, you should attempt to factor in league/park differentials, or at least use something like wRC+.
Second, you say BB% has little significance, but I notice K% is almost always significant. How does BB/K ratio impact future predictions of these players?
Finally, as a prospect hound, I like to look at ISO*(BB/K) for prospects. Just curious if this is a worthwhile thing to look at, based on your data?
I did adjust all of the stats for league (but not for park). I did try to include a variable on the interaction between K% and BB% (i.e., K%/BB%), but it didn’t add anything beyond just strikeout and walk numbers. In other words, strikeouts are bad and walks are good, but there’s no evidence that the ratio between them matters. ISO*(BB/K) seems like a decent thing to look at for players in the upper levels of the minors, but not so much for players in the lower levels, where BB% doesn’t tell us very much about a player’s future success.
Chris, I was wondering if you could include a variable when the hitter has a BB/K ratio of over, say, 1.00? (or is there a way to do fuzzy logic with that in center?)
A study I’ve seen noted that ratio as the tipping point to being a good/great hitter. I know these can be time consuming to do, so no worries if you can’t easily do it, but perhaps you can consider it the next time you tackle something like this.
In any case, looking at players’ numbers as often as I do, that does seem to be around the point where hitters take a jump, from bad to average, average to good, good to great. As you note, it all seems pretty random other than strikeouts bad and walks good, but something about that threshold does seem to matter, at least anecdotally.
Chris,
Just curious how you defined a player “making it to the majors”? One AB, one game, full season?
Playing in one game counts as “making it to the majors”.
Wow …. the Cubs have 4 of the top 8.
Actually, the Cubs have 5. Addison Russell is erroneously listed as being in the A’s organization, but he was traded to the Cubs last year for Samardzija.
Oops. Sorry, you said 4 of the top 8. You are correct.
Wonderful!
If readily available, I would definitely be interested in seeing at least one prior year’s KATOH projections (maybe 2003, as almost all players will have reached age 28).
Were these based on analyzing just one season at a time? I would much prefer a weighted mean of the last several seasons (typically three, although the exact number can vary by stat).
Once you’ve established a player’s true talent level, along with his age and which level he competed at, then I believe your process could be valuable in establishing the odds that a player will reach a certain level in the future.
Great stuff, Chris. As far as analysis goes, this is far from the haphazard, buffet-style analysis stuff I tend to read.
I wrote a paper last term (I’m a M.S. Economics candidate) using an ordered probit to predict minor league players’ chances of future success based on the stats accumulated, and accolades earned, of established major leaguers past and present. But rather than using WAR thresholds, I used nominal thresholds: MVP, Silver Slugger, All-Star, regular starter (at least one full season [502 PA] during his career), etc. It was a little simplistic but produced generally similar results. I’d love to chat if you’re interested — I would greatly appreciate feedback — but no big deal if not.
(I assume you are able to see the email address I’m required to provide.)
Hi Alex,
I’d be glad to chat. Don’t think I can see your email, but feel free to hit me up on Twitter or to drop me a line at mitchell dot chris99 at gmail dot com.
Brett Lawrie and a few others have less WAR projected than what they’ve already accumulated? 😮
I am an unsabermetric guy who enjoys the different and sometimes entirely correct perspective which only sabermetrics provides. Too early to judge your system, which in any case will evolve, but by my eye Betts is by far the best combination of talent and application listed in your review.
Wow – very interesting article! Curious though about the implications…does this mean that Gleyber Torres that MLBPipeline had as the 14th best Cubs prospect, should be rated higher, and as you had him, the 4th best position player prospect ahead of: Almora, Schwarber, McKinney, etc?
Just a thought on your Dependent Variable. I’ve been playing with something similar to this, my results have been very similar to yours, but what if instead of using MLB WAR as a DV you used only MLB offensive WAR? That takes out a little bit of the defense-based noise, although obviously defense still matters for probability of making the majors and the amount of playing time a player is given, so you can’t get rid of that entirely, but it seems like you’re adding in a little extra by using total WAR.
Yep. Bringing in WAR from baserunning and defense only confuses the analysis.
I’m really excited by this system, Chris. Putting hard numbers on prospect evaluations is a hell of an ambition.
I see that Miguel Sano didn’t get featured in your spreadsheet, since he didn’t play a game in 2014. I’m guessing he’d have a favorable KATOH, with his massive ISO and reaching AA at age 20, but then there’s also the prodigious strikeout rate and lack of speed. Would you be able to say what his projections are based on his 2013 numbers?
Excellent stuff! I always love to read statistical analysis that tries to place a value on previously uncharted territories.
I do have a question: Is K% not being statistically significant at the A+ level merely noise in the analysis? I would think if it were significant at the level above and the level below it would be significant at that level, but maybe I’m missing something?
Thanks again.
Curious as to why Soler came out with an 88% chance of making MLB, easily the lowest of the top ten guys here.
Love the work.
Curious as to what led Soler to come out with an 88% chance of making MLB, easily the lowest of the top ten guys here. For the other levels (and total projected WAR) he doesn’t deviate from Bryant by more than 2% except at >12 yet Bryant is 11% more likely to make MLB. Is it a function of Soler having far fewer at-bats in American professional baseball than most of the players analyzed?
Theoretical question of course, as he has already played in MLB.
Great article. I can’t wait to read more of your work. I’ve always wondered the correlation of minor league stats to major league success.
One question: What level of correlation is required for a certain stat to be deemed “significant” by your standards? (referring to table 2)
Interesting reading. I pulled the data and meandered through for the Braves prospects. After eliminating those who are already there (Bethancourt for example) Jose Peraza’s MLB was more than Albies (96% to 80%) yet Albies was projected about 1.3 WAR greater through 28. I’m guessing that’s because of age but at the same time he played in Rookie ball while Peraza split his season between A and AA.
It also looks like the projection that Peraza’s WAR totals drops sharply after six years keeps him off your list even though he projects to have a higher total through age 28. Is that about right?
KATOH thinks Peraza is more likely to play in the majors, but thinks Albies is more likely to be a star. In other words, Peraza’s more of a low-ceiling/high-floor guy, while Albies is more of a high-ceiling/low-floor guy. Teenagers in the lower levels of the minors tend to fall into the latter category pretty often. I’ll be doing something on Albies for FanGraphs this week, so be on the lookout for that.
Chris,
Just stumbled onto this. Great read. My methodology that I use at Top500Prospects.Com also is not impressed with Aaron Judge. Seems we are both looking at something similar.
I realize I’m somewhat late for the party reading this three months after it was posted, but had a quick question:
Should the team they play for impact their chances to play in the majors and also impact their potential pre-28 WAR?
I can’t imagine any CF prospect for the Angels will see playing time in the majors anytime in the next few years, while other teams who are struggling or are unsure with their starter (or who are prone to injuries) may be more apt to give a prospect a chance.
Or, do high-end prospects that are “blocked” by a star end up moving to another position or get traded to another team anyway? Does the team they play for have much of an impact?
Thanks for the insight…
I am just finding this article as well, but I don’t see position played being a factor. In your CF example, look at the Pirates. They have 3 quality CFer’s playing in their OF. And don’t forget, the Angels have already moved Trout off CF once in favor of the superior defender Bourjos.
From what I understand, KATOH does not factor in defense, only offensive contributions. If a player is going to contribute significantly offensively, I would imagine the club will find a place for him, or trade him in order to fill another hole. Going back to the Pirates, Josh Bell is not going to earn an OF spot, barring injury, so he has begun working at 1B. He should hit his way into their lineup at that spot, but not the OF.