KATOH: Forecasting Major League Hitting with Minor League Stats

by Chris Mitchell
December 30, 2014

Mookie Betts projects to be quite the productive major leaguer, based on his minor league stats. (via Dennis Heller)

During the summer I crunched some numbers in the FanGraphs Community section aiming to figure out how a minor league hitter’s age and minor league stats can predict his future in the major leagues. I named my methodology KATOH after Yankees prospect Gosuke Katoh, who spent all of the 2014 season playing second base for the Class-A Charleston RiverDogs of the South Atlantic League. At the time, Katoh was running a strikeout rate close to 40 percent, but at just 19 years old, he was two or three years younger than most of his competition.

Katoh’s situation caused me to realize that I had no idea what was truly important in evaluating the performance of a player like him. Should I be worried that he’s striking out four out of every 10 times? Does his respectable 12 percent walk rate make his strikeouts less of a problem? Should I ignore the stats altogether and just give him the benefit of the doubt for even playing in full-season ball as a teenager? I hadn’t the foggiest.

To get a better sense of what mattered most, I turned to the reams of minor league data available through Baseball-Reference. Using these data, I ran some probit regressions, which tell us how a variety of inputs can predict the likelihood of an event that has two possible outcomes. For example, a probit model might give the probability of a prospect’s making it to the majors based on his age and league-adjusted strikeout percentage, walk percentage, isolated slugging, batting average on balls in play, and frequency of stolen base attempts.

For this first iteration of KATOH, I looked exclusively at minor league hitters and estimated the probability they would play in the majors.
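
To make the probit idea concrete, here is a minimal sketch of how a fitted probit model turns a hitter's inputs into a probability: it takes a weighted sum of the inputs and passes it through the standard normal CDF. The coefficient values, the feature names, and the function `probit_probability` are all hypothetical and chosen only for illustration; they are not KATOH's fitted values, and the inputs are assumed to be expressed relative to league average.

```python
from statistics import NormalDist

# HYPOTHETICAL coefficients, invented for illustration only. They are not
# KATOH's fitted values. Inputs are assumed to be league-adjusted, i.e.
# expressed as differences from the league average at the player's level.
HYPOTHETICAL_COEFS = {
    "intercept": -0.50,
    "age": -0.30,        # years older than league average
    "k_pct": -0.04,      # strikeout rate, percentage points above average
    "bb_pct": 0.02,      # walk rate, percentage points above average
    "iso": 3.00,         # isolated slugging above average
    "babip": 2.00,       # BABIP above average
    "sb_attempt": 0.50,  # stolen base attempt frequency above average
}


def probit_probability(features, coefs=HYPOTHETICAL_COEFS):
    """Return P(event) = Phi(intercept + sum(coef * feature))."""
    z = coefs["intercept"]
    for name, value in features.items():
        z += coefs[name] * value
    # The standard normal CDF maps any real-valued score into (0, 1).
    return NormalDist().cdf(z)


# A hitter who is exactly league average in every input.
league_average_hitter = dict.fromkeys(
    ["age", "k_pct", "bb_pct", "iso", "babip", "sb_attempt"], 0.0
)
# The same hitter, but two years younger than his league.
young_for_level = {**league_average_hitter, "age": -2.0}

print(round(probit_probability(league_average_hitter), 3))
print(round(probit_probability(young_for_level), 3))
```

With these made-up weights, the only difference between the two hitters is age, so the younger one gets the higher probability. In a real fit the coefficients would be estimated from historical data rather than typed in by hand.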
This time around, I’ve expanded the system to slap a probability on a wider variety of outcomes, namely WAR thresholds, that a player might achieve through age 28. The thresholds I chose were loosely based on the 20-80 scale for overall value laid out by Kiley McDaniel.

Career Value Through Age 28

| WAR Threshold | Future Value |
|---|---|
| At least one game in the majors | 35 |
| >4 WAR | 40 |
| >6 WAR | 45 |
| >8 WAR | 50 |
| >10 WAR | 55 |
| >12 WAR | 60 |
| >16 WAR | 65 |

I didn’t go any higher than 16 WAR, as the samples of players who met any higher thresholds were generally too small to do much with.

To keep things relatively simple, and to allow for apples-to-apples comparisons across different WAR thresholds, I stuck with the same group of variables within each minor league level. So for each level, I included as many variables as I could while keeping the models statistically significant across most performance thresholds. I also used these probabilities to estimate a player’s expected WAR total through age 28.

The table below gives a summary of which stats proved to be significant at each minor league level. This analysis includes minor league data going back to 1990, the first year in which full-season A-ball was broken up into Class-A (A) and Class-A Advanced (A+). R+ refers to the advanced rookie leagues (the Appalachian and Pioneer Leagues), while R- includes the Arizona and Gulf Coast Leagues.

Significant Statistics by Level
Level Age BB% K% ISO BABIP SB% Age² K%²
AAA Yes Yes Yes Yes Yes Yes Yes
AA Yes Yes Yes Yes Yes Yes
A+ Yes Yes Yes Yes
A Yes Yes Yes Yes Yes
A- Yes Yes Yes Yes Yes
R+ Yes Yes Yes Yes Yes Yes
R- Yes Yes Yes
*SB% = (SB+CS) / (Singles + Walks + HBP)

I think the biggest takeaway here is that walk rate doesn’t matter very much at the lower levels of the minors. In fact, it’s not predictive at all for players in rookie ball or Low-A. And even as high as Double-A, a one-percentage-point change in strikeout rate affects a player’s projection by about 1.5 times as much as a one-percentage-point change in walk rate.
Intuitively, this makes sense. A hitter doesn’t need to be particularly good at hitting to run a high walk rate in the low minors, as pitchers at these levels often have little idea where the ball’s going. As a result, batters can get away with taking an ultra-passive approach in the hopes they’ll see four balls before they see three strikes. That strategy might work in rookie ball or A-ball, but it can lose its effectiveness in the upper levels, where pitchers have a better handle on their control.

To be clear, this isn’t to say that a high-walk prospect is no more likely to make it than a low-walk prospect. Walk rate generally correlates with future success, but only because it’s collinear with ISO. Simply put, players who hit for power also tend to walk a lot, but it’s the power, rather than the walks, that predicts big league performance.

Trying to predict what any prospect will do in the majors is a fool’s errand, but it gets even more foolish with every step you take down the minor league ladder. A hitter who’s in Double- or Triple-A is facing somewhat advanced pitching, so his performance can give us at least some sense of how he’ll fare against big league pitching. For a player in rookie ball, however, the stats tell us very little. Most of these players are teenagers, the vast majority of whom will never even sniff the majors, and the few who will make it are still a good four or five years away. As a result, KATOH is somewhat wishy-washy on players in the low minors. Even in just deciding whether a player will crack the majors, most of the projections fall somewhere between one percent and 15 percent.

Unsurprisingly, KATOH isn’t great at predicting the successes of these low minors players. Considering all hitters with a KATOH projection since 1990, the table below shows the average residual (the difference between KATOH’s prediction and what actually happened, either zero or 100 percent) divided by KATOH’s average prediction at each level.
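
The metric described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the exact calculation behind the table: the function name `relative_miss_by_level` and the toy records are hypothetical, and taking the absolute value of each residual is an assumption, since the text does not say whether the residuals are signed.

```python
from collections import defaultdict


def relative_miss_by_level(records):
    """Mean absolute residual divided by mean prediction, per level.

    records: iterable of (level, predicted_probability, outcome) tuples,
    where outcome is 1 if the player cleared the threshold and 0 otherwise.
    Using the absolute residual is an ASSUMPTION. Each level is assumed to
    have at least one nonzero prediction.
    """
    residual_sum = defaultdict(float)
    prediction_sum = defaultdict(float)
    count = defaultdict(int)
    for level, predicted, outcome in records:
        residual_sum[level] += abs(outcome - predicted)
        prediction_sum[level] += predicted
        count[level] += 1
    return {
        level: (residual_sum[level] / count[level])
        / (prediction_sum[level] / count[level])
        for level in count
    }


# Made-up records for illustration; these are not real KATOH projections.
toy_records = [
    ("AAA", 0.8, 1),
    ("AAA", 0.6, 0),
    ("R-", 0.1, 0),
    ("R-", 0.1, 1),
]
ratios = relative_miss_by_level(toy_records)
print(ratios)
```

Dividing by the average prediction puts levels with very different base rates on a comparable footing: a miss of 10 percentage points matters far more when the typical projection is 10 percent than when it is 80 percent. A lower ratio means the projections sat closer to what actually happened.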
The greener the box, the better job KATOH did of guessing right. The system does pretty well with hitters in the high minors but has a really tough time with guys not yet in full-season ball, especially when it tries to do more than predict if a player will make it to the majors.

By no means should a methodology like this replace the scouting aspect of prospect evaluation. As Dayn Perry famously wrote way back in 2003, choosing between stats and scouting is like choosing between beer and tacos. In other words, it’s a choice nobody should have to make, since both are great and are perhaps even better when consumed together.

We all know there’s more to a player’s potential than his stat line, especially for minor leaguers; in many cases, a good scouting report can be worth a dog’s age of statistical regressions. KATOH has no idea if a hitter possesses traits like bat speed, a loose swing, or a feel for the bat head if these skills aren’t translating into on-field performance, which makes it prone to being low on toolsy players who are still learning the intricacies of hitting. What KATOH does do, however, is tackle prospect evaluation from a 100 percent objective point of view, which I think can be useful in identifying statistical factors that may have been overlooked by traditional prospect evaluations.

Consider Yankees prospect Aaron Judge as an example. Judge, a 6-foot-7 outfielder, turned in an impressive .308/.419/.486 campaign between Low- and High-A last year, which vaulted him onto the prospect radar and earned him a spot on Keith Law’s midseason top 50 prospect list. KATOH isn’t sold. The system pegged Judge’s odds of playing in the majors at just 54 percent and gave him a measly 13 percent chance of accumulating more than four WAR through his age-28 season.

Regardless of what KATOH says, I don’t actually think Judge has just a 50/50 shot of cracking the big leagues over the next six years.
Professional scouts, who have actually watched him play, think he has the tools to be a middle-of-the-order power threat, and I don’t doubt they know what they’re doing. I do, however, think KATOH might be on to something and that some evaluators may be putting a little too much stock in Judge’s seemingly impressive A-ball numbers.

For one thing, KATOH dings Judge for being a 22-year-old in A-ball, which makes him older than much of his competition. Believe it or not, he’s also older than Bryce Harper and Manny Machado, who already are starting to feel like established big leaguers at this point.

Even after accounting for his age, a .905 OPS is nothing to sneeze at. But it’s how Judge arrived at that .905 OPS that’s reason for concern. He hit for a solid .308 average with a modest amount of power, but a big chunk of his value came from his impressive 16 percent walk rate. Walks are cool and all, but as I showed earlier, the data suggest that walk rates mean next to nothing for hitters playing in the lower minors. Take away Judge’s walks, and his stat line suddenly looks like that of a nondescript minor leaguer with something of a strikeout problem.

The biggest flaw with KATOH is that it doesn’t consider defense. If an elite defensive shortstop and a lumbering first baseman had the same batting line, they would receive the same probabilities, which obviously doesn’t seem right. In an attempt to close this gap a bit, I developed some rules of thumb to apply to players at each position.

First, I took all hitters who played in the majors and had a KATOH projection from 1990 or later and assigned them to the position at which they played the most innings through age 28. From there, I looked to see how players from each position performed against their projections. Considering only major leaguers created some selection bias: A player who has made it to the majors is more likely than a randomly selected player to surpass any WAR threshold.
So to tease out this factor, I adjusted the data in the table below to refer to each position’s performance relative to the average major leaguer.

Unsurprisingly, KATOH tends to underrate players who man premium defensive positions, like catcher, center field, and shortstop, whose offensive abilities may not be the most valuable part of their game. Keep in mind that this table refers to a player’s position in the major leagues, and not his current position. For example, a prospect playing shortstop in rookie ball may be a second or third baseman long-term, so applying the shortstop adjustment to him may not be appropriate. This also doesn’t account for a player’s defensive ability compared to others at his respective position, so feel free to hedge up or down as you see fit.

Enough talk. Let’s apply all these models to current prospects and their 2014 stats. Without further ado, here are the players whose 2014 seasons give them the highest expected WAR through age 28 (minimum 200 plate appearances).

| Player | Age | Org | ’14 Level | MLB | >4 WAR | >6 WAR | >8 WAR | >10 WAR | >12 WAR | >16 WAR | WAR thru age 28 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mookie Betts | 21 | Red Sox | AA/AAA | 100% | 93% | 89% | 87% | 83% | 83% | 83% | 21.6 |
| Joc Pederson | 22 | Dodgers | AAA | 100% | 81% | 81% | 79% | 74% | 66% | 65% | 18.3 |
| Jose Ramirez | 21 | Indians | AAA | 100% | 83% | 74% | 66% | 58% | 58% | 55% | 16.3 |
| Kris Bryant | 22 | Cubs | AA/AAA | 99% | 74% | 74% | 72% | 65% | 57% | 53% | 16.0 |
| Jorge Soler | 22 | Cubs | R-/AA/AAA | 88% | 72% | 72% | 70% | 66% | 62% | 51% | 15.6 |
| Gregory Polanco | 22 | Pirates | AAA | 99% | 72% | 66% | 61% | 49% | 49% | 49% | 14.6 |
| Addison Russell | 20 | Athletics | A+/AA | 94% | 73% | 64% | 56% | 50% | 48% | 38% | 13.1 |
| Arismendy Alcantara | 22 | Cubs | AAA | 99% | 60% | 56% | 49% | 40% | 40% | 40% | 12.3 |
| Jon Singleton | 22 | Astros | AAA | 98% | 66% | 66% | 62% | 56% | 45% | 25% | 11.7 |
| Joey Gallo | 20 | Rangers | A+/AA | 92% | 54% | 54% | 53% | 48% | 43% | 29% | 11.0 |
| Alex Verdugo | 18 | Dodgers | R-/R+ | 82% | 55% | 44% | 41% | 41% | 40% | 34% | 10.8 |
| Ozhaino Albies | 17 | Braves | R-/R+ | 80% | 53% | 52% | 44% | 37% | 36% | 34% | 10.6 |
| Gleyber Torres | 17 | Cubs | R-/A- | 72% | 45% | 40% | 40% | 40% | 39% | 38% | 10.5 |
| Willy Adames | 18 | Rays | A | 90% | 56% | 51% | 47% | 40% | 40% | 26% | 10.3 |
| Dilson Herrera | 20 | Mets | A+/AA | 75% | 46% | 42% | 40% | 36% | 35% | 31% | 9.7 |
| Wendell Rijo | 18 | Red Sox | A | 87% | 53% | 46% | 42% | 37% | 36% | 24% | 9.5 |
| Ryan McMahon | 19 | Rockies | A | 87% | 50% | 45% | 42% | 36% | 36% | 23% | 9.3 |
| Marcus Semien | 23 | Athletics | AAA | 97% | 59% | 57% | 50% | 41% | 32% | 14% | 9.2 |
| Alex Palma | 18 | Yankees | R- | 78% | 47% | 37% | 34% | 34% | 33% | 28% | 9.1 |
| Nomar Mazara | 19 | Rangers | A/AA | 84% | 51% | 44% | 40% | 36% | 34% | 23% | 9.1 |

I’m currently working on a similar analysis for pitchers, which I anticipate will be more of a challenge. Any pitcher could go down with a career-altering injury at any time, which makes pitching prospects much more volatile than those of the hitting variety. As the saying goes: “There’s no such thing as a pitching prospect.” Throw in that there are no historical velocity data on minor leaguers, and creating statistical projections for pitching prospects almost feels like a waste of time. Almost. Come what may, I’m going to take my best stab at it, so keep an eye out for that article in the next month or two.

If you’re interested, I’ve also created a Google spreadsheet that includes projections for every player who logged at least one plate appearance in the minor leagues last year. Just keep in mind that the projections don’t mean much for players who have only a few plate appearances. Take these with a grain of salt the same way you would a player’s batting average or wOBA through only a handful of games. And if you’re really bored, I also made a separate spreadsheet containing calculated projections for historical seasons.
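
For anyone working with the spreadsheets, here is one plausible way threshold probabilities could be rolled up into a single expected WAR figure. The article does not spell out the actual calculation, so this is only a sketch: each bucket between adjacent thresholds is given a representative WAR value, and the expected total is the probability-weighted sum. The function `expected_war` and the bucket values (2, 5, 7, 9, 11, 14 and 25 WAR) are assumptions made for this sketch, picked so the output lands within about 0.1 WAR of the published totals for the first three rows of the leaderboard; they are not taken from the article.

```python
# ASSUMED representative WAR for each bucket, in threshold order:
# reached MLB but <=4 WAR, 4-6, 6-8, 8-10, 10-12, 12-16, and >16 WAR.
# These values are guesses for illustration, not the article's method.
ASSUMED_BUCKET_WAR = [2.0, 5.0, 7.0, 9.0, 11.0, 14.0, 25.0]


def expected_war(exceedance_probs, bucket_war=ASSUMED_BUCKET_WAR):
    """Probability-weighted WAR from threshold probabilities.

    exceedance_probs: [P(MLB), P(>4), P(>6), P(>8), P(>10), P(>12), P(>16)]
    as fractions. Players who never reach the majors contribute zero WAR.
    """
    if len(exceedance_probs) != len(bucket_war):
        raise ValueError("need one probability per bucket")
    pairs = zip(exceedance_probs, exceedance_probs[1:])
    if any(later > earlier for earlier, later in pairs):
        raise ValueError("probabilities must not increase with the threshold")
    total = 0.0
    for i, war in enumerate(bucket_war):
        # P(landing in this bucket) = P(clearing its floor) - P(clearing the next one).
        next_prob = exceedance_probs[i + 1] if i + 1 < len(exceedance_probs) else 0.0
        total += (exceedance_probs[i] - next_prob) * war
    return total


# Mookie Betts's row from the leaderboard, converted to fractions.
betts = [1.00, 0.93, 0.89, 0.87, 0.83, 0.83, 0.83]
print(round(expected_war(betts), 1))
```

The top bucket does most of the work for elite prospects, so the result is very sensitive to the value assumed for players who clear 16 WAR.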