Scouting the Minors Pitch by Pitch: Swinging Strike %

Blake Snell hasn’t missed as many bats in the majors as he did in the minors. (via Arturo Pardavila III)

Everyone has an irrational love for prospects, with the hope that the next great prospect will somehow fulfill some massive potential, never before seen in our lifetimes. Later, when a select few do reach such lofty heights (Mike Trout/Clayton Kershaw) we soon grow bored and look immediately for the NEXT one.

Along come the stat line scouts, who take a cold, hard look at these prospects and say, no, Gary Sanchez will not be 17 standard deviations better in the majors than at any other professional level! This article will attempt to expand (ever so slightly) our ability to scout these prospects, leveraging minor league pitch-by-pitch data, as recorded by MiLB stringers.

Swinging Strike Rate | SwStr%

Ah, the exalted swinging strike, one of the purest measures of a pitcher’s ability to compete at the major league level. A swinging strike is almost always a good thing and does not involve any variables other than the batter and the pitcher; even the umpire’s impact is limited to decisions on checked swings. Before I throw some colorful graphs at you, let’s do a little housekeeping and look at SwStr% by level going back to 2008:

MINOR LEAGUE SWSTR% – STARTERS
Class 2008 2009 2010 2011 2012 2013 2014 2015 2016
Majors  8.1%  8.1%  8.2%  8.4%  8.8%  8.9%  9.0%  9.4%  9.6%
AAA  9.2%  7.5% 14.9%  8.8%  8.9%  9.5%  9.4%  8.9%  9.3%
AA 10.7% 10.6% 15.9% 12.0%  9.5%  9.8%  9.0%  9.1%  9.7%
A+ 38.2% 40.0% 26.0% 21.7% 19.7% 19.1% 17.9% 11.0%
A 11.2% 21.0% 22.9% 26.9% 23.0% 21.3% 18.9% 17.1% 11.2%
A- 26.8% 25.4% 24.6% 25.0% 24.9% 13.7% 11.7%
R 31.9% 27.1% 26.5% 26.3% 27.1% 27.5% 28.5% 24.4%
SOURCE: MLB Advanced Media
MINOR LEAGUE SWSTR% – RELIEF PITCHERS
Class 2008 2009 2010 2011 2012 2013 2014 2015 2016
Majors  9.4%  9.3%  9.7%  9.8% 10.3% 10.6% 10.7% 11.0% 11.2%
AAA 10.5% 10.9% 23.4% 10.5% 10.5% 11.0% 11.0% 10.8% 11.2%
AA 12.9% 13.0% 15.3% 13.6% 11.4% 11.3% 10.7% 10.4% 11.5%
A+ 31.1% 28.6% 28.0% 24.0% 22.4% 21.9% 20.3% 12.3%
A 15.6% 38.8% 18.5% 29.9% 25.5% 24.1% 21.1% 19.2% 12.6%
A- 26.9% 28.6% 27.5% 28.1% 27.2% 16.1% 13.2%
R 19.7% 32.3% 26.6% 26.2% 26.5% 26.9% 28.1% 24.9%
SOURCE: MLB Advanced Media
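For reference, SwStr% is conventionally swinging strikes divided by total pitches. The sketch below shows how such a rate might be computed from pitch-by-pitch records. The `Pitch` record and its `result` labels are hypothetical stand-ins, not MLBAM's actual field names.

```python
from dataclasses import dataclass


@dataclass
class Pitch:
    pitcher: str
    # Hypothetical labels: "swinging_strike", "called_strike", "ball", "foul", "in_play"
    result: str


def swstr_rate(pitches):
    """Swinging strikes divided by total pitches; None when there are no pitches."""
    total = len(pitches)
    if total == 0:
        return None
    whiffs = sum(1 for p in pitches if p.result == "swinging_strike")
    return whiffs / total


# Ten made-up pitches with exactly one swinging strike.
sample = (
    [Pitch("A", "swinging_strike")]
    + [Pitch("A", "ball")] * 4
    + [Pitch("A", "called_strike")] * 2
    + [Pitch("A", "foul")] * 2
    + [Pitch("A", "in_play")]
)

print(f"{swstr_rate(sample):.1%}")  # 10.0%
```

Because the only human judgment involved is how the stringer labels each pitch, a level where called strikes are logged as swinging strikes would show up as an inflated rate in the tables above.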

I always like to look at data at the aggregate level to judge if it makes sense and jibes with what we’d expect to see. The above table tells me the following:

  1. Triple-A data consistently track major league data back to 2011 for starters and to 2012 for relievers. The 2011 difference for relief pitchers is minor and consistent with later seasons, so I’m comfortable using Triple-A data going back to 2011 for both starters and relievers.
  2. Double-A data look clean dating back only to 2012, with prior years being too messy to use. I considered smoothing the data, but didn’t want to go in that direction as there were likely large chunks of data that would just be plain wrong.
  3. Single-A data look kosher for 2016 only, which is interesting and likely means that MLBAM put in some more resources there. This will mean that we can’t use any of the granular A ball data yet to project major league data.
  4. Rookie ball classifies every strike as a swinging strike.
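The eyeball test above can be approximated mechanically. The sketch below uses the starter rows from the table and walks back from 2016 until a level drifts too far from the major league rate. The 1.5-point tolerance is an assumption chosen for illustration, not a rule used in this article.

```python
# Starter SwStr% by season, in percent, copied from the starters table above.
YEARS = list(range(2008, 2017))
MAJORS = [8.1, 8.1, 8.2, 8.4, 8.8, 8.9, 9.0, 9.4, 9.6]
AAA = [9.2, 7.5, 14.9, 8.8, 8.9, 9.5, 9.4, 8.9, 9.3]
AA = [10.7, 10.6, 15.9, 12.0, 9.5, 9.8, 9.0, 9.1, 9.7]

MAX_GAP = 1.5  # assumed tolerance in percentage points, not the author's rule


def earliest_usable_year(level, baseline=MAJORS, years=YEARS, max_gap=MAX_GAP):
    """Walk back from the latest season while the level stays within max_gap of the baseline."""
    usable = None
    for year, lvl, base in reversed(list(zip(years, level, baseline))):
        if abs(lvl - base) > max_gap:
            break
        usable = year
    return usable


print(earliest_usable_year(AAA))  # 2011
print(earliest_usable_year(AA))   # 2012
```

With that tolerance the cutoffs land on the same seasons chosen by eye for starters. A different tolerance could move them, so this is a sanity check rather than a replacement for looking at the table.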

AA SwStr% to MLB SwStr% – Starters | R-Squared 0.33

MLB SwStr% = 0.5*AA + 0.033

swstr-aa-to-majors

We see a relatively solid correlation between the two metrics, suggesting there is predictive power in Double-A SwStr rates. The data above are filtered to samples where we have at least 500 pitches thrown at Double-A and 500 pitches thrown in the majors (over total lifetime in both leagues); playing with sample sizes (250, 1,000) yields roughly the same R-squared values. I want to note two outliers, which we will revisit shortly: Thoronto Syndergaard and Jacob deGrom.
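For readers who want to reproduce this kind of fit, here is a minimal sketch of the filter-then-regress step using ordinary least squares. The rows are made up for illustration and the dictionary keys are hypothetical; only the 500-pitch threshold comes from the text.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, plus R-squared."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def qualified(rows, min_pitches=500):
    """Keep (AA rate, MLB rate) pairs with enough pitches at both levels."""
    return [
        (r["aa_swstr"], r["mlb_swstr"])
        for r in rows
        if r["aa_pitches"] >= min_pitches and r["mlb_pitches"] >= min_pitches
    ]


# Made-up career totals, for illustration only.
rows = [
    {"aa_swstr": 0.095, "aa_pitches": 1800, "mlb_swstr": 0.082, "mlb_pitches": 2500},
    {"aa_swstr": 0.120, "aa_pitches": 1500, "mlb_swstr": 0.097, "mlb_pitches": 900},
    {"aa_swstr": 0.140, "aa_pitches": 300, "mlb_swstr": 0.110, "mlb_pitches": 4000},
    {"aa_swstr": 0.105, "aa_pitches": 2200, "mlb_swstr": 0.080, "mlb_pitches": 600},
    {"aa_swstr": 0.130, "aa_pitches": 700, "mlb_swstr": 0.101, "mlb_pitches": 1200},
]

pairs = qualified(rows)  # the 300-pitch Double-A sample is dropped
slope, intercept, r_squared = fit_line([aa for aa, _ in pairs], [mlb for _, mlb in pairs])
print(f"MLB = {slope:.2f} * AA + {intercept:.3f}, R-squared {r_squared:.2f}")
```

Re-running the fit with `min_pitches` set to 250 or 1,000 is the sample-size check described above.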

AA SwStr% to MLB SwStr% – Relievers | R-Squared 0.19

swstr-aa-to-majors-rp

With relievers, we see a much looser link, which doesn’t hold up as well when we lower sample sizes (a 250-pitch threshold yields an R-squared of only 0.1). This indicates that a great SwStr% for a relief pitcher in the minors is not a good signal that he will be a dominant major league reliever.

AAA SwStr% to MLB SwStr% – Starters | R-Squared 0.40

MLB SwStr% = 0.6*AAA + 0.027

swstr-aaa-to-majors-sp

The graph above is filtered to just pitchers who have thrown at least 1,000 pitches at Triple-A and in the majors as a starter. Look at the Tampa Bay Rays’ Snell, Moore, Cobb and Colome all under-performing in the bigs, compared to Thor and deGrom, who float far above the projected trend line; perhaps it’s the infamous Dan Warthen Slider? The obvious question is whether this is due to issues with the data, or whether it is a signal about how pitchers are being developed with the Mets as opposed to the Rays. If it were a data issue, we would see inflated swinging strike rates for non-Rays pitchers in Durham Bulls Athletic Park, as well as deflated swinging strike rates for non-Mets pitchers in Las Vegas.
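"Floating above" or "falling below" the trend line is just the residual against the Triple-A fit. The sketch below applies the published starter formula and ranks pitchers by that residual. The pitchers and their rates are invented placeholders, not values from the chart.

```python
def project_from_aaa(aaa_swstr):
    """Starter fit quoted above: MLB SwStr% = 0.6 * AAA + 0.027 (rates as fractions)."""
    return 0.6 * aaa_swstr + 0.027


def residual(actual_mlb, aaa_swstr):
    """Positive means the pitcher beat his Triple-A projection in the majors."""
    return actual_mlb - project_from_aaa(aaa_swstr)


# Invented (AAA rate, MLB rate) pairs, for illustration only.
pitchers = {
    "Pitcher A": (0.100, 0.120),
    "Pitcher B": (0.120, 0.070),
    "Pitcher C": (0.090, 0.082),
}

ranked = sorted(pitchers.items(), key=lambda kv: residual(kv[1][1], kv[1][0]), reverse=True)
for name, (aaa, mlb) in ranked:
    print(f"{name}: projected {project_from_aaa(aaa):.1%}, actual {mlb:.1%}")

# A 14.5% Triple-A rate projects to 11.4% under this formula.
print(f"{project_from_aaa(0.145):.1%}")  # 11.4%
```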

Away Starter SwStr%, By AAA City

durham

When we look at opposing teams within the Rays and Mets ballparks, we don’t see any significant bias. However, when we look at just the home team pitchers we get a HUGE anomaly for Rays pitchers, indicating there may be measurement bias, with the home stringers recording called strikes as swinging strikes.

Home Starter SwStr%, By AAA City

mets-rays-home-team-pitching

We also see Rays pitchers performing better in their home stadium and Mets pitchers performing worse, indicating there may be some inherent measurement bias.

mets-rays-home-team-pitching-2

This may show evidence of over-inflated SwStr% for Rays pitchers at home, or it may just be that Rays pitchers are better at home than on the road. However, even if you took away a couple of points from Moore and Snell, they would still be well below their projections, and vice versa with Thor and deGrom. My opinion would be that the Mets pitchers over-performing looks more legitimate as a signal than the Rays under-performing, or more specifically, I don’t think that it’s quite that dramatic for Tampa Bay, but that there is a measure of truth to it.
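The park check above boils down to splitting each park's pitches by whether the pitcher belongs to the home or visiting team. A rough sketch of that split follows; the tuple layout and the counts are hypothetical.

```python
from collections import defaultdict


def swstr_by_park_and_side(pitches):
    """Return {(park, side): SwStr rate}; side is 'home' or 'away' for the pitcher's team."""
    counts = defaultdict(lambda: [0, 0])  # [swinging strikes, total pitches]
    for park, side, is_whiff in pitches:
        bucket = counts[(park, side)]
        if is_whiff:
            bucket[0] += 1
        bucket[1] += 1
    return {key: whiffs / total for key, (whiffs, total) in counts.items()}


# Invented pitch log: (park, pitcher's side, swinging strike?)
pitches = (
    [("Durham", "home", True)] * 4
    + [("Durham", "home", False)] * 16
    + [("Durham", "away", True)] * 2
    + [("Durham", "away", False)] * 18
)

rates = swstr_by_park_and_side(pitches)
print(rates[("Durham", "home")], rates[("Durham", "away")])  # 0.2 0.1
```

If a stringer inflated swinging strikes for everyone, the away rate in that park would rise too. A gap that appears only for home pitchers points either to the staff itself or to bias in how the home team's pitches are scored, which is why the home-versus-road comparison for the same pitchers matters.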

Pitchers with AA and AAA experience | R-Squared 0.40

MLB SwStr% = 0.43*AAA + 0.29*AA + 0.013

Can we improve the model if we include both Double-A and Triple-A experience for pitchers? Using the above formula and filtering for pitchers with at least 250 pitches in Double-A and Triple-A, as well as at least 500 major league pitches (starters only), we get the following chart (actual SwStr% on the left, projected SwStr% on the right):

aa-aaa

We see a much tighter correlation, with interesting cases like Burch Smith and Chad Green popping out as outliers. Nick Tropeano looks to be right in line with his projection (a tick or so higher), which might lend some validity to his 3.56 ERA last season.

Is Age a Factor?

From what I can see in the data, age has relatively little impact on projecting SwStr% in major league pitchers. This differs from Chris Mitchell’s KATOH, which ascribes a lot more predictive value, in terms of WAR, to pitchers who are younger than their competition. We’re comparing apples/SwStr% to pears/WAR here, so I’ll defer to Mitchell’s knowledge and understanding of advanced statistics over the more simplistic xy scatterplot approach I am employing here. Additionally, based on discussions with Chris, it is quite likely that my data don’t have the long-term horizon to capture the full arc of a pitcher’s career and are constrained by the small 2008-2016 window.

Either way – this is what I see:

age

We see almost no correlation between age at High-A, Double-A or Triple-A and major league SwStr%. My assumption is that if a pitcher being younger is predictive of future success, we would see some growth in his major league numbers from where he was as a youngster. What these data show is that what pitchers do in Triple-A has a LOT more to say about what they’ll do when they first get to the majors than how old they were when they did it. This may make this system more useful for fantasy purposes, as opposed to actual baseball value.

Another way to look at this is to ask whether being younger increases the binary probability of a pitcher making the major leagues. What follows below is a graph of age against number of pitches thrown in the majors. The size of the bubble indicates how many pitches the pitcher has thrown in the majors:

mlb-yesno

We see negligible correlation between age and time spent, as well as age and yes/no in the majors, topping out with an R-squared of 0.05 for AAA. This all tells me that while it’s definitely better to be younger, it’s far more important to perform well. Again, I’m not ready to conclude that age isn’t important, due to the limited time frame, but I’m still looking for a way to show a natural correlation. The best I can show, which suggests that this approach is indeed missing the later years of the career arc, is this graph which shows SwStr% by age in the majors:

pitcher-age

We see (with significant sample sizes) a big jump from ages 20 and 21 to older ages, as well as a clear peak between ages 26 and 31 and a decline thereafter. This would support the KATOH position that there is more room for growth if you start earlier, and suggests that this model is likely capturing the first steps in the majors rather than the entire arc.

Pitcher Height and SwStr%

SWSTR% AND PITCHER HEIGHT
Pitcher Height  RP AA  RP AAA  RP Majors  SP AA  SP AAA  SP Majors
           5-7  10.5%   10.4%      10.5%   7.7%    8.0%
           5-8   8.6%    5.4%       9.8%  12.4%    9.7%       8.9%
           5-9  11.7%   10.5%      10.9%   9.0%    8.5%
          5-10  11.5%   10.9%      12.0%  10.4%    9.9%       8.1%
          5-11  11.3%   11.0%      11.0%   9.2%    9.5%       8.6%
           6-0  11.2%   11.0%      10.6%   9.4%    9.2%       9.1%
           6-1  10.9%   11.4%      10.3%   9.5%    9.0%       8.7%
           6-2  11.2%   10.7%      10.8%   9.5%    9.1%       9.0%
           6-3  11.0%   10.7%      10.7%   9.4%    9.3%       9.2%
           6-4  11.1%   10.9%      10.9%   9.6%    9.3%       9.6%
           6-5  10.8%   11.0%      11.4%   8.9%    9.1%       9.1%
           6-6  10.9%   10.9%       9.9%   9.5%    9.4%       9.8%
           6-7  11.4%   10.5%      11.3%   8.7%    8.5%       8.3%
           6-8  11.0%    9.7%      11.9%   9.4%    8.3%       7.0%
           6-9   9.5%   10.0%       9.0%  10.9%   10.8%       9.1%
          6-10   9.3%    5.1%      12.1%   9.3%    6.7%       8.4%

Two cells really stand out to me, both major league SwStr% for starters below six feet. We see a precipitous decline in SwStr% from Triple-A to the majors for pitchers 5-foot-10 and 5-foot-11, which may indicate that short starters can excel at Double-A and Triple-A, but if they’re below the magical threshold of six feet tall, they will struggle at the major league level. Interestingly, there doesn’t appear to be any such effect for relievers, suggesting that height isn’t so important for relief pitchers. For starters, 6-foot-3 to 6-foot-6 appears to be the sweet spot.

SP MLB SwStr% – AAA SwStr%

mlb-aaa-sp

In graphical form we see the same story played out, where the clear optimal spot is 6-foot-4 to 6-foot-6 and anything extreme leads to very random results. This is quite likely due to shorter pitchers having a natural survivor bias (if you’re short and part of a major league team it’s because you were great, as opposed to some tall pitchers who stunk but were still at Triple-A) as well as taller pitchers being tough for Triple-A hitters to handle. Sample sizes for pitchers 6-foot-8 and above are very small. For example, with 6-foot-8 pitchers, we’re talking about Volstad, Glasnow, Kameron Loe and Fister. Fister significantly under-performed compared to his Triple-A numbers; Glasnow and Volstad were mostly in line. The conclusion I would draw is more that being shorter than six feet is a bad signal for starters, and being 6-foot-4 to 6-foot-6 is a pretty good signal, with extremely tall pitchers carrying a lot more risk (which is pretty intuitive).
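As a rough cross-check, the same gap can be computed directly from the starter columns of the height table. Note the assumption: these are group-level rates, so the pitchers behind the Triple-A and major league cells are not necessarily the same people, and the result is only a coarse stand-in for a per-pitcher chart. Heights with no major league starter value in the table are left out.

```python
# Starter SwStr% by listed height, in percent: (AAA, Majors), copied from the height table.
SP_BY_HEIGHT = {
    "5-8": (9.7, 8.9),
    "5-10": (9.9, 8.1),
    "5-11": (9.5, 8.6),
    "6-0": (9.2, 9.1),
    "6-1": (9.0, 8.7),
    "6-2": (9.1, 9.0),
    "6-3": (9.3, 9.2),
    "6-4": (9.3, 9.6),
    "6-5": (9.1, 9.1),
    "6-6": (9.4, 9.8),
    "6-7": (8.5, 8.3),
    "6-8": (8.3, 7.0),
    "6-9": (10.8, 9.1),
    "6-10": (6.7, 8.4),
}


def mlb_minus_aaa(table):
    """Major league rate minus Triple-A rate, in percentage points, per height."""
    return {height: round(mlb - aaa, 1) for height, (aaa, mlb) in table.items()}


deltas = mlb_minus_aaa(SP_BY_HEIGHT)
for height, delta in deltas.items():
    print(f"{height:>4}: {delta:+.1f}")

print([h for h, d in deltas.items() if d > 0])  # ['6-4', '6-6', '6-10']
```

The aggregate numbers agree with the broad picture: 5-foot-10 starters lose 1.8 points, while 6-foot-4 and 6-foot-6 starters gain slightly. The 6-foot-10 gain rests on the tiny samples already flagged above.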

RP MLB SwStr% – AAA SwStr%

mlb-aaa-rp

For relievers we get almost the opposite effect, suggesting that one shouldn’t be too worried about a short reliever; alternatively, perhaps a short starter with great numbers in Triple-A should be converted to a relief pitcher in the majors.

Projections!

What good would this article be if I didn’t publish some projections for major league SwStr for pitchers who have limited major league experience, projections that you could use for fantasy baseball next year? For pitchers with only Double-A or only Triple-A experience, I used the corresponding single-level model; for pitchers with experience at both levels, I used the combined model. I’ve shown stats at both levels, even if the sample size was too small to use.
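The model selection just described can be sketched as a small dispatcher over the three starter formulas given earlier. The function name and the pitch counts in the examples are hypothetical; the 250-pitch cutoff is the one this article uses for deciding whether a level's data count.

```python
MIN_PITCHES = 250  # per-level sample-size cutoff used for these projections


def project_mlb_swstr(aa_swstr=None, aa_pitches=0, aaa_swstr=None, aaa_pitches=0):
    """Pick a starter model by which levels have enough pitches; rates are fractions."""
    has_aa = aa_swstr is not None and aa_pitches >= MIN_PITCHES
    has_aaa = aaa_swstr is not None and aaa_pitches >= MIN_PITCHES
    if has_aa and has_aaa:
        return 0.43 * aaa_swstr + 0.29 * aa_swstr + 0.013  # combined model
    if has_aaa:
        return 0.6 * aaa_swstr + 0.027  # Triple-A only
    if has_aa:
        return 0.5 * aa_swstr + 0.033  # Double-A only
    return None  # not enough data at either level


# 16.7% at Double-A and 13.8% at Triple-A, with made-up pitch counts above the cutoff.
both = project_mlb_swstr(aa_swstr=0.167, aa_pitches=1000, aaa_swstr=0.138, aaa_pitches=1000)
print(f"{both:.1%}")  # 12.1%

# 14.0% at Double-A with a Triple-A sample too small to use.
aa_only = project_mlb_swstr(aa_swstr=0.140, aa_pitches=1000, aaa_swstr=0.050, aaa_pitches=100)
print(f"{aa_only:.1%}")
```

The first example reproduces the 12.1% shown for David Paulino in the table that follows. The second shows why a pitcher can be listed with a tiny Triple-A rate and still carry a projection driven entirely by Double-A.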

TOP SWSTR% PROSPECTS
Pitcher Pitcher DOB SwStr (AA) SwStr (AAA) Proj. MLB SwStr%
David Paulino 1994 16.7% 13.8% 12.1%
Danny Hultzen 1989 18.5% 11.3% 11.5%
John Ely 1986 14.5% 11.4%
Alex Torres 1987 14.4% 11.4%
Jose De Leon 1992 14.5% 13.6% 11.3%
Leuris Gomez 1986 14.4% 11.3%
Tyler Webb 1990 14.1% 11.2%
Ariel Pena 1989 12.8% 14.2% 11.1%
Jordan Montgomery 1992 11.7% 14.4% 10.9%
Cesar Valdez 1985 13.5% 10.8%
Alex Reyes 1994 13.0% 13.4% 10.8%
Brock Stewart 1991 13.8% 12.7% 10.7%
Lisalverto Bonilla 1990 12.4% 13.6% 10.7%
Andrew Barbosa 1987 14.7% 13.1% 10.6%
Jose Ramirez 1990 14.3% 11.7% 10.5%
Rob Wooten 1985 13.0% 10.5%
Adrian Salcedo 1991 14.2% 10.4%
John Straka 1990 14.2% 10.4%
Dinelson Lamet 1992 14.2% 14.1% 10.4%
Edwar Cabrera 1987 11.9% 13.0% 10.4%
Chris Withrow 1989 14.1% 10.3%
Daniel Gossett 1992 14.1%  5.2% 10.3%
Brandon Woodruff 1993 14.0% 10.3%
Sean Gleason 1985 13.9% 10.2%
Joan Gregorio 1992 11.8% 12.7% 10.2%
Diego Moreno 1987 12.5% 10.2%
Rett Varner 1988 12.4% 10.2%
Andrew Carraway 1986 18.8%  8.0% 10.2%
Buddy Boshers 1988 13.6% 10.1%
Austin Pruitt 1989 10.4% 13.5% 10.1%
Jed Bradley 1990 10.3% 13.5% 10.1%
Lucas Giolito 1994 11.4% 12.7% 10.1%
Seth Frankoff 1988 13.5%  7.9% 10.1%
Josh Hader 1994 12.6% 11.8% 10.0%
Lucas Sims 1994 11.3% 12.6% 10.0%
Jacob Faria 1993 13.7% 10.9% 10.0%
Merrill Kelly 1988 11.9% 12.2% 10.0%

KATOH also really likes David Paulino, who projects as an elite starter, based solely on his SwStr% potential. If you want to focus on other young prospects, you’ll probably be encouraged by De Leon, Montgomery and Alex Reyes. If you’re not concerned with prospect age, a guy like Tyler Webb, he of limited starting experience, may have some potential as a starter. Keep in mind, the data here are limited to pitchers with Double-A/Triple-A experience as starters. Again, the table above shows actuals at each level, but the projection uses a level’s data only if the sample size is higher than 250 pitches at that level.

Keep an eye out for Dinelson Lamet, the No. 25 prospect for San Diego as per Eric Longenhagen, mostly because he has a cool name, but also because he has interesting stats.

Conclusions

I was surprised by the consistency of the data, as well as how strong some of the correlations are, which leads me to believe the data are pretty good overall. Further, when I dug into other data points (future article teaser) such as average batted ball distances, we get no correlation for pitchers, but relatively significant ones for batters. In other words, we see what we’d expect to see based on our understanding of the game.

This is the tip of the iceberg for minor league play-by-play data, with lots more to come!

References & Resources

This research was largely influenced and inspired by Chris Mitchell’s KATOH. I will point out that KATOH uses aggregate data which are far more reliable than granular, manual minor league pitch-by-pitch data, which may create discrepancies in our results. Additionally, the pitch-by-pitch data used herein are constrained to the 2008-2016 time frame at the maximum and shorter time frames for certain metrics. I’d like to also say thank you to Chris for his feedback on this piece and commentary on various elements within.


Eli Ben-Porat is a Senior Manager of Reporting & Analytics for Rogers Communications. The views and opinions expressed herein are his own. He builds data visualizations in Tableau, and builds baseball data in Rust. Follow him on Twitter @EliBenPorat, however you may be subjected to (polite) Canadian politics.
Jonathan Sher
7 years ago

Curious what your projection would say for Matt Strahm, who pitched in the pen for the Royals in 2016 but had high k-rates per 9 as a mostly-starter in AA and is a candidate for KC’s rotation in 2017. His swinging strike rate as a reliever in the majors was 12.2%, and while one would expect slippage as a starter, I suspect he might grade out just behind Paulino — Strahm is 6’4″.

Jonathan Sher
7 years ago
Reply to  Eli Ben-Porat

Thanks Eli. Junis is an intriguing prospect and would be clearly in the running, along with Strahm, for the 5th rotation spot next Spring. Here’s what J.J. Picollo, assistant general manager for player personnel in KC, said about him in the end of July in an interview with the Kansas City Star:

“One guy that has taken a huge step is Jacob Junis in Double-A. He’s been outstanding all year long. He was a guy that sort of showed you short glimpses of him really being sort of a dominating guy. He threw 90-95, good breaking ball, good change-up. And then he would lose it. He’d lose it for a little bit and get back into it. Well, then this year, he just had a better mound presence and he just kept working. He’s probably been our most consistent guy since the beginning of the season. He’s been consistently throwing 93-95 and his change-up has been good every night. His curveball has been good. He’s learning how to add and subtract from those pitchers with velocity. He’s been a real shining light this season.”

MGL
7 years ago

Very good stuff. All of this is for sw str % of course. Is there any research out there you can point to that looks at the correlation btwn sw str % and K rate as well as sw str % and overall success (eg FIP+ or RA+) ?

Even if there is a correlation between sw st % and overall success, which I’m sure there is, it’s a long way to go from predicting sw st % in the majors to predicting success in the majors or even K rate.

I actually don’t know how to chain correlations. E.g. if r-squared from A to B is .4 and from B to C is .4, what is the r-squared from A to C? I suspect it is .16 but I’m not sure.

Matthew Trueblood
7 years ago

Terrific work, Eli. Do you happen to know whether the data you used for this piece is the same as the data that appears in minor league game logs on the register pages at Baseball Reference? Or, to ask another way: where can I most easily find the information that made up your data set?