Scouting the Minors Pitch by Pitch: Swinging Strike %

by Eli Ben-Porat
November 22, 2016

Blake Snell hasn’t missed as many bats in the majors as he did in the minors. (via Arturo Pardavila III)

Everyone has an irrational love for prospects, with the hope that the next great prospect will somehow fulfill some massive potential, never before seen in our lifetimes. Later, when a select few do reach such lofty heights (Mike Trout/Clayton Kershaw) we soon grow bored and look immediately for the NEXT one.

Along come the stat line scouts, who take a cold, hard look at these prospects and say, no, Gary Sanchez will not be 17 standard deviations better in the majors than at any other professional level! This article will attempt to expand (ever so slightly) our ability to scout these prospects, leveraging minor league pitch-by-pitch data, as recorded by MiLB stringers.

Swinging Strike Rate | SwStr%

Ah, the exalted swinging strike, one of the purest measures of a pitcher’s ability to compete at the major league level. A swinging strike is almost always a good thing and does not involve any variables other than the batter and the pitcher; even the umpire’s impact is limited to decisions on checked swings. Before I throw some colorful graphs at you, let’s do a little housekeeping and look at SwStr% by level going back to 2008:

MINOR LEAGUE SWSTR% – STARTERS

Class	2008	2009	2010	2011	2012	2013	2014	2015	2016
Majors	8.1%	8.1%	8.2%	8.4%	8.8%	8.9%	9.0%	9.4%	9.6%
AAA	9.2%	7.5%	14.9%	8.8%	8.9%	9.5%	9.4%	8.9%	9.3%
AA	10.7%	10.6%	15.9%	12.0%	9.5%	9.8%	9.0%	9.1%	9.7%
A+	38.2%		40.0%	26.0%	21.7%	19.7%	19.1%	17.9%	11.0%
A	11.2%	21.0%	22.9%	26.9%	23.0%	21.3%	18.9%	17.1%	11.2%
A-			26.8%	25.4%	24.6%	25.0%	24.9%	13.7%	11.7%
R		31.9%	27.1%	26.5%	26.3%	27.1%	27.5%	28.5%	24.4%

SOURCE: MLB Advanced Media

MINOR LEAGUE SWSTR% – RELIEF PITCHERS

Class	2008	2009	2010	2011	2012	2013	2014	2015	2016
Majors	9.4%	9.3%	9.7%	9.8%	10.3%	10.6%	10.7%	11.0%	11.2%
AAA	10.5%	10.9%	23.4%	10.5%	10.5%	11.0%	11.0%	10.8%	11.2%
AA	12.9%	13.0%	15.3%	13.6%	11.4%	11.3%	10.7%	10.4%	11.5%
A+	31.1%		28.6%	28.0%	24.0%	22.4%	21.9%	20.3%	12.3%
A	15.6%	38.8%	18.5%	29.9%	25.5%	24.1%	21.1%	19.2%	12.6%
A-			26.9%	28.6%	27.5%	28.1%	27.2%	16.1%	13.2%
R		19.7%	32.3%	26.6%	26.2%	26.5%	26.9%	28.1%	24.9%

SOURCE: MLB Advanced Media

I always like to look at data at the aggregate level to judge if it makes sense and jibes with what we’d expect to see. The above table tells me the following:

Triple-A data consistently track major league data back to 2011 for starters and to 2012 for relievers. To me the difference in the data is minor for relief pitchers and consistent with future seasons so that I’m comfortable using Triple-A data going back to 2011 for both starters and relievers.
Double-A data look clean dating back only to 2012, with prior years being too messy to use. I considered smoothing the data, but didn’t want to go in that direction as there were likely large chunks of data that would just be plain wrong.
Single-A data look kosher for 2016 only, which is interesting and likely means that MLBAM put in some more resources there. This will mean that we can’t use any of the granular A ball data yet to project major league data.
Rookie ball classifies every strike as a swinging strike.
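The eyeball check above can be approximated in a few lines. This is only a sketch: the values are copied from the starters table, but the 2.5-point tolerance and the "walk back from 2016 until a season breaks the tolerance" rule are assumptions chosen for illustration, not the procedure actually used for this article.

```python
# Starter SwStr% by season, in percent, copied from the starters table above.
YEARS = [2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016]
STARTERS = {
    "Majors": [8.1, 8.1, 8.2, 8.4, 8.8, 8.9, 9.0, 9.4, 9.6],
    "AAA": [9.2, 7.5, 14.9, 8.8, 8.9, 9.5, 9.4, 8.9, 9.3],
    "AA": [10.7, 10.6, 15.9, 12.0, 9.5, 9.8, 9.0, 9.1, 9.7],
    "A+": [38.2, None, 40.0, 26.0, 21.7, 19.7, 19.1, 17.9, 11.0],
    "A": [11.2, 21.0, 22.9, 26.9, 23.0, 21.3, 18.9, 17.1, 11.2],
    "A-": [None, None, 26.8, 25.4, 24.6, 25.0, 24.9, 13.7, 11.7],
    "R": [None, 31.9, 27.1, 26.5, 26.3, 27.1, 27.5, 28.5, 24.4],
}

# Assumed cutoff in percentage points; an illustrative choice, not from the article.
TOLERANCE = 2.5


def earliest_clean_year(level, table=STARTERS, tolerance=TOLERANCE):
    """Walk back from the latest season while the level stays close to the majors."""
    earliest = None
    seasons = list(zip(YEARS, table[level], table["Majors"]))
    for year, value, mlb in reversed(seasons):
        # Stop at the first missing or out-of-tolerance season.
        if value is None or abs(value - mlb) > tolerance:
            break
        earliest = year
    return earliest


for level in ["AAA", "AA", "A+", "A", "A-", "R"]:
    print(level, earliest_clean_year(level))
```

With that tolerance the starters table gives 2011 for Triple-A, 2012 for Double-A, 2016 for the three A-ball levels and no usable season for rookie ball, which lines up with the reading above.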

AA SwStr% to MLB SwStr% – Starters | R-Squared 0.33

MLB SwStr% = 0.5*AA + 0.033

We see a relatively solid correlation between the two metrics, suggesting there is predictive power in Double-A SwStr rates. The data above are filtered to pitchers with at least 500 pitches thrown at Double-A and 500 pitches thrown in the majors (career totals at each level); playing with the sample-size threshold (250, 1,000) yields roughly the same R-squared values. I want to note two outliers, which we will revisit shortly: Thoronto Syndergaard and Jacob deGrom.
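For readers who want to reproduce this kind of trend line, here is a minimal sketch of an ordinary least-squares fit with R-squared in plain Python. The function names are hypothetical, and the input rates are arbitrary points placed exactly on the quoted line (not real pitcher data), so the check only confirms that the fit recovers the 0.5 slope and 0.033 intercept.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, plus R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


def project_from_aa(aa_swstr):
    """Starter trend line quoted in the article: MLB SwStr% = 0.5 * AA + 0.033."""
    return 0.5 * aa_swstr + 0.033


# Arbitrary Double-A rates (as fractions) that sit exactly on the quoted line.
aa_rates = [0.08, 0.10, 0.12, 0.14]
mlb_rates = [project_from_aa(x) for x in aa_rates]
slope, intercept, r_squared = fit_line(aa_rates, mlb_rates)
print(slope, intercept, r_squared)
```

On real data the points scatter around the line, which is what pulls R-squared down from 1 to the 0.33 reported here.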

AA SwStr% to MLB SwStr% – Relievers | R-Squared 0.19

With relievers, we see a much looser link, which doesn’t hold up as well when we lower the sample size (a 250-pitch threshold yields an R-squared of only 0.1). This indicates that a great SwStr% for a relief pitcher in the minors is not a good signal that he will be a dominant major league reliever.

AAA SwStr% to MLB SwStr% – Starters | R-Squared 0.40

MLB SwStr% = 0.6*AAA + 0.027

The graph above is filtered to just pitchers who have thrown at least 1,000 pitches at Triple-A and in the majors as a starter. Look at Tampa Bay Rays pitchers Snell, Moore, Cobb and Colome, all under-performing in the bigs, compared to Thor and deGrom, who float far above the projected trend line; perhaps it’s the infamous Dan Warthen Slider? The obvious question is whether this is due to issues with the data, or whether it is a signal about how pitchers are being developed with the Mets as opposed to the Rays. If it were a data issue, we would see inflated swinging strikes for non-Rays pitchers in Durham Bulls Athletic Park, as well as deflated swinging strike rates for non-Mets pitchers in Las Vegas.

Away Starter SwStr%, By AAA City

When we look at opposing teams within the Rays and Mets ballparks, we don’t see any significant bias. However, when we look at just the home team pitchers we get a HUGE anomaly for Rays pitchers, indicating there may be measurement bias by the home stringers to record called strikes as swinging strikes.

Home Starter SwStr%, By AAA City

We also see Rays pitchers performing better in their home stadium and Mets pitchers performing worse, indicating there may be some inherent measurement bias.

This may show evidence of over-inflated SwStr% for Rays pitchers at home, or it may just be that Rays pitchers are better at home than on the road. However, even if you took away a couple of points from Moore and Snell, they would still be well below their projections, and vice versa with Thor and deGrom. My opinion would be that the Mets pitchers over-performing looks more legitimate as a signal than the Rays under-performing, or more specifically, I don’t think that it’s quite that dramatic for Tampa Bay, but that there is a measure of truth to it.
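A minimal sketch of the home/away split behind those two charts, assuming SwStr% is swinging strikes divided by total pitches and that each record carries the park, the pitcher's team, the home team, and the two counts. The record layout, the function name and the three sample rows are all hypothetical and exist only to show the shape of the calculation.

```python
from collections import defaultdict


def swstr_by_park(rows):
    """Return {(park, 'home' or 'away'): swinging strikes / total pitches}."""
    totals = defaultdict(lambda: [0, 0])  # [swinging strikes, pitches]
    for park, pitcher_team, home_team, swinging, pitches in rows:
        # A pitcher is "home" when his team is the park's home team.
        side = "home" if pitcher_team == home_team else "away"
        totals[(park, side)][0] += swinging
        totals[(park, side)][1] += pitches
    return {key: sw / n for key, (sw, n) in totals.items() if n > 0}


# Made-up rows, purely illustrative: (park, pitcher team, home team, swinging strikes, pitches).
rows = [
    ("Park X", "HOME", "HOME", 12, 100),
    ("Park X", "AWAY", "HOME", 9, 100),
    ("Park X", "HOME", "HOME", 10, 100),
]
rates = swstr_by_park(rows)
print(rates)
```

If stringers at one park were systematically miscoding pitches, the away rate at that park should move along with the home rate; a gap that shows up only on the home side is harder to pin on the park alone.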

Pitchers with AA and AAA experience | R-Squared 0.40

MLB SwStr% = 0.43*AAA + 0.29*AA + 0.013

Can we improve the model if we include both Double-A and Triple-A experience for pitchers? Using the above formula and filtering for pitchers with at least 250 pitches in Double-A and Triple-A, as well as at least 500 major league pitches (starters only) we get the following chart (actual on the left, projected SwStr on the right):

We see a much tighter correlation, with interesting cases like Burch Smith and Chad Green popping out as outliers. Nick Tropeano looks to be right in line with his projection (a tick or so higher), which might lend some validity to his 3.56 ERA last season.

Is Age a Factor?

From what I can see in the data, age has relatively little impact on projecting SwStr% in major league pitchers; this differs from Chris Mitchell’s KATOH, which ascribes a lot more predictive value (in terms of WAR) to pitchers who are younger than their competition. We’re comparing apples/SwStr% to pears/WAR here, so I’ll defer to Mitchell’s knowledge and understanding of advanced statistics over the more simplistic xy scatterplot approach I am employing here. Additionally, based on discussions with Chris, it is quite likely that my data don’t have the long-term horizon to capture the full arc of a pitcher and are constrained by the small 2008-2016 window.

Either way – this is what I see:

We see almost no correlation between age at High-A, Double-A or Triple-A with major league SwStr%. My assumption is that if a pitcher being younger is predictive of future success, we would see some growth in his major league numbers from where he was as a youngster. What these data show is that what pitchers do in Triple-A has a LOT more to say about what they’ll do when they first get to the majors than how old they were when they did it. This may make this system more useful for fantasy purposes, as opposed to actual baseball value.

Another way to look at this is to ask whether age increases the binary probability of a pitcher making the major leagues. What follows below is a graph of age against number of pitches thrown in the majors. The size of the bubble indicates how many pitches the pitcher has thrown in the majors:

We see negligible correlation between age and time spent in the majors, as well as between age and yes/no in the majors, topping out with an R-squared of 0.05 for AAA. This all tells me that while it’s definitely better to be younger, it’s far more important to perform well. Again, I’m not ready to conclude that age isn’t important, due to the limited time frame, but I’m still looking for a way to show a natural correlation. The best I can show, which suggests that this approach is indeed missing the later years of the career arc, is this graph which shows SwStr% by age in the majors:

We see (with significant sample sizes) a big jump from ages 20 & 21 to older ages, as well as a clear peak between ages 26 to 31 and decline thereafter. This would suggest and support the KATOH position that there is more room for growth if you start earlier and that likely this model is capturing the first steps in the majors, rather than the entire arc.

Pitcher Height and SwStr%

SWSTR% AND PITCHER HEIGHT

	RP	RP	RP	SP	SP	SP
Pitcher Height	AA	AAA	Majors	AA	AAA	Majors
5-7	10.5%	10.4%	10.5%	7.7%	8.0%
5-8	8.6%	5.4%	9.8%	12.4%	9.7%	8.9%
5-9	11.7%	10.5%	10.9%	9.0%	8.5%
5-10	11.5%	10.9%	12.0%	10.4%	9.9%	8.1%
5-11	11.3%	11.0%	11.0%	9.2%	9.5%	8.6%
6-0	11.2%	11.0%	10.6%	9.4%	9.2%	9.1%
6-1	10.9%	11.4%	10.3%	9.5%	9.0%	8.7%
6-2	11.2%	10.7%	10.8%	9.5%	9.1%	9.0%
6-3	11.0%	10.7%	10.7%	9.4%	9.3%	9.2%
6-4	11.1%	10.9%	10.9%	9.6%	9.3%	9.6%
6-5	10.8%	11.0%	11.4%	8.9%	9.1%	9.1%
6-6	10.9%	10.9%	9.9%	9.5%	9.4%	9.8%
6-7	11.4%	10.5%	11.3%	8.7%	8.5%	8.3%
6-8	11.0%	9.7%	11.9%	9.4%	8.3%	7.0%
6-9	9.5%	10.0%	9.0%	10.9%	10.8%	9.1%
6-10	9.3%	5.1%	12.1%	9.3%	6.7%	8.4%

Two cells really stand out to me (they were highlighted in the original table): major league SwStr% for starters just below six feet. We see a precipitous decline in SwStr% for pitchers 5-foot-10 and 5-foot-11, which may indicate that short starters can excel at Double-A and Triple-A, but if they’re below the magical threshold of six feet tall, they will struggle at the major league level. Interestingly, there doesn’t appear to be any effect whatsoever for relievers, suggesting that height isn’t so important for relief pitchers. Overall, 6-foot-3 to 6-foot-6 appears to be the sweet spot for pitchers.

SP MLB SwStr% – AAA SwStr%

In graphical form we see the same story played out, where the clear optimal spot is 6-foot-4 to 6-foot-6 and anything extreme leads to very random results. This is quite likely due to shorter pitchers having a natural survivor bias (if you’re short and part of a major league team it’s because you were great, as opposed to some tall pitchers who stunk but were still at Triple-A) as well as taller pitchers being tough for Triple-A hitters to handle. Sample sizes for pitchers 6-foot-8 and above are very small. For example, with 6-foot-8 pitchers, we’re talking about Volstad, Glasnow, Kameron Loe and Fister. Fister significantly under-performed compared to his Triple-A numbers; Glasnow and Volstad were mostly in line. The conclusion I would draw is that being shorter than six feet is a bad signal for starters, and being 6-foot-4 to 6-foot-6 is a pretty good signal, with extremely tall pitchers carrying a lot more risk (which is pretty intuitive).

RP MLB SwStr% – AAA SwStr%

For relievers we get almost the opposite effect, suggesting that one shouldn’t be too worried about a short reliever; alternatively, perhaps a short starter with great numbers in Triple-A should be converted to a relief pitcher in the majors.

Projections!

What good would this article be if I didn’t publish some projections for major league SwStr% for pitchers who have limited major league experience, projections that you could use for fantasy baseball next year? For pitchers with only Double-A or only Triple-A experience, I used the corresponding single-level model; for pitchers with experience at both levels, I used the combined model. I’ve shown stats at both levels, even if the sample size was too small to use.

TOP SWSTR% PROSPECTS

Pitcher	Pitcher DOB	SwStr (AA)	SwStr (AAA)	Proj. MLB SwStr%
David Paulino	1994	16.7%	13.8%	12.1%
Danny Hultzen	1989	18.5%	11.3%	11.5%
John Ely	1986		14.5%	11.4%
Alex Torres	1987		14.4%	11.4%
Jose De Leon	1992	14.5%	13.6%	11.3%
Leuris Gomez	1986		14.4%	11.3%
Tyler Webb	1990		14.1%	11.2%
Ariel Pena	1989	12.8%	14.2%	11.1%
Jordan Montgomery	1992	11.7%	14.4%	10.9%
Cesar Valdez	1985		13.5%	10.8%
Alex Reyes	1994	13.0%	13.4%	10.8%
Brock Stewart	1991	13.8%	12.7%	10.7%
Lisalverto Bonilla	1990	12.4%	13.6%	10.7%
Andrew Barbosa	1987	14.7%	13.1%	10.6%
Jose Ramirez	1990	14.3%	11.7%	10.5%
Rob Wooten	1985		13.0%	10.5%
Adrian Salcedo	1991	14.2%		10.4%
John Straka	1990	14.2%		10.4%
Dinelson Lamet	1992	14.2%	14.1%	10.4%
Edwar Cabrera	1987	11.9%	13.0%	10.4%
Chris Withrow	1989	14.1%		10.3%
Daniel Gossett	1992	14.1%	5.2%	10.3%
Brandon Woodruff	1993	14.0%		10.3%
Sean Gleason	1985	13.9%		10.2%
Joan Gregorio	1992	11.8%	12.7%	10.2%
Diego Moreno	1987		12.5%	10.2%
Rett Varner	1988		12.4%	10.2%
Andrew Carraway	1986	18.8%	8.0%	10.2%
Buddy Boshers	1988	13.6%		10.1%
Austin Pruitt	1989	10.4%	13.5%	10.1%
Jed Bradley	1990	10.3%	13.5%	10.1%
Lucas Giolito	1994	11.4%	12.7%	10.1%
Seth Frankoff	1988	13.5%	7.9%	10.1%
Josh Hader	1994	12.6%	11.8%	10.0%
Lucas Sims	1994	11.3%	12.6%	10.0%
Jacob Faria	1993	13.7%	10.9%	10.0%
Merrill Kelly	1988	11.9%	12.2%	10.0%

KATOH also really likes David Paulino, who projects as an elite starter, based solely on his SwStr% potential. If you want to focus on other young prospects, you’ll probably be encouraged by De Leon, Montgomery and Alex Reyes. If you’re not concerned with prospect age, a guy like Tyler Webb, he of limited starting experience, may have some potential as a starter. Keep in mind, the data here are limited to pitchers with Double-A/Triple-A experience as starters. Again, the table above shows actuals at each level, but will only use the data if the sample size is higher than 250 pitches at that level.
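The selection rule behind the table can be sketched as follows, using the three trend lines quoted earlier and the 250-pitch floor. The function name and the pitch counts passed in the example are hypothetical (the table does not list pitch counts), and the floor is treated here as "at least 250".

```python
MIN_PITCHES = 250  # per-level sample-size floor described in the article


def project_mlb_swstr(aa=None, aaa=None, aa_pitches=0, aaa_pitches=0):
    """Project MLB SwStr% (as a fraction) from whichever levels clear the floor."""
    use_aa = aa is not None and aa_pitches >= MIN_PITCHES
    use_aaa = aaa is not None and aaa_pitches >= MIN_PITCHES
    if use_aa and use_aaa:
        return 0.43 * aaa + 0.29 * aa + 0.013  # combined model
    if use_aaa:
        return 0.6 * aaa + 0.027  # Triple-A only
    if use_aa:
        return 0.5 * aa + 0.033  # Double-A only
    return None  # not enough data at either level


# Rates taken from the table; the pitch counts are placeholders that clear the floor.
both_levels = project_mlb_swstr(aa=0.167, aaa=0.138, aa_pitches=1000, aaa_pitches=1000)
aaa_only = project_mlb_swstr(aaa=0.145, aaa_pitches=1000)
aa_only = project_mlb_swstr(aa=0.142, aa_pitches=1000)
print(both_levels, aaa_only, aa_only)
```

Plugging in the table's rates for David Paulino (16.7% AA, 13.8% AAA), John Ely (14.5% AAA) and Adrian Salcedo (14.2% AA) returns roughly 12.1%, 11.4% and 10.4%, matching the projected column. It also explains rows like Dinelson Lamet's, where a Triple-A rate is listed but the projection matches the Double-A-only line.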

Keep an eye out for: Dinelson Lamet, the No. 25 prospect for San Diego as per Eric Longenhagen, mostly because he has a cool name, but also because he has interesting stats.

Conclusions

I was surprised by the consistency of the data, as well as how strong some of the correlations are, which leads me to believe the data are pretty good overall. Further, when I dug into other data points (future article teaser) such as average batted ball distances, we get no correlation for pitchers, but relatively significant ones for batters. In other words, we see what we’d expect to see based on our understanding of the game.

This is the tip of the iceberg for minor league play-by-play data, with lots more to come!

References & Resources

This research was largely influenced and inspired by Chris Mitchell’s KATOH. I will point out that KATOH uses aggregate data which are far more reliable than granular, manual minor league pitch-by-pitch data, which may create discrepancies in our results. Additionally, the pitch-by-pitch data used herein are constrained to the 2008-2016 time frame at the maximum and shorter time frames for certain metrics. I’d like to also say thank you to Chris for his feedback on this piece and commentary on various elements within.

Eli Ben-Porat is a Senior Manager of Reporting & Analytics for Rogers Communications. The views and opinions expressed herein are his own. He builds data visualizations in Tableau, and builds baseball data in Rust. Follow him on Twitter @EliBenPorat, however you may be subjected to (polite) Canadian politics.

6 Comments

Jonathan Sher

8 years ago

Curious what your projection would say for Matt Strahm, who pitched in the pen for the Royals in 2016 but had high k-rates per 9 as a mostly-starter in AA and is a candidate for KC’s rotation in 2017. His swinging strike rate as a reliever in the majors was 12.2%, and while one would expect slippage as a starter, I suspect he might grade out just behind Paulino — Strahm is 6’4″.

Eli Ben-Porat

8 years ago

Reply to Jonathan Sher

Strahm posted a 15.1% SwStr% as an RP and an 11.6% SwStr% as an SP (11.8% overall) while in AA. This projects to about 9.1% at the major league level. The 15.1% as an RP is encouraging, but was done over a sample size of 106 pitches. This model would be pessimistic on Strahm’s initial 20 IP in the majors, should he be a starter.

If we’re looking through KC pitching prospects only, Jake Junis http://www.fangraphs.com/statss.aspx?playerid=sa602414&position=P projects to a 9.5% SwStr% which is half-way decent.

Jonathan Sher

8 years ago

Reply to Eli Ben-Porat

Thanks Eli. Junis is an intriguing prospect and would be clearly in the running, along with Strahm, for the 5th rotation spot next Spring. Here’s what J.J. Picollo, assistant general manager for player personnel in KC, said about him in the end of July in an interview with the Kansas City Star:

“One guy that has taken a huge step is Jacob Junis in Double-A. He’s been outstanding all year long. He was a guy that sort of showed you short glimpses of him really being sort of a dominating guy. He threw 90-95, good breaking ball, good change-up. And then he would lose it. He’d lose it for a little bit and get back into it. Well, then this year, he just had a better mound presence and he just kept working. He’s probably been our most consistent guy since the beginning of the season. He’s been consistently throwing 93-95 and his change-up has been good every night. His curveball has been good. He’s learning how to add and subtract from those pitchers with velocity. He’s been a real shining light this season.”

MGL

8 years ago

Very good stuff. All of this is for sw str % of course. Is there any research out there you can point to that looks at the correlation btwn sw str % and K rate as well as sw str % and overall success (eg FIP+ or RA+) ?

Even if there is a correlation between sw st % and overall success, which I’m sure there is, it’s a long way to go from predicting sw st % in the majors to predicting success in the majors or even K rate.

I actually don’t know how to chain correlations. E.g. if r-squared from A to B is .4 and from B to C is .4, what is the r-squared from A to C? I suspect it is .16 but I’m not sure.

Eli Ben-Porat

8 years ago

Reply to MGL

As I explore these data more, I’ll evolve it into a model that can potentially predict ERA or FIP (and WAR for batters). The opening chapter here was intended to explore the purest of those signals and see how that extrapolated to the majors and then expand to other metrics, before rolling out a more comprehensive model. The final article that shows up in THT is basically me just formalizing my data discovery process, which starts with a hunch, followed by throwing stuff into Tableau and then exporting a bunch of pictures (usually x/y scatter plots) and talking about it. I’m definitely working towards building a comprehensive prospect model from the data but probably a ways away from that.

Agreed that SwStr% in and of itself doesn’t tell you much about future success, other than the ability to get swinging strikes. To me it’s the equivalent of scouting a basketball player as 7’0 and extremely athletic, useful information, but not nearly a complete picture.

Matthew Trueblood

8 years ago

Terrific work, Eli. Do you happen to know whether the data you used for this piece is the same as the data that appears in minor league game logs on the register pages at Baseball Reference? Or, to ask another way: where can I most easily find the information that made up your data set?
