Using Movement, Velocity & Location to Predict a Swinging Strike

by Eno Sarris

Masahiro Tanaka had the best expected swinging strike rate among starters last season. (via Hayden Schiff)

Editor’s Note: This article is co-authored by Eno Sarris and Andrew Perpetua.

We have had reasonably precise numbers on every pitch for years now. The location, velocity and movement for each pitch. Down to the second decimal. Add to that our knowledge of what areas at the plate produce the highest exit velocity—thanks to Statcast—and we have had even more numbers at our disposal the last three years.

While we’ve figured out that certain aspects of that movement are good for certain pitches, have we tried to put those together to predict outcomes using these process-based numbers? Could we add in that Statcast data to get an idea about which pitchers are throwing to good spots in the zone at the same time? Basically, we’re talking about putting a number to command and stuff, the two hardest things to quantify in pitching. Can we do it?

Turns out, maybe. We at least have a new framework with which to attempt it.

I asked Andrew Perpetua if we could try to use the location of the pitch, the movement and velocity of the pitch, and the count to predict a swinging strike.

Here, Andrew explains the process he undertook:

For this exercise, I selected seven variables: pitch location (x and z), pitch movement (x and z), the count (balls and strikes), and effective velocity.

I limited the selection of pitches to only those roughly in the area of the strike zone. That is, pitches that had an x coordinate between -20 and 20 inches and a z coordinate between 5 and 50 inches. There is a small number of pitches with anomalous movement readings, so I limited movement to sensible values (between -24 and 21 inches on the x axis and between -16 and 26 inches on the z axis).

All pitches with an effective velocity below 72 mph were discarded, and all pitches above 99 mph were set to 100 mph. Approximately 97 percent of pitches have a measured effective velocity, so for the three percent without such a measurement, I used standard pitch velocity in its place.

Initially, the lower limit of pitch velocity was set to 50 mph, but I raised the threshold because it led to better results both on the player and major league level. Approximately 0.55 percent of pitches were thrown below 72 mph over the past three seasons. Roughly one in 12 of these pitches was thrown by Jered Weaver.
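The filtering rules above can be sketched in a few lines of Python. This is only an illustration, not Andrew's actual code: the field names (`loc_x`, `effective_velo`, `model_velo` and so on) are hypothetical, and treating the stated bounds as inclusive is an assumption.

```python
from typing import Optional

# Bounds quoted in the text (inches and mph). Treating them as inclusive
# is an assumption; the field names below are hypothetical.
LOC_X_RANGE = (-20.0, 20.0)
LOC_Z_RANGE = (5.0, 50.0)
MOV_X_RANGE = (-24.0, 21.0)
MOV_Z_RANGE = (-16.0, 26.0)
MIN_VELO = 72.0      # pitches slower than this are discarded
CAP_ABOVE = 99.0     # pitches faster than this ...
CAPPED_VELO = 100.0  # ... are all treated as this value


def in_range(value: float, bounds: tuple) -> bool:
    low, high = bounds
    return low <= value <= high


def clean_pitch(pitch: dict) -> Optional[dict]:
    """Return a copy of the pitch with a model velocity, or None if discarded."""
    # Fall back to standard velocity when effective velocity is missing.
    velo = pitch.get("effective_velo")
    if velo is None:
        velo = pitch["velo"]
    if velo < MIN_VELO:
        return None
    if velo > CAP_ABOVE:
        velo = CAPPED_VELO
    checks = (
        ("loc_x", LOC_X_RANGE),
        ("loc_z", LOC_Z_RANGE),
        ("mov_x", MOV_X_RANGE),
        ("mov_z", MOV_Z_RANGE),
    )
    for field, bounds in checks:
        if not in_range(pitch[field], bounds):
            return None
    cleaned = dict(pitch)
    cleaned["model_velo"] = velo
    return cleaned
```

A pitch with no effective velocity reading and a standard velocity of 99.6 mph would come out with a model velocity of 100, while a 70 mph pitch would be dropped entirely.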

Next, I created three grids each for movement and location. One is a six-inch by six-inch grid, and the other two are three-inch by three-inch grids. The grids are offset from one another on both the x and z axes so their edges do not overlap. Likewise, I split pitch velocity into 10-, five- and two-mile-per-hour groups, and I threw in the strike and ball totals as well.
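To make the grid idea concrete, here is a rough Python sketch of how a pitch might be assigned to a bucket. The grid sizes and velocity widths come from the description above, but the specific offsets (0, 1 and 2 inches) are an assumption picked only so that no two grids share an edge, and every name here is hypothetical.

```python
import math

# Grid sizes come from the text; the offsets are assumptions chosen so
# that no two grids share an edge (edges at 0, 6, ...; 1, 4, ...; 2, 5, ...).
GRIDS = {
    "six_inch": (6.0, 0.0),
    "three_inch_a": (3.0, 1.0),
    "three_inch_b": (3.0, 2.0),
}
VELO_WIDTHS = (10.0, 5.0, 2.0)  # mph, from the text


def grid_cell(x, z, size, offset):
    """Column and row of the square cell containing the point (x, z)."""
    return (math.floor((x - offset) / size), math.floor((z - offset) / size))


def velo_group(velo, width):
    """Index of the velocity group of the given width containing velo."""
    return math.floor(velo / width)


def bucket_key(pitch, loc_grid, mov_grid, velo_width):
    """One bucket key: location cell, movement cell, velocity group, count."""
    loc_size, loc_offset = GRIDS[loc_grid]
    mov_size, mov_offset = GRIDS[mov_grid]
    return (
        grid_cell(pitch["loc_x"], pitch["loc_z"], loc_size, loc_offset),
        grid_cell(pitch["mov_x"], pitch["mov_z"], mov_size, mov_offset),
        velo_group(pitch["velo"], velo_width),
        pitch["balls"],
        pitch["strikes"],
    )
```

Because the grids are shifted, two pitches that fall on opposite sides of an edge in one grid can still share a cell in another, which softens the arbitrary boundaries that any single grid would impose.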

I combined these different grids and velocity groups in six ways. For example: six-inch location grid, six-inch movement grid, and two-mph velocity group versus three-inch location, three-inch movement, and five-mph velocity group. These six bucketing schemes are unique.

Overfitting the data can be an easy trap when using a bucket-type method like this, but with the methodology detailed above I have managed to have an average of 72 pitches in each bucket, which I feel is large enough to get the job done.

After grouping the pitches and finding the average results for each bucket, I am left with six estimates for swinging strike rate for each pitch. These six estimates are then weighted and combined to create one final estimate.
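The averaging and combining steps might look something like the Python sketch below. The weights Andrew actually used are not given here, so the `combine` function simply takes whatever weights the caller supplies; that choice, along with the `whiff` field (1 for a swinging strike, 0 otherwise) and the function names, is an assumption for illustration.

```python
from collections import defaultdict


def bucket_rates(pitches, key_fn):
    """Map each bucket key to the share of its pitches that were whiffs."""
    totals = defaultdict(lambda: [0, 0])  # key -> [whiffs, pitches]
    for pitch in pitches:
        tally = totals[key_fn(pitch)]
        tally[0] += pitch["whiff"]
        tally[1] += 1
    return {key: whiffs / count for key, (whiffs, count) in totals.items()}


def combine(estimates, weights):
    """Weighted average of per-scheme estimates, skipping missing ones."""
    pairs = [(e, w) for e, w in zip(estimates, weights) if e is not None]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        return None
    return sum(e * w for e, w in pairs) / total_weight


def expected_whiff(pitch, schemes, weights):
    """Look up the pitch in every scheme's rate table and blend the results.

    schemes is a list of (key_fn, rates) pairs, one per bucketing scheme.
    """
    estimates = [rates.get(key_fn(pitch)) for key_fn, rates in schemes]
    return combine(estimates, weights)
```

Averaging each pitcher's per-pitch values from `expected_whiff` would then give an expected swinging strike rate for that pitcher.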

The above process was first conducted on a random set of 30 pitchers who each threw a minimum of 750 pitches in both 2015 and 2016. Once it was trained, I applied the method to several batches of 50 random pitchers, testing the results on the 2015 to 2016 seasons and, more importantly, their applicability to the 2016 to 2017 seasons. The method was not originally trained on the 2016 to 2017 data, so success in this area was encouraging and spurred me on to applying the method to all 1,135 pitchers.

There’s a quirk to this approach that’s a benefit and a possible drawback. Those buckets Andrew created serve to link actual outcomes with a wide variety of inputs without using multivariable regression. You might say that’s a drawback, because we can’t then use the equation to regress to the mean with each input, but there is a type of regression: Andrew creates bigger and smaller buckets and weights each bucket based on its similarity. That means the biggest buckets—or, major league average production—are weighted in the results. But it’s not done the way many analysts do it. More on why this might make sense later.

Here are the starters with the highest expected swinging strike rates. Here’s where whiffs live, for the most part. Many of your actual swinging strike leaders grace this board: aces abound. Though it’s interesting to see No. 9 on this list, more on him later.

xSWSTR Leaders

Player	lgEV 17	SwStrk 17	xSwStrk 17
Masahiro Tanaka	90.69	14.9%	15.4%
Corey Kluber	89.61	15.6%	14.9%
Chris Archer	91.02	13.5%	14.7%
Robbie Ray	91.49	14.4%	14.0%
Max Scherzer	90.83	15.3%	13.8%
Chris Sale	90.29	14.7%	13.5%
Danny Salazar	91.32	16.2%	13.5%
Luis Severino	91.61	12.6%	13.1%
Dallas Keuchel	89.88	11.0%	12.9%
Zack Greinke	91.44	12.2%	12.8%
Jacob deGrom	90.72	13.2%	12.7%
Michael Pineda	91.18	12.2%	12.7%
Stephen Strasburg	90.89	13.5%	12.6%
Lance McCullers	90.93	12.2%	12.6%
Carlos Carrasco	91.91	13.5%	12.5%

Minimum 1,500 pitches thrown

Why is Dallas Keuchel interesting? Because he’s one of the pitchers who most underperformed his expected swinging strike rate. Here’s that leaderboard.

Underperformers

Name	lgEV 17	xSwStrk 17	SwStrk 17	Diff 17
Mike Leake	91.56	11.0%	8.4%	-2.6%
Martin Perez	90.92	9.7%	7.2%	-2.5%
Jordan Zimmermann	92.02	10.5%	8.1%	-2.4%
Miguel Gonzalez	91.00	9.1%	6.7%	-2.4%
Andrew Cashner	92.43	8.3%	6.0%	-2.3%
Wade Miley	90.80	10.1%	8.1%	-2.0%
Bartolo Colon	92.17	7.3%	5.3%	-2.0%
Dallas Keuchel	89.88	12.9%	11.0%	-1.9%
Jon Gray	91.65	10.8%	8.9%	-1.9%
Daniel Norris	92.21	11.0%	9.1%	-1.9%
Matt Harvey	92.38	9.3%	7.4%	-1.9%
Mike Montgomery	91.42	9.8%	8.1%	-1.7%
Matt Garza	90.85	9.7%	8.0%	-1.7%
Kyle Hendricks	89.55	10.1%	8.5%	-1.6%
Yovani Gallardo	91.00	9.9%	8.3%	-1.6%

Minimum 1,500 pitches thrown

There are enough sinkerballers on this list to consider whether some of these pitchers chose to chase weak contact and ground balls instead of going after whiffs. Don’t know what’s going on with Jon Gray here, but since his curveball varies so much between home and away starts, it’s possible there’s a Coors Field effect for him—overall movement looks better than actual movement at home, for example.

Can we look at the overperformers and learn anything from the dichotomy between the two groups? Here are the overperformers, the pitchers who got more swinging strikes than our method predicts.

Overperformers

Player	lgEV 17	xSwStrk 17	SwStrk 17	Diff 17
Clayton Kershaw	91.392	10.50%	13.30%	2.80%
Danny Salazar	91.318	13.50%	16.20%	2.70%
Drew Pomeranz	90.923	7.70%	9.90%	2.20%
Aaron Nola	92.444	9.30%	10.80%	1.50%
Max Scherzer	90.832	13.80%	15.30%	1.50%
Zack Godley	91.044	11.90%	13.20%	1.30%
Chris Sale	90.286	13.50%	14.70%	1.20%
Carlos Carrasco	91.908	12.50%	13.50%	1.00%
Stephen Strasburg	90.887	12.60%	13.50%	0.90%
Corey Kluber	89.605	14.90%	15.60%	0.70%
Jimmy Nelson	90.669	10.60%	11.30%	0.70%
Mike Fiers	91.199	8.30%	9.00%	0.70%
Jaime Garcia	91.243	10.60%	11.20%	0.60%
James Paxton	91.308	11.90%	12.50%	0.60%
Jacob deGrom	90.719	12.70%	13.20%	0.50%

Minimum 1,500 pitches thrown

Command may be one part of this group’s ability to perform above its movement and velocity, even if location was a variable. Of the top 10 overperforming starters, all but one had better command than the average of the top 10 underperforming pitchers by a command metric Andrew and I developed (lgEV). That metric represents the major league-average exit velocity in the locations the pitcher throws to, with a lower number being better.

That number was created after we contemplated the following visual, which showed the average exit velocity allowed by right-handed pitchers facing right-handed batters, bucketed by location. We added a side-by-side comparison of Aroldis Chapman for added context.

We’ve had Bill Petti’s EDGE%—how often a pitcher throws to the edges of the strike zone—for a while, but once you look at this picture, you realize the edges aren’t all created equal, and that there are places fairly deep within the zone that are still decent (read: blue) for the pitcher. By mapping it this way, we can ask the more exact and granular question: Who throws it most often to the places that produce the lowest exit velocities?

Create those bins again, weight them the same way we did for expected swinging strike rate, and what comes out is a number that represents the major league-average exit velocities tailored specifically to the locations each pitcher throws to. In other words, it’s a location-based expected exit velocity allowed: a process stat derived from outcomes.
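A stripped-down version of that calculation, written in Python, is below. It uses a single six-inch location grid rather than the multiple weighted grids described above, so it is a simplification; the field names and function names are hypothetical.

```python
import math
from collections import defaultdict


def cell(x, z, size=6.0):
    """Column and row of the square location cell containing (x, z)."""
    return (math.floor(x / size), math.floor(z / size))


def league_ev_by_cell(batted_balls, size=6.0):
    """League-average exit velocity for each location cell."""
    sums = defaultdict(lambda: [0.0, 0])  # cell -> [total exit velo, count]
    for ball in batted_balls:
        tally = sums[cell(ball["loc_x"], ball["loc_z"], size)]
        tally[0] += ball["exit_velo"]
        tally[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def lg_ev(pitcher_pitches, ev_map, size=6.0):
    """Average league exit velocity over the cells a pitcher throws to.

    Pitches thrown to cells with no league batted-ball data are skipped.
    """
    values = []
    for pitch in pitcher_pitches:
        key = cell(pitch["loc_x"], pitch["loc_z"], size)
        if key in ev_map:
            values.append(ev_map[key])
    if not values:
        return None
    return sum(values) / len(values)
```

Note that the pitcher's own batted-ball results never enter `lg_ev`. Only where he throws matters, which is what makes the number a measure of process rather than outcome.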

Is it safe to assume every pitcher wants to throw the ball to places that produce low exit velocity? I think it is. Every pitcher wants to avoid hard contact, and most pitchers understand that heat map and have internalized it. Here are the pitchers who fared best by lgEV this past year:

lgEV Leaders

name	xSwStrk 17	lgEV 15	lgEV 16	lgEV 17
R. A. Dickey	9.0%	93.98	91.24	88.65
CC Sabathia	10.0%	92.32	90.95	89.38
Kyle Hendricks	10.1%	93.27	91.92	89.55
Corey Kluber	14.9%	91.53	90.95	89.61
Dallas Keuchel	12.9%	91.53	92.18	89.88
Jake Odorizzi	11.0%	92.65	91.69	90.13
Eduardo Rodriguez	11.2%	92.73	90.95	90.19
Marco Estrada	11.0%	94.34	91.07	90.24
Chris Sale	13.5%	92.46	92.06	90.29
Clayton Richard	9.1%	93.05	92.97	90.46
Matt Boyd	10.3%	94.22	91.50	90.54
Ervin Santana	11.5%	92.37	92.18	90.62
Johnny Cueto	11.8%	91.76	91.44	90.64
Jimmy Nelson	10.6%	92.29	92.47	90.67
Masahiro Tanaka	15.4%	92.13	92.02	90.69

Dickey may be a surprise at the top of this list, but it’s worth noting he’s been erratic from year to year. In terms of a three-year average, the top three here are Corey Kluber, CC Sabathia, and Keuchel. That makes a lot of sense intuitively.

Kluber constantly front- and back-doors his sinker and drops his breaking balls in at the knee, and those spots create weak contact all the time. Sabathia had a resurgence this past year throwing his cutter inside to lefties, which is an obvious pitcher’s spot on the graphic above. Keuchel lives outside the zone and low and yet somehow coaxes batters to swing at those pitches. This all sounds like command to me.

Back to the over- and underperforming xswSTR guys, though. This command stat doesn’t really explain the difference between prime xswSTR underachiever Dallas Keuchel and overachiever Clayton Kershaw. Keuchel was the fifth-best starting pitcher by lgEV last year; Kershaw was 56th. With count included, it’s not easy to blame the catcher’s framing ability, either. Keuchel had awesome command and still didn’t get as many whiffs as he could from his stuff.

Back to the fact that Keuchel is legendary with the throwing-outside-the-zone thing, though. We have location as one of the variables. The 15 who most underperformed their expected swinging strike rate averaged a 45.5 percent zone rate, and the 15 who most overperformed their expected swinging strike rate averaged a 47.2 percent zone rate.

If you throw outside the zone, you’ll necessarily throw to places with low exit velocities, which will make your command stat look good. But if you throw too often outside the zone, you’ll get into bad counts that will affect your xswSTR number negatively. Perhaps the relationship between location and outcomes needs to be bucketed finer or poked further.

But there’s also something interesting here when it comes to our approach and this issue. There’s a long-held truism among pitchers that the art they practice is “making balls look like strikes and strikes look like balls.” Our command stat really seems to be measuring the pitcher’s ability to make balls look like strikes; you’re not going to be around long if you exclusively throw outside the zone to the blue and don’t come in enough to coax the swings. That’s walk city.

What xswSTR seems to be measuring is the other skill, making strikes look like balls. By including location and count along with movement and velocity, we’re rewarding a pitcher’s ability to do well inside the zone.

Could we also be measuring deception, then, in some weird way? That’s a way to make strikes look like balls and do well on pitches thrown inside the zone.

Let’s look at the furthest outliers from the year using an angle behind home plate. Here’s the biggest underperformer, Mike Leake.

And then the No. 1 overperformer, Kershaw, from this past World Series. Notice the hitch, the over-the-top release point.

Is it easier to pick up Leake’s ball? I’d argue it is. Kershaw does all manner of things that make it tough to time him.

That sort of stuff is traditionally difficult to put a number on. You might need ball recognition technology and a mountain of interns to code and prepare home plate footage on every pitcher in order to say something as simple as, “You can see the ball for 0.5 milliseconds longer when facing Kershaw than when facing Leake.” Possible but not currently probable.

The results on xswSTR were decently sticky year to year, about on par with walk rate or first-strike percentage. But that means swinging strike rate itself is still currently more useful than this number. Check out the error numbers in the chart below.

Error

Stat	2015-2016	2016-2017	2015-2017
xSwStrk MSE	0.000363	0.000406	0.000482
SwStrk MSE	0.000358	0.000394	0.00053
xSwStrk to SwStrk R	0.675	0.646	0.573
SwStrk to SwStrk R	0.743	0.713	0.634

Pitchers with minimum 500 pitches thrown in 2015, 2016 and 2017
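For readers who want to reproduce this kind of comparison, here is a small Python sketch of the two measures in the chart. It assumes MSE means the mean squared difference between one season's stat and a later season's actual swinging strike rate, and that R is the Pearson correlation; the function names are hypothetical. The sample values are the 2017 xSwStrk and SwStrk rates for Tanaka, Kluber and Archer from the leaderboard above, so the usage shown is a same-season illustration, not a recreation of the year-to-year figures.

```python
import math


def mse(predicted, actual):
    """Mean squared error between two equal-length lists of rates."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Rates as fractions: Tanaka, Kluber, Archer (2017, from the table above).
x_sw_strk = [0.154, 0.149, 0.147]
sw_strk = [0.149, 0.156, 0.135]
sample_mse = mse(x_sw_strk, sw_strk)
sample_r = pearson_r(x_sw_strk, sw_strk)
```

If the rates in the chart are expressed as fractions, an MSE of 0.000363 corresponds to a typical miss of roughly 1.9 percentage points of swinging strike rate, since the square root of 0.000363 is about 0.019.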

It’s not quite what we wanted, but on the bright side, this chart also means we’re close, and we’ve gotten there with an approach that hints at success when it comes to putting a number on command and deception.

The next iteration will define movement off the fastball; that should lead to a leap forward. In the future, we also could try to bring sequencing into the equation, meaning the movement, location and velocity of the pitch right before the pitch in question. With this bucketing approach, though, we’d need a very large number of pitches going in to make it work. Andrew feels a random forest approach may help improve the model, even if early results were not encouraging.

In the meantime, we’ll have to wonder how, exactly, all that movement and velocity combine to make the batter miss.

References & Resources

Bill Petti, The Hardball Times, “Expanding the Edges of the Strike Zone”
Eno Sarris, FanGraphs, “Aroldis Chapman Is Struggling with the Where and How”
Eno Sarris, FanGraphs, “The Death of a Fastball, with CC Sabathia”
Harry Pavlidis, Baseball Prospectus, “What Makes A Good Changeup: An Investigation, Part 1”
Eno Sarris, Fox Sports, “What makes a great curveball?”
Andrew Perpetua, RotoGraphs, “Adjusting Exit Velocity for Pitch Speed and Location”
