Using Movement, Velocity & Location to Predict a Swinging Strike

Masahiro Tanaka had the best expected swinging strike rate among starters last season. (via Hayden Schiff)

Editor’s Note: This article is co-authored by Eno Sarris and Andrew Perpetua.

We have had reasonably precise numbers on every pitch for years now. The location, velocity and movement for each pitch. Down to the second decimal. Add to that our knowledge of what areas at the plate produce the highest exit velocity—thanks to Statcast—and we have had even more numbers at our disposal the last three years.

While we’ve figured out that certain aspects of that movement are good for certain pitches, have we tried to put those together to predict outcomes using these process-based numbers? Could we add in that Statcast data to get an idea about which pitchers are throwing to good spots in the zone at the same time? Basically, we’re talking about putting a number to command and stuff, the two hardest things to quantify in pitching. Can we do it?

Turns out, maybe. We at least have a new framework with which to attempt it.

We’ve moved The 2018 Hardball Times Annual online to give the entire Internet the chance to read the research and insight The Hardball Times Annual has brought to print for the past 15 years.


I asked Andrew Perpetua if we could try to use the location of the pitch, the movement and velocity of the pitch, and the count to predict a swinging strike.

Here, Andrew explains the process he undertook:

For this exercise, I selected seven variables: pitch location (x and z), pitch movement (x and z), the count (balls and strikes), and effective velocity.

I limited the selection of pitches to only those roughly in the area of the strike zone. That is, pitches that had an x coordinate between -20 and 20 inches and a z coordinate between 5 and 50 inches. There is a small number of pitches with anomalous movement readings, so I limited the movement values to sensible numbers (between -24 and 21 inches on the x axis and between -16 and 26 inches on the z axis).

All pitches with an effective velocity below 72 mph were discarded, and all pitches above 99 mph were treated as 100 mph. Approximately 97 percent of pitches have a measured effective velocity, so for the three percent without such a measurement, I used standard pitch velocity in its place.

Initially, the lower limit of pitch velocity was set to 50 mph, but I raised the threshold because it led to better results both on the player and major league level. Approximately 0.55 percent of pitches were thrown below 72 mph over the past three seasons. Roughly one in 12 of these pitches was thrown by Jered Weaver.
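As a rough illustration of the filtering rules just described, here is a minimal Python sketch. It is not the code used for the study: the function and constant names are hypothetical, the bounds are treated as inclusive, and the 72 mph cutoff is assumed to apply to the fallback standard velocity as well.

```python
from typing import Optional

# Bounds quoted in the text, in inches. Treating them as inclusive is an assumption.
LOCATION_X = (-20.0, 20.0)
LOCATION_Z = (5.0, 50.0)
MOVEMENT_X = (-24.0, 21.0)
MOVEMENT_Z = (-16.0, 26.0)

MIN_VELO = 72.0      # pitches slower than this are discarded
MAX_VELO = 99.0      # pitches faster than this are recorded as CAPPED_VELO
CAPPED_VELO = 100.0


def in_range(value: float, bounds: tuple) -> bool:
    """True if value lies inside the (low, high) window, edges included."""
    low, high = bounds
    return low <= value <= high


def keep_pitch(loc_x: float, loc_z: float, mov_x: float, mov_z: float) -> bool:
    """True if location and movement both fall inside the quoted windows."""
    return (in_range(loc_x, LOCATION_X) and in_range(loc_z, LOCATION_Z)
            and in_range(mov_x, MOVEMENT_X) and in_range(mov_z, MOVEMENT_Z))


def clean_velocity(effective: Optional[float], standard: float) -> Optional[float]:
    """Velocity used for bucketing, or None when the pitch is discarded."""
    # Fall back to standard velocity when no effective velocity was measured.
    velo = standard if effective is None else effective
    if velo < MIN_VELO:
        return None
    if velo > MAX_VELO:
        return CAPPED_VELO
    return velo
```

A pitch would then be kept only when `keep_pitch(...)` is true and `clean_velocity(...)` returns a number.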

Next, I created three grids each for movement and location: one six-inch by six-inch grid and two three-inch by three-inch grids. The grids are offset from one another on both the x and z axes so their edges do not overlap. Likewise, I split pitch velocity into 10-, five-, and two-mph groups, and I included the strike and ball counts as well.
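To make the offset-grid idea concrete, here is a small Python sketch. The cell sizes come from the text, but the specific offsets (0, 1 and 2 inches) and all names are assumptions chosen only so that no two grids share an edge.

```python
import math


def grid_cell(x: float, z: float, size: float, offset: float = 0.0) -> tuple:
    """Map a point (inches) to the integer cell of a square grid.

    `offset` shifts the grid origin on both axes, so grids with different
    offsets have their edges in different places.
    """
    return (math.floor((x - offset) / size), math.floor((z - offset) / size))


def velocity_group(velo: float, width: float) -> int:
    """Index of the velocity group of the given width (mph) containing velo."""
    return math.floor(velo / width)


# Hypothetical layout: (cell size, offset) in inches. Edges fall at
# 0, 6, 12... / 1, 4, 7... / 2, 5, 8..., so they never coincide.
GRIDS = [(6.0, 0.0), (3.0, 1.0), (3.0, 2.0)]

if __name__ == "__main__":
    for size, offset in GRIDS:
        print(size, offset, grid_cell(1.5, 1.5, size, offset))
```

The same point can land near the middle of a cell in one grid and near an edge in another, which is the reason for keeping several offset grids rather than one.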

I combined these different grids and velocity groups in six ways. For example: six-inch location grid, six-inch movement grid, and two-mph velocity group versus three-inch location, three-inch movement, and five-mph velocity group. These six bucketing schemes are unique.

Overfitting the data can be an easy trap when using a bucket-type method like this, but with the methodology detailed above I ended up with an average of 72 pitches in each bucket, which I feel is large enough to get the job done.

After grouping the pitches and finding the average results for each bucket, I am left with six estimates for swinging strike rate for each pitch. These six estimates are then weighted and combined to create one final estimate.
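A minimal sketch of those two steps, bucket averages followed by a weighted blend, might look like the following. The actual weights are not given in the text, so the weights here are placeholders, and the `whiff` field and function names are hypothetical.

```python
from collections import defaultdict


def bucket_rates(pitches, key_fn):
    """Average whiff outcome (1 = swinging strike, 0 = anything else) per bucket."""
    totals = defaultdict(lambda: [0, 0])  # bucket key -> [whiffs, pitches]
    for pitch in pitches:
        cell = totals[key_fn(pitch)]
        cell[0] += pitch["whiff"]
        cell[1] += 1
    return {key: whiffs / count for key, (whiffs, count) in totals.items()}


def combine(estimates, weights):
    """Weighted average of per-scheme estimates.

    Schemes with no estimate for this pitch (None) are skipped; returns None
    if nothing is left to average.
    """
    pairs = [(est, w) for est, w in zip(estimates, weights) if est is not None]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        return None
    return sum(est * w for est, w in pairs) / total_weight
```

In use, `bucket_rates` would be run once per bucketing scheme with a `key_fn` that returns that scheme's location cell, movement cell, velocity group and count; each pitch then looks up its six bucket rates and passes them to `combine`.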

The above process was first conducted on a random set of 30 pitchers who each threw a minimum of 750 pitches in both 2015 and 2016. Once trained, I applied the method to several batches of 50 random pitchers, testing the results on the 2015 to 2016 seasons and, more importantly, on the 2016 to 2017 seasons. The method was not originally trained on the 2016 to 2017 data, so success in this area was encouraging and spurred me on to apply the method to all 1,135 pitchers.

There’s a quirk to this approach that’s a benefit and a possible drawback. Those buckets Andrew created serve to link actual outcomes with a wide variety of inputs without using multivariable regression. You might say that’s a drawback, because we can’t then use the equation to regress to the mean with each input, but there is a type of regression: Andrew creates bigger and smaller buckets and weights each bucket based on its similarity. That means the biggest buckets—or, major league average production—are weighted in the results. But it’s not done the way many analysts do it. More on why this might make sense later.

Here are the starters with the highest expected swinging strike rates. Here’s where whiffs live, for the most part. Many of your actual swinging strike leaders grace this board: aces abound. It’s interesting to see No. 9 on this list, though. More on him later.

xSWSTR Leaders

| Player | 2017 lgEV | 2017 SwStrk | 2017 xSwStrk |
|---|---|---|---|
| Masahiro Tanaka | 90.69 | 14.9% | 15.4% |
| Corey Kluber | 89.61 | 15.6% | 14.9% |
| Chris Archer | 91.02 | 13.5% | 14.7% |
| Robbie Ray | 91.49 | 14.4% | 14.0% |
| Max Scherzer | 90.83 | 15.3% | 13.8% |
| Chris Sale | 90.29 | 14.7% | 13.5% |
| Danny Salazar | 91.32 | 16.2% | 13.5% |
| Luis Severino | 91.61 | 12.6% | 13.1% |
| Dallas Keuchel | 89.88 | 11.0% | 12.9% |
| Zack Greinke | 91.44 | 12.2% | 12.8% |
| Jacob deGrom | 90.72 | 13.2% | 12.7% |
| Michael Pineda | 91.18 | 12.2% | 12.7% |
| Stephen Strasburg | 90.89 | 13.5% | 12.6% |
| Lance McCullers | 90.93 | 12.2% | 12.6% |
| Carlos Carrasco | 91.91 | 13.5% | 12.5% |

Minimum 1,500 pitches thrown

Why is Dallas Keuchel interesting? Because he’s one of the pitchers who most underperformed his expected swinging strike rate. Here’s that leaderboard.

Underperformers

| Player | 2017 lgEV | 2017 xSwStrk | 2017 SwStrk | 2017 Diff |
|---|---|---|---|---|
| Mike Leake | 91.56 | 11.0% | 8.4% | -2.6% |
| Martin Perez | 90.92 | 9.7% | 7.2% | -2.5% |
| Jordan Zimmermann | 92.02 | 10.5% | 8.1% | -2.4% |
| Miguel Gonzalez | 91.00 | 9.1% | 6.7% | -2.4% |
| Andrew Cashner | 92.43 | 8.3% | 6.0% | -2.3% |
| Wade Miley | 90.80 | 10.1% | 8.1% | -2.0% |
| Bartolo Colon | 92.17 | 7.3% | 5.3% | -2.0% |
| Dallas Keuchel | 89.88 | 12.9% | 11.0% | -1.9% |
| Jon Gray | 91.65 | 10.8% | 8.9% | -1.9% |
| Daniel Norris | 92.21 | 11.0% | 9.1% | -1.9% |
| Matt Harvey | 92.38 | 9.3% | 7.4% | -1.9% |
| Mike Montgomery | 91.42 | 9.8% | 8.1% | -1.7% |
| Matt Garza | 90.85 | 9.7% | 8.0% | -1.7% |
| Kyle Hendricks | 89.55 | 10.1% | 8.5% | -1.6% |
| Yovani Gallardo | 91.00 | 9.9% | 8.3% | -1.6% |

Minimum 1,500 pitches thrown

There are enough sinkerballers on this list to consider whether some of these pitchers chose to chase weak contact and ground balls instead of going after whiffs. Don’t know what’s going on with Jon Gray here, but since his curveball varies so much between home and away starts, it’s possible there’s a Coors Field effect for him—overall movement looks better than actual movement at home, for example.

Can we look at the overperformers and learn anything from the dichotomy between the two groups? Here are the overperformers, the ones who got more swinging strikes than our equation predicts.

Overperformers

| Player | 2017 lgEV | 2017 xSwStrk | 2017 SwStrk | 2017 Diff |
|---|---|---|---|---|
| Clayton Kershaw | 91.392 | 10.5% | 13.3% | 2.8% |
| Danny Salazar | 91.318 | 13.5% | 16.2% | 2.7% |
| Drew Pomeranz | 90.923 | 7.7% | 9.9% | 2.2% |
| Aaron Nola | 92.444 | 9.3% | 10.8% | 1.5% |
| Max Scherzer | 90.832 | 13.8% | 15.3% | 1.5% |
| Zack Godley | 91.044 | 11.9% | 13.2% | 1.3% |
| Chris Sale | 90.286 | 13.5% | 14.7% | 1.2% |
| Carlos Carrasco | 91.908 | 12.5% | 13.5% | 1.0% |
| Stephen Strasburg | 90.887 | 12.6% | 13.5% | 0.9% |
| Corey Kluber | 89.605 | 14.9% | 15.6% | 0.7% |
| Jimmy Nelson | 90.669 | 10.6% | 11.3% | 0.7% |
| Mike Fiers | 91.199 | 8.3% | 9.0% | 0.7% |
| Jaime Garcia | 91.243 | 10.6% | 11.2% | 0.6% |
| James Paxton | 91.308 | 11.9% | 12.5% | 0.6% |
| Jacob deGrom | 90.719 | 12.7% | 13.2% | 0.5% |

Minimum 1,500 pitches thrown

Command may be one part of this group’s ability to perform above its movement and velocity, even if location was a variable. Of the top 10 overperforming starters, all but one had better command than the average of the top 10 underperforming pitchers by a command metric Andrew and I developed (lgEV). That metric represents the major league-average exit velocity in the locations the pitcher throws to, with a lower number being better.

That number was created after we contemplated the following visual, which showed the average exit velocity allowed by right-handed pitchers facing right-handed batters, bucketed by location. We added a side-by-side comparison of Aroldis Chapman for added context.

We’ve had Bill Petti’s EDGE%—how often a pitcher throws to the edges of the strike zone—for a while, but once you look at this picture, you realize the edges aren’t all created equal, and that there are places fairly deep within the zone that are still decent (read: blue) for the pitcher. By mapping it this way, we can ask the more exact and granular question: Who throws it most often to the places that produce the lowest exit velocities?

Create those bins again, weight them the same way we did for expected swinging strike rate, and what comes out is a number that represents the major league-average exit velocities tailored specifically to the locations each pitcher throws to. In other words, it’s a location-based expected exit velocity allowed: process derived from outcomes.
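For readers who want to see the shape of that calculation, here is a simplified Python sketch. It uses a single three-inch location grid with no weighting across grid sizes, which is a simplification of the approach described above, and every name in it is hypothetical.

```python
import math
from collections import defaultdict


def location_bin(x: float, z: float, size: float = 3.0) -> tuple:
    """Integer cell of a square location grid (inches)."""
    return (math.floor(x / size), math.floor(z / size))


def league_ev_by_bin(batted_balls, size: float = 3.0) -> dict:
    """Mean exit velocity per location bin from league (x, z, exit_velo) rows."""
    sums = defaultdict(lambda: [0.0, 0])  # bin -> [exit velocity total, count]
    for x, z, exit_velo in batted_balls:
        cell = sums[location_bin(x, z, size)]
        cell[0] += exit_velo
        cell[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def lg_ev(pitch_locations, ev_map: dict, size: float = 3.0):
    """Average league exit velocity over the bins one pitcher throws to.

    Pitches that land in bins with no league data are skipped; returns None
    if no pitch can be scored.
    """
    values = []
    for x, z in pitch_locations:
        key = location_bin(x, z, size)
        if key in ev_map:
            values.append(ev_map[key])
    return sum(values) / len(values) if values else None
```

The key point is that the pitcher's own batted-ball results never enter the number. Only where he throws matters, scored against what the league does in those spots.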

Is it safe to assume every pitcher wants to throw the ball to places that produce low exit velocity? I think it is. Every pitcher wants to avoid hard contact, and most pitchers understand that heat map and have internalized it. Here are the pitchers who fared best by lgEV this past year:

lgEV Leaders

| Player | 2017 xSwStrk | 2015 lgEV | 2016 lgEV | 2017 lgEV |
|---|---|---|---|---|
| R. A. Dickey | 9.0% | 93.98 | 91.24 | 88.65 |
| CC Sabathia | 10.0% | 92.32 | 90.95 | 89.38 |
| Kyle Hendricks | 10.1% | 93.27 | 91.92 | 89.55 |
| Corey Kluber | 14.9% | 91.53 | 90.95 | 89.61 |
| Dallas Keuchel | 12.9% | 91.53 | 92.18 | 89.88 |
| Jake Odorizzi | 11.0% | 92.65 | 91.69 | 90.13 |
| Eduardo Rodriguez | 11.2% | 92.73 | 90.95 | 90.19 |
| Marco Estrada | 11.0% | 94.34 | 91.07 | 90.24 |
| Chris Sale | 13.5% | 92.46 | 92.06 | 90.29 |
| Clayton Richard | 9.1% | 93.05 | 92.97 | 90.46 |
| Matt Boyd | 10.3% | 94.22 | 91.50 | 90.54 |
| Ervin Santana | 11.5% | 92.37 | 92.18 | 90.62 |
| Johnny Cueto | 11.8% | 91.76 | 91.44 | 90.64 |
| Jimmy Nelson | 10.6% | 92.29 | 92.47 | 90.67 |
| Masahiro Tanaka | 15.4% | 92.13 | 92.02 | 90.69 |

Dickey may be a surprise at the top of this list, but it’s worth noting he’s been erratic from year to year. In terms of a three-year average, the top three here are Corey Kluber, CC Sabathia, and Keuchel. That makes a lot of sense intuitively.
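The three-year ranking can be checked directly from the lgEV Leaders table. This short Python snippet averages the 2015, 2016 and 2017 values for the 15 pitchers listed there (only those 15, not the full league) and sorts from lowest to highest.

```python
# (2015, 2016, 2017) lgEV values copied from the lgEV Leaders table.
LGEV = {
    "R. A. Dickey": (93.98, 91.24, 88.65),
    "CC Sabathia": (92.32, 90.95, 89.38),
    "Kyle Hendricks": (93.27, 91.92, 89.55),
    "Corey Kluber": (91.53, 90.95, 89.61),
    "Dallas Keuchel": (91.53, 92.18, 89.88),
    "Jake Odorizzi": (92.65, 91.69, 90.13),
    "Eduardo Rodriguez": (92.73, 90.95, 90.19),
    "Marco Estrada": (94.34, 91.07, 90.24),
    "Chris Sale": (92.46, 92.06, 90.29),
    "Clayton Richard": (93.05, 92.97, 90.46),
    "Matt Boyd": (94.22, 91.50, 90.54),
    "Ervin Santana": (92.37, 92.18, 90.62),
    "Johnny Cueto": (91.76, 91.44, 90.64),
    "Jimmy Nelson": (92.29, 92.47, 90.67),
    "Masahiro Tanaka": (92.13, 92.02, 90.69),
}

# Lower lgEV is better, so sort ascending by the three-year mean.
three_year = {name: sum(vals) / len(vals) for name, vals in LGEV.items()}
top_three = sorted(three_year, key=three_year.get)[:3]

if __name__ == "__main__":
    for name in top_three:
        print(f"{name}: {three_year[name]:.2f}")
```

Among the listed pitchers, Kluber, Sabathia and Keuchel come out on top, at roughly 90.7, 90.9 and 91.2 mph.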

Kluber constantly front- and back-doors his sinker and drops his breaking balls in at the knee, and those spots create weak contact all the time. Sabathia had a resurgence this past year throwing his cutter inside to lefties, which is an obvious pitcher’s spot on the graphic above. Keuchel lives outside the zone and low and yet somehow coaxes batters to swing at those pitches. This all sounds like command to me.

Back to the over- and underperforming xswSTR guys, though. This command stat doesn’t explain the difference between prime xswSTR underachiever Dallas Keuchel and overachiever Clayton Kershaw, not really. Keuchel was the fifth-best starting pitcher by lgEV last year; Kershaw was 56th. With count included, it’s not easy to blame the catcher’s framing ability, either. Keuchel had awesome command and still didn’t get as many whiffs as he could from his stuff.

Back to the fact that Keuchel is legendary with the throwing-outside-the-zone thing, though. We have location as one of the variables. The 15 who most underperformed their expected swinging strike rate averaged a 45.5 percent zone rate, and the 15 who most overperformed their expected swinging strike rate averaged a 47.2 percent zone rate.

If you throw outside the zone, you’ll necessarily throw to places with low exit velocities, which will make your command stat look good. But if you throw too often outside the zone, you’ll get into bad counts that will affect your xswSTR number negatively. Perhaps the relationship between location and outcomes needs to be bucketed finer or poked further.

But there’s also something interesting here when it comes to our approach and this issue. There’s a long-held truism among pitchers that the art they practice is “making balls look like strikes and strikes look like balls.” Our command stat really seems to be measuring the pitcher’s ability to make balls look like strikes; you’re not going to be around long if you exclusively throw outside the zone to the blue and don’t come in enough to coax the swings. That’s walk city.

What xswSTR seems to be measuring is the other skill, making strikes look like balls. By including location and count along with movement and velocity, we’re rewarding a pitcher’s ability to do well inside the zone.

Could we also be measuring deception, then, in some weird way? That’s a way to make strikes look like balls and do well on pitches thrown inside the zone.

Let’s look at the furthest outliers from the year using an angle behind home plate. Here’s the biggest underperformer, Mike Leake.

And then the No. 1 overperformer, Kershaw, from this past World Series. Notice the hitch, the over-the-top release point.

Is it easier to pick up Leake’s ball? I’d argue it is. Kershaw does all manner of things that make it tough to time him.

That sort of stuff is traditionally difficult to put a number on. You might need ball recognition technology and a mountain of interns to code and prepare home plate footage on every pitcher in order to say something as simple as, “You can see the ball for 0.5 milliseconds longer when facing Kershaw than when facing Leake.” Possible but not currently probable.

The results on xswSTR were decently sticky year to year, about on par with walk rate or first-strike percentage. But that means swinging strike rate itself is still currently more useful than this number. Check out the error numbers in the chart below.

Error

| Stat | 2015-2016 | 2016-2017 | 2015-2017 |
|---|---|---|---|
| xSwStrk MSE | 0.000363 | 0.000406 | 0.000482 |
| SwStrk MSE | 0.000358 | 0.000394 | 0.00053 |
| xSwStrk to SwStrk R | 0.675 | 0.646 | 0.573 |
| SwStrk to SwStrk R | 0.743 | 0.713 | 0.634 |

Pitchers with minimum 500 pitches thrown in 2015, 2016 and 2017
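For reference, the two kinds of numbers in the error table are standard calculations. The Python sketch below assumes each pitcher's rate in the earlier season is compared to his actual swinging strike rate in the later season, with rates expressed as fractions (0.11 rather than 11 percent). Neither detail is spelled out in the article, so treat both as assumptions.

```python
import math


def mse(predicted, actual) -> float:
    """Mean squared error between paired predictions and outcomes."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has no variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Under those assumptions, an MSE near 0.0004 corresponds to a root-mean-square error of about 0.02, or two percentage points of swinging strike rate.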

It’s not quite what we wanted, but on the bright side, this chart also means we’re close, and we’ve done so with an approach that hints at success when it comes to putting a number on command and deception.

The next iteration will define movement off the fastball; that should lead to a leap forward. In the future, we also could try to bring sequencing into the equation: the movement, location and velocity of the pitch right before the pitch in question. With this bucketing approach, though, we’d need a very large number of pitches going in to make it work. Andrew feels a random forest approach may help improve the estimates, even if early results were not encouraging.

In the meantime, we’ll have to wonder how, exactly, all that movement and velocity combine to make the batter miss.



With a phone full of pictures of pitchers' fingers, strange beers, and his two toddler sons, Eno Sarris can be found at the ballpark or a brewery most days. Read him here, writing about the A's or Giants at The Athletic, or about beer at October. Follow him on Twitter @enosarris if you can handle the sandwiches and inanity.