# Quantifying Pitcher Command

We begin today’s story with MARS.

Not the delicious candy bar that sticks to your teeth; nor the planet that can grow Matt Damon’s potatoes. Rather, it is Multivariate Adaptive Regression Splines, or MARS for short. MARS is a trademarked term, so the R and Python implementations are usually referred to as “earth.” Essentially, the MARS approach improves upon basic multiple linear regression in three ways:

- It breaks apart each regression line into multiple formulae (for example, incremental fastball velocity below 94 mph has a different value curve than velocity above 94 mph).
- It prunes terms that aren’t beneficial to the model and pares it down to the important factors.
- It can uncover relationships between variables (say, location and velocity).

I’ll step away here and encourage you to read the Wikipedia article linked above for a more thorough explanation.
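The piecewise idea from the first bullet can be sketched in a few lines of numpy. This is an illustrative toy, not the earth algorithm itself (which also searches for the knots, prunes terms, and tests interactions automatically); the 94 mph knot and all the data below are synthetic:

```python
import numpy as np

def hinge_basis(x, knot):
    """The two MARS hinge functions: max(0, x - knot) and max(0, knot - x)."""
    return np.maximum(0.0, x - knot), np.maximum(0.0, knot - x)

# Synthetic fastball velocities (mph); the "value" of extra velocity has a
# different slope above vs. below the 94 mph knot, per the example above.
rng = np.random.default_rng(0)
velo = rng.uniform(88.0, 100.0, 500)
up, down = hinge_basis(velo, 94.0)
value = 0.5 * up - 0.1 * down + rng.normal(0.0, 0.05, 500)

# Fit the piecewise-linear model by ordinary least squares on the hinge basis.
X = np.column_stack([np.ones_like(velo), up, down])
coef, *_ = np.linalg.lstsq(X, value, rcond=None)
# coef[1] and coef[2] recover the two slopes on either side of the knot.
```

The recovered coefficients come back close to the 0.5 and -0.1 slopes that generated the data, which is all a single MARS basis pair does: one regression line below the knot, a different one above it.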

### Location, Location, Location

I began by breaking out four-seam fastballs, sliders and curveballs, in the context of generating swinging strikes, to see what the model would predict. I fed in a host of PITCHf/x variables, including location (px, pz), movement, velocity and spin rate. The models for each pitch produced a consistent, yet surprising, result: the only important factors were vertical location (most important) followed by horizontal location.

According to the predictive model, the best way to predict whether a pitch will induce a swinging strike is to look at its location; movement and velocity are almost irrelevant to the equation. The fastball model concluded that four-seam fastballs are twice as effective when thrown above 28 inches as below it. Curveballs were only good when thrown below 28 inches. None of these tidbits are new, but it was neat how the model spit them out organically.

### The Most Important Aspects of a Pitch are Pitch Type and Location

At first I grew frustrated with this type of modeling, which capped out at an R-squared of roughly 0.2, since either (a) I had hit my knowledge limit on applying predictive models to swinging strikes, or (b) there was a critical piece of information I was missing. In other words, if there is such a linear relationship between location and result probabilities (shameless plug to my inaugural piece), why wasn’t I getting much stronger correlations? (A random forest approach did boost predictive power, but not by much.)

So I had a “duh” moment: basic common knowledge says that how effective a pitch is has a lot to do with the batter’s ability to hit that particular pitch type in that particular part of the zone. This raised the question: what would happen if we sliced the zone into 18 arbitrary parts and simply added up the results we would expect a pitcher to have, based on the batter’s ability to do something with the pitch type thrown and the location?
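A minimal sketch of such a slicing: a 3x3 grid inside a nominal strike zone, an eight-zone ring around it, and a “way outside” bucket. The zone edges and ring width here are illustrative round numbers, not the article’s exact cuts:

```python
def zone18(px, pz):
    """Map a pitch location (feet, catcher's view) to one of 18 zones:
    0-8  : 3x3 grid inside a nominal strike zone,
    9-16 : an eight-zone ring immediately around it,
    17   : "way outside".
    Edges are illustrative, not the article's exact boundaries."""
    L, R, B, T = -0.83, 0.83, 1.5, 3.5   # nominal strike-zone edges (feet)
    ring = 0.5                           # ring width (feet)
    if L <= px <= R and B <= pz <= T:
        col = min(int((px - L) / ((R - L) / 3)), 2)
        row = min(int((pz - B) / ((T - B) / 3)), 2)
        return row * 3 + col
    if (L - ring) <= px <= (R + ring) and (B - ring) <= pz <= (T + ring):
        above, below = pz > T, pz < B
        left, right = px < L, px > R
        if above and left:
            return 9
        if above and right:
            return 11
        if above:
            return 10
        if below and right:
            return 13
        if below and left:
            return 15
        if below:
            return 14
        if right:
            return 12
        return 16   # middle height, off the left edge
    return 17
```

A pitch down the middle lands in zone 4, one just above the zone in zone 10, and anything beyond the ring falls into the catch-all zone 17.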

### A Simple Approach to Quantifying Command

Occam’s razor suggests that the simplest answer is often the best answer. The model I will be presenting relies on two basic assumptions:

- Major league scouting is capable of rapidly identifying weak spots with respect to a batter’s ability. This model will use yearly hitter results (split by pitch type and zone) as a proxy for how effective a batter is with respect to pitch types and locations.
- Major league pitchers will attempt to exploit these weaknesses by pitching to areas where the specific batter will do the least damage. (This may or may not be the case, since some pitchers may pitch everyone the same way, and we are ignoring the effect the specific match-up has.)

Thus, the solution is simply to measure which pitchers are best at throwing their respective pitches to the parts of the zone where the specific hitter will do the least damage against that pitch type. This will be slightly skewed toward pitchers who throw more off-speed pitches, but to me that speaks to command as well, since you can only throw a lot of off-speed pitches if you can command them.

**Command = a pitcher’s ability to generate surplus value by locating the pitch where the batter is least effective**

I played around with a bunch of arbitrary zone splits and ultimately settled on a three-by-three grid within the strike zone, eight zones ringing the strike zone, and one “way outside” zone. Next, I aggregated each batter’s ability across a variety of results: slugging on balls in play, swinging strike rate, called strike rate, ball rate, GB%, LD%, PU% and FB%. Then I simply added up all the pitches a pitcher threw to those batters and came up with a seasonal command score for each of the variables above.
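The two-step aggregation can be sketched with pandas. The column names (`pitcher`, `batter`, `pitch_type`, `zone`, `whiff`) and the toy rows are stand-ins for the real pitch-level data, and this shows only the swinging-strike variant of the score:

```python
import pandas as pd

# Toy pitch-level rows; the real inputs would be a season of PITCHf/x data.
pitches = pd.DataFrame({
    "pitcher":    ["A", "A", "B", "B", "A", "B"],
    "batter":     ["x", "y", "x", "y", "x", "y"],
    "pitch_type": ["FF", "CU", "FF", "CU", "FF", "CU"],
    "zone":       [2, 14, 8, 14, 2, 4],
    "whiff":      [1, 0, 0, 1, 1, 0],
})

# Step 1: each batter's whiff rate by pitch type and zone -- the proxy for
# his scouted weaknesses.
profile = (pitches.groupby(["batter", "pitch_type", "zone"])["whiff"]
                  .mean().rename("expected_whiff").reset_index())

# Step 2: credit every pitch with the batter's expected rate for that
# pitch type + zone, then average per pitcher to get a command score.
scored = pitches.merge(profile, on=["batter", "pitch_type", "zone"])
command = scored.groupby("pitcher")["expected_whiff"].mean()
```

One caveat worth noting: in a real implementation you would want to build each batter’s profile without the pitcher being scored (out of sample), so a pitcher doesn’t get credit for outcomes he himself produced.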

### How predictive is command score year to year?

Very. I’m going to show you a series of charts plotting year-to-year correlations for 2,142 pitcher seasons across the command variables above. Let’s begin with our favorite pitcher metric: the ability to generate a whiff. Before you look at the chart, I want to stress that the swinging strike rate estimated below **has nothing to do with how hard the pitcher throws**.

#### Year to Year correlation for Swinging Strike Command | R-Squared = 0.52

I bring up the velocity point because when you see Aroldis Chapman at the top of a list, your gut assumption would be that it’s due to his velocity. What this chart is saying about Chapman is, based on the type of pitches he throws and where he throws them (within the context of the batter), he would generate more swinging strikes than Koji Uehara, even if he threw as soft as Uehara.

A lot of this has to do with the “effectively wild” strategy. Tyler Clippard has two seasons around the Chapman 2014 season (orange circles). Unsurprisingly, Bartolo Colon and Jim Johnson do not throw pitch type + location combinations that generate swinging strikes.

**The year-to-year R-squared of swinging strike command is 0.52,** indicating this is definitely skill-based. You can also see that pitchers tend to cluster around the same areas. R.A. Dickey is likely a product of pitch type more than command, but it does take a lot of command to pitch a knuckleball effectively. Note that Colon, circa 2011-2012, was among the worst at generating swinging strikes based on batter-specific pitch type and location.
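For concreteness, the year-to-year R-squared quoted above is just the squared Pearson correlation between paired pitcher seasons. A sketch on synthetic data (a stable latent skill plus season-to-season noise, with invented magnitudes):

```python
import numpy as np

def year_to_year_r2(score_y1, score_y2):
    """R-squared between paired pitcher command scores in consecutive years."""
    r = np.corrcoef(score_y1, score_y2)[0, 1]
    return r * r

# Synthetic paired seasons: each pitcher has a stable whiff-command talent,
# observed each year with independent noise (all magnitudes invented).
rng = np.random.default_rng(1)
skill = rng.normal(0.10, 0.02, 300)
y1 = skill + rng.normal(0.0, 0.01, 300)
y2 = skill + rng.normal(0.0, 0.01, 300)
r2 = year_to_year_r2(y1, y2)
```

With these noise levels the R-squared lands well above zero, which is the same signature of repeatable skill the 0.52 figure shows for swinging strike command.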

#### Year to Year correlation for Called Strike Command | R-Squared = 0.36

You’ll notice Colon is really good at throwing pitches in spots where hitters will take them for strikes, and with diminished stuff, this is perhaps a clue to how he continues to be effective. He is exceptional at putting pitches in the strike zone where a batter isn’t likely to swing. Burke Badenhop is interesting, as he appears to have leveraged this ability to generate a lot of first-pitch strikes between 2011 and 2014.

Also note how Colon is clustered in the 21-22 percent range, slightly above Cliff Lee (20 to 21 percent). This shows that certain pitchers focus on optimizing for called strikes, and, based on my small sample of two pitchers, suggests mastering this skill can be a very effective strategy.

#### Year to Year correlation for Called Ball Command | R-Squared = 0.45

Lee backs up his ability to gather called strikes by an equal ability to command his pitches in locations where they won’t be called balls. What stands out to me as truly impressive is Kenley Jansen, who is consistently at the top end of ball-suppression command but is also, if you scroll back up, at the top end of the ability to get swinging strikes. (Again I want to stress that this does not assume anything about the “quality” of the pitch, so whether or not Jansen has a “great” cutter has no influence on this model.) In essence, this demonstrates that Jansen’s top-level pitching ability is predicated on top-shelf command, both in avoiding balls and in generating swinging strikes.

#### Year to Year correlation for Slugging on Balls in Play | R-Squared = 0.10

Location and pitch type slugging suppression isn’t a repeatable skill for most pitchers, with an R-Squared correlation around 0.10, unless of course your name is Brad Ziegler…

…where you’ll post some of the best command-based slugging suppression. As per the above notes, this says nothing about the “quality” of the pitch; it speaks only to Ziegler’s batter and pitch-type location command, which has afforded him elite-level slugging suppression.

#### Year to Year correlation for GB% | R-Squared = 0.65

For groundball percentage, we see an incredible year-to-year correlation, demonstrating that generating ground balls is a product of pitcher command. We see Ziegler at the top end of the spectrum, with Chapman and Jansen at the extreme low end.

What about pop-ups?

#### Year to Year correlation for PU% | R-Squared = 0.36

### Estimating Next Year’s ERA using the Command Variables

Can the above variables tell us anything about a pitcher’s ERA, all while ignoring “stuff”? Surprisingly, we can estimate next year’s ERA with an R-squared of 0.108, slightly better than current-year ERA itself does. This means we can explain roughly 11 percent of a pitcher’s ERA in the following season simply by measuring command in the context of pitch type, location and batter.
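Mechanically, this is an ordinary least-squares regression of next-year ERA on the current-year command scores. A hedged sketch on synthetic data, where the three predictor columns stand in for command variables and every coefficient is invented:

```python
import numpy as np

def r_squared(X, y):
    """In-sample R-squared of an OLS fit of y on X (intercept included)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    tot = y - y.mean()
    return 1.0 - (resid @ resid) / (tot @ tot)

# Synthetic stand-in: next-year ERA weakly driven by three current-year
# command scores (think whiff, called-strike and GB command), plus a large
# noise term -- which is why the R-squared comes out small.
rng = np.random.default_rng(2)
cmd = rng.normal(size=(500, 3))
era_next = (4.00 - 0.20 * cmd[:, 0] - 0.15 * cmd[:, 1]
            - 0.10 * cmd[:, 2] + rng.normal(0.0, 0.90, 500))
r2 = r_squared(cmd, era_next)
```

Even with real predictive signal in the command columns, the noise dominates and the R-squared stays modest, which is the regime the article’s 0.108 figure describes.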

### Conclusion

I think this is an intuitive way to measure pitcher command. The next step will be measuring “stuff” – specifically, how much better did the pitcher perform than what we would have expected? We already saw that, looking only at pitch type, location and batter, we can predict a pitcher’s next-year ERA better than current-year ERA can. What will happen when we introduce “stuff” variables? Stay tuned…

### Comments

This is really, really great. Well done.

Fascinating stuff. I wonder where Clayton Kershaw ranks in these various facets – my guess would be great on swinging strike command and called ball command.

This is a very interesting article. It was very cool to see graphed how some guys (the Chapmans, Clippards and Kojis) get by on nibbling at the edges of the zone, catching antsy hitters trying either to catch up to 103 mph fastballs upstairs or to go digging for changeups that look good low, while others, like Colon, pound the zone and challenge hitters to reckon with his masterful fastball movement manipulation.

I am a bit unsure, however, about the notion of command being what’s really quantified here. Classically defined, a pitcher’s command is his ability to “hit his spots”. I think you hit the nail on the head with “effectively wild”, but that’s not really command, and while Colon pounds the zone effectively, I’m not so sure he’s particularly great at hitting the glove, so that’s not really “command” either.

Slicing up the zone into 18 sections also seems a bit arbitrary. If the idea of “command” is more toward your definition of hitting spots where the hitter is more likely to fail, I don’t think the pitcher and catcher conceive of the strike zone in so many discrete areas. More likely than not, they’re trying to hit one of the four corners and occasionally a bit below or above. Catchers by and large set up almost exclusively around the lower third and by the corners, so if the pitcher happens to miss well, he’ll get credited in this model, but that may not be what “command” is.

True, I am straying from the classic definition of “command” and arguing that really command is all about throwing a pitch type in the spot where the batter will do the least damage, which is a combination of scouting, execution and strategy. So yes, it is not true command, but more of a measure of who is maximizing their stuff the best, or put another way, ignoring the “quality” of the pitch, who is getting the most out of their stuff?

Re: the 18 zones, the issue boils down to balancing sample sizes and granularity. If you split the zone up into too few sub-zones, you don’t get enough differentiation between each zone; on the flip side if you have too many, you don’t have enough data points for each zone, batter and pitch type to generate anything but noise. I played around with a few configurations and settled on the 18 zones, which had the strongest signals. I would argue in today’s analytics-inclined world, pitchers do target very specific areas, to optimize outcomes (those that can), though I don’t have any data to prove that.
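The granularity-versus-sample-size tradeoff in that reply can be made concrete with a back-of-envelope calculation. Both inputs below are invented round numbers (roughly 600 PA at four pitches each, spread over about five pitch types), not measured values:

```python
# Average pitches available per (batter, pitch type, zone) cell for a few
# candidate zone counts.  Inputs are rough, invented round numbers.
pitches_per_batter = 2400   # ~600 PA x ~4 pitches per PA
pitch_types = 5

per_cell = {zones: pitches_per_batter / (pitch_types * zones)
            for zones in (9, 18, 36, 100)}
# At 18 zones each cell averages roughly 27 pitches; at 100 zones it drops
# below 5, at which point the per-cell rates are mostly noise.
```

This is the commenter’s point in miniature: doubling the zone count halves the data behind every batter-specific rate, so 18 zones is a compromise rather than a principled cut.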

I’m not at all convinced that MARS is the best tool for this. Your justifications don’t really hold water with me, and the Wikipedia article doesn’t really add much besides specifics. Your first argument seems to be essentially that MARS allows for non-linear relationships. But dealing with non-linearity in multivariate least-squares regression (I assume that’s what you mean by multiple linear regression because of all the R-squareds being thrown around) is quite possible already. R-squared is a poor way to assess model fit, by the way. Least-squares is a pretty poor way to build a model, too, but I digress.

Your second argument is that MARS “prunes” terms. The Wikipedia article doesn’t seem to go into much detail on the specifics of the backwards pass, but letting an algorithm specify your model seems problematic. That is precisely the place where the researcher is most key. Using an algorithm to drop terms baffles me. This seems like data mining at its worst. Not only that, but the most important part of model specification seems to be missing from your analysis: model comparison. Maybe you built several models and evaluated them, but I’m not seeing it in your article.

Finally, your third argument in favor of MARS is that you can find relationships between variables. But you can do this in least-squares with interaction terms. From what I gather, two of your three arguments might not even be correct, and the third devolves the analytical process into data mining.

These methods seem to me to be dead on arrival. I can’t buy into any of your conclusions as a result. I think this analysis is emblematic of the replication crisis in science today. Some researchers (maybe even most) are content to toss a bunch of variables into a box, pull a lever, and publish the results. In the case of scientific researchers, their box is statistical significance. In the case of this article, the box is MARS. Both seem to be examples of bad ways to build knowledge.

You did quite a lot of interesting work that is highly laudable. However, I think your definition of command is wrong. To me (and I bet to most in the pitching profession), command is a function of a pitcher’s success in locating a pitch where he intends. Unfortunately, I don’t think it’s measurable short of interviewing a pitcher on his intent for every pitch – at least not yet.

This is outstanding work. This data follows suit with the mission of our website, launching sometime next year. We are happy to see we are not the only ones looking at the importance of command. Keep up the good work!