Pitcher Respect: A Strange but Effective Way to Project Hitter Performance

by Jesse Wolfersberger
March 26, 2015

Coco Crisp’s “respect” has fluctuated a lot over the past five seasons. (via Keith Allison)

You can learn a surprising amount about a hitter by looking at how he’s pitched. Good hitters are thrown fewer fastballs and fewer strikes than worse hitters. Using only pitch-selection variables, it is possible to make a model that describes the quality of the hitter, and does so with surprising accuracy.

Pitchers know the lineup they are facing that day. They plan for each hitter, and make decisions about which ones they can attack and which ones they have to be more careful with. If pitchers are choosing to pitch around or to pitch carefully to Hitter X, the odds are, Hitter X is pretty good.

This idea has gained steam recently, with Eno Sarris and Jeff Zimmerman writing about the predictive effect pitch selection may have on hitter breakouts (both of them were inspired by a piece from Rob Arthur). In this article I take this concept, build it into a model, and find that it does have some limited predictive potential.

I will be using three variables to measure how pitchers approach hitters:

Zone% — the percentage of pitches thrown to a hitter that are in the strike zone
F-Strike% — the percentage of plate appearances that begin with a strike
Fastball% — the percentage of pitches thrown to a hitter that are fastballs

Admittedly, this is not an exhaustive list. A deeper analysis could include context-specific pitch selection, lineup quality, ballpark, and pitch sequence variables. There is room for improvement, but these three variables give us a good idea of how major league teams approach pitching to each hitter.
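As a rough illustration of how the three rates defined above could be computed from pitch-level data, here is a minimal Python sketch. The `Pitch` record and its field names are hypothetical, not the schema of any real data source, and treating any non-ball first pitch as a first-pitch strike is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Pitch:
    # Hypothetical pitch record; field names are assumptions for illustration.
    in_zone: bool      # pitch was located in the strike zone
    is_fastball: bool  # pitch was classified as a fastball
    first_of_pa: bool  # pitch was the first of its plate appearance
    is_strike: bool    # pitch did not count as a ball (assumed definition)


def respect_rates(pitches):
    """Return (zone_pct, f_strike_pct, fastball_pct) as fractions in [0, 1]."""
    total = len(pitches)
    first_pitches = [p for p in pitches if p.first_of_pa]
    if total == 0 or not first_pitches:
        raise ValueError("need at least one pitch and one plate appearance")
    zone_pct = sum(p.in_zone for p in pitches) / total
    fastball_pct = sum(p.is_fastball for p in pitches) / total
    # F-Strike% is per plate appearance, so only first pitches are counted.
    f_strike_pct = sum(p.is_strike for p in first_pitches) / len(first_pitches)
    return zone_pct, f_strike_pct, fastball_pct


# Four made-up pitches spread over two plate appearances.
sample = [
    Pitch(in_zone=True, is_fastball=True, first_of_pa=True, is_strike=True),
    Pitch(in_zone=False, is_fastball=False, first_of_pa=False, is_strike=False),
    Pitch(in_zone=True, is_fastball=True, first_of_pa=False, is_strike=True),
    Pitch(in_zone=False, is_fastball=True, first_of_pa=True, is_strike=False),
]
zone_pct, f_strike_pct, fastball_pct = respect_rates(sample)
```

Note that Zone% and Fastball% share a denominator (all pitches seen), while F-Strike% is divided by plate appearances.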

Below is the regression output of a model that regresses hitter wOBA on the above “pitcher respect” variables, with all data taken from the same season. The sample is hitters from the 2010-2014 seasons who had at least 150 plate appearances each in the T-0 and T-1 seasons. This creates a sample of just over 1,500 player-seasons.

Hitter wOBA Year T-0 =

R-squared (adj) = 0.18
Variable	Coefficient	SE Coefficient	T-Score
Constant	0.69	0.02	34.39
Zone% T-0	-0.26	0.04	-6.82
F-Strike% T-0	-0.27	0.03	-10.53
Fastball% T-0	-0.16	0.02	-6.81

This tells us a few things. First, all three of the independent variables have coefficients which are highly significant, each with a T-Score beyond 6.8 in absolute value (with a sample this large, anything beyond about 1.96 is statistically significant at the 95 percent level). All three coefficients are also negative: the fewer strikes and fastballs a hitter sees, the higher his wOBA tends to be. Second, the adjusted r-squared of the model is 0.18. The knee-jerk reaction to that r-squared might be to throw the model in the trash, but it is important to remember that r-squared numbers have to be judged in context. This model is intentionally ignorant. We are predicting a hitter’s skill without using any hitter-controlled statistics. This is akin to predicting a presidential election without using any polling data, or a dealer’s car sales without knowing any prices or features. In this context, 0.18 works for me.
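For readers who want to see where the columns of a table like this come from, here is a minimal sketch of an ordinary least squares fit that reports coefficients, standard errors, T-Scores, and adjusted r-squared. It uses only the Python standard library and runs on synthetic data generated for illustration. The means, spreads, and noise level of the fake data are assumptions, not the article's data, and this is not necessarily the software used for the analysis.

```python
import math
import random


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(rows, y):
    """Fit y = b0 + b1*x1 + ...; return (coefficients, SEs, t-scores, adj R^2)."""
    X = [[1.0] + list(r) for r in rows]  # prepend the constant term
    n, k = len(X), len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [y[i] - sum(X[i][a] * beta[a] for a in range(k)) for i in range(n)]
    sse = sum(e * e for e in resid)
    mean_y = sum(y) / n
    sst = sum((v - mean_y) ** 2 for v in y)
    sigma2 = sse / (n - k)  # residual variance estimate
    se = [math.sqrt(sigma2 * inv[a][a]) for a in range(k)]
    t = [b / s for b, s in zip(beta, se)]
    adj_r2 = 1.0 - (sse / (n - k)) / (sst / (n - 1))
    return beta, se, t, adj_r2


# Synthetic player-seasons; every number below is made up for illustration.
rng = random.Random(0)
rows, woba = [], []
for _ in range(1500):
    zone = rng.gauss(0.45, 0.03)
    f_strike = rng.gauss(0.60, 0.04)
    fastball = rng.gauss(0.57, 0.06)
    rows.append((zone, f_strike, fastball))
    noise = rng.gauss(0.0, 0.03)
    woba.append(0.69 - 0.26 * zone - 0.27 * f_strike - 0.16 * fastball + noise)

beta, se, t, adj_r2 = ols(rows, woba)
for name, b, s, tv in zip(["Constant", "Zone%", "F-Strike%", "Fastball%"], beta, se, t):
    print(f"{name:10s} {b:7.3f} {s:6.3f} {tv:7.2f}")
print(f"R-squared (adj) = {adj_r2:.2f}")
```

Each T-Score is simply the coefficient divided by its standard error, which is why a small coefficient can still be highly significant when the sample is large.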

There are certainly better variables to measure a hitter. Strikeout percentage, walk percentage, and isolated power, for example, are each superior ways to measure a hitter’s abilities. If you used those, plus other hitter-controlled variables and some aging variables, they would definitely come out significant and the r-squared of the model would be about 0.7 or 0.8. That is what projection systems such as Steamer and ZiPS do, and why those systems are the best publicly available tools we have for predicting hitter performance. The question put forth in this article is “Can looking at pitcher approach tell us something new about that hitter?” Not better. New.

The in-season model is significant, so the next hurdle to clear is a predictive model. This model uses the same variables, but the three independent variables are lagged by one season. Put another way, “What does last year’s pitcher respect tell us about next season’s hitter performance?”
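The lagging step can be sketched as follows: pair each player-season with that same player's previous season, and keep the pair only if both seasons clear the 150 plate appearance minimum described earlier. The dictionary layout, key names, and sample players are hypothetical.

```python
MIN_PA = 150  # plate appearance minimum stated for the sample


def lagged_pairs(seasons, min_pa=MIN_PA):
    """Pair year T-1 pitch-selection rates with year T-0 wOBA for each player.

    `seasons` maps (player, year) to a dict with the hypothetical keys
    'pa', 'woba', 'zone', 'f_strike', and 'fastball'.
    """
    pairs = []
    for (player, year), current in sorted(seasons.items()):
        prior = seasons.get((player, year - 1))
        if prior is None:
            continue  # no previous season to lag from
        if current["pa"] < min_pa or prior["pa"] < min_pa:
            continue  # both seasons must meet the minimum
        predictors = (prior["zone"], prior["f_strike"], prior["fastball"])
        pairs.append((player, year, predictors, current["woba"]))
    return pairs


# Made-up seasons for three fictional hitters.
seasons = {
    ("A", 2013): {"pa": 500, "woba": 0.340, "zone": 0.44, "f_strike": 0.58, "fastball": 0.55},
    ("A", 2014): {"pa": 520, "woba": 0.330, "zone": 0.43, "f_strike": 0.57, "fastball": 0.54},
    ("B", 2013): {"pa": 120, "woba": 0.300, "zone": 0.47, "f_strike": 0.63, "fastball": 0.61},
    ("B", 2014): {"pa": 400, "woba": 0.310, "zone": 0.46, "f_strike": 0.62, "fastball": 0.60},
    ("C", 2014): {"pa": 600, "woba": 0.360, "zone": 0.41, "f_strike": 0.55, "fastball": 0.50},
}
pairs = lagged_pairs(seasons)
```

In this toy data only hitter A's 2014 season survives: hitter B falls short of the minimum in 2013, and hitter C has no prior season at all.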

Hitter wOBA Year T-0 =

R-squared (adj) = 0.09
Variable	Coefficient	SE Coefficient	T-Score
Constant	0.58	0.02	27.16
Zone% T-1	-0.10	0.03	-3.07
F-Strike% T-1	-0.24	0.03	-8.90
Fastball% T-1	-0.13	0.02	-5.44

Things get less significant across the board, but this is to be expected because predicting next season is harder than using in-season numbers. The coefficient estimates – what we really care about – are still highly significant. Again, do not look at that 0.09 adjusted r-squared and throw your hands up in disgust. Even if you used everything at your disposal, the r-squared would be around 0.2 or 0.3. Simply put, 0.09 is pretty good.

The plain language of what this model says is, “Tell me nothing about a hitter but how pitchers pitched him last season, and I’ll give you a good idea of how he’ll perform next season.” That is a neat trick, but does it provide any real value? Is it purely academic? The third and final model will attempt to answer those questions.
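To make that concrete, here is a small sketch that plugs last season's rates into the second model. The coefficients are copied from the table above as printed, so they are rounded. Entering the rates as fractions (0.60 rather than 60) is an assumption, chosen because it produces plausible wOBA values. The two hitters are invented.

```python
# Coefficients from the lagged model table, rounded as printed.
LAGGED = {"constant": 0.58, "zone": -0.10, "f_strike": -0.24, "fastball": -0.13}


def respect_woba(zone_pct, f_strike_pct, fastball_pct, coef=LAGGED):
    """Project next-season wOBA from last season's pitch-selection rates."""
    return (coef["constant"]
            + coef["zone"] * zone_pct
            + coef["f_strike"] * f_strike_pct
            + coef["fastball"] * fastball_pct)


# Hypothetical hitter pitched carefully: fewer strikes, fewer fastballs.
careful = respect_woba(zone_pct=0.42, f_strike_pct=0.56, fastball_pct=0.50)

# Hypothetical hitter who gets attacked: more strikes, more fastballs.
attacked = respect_woba(zone_pct=0.48, f_strike_pct=0.64, fastball_pct=0.62)

print(f"careful: {careful:.3f}, attacked: {attacked:.3f}")
```

With these made-up inputs the carefully pitched hitter projects to roughly a .339 wOBA and the attacked hitter to roughly .298, a gap driven mostly by the F-Strike% term.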

In this model, the dependent variable is the residual of the Steamer projection, meaning the difference between Steamer’s wOBA projection for a player going into the season and that player’s actual wOBA that season. The independent variable is the “Pitcher Respect” projection for that player. If we find significance here, it means that pitcher respect tells us something about the player that Steamer does not.

Steamer Residuals =

R-squared (adj) = 0.01
Variable	Coefficient	SE Coefficient	T-Score
Constant	0.07	0.02	3.04
“Pitcher Respect” Projection	-0.24	0.07	-3.48

Again, highly significant coefficient estimate, low adjusted r-squared. The “Pitcher Respect” model does give us an idea of which hitters are going to over- or under-perform their Steamer projections. It is not going to revolutionize projection systems or anything, but considering how good projection systems are today, gaining a few percentage points is great. Predicting breakouts is hard.
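The residual regression can be sketched as a one-variable least squares fit. The sign convention used here (actual wOBA minus the Steamer projection) is an assumption, since the text only says “difference,” and all five players' numbers are invented for illustration.

```python
import math


def simple_regression(x, y):
    """Return (intercept, slope, t-score of the slope) for y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((v - mean_x) ** 2 for v in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return intercept, slope, slope / se_slope


# Invented numbers for five fictional hitters.
respect_proj = [0.300, 0.310, 0.320, 0.330, 0.340]  # "Pitcher Respect" projection
steamer_proj = [0.320, 0.315, 0.330, 0.325, 0.335]  # preseason Steamer wOBA
actual_woba = [0.335, 0.322, 0.331, 0.318, 0.321]   # what the hitter actually did

# Assumed convention: positive residual means the hitter beat his projection.
residuals = [a - s for a, s in zip(actual_woba, steamer_proj)]

intercept, slope, t_score = simple_regression(respect_proj, residuals)
```

Under this convention, a negative slope means hitters with lower respect projections tended to beat their Steamer projections in the toy data.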

What this analysis reveals is that where a hitter’s skill and the perception of that hitter’s skill diverge, it increases the chances his Steamer projection will miss the mark.

Matt Joyce is an excellent example of this concept. Between 2010 and 2014, his Steamer-projected wOBAs for those five years paint the picture of a good, sometimes very good hitter. His actual wOBAs over that same time frame tell a different story – a steady decline. I submit that some of this decline can be explained by the gradual increase of pitcher respect for Joyce over that time frame.

Another, more complex example is Coco Crisp. In the past five seasons, Steamer sees him as an incredibly consistent hitter. However, his actual wOBA over that time has jumped around, alternating between over- and under-performing his projections almost every season. One factor that may explain this is that pitchers are approaching him differently every season. It looks like pitchers’ respect for Crisp is simply lagged one period from his actual performance. Crisp hits well so they gain respect, he doesn’t hit well because he’s seeing more junk, they lose respect, so then he hits well again. All the while, his true talent is basically unchanged.

There are other explanations behind Joyce’s and Crisp’s wOBA trends, of course: batted ball luck, injuries, motivation, and hitting adjustments, just to name a few. The respect of the league’s pitchers is just one explanation that fits for these two particular players.

The “Respect wOBA” projections for 2015 are listed below in “References & Resources.” Reading that list will quickly highlight one aspect of this model which, admittedly, does not work very well. Some hitters are simply worse at hitting off-speed pitches or have poor plate discipline. This leads to a few false positives, where the model thinks the hitter is great because pitchers are throwing a lot of junk to him, when in fact, that hitter is just overly susceptible to off-speed and out-of-the-zone pitching (or exceptional at hitting fastballs and in-the-zone pitching). This is why the 2015 list has Pedro Alvarez, Josh Hamilton, and Mike Moustakas as some of the most “respected” hitters in baseball.

Going back to the Joyce example, his increase in “respect” over the past five seasons could be due to: 1) change in skill level; 2) change in effectiveness against off-speed pitches; 3) change in effectiveness against fastballs; 4) change in effectiveness against in-zone pitches; or 5) change in effectiveness against out-of-zone pitches. The model cannot tell the difference. The Steamer numbers and Joyce’s pitch type numbers lead me to believe that the gradual change is mostly due to skill level, but I cannot prove it. Perhaps the next iteration of this research will control for the base rate of each pitch type a hitter sees, then use the change in those proportions as indicators in the model.
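One possible reading of that base-rate idea, offered only as a sketch: treat a hitter's own average over earlier seasons as his base rate, and use the latest season's departure from it as the indicator. Averaging prior seasons equally is an assumption, as are the key names and the sample numbers.

```python
RATE_KEYS = ("zone", "f_strike", "fastball")


def rate_changes(history, latest):
    """Return the latest season's rates minus the hitter's own prior average.

    `history` is a list of earlier-season dicts and `latest` is the most
    recent season, each with the hypothetical keys in RATE_KEYS.
    """
    if not history:
        raise ValueError("need at least one prior season to form a base rate")
    changes = {}
    for key in RATE_KEYS:
        base = sum(season[key] for season in history) / len(history)
        changes[key] = latest[key] - base
    return changes


# Invented seasons for one fictional hitter.
history = [
    {"zone": 0.46, "f_strike": 0.61, "fastball": 0.58},
    {"zone": 0.44, "f_strike": 0.59, "fastball": 0.56},
]
latest = {"zone": 0.42, "f_strike": 0.57, "fastball": 0.52}

changes = rate_changes(history, latest)
```

A hitter who has always seen few fastballs would show changes near zero under this scheme, while a hitter whose fastball rate suddenly drops would stand out, which is the distinction the current model cannot make.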

In summation, if you want to get the most accurate prediction of the quality of a hitter, use a projection system. However, there are still a few nooks and crannies where marginal improvements can be made. One of those nooks is a pitcher’s perception of the hitter’s quality – how willing the pitcher is to go after him, especially early in the count and with fastballs. The hitters who break out and beat everyone’s expectations in 2015 are likely to be players whom pitchers underestimate, too.

References & Resources

“Respect wOBA” 2015 projections
Robert Arthur, Baseball Prospectus, “Moonshot: What PITCHf/x Can Tell Us About Batters”
Eno Sarris, Just A Bit Outside, “Breakout Sluggers, Predicted by Fastballs”
Jeff Zimmerman, RotoGraphs, “Taking Hitter Analysis to Another Level”

3 Comments

Andy

10 years ago

So of the three variables, first-pitch strike seems to be the best. Interesting that Trout manages to rank as high as top 150, given how often pitchers were throwing him a strike on the first pitch last year. At least it seemed that way. He had a very low first pitch swing rate, and pitchers seemed to catch on to that.

Let me see if by wasting some words at the beginning of this comment, I can get the rest of the comment posted. It keeps saying I already posted it, but it doesn’t post.

Joshua_C

I respect the attempt, but, having tried the same thing using similar variables, it’s hard to avoid the conclusion that the predictive power to be gained here is almost non-existent.

I think if we’re going to use this data to improve substantially upon the accuracy of ZiPS/Steamer, we’re going to have to be very purposeful with how the variables interact with other player attributes. That is, is a guy not seeing fastballs because he’s feared, or is he not seeing fastballs because he can’t recognize breaking balls? Is he not seeing balls in the zone because he’s feared, or because he swings at everything? More complex cluster analysis is needed.

There’s definitely something here, and eventually we might be able to come up with player archetypes and apply these factors to them. But for now, I don’t think applying them in a purely linear manner really gets us anywhere.
