Pitch run value and count

by Max Marchi
December 4, 2009

Pitch run values have been around for a while. When you assign a run value to a pitch, two factors contribute to the final number: the outcome of the pitch and the count on the batter before the pitch was thrown.

As John Walsh showed when he first introduced pitch run values, the difference between a strike and a ball is much higher on a full count (-.349 vs .271) than on the first pitch (-.044 vs .038). So the pitch count is already factored into the equation and we can forget about it, right?

Not so fast.

You probably remember that our own Josh Kalk (now with the Tampa Bay Rays), when presenting the most effective pitches or the most lethal pitch combinations, often specified that he had adjusted run values for the pitch count. And you have surely read MGL, in the comment section of an article, criticizing the author for not having adjusted for pitch count.

What’s happening? Haven’t we already accounted for pitch count? A strike on 1-0 has a value of -.053 runs, while it’s -.062 on 0-1. Why does the need for an adjustment resurface?
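To keep the numbers straight, the figures quoted so far can be collected in a small lookup table. The sketch below is only an illustration: the dictionary layout and the function name `ball_strike_gap` are made up for this example, and only the counts and outcomes mentioned in the text are filled in.

```python
# Run values quoted in the text, keyed by (count before the pitch, outcome).
# Only entries mentioned in the article are included; the layout is illustrative.
RUN_VALUE = {
    ("0-0", "strike"): -0.044,
    ("0-0", "ball"): 0.038,
    ("3-2", "strike"): -0.349,
    ("3-2", "ball"): 0.271,
    ("1-0", "strike"): -0.053,
    ("0-1", "strike"): -0.062,
}


def ball_strike_gap(count):
    """Return how many runs separate a ball from a strike on the given count."""
    return RUN_VALUE[(count, "ball")] - RUN_VALUE[(count, "strike")]


if __name__ == "__main__":
    for count in ("0-0", "3-2"):
        print(count, round(ball_strike_gap(count), 3))
```

The gap is about .08 runs on the first pitch and .62 runs on a full count, which is the sense in which the count is already built into every pitch run value.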

Let’s go graphical.

A right-handed pitcher throws a slider to a right-handed batter at the location shown above. What’s the expected run value of such a pitch? We are oversimplifying here by pretending that only location influences the effectiveness of the pitch; Jeremy Greenhouse at Baseball Analysts has proposed a more advanced model that makes run value depend on location, movement and speed.

Here’s the average run value of a slider (from a righty to a righty) according to its location (MLB data, 2008-09).

The hypothesized pitch, still visible on the chart, will produce on average -0.008 runs.

What happens if we calculate the expected value for the same pitch on a 1-0 count versus a 0-1 count?

Something looks counterintuitive when comparing the pair of charts above: batters fare better on sliders down the middle when they are behind 0-1. A possible explanation is that when they’re ahead 1-0, hitters aren’t sitting on the slider (or are simply waiting for an easier pitch to hit), so the outcome is usually a strike (-0.053 runs). By contrast, on a 0-1 count they can’t afford to fall behind 0-2, so they swing at sliders clearly in the zone, with moderate success.

Our pitch, however, tells a different story: it is expected to produce 0.017 runs if delivered on 1-0 and -0.016 runs on 0-1.

You surely don’t need the next chart to know what’s going on, but let me show it anyway to confirm what everyone is expecting.

Hitters expand their zone when they fall behind (the 50 percent swing zone on 0-1 has an area two and a half times greater than on 1-0), thus swinging at pitches that are harder to reach or to make good contact with.

Add to the mix that a pitcher who is ahead tries to exploit the batter’s expanded strike zone (see below), and that a batter sitting on a favorable count can afford to let a pitch he doesn’t like go by, and you are back to the run value charts shown above.

Let’s now make up an extreme example. Suppose two identical pitchers exist. They both have an average fastball and a very peculiar slider: that slider always nails the location we have used so far.

Now, Pitcher A throws the slider only on 1-0 counts, while Pitcher B delivers it only on 0-1. Run values unadjusted for pitch count would show that Pitcher B’s slider is better than Pitcher A’s, when the only difference is in pitch selection.
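The two-pitcher example can be put into numbers with a short sketch. It assumes, for simplicity, that every slider produces exactly the expected value quoted above for its count, and it uses one possible adjustment (subtracting the expected value of the same pitch on the same count), which is an illustration and not necessarily how anyone cited here does it. The function and variable names are hypothetical.

```python
# Expected run value of the example slider by count (figures from the article).
EXPECTED_AT_LOCATION = {"1-0": 0.017, "0-1": -0.016}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def raw_run_value(pitches):
    """Average run value per pitch, ignoring the count."""
    return mean(rv for _count, rv in pitches)


def count_adjusted_run_value(pitches):
    """Average run value relative to what the same pitch yields on that count."""
    return mean(rv - EXPECTED_AT_LOCATION[count] for count, rv in pitches)


# Hypothetical data: 100 sliders each, every one producing exactly the
# expected value for its count (real outcomes would be noisy).
pitcher_a = [("1-0", 0.017)] * 100
pitcher_b = [("0-1", -0.016)] * 100

if __name__ == "__main__":
    for name, pitches in (("A", pitcher_a), ("B", pitcher_b)):
        print(name, round(raw_run_value(pitches), 3),
              round(count_adjusted_run_value(pitches), 3))
```

Unadjusted, Pitcher B’s slider looks .033 runs per pitch better than Pitcher A’s; once each pitch is compared with its own count baseline, both come out at zero, as they should for identical pitches.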

Thus, while pitch count is already factored into the calculation of pitch run values, we can’t leave it out of our analyses, especially when evaluating the effectiveness of pitches or pitch combinations.

References & Resources
John Walsh—Searching for the game’s best pitch.
Joe P. Sheehan—More Run Values.

4 Comments

MGL

15 years ago

Thank you! I have been yelling about this for a while, as you indicated.

There is one other issue. When you look at the 0-1 and the 1-0 data, the pool of pitchers is different. In order to find the difference between the run values of the 0-1 and 1-0 sliders, you have to control for the identity of the pitcher by using the “delta method” or something like that.

IOW, the difference between the 1-0 and 0-1 slider is NOT .017 minus -.016 since those numbers are based on two different pitcher pools. What the actual difference is (for any given pitcher), we have no idea. I would guess that the pitchers who throw 1-0 sliders have better sliders, or at least more control (actually it may be a worse slider – more control but less bite – for example, Lidge throws a great slider with little control).

So, for example, a pitcher with a great slider that lacks a lot of control might have a very good run value at 0-1, but if he threw it at 1-0, it might have a terrible run value because it would rarely be in the strike zone and the batter wouldn’t swing at it. So the gap would be larger for this type of pitcher. For a pitcher with a slider that he can control, the gap might be very small. I am speculating, of course.
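One common reading of the “delta method” mentioned above is to compute the 1-0 minus 0-1 difference separately for each pitcher who threw the pitch on both counts, then average those differences, weighting each by the smaller of that pitcher’s two sample sizes. The sketch below assumes that reading; the weighting choice, the function name and all the numbers are hypothetical.

```python
def delta_method_gap(pitchers):
    """Weighted mean of per-pitcher (1-0 minus 0-1) slider run values.

    `pitchers` maps a name to (rv_1_0, n_1_0, rv_0_1, n_0_1). Pitchers missing
    from either count are skipped, so both sides come from the same pool.
    Each difference is weighted by the smaller of the two sample sizes.
    """
    weighted_sum = 0.0
    total_weight = 0
    for rv_1_0, n_1_0, rv_0_1, n_0_1 in pitchers.values():
        weight = min(n_1_0, n_0_1)
        if weight == 0:
            continue  # no pitches on one of the counts: nothing to compare
        weighted_sum += (rv_1_0 - rv_0_1) * weight
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no pitcher threw the pitch on both counts")
    return weighted_sum / total_weight


# Hypothetical numbers, for illustration only.
sample = {
    "Pitcher X": (0.020, 50, -0.020, 120),
    "Pitcher Y": (0.010, 200, 0.000, 100),
    "Pitcher Z": (0.000, 0, -0.030, 80),  # never threw it on 1-0: skipped
}

if __name__ == "__main__":
    print(round(delta_method_gap(sample), 3))
```

Because every term is a within-pitcher difference, a pitcher who throws the slider on only one of the two counts cannot drag the comparison in either direction.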

Nick Steiner

Very nice, Max. Now how in the world do you suggest we adjust for this when dealing with large amounts of aggregate data and multiple pitch types and locations?

Max Marchi

Nick, I believe that most of the time you just have to keep the issue in mind: go with the straight values and mention that differences in pitch selection might confound the results you are seeing. Maybe check that the pitchers at the extremes of whatever list you produce don’t have peculiar pitch-selection behaviour.

Sometimes, consider whether a stratified analysis is better. Probably ahead vs behind in the count is enough for many purposes (watch out for 0-0 and 3-0 counts!).

Theoretically, I would try to build a model to predict the expected run value of a pitch given its type (or, better, its speed and movement), its location, and the count when it was thrown.
I don’t think it’s an easy task; maybe multilevel modelling would be required.

I don’t remember Josh explicitly writing how he did his adjustments, and I don’t think he’s in a position now to share his methods.

MGL, what you added is very important and confirms my belief that multilevel modeling is necessary for this issue. Unfortunately I need a lot more training on the subject: I do not have the slightest idea how to combine multilevel modeling and loess smoothing.
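The ahead-versus-behind stratification suggested in the reply above could look something like the sketch below. The bucket rule is an assumption (pitcher ahead when strikes outnumber balls, behind when balls outnumber strikes), as is the decision to set 0-0 and 3-0 apart in their own bucket; the sample run values are hypothetical.

```python
from collections import defaultdict


def count_bucket(balls, strikes):
    """Classify a count from the pitcher's point of view (assumed rule)."""
    if (balls, strikes) in {(0, 0), (3, 0)}:
        return "special"  # assumption: 0-0 and 3-0 are kept apart
    if strikes > balls:
        return "ahead"
    if balls > strikes:
        return "behind"
    return "even"


def stratified_mean_run_value(pitches):
    """Mean run value per bucket for an iterable of (balls, strikes, run_value)."""
    totals = defaultdict(lambda: [0.0, 0])
    for balls, strikes, run_value in pitches:
        bucket = totals[count_bucket(balls, strikes)]
        bucket[0] += run_value
        bucket[1] += 1
    return {name: total / n for name, (total, n) in totals.items()}


# Hypothetical pitches: (balls, strikes, run value of the pitch).
sample_pitches = [
    (1, 0, 0.017),
    (2, 0, 0.050),
    (0, 1, -0.016),
    (0, 2, -0.100),
    (0, 0, -0.044),
    (1, 1, 0.010),
]

if __name__ == "__main__":
    for name, value in sorted(stratified_mean_run_value(sample_pitches).items()):
        print(name, round(value, 4))
```

Comparing pitchers within a bucket removes most of the pitch-selection effect while keeping the samples much larger than a count-by-count split would.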

I’ve been looking into a way to predict run value based on a pitch’s speed, movement and location, and it’s pretty much impossible. You’re right that you would need a type of multivariate LOESS, but even that doesn’t yield a closed-form equation, so it might not even be applicable. I think you would need something like a neural net, and a very well calibrated one at that.
