Pitch classification revisited

I haven’t published an article for some time (did you notice?), but I have been able to fine-tune the classification algorithm that I first introduced in the article Rider, slurve and… Titanic.
I have now classified every pitcher’s pitches and, after comparing the results with the BIS classifications (thanks to FanGraphs), MLBAM’s and C. Sven Jenkins’s notes at 60 ft 6 in, I consider myself quite satisfied with my work.

I’m not giving out the details of the work behind the scenes right now. In fact I’m planning to show them at the next PITCH/f/x Summit at the end of August (and then write some articles here at THT). However, I promised in the comments a couple of months ago that, as soon as I felt comfortable with my classification, I would release the pitchers’ repertoires according to my method. Thus you will find a spreadsheet at the end of the article.

Hey, wait! Don’t skip immediately to the spreadsheet, please.

First, let’s see what’s new since the last time I reported on the subject. I classified all the pitches in the PITCHf/x database. That means that, yes, I’ve been able to tame the lefties, who were initially so elusive to my algorithm.

The final number of clusters is 17, up from 14; thus I needed to give some new names to a few pitches.
Here they are (all 17), with their average speed (mph) and horizontal and vertical movement (inches). I’m still open to suggestions for the vernacular.

              type speed h.mov v.mov
           heater  94.3  -8.3   7.4
 jumping fastball  93.0  -5.0   9.9
           sinker  90.3  -9.5   2.8
            rider  89.9  -8.4   6.6
  rising fastball  89.2  -4.8   9.7
           cutter  88.8  -0.4   6.6
 low-arm fastball  87.9  -9.2  -3.6
      hard slider  85.4   1.6   3.0
     power change  84.0  -7.4   3.2
    riding change  83.6  -6.0   6.3
     sharp slider  82.8   2.3   0.9
  straight change  79.7  -5.9   5.9
 low-arm offspeed  79.3  -7.5  -5.3
      tight curve  78.9   4.6  -4.9
           slurve  78.8   5.0   0.8
 roundhouse curve  74.3   5.9  -5.9
          floater  69.8   2.3   4.0
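
The averages above can be read as one reference point per pitch type. As a rough illustration only (the actual algorithm is not described in this article), the Python sketch below assigns a pitch to the label whose average is nearest in plain, unscaled Euclidean distance. The function name `nearest_pitch_type` and the choice of distance are assumptions made for the example, not the method behind the table.

```python
import math

# Average (speed in mph, horizontal movement in inches, vertical movement
# in inches) for each pitch type, copied from the table above.
AVERAGES = {
    "heater": (94.3, -8.3, 7.4),
    "jumping fastball": (93.0, -5.0, 9.9),
    "sinker": (90.3, -9.5, 2.8),
    "rider": (89.9, -8.4, 6.6),
    "rising fastball": (89.2, -4.8, 9.7),
    "cutter": (88.8, -0.4, 6.6),
    "low-arm fastball": (87.9, -9.2, -3.6),
    "hard slider": (85.4, 1.6, 3.0),
    "power change": (84.0, -7.4, 3.2),
    "riding change": (83.6, -6.0, 6.3),
    "sharp slider": (82.8, 2.3, 0.9),
    "straight change": (79.7, -5.9, 5.9),
    "low-arm offspeed": (79.3, -7.5, -5.3),
    "tight curve": (78.9, 4.6, -4.9),
    "slurve": (78.8, 5.0, 0.8),
    "roundhouse curve": (74.3, 5.9, -5.9),
    "floater": (69.8, 2.3, 4.0),
}


def nearest_pitch_type(speed, h_mov, v_mov, averages=AVERAGES):
    """Return the label whose average is closest to the given pitch.

    Assumption: unscaled Euclidean distance over the three columns.
    A real classifier would likely scale the features and use more of them.
    """
    pitch = (speed, h_mov, v_mov)
    return min(averages, key=lambda name: math.dist(pitch, averages[name]))
```

For example, `nearest_pitch_type(94.0, -8.0, 7.0)` lands on the heater, while a 70 mph pitch with 2 inches of horizontal and 4 inches of vertical movement lands on the floater.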

I’ll briefly outline the differences since the previous version.
The pitches coming from low angles are now split into two groups, fastballs and offspeed, and I’m fine with it.
There are three change-ups, up from two. One is a straight slow ball (Tim Wakefield’s fastball and emergency pitchers’ offerings fall into this bucket, together with some regular changes). The other two are separated by a couple of miles per hour and a few inches of movement (the one that travels faster stays up more, the other has a bigger tail toward the throwing side).
Finally, we have a second slider. I dubbed “hard” the one with more velocity and “sharp” the one with more horizontal movement (should I say bite?).
I’ll say it again: I’m open to suggestions for improving the labeling.

One of the reasons leading me to undertake this task was that I suspected some hitters could be very effective against one type of curve (or change, or slider) and nearly helpless against a different type.
Let’s see if my reasoning holds up.

The following hitters performed a lot better against tight curves than against roundhouse curves (data from 2008 and 2009 combined, minimum 40 pitches of each kind faced).

           player RV difference
1      Glaus Troy 0.108
2   Coghlan Chris 0.077
3   Berkman Lance 0.072
4    Cedeno Ronny 0.068
5      Gwynn Tony 0.068
6      Tracy Chad 0.055
7  Hairston Scott 0.054
8     Davis Chris 0.053
9  Rodriguez Ivan 0.050
10 Tatis Fernando 0.050

These other players behaved in the opposite way, having more success against the curveballs of the slowest type.

            player RV difference
1      Rolen Scott -0.108
2     Wells Vernon -0.102
3     Millar Kevin -0.091
4      Davis Rajai -0.083
5    Escobar Yunel -0.076
6      Jeter Derek -0.072
7  Cuddyer Michael -0.072
8   Delgado Carlos -0.070
9      Aybar Erick -0.069
10 Dickerson Chris -0.068
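
The two lists above boil down to a per-hitter difference in run value between two pitch types, restricted to hitters who saw enough of both. Here is a minimal Python sketch of that calculation, assuming the difference is the mean run value per pitch against tight curves minus the mean against roundhouse curves. The input layout (tuples of hitter, pitch type, run value) and the function name `rv_difference` are hypothetical; only the 40-pitch minimum comes from the text.

```python
from collections import defaultdict

MIN_PITCHES = 40  # minimum pitches of each kind faced, as stated above


def rv_difference(pitches, type_a="tight curve", type_b="roundhouse curve",
                  min_pitches=MIN_PITCHES):
    """Mean run value vs type_a minus mean run value vs type_b, per hitter.

    pitches: iterable of (hitter, pitch_type, run_value) tuples
             (a hypothetical layout chosen for this example).
    Hitters with fewer than min_pitches of either type are left out.
    """
    # hitter -> pitch type -> [sum of run values, number of pitches]
    totals = defaultdict(lambda: {type_a: [0.0, 0], type_b: [0.0, 0]})
    for hitter, pitch_type, run_value in pitches:
        if pitch_type in (type_a, type_b):
            bucket = totals[hitter][pitch_type]
            bucket[0] += run_value
            bucket[1] += 1

    result = {}
    for hitter, by_type in totals.items():
        sum_a, n_a = by_type[type_a]
        sum_b, n_b = by_type[type_b]
        if n_a >= min_pitches and n_b >= min_pitches:
            result[hitter] = sum_a / n_a - sum_b / n_b
    return result
```

Sorting the returned dictionary by value, descending or ascending, would give lists shaped like the two above.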

The following histogram shows the distribution of MLB players.


If having different success against the two types of curveballs were a repeatable skill, we would expect to find the same players in the above lists year in and year out. Looking at the following scatter plot, we see no correlation between “favorite type of curve” in 2008 and 2009.
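
The repeatability check described here amounts to correlating each hitter's 2008 difference with his 2009 difference. A minimal Python sketch follows, assuming the differences are held in two dictionaries keyed by hitter; the function names and the data layout are hypothetical, and Pearson's r is an assumed choice of statistic since the text does not name one.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def year_to_year_r(diff_first, diff_second):
    """Correlate RV differences across two seasons.

    diff_first, diff_second: dicts mapping hitter -> RV difference.
    Only hitters present in both seasons are used.
    """
    common = sorted(set(diff_first) & set(diff_second))
    return pearson_r([diff_first[h] for h in common],
                     [diff_second[h] for h in common])
```

A value near zero would match the "no correlation" reading of the scatter plot; a clearly positive value would point to a repeatable skill.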


I found similar results comparing change-ups and sliders (see small charts below).


Looking at two years of data, it seems there aren’t players who consistently crush a slow curve and are helpless against a tight one; similarly, no differences appear for change-ups and sliders.

I need to note that I used crude run values, unadjusted either for pitch count or pitcher. This means that a hitter might have faced many tight curves by great pitchers on 0-2 counts in 2008 and a lot of them by replacement hurlers on 2-1 counts in 2009. (Note: if you are not sure what I’m talking about, I suggest this read.)

Adjustments might give us different results, but I feel that if something were going on we would have seen some indication of correlation in the charts. Maybe in another year and a half, when we have four full seasons of PITCHf/x data (and thus more pitches faced of each kind), we might be able to see something.

Anyway I feel this is a non-issue, and I’m sure I’ll find more interesting uses for my classification algorithm.


The spreadsheet.

Here is the link to the promised spreadsheet.

Things to note.
As I said at the beginning of the article, I checked my classifications against BIS’ (from FanGraphs), MLBAM’s and C. Sven Jenkins’. I did the comparisons on an individual basis for a couple of dozen pitchers; I’ll show a few of them in future articles. While I’m really satisfied with what I have seen on those pitchers (and I tried to sneak in those pitchers I thought would put the system to a real test), many eyes will surely help me find where my system fails.

The spreadsheet is based on 2009 data. Pitchers with limited pitches thrown in that season might show strange results—that’s the case with Brandon Webb, who toed the rubber for just four innings in 2009. Issues like this will be addressed in future refinements of the system.

References & Resources
Pitch classification by the author was compared with:
BIS’ (from FanGraphs);
C. Sven Jenkins’ (60 ft 6 in).

Peter Jensen
13 years ago

Max – Interesting article.  Is the table for the pitch speeds and movement by pitch classification only for right handers with left handers having mirror image horizontal movement for the same classification?

It would be helpful if you included MLB number and handedness for the pitchers in your spreadsheet.

Max Marchi
13 years ago

A couple of swings and misses here for me (another one and I’m out!).

The values Peter refers to are the ones I used to train the algorithm and they come from righties only; I should have published the average values after the full classification for RHPs and LHPs. Also, I’ll keep in mind to include MLB IDs next time.

Re. missing pitchers (strike two!): I had noticed that in a first draft (Joba was the one who caught my attention) and fixed it, but somehow the wrong version sneaked into the spreadsheet anyway.

Note: six as the maximum number of pitch types per pitcher is a constraint I decided to put into the model (explanations will come); I acknowledge I’m losing something for a few crafty pitchers, but I felt the constraint was necessary to deal with the game-to-game variability in PITCHf/x calibrations.

13 years ago

I’m confused. Max says this is from 2009, yet I show 33 players who pitched in 2009 (according to FanGraphs) who are not in the database. These are not nobodies; the list includes guys like David Aardsma, Joba Chamberlain, Trevor Hoffman, and Brian Wilson, who pitched major innings in 2009 (I can provide a full list should you desire). Conversely, I show 17 players on his list who did not play major league baseball in 2009 (ditto). What happened there?

I did a little analysis of the information you have presented. That wall of text follows.

Here’s the pitcher who threw each pitch most often in 2009 (simplified by assuming each pitcher throws the same number of pitches per inning).
Power Change – Braden Looper
Hard Slider – Doug Davis
Heater – Ubaldo Jimenez
Sinker – Derek Lowe
Sharp Slider – Derek Lowe
Riding Change – Jair Jurrjens
Roundhouse Curve – Adam Wainwright
Rider – Adam Wainwright
Cutter – Roy Halladay
Slurve – Roy Halladay
Jumping Fastball – Matt Garza
Tight Curve – A.J. Burnett
Rising Fastball – Jered Weaver
Low-Arm Fastball – Brad Ziegler
Straight Change – Cole Hamels
Low-Arm Offspeed – Brian Shouse
Floater – Tim Wakefield

Pitchers are credited with between 1 and 6 different pitches. Of the 128 pitchers in the database with more than 100 innings in 2009, only 2 (Tim Lincecum and Tim Wakefield, two very different pitchers) had only 3 pitch types recorded. 27 pitchers had only 4 types, 59 pitchers had 5 types, and 39 pitchers had 6 types. 6 types is the maximum recorded for any player.

If each pitcher throws the same number of pitches per inning (to simplify things), then the league distribution breaks down like this:
Jumping Fastball: 21.8%
Rider: 15.8%
Rising Fastball: 11.4%
Hard Slider: 8%
Sharp Slider: 6.7%
Heater: 4.6%
Tight Curve: 4.6%
Roundhouse Curve: 4.5%
Cutter: 4.4%
Power Change: 4.3%
Riding Change: 3.6%
Sinker: 3.4%
Straight Change: 3.1%
Slurve: 2.4%
Low-Arm Fastball: 0.8%
Floater: 0.4%
Low-Arm Offspeed: 0.2%
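
Under the equal-pitches-per-inning simplification, a league breakdown like the one above is an innings-weighted average of each pitcher's pitch mix. A minimal Python sketch, assuming each pitcher is given as an innings total plus a dictionary of pitch-type shares that sum to 1; the function name `league_mix` and this input layout are hypothetical, not the spreadsheet's actual format.

```python
def league_mix(pitchers):
    """Innings-weighted league share of each pitch type.

    pitchers: iterable of (innings, {pitch_type: share}) pairs, where the
              shares for one pitcher sum to 1 (hypothetical layout).
    Assumption: every pitcher throws the same number of pitches per
    inning, so innings can stand in for pitch counts.
    """
    weighted = {}
    total_innings = 0.0
    for innings, mix in pitchers:
        total_innings += innings
        for pitch_type, share in mix.items():
            weighted[pitch_type] = weighted.get(pitch_type, 0.0) + innings * share
    if total_innings == 0:
        raise ValueError("no innings to weight by")
    return {pitch_type: w / total_innings for pitch_type, w in weighted.items()}
```

With a 100-inning pitcher splitting evenly between rider and cutter and a 300-inning pitcher throwing only riders, the league mix comes out to 87.5% rider and 12.5% cutter.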

13 years ago

Also, the pitchers who throw closest to the league average pitching ratio were Eric O’Flaherty, Ryan Dempster, and Tom Gorzelanny.

The pitchers who differ the most from the league average are exactly the guys you’d think it would be: the submariners and knuckleballers. Mariano Rivera and the cutter he throws 94% of the time isn’t far down this list either.
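
The comments above do not say how "closest to" and "differ the most from" the league average were measured. One simple possibility, offered here only as an assumption, is the sum of absolute differences between a pitcher's pitch-type shares and the league's. The function name `mix_distance` is hypothetical.

```python
def mix_distance(pitcher_mix, league):
    """Sum of absolute share differences across all pitch types.

    pitcher_mix, league: dicts mapping pitch_type -> share (0 to 1).
    A pitch type missing from either dict counts as a share of 0.
    The result is 0 for identical mixes and 2 for fully disjoint ones.
    """
    pitch_types = set(pitcher_mix) | set(league)
    return sum(abs(pitcher_mix.get(t, 0.0) - league.get(t, 0.0))
               for t in pitch_types)
```

Ranking pitchers by this distance, smallest first, would put the most typical repertoires at the top and specialists such as a pitcher who throws one pitch almost exclusively near the bottom.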