Searching for the game’s best pitch
by John Walsh
February 26, 2008

Is it Johan Santana’s change-up or Jon Papelbon’s four-seamer? Maybe Fausto Carmona’s wicked sinker or Mariano Rivera’s mythical cutter? Smoltz’s slider is right there and Josh Beckett’s curveball has to be in the discussion, right? So, who has the very best pitch in the game?

[Photo caption: Does this man throw the game’s best pitch?]

Well, I’m kind of a numbers guy, so when thinking about this kind of question, I start to wonder if you can add anything to the conversation with a little analysis. Is there any sensible way to rate the best pitches in the game using statistical analysis? The key word here is “sensible” and we won’t know if any ranking is sensible until we try, so let’s have a go.

The value of a single pitch

What we need to do here is to figure out how much any single pitch is worth. Now, when a ball is put into play, we have a pretty good idea of the value of the result. And when I say value, I am talking about value in runs. The stat I’m going to use here is batting runs, which was developed by Pete Palmer before some of you were born. By the way, batting runs is also known as linear weights, which has to be the worst name given to any baseball stat in history.

Anyway, batting runs measures a player’s run production above that of the average batter, by assigning a run value to the various outcomes of a plate appearance. For example, a single is worth (on average) just under half a run, the value of a walk is around one-third of a run, an out is worth around negative .25 runs, and so on. If you want to know about where these values come from (these are the infamous linear weights), a good reference is Curve Ball by Jim Albert and Jay Bennett.

Okay, so we’re good with balls put into play, but what about balls and strikes, what are they worth? Only about 20 percent of pitched balls are actually put into play, so we’d better figure out the value of the other 80 percent of pitches thrown.
Well, we can assign a run value to a ball or strike; in fact, it’s something I already worked through for my article on platoon effects in the Hardball Times Baseball Annual 2008, which you should go read now if you haven’t already done so.

Here’s how I figure out the value of a pitch that is not put into play. The main point to keep in mind is that a ball will move the count in favor of the batter and a strike will move it in favor of the pitcher. We can figure out how much that is worth by examining how well batters hit after reaching any given count. Let’s work through an example, which is often the best way to understand something.

Let’s say an average batter steps to the plate, with the count (obviously) zero balls, zero strikes. In 2007, the average batter hit .268/.336/.423. Let’s highlight that:

0-0 count: .268/.336/.423

Okay, now let’s say the pitcher throws a first-pitch ball, bringing the count to 1-0. After reaching a count of 1-0, batters, naturally, hit better than average. Here is the line:

1-0 count: .282/.394/.459

That first ball is worth quite a bit to the batter. It turns Jhonny Peralta into Grady Sizemore. What if that first pitch had been a strike instead of a ball? Well, we look at how batters fared after falling behind, 0-1:

0-1 count: .238/.282/.362

That’s a pretty big drop-off: Jhonny Peralta has turned into Tony Pena. To get runs into the discussion, let’s look at the batting runs values for these counts, instead of the AVG/OBP/SLG lines that I’ve shown above.
Actually, here are the batting runs for each of the 12 possible ball-strike counts:

Table 1 - Run Value of Any Given Count
+-------+-------------+
| Count | BattingRuns |
+-------+-------------+
| 0-0   |       0.000 |
| 1-0   |       0.038 |
| 2-0   |       0.104 |
| 3-0   |       0.220 |
| 0-1   |      -0.044 |
| 1-1   |      -0.015 |
| 2-1   |       0.037 |
| 3-1   |       0.142 |
| 0-2   |      -0.106 |
| 1-2   |      -0.082 |
| 2-2   |      -0.039 |
| 3-2   |       0.059 |
+-------+-------------+

So, going back to our example above, a first-pitch strike is worth -0.044 runs (to the batter, of course), while a ball on the first pitch on average is worth 0.038 runs. A quick look at the above numbers will show you that the value of a ball or strike will be different for different counts. The following table shows how much a ball or strike is worth in any given count:

Table 2 - Run Values of Balls and Strikes
+-------+-------+--------+
| Count | Ball  | Strike |
+-------+-------+--------+
| 0-0   | 0.038 | -0.044 |
| 1-0   | 0.066 | -0.053 |
| 2-0   | 0.116 | -0.067 |
| 3-0   | 0.110 | -0.078 |
| 0-1   | 0.029 | -0.062 |
| 1-1   | 0.052 | -0.067 |
| 2-1   | 0.105 | -0.076 |
| 3-1   | 0.188 | -0.083 |
| 0-2   | 0.024 | -0.184 |
| 1-2   | 0.043 | -0.208 |
| 2-2   | 0.098 | -0.251 |
| 3-2   | 0.271 | -0.349 |
+-------+-------+--------+

Not surprisingly, the highest leverage occurs on the 3-2 count, where a ball results in a walk and a strike results in a strikeout.

We are almost ready to start searching for the best pitch in baseball, but I need to return for a moment to balls in play. We know how much each ball in play is worth, as discussed above, but those values are relative to the average batter, or, in other words, a batter with an 0-0 count. If a batter singles with the count 0-2, the value of the single is greater than the usual 0.47 runs, because the run value of the 0-2 count was already at -0.106 runs, as seen in Table 1, above. The value of the single in this case is around 0.58 runs, or the final run value (0.47) minus the initial run value due to the count (-0.106). All balls in play will be evaluated this way: the value of the plate appearance according to batting runs minus the value of the count when the ball was put in play.
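The bookkeeping behind Tables 1 and 2 can be sketched in a few lines of Python. This is only an illustration, not the code used for the article. The names `COUNT_VALUE`, `pitch_value` and `in_play_value` are made up for the example. The walk value of 0.330 runs and the strikeout value of -0.290 runs are not quoted anywhere above; they are inferred from the three-ball and two-strike rows of Table 2. Foul balls with two strikes, which leave the count unchanged, are ignored.

```python
# Run value (to the batter) of each count, copied from Table 1.
COUNT_VALUE = {
    (0, 0): 0.000, (1, 0): 0.038, (2, 0): 0.104, (3, 0): 0.220,
    (0, 1): -0.044, (1, 1): -0.015, (2, 1): 0.037, (3, 1): 0.142,
    (0, 2): -0.106, (1, 2): -0.082, (2, 2): -0.039, (3, 2): 0.059,
}

# Assumed terminal values, inferred from Table 2 rather than quoted in the text.
WALK_VALUE = 0.330
STRIKEOUT_VALUE = -0.290


def pitch_value(balls, strikes, is_ball):
    """Change in run value when a ball or a strike is added to the count."""
    before = COUNT_VALUE[(balls, strikes)]
    if is_ball:
        after = WALK_VALUE if balls == 3 else COUNT_VALUE[(balls + 1, strikes)]
    else:
        after = STRIKEOUT_VALUE if strikes == 2 else COUNT_VALUE[(balls, strikes + 1)]
    return round(after - before, 3)


def in_play_value(event_runs, balls, strikes):
    """Batting-runs value of a ball in play minus the value of the count it came on."""
    return round(event_runs - COUNT_VALUE[(balls, strikes)], 3)


print(pitch_value(0, 0, is_ball=True))    # first-pitch ball: 0.038
print(pitch_value(3, 2, is_ball=False))   # strike three on a full count: -0.349
print(in_play_value(0.47, 0, 2))          # single hit on an 0-2 count: 0.576
```

Under those assumptions, looping `pitch_value` over all 12 counts reproduces Table 2, and the last line matches the roughly 0.58 runs quoted for a single on an 0-2 count.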
Two good pitchers, six good pitches

Okay, let’s have a look at a couple of pitchers to get a feel for all this. Let’s start with last year’s NL Cy Young winner, Jake Peavy. The table below shows the values of Peavy’s three pitches, expressed in terms of runs above average per 100 pitches. Negative values mean the pitcher gave up fewer runs than average.

Run values for Jake Peavy's pitches
                     +------------------+-----------------+---------+
                     |   Not In Play    |     In Play     |  Total  |
+------------+-------+--------+---------+-------+---------+---------+
| Name       | Pitch | NP_nip | runs100 | NP_ip | runs100 | runs100 |
+------------+-------+--------+---------+-------+---------+---------+
| Peavy_Jake | FB    |   1257 |    -1.3 |   224 |    -3.3 |    -1.6 |
| Peavy_Jake | SL    |    555 |    -1.1 |   139 |     1.6 |    -0.6 |
| Peavy_Jake | CB    |    390 |    -3.3 |    80 |     1.9 |    -2.4 |
+------------+-------+--------+---------+-------+---------+---------+

Notation:
NP - number of pitches
runs100 - runs per 100 pitches
nip - not-in-play
ip - in-play
tot - all pitches

As you can see, I’ve broken out the results for not-in-play and in-play pitches. When batters don’t put the ball in play, it appears that Peavy’s best pitch is his curve. On the other hand, when the ball is put into play (and these include home runs), his fastball is his most effective pitch. Overall, as we shall see, Peavy’s fastball and curve are two of the better pitches I’ve analyzed with the pitch-f/x data.
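The Total column in Peavy's table is consistent with a pitch-count-weighted average of the not-in-play and in-play rates. The article does not state the formula, so treat that as an inference from the numbers. A minimal sketch, with a hypothetical helper named `total_runs100`:

```python
def total_runs100(np_nip, runs100_nip, np_ip, runs100_ip):
    """Combine not-in-play and in-play rates, weighting each by its pitch count."""
    total_pitches = np_nip + np_ip
    weighted_sum = np_nip * runs100_nip + np_ip * runs100_ip
    return round(weighted_sum / total_pitches, 1)


# (NP_nip, runs100_nip, NP_ip, runs100_ip), copied from Peavy's table.
peavy = {
    "FB": (1257, -1.3, 224, -3.3),
    "SL": (555, -1.1, 139, 1.6),
    "CB": (390, -3.3, 80, 1.9),
}

for pitch, row in peavy.items():
    print(pitch, total_runs100(*row))
# FB -1.6
# SL -0.6
# CB -2.4
```

Because the inputs are themselves rounded to one decimal, a recomputed total could in principle differ from a published one by a tenth of a run.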
Here’s another pitcher:

Run values for Johan Santana's pitches
                        +------------------+-----------------+---------+
                        |   Not In Play    |     In Play     |  Total  |
+---------------+-------+--------+---------+-------+---------+---------+
| Name          | Pitch | NP_nip | runs100 | NP_ip | runs100 | runs100 |
+---------------+-------+--------+---------+-------+---------+---------+
| Santana_Johan | FB    |    534 |    -1.7 |    81 |     4.4 |    -0.9 |
| Santana_Johan | CU    |    241 |    -4.8 |    47 |    11.6 |    -2.1 |
| Santana_Johan | SL    |    100 |    -1.1 |    21 |    -0.6 |    -1.0 |
+---------------+-------+--------+---------+-------+---------+---------+

Santana’s change-up appears to be his most effective pitch, as we might have expected, but he was better than average with the fastball and slider, as well. Now that we’ve gotten a feel for the run values of a pitch, let’s go looking for the best pitches in the game.

Fastballs

Who has the game’s best fastball? That’s surely open to debate, but what I can do here is show whose fastball was the most effective in 2007. My pitch classification scheme doesn’t distinguish among four-seamers, cutters or sinkers; all those pitches are considered “fastballs.” I also include only pitchers with at least 500 identified fastballs, and, finally, don’t forget that the pitch-f/x data is not complete, so not all pitchers are included in the analysis. In any case, 120 pitchers threw enough identified fastballs to make it into my sample. Here are the top 20 fastballs of 2007:

Best Fastballs of 2007
+-------------------+-------+--------+-------------+-------+------------+-------------+
| Name              | Pitch | NP_nip | runs100_nip | NP_ip | runs100_ip | runs100_tot |
+-------------------+-------+--------+-------------+-------+------------+-------------+
| Bell_Heath        | FB    |    492 |        -2.0 |    81 |       -6.6 |        -2.7 |
| Young_Chris       | FB    |    770 |        -1.4 |   169 |       -7.9 |        -2.6 |
| Howry_Bob         | FB    |    539 |        -2.2 |   122 |       -4.2 |        -2.6 |
| Burnett_A.J.      | FB    |    554 |        -1.4 |   104 |       -5.0 |        -2.0 |
| Greinke_Zack      | FB    |    494 |        -1.3 |    99 |       -5.3 |        -1.9 |
| Kazmir_Scott      | FB    |    582 |        -1.8 |    96 |       -1.6 |        -1.8 |
| Correia_Kevin     | FB    |    475 |        -0.8 |    90 |       -7.1 |        -1.8 |
| Putz_J.J.         | FB    |    524 |        -2.2 |    90 |        0.7 |        -1.8 |
| Webb_Brandon      | FB    |   1020 |        -0.6 |   302 |       -5.3 |        -1.7 |
| Peavy_Jake        | FB    |   1257 |        -1.3 |   224 |       -3.3 |        -1.6 |
| Schilling_Curt    | FB    |    470 |        -1.8 |   111 |       -0.6 |        -1.6 |
| Penny_Brad        | FB    |   1213 |        -0.7 |   295 |       -4.6 |        -1.5 |
| Wilson_C.J.       | FB    |    558 |        -0.5 |   106 |       -7.1 |        -1.5 |
| Germano_Justin    | FB    |    687 |        -0.9 |   174 |       -3.5 |        -1.4 |
| Gorzelanny_Tom    | FB    |    463 |        -0.9 |    94 |       -3.8 |        -1.4 |
| Hughes_Phil       | FB    |    561 |        -1.5 |   113 |       -0.3 |        -1.3 |
| Hill_Rich         | FB    |    873 |        -1.6 |   187 |        1.1 |        -1.2 |
| Smoltz_John       | FB    |    698 |        -1.2 |   182 |       -1.3 |        -1.2 |
| Francisco_Frank   | FB    |    432 |        -0.9 |    82 |       -2.8 |        -1.2 |
| Morrow_Brandon    | FB    |    567 |        -0.9 |    79 |       -3.6 |        -1.2 |
+-------------------+-------+--------+-------------+-------+------------+-------------+

Notation:
NP - number of pitches
runs100 - runs per 100 pitches
nip - not-in-play
ip - in-play
tot - all pitches

Padres setup man Heath Bell throws hard: he averaged above 96 mph for his 573 fastballs captured by pitch-f/x. His teammate Chris Young, on the other hand, throws his fastball at average speed (91 mph). Another teammate, Justin Germano, is a confirmed soft-tosser whose fasty averages just 87 mph.

When you compare the not-in-play numbers with the in-play numbers, you see some interesting things. J.J. Putz was actually below average when the ball was put into play (six home runs off the fastball), but was very good when the ball was not put into play. Putz, perhaps implicitly, realized this and was able to limit the number of balls put into play (only 90 out of 614 fastballs). Brandon Webb was just the opposite: he was more effective on balls in play and, in fact, had one of the highest ratios of balls in play to pitches thrown in this sample.
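The Putz and Webb comparison comes down to what share of each pitcher's fastballs ended up in play. A rough sketch using three rows from the fastball table (the helper name `in_play_share` is hypothetical, and Peavy is included only as a reference point):

```python
def in_play_share(np_nip, np_ip):
    """Fraction of recorded pitches that were put into play."""
    return np_ip / (np_nip + np_ip)


# (NP_nip, NP_ip) for fastballs, copied from the fastball table.
fastballs = {
    "Putz_J.J.": (524, 90),
    "Webb_Brandon": (1020, 302),
    "Peavy_Jake": (1257, 224),
}

# List pitchers from the lowest to the highest share of fastballs put in play.
for name, counts in sorted(fastballs.items(), key=lambda item: in_play_share(*item[1])):
    print(f"{name}: {in_play_share(*counts):.1%} of fastballs put in play")
# Putz_J.J.: 14.7% of fastballs put in play
# Peavy_Jake: 15.1% of fastballs put in play
# Webb_Brandon: 22.8% of fastballs put in play
```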
Sliders

Pitchers throw a lot of fastballs, so we had the luxury of requiring at least 500 pitches when searching for the best fastball. If I made the same requirement on sliders, I’d end up with 10 pitchers, which isn’t much fun. Instead, I simply selected the top 20 pitchers in terms of number of sliders recorded by pitch-f/x. Here’s the resulting list, ranked by runs per 100 pitches.

Sliders in 2007
+-------------------+-------+--------+-------------+-------+------------+-------------+
| Name              | Pitch | NP_nip | runs100_nip | NP_ip | runs100_ip | runs100_tot |
+-------------------+-------+--------+-------------+-------+------------+-------------+
| Marcum_Shaun      | SL    |    354 |        -1.3 |    87 |       -9.4 |        -2.9 |
| Blanton_Joe       | SL    |    361 |        -1.6 |    69 |       -7.5 |        -2.6 |
| Litsch_Jesse      | SL    |    368 |        -1.4 |   111 |       -5.3 |        -2.3 |
| Buehrle_Mark      | SL    |    378 |        -2.2 |   133 |       -1.1 |        -1.9 |
| Smoltz_John       | SL    |    540 |        -3.6 |   116 |        6.7 |        -1.8 |
| Young_Chris       | SL    |    432 |        -1.8 |    67 |        0.5 |        -1.5 |
| Hernandez_Felix   | SL    |    394 |        -2.3 |    52 |        5.8 |        -1.3 |
| Haren_Dan         | SL    |    586 |        -1.9 |   119 |        2.5 |        -1.2 |
| Gaudin_Chad       | SL    |    428 |        -2.3 |    87 |        4.6 |        -1.2 |
| Halladay_Roy      | SL    |    554 |        -1.5 |   171 |        0.1 |        -1.1 |
| Maddux_Greg       | SL    |    433 |        -1.3 |   161 |       -0.1 |        -1.0 |
| Marquis_Jason     | SL    |    373 |        -1.1 |   110 |       -0.8 |        -1.0 |
| Batista_Miguel    | SL    |    995 |        -1.0 |   195 |        0.7 |        -0.7 |
| Contreras_Jose    | SL    |    371 |        -1.0 |    98 |        0.4 |        -0.7 |
| Peavy_Jake        | SL    |    555 |        -1.1 |   139 |        1.6 |        -0.6 |
| Speier_Justin     | SL    |    357 |        -1.7 |    89 |        4.3 |        -0.5 |
| Matsuzaka_Daisuke | SL    |    348 |        -1.3 |    86 |        3.0 |        -0.4 |
| Millwood_Kevin    | SL    |    408 |        -0.6 |   132 |        0.6 |        -0.3 |
| Vazquez_Javier    | SL    |    398 |        -1.7 |    86 |        7.3 |        -0.1 |
| Davis_Doug        | SL    |    544 |        -0.7 |   153 |        2.3 |         0.0 |
+-------------------+-------+--------+-------------+-------+------------+-------------+

Now the sample size issue is becoming more important; more of these guys have fewer than 100 in-play pitches and you can see the runs100 values for in-play pitches are
jumping around quite a bit. I doubt Shaun Marcum’s -9.4 runs/100 pitches on balls-in-play will hold up as we get more data for him. But, hey, give the man credit, this is what he did in 2007. Smoltz, of course, is famous for his slider and he is found high on this list. I wonder if his poor showing on balls-in-play (+6.7 runs per 100 pitches) might be a statistical fluctuation, which will come down in time, moving him up on this list.

Change-ups

Here are the results for change-ups.

Change-ups in 2007
+------------------+-------+--------+-------------+-------+------------+-------------+
| Name             | Pitch | NP_nip | runs100_nip | NP_ip | runs100_ip | runs100_tot |
+------------------+-------+--------+-------------+-------+------------+-------------+
| Francis_Jeff     | CU    |    323 |        -1.8 |   104 |       -4.2 |        -2.4 |
| Blanton_Joe      | CU    |    258 |        -0.9 |    98 |       -5.2 |        -2.1 |
| Vazquez_Javier   | CU    |    259 |        -2.1 |    69 |       -0.7 |        -1.8 |
| Hendrickson_Mark | CU    |    234 |        -1.2 |    84 |       -3.1 |        -1.7 |
| Glavine_Tom      | CU    |    281 |         0.7 |    93 |       -8.5 |        -1.6 |
| Marcum_Shaun     | CU    |    322 |        -1.4 |    77 |       -1.4 |        -1.4 |
| Weaver_Jered     | CU    |    285 |        -1.7 |    77 |        1.2 |        -1.1 |
| Gaudin_Chad      | CU    |    517 |         0.6 |   153 |       -5.0 |        -0.7 |
| Contreras_Jose   | CU    |    342 |        -1.2 |    68 |        2.3 |        -0.6 |
| Buehrle_Mark     | CU    |    388 |         0.1 |   130 |       -0.4 |         0.0 |
| James_Chuck      | CU    |    350 |        -1.0 |   113 |        3.2 |         0.0 |
| Danks_John       | CU    |    251 |        -2.8 |    91 |        8.0 |         0.1 |
| Rogers_Kenny     | CU    |    288 |         0.2 |    99 |        0.0 |         0.2 |
| Willis_Dontrelle | CU    |    249 |         0.6 |    76 |       -1.0 |         0.2 |
| Colon_Bartolo    | CU    |    313 |        -1.6 |   105 |        6.0 |         0.3 |
| Capuano_Chris    | CU    |    260 |        -2.2 |    81 |        9.7 |         0.6 |
| Washburn_Jarrod  | CU    |    311 |         1.2 |    94 |       -0.6 |         0.8 |
| Penny_Brad       | CU    |    268 |        -1.4 |    82 |        9.9 |         1.2 |
| Moyer_Jamie      | CU    |    249 |        -2.1 |    73 |       17.6 |         2.3 |
| Maroth_Mike      | CU    |    255 |         0.9 |    71 |        9.5 |         2.8 |
+------------------+-------+--------+-------------+-------+------------+-------------+

Jeff Francis of the Rockies leads the list, with Blanton, Vazquez, Hendrickson and Glavine
rounding out the top five. I found it interesting that Jamie Moyer, who is famous for his change-up, fared so poorly with it in 2007. Opposing batters just murdered it when they managed to put it in play, to the tune of 17.6 runs worse than average per 100 pitches. He was likely unlucky on balls in play; at least I hope he was!

Curveballs

Finally, we come to the curveball.

Curveballs in 2007
+-----------------+-------+--------+-------------+-------+------------+-------------+
| Name            | Pitch | NP_nip | runs100_nip | NP_ip | runs100_ip | runs100_tot |
+-----------------+-------+--------+-------------+-------+------------+-------------+
| Rodriguez_Wandy | CB    |    293 |        -1.7 |    44 |      -14.7 |        -3.4 |
| Burnett_A.J.    | CB    |    399 |        -2.6 |    39 |       -8.9 |        -3.1 |
| Beckett_Josh    | CB    |    348 |        -2.6 |    47 |       -3.9 |        -2.8 |
| Peavy_Jake      | CB    |    390 |        -3.3 |    80 |        1.9 |        -2.4 |
| Marmol_Carlos   | CB    |    414 |        -2.4 |    50 |       -3.0 |        -2.4 |
| Haren_Dan       | CB    |    534 |        -1.6 |   108 |       -5.6 |        -2.3 |
| Perez_Oliver    | CB    |    287 |        -2.5 |    35 |        1.5 |        -2.1 |
| Arroyo_Bronson  | CB    |    381 |        -2.5 |    88 |        0.5 |        -2.0 |
| Weaver_Jeff     | CB    |    244 |        -1.7 |    76 |       -3.0 |        -2.0 |
| Sabathia_C.C.   | CB    |    272 |        -2.5 |    57 |        0.8 |        -1.9 |
| Lackey_John     | CB    |    663 |        -1.7 |   139 |       -0.4 |        -1.5 |
| Bell_Heath      | CB    |    285 |        -2.2 |    65 |        2.2 |        -1.4 |
| Hill_Rich       | CB    |    397 |        -2.0 |    65 |        3.4 |        -1.2 |
| Wells_David     | CB    |    375 |        -2.0 |    93 |        4.1 |        -0.8 |
| Blanton_Joe     | CB    |    261 |        -0.9 |    63 |        0.4 |        -0.7 |
| Santana_Ervin   | CB    |    450 |        -1.9 |    63 |        8.1 |        -0.6 |
| Washburn_Jarrod | CB    |    351 |        -1.6 |    88 |        3.4 |        -0.6 |
| Meche_Gil       | CB    |    277 |        -1.3 |    49 |        3.7 |        -0.5 |
| Halladay_Roy    | CB    |    393 |        -2.1 |    81 |        8.1 |        -0.4 |
| Germano_Justin  | CB    |    331 |        -1.3 |    75 |        9.3 |         0.6 |
+-----------------+-------+--------+-------------+-------+------------+-------------+

Oops, how’d Wandy get in there? Actually, I am told that Rodriguez does have a very good curveball.
In any case, mentally discounting him for that unsustainable -14.7 runs per 100 pitches for balls in play, our top three curveballs belong to Burnett, Beckett and Peavy, which sounds pretty good to me.

Wrapping up

I must confess, while I think the method I’ve used here is sound, I’m feeling just a bit unsatisfied: I’m thinking that to really nail this down, we need more data. With many pitchers having only a few hundred pitches recorded for any given type of pitch, there is necessarily a fair amount of noise in the run value that we end up with. So, you should consider these results more of a rough guide than a definitive list of the best pitches. There’s not much we can do about that right now, except be glad that the pitch-f/x system will be in place in all parks for the 2008 season.

References & Resources

Further Reading: As many of you know, there are several people delving into the pitch-f/x data and producing excellent research. Lately, there has been quite a bit of activity on the run value of individual pitches, and I’d encourage folks to look at recent work by Joe P. Sheehan and Mike Fast.

Pitch Classification: My pitch classification scheme is mostly unchanged since I described it in this article. However, I would like to acknowledge fellow pitch-f/x researcher Mike Fast, who noticed a small problem with my classification and even suggested how to fix it. So, thanks to Mike for that.