# Thinking about strikeout rates

Until recently I was perfectly okay with using strikeouts per nine innings as a stat to evaluate pitchers. However, during a discussion with Ben Lindbergh he mentioned that K percentage was the standard at Baseball Prospectus. I didn’t see much of a difference between the two at first but as time went by and I thought about it more deeply I started to see some benefits in strikeout percentage. The main difference between the two statistics is that K/9 is based on a percentage of outs while K% is based on a percentage of plate appearances.

Basically, K% incorporates OBP into the calculation, since outs should be approximately equal to (1 - OBP) * plate appearances. I found a thread on the FanGraphs forums about adding K% in which David Appelman said that there’s a correlation of .98 between the two measures and that switching to K% would require people to recalibrate, which is always a problem.
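To make the link between the two measures concrete, here is a minimal Python sketch. It assumes the approximation above (outs ≈ (1 - OBP) * PA); the function name and the sample K% and OBP values are hypothetical and chosen only for illustration.

```python
def k_per_9_from_k_pct(k_pct: float, obp: float) -> float:
    """Approximate K/9 from K% and OBP allowed.

    Assumes outs ~= (1 - OBP) * PA, so K per out ~= K% / (1 - OBP),
    and there are 27 outs in nine innings.
    """
    outs_per_pa = 1.0 - obp
    return 27.0 * k_pct / outs_per_pa


# Two hypothetical pitchers with the same K% but different OBP allowed.
same_k_pct = 0.20
low_obp_k9 = k_per_9_from_k_pct(same_k_pct, obp=0.300)
high_obp_k9 = k_per_9_from_k_pct(same_k_pct, obp=0.350)

print(f"OBP .300: {low_obp_k9:.2f} K/9")
print(f"OBP .350: {high_obp_k9:.2f} K/9")
```

Under this approximation, the pitcher who allows more baserunners posts the higher K/9 despite striking out the same share of batters, which is the gap between the two statistics.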

I still had a nagging feeling that neither measure was a perfect representation of the pitcher’s strikeout skills. DIPS theory, one of the biggest breakthroughs in evaluating pitchers, suggests that batting average on balls in play is almost entirely a product of luck. Strikeout rate, on the other hand, is one of the few things a pitcher can control.

What bothered me, specifically, is that BABIP actually plays a role in strikeout rate. A pitcher who gets unlucky on batted balls will see an increase in his strikeout rate because he is getting fewer outs per ball in play (or plate appearance) than we expect him to. This gives the pitcher more opportunities to strike people out, raising his K/9. This effect will be smaller for K% but it still is there to some degree. Pitchers with a low BABIP will suffer the opposite fate. By getting more outs on their balls in play they will have fewer chances to strike batters out.

If, while evaluating a pitcher, we look at both his BABIP and K/9, we are double counting some of the randomness of BABIP. A good example of this is Ubaldo Jimenez, who had an extremely low BABIP to start the season. I read a lot of analysis commenting that his 7.7 K/9 combined with a low-.200s BABIP meant that his ERA was in line for very significant regression. While this was definitely true (and we’ve seen a lot of the expected regression take place already), I think that these articles were being extra harsh on Ubaldo because of this quirk. As his BABIP has increased, his K/9 has increased as well, and it now sits at 8.1, which, as we’ll see later, is still lower than it should be.

To combat this we should neutralize strikeout rate for BABIP. The new measure, which I have taken to calling BABIP Adjusted K/9, or baK/9 (pronounced back nine) for short, will simply replace the outs in the formula to calculate K/9 with expected outs. I am calculating expected outs by taking the number of outs on balls in play and replacing it with expected outs on balls in play, while leaving all other outs constant. Perhaps it makes sense to adjust double plays/pickoffs/etc. for BABIP as well but I believe the difference would be negligible.

We can calculate expected outs on balls in play for any pitcher as BIP*(1-BABIP), and since we generally treat expected BABIP for pitchers as approximately .300, I am going to generalize by using that figure. In short:

baK/9 = 27 * K / (Outs + BIP * (BABIP - .300))
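The adjustment described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the function and parameter names are made up for this example, the pitcher's stat line is hypothetical, and only outs on balls in play are adjusted, as the text describes.

```python
def k_per_9(strikeouts: int, outs: int) -> float:
    """Standard K/9: strikeouts per 27 outs."""
    return 27.0 * strikeouts / outs


def bak_per_9(strikeouts: int, outs: int, balls_in_play: int,
              babip: float, expected_babip: float = 0.300) -> float:
    """BABIP-adjusted K/9.

    Replaces actual outs on balls in play with expected outs on balls
    in play, leaving all other outs (strikeouts, etc.) unchanged.
    """
    actual_bip_outs = balls_in_play * (1.0 - babip)
    expected_bip_outs = balls_in_play * (1.0 - expected_babip)
    expected_outs = outs - actual_bip_outs + expected_bip_outs
    return 27.0 * strikeouts / expected_outs


# Hypothetical pitcher: 200 K, 600 outs, 550 balls in play, .270 BABIP.
raw = k_per_9(200, 600)
adjusted = bak_per_9(200, 600, balls_in_play=550, babip=0.270)
print(f"K/9 = {raw:.2f}, baK/9 = {adjusted:.2f}")
```

A pitcher with a BABIP below .300 is credited with fewer expected outs than he actually recorded, so his baK/9 comes out higher than his K/9. A pitcher with a BABIP of exactly .300 sees no change.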

While a guy like Ubaldo Jimenez is having his strikeout rate hurt by his low BABIP, a good example of a pitcher benefiting from this effect in 2010 is Dan Haren. Haren’s last two seasons are very interesting from a strikeout rate perspective. By glancing at his strikeout rate and BABIP you might think that Haren is pitching better this year than last year, but that he’s getting unlucky. After all, he is striking out more batters this year (9 K/9) than last year (8.8 K/9).

As we see above, Haren has had a rather large effect from BABIP for each of the last two seasons. Like his luck, the effects have gone in opposite directions. Last year’s low BABIP (.271) cost Haren .23 K/9 while this year’s high BABIP (.341) has given him a bonus of .33 K/9. If we look at his baK/9 we see that his 2009 strikeout rate was actually higher than his 2010 rate. The decrease in baK/9 now agrees with his decrease in K% as well, a nice bonus.

Because this adjustment is based on BABIP it is going to be larger with smaller sample sizes. Relievers can have very large differences, approaching 1 K/9 over a full season. Neftali Feliz, for example, had a 12.3 baK/9 last year compared to his 11.3 K/9. Starting pitchers are going to have smaller effects. The biggest difference between K/9 and baK/9 in 2009 for a pitcher with at least 30 starts belonged to Matt Cain, whose 7.5 baK/9 was .4 higher than his 7.1 K/9. If a pitcher goes from one end of the BABIP scale to the other, we could see a pretty big swing in K/9 from year to year, a change which otherwise might have been seen as a change in skill rather than a change in luck.
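One way to see why smaller samples produce larger adjustments is to look at how much BABIP can wander by chance alone. The sketch below assumes each ball in play is an independent trial with a .300 chance of becoming a hit (a simple binomial model, which is an assumption and not something established in this article). The ball-in-play totals are hypothetical workloads.

```python
import math


def babip_std_dev(balls_in_play: int, true_babip: float = 0.300) -> float:
    """Standard deviation of observed BABIP under a binomial model."""
    return math.sqrt(true_babip * (1.0 - true_babip) / balls_in_play)


# Hypothetical workloads: a reliever and a full-season starter.
reliever_sd = babip_std_dev(150)
starter_sd = babip_std_dev(600)

print(f"Reliever (150 BIP): +/- {reliever_sd:.3f}")
print(f"Starter  (600 BIP): +/- {starter_sd:.3f}")
```

With a quarter of the balls in play, the reliever's BABIP is expected to stray about twice as far from .300 as the starter's, so the gap between his K/9 and baK/9 will tend to be larger too.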

I will leave you with leader boards of the largest differences between K/9 and baK/9 for starting pitchers with at least 10 starts. I intend to post a few THT Live entries in the upcoming days examining more leader boards for relievers and for the last few seasons. The images are courtesy of Bloomberg Sports and are accurate as of midnight July 24. Note that Jimenez appears on the list and that his baK/9 of 8.3 is still quite a bit higher than his K/9.

Nice job, Craig, but you’re missing a closing parenthesis on that formula, aren’t you?

Thanks Dave.

That missing ) is a compiler-level catch. Very impressive.

This is fascinating stuff, and I’m certainly open to hopping on the K% train, but I am confused about one thing: why would a pitcher’s strikeout rate increase with more opportunities? I understand that he’s facing more batters with a higher BABIP and therefore I would expect him to compile more strikeouts, but I don’t understand why the rate at which he does so would increase.

Really enjoyed this piece, I’ve always wondered why K/9 was used in favor of something like K%.

On a similar note, in measuring a batter’s ability to see pitches and thus wear a pitcher down, wouldn’t it make more sense to look at (pitches seen)/out rather than pitches per plate appearance?

Clearly a team that sees 3.5 pitches per plate appearance and has an OBP of .400 does a better job wearing out pitchers than a team that sees 3.55 pitches per plate appearance with an OBP of .320.

“This effect will be smaller for K% but it still is there to some degree”

Can you explain it, because on its own, this is wrong. K% says to take all PA, put that in the denominator, and put all K and put that in the numerator. (Maybe exceptions for IBB and SH.) Where is the bias?

***

What your stat is proposing is to look at K per “expected” out, but someone with 100 walks or 10 walks will maintain the same rate. Am I getting that right?

If so, I fail to see the benefit of doing that in isolation without also knowing the BB per PA.

Would it be better to use xBABIP based on the pitcher’s batted ball profile? That is, replace the .300 with xBABIP. I know it wouldn’t change much but would be interesting to see.

Thanks guys.

Phylan you might be right here. I think it will have some effect on K% but not necessarily increase it per se. BABIP will definitely play around with the denominator (PAs) but it will also influence the strikeouts and may just add noise rather than biasing the measure in one specific direction.

That is an interesting suggestion, Dan. I wonder if pitches/PA is used because people also separately look at the OBP against the pitcher/number of batters faced/innings pitched. I’ve looked at pitches per inning before, which is basically just 3 times pitches per out…

You can ignore my question Craig, it was answered on the forum for cool people. Nice work.

Hi Tom,

As I replied to Phylan I think my statement on K% was incorrect. I don’t remember exactly what I was thinking when I concluded that one.

I definitely think you need to look at walk rate when considering this as well and originally I had a little bit about that in the end of the article. It got a bit unwieldy though and I wasn’t so sure how to handle it exactly so I left it out for now, hoping that some discussion would start on that point.

I was wondering if looking at baK/9 and also K/BB ratio would be a good way to keep them in perspective of each other while still sticking with stats that don’t require people to readjust.

Aaron I think that is a very good point. I didn’t do it here because it would have increased the difficulty by more than it was worth but you could definitely plug in xBABIP to personalize it a bit more for each pitcher.

Craig, I’m glad you brought this subject up again, it’s something I looked at very similarly a year ago and kind of left in development hell.

http://www.hardballtimes.com/main/fantasy/article/the-great-strikeout-debate/

http://www.hardballtimes.com/main/fantasy/article/the-great-strikeout-debate-part-ii/

I think my conclusion was very similar to yours. I’d be interested in comparing the TrueK% I (and Derek) made up with baK/9. Another thing you could normalize besides BABIP is HR percentage, since getting un/lucky in allowing homers can also influence a pitcher’s K/9.

I’ve used BB% and SO% for more than 30 years. These rates per PA are a key component of the Oliver projections for batters and pitchers. I’ve just read the referenced articles, but honestly I believe this is being overthought.

I look at each plate appearance as a unit. It begins with the batter/pitcher matchup. In this basic model, there are only four possible results – the batter is hit by the pitch, the batter walks, the batter strikes out, or the batter puts the ball in play. Once the plate appearance is completed, a new batter steps in, balls and strikes are set to zero, and we start over.

If the ball is in play, the pitcher failed to get a strikeout. The result of the batted ball is irrelevant. When measuring a batter or pitcher’s strikeout ‘skill’, I believe the best and simplest is SO/(PA-IBB), and (BB-IBB)/(PA-IBB) for walks.

Yeah, I’m with Brian (comment above me) here. K% is the best solution in my mind.

Craig,

This may be a silly question, but since you’re computing expected outs as the denominator, shouldn’t the last part of your equation be (.300 – BABIP) instead of (BABIP – .300)?

That is, doesn’t this part of the equation reduce a pitcher’s total recorded outs by the difference between his actual BIP outs and his “league average” BIP outs? Therefore, shouldn’t a pitcher with a BABIP above the league average suffer by having his actual recorded outs reduced because he effectively had “extra” chances for Ks?

Thanks.