Looking at Pitcher WAR

Only 34 percent of Aroldis Chapman's pitches were put in play. (via Keith Allison)

Assigning credit to a pitcher for what happens while he is on the mound is difficult. How much of the credit goes to the pitcher, to the catcher and to the players in the field? Different baseball web sites answer this question in different ways, and so they publish different WAR values.

At Baseball-reference.com, the pitcher is given credit just for the number of runs he allows, with a small amount of credit given or taken away for the defense. At FanGraphs, the pitcher is given credit for what he (and the catcher) can control (strikeouts, walks, and home runs) by using a FIP-based WAR value (FanGraphs does give other possible pitcher WARs). The WAR values can be averaged, compared and combined in many ways, with each method having its advantages and disadvantages. I am going to try to take it one final step and find a possible ideal mix of credit from batted balls and non-batted balls. The final results make for a clearer look, but still not a perfect one.

There are several possible inputs into pitcher WAR, and it will take a few steps to walk through them all. I am going to start with items a pitcher can control, which is easy. Then I will move on to the randomness of batted balls. Some batted balls almost always will be hits or outs. Then there is the gray area in between that is up for discussion. Finally, I will see if a pitcher has the ability to control the outcome by mixing up the possible outcomes to his advantage.

Overall, I would look for the final equation to be:

WAR ERA (wERA) = FIP components + Batted-ball components + Ability to sequence the preceding events

A pitcher controls certain outcomes, the FIP components, which are settled entirely between him and the batter (and the catcher). For this part of the discussion, I will include:

  • Walks (excluding intentional walks – these are the manager’s decision, not the pitcher’s)
  • Strikeouts
  • Hit by pitches
  • Home runs

Here is a graph of the percentage of plate appearance outcomes for which the pitcher (and catcher) is 100 percent responsible:

In 1975, only 23.4 percent of plate appearances ended with one of the above pitcher-controlled outcomes. Now the number is over 30 percent. These values are just the league-wide averages. Here is a look at the 2014 top and bottom five pitchers in the share of events under their control.

Highest Pitcher Control Percentage, 2014
Name | Pitcher Control | Batted Ball
Aroldis Chapman | 65.8% | 34.2%
Brad Boxberger | 55.5% | 44.5%
Andrew Miller | 52.1% | 47.9%
Craig Kimbrel | 51.2% | 48.8%
Dellin Betances | 48.7% | 51.3%

Lowest Pitcher Control Percentage, 2014
Name | Pitcher Control | Batted Ball
T.J. McFarland | 20.0% | 80.0%
Anthony Swarzak | 19.8% | 80.2%
Burke Badenhop | 19.7% | 80.3%
Christian Bergman | 19.7% | 80.3%
Dan Otero | 17.0% | 83.0%

Only 34 percent of the batters Chapman faced ended up putting a ball in play. On the other hand, 83 percent of the hitters Dan Otero faced put the ball in play. That is a huge discrepancy, yet the same formula is used to value both pitchers.
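The pitcher-control percentage in the tables above can be sketched in a few lines. This is a minimal illustration rather than the exact query behind the tables: the function name, the made-up stat line, and the choice to drop intentional walks from the denominator as well as the numerator (so the two shares sum to 100 percent) are assumptions made for the example.

```python
def pitcher_control_share(walks, intentional_walks, strikeouts,
                          hit_by_pitch, home_runs, batters_faced):
    """Share of plate appearances ending in a pitcher-controlled event.

    Pitcher-controlled events are unintentional walks, strikeouts,
    hit by pitches and home runs. Intentional walks are removed from
    both the numerator and the denominator (an assumption for this sketch).
    """
    plate_appearances = batters_faced - intentional_walks
    if plate_appearances <= 0:
        raise ValueError("need at least one non-intentional-walk plate appearance")
    controlled = (walks - intentional_walks) + strikeouts + hit_by_pitch + home_runs
    return controlled / plate_appearances


# Hypothetical stat line, not any real pitcher's season.
share = pitcher_control_share(walks=50, intentional_walks=5, strikeouts=150,
                              hit_by_pitch=5, home_runs=20, batters_faced=800)
print(f"Pitcher control: {share:.1%}  Batted ball: {1 - share:.1%}")
```

Everything that is not a pitcher-controlled event is treated as a batted ball, which is how the two columns in the tables add up to 100 percent.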

If a pitcher takes control of a situation and doesn’t allow any hits, he should be given the credit accordingly. On the other hand, if a pitcher just pitches to contact, he is at the whim of the defense’s position and quality, the speed of the hitter, the park dimensions, the field conditions, weather and fans with the last name Bartman. For now, I think the FIP portion of the equation should be weighted to the number of batters faced in which one of the defense-neutral events was the outcome.

(As an aside, right now I am giving 100 percent credit, good or bad, to the pitcher for strikeouts and walks. I can see the point that catchers should start getting some of the credit, good or bad, for called strikes and balls. Some good recent work has been done on the called strike zone, including SABR-nominated work by Dan Brooks and Harry Pavlidis and this beautiful outline of the history of pitch framing by the also beautiful Bradley Woodrum.)

Batted Balls: Who Gets Credit Once the Ball is in Play?

I am now entering the gray area for pitchers. Do they have any control over their batted balls in play (BIP)? This question has been tackled before, and stabs have been taken at answering it. I am going to take another stab at it. I am going to use a limited amount of available data, and the results aren’t what I expected at all.

For my analysis, I used fielding data by Inside Edge, which uses highly trained ex-ball players as stringers for each game to determine the chances a defensive player has of making a play. Here are the various play bins with the percentage of times a batted ball gets placed in each bin.

Inside Edge Play Breakdowns
% Chance of Making Play | Actual % of Plays Made | % of Balls in Play
0% (for sure hits) | 0% | 23.2%
1% to 10% | 6.3% | 2.7%
10% to 40% | 28.9% | 2.2%
40% to 60% | 57.6% | 2.6%
60% to 90% | 80.5% | 5.2%
90% to 100% (for sure outs) | 97.9% | 64.0%

(The plays not made in the “90% to 100%” range are simple errors.)

The key to take away is that only 12.7 percent of balls in play are in the range in which it is undetermined if a player can or can’t make a play on the ball. This should be the amount of credit fielders should be allocated in the WAR formula. Additionally, 1.3 percent should be added to the total (the 2.1 percent of the 64 percent in the bottom row) for errors, for a total of 14 percent of batted balls in which the defense is in play.
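The arithmetic behind the 12.7 percent, 1.3 percent and 14 percent figures can be checked directly from the Inside Edge table. The sketch below only re-adds the published percentages; the variable names are made up for the example.

```python
# (bin label, share of balls in play in %, actual % of plays made)
INSIDE_EDGE_BINS = [
    ("0% (for sure hits)", 23.2, 0.0),
    ("1% to 10%", 2.7, 6.3),
    ("10% to 40%", 2.2, 28.9),
    ("40% to 60%", 2.6, 57.6),
    ("60% to 90%", 5.2, 80.5),
    ("90% to 100% (for sure outs)", 64.0, 97.9),
]

# The four middle bins are the plays where the outcome is undetermined.
contested = sum(share for _, share, _ in INSIDE_EDGE_BINS[1:-1])

# Errors: the plays NOT made in the "for sure outs" bin.
_, sure_out_share, sure_out_made = INSIDE_EDGE_BINS[-1]
errors = sure_out_share * (100.0 - sure_out_made) / 100.0

fielder_share = contested + errors
print(f"contested {contested:.1f}% + errors {errors:.1f}% = {fielder_share:.1f}%")
```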

As we found out in the FIP section, between 69.4 percent and 77.9 percent of the plate appearances from 1975 to 2014 end with a batted ball. So taking 14 percent (plays in which fielders play a deciding role) of the two extremes, I end up with a range of 9.7 percent to 10.9 percent.
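The range quoted above is just the 14 percent fielder share applied to the two extreme batted-ball rates. A quick sketch, with hypothetical variable names:

```python
FIELDER_SHARE_OF_BIP = 0.14           # plays in which fielders play a deciding role
BATTED_BALL_RATES = (0.694, 0.779)    # lowest and highest share of PA ending in a batted ball

# Share of ALL plate appearances decided by fielders, at each extreme.
fielder_share_of_pa = [FIELDER_SHARE_OF_BIP * rate for rate in BATTED_BALL_RATES]

for rate, share in zip(BATTED_BALL_RATES, fielder_share_of_pa):
    print(f"batted-ball rate {rate:.1%} -> fielders decide {share:.1%} of plate appearances")
```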

Now, just because about 10 percent of plays are determined by fielder talent, the share of defensive runs isn’t necessarily 10 percent. That amount will vary depending on the run value of the events taking place. I took a quick stab at getting to this value and found I was opening a huge can of worms. While range is a major component of defensive WAR, I would also need to take into account at least Arm ratings and Double Play values. So for now we’ll set it aside.

Currently, at FanGraphs (and Baseball-Reference), 500 WAR are allocated to pitchers and defense. At FanGraphs, the fielders are allocated 70 WAR, which is 14 percent of the WAR total. A quick look shows that this value may need to be higher. I used some rough estimates and would feel dirty even giving out the values. This is another study for another day, unfortunately. I will stick with the 14 percent value for now. The key is to find out if a pitcher has control over the batted balls to prevent (or cause) hits.

(One area in which Wins eventually could be assigned to managers and coaches is the positioning of fielders. Players are being moved all over the field and at times seem to be positioned perfectly to make a play with near zero effort. It could be possible to measure the value of field positioning, but for now I will also have to ignore it.)

Now that I’ve separated out the influence of fielders, I can look at the rest of the batted ball data. Can pitchers control sure outs or those given up for sure hits? “For sure” hits are those batted balls which are in the zero percent group in the above table and make up ~23 percent of batted balls. “For sure” outs are the batted balls that fall in the 90 to 100 percent range. Unless the fielder makes a boneheaded mistake, the batted ball will be an easy out.

Using just the three years of Inside Edge data, I looked to see if giving up sure hits or getting outs was a skill. I compared Season One to Season Two, and Season Two to Season Three and found no correlation. Also, I took Season One and Two data and compared them to Season Three and got nothing again. I looked at the pitchers with the highest number of innings and never could get a usable correlation.
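A season-to-season check of this kind can be sketched as below. It is not the code used for the study: the Pearson correlation, the 100-ball-in-play minimum, the function names and the sample data are all assumptions for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def year_to_year_r(season_one, season_two, min_bip=100):
    """Correlate sure-hit rates for pitchers who qualify in both seasons.

    Each season maps pitcher name -> (sure_hits, balls_in_play).
    """
    shared = [p for p in season_one
              if p in season_two
              and season_one[p][1] >= min_bip
              and season_two[p][1] >= min_bip]
    rates_one = [season_one[p][0] / season_one[p][1] for p in shared]
    rates_two = [season_two[p][0] / season_two[p][1] for p in shared]
    return pearson_r(rates_one, rates_two)


# Hypothetical data: pitcher -> (sure hits allowed, balls in play).
season_one = {"A": (25, 110), "B": (30, 150), "C": (40, 160), "D": (5, 40)}
season_two = {"A": (30, 120), "B": (33, 140), "C": (28, 130), "D": (20, 100)}
r = year_to_year_r(season_one, season_two)  # pitcher D is dropped (40 BIP in season one)
print(f"year-to-year r = {r:.2f}")
```

A value near zero across a real sample would match the "no correlation" result described above; with only three made-up pitchers the number printed here means nothing.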

I even went to bucket regression to try to find some correlation. For example, I took all the pitchers with at least 100 balls in play in both seasons and ranked them by percentage of batted balls for “for sure” hits. These are the pitchers who were better than the average in Season One and also were given the chance to pitch another season.

Hits Allowed Correlation
Group | Hit% Year 1 | Hit% Year 2 | Out% Year 1 | Out% Year 2
Many-Hit Group | 26.7% | 24.6% | 60.9% | 64.3%
Few-Hit Group | 17.9% | 23.3% | 64.0% | 64.3%

The two groups sat well apart in Season One (26.7 percent sure hits versus 17.9 percent), but moving to the next season, the numbers morph into almost the overall averages shown above.
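The bucket comparison can be sketched as follows. How the two groups were actually cut is not stated, so the median split, the unweighted averages, the function name and the sample rates here are all assumptions.

```python
def bucket_means(pairs):
    """Split pitchers at the median Year 1 rate and average both years per group.

    pairs is a list of (year1_rate, year2_rate), one entry per pitcher.
    Returns {"few_hit": (year1_mean, year2_mean), "many_hit": (...)}.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two pitchers to form two groups")
    ordered = sorted(pairs, key=lambda pair: pair[0])
    half = len(ordered) // 2
    groups = {"few_hit": ordered[:half], "many_hit": ordered[half:]}

    def means(group):
        year1 = sum(a for a, _ in group) / len(group)
        year2 = sum(b for _, b in group) / len(group)
        return year1, year2

    return {name: means(group) for name, group in groups.items()}


# Hypothetical sure-hit rates (Year 1, Year 2) for four pitchers.
sample = [(0.18, 0.24), (0.28, 0.25), (0.20, 0.23), (0.26, 0.24)]
for name, (year1, year2) in bucket_means(sample).items():
    print(f"{name}: Year 1 {year1:.1%} -> Year 2 {year2:.1%}")
```

If the groups converge on the league rate in Year 2, as in the table, the Year 1 gap was mostly noise rather than skill.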

In small year-to-year samples with Inside Edge data, pitchers have no ability to allow or prevent hits. This is really no surprise, since batting average on balls in play (BABIP), which takes the defense behind the pitcher into account, stabilizes around 2,000 balls in play.
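One common way to act on a stabilization point is to shrink an observed rate toward the league average. The sketch below assumes the 2,000-ball-in-play figure is read as the sample size at which a pitcher's own rate and the league rate get equal weight; that reading, the function name and the example numbers are assumptions, not part of the analysis above.

```python
def regress_to_mean(observed_rate, balls_in_play, league_rate, stabilization_bip=2000):
    """Shrink an observed rate toward the league rate.

    The weight on the pitcher's own rate grows with his sample size and
    reaches one half at stabilization_bip balls in play (assumed reading).
    """
    if balls_in_play < 0 or stabilization_bip <= 0:
        raise ValueError("balls_in_play must be >= 0 and stabilization_bip > 0")
    weight = balls_in_play / (balls_in_play + stabilization_bip)
    return weight * observed_rate + (1 - weight) * league_rate


# Hypothetical pitcher: .270 BABIP against a .300 league average.
for bip in (400, 2000, 6000):
    estimate = regress_to_mean(0.270, bip, 0.300)
    print(f"{bip} BIP -> estimated true BABIP {estimate:.3f}")
```

With one season of balls in play the estimate stays close to the league rate, which is consistent with finding no year-to-year signal.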

I wonder if part of the stabilization is having a good (or bad) defense behind the pitcher for multiple seasons. Over the last 10 seasons, Matt Cain has the highest amount of value added by balls in play. Over the same time frame, the Giants defense was rated the best in the league by UZR. The No. 2 pitcher in getting value from balls in play is Jered Weaver. The Angels defense produced the fourth-highest UZR value over that time frame.

Some information does exist that shows some traits can lead to lower runs allowed. Matt Swartz and Dave Studeman used these data to help create SIERA and xFIP.

Both of these stats look at what the pitcher should do. Instead, the wERA I am looking for is what he actually did on the mound. The pitcher allowed a home run, the infield let a ground ball through, etc. — not what the pitcher should have done on the mound. I am going to come back to this idea in just a bit after I look at sequencing.

Sequencing

Sequencing is the term for pitching to the situation or score. Can the pitcher get a strikeout with the bases loaded and one out? Is he able to limit home runs with runners on base? To find if this is a talent, I looked at the FanGraphs LOB-Wins for pitchers from season to season.

Are pitchers able to show any kind of skill to pitch to the situation? Again, nothing. Now, pitchers have been able to show this trait over a career: Tom Glavine refused to give up a home run with runners on base even if it meant walking the hitter. It just can’t be a trait that can be assigned to any pitcher immediately.

Sure, some pitchers show the skill of sequencing or the ability to limit batted-ball damage, but it is uncommon. Should the focus be on the exceptions or the norm?

I took all pitchers from 1974 to the present with at least 600 innings pitched. Here is how their batted-ball-in-play WAR (BIP-Wins) and sequencing WAR (LOB-Wins) per 200 innings stack up over their careers.

There are a few cases of pitchers seeming to have the ability to limit balls in play and how many runs eventually score once runners are on base, but these are hard to detect in a small sample.

Putting it all together

There is no easy way to measure the skill sets for batted ball data or sequencing with just a year’s worth of data. The truth is, I don’t know what to use to determine pitcher value besides giving a portion to FIP. Setting the proportions is really up to the individual to decide. Here is the overall equation I would use.

wERA = %FIP + %p(E)RA + %l(E)RA

Where:
p(E)RA = Pitcher’s ERA (batted ball and sequencing factors)
l(E)RA = League’s ERA

I have found pitchers have very little skill when it comes to preventing hits or sequencing events. Why not just assign the pitcher the league-average number of runs allowed for plate appearances out of his control? The more I think about how random batted balls and sequencing are, the more I ask: why not throw in the towel and put everyone on the same level?

Then again, some pitchers eventually show some talent not measured in year-to-year data. As long as a floor for FIP is used to calculate wERA, I could see any combination of the values being used, and there is no way to decide the wrong or right answer. Additionally, the weight of these values could change over the years.

For example, take Alfredo Simon’s 2014 season. He had a 3.44 ERA, but a 4.33 FIP. The plate appearances that ended in a FIP-based outcome were 25 percent. The overall league average (E)RA was 3.74.

Here are the three values mixed in different ways:

Mixing the Three Values
Breakdown | FIP% | p(E)RA% | l(E)RA% | w(E)RA
All FIP | 100 | 0 | 0 | 4.33
Actual FIP% / rest league average ERA | 25 | 0 | 75 | 3.89
Actual FIP% / rest his ERA | 25 | 75 | 0 | 3.66
Actual FIP% / half of rest is lg avg ERA / other half is ERA | 25 | 37 | 38 | 3.78
Half FIP / 25% pitcher ERA / 25% league ERA | 50 | 25 | 25 | 3.96
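The table values follow from a plain weighted average of the three inputs. Here is a small sketch that reproduces them from Simon's 2014 numbers; the function name and the requirement that the weights sum to 1 are assumptions of this example.

```python
def w_era(fip, pitcher_era, league_era, fip_w, pitcher_w, league_w):
    """Weighted mix of FIP, the pitcher's ERA and the league ERA."""
    if abs(fip_w + pitcher_w + league_w - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return fip_w * fip + pitcher_w * pitcher_era + league_w * league_era


# Alfredo Simon, 2014: 4.33 FIP, 3.44 ERA, league (E)RA of 3.74.
SIMON = {"fip": 4.33, "pitcher_era": 3.44, "league_era": 3.74}

# (label, FIP weight, pitcher ERA weight, league ERA weight)
MIXES = [
    ("All FIP", 1.00, 0.00, 0.00),
    ("Actual FIP% / rest league ERA", 0.25, 0.00, 0.75),
    ("Actual FIP% / rest his ERA", 0.25, 0.75, 0.00),
    ("Actual FIP% / rest split", 0.25, 0.37, 0.38),
    ("Half FIP / quarter each ERA", 0.50, 0.25, 0.25),
]

for label, fip_w, pitcher_w, league_w in MIXES:
    value = w_era(**SIMON, fip_w=fip_w, pitcher_w=pitcher_w, league_w=league_w)
    print(f"{label:32s} {value:.2f}")
```

Changing the weights in `MIXES` is all it takes to try a different split of credit.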

The FIP value will be an anchor, and it will be adjusted by the other weights. Looking at just 2014, what was Simon’s value? It is tough to really tell beyond the 25 percent he gets credit for with his strikeouts, walks and home runs.

Pitchers (and catchers) have complete control over some plate appearance outcomes, but once the ball is put into play, it is nearly impossible to know what is a skill and what isn’t. Additionally, the ordering of the events is also not an easy skill to put any value on. After setting a minimum amount of credit for FIP-based outcomes, the rest of an ERA measurement used for WAR can be any combination of FIP, pitcher ERA and/or league ERA. While I didn’t find an exact answer to giving a pitcher an exact value like batters get, I hope I shed some light on difficulties and decisions involved when calculating a pitcher’s WAR.


Jeff, one of the authors of the fantasy baseball guide, The Process, writes for RotoGraphs, The Hardball Times, Rotowire, Baseball America, and BaseballHQ. He has been nominated for two SABR Analytics Research Awards for Contemporary Analysis and won it in 2013 in tandem with Bill Petti. He has won four FSWA Awards, including one for his Mining the News series. He's won Tout Wars three times, LABR twice, and got his first NFBC Main Event win in 2021. Follow him on Twitter @jeffwzimmerman.
19 Comments
Matt P
9 years ago

That’s an interesting idea. The problem is that the Inside Edge data simply has issues.

For example, center fielders consistently convert 14% of all remote chances (1-10%). They also convert about 60% of even chances. The significance of this is that it indicates that Inside Edge graders are likely overestimating the difficulty of converting plays. This makes sense because they’re using human graders and it seems reasonable that they would give players the benefit of the doubt when trying to decide whether a play is “even” or “likely”. This has implications for your article because it indicates that some plays aren’t in fact “impossible” despite being graded as such due to scorers bias. That means it’s likely that there are more than 12.4% of plays where a fielder can have an impact.

Also in 2014, the number of impossible chances for infielders was much larger than it was in 2012 and 2013 while the opposite was true for outfielders. Basically, this means that the methodology has changed for year to year and therefore you can’t really combine data from multiple years. This may have limited impact on your article but certainly puts into question whether one should trust their numbers.

You recently wrote an article where you came up with a metric using Inside Edge to judge player defense. Did you ever test whether player defense when grouped by team has any correlation to team wins? I created a similar metric and found no correlation between the two but maybe you’ll have better luck than me. If there’s no correlation between the two stats then it would seem to indicate that the Inside Edge data really isn’t useful.

Eric the Clown
9 years ago

I love the idea of Baseball-Reference’s model. Take the actual number of runs allowed, compare to what the average offense would score, and subtract the runs saved by the defense, and you get the exact number of runs that the pitcher is responsible for. It’s really quite elegant.

Unfortunately, we don’t have a perfect way to measure defense, which means this version of WAR will also be imperfect. But the idea is good.

Mike
9 years ago
Reply to  Eric the Clown

The problem is that I don’t believe BR differentiates between the difference in defensive outcomes between pitchers on the same team. If your defense is worse when you pitch than when your teammates pitch, you’ll have a lower WAR than you should.

Derek
9 years ago

I can accept that pitchers don’t have control over how many balls end up as hits. But they have some amount of control over batted ball distribution (GB/LD/FB). wOBA results are different for each type of batted ball. Line drives result in the highest wOBA, then fly balls, followed by ground balls. Shouldn’t this be factored in?

Mike
9 years ago
Reply to  Derek

The problem is that LD rate takes way longer to stabilize than FB and GB rate: http://www.fangraphs.com/library/principles/sample-size/

Because of this it’s hard to say the pitcher “controls” the metric. This doesn’t necessarily mean that pitchers don’t exhibit some sort of control over hard hit balls. It does mean that the metric is mostly noise, though.

Derek
9 years ago
Reply to  Mike

Okay so line drives are harder to incorporate. But GB% and FB% shouldn’t be. Some pitchers are specifically attempting to induce a high rate of ground balls. I’m not convinced we can judge them by the same metrics we can use for strikeout pitchers.

Matthew Murphy
9 years ago
Reply to  Mike

GB% and FB% may be stable and repeatable, but while ground balls are good, ground-ball heavy pitchers tend to get hit harder on the fly balls they do allow. Factoring in the average wOBA for batted ball types would ignore this and skew the system in favor of ground-ball pitchers.

pft
9 years ago

Great article overall.

Bit confused here though. So pitchers control 30% of hitter events all themselves. Batted balls are 70% where defense fielding talent is responsible for 10%, or 63% pitcher and 10% fielder.

That means pitchers are responsible for 93% and fielders 7%. But then you say 14% is too low for fielders? Arm Rankings and DP’s the reason?. Sorry, you need to do better to support the idea that 14% is too low.

Tim
9 years ago

I like the idea of creating a WAR metric using more than FIP because FIP WAR doesn’t give enough credit to pitchers like Clayton Kershaw, Jim Palmer, Tyler Clippard, etc., who almost always have ERAs lower than their FIPs.

What about creating a WAR metric using the “tru ERA” that Tony Blengino calculates in his articles or a combination of %FIP and %BIP only tru ERA? This seems better to me than using only BIP frequency data since Tony adjusts for park factors and can also adjust for luck using his contact authority score rating thing. Is this in any way feasible? Does this idea even make sense?

Dr. C
9 years ago

Is inducing pop ups a repeatable skill? I seem to recall an FG article where Dave Cameron suggested it could be incorporated into FG WAR because it was a real thing. I would guess that every pop up (or nearly every one) is a sure-out play, but that’s not true for every ground ball. It requires very little fielding skill and in terms of impact is the equivalent of a strikeout.

Kincaid
9 years ago

Fly balls only have a higher wOBA than ground balls because of home runs, which is accounted for in FIP. Once you take out the home runs and handle them separately, there isn’t much difference between the average value of a fly ball in play and a ground ball. As long as you are still penalizing for the HR, it isn’t that big a deal to ignore GB/FB.

Kincaid
9 years ago
Reply to  Kincaid

(This was supposed to be a reply to Derek’s comment thread above, but apparently I had this page open in two tabs and entered my post in the one where I didn’t hit “Reply”)

Lanidrac
9 years ago
Reply to  Kincaid

I don’t think so. Ground balls rarely result in doubles or triples (unlike a lot of fly balls), so while BABIP may stabilize at about the same rate for most (but not all) pitchers, SLG% on balls in play does not.

As for what Matthew Murphy said about ground ball pitchers getting hit hard when they do give up fly balls, a lot of those extra bombs are going to be home runs, which are accounted for by FIP anyway.

So even if line drive rates don’t stabilize, I think a proper pitching metric should include pop-up and grounder rates, as well as park factors to adjust the home run portion of the FIP component.

Kincaid
9 years ago
Reply to  Lanidrac

BABIP isn’t the same for GB and FB–ground balls have a higher BABIP, which offsets the extra doubles/triples from fly balls. The average linear weights value of a ball in play last year (from FanGraphs’ splits page) was something like -.12 for a FB and -.07 for a GB.

(It also depends on how you classify FB/LD: Retrosheet/Baseball-Reference has more BIP classified as line drives last year than BIS/FanGraphs, so the average value of a FB in play is a bit lower using that data, something like -.16 runs per FB in play.)

Michael Goetze
9 years ago

I wanted to like this article, but I couldn’t bring myself to continue reading once you admitted that you would be using Inside Edge fielding data. This data has zero credibility in my eyes, it seems clear they punish fielders for getting a quick break and taking a good route to the ball, while rewarding those who take a poor route and then manage to dive spectacularly and come up with the ball anyway. Exhibit No. 1: Jackie Bradley Jr. – I watched some of those catches Inside Edge called “Even” and they weren’t. Not even close.

Eric
9 years ago

Hey Jeff, the higher pitcher control percentages seem to be skewed towards relievers (maybe higher velocity guys as well?) since Chapman and Kimbrel made the top five. Is there a reliever benefit that could possibly bias their WAR, like there is a DH penalty? – seeing as how relievers throw one inning at a time rather than a starter who throws 6 innings on average per start, plus overall relievers in the course of the year throw 40-70 innings per year versus 150 to 220 innings. Just curious.

Matthew
9 years ago

You linked us to regression/stabilization numbers for BABIP and LOB stabilization numbers are out there too. Your link suggests that you do realize that BABIP skill exists – it is just almost impossible to find in just two seasons of data. The regression rates for LOB are similar.

Why not take a pitcher’s RA and regress their BABIP and LOB runs based on career PAs each season? As the pitcher increases PA, the regressions would get smaller, of course. We would have to adjust previous seasons, of course, but it wouldn’t be too hard.

But we shouldn’t treat Tom Glavine the same way as a rookie. If regression analysis suggests a 15% regression for BABIP or LOB%, just regress those that much each season.

Evan Gattis
9 years ago

It’s strange that IFFB is a clear pitcher skill with a reasonably wide range of values, and IFFB are automatic outs, yet the technique can’t even find a hint of that signal year to year.

Calvin Liu
9 years ago

A nice try to extract more information with what’s out there, but IMO the key is to find more insights rather than juggle existing but insufficient/incomplete data.
For example, one item I’d investigate is how foul balls of various types and in various counts correlate with pitcher skill. The rationale is that one of the ways in which pitcher/catcher combinations can demonstrate skill is by disrupting the batter’s timing. Strikeouts are one way, but they’re the extreme case: when a batter misses the 3rd strike via swing through or called strike. A foul ball, particularly because of swinging too early or late, is generally an indication that the batter was fooled in timing in some way; conversely a foul ball that goes straight back generally means the batter just missed the sweet spot and was fooled only slightly/was unlucky. In contrast, a called 3rd strike is generally more of a location fooling of the batter – although it also encompasses pitchers who simply overwhelm the batter with sheer power.
It would also be interesting to look at hitters which foul balls off more than “normal”. We all know contact hitters that are famous for fouling off seemingly dozens of pitches per game. How do these batters compare in strikeout, line drive, and other ratios vs. their peers?
Extracting these two data sets could prove very instructive.
Another interesting example would be looking at slow non-knuckleball pitchers like Jamie Moyer. At 84mph, he clearly wasn’t overpowering anyone yet still got people out enough to be effective – and I recall reading an article some time back which showed that one reason he was effective was his own defense. If the fielding by a single position (pitcher) plus location/type pitching was enough of a contribution to value such that Moyer could hold a job at the major league level for decades, clearly there is more to life than just throwing 100 mph fastballs.
One last note: sequencing as you use it above can be confusing: catchers/pitchers also use sequencing, but in this use case it refers to switching between high and low pitches, or sometimes inside/outside pitches so as to force batters to realign the planes upon which the batter is placing a pitch upon arrival at the plate. Mike Krukow of the SF Giants in particular talks about how this is an effective way to improve pitcher performance.