Postseason probability added

by Dave Studeman
January 15, 2009

So now we know two things about the 2008 season: how “important” specific games were in each pennant race (the drama index, or DI), and how much players helped their teams win specific games (Win Probability Added, or WPA). Like peanut butter and chocolate, love and marriage, a horse and carriage, these two things go together. Let’s do it. Let’s create a new stat.

Why? Because with these two tools, we can derive a unique look at baseball players in 2008, one that’s in tune with how many fans and writers think of “value.” In fact, our original question concerned last year’s National League MVP voting, in which Ryan Howard finished a strong second to Albert Pujols. Howard had strong superficial stats (home runs and RBI’s) but didn’t come close to Pujols’ real overall value. Yet many people thought Howard should have won the MVP anyway because of his strong September performance.

That’s been our quest: to measure the impact of Howard’s timely performance. We’ve answered the question of how “important” September games truly are (I prefer the word “dramatic”) with our drama index. Now, we’re going to combine the drama index with WPA to develop PPA (oh no! another acronym!), or Postseason Probability Added. Quite simply, we’re going to multiply each player’s WPA in each game by that game’s drama index, then add up the results over the season. And that’s all the explanation you’re going to get out of me, because it’s that simple.
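For readers who think in code, here is a minimal sketch of that calculation. It assumes a player’s season is available as a list of (WPA, drama index) pairs, one per game; the function name and the sample game log are hypothetical, not the actual 2008 data or the spreadsheet behind this article.

```python
from typing import Iterable, Tuple


def postseason_probability_added(games: Iterable[Tuple[float, float]]) -> float:
    """Sum WPA x DI over a player's games.

    Each item is (wpa, drama_index) for a single game.
    """
    return sum(wpa * di for wpa, di in games)


# Hypothetical game log for illustration only.
sample_games = [
    (0.10, 0.5),   # good day in a low-drama game
    (-0.05, 1.0),  # bad day in an average-drama game
    (0.20, 4.0),   # good day in a high-drama game
]

total_wpa = sum(wpa for wpa, _ in sample_games)
ppa = postseason_probability_added(sample_games)
print(f"WPA = {total_wpa:.2f}, PPA = {ppa:.2f}")
```

Because the multiplication happens game by game, a season PPA is generally not the same as season WPA times average DI. Howard’s 2.37 WPA times his 0.75 average DI would be only about 1.8, well short of his listed PPA.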

So how do Pujols and Howard compare?

Player      Team      WPA    DI    PPA
Howard, R   PHI      2.37  0.75   4.48
Pujols, A   STL      6.39  0.63   4.39

Well, look at that: a virtual PPA tie. Even though Howard’s overall Drama Index was only slightly higher than Pujols’, Howard produced in games with the highest index (in other words, “when it counted”) and he completely closed the four-win gap in WPA. Pujols’ production was more even during the year, and the ratio between his WPA and PPA reflects that.

Remember, however, that we’re not including fielding prowess here, and Pujols was about two wins better than Howard with the glove last year. Add that in, and Pujols was still the more valuable player, by two wins. Two wins is a lot. Howard shouldn’t have received a single vote ahead of Pujols on any ballot. But we know that many MVP voters don’t value fielding, and our PPA result seems to be a good reflection of the actual MVP voting.

But that’s not the end of the story.

It turns out that neither Pujols nor Howard finished first in NL PPA last year. A certain New York Met actually contributed more than any other player to his team’s pennant drive, and second place wasn’t even close. Here are the top ten NL PPA leaders among all position players:

Player      Team      WPA    DI    PPA
Beltran, C  NYN      5.02  0.97   8.30
Braun, R    MIL      3.68  0.87   4.63
Fielder, P  MIL      3.20  0.84   4.63
Howard, R   PHI      2.37  0.75   4.48
Pujols, A   STL      6.39  0.63   4.39
Ethier, A   LAN      3.78  0.90   4.38
Ramirez, H  FLA      4.74  0.68   4.31
Ramirez, M  LAN      3.51  1.07   4.13
Berkman, L  HOU      6.71  0.59   3.45
Wright, D   NYN      4.18  0.98   3.39

In the last four games of the season, the Mets had drama indices of 4.25, 4.44, 7.00 and 6.36. In those four games, Beltran batted .429/.529/.643. Before those four games, his PPA was 4.44, in line with the other leaders. But that stretch of games boosted him far beyond the rest of the pack.

When you add in the extras—Beltran plays a premium position, plays it as well as anyone in the game and is among the very best baserunners in the game—you have a very, very credible case for Carlos Beltran as the 2008 National League MVP. Alas, he finished 21st in the voting.

You probably noticed that Manny Ramirez is eighth on the list, even though he played less than half of the year in LA. It has to be said that the scale of PPA is an issue: if we were to compare Manny to replacement level instead of average, he would rank lower, maybe much lower. When you add the fact that Manny plays a “secondary” position—and not very well—you have to conclude that his fourth-place finish in the MVP voting wasn’t justified.

Overall, however, this list is a pretty good reflection of actual MVP voting. On the surface, it seems to do a better job than WPA or any other general stat of matching the actual MVP results. I have a feeling that, if we were to add a “kicker” for each team that actually makes the postseason, we’d have an even better match.

How about the American League PPA? How well does it match the MVP voting? Hold onto your hats:

Player      Team      WPA    DI    PPA
Span, D     MIN      1.98  1.56   5.16
Mauer, J    MIN      4.88  1.20   3.71
Dye, J      CHA     -0.26  0.88   3.48
Hamilton, J TEX      2.80  0.66   2.99
Ramirez, A  CHA      0.31  0.97   2.54
Bradley, M  TEX      2.09  0.65   2.26
Cabrera, M  DET      2.95  0.64   2.24
Giambi, J   NYA      1.96  0.77   2.23
Quentin, C  CHA      3.81  0.57   2.18
Morneau, J  MIN      3.87  1.16   1.88

Yes, that Denard Span. On September 25th, Span hit a run-scoring triple in the bottom of the eighth to tie a key game with the White Sox (drama index of 4.7, WPA of .425). Multiply those out, and you get a PPA of more than two for just one play. Span also played a lot of critical games for the Twins—I thought that his average Drama Index of 1.56 was a typo until I realized that Span didn’t play at all in May and June. He played “when it counted.” To his credit, Span didn’t have any big negative WPA days when it counted; most players do.

No wonder the BBWAA had such a hard time choosing an MVP. Alexei Ramirez fifth in PPA? The actual MVP winner, Dustin Pedroia, finished 27th in our rankings. My own MVP choice, Joe Mauer, would be the logical PPA MVP pick, particularly when you factor in that he is a fine catcher to boot.

Here’s another strange one: Jermaine Dye. Dye’s overall WPA was actually below average, but he really delivered when it counted. Here’s a breakout of his average WPA per game, grouped by how dramatic each game was:

DI       G     WPA
0-1    120  -0.002
1-2     25  -0.011
2-3      2  -0.049
4-5      1   0.049
6-7      2   0.059
12-13    2   0.114
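
A breakout like this one can be sketched in a few lines. The version below assumes unit-wide bins that include their lower edge (so a DI of exactly 1.0 lands in the 1-2 row); the table doesn’t say how boundary cases were handled, and the function name and sample data are hypothetical.

```python
import math
from collections import defaultdict


def average_wpa_by_drama(games):
    """Group (wpa, drama_index) pairs into unit-wide DI bins.

    Returns {bin_floor: (game_count, average_wpa)}, where a bin floor
    of 4 covers games with 4 <= DI < 5.
    """
    bins = defaultdict(list)
    for wpa, di in games:
        bins[math.floor(di)].append(wpa)
    return {
        floor: (len(wpas), sum(wpas) / len(wpas))
        for floor, wpas in sorted(bins.items())
    }


# Hypothetical game log for illustration only.
breakout = average_wpa_by_drama([(0.10, 0.5), (-0.10, 0.7), (0.05, 4.2)])
for floor, (count, avg) in breakout.items():
    print(f"{floor}-{floor + 1}  {count:3d}  {avg:6.3f}")
```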

You’re probably noticing that a few games can make a big difference. That is the nature of this particular beast. If you’re going to insist that when players perform is as important as how they perform, a few games will count a whole lot more than the others. I’m not doing the insisting, by the way. I’m just doing the math.

Still, we’ve got two lists here. One appears to be a very good reflection of how some people interpret their MVP ballots. The other is a bit of a mess.

I’ve only looked at position players so far. Let’s look at pitchers next. The National League:

Player      Team      WPA    DI    PPA
Santana, J  NYN      4.08  1.06   6.54
Sabathia, C MIL      3.17  1.30   4.37
Lidge, B    PHI      5.37  0.85   4.31
Webb, B     ARI      3.43  0.49   2.85
Lincecum, T SF       4.73  0.55   2.43
Smith, J    NYN      1.10  1.04   2.30
Hamels, C   PHI      2.51  0.79   2.15
Billingsley,LAN      1.46  0.84   2.15
Qualls, C   ARI      0.42  0.49   2.13
Wilson, B   SF       1.77  0.55   2.02
Marte, D    PIT      2.41  0.78   1.99
Hudson, T   ATL      2.30  0.78   1.70
Myers, B    PHI     -1.46  0.81   1.67
McClung, S  MIL     -0.05  0.82   1.64
Wade, C     LAN      1.85  0.89   1.62

Good relievers rank particularly well in the WPA system (another reason to use replacement level as the benchmark; it would give a boost to starters who pitch more innings), so I expanded the list to fifteen to include more starting pitchers.

A few pitchers did appear on MVP ballots. Sabathia finished sixth, Lidge was eighth, Santana was 14th, Webb was 17th and Lincecum was 23rd. In other words, the top five finishers in pitcher PPA all made it onto MVP ballots. Once again, the National League standings make a lot of sense.

I want to point out one other pitcher on the National League list, Philadelphia’s Brett Myers. As you may know, Myers had a terrible first half and was demoted to the minors in early July. When he returned to Philly, his reemergence helped fuel the Phillies’ pennant drive, and he pitched very well in some of their most dramatic games. For that effort, he turned a negative WPA (-1.46) into a positive PPA (1.67).

Here are the American League pitchers:

Player           Team      WPA    DI    PPA
Danks, J         CHA      2.99  1.11   7.08
Baker, S         MIN      2.82  1.29   5.29
Jenks, B         CHA      3.47  1.02   4.13
Rivera, M        NYA      4.47  0.80   2.83
Nathan, J        MIN      3.26  1.31   2.83
Mijares, J       MIN      0.48  4.05   2.53
Buehrle, M       CHA      1.57  0.81   2.44
Mussina, M       NYA      2.20  0.76   2.38
Lee, C           CLE      5.96  0.39   2.37
Guardado, E      MIN     -0.33  2.08  -0.77
                 TEX      2.85  0.87   3.08
Duchscherer, J   OAK      2.16  0.69   2.27
Ziegler, B       OAK      3.20  0.54   2.20
Downs, S         TOR      2.51  0.58   1.98
Chamberlain, J   NYA      2.28  0.68   1.80
Soria, J         KC       4.08  0.37   1.77

Because Chicago and Minnesota played the most dramatic games of the year, their pitchers rank highly here. John Danks is the best example of the extreme impact a single game can have. He was fantastic in the Sox’s final game of the season, picking up a total of 7.5 PPA points in that game alone! Take that one game away, and his PPA is below average.

We had two extremes in the American League last year that impacted the PPA list. You may recall, from the drama index article, that the White Sox had a “hockey stick” graph. Until their last six games, they didn’t have a DI over 1.7. After that, they were drama queens, with an average DI of 7.6. At the other extreme, K-Rod (who finished sixth in MVP voting) doesn’t make this list because the Angels had no dramatic games last year. They ran away with the division.

I think this explains the wacky results in the AL. PPA seems to work pretty well when the league is generally competitive, across divisions and over the full season. In the American League, the only real drama concerned the Central division, and that drama was extreme. So PPA doesn’t match the MVP results (or any kind of “smell test” I can think of) as well.

Of course, we can play around with this system. I could set up the distribution tables so that final games aren’t dramatically better than other games, but then I’d be muting the impact of those other games, too. I could set an arbitrary cutoff for the highest drama index at seven or whatever, but that’s so … arbitrary.
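
To make the cutoff idea concrete, here is a rough sketch of a capped version of the calculation. The cap of seven is just the example figure mentioned above, and the function name, the `di_cap` parameter and the two sample games are all hypothetical.

```python
def ppa(games, di_cap=None):
    """Sum WPA x DI over (wpa, drama_index) pairs.

    If di_cap is given, any drama index above it is treated as di_cap.
    """
    total = 0.0
    for wpa, di in games:
        if di_cap is not None:
            di = min(di, di_cap)
        total += wpa * di
    return total


# Hypothetical games: one ordinary game and one extreme season finale.
games = [(0.05, 0.8), (0.50, 12.0)]

uncapped = ppa(games)
capped = ppa(games, di_cap=7.0)
print(f"uncapped = {uncapped:.2f}, capped at 7 = {capped:.2f}")
```

The cap only touches games above the threshold, so everything hinges on where the line is drawn, which is exactly the arbitrariness problem.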

I actually did play with a running five-game average of the drama index instead of setting the drama index specific to that game. I used a five-game average because most pitching rotations are five-man rotations, and DI seems to really take off in the last five games of the season. That did mute the impact somewhat. For instance, Joe Mauer leaped to the top of the AL list and Denard Span dropped a bit. But it didn’t make a huge difference. I think the system that I’ve presented here does what it’s supposed to do.
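
Here is one way the smoothing step might look in code. It assumes a trailing window (each game averaged with the games before it), which is only one possible reading of “running five-game average”; a centered window would behave differently at the very end of the season. The function name and the sample schedule are hypothetical.

```python
def rolling_drama_index(drama_indices, window=5):
    """Trailing mean of the drama index over up to `window` games.

    Early games with fewer than `window` predecessors are averaged
    over however many games are available.
    """
    smoothed = []
    for i in range(len(drama_indices)):
        start = max(0, i - window + 1)
        chunk = drama_indices[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical schedule: four ordinary games, then a season finale.
raw = [1.0, 1.0, 1.0, 1.0, 6.0]
smoothed = rolling_drama_index(raw)
print(smoothed[-1])
```

With a trailing window the finale’s spike is averaged with the quieter games before it, which is the muting effect described above.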

Every idea, carried to its logical extreme, becomes a caricature of itself. We’ve created a caricature here, but I think we’ve learned a few valuable lessons along the way. In general, sabermetrics helps processes like MVP voting by providing insight where generalities were previously used. For instance, RBI’s were used (and, in many cases, still are used) as a proxy for batting prowess and clutch hitting. Sabermetrics has brought a microscope to RBI’s, and provided better data such as Base Runs and true clutch hitting figures.

Here’s another example: many MVP voters like to reward players who play for contending teams, and some will almost always vote just for players whose teams make the playoffs. But we don’t have to rely on team winning statistics anymore. We’ve not only shown the relationship between certain skills and winning (the essence of Win Shares), we’ve even developed stats that document, on a play-by-play basis, how much each player contributed to his team’s chance of winning (the essence of WPA). We don’t have to rely on a team’s winning record as proof of how much individual players contributed to winning.

We’ve now done the same thing with “performing when it counts during the season.” Using an approach that makes intuitive sense, we’ve given more weight to performances in September games, and we’ve found that sometimes the results make a lot of “sense.” And sometimes they don’t.

Which is fine by me. Despite my penchant for math and big spreadsheets, I’m not really a reductionist. I believe that MVP voting shouldn’t be totally objective. But I do like to shine a light on subjectivity when we can. Consider this exercise a flash of light.
