# corrVOL: Updating How We Can Measure Hitter Volatility

### Introduction

Two years ago at Saber Seminar, I presented my most recent attempt at quantifying how inconsistent or volatile hitters were at distributing the runs they create over the course of the season. The method built off an approach I devised at the team level, essentially generating Gini coefficients for each team over the course of a season to see how evenly teams distributed their runs allowed and runs scored. Lower scores indicate a more even distribution; higher scores, a more unequal distribution.

As I have written before, volatility is not the same thing as being streaky. Streakiness is about how extreme positive and negative performances lump together over the course of a season. Essentially, it’s the clustering of good and bad performances over long stretches. Volatility is different. It is more about the overall distribution of a player’s daily performance relative to the overall runs they create over the course of a season. If a player creates 81 runs in a given season, did he create half of a run every game (perfectly consistent/equal distribution of runs), or did he create 80 percent of his runs in only 20 percent of his games?

That’s the question my VOL metric sought to answer and quantify.

After the presentation, I benefited from some fantastic feedback from a number of people, and as a result I have updated my VOL metric in a number of ways. In this article I walk through those changes.

### Gini Coefficients

The biggest change to the VOL metric rolled out during that Saber Seminar presentation was the use of Gini coefficients to calculate volatility.

Gini coefficients typically are used to measure how equally some value–usually wealth or income–is distributed amongst the individual citizens in a given country. Let’s look at a simplified example to better understand how they work and how they can be used to compare the relative inequality of two distributions.

Assume we have two countries, Egalistan and Concentratistan. Each country has 15 citizens, and the wealth in each country is distributed as follows:

Egalistan | Concentratistan |
---|---|
$30 | $1 |
$35 | $1 |
$40 | $1 |
$45 | $3 |
$50 | $3 |
$55 | $3 |
$60 | $4 |
$65 | $5 |
$70 | $5 |
$75 | $16 |
$80 | $55 |
$85 | $78 |
$90 | $200 |
$95 | $200 |
$100 | $400 |

Now, each country has the same aggregate wealth ($975), but that wealth is distributed quite differently. Gini coefficients give us a way to quantify the evenness of these two distributions and compare them. Gini coefficients range from 0 to 1, with 0 being the most equal distribution and 1 the least. When we compare our two countries, Egalistan has a Gini coefficient of .21, while Concentratistan has a coefficient of .80.

We also can visualize these distributions using a Lorenz Curve, which plots the cumulative share of wealth held against the cumulative share of citizens, with citizens ordered from poorest to richest. Here are the Lorenz Curves for both of our fictional countries:

We can see that the area between the curve and the line of equality is far smaller for Egalistan than for Concentratistan. In Egalistan, the bottom 80 percent of citizens hold 71 percent of the wealth, whereas in Concentratistan the bottom 80 percent hold only 18 percent. This is essentially a visualization of our Gini coefficients; in fact, the Gini coefficient is defined from the Lorenz Curve as the area between the line of equality and the curve, divided by the total area under the line of equality.
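The link between the Lorenz Curve and the Gini coefficient can be sketched in R, the language of the Gini function used later in this article. This is a minimal illustration (the function names `lorenz` and `gini_lorenz` are hypothetical) that computes the Gini coefficient as one minus twice the area under the Lorenz Curve using the trapezoid rule. Uncorrected, it gives about 0.19 for Egalistan and 0.75 for Concentratistan; multiplying by n/(n-1), a common small-sample correction, reproduces the .21 and .80 quoted above.

```r
# Wealth of the 15 citizens in each fictional country (from the table above)
egalistan <- seq(30, 100, by = 5)
concentratistan <- c(1, 1, 1, 3, 3, 3, 4, 5, 5, 16, 55, 78, 200, 200, 400)

# Lorenz curve: cumulative share of citizens vs. cumulative share of wealth,
# citizens ordered from poorest to richest, starting at (0, 0)
lorenz <- function(y) {
  y <- sort(y)
  data.frame(pop_share = c(0, seq_along(y) / length(y)),
             wealth_share = c(0, cumsum(y) / sum(y)))
}

# Gini = 1 - 2 * (area under the Lorenz curve), area via the trapezoid rule.
# correct = TRUE multiplies by n / (n - 1), a small-sample correction.
gini_lorenz <- function(y, correct = TRUE) {
  L <- lorenz(y)
  n <- length(y)
  heights <- (head(L$wealth_share, -1) + tail(L$wealth_share, -1)) / 2
  area <- sum(diff(L$pop_share) * heights)
  g <- 1 - 2 * area
  if (correct) g * n / (n - 1) else g
}

round(c(gini_lorenz(egalistan), gini_lorenz(concentratistan)), 2)
# 0.21 0.80

# Share of wealth held by the bottom 80 percent (the first 12 of 15 citizens;
# index 13 because the curve starts with a 0 at index 1)
round(c(lorenz(egalistan)$wealth_share[13],
        lorenz(concentratistan)$wealth_share[13]), 2)
# 0.71 0.18
```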

### Measuring the Volatility of Hitters

So now that we understand how Gini coefficients work, it’s straightforward to apply them to individual hitters with the following mapping:

- Hitters = Countries
- Games = Citizens
- Runs = Income

For data, I used individual game data from 1974 through 2016. These data were acquired from the FanGraphs database, but you could recreate the analysis from individual game data using Retrosheet as a source.

For runs I chose to use Weighted Runs Created (wRC) for hitters calculated on a game-by-game basis. I used the fg_guts function from my baseballr package to obtain wOBA weights and constants for each year from FanGraphs and to calculate daily wOBA for each hitter. wOBA then can be converted to wRC using the following equation:

**wRC = (((wOBA-League wOBA)/wOBA Scale)+(League R/PA))*PA**
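Applied game by game, the conversion looks like the sketch below. The function name `game_wrc` and every constant are hypothetical placeholders; in practice the league wOBA, wOBA scale, and league R/PA would come from that season's FanGraphs constants. Note that a hitless game produces a negative wRC, which matters for the Gini calculation.

```r
# Convert one game's wOBA into wRC using the formula above.
# lg_woba, woba_scale and lg_r_pa are season-level constants.
game_wrc <- function(woba, pa, lg_woba, woba_scale, lg_r_pa) {
  ((woba - lg_woba) / woba_scale + lg_r_pa) * pa
}

# Hypothetical season constants (made up for illustration)
lg_woba <- 0.320
woba_scale <- 1.20
lg_r_pa <- 0.12

# Hypothetical three-game stretch: a hitless game, an average-ish game, a big game
games <- data.frame(woba = c(0.000, 0.400, 0.950), pa = c(4, 4, 5))
games$wrc <- game_wrc(games$woba, games$pa, lg_woba, woba_scale, lg_r_pa)
round(games$wrc, 3)
# about -0.587, 0.747, 3.225
```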

Once we have the wRC for each game for each hitter, we can calculate each hitter’s season VOL simply by computing the Gini coefficient for each hitter-season combination.

The updated version presented at Saber Seminar dealt with games in which a hitter’s wRC is negative by adding a constant to the wRC calculation. I won’t go into those calculations here, but if need be you can review that approach here. A few commenters mentioned there are approaches that handle negative values better than simply adding a constant. Our own Sean Dolinar had done some recent work with Gini coefficients and came across an approach to handle negative values. Sean was kind enough to share some R code, and I’ve used that to update my approach.
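To see why negative values need special handling, here is a minimal sketch comparing the standard (n/(n-1)-corrected) Gini formula with a version that keeps the same numerator but rescales the denominator by the total positive and absolute negative values, mirroring the `Gini_neg` function shown below. The names `gini_standard` and `gini_normalized` and the four-game log are hypothetical.

```r
# Standard Gini with the n / (n - 1) correction: sensible only for nonnegative values
gini_standard <- function(y) {
  y <- sort(y)
  n <- length(y)
  g <- 2 * sum(seq_len(n) * y) / (n * sum(y)) - (n + 1) / n
  g * n / (n - 1)
}

# Same numerator, but the denominator uses the positive mass plus the
# absolute negative mass instead of the (net) total
gini_normalized <- function(y) {
  y <- sort(y)
  n <- length(y)
  top <- 2 / n^2 * sum(seq_len(n) * y) - sum(y) / n - sum(y) / n^2
  t_pos <- sum(pmax(y, 0))
  t_neg <- abs(sum(pmin(y, 0)))
  top / ((n - 1) / n^2 * (t_pos + t_neg))
}

# Hypothetical four-game log with two negative-wRC games
wrc <- c(-0.5, -0.3, 0.1, 2.0)
gini_standard(wrc)    # about 2.03, outside the 0-1 range
gini_normalized(wrc)  # about 0.91
```

For nonnegative data the two functions return the same value, so the rescaling only changes results for seasons that include negative-wRC games.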

Here is the function used to calculate the Gini coefficients for VOL in this version:

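```r
# Gini coefficient that tolerates negative values (e.g., negative game wRC).
# The numerator is the usual Gini numerator; the denominator is rescaled by the
# total positive and absolute negative values instead of the net total.
Gini_neg <- function(Y) {
  Y <- sort(Y)
  N <- length(Y)
  top <- 2 / N^2 * sum(seq_len(N) * Y) - (1 / N) * sum(Y) - (1 / N^2) * sum(Y)
  T_min <- abs(sum(pmin(Y, 0)))  # absolute sum of the negative values
  T_max <- sum(pmax(Y, 0))       # sum of the positive values
  u_P <- (N - 1) / N^2 * (T_max + T_min)
  return(top / u_P)
}
```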

This solution addressed one of the issues, but there was another one I discussed at the conference and hadn’t really decided how to handle: the high correlation between VOL and a number of metrics, namely plate appearances, games, wRC, and wRC+ for everyday players.

The correlation between wRC and VOL overall seemed reasonable (-0.23), but this somewhat masked what seemed like a bias; the relationship becomes much stronger for players who appear in more games and have more plate appearances per game. If we restrict to players who appear in at least 100 games, that correlation jumps to -0.40. The relationship gets even stronger when we break players up into groups based on how many plate appearances per game they average:

And here’s a scatter plot with each point colored based on how many plate appearances per game a hitter averaged:

VOL appears to map quite closely to the overall quality of a hitter. Better hitters tend to play in more games and get more plate appearances, and players who accumulate more wRC in a season appear to be more consistent (less volatile) in how they distribute those runs across games.

Now, maybe that’s true: hitters who produce more runs, both in absolute terms and per game or plate appearance, may also be far more consistent than lesser hitters. However, it could also be that by virtue of getting more playing time, their VOL scores are being reduced artificially by the way the Gini coefficients are calculated. That is plausible, since Gini coefficients are known to be sensitive to small sample sizes, and a baseball season gives us at most 162 games.

When we restrict to players with at least 100 games played in a season, the strength of the relationships jumps substantially:

To address this issue, I made one additional change to the VOL metric. I had a few suggestions about how to approach this, but the one I chose was to model VOL as a function of–potentially–games played, plate appearances, and runs created.

The idea is to create an expected VOL for a given player based on how often he plays and the level at which he produces, and then use that expectation to normalize his VOL, telling us whether that player is more or less volatile than we would expect given these other attributes. I also am restricting the modeling and overall calculation of VOL to players who had at least 100 games played in a season to help deal with the “Gini coefficients with smaller n-sizes” issue.

### Expected VOL Modeling

I decided to go with a simple linear model to find a player’s expected VOL. I tried a few versions, but this one seemed the most appropriate:

**expectedVOL ~ wRC + PA_G + Games**

wRC, plate appearances per game, and games played all correlate strongly with VOL, although the strength of the relationship differs by feature:

I took the 9925 player seasons in the reduced data set and split the data into a training and test set, with 70 percent of the data going to the training set and 30 percent to the test set.

Here is the output for expectedVOL:

```
Coefficients:
               Estimate  Std. Error t value            Pr(>|t|)    
(Intercept)  0.67923794  0.00281500 241.293 <0.0000000000000002 ***
wRC         -0.00132221  0.00001655 -79.909 <0.0000000000000002 ***
PA_G        -0.00685035  0.00069641  -9.837 <0.0000000000000002 ***
Games        0.00060503  0.00002020  29.947 <0.0000000000000002 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02213 on 6944 degrees of freedom
Multiple R-squared:  0.666,  Adjusted R-squared:  0.6658
F-statistic:  4615 on 3 and 6944 DF,  p-value: < 0.00000000000000022
```

The overall model has an adjusted R² of 0.67 on the training set, and the variance inflation factors for the features range from 1.9 to 2.73. So there is some mild collinearity, but nothing to be alarmed about. Applied to the test set, the model has a root mean squared error (RMSE) of 0.0224, and the errors are approximately normally distributed.
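The split, fit, and diagnostics described above can be sketched in R. This minimal example runs on synthetic data, not the actual 9,925 player seasons: the data-generating values loosely borrow the scale of the coefficients in the output above, and `vif_manual` is a hypothetical helper that computes each variance inflation factor as 1 / (1 - R²) from regressing that predictor on the others (the `vif()` function in the `car` package computes the same quantity for a model like this).

```r
set.seed(2016)

# Synthetic stand-in for hitter seasons with at least 100 games
n <- 2000
seasons <- data.frame(
  Games = sample(100:162, n, replace = TRUE),
  PA_G  = runif(n, min = 1, max = 4.8)
)
seasons$wRC <- seasons$Games * seasons$PA_G * rnorm(n, mean = 0.12, sd = 0.03)
seasons$VOL <- 0.68 - 0.0013 * seasons$wRC - 0.007 * seasons$PA_G +
  0.0006 * seasons$Games + rnorm(n, sd = 0.022)

# 70/30 train/test split
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- seasons[train_idx, ]
test  <- seasons[-train_idx, ]

fit <- lm(VOL ~ wRC + PA_G + Games, data = train)

# Variance inflation factor: 1 / (1 - R^2) of each predictor on the others
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    r2 <- summary(lm(reformulate(setdiff(predictors, p), response = p),
                     data = data))$r.squared
    1 / (1 - r2)
  })
}
vifs <- vif_manual(train, c("wRC", "PA_G", "Games"))

# Out-of-sample RMSE on the held-out 30 percent
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$VOL - pred)^2))

summary(fit)$adj.r.squared
vifs
rmse
```

Because the synthetic noise has a standard deviation of 0.022, the test RMSE should land near that value, which is the same check the article applies to the real model.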

Overall, the model fits well, but it leaves enough variance unexplained to serve its intended purpose: it provides a solid expected VOL for a player without fitting so closely that it masks any genuine ability a hitter might have to be more consistent.

From here, calculating our new corrected VOL measure (corrVOL) is quite simple.

### corrVOL

corrVOL is scaled so that average (that is, meeting expectations) equals 100:

**corrVOL = VOL/expectedVOL * 100**

For example, if a hitter has a VOL of 0.7 but an expectedVOL of 0.5, his corrVOL would be 140, meaning his VOL is 40 percent higher than we would expect given his wRC, games played, and plate appearances per game. By comparison, a player with a VOL of 0.3 and an expectedVOL of 0.5 would have a corrVOL of 60, or 40 percent lower than expected. The latter player would be exceptionally consistent in how evenly he distributes his runs, while the former would be exceptionally volatile.
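The formula is a one-liner in R; this sketch (with the illustrative name `corr_vol`) just restates it and reproduces the two hypothetical players from the example.

```r
# corrVOL = VOL / expectedVOL * 100; 100 means "exactly as volatile as expected"
corr_vol <- function(vol, expected_vol) vol / expected_vol * 100

corr_vol(0.7, 0.5)  # 140: 40 percent more volatile than expected
corr_vol(0.3, 0.5)  # 60: 40 percent less volatile than expected
```

Because it is a ratio, corrVOL reads directly as a percentage above or below expectation, which keeps comparisons across players with very different expectedVOL values straightforward.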

I applied the model to all hitters since 1974 who appeared in at least 100 games in a season and then used their expectedVOL to calculate their corrVOL.

Generally speaking, corrVOL is roughly normally distributed across hitters in any given season (the strike-shortened 1981 and 1994 seasons are excluded here):

The new metric does what I was hoping for: it is not simply a reflection of a player’s playing time and overall run creation.

Here’s our same correlation matrix from earlier, but now with corrVOL added in. As you can see, we’ve essentially wiped out any dependence on playing time and run creation:

Of course, one of the first things we want to know is who had the most and least volatile seasons according to the new metric.

Let’s start with the least volatile, or most consistent, hitters:

Name | Season | Games | PA per Game | wRC | wRC+ | VOL | expectedVOL | corrVOL |
---|---|---|---|---|---|---|---|---|
Rickey Henderson | 1992 | 116 | 4.3 | 84.1 | 158 | 0.504 | 0.609 | 82.8 |
Kevin Seitzer | 1988 | 149 | 4.3 | 91.2 | 125 | 0.525 | 0.619 | 84.8 |
Willie Randolph | 1991 | 122 | 4.2 | 76.2 | 131 | 0.530 | 0.624 | 85.0 |
Orlando Palmeiro | 2006 | 103 | 1.2 | 9.2 | 59 | 0.617 | 0.721 | 85.6 |
Rickey Henderson | 1989 | 150 | 4.5 | 102.8 | 138 | 0.518 | 0.603 | 85.9 |
Tony Phillips | 1993 | 151 | 4.7 | 117.7 | 136 | 0.503 | 0.583 | 86.3 |
Toby Harrah | 1985 | 123 | 4.2 | 83.0 | 133 | 0.532 | 0.615 | 86.5 |
Miguel Dilone | 1980 | 128 | 4.4 | 79.9 | 123 | 0.539 | 0.621 | 86.8 |
Rickey Henderson | 1995 | 112 | 4.3 | 80.1 | 130 | 0.532 | 0.612 | 87.0 |
Tony Gwynn | 1994 | 110 | 4.3 | 94.8 | 166 | 0.514 | 0.591 | 87.0 |
Tim Raines | 1986 | 151 | 4.4 | 111.0 | 149 | 0.518 | 0.594 | 87.3 |
Tony Gwynn | 1984 | 158 | 4.3 | 102.6 | 144 | 0.532 | 0.610 | 87.3 |
Paul Molitor | 1987 | 118 | 4.6 | 112.3 | 165 | 0.498 | 0.571 | 87.3 |
Orlando Palmeiro | 2007 | 101 | 1.2 | 9.7 | 67 | 0.630 | 0.719 | 87.6 |
Chuck Knoblauch | 1994 | 108 | 4.6 | 78.3 | 117 | 0.534 | 0.610 | 87.6 |
Joe Morgan | 1975 | 146 | 4.4 | 127.8 | 176 | 0.498 | 0.568 | 87.6 |
Miguel Cabrera | 2011 | 161 | 4.3 | 135.5 | 177 | 0.498 | 0.568 | 87.7 |
Lenny Harris | 2001 | 103 | 1.4 | 7.5 | 44 | 0.634 | 0.722 | 87.8 |
Rod Carew | 1979 | 109 | 4.5 | 70.8 | 127 | 0.545 | 0.621 | 87.8 |
Ken Griffey | 1981 | 101 | 4.4 | 56.6 | 121 | 0.558 | 0.635 | 87.8 |

Rickey Henderson’s 1992 season comes in as the least volatile based on this method. Henderson put up great numbers that year (the third-best wRC+ of his career), with a VOL of 0.504 and an expectedVOL of 0.609. His corrVOL was 82.8, meaning he was slightly more than 17 percent less volatile than we’d expect.
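As a check, the coefficients from the regression output above can be plugged in directly. This minimal sketch (`expected_vol` is an illustrative name) uses the rounded table values for Henderson’s 1992 season, so it only approximates the model’s prediction; here it lands on the listed expectedVOL of 0.609 and corrVOL of 82.8.

```r
# Point estimates from the expectedVOL regression output
coefs <- c(intercept = 0.67923794, wRC = -0.00132221,
           PA_G = -0.00685035, Games = 0.00060503)

expected_vol <- function(wRC, PA_G, Games) {
  coefs[["intercept"]] + coefs[["wRC"]] * wRC +
    coefs[["PA_G"]] * PA_G + coefs[["Games"]] * Games
}

# Henderson, 1992, using the rounded inputs from the table
ev <- expected_vol(wRC = 84.1, PA_G = 4.3, Games = 116)
round(ev, 3)                # 0.609
round(0.504 / ev * 100, 1)  # 82.8
```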

To put that season into perspective, below is a scatter plot of all players with an expectedVOL between 0.607 and 0.611. All of Henderson’s seasons are shaded red. We can see that not only was Henderson’s actual volatility far lower than other players with similar expectations, but his rate of production taking into account league and park was also one of the best.

Henderson shows up in the top 20 three times, which is more than any other player.

We also can compare Henderson’s 1992 season visually to the nearly 10,000 other seasons since 1974 using Lorenz Curves:

Even without correcting for his playing time and wRC, it’s clear Henderson’s season was incredibly consistent.

What about the most volatile?

Name | Season | Games | PA per Game | wRC | wRC+ | VOL | expectedVOL | corrVOL |
---|---|---|---|---|---|---|---|---|
Manny Ramirez | 1998 | 150 | 4.4 | 123.6 | 144 | 0.643 | 0.576 | 111.6 |
Matt Holliday | 2016 | 110 | 3.9 | 56.1 | 109 | 0.719 | 0.645 | 111.5 |
Miguel Cabrera | 2016 | 158 | 4.3 | 120.3 | 152 | 0.644 | 0.586 | 109.8 |
Jose Canseco | 1994 | 111 | 4.5 | 89.4 | 134 | 0.656 | 0.597 | 109.8 |
Vinny Castilla | 2001 | 146 | 4.0 | 71.9 | 93 | 0.708 | 0.645 | 109.7 |
Aramis Ramirez | 2010 | 124 | 4.1 | 57.9 | 94 | 0.713 | 0.650 | 109.7 |
Ryan Braun | 2012 | 154 | 4.4 | 125.7 | 159 | 0.632 | 0.576 | 109.7 |
Albert Belle | 1992 | 153 | 4.3 | 86.3 | 121 | 0.689 | 0.628 | 109.7 |
Sammy Sosa | 1995 | 144 | 4.4 | 89.0 | 115 | 0.676 | 0.618 | 109.3 |
Troy Glaus | 2006 | 153 | 4.1 | 96.9 | 119 | 0.672 | 0.616 | 109.2 |
Sammy Sosa | 2001 | 160 | 4.4 | 156.4 | 186 | 0.588 | 0.539 | 109.1 |
Tommy Medica | 2014 | 102 | 2.5 | 27.3 | 97 | 0.750 | 0.688 | 109.1 |
Mike Schmidt | 1975 | 158 | 4.3 | 107.7 | 142 | 0.658 | 0.603 | 109.1 |
Chris Parmelee | 2013 | 101 | 3.3 | 32.3 | 84 | 0.736 | 0.675 | 109.0 |
Russell Branyan | 2001 | 109 | 3.3 | 48.0 | 106 | 0.718 | 0.659 | 108.9 |
Harold Baines | 1984 | 147 | 4.3 | 103.9 | 143 | 0.655 | 0.601 | 108.9 |
John Olerud | 1996 | 125 | 3.8 | 73.8 | 116 | 0.687 | 0.631 | 108.8 |
Rafael Palmeiro | 2001 | 160 | 4.5 | 127.3 | 139 | 0.628 | 0.577 | 108.8 |
Cecil Fielder | 1994 | 109 | 4.4 | 70.1 | 111 | 0.677 | 0.622 | 108.8 |
Dan Johnson | 2007 | 117 | 4.2 | 63.2 | 106 | 0.693 | 0.638 | 108.7 |

Manny Ramirez’s 1998 season takes the title here with a corrVOL of 111.6. Manny put up his usual strong numbers (a 144 wRC+), but his actual VOL was far higher than his expectedVOL (0.643 versus 0.576).

We also can get a sense of how more or less consistent two players were by looking at their Lorenz Curves together. Let’s compare Henderson’s 1992 season to Ramirez’s 1998 season:

When you order their games from least wRC to most, Henderson’s cumulative wRC had already turned positive within the first 40 percent of his games, while Ramirez’s wouldn’t cross that threshold until 57 percent of his games. Ramirez had more big games than Henderson (17 percent of his games produced a wRC of 2.0 or more, compared with 3.5 percent for Henderson), but Henderson distributed the runs he created far more evenly.
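Both summaries cited here can be computed from a season’s game log. This is a minimal sketch with a hypothetical helper `lorenz_summary` and a made-up ten-game log (not Henderson’s or Ramirez’s actual data); the 2.0 wRC cutoff for a “big game” follows the text.

```r
# Lorenz-style summaries of one hitter-season's game-by-game wRC
lorenz_summary <- function(wrc, big_game = 2.0) {
  cum <- cumsum(sort(wrc))        # running total, games ordered least to most wRC
  first_pos <- which(cum > 0)[1]  # first game where the running total is positive (NA if never)
  c(share_until_positive = first_pos / length(wrc),
    share_big_games = mean(wrc >= big_game))
}

# Hypothetical ten-game log
wrc <- c(-0.6, -0.6, -0.4, -0.3, 0.2, 0.5, 0.7, 0.9, 1.1, 2.4)
lorenz_summary(wrc)
```

For this made-up log the running total first turns positive at the eighth of ten games (0.8), and one game in ten (0.1) reaches 2.0 wRC.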

And in the end, that’s really what VOL and corrVOL are trying to capture: the extent to which a player produces at the same or a similar level offensively from day to day, versus the extent to which he is more “boom or bust” from game to game. Would I take Ramirez’s 123.6 wRC in 1998? You bet I would. But I might have preferred Larry Walker’s 120 wRC, given that his corrVOL was 89.2, one of the best marks historically. Walker provided production similar to Ramirez’s, but he was more consistent from game to game.

Over longer stretches, here are the five least and five most volatile hitters since 1974 with at least five seasons of 100 or more games played (averages are weighted by each season’s PA):

Name | Seasons | Average corrVOL | Average wRC+ |
---|---|---|---|
Least Volatile | | | |
Rickey Henderson | 20 | 94.3 | 134 |
Brett Butler | 14 | 94.5 | 118 |
Rod Carew | 10 | 94.6 | 139 |
Tony Gwynn | 16 | 94.7 | 133 |
Willie Randolph | 15 | 94.8 | 111 |
Most Volatile | | | |
Hank Blalock | 5 | 105.3 | 99 |
Jay Bruce | 9 | 104.7 | 107 |
Ron Kittle | 5 | 104.5 | 103 |
Lee Stevens | 7 | 104.4 | 96 |
Cecil Fielder | 8 | 104.3 | 120 |

Henderson not only has the best season on record in terms of consistency, but he’s the best over the course of a career, as well. Seemingly boom-or-bust sluggers like Jay Bruce and Cecil Fielder make an appearance in the bottom five, which, just based on my observations, seems to pass the smell test.
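The career figures are PA-weighted averages of season corrVOL values. A minimal sketch with a hypothetical player’s seasons (all numbers made up), counting only seasons with at least 100 games, as in the rest of the analysis:

```r
# Hypothetical season-level corrVOL values for one player
career <- data.frame(
  season  = 2001:2006,
  games   = c(150, 98, 140, 155, 120, 145),
  pa      = c(650, 380, 600, 680, 500, 610),
  corrVOL = c(96.0, 104.0, 94.5, 95.2, 97.8, 93.9)
)

# corrVOL is only computed for 100-game seasons, so drop the rest
qualified <- subset(career, games >= 100)

# Weight each season by its plate appearances
career_corrvol <- weighted.mean(qualified$corrVOL, w = qualified$pa)
career_corrvol
# about 95.4
```

Weighting by PA keeps a partial 100-game season from counting as much as a full one in the career figure.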

### Wrapping Up

As I’ve stated previously, being consistent isn’t inherently good or bad. There is some evidence that, offensively, it may be better to be consistent, all other things being equal. But for individual players, being a poor hitter and distributing that poor performance evenly over the course of the year likely isn’t desirable. For players with lesser offensive skills, being volatile may be preferable to being consistent, since the volatile player offers potential upside in a handful of games during the season.

Front offices may use VOL and corrVOL as another piece of information about a player when making roster decisions. The metrics are likely most useful for comparing players with similar offensive abilities on whether they tend to distribute their production consistently. As we’ve seen above (and as is evident when looking at the data historically), players who are similar in almost every other way can differ in their volatility. Choosing a player based primarily on volatility is not advisable, but using it to decide among a number of similar players could be advantageous.

Beyond front offices, VOL and corrVOL could be useful from a fantasy standpoint. Owners building rosters around different strategies may want another way to gauge whether a player will give them similar production on a daily basis or be more of a boom-or-bust performer. As with front offices, VOL and corrVOL can help tip the scales toward one player versus another when their projected annual production is similar.

### References & Resources

- All code for this work can be found on GitHub. Note that the raw game-by-game data is not included, as that was acquired from the FanGraphs database.
- Code and materials from my Saber Seminar presentation in 2015.
- A sortable table of all seasons from 1974 to the present can be found here.
- Career averages for corrVOL can be found here.
- Thanks again to Sean Dolinar for the Gini function.
- Thanks as well to all the generous attendees at Saber Seminar 2015 who offered suggestions to improve this project. Special thanks to Keith Woolner for all his thoughts and ideas, which spurred me to create the corrVOL measure and approach. Any and all issues with this approach are mine alone.
