corrVOL: Updating How We Can Measure Hitter Volatility

by Bill Petti, March 28, 2017

The dark green line is Rickey Henderson. (via Bill Petti)

Introduction

Two years ago at Saber Seminar, I presented my most recent attempt at quantifying how inconsistent, or volatile, hitters are in distributing the runs they create over the course of a season. The method built on an approach I devised at the team level, which generated a Gini coefficient for each team over the course of a season to see how evenly teams distributed their runs scored and runs allowed. Lower scores indicate a more even distribution; higher scores indicate a more unequal distribution.

As I have written before, volatility is not the same thing as streakiness. Streakiness is about how extreme positive and negative performances cluster together over the course of a season, with good and bad performances lumped into long stretches. Volatility is different. It concerns the overall distribution of a player’s daily performance relative to the total runs he creates over the season. If a player creates 81 runs in a given season, did he create half a run every game (a perfectly consistent, equal distribution), or did he create 80 percent of his runs in only 20 percent of his games? That is the question volatility, and my VOL metric, sought to answer and quantify.

After the presentation, I benefited from some fantastic feedback from a number of people, and as a result I have updated my VOL metric in several ways. In this article I walk through those changes.

Gini Coefficients

The biggest change to the VOL metric, rolled out during that Saber Seminar presentation, was the use of Gini coefficients to calculate volatility. Gini coefficients typically are used to measure how equally some value, usually wealth or income, is distributed among the individual citizens of a given country.
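Before turning to the country example, it may help to see the same calculation applied to the 81-run question from the introduction. Below is a minimal sketch in R, the language used for the code in this article. It computes a Gini coefficient for two hypothetical 162-game seasons worth 81 runs each. The function name `gini`, the game-by-game values, and the n/(n - 1) small-sample correction are my own assumptions for illustration, not the VOL implementation. I included the correction because it reproduces the .21 and .80 reported for the two countries below.

```r
# Gini coefficient for non-negative values, using the sorted-index formula
# plus an n / (n - 1) small-sample correction (an assumption here), so a
# season in which every run comes in a single game scores exactly 1.
gini <- function(x) {
  x <- sort(x)
  n <- length(x)
  g <- 2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
  g * n / (n - 1)
}

# Two hypothetical 162-game seasons, each worth 81 runs in total.
steady <- rep(0.5, 162)              # half a run every game
boom_bust <- c(rep(64.8 / 32, 32),   # 80% of the runs in 32 games (about 20%)
               rep(16.2 / 130, 130)) # the other 20% spread over 130 games

c(sum(steady), sum(boom_bust))       # both total 81
gini(steady)                         # essentially 0: perfectly even
gini(boom_bust)                      # roughly 0.61: highly concentrated
```

Both seasons are worth the same 81 runs. Only the way those runs are spread across games separates them.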
Let’s look at a simplified example to better understand how Gini coefficients work and how they can be used to compare the relative inequality of two distributions. Assume we have two countries, Egalistan and Concentratistan. Each country has 15 citizens, and the wealth in each country is distributed as follows:

Example: Distribution of Wealth

| Egalistan | Concentratistan |
|---|---|
| $30 | $1 |
| $35 | $1 |
| $40 | $1 |
| $45 | $3 |
| $50 | $3 |
| $55 | $3 |
| $60 | $4 |
| $65 | $5 |
| $70 | $5 |
| $75 | $16 |
| $80 | $55 |
| $85 | $78 |
| $90 | $200 |
| $95 | $200 |
| $100 | $400 |

Each country has the same aggregate wealth ($975), but that wealth is distributed quite differently. Gini coefficients give us a way to quantify the evenness of these two distributions and compare them. Gini coefficients range from 0 to 1, with 0 being the most equal distribution and 1 the least. When we compare our two countries, Egalistan has a Gini coefficient of .21, while Concentratistan has a coefficient of .80.

We also can visualize these distributions using a Lorenz curve. For any point along the curve, it shows the cumulative share of wealth held by a given cumulative percentage of citizens, ordered from poorest to richest. Here are the Lorenz curves for both of our fictional countries:

We can see that the area between the curve and the line of equality is far smaller for Egalistan than for Concentratistan. In Egalistan, the bottom 80 percent of citizens hold 71 percent of the wealth, whereas in Concentratistan the bottom 80 percent hold only 18 percent. This is essentially a visualization of our Gini coefficients. In fact, the Gini coefficient is derived from the Lorenz curve: it is the area between the line of equality and the curve, divided by the total area under the line of equality.

Measuring the Volatility of Hitters

Now that we understand how Gini coefficients work, it’s not difficult to apply them to individual hitters:

Hitters = Countries
Games = Citizens
Runs = Income

For data, I used individual game data from 1974 through 2016.
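The Lorenz-curve reasoning above can be checked directly from the table. The R sketch below builds each country’s Lorenz curve and reads off the bottom-80-percent shares. It then computes the Gini coefficient as one minus twice the area under the curve, using the trapezoid rule. The helper names `lorenz` and `gini_from_lorenz` are hypothetical. The n/(n - 1) correction is also an assumption on my part. Without it, the results come out slightly lower (about .19 and .75).

```r
# Wealth of the 15 citizens in each fictional country (from the table above).
egalistan <- seq(30, 100, by = 5)
concentratistan <- c(1, 1, 1, 3, 3, 3, 4, 5, 5, 16, 55, 78, 200, 200, 400)

# Lorenz curve: cumulative share of wealth held by the poorest k citizens,
# for k = 0, 1, ..., n (the curve starts at the origin).
lorenz <- function(x) {
  x <- sort(x)
  c(0, cumsum(x)) / sum(x)
}

# Gini from the Lorenz curve: 1 - 2 * (area under the curve), with the area
# computed by the trapezoid rule over equal population steps of 1/n. The
# n / (n - 1) factor is an assumed small-sample correction.
gini_from_lorenz <- function(x) {
  n <- length(x)
  L <- lorenz(x)
  area <- sum((head(L, -1) + tail(L, -1)) / 2) / n
  (1 - 2 * area) * n / (n - 1)
}

round(sapply(list(egalistan, concentratistan), gini_from_lorenz), 2)  # 0.21 0.80

# Share of wealth held by the bottom 80 percent (12 of 15 citizens); element
# 13 of the curve corresponds to k = 12 because the curve starts at k = 0.
round(lorenz(egalistan)[13], 2)        # 0.71
round(lorenz(concentratistan)[13], 2)  # 0.18
```

Computing the Gini coefficient from the curve’s area makes explicit the link between the Lorenz picture and the single number used to compare the two countries.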
These data were acquired from the FanGraphs database, but you could recreate the analysis from individual game data using Retrosheet as a source. For runs, I chose to use Weighted Runs Created (wRC), calculated for each hitter on a game-by-game basis. I used the fg_guts function from my baseballr package to obtain the wOBA weights and constants for each year from FanGraphs and to calculate daily wOBA for each hitter. wOBA then can be converted to wRC using the following equation:

    wRC = (((wOBA - League wOBA) / wOBA Scale) + (League R/PA)) * PA

Once we have the wRC for each game for each hitter, calculating a hitter’s season VOL is easy: we simply calculate the Gini coefficient for each hitter-season combination.

The updated version presented at Saber Seminar handled negative game-level wRC values by adding a constant to the wRC calculation. This matters because the standard Gini coefficient assumes non-negative values, and a poor game produces a negative wRC. I won’t go into those calculations here, but you can review that approach here if you like.

A few commenters pointed out that some approaches handle negative values better than simply adding a constant. Our own Sean Dolinar had done some recent work with Gini coefficients and came across an approach to handle negative values. Sean was kind enough to share some R code, and I’ve used it to update my approach.
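Here is a minimal R sketch of those two steps. The first function applies the wRC equation above to game-level wOBA. The second splits a game log by hitter and season and applies a Gini function to each group. The function names, the league constants, and the three-game log are all hypothetical placeholders; real constants would come from FanGraphs via fg_guts, as described above. The Gini implementation is passed in as an argument rather than assumed.

```r
# Game-level wRC from game-level wOBA, following the equation above:
# wRC = (((wOBA - League wOBA) / wOBA Scale) + (League R/PA)) * PA
woba_to_wrc <- function(woba, pa, lg_woba, woba_scale, lg_r_pa) {
  ((woba - lg_woba) / woba_scale + lg_r_pa) * pa
}

# Season VOL: apply a Gini function to each hitter-season's vector of
# game-level wRC. gini_fn is a parameter, so any implementation can be used.
season_vol <- function(games, gini_fn) {
  groups <- split(games$wrc, list(games$player, games$season), drop = TRUE)
  sapply(groups, gini_fn)
}

# Hypothetical league constants for one season (placeholders, not real values).
lg <- list(lg_woba = 0.320, woba_scale = 1.20, lg_r_pa = 0.115)

# Hypothetical three-game log for one hitter.
games <- data.frame(
  player = "Hitter A",
  season = 2016,
  woba   = c(0.000, 0.320, 0.700),
  pa     = c(4, 4, 5)
)
games$wrc <- woba_to_wrc(games$woba, games$pa,
                         lg$lg_woba, lg$woba_scale, lg$lg_r_pa)
round(games$wrc, 3)   # -0.607  0.460  2.158
```

The hitless first game produces a negative wRC. That is exactly the case the negative-value Gini function in this article is built to handle. Once that function is defined, passing it as `gini_fn` returns one VOL per hitter-season.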
Here is the function used to calculate the Gini coefficients for VOL in this version:

    # Gini coefficient that tolerates negative values.
    # Y: numeric vector of game-level wRC for one hitter-season.
    Gini_neg <- function(Y) {
      Y <- sort(Y)
      N <- length(Y)
      u_Y <- mean(Y)  # not used below
      # Numerator: half the mean absolute difference between all pairs of values
      top <- 2/N^2 * sum(seq(1,N)*Y) - (1/N)*sum(Y) - (1/N^2) * sum(Y)
      min_T <- function(x) { return(min(0,x)) }
      max_T <- function(x) { return(max(0,x)) }
      T_all <- sum(Y)  # not used below
      T_min <- abs(sum(sapply(Y, FUN = min_T)))  # total magnitude of the negative values
      T_max <- sum(sapply(Y, FUN = max_T))       # total of the positive values
      # Normalizer built from the positive and negative totals; it keeps the
      # result between 0 and 1 even when some values are negative
      u_P <- (N-1)/N^2*(T_max + T_min)
      return(top/u_P)
    }

This solution addressed one of the issues, but there was another I discussed at the conference and hadn’t really decided how to handle. For everyday players, VOL was highly correlated with a number of metrics, namely plate appearances, games, wRC, and wRC+. The overall correlation between wRC and VOL seemed reasonable (-0.23), but this somewhat masked what looked like a bias. The relationship becomes much stronger for players who appear in more games and have more plate appearances per game. If we restrict to players who appear in at least 100 games, that correlation strengthens to -0.40. The relationship gets even stronger when we break players into groups based on how many plate appearances per game they average:

And here’s a scatter plot with each point colored based on how many plate appearances per game a hitter averaged:

It’s pretty clear that VOL is mapping quite closely to the overall quality of a hitter. Better hitters tend to play in more games and get more plate appearances. Players who accumulate more wRC in a season also appear to be more consistent (less volatile) in how they distribute those runs from game to game. Maybe that’s true: hitters who produce more runs, both in absolute terms and per game or plate appearance, may simply be far more consistent than lesser hitters. However, it could also be that, by virtue of getting more playing time, their VOL scores are being reduced artificially as a result of how the Gini coefficients are calculated.
This explanation is plausible, since Gini coefficients are known to be sensitive when calculated over small sample sizes, and a baseball season gives us at most 162 games. When we restrict to players with at least 100 games played in a season, the strength of the relationships jumps substantially:

To address this issue, I made one additional change to the VOL metric. I received a few suggestions about how to approach this. The one I chose was to model VOL as a function of (potentially) games played, plate appearances, and runs created. The idea is to create an expected VOL for a given player based on how often he plays and the level at which he produces. That expectation is then used to normalize his VOL, telling us whether the player is more or less volatile than we would expect given those attributes. I also am restricting the modeling and the overall calculation of VOL to players with at least 100 games played in a season, to help deal with the small-sample issue.

Expected VOL Modeling

I decided to go with a simple linear model to find the expected VOL of a player. I played around with a few versions, but this one seemed the most appropriate:

    expectedVOL ~ wRC + PA_G + Games

wRC, plate appearances per game (PA_G), and games played all are highly correlated with VOL, although the strength of the correlation differs by feature:

I took the 9,925 player seasons in the reduced data set and split them into a training set (70 percent of the data) and a test set (30 percent). Here is the model output for expectedVOL:

    Coefficients:
                   Estimate  Std. Error  t value             Pr(>|t|)
    (Intercept)  0.67923794  0.00281500  241.293  <0.0000000000000002 ***
    wRC         -0.00132221  0.00001655  -79.909  <0.0000000000000002 ***
    PA_G        -0.00685035  0.00069641   -9.837  <0.0000000000000002 ***
    Games        0.00060503  0.00002020   29.947  <0.0000000000000002 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.02213 on 6944 degrees of freedom
    Multiple R-squared:  0.666,  Adjusted R-squared:  0.6658
    F-statistic:  4615 on 3 and 6944 DF,  p-value: < 0.00000000000000022

The overall model has an adjusted R² of 0.67 on the training set. The variance inflation factors for the features range from 1.9 to 2.73, which indicates some mild multicollinearity, but nothing to be alarmed about. Applied to the test set, the model has a root mean squared error (RMSE) of 0.0224, and the errors are normally distributed. Overall, the model fits pretty well, but it leaves enough variance unexplained to serve its intended purpose. It provides a solid expected VOL for a player without overfitting to the point that a hitter’s potentially unique talent for consistency would be masked. From here, calculating our new corrected VOL measure (corrVOL) is quite simple.

corrVOL

corrVOL is calibrated so that 100 represents average, that is, exactly meeting expectations:

    corrVOL = VOL / expectedVOL * 100

As an example, suppose a hitter has a VOL of 0.7 but an expectedVOL of 0.5. His corrVOL would be 140, meaning his VOL is 40 percent higher than one would expect given his wRC, games played, and plate appearances per game. By comparison, a player with a VOL of 0.3 and an expectedVOL of 0.5 would have a corrVOL of 60, or 40 percent lower than expected. The latter player would be performing at an elite rate in terms of how evenly he distributes his runs, while the former would be extremely volatile.

I applied the model to all hitters since 1974 who appeared in at least 100 games in a season and then used their expectedVOL to calculate their corrVOL.
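Here is a minimal R sketch of both steps. The first function works on a hypothetical data frame `seasons` with columns VOL, wRC, PA_G, and Games. It shows how the 70/30 split, the `lm()` fit, and the test-set RMSE could be computed. The random seed and the simple sampling method are assumptions, so this split will not reproduce the article’s exact training set. The second part hard-codes the coefficients printed above to compute expectedVOL and corrVOL. It checks them against Rickey Henderson’s 1992 line from the table of least volatile seasons, using the table’s rounded inputs.

```r
# Sketch: fit expectedVOL on a 70/30 train/test split and report test RMSE.
# `seasons` is a hypothetical data frame with columns VOL, wRC, PA_G, Games
# (one row per hitter-season with at least 100 games).
fit_expected_vol <- function(seasons, train_frac = 0.7, seed = 1) {
  set.seed(seed)  # assumption: simple random split; changes the global RNG state
  idx <- sample(nrow(seasons), floor(train_frac * nrow(seasons)))
  fit <- lm(VOL ~ wRC + PA_G + Games, data = seasons[idx, ])
  test <- seasons[-idx, ]
  rmse <- sqrt(mean((test$VOL - predict(fit, newdata = test))^2))
  list(fit = fit, rmse = rmse)
}

# Coefficients printed in the model output above.
coefs <- c(intercept = 0.67923794, wRC = -0.00132221,
           PA_G = -0.00685035, Games = 0.00060503)

expected_vol <- function(wRC, PA_G, Games) {
  coefs[["intercept"]] + coefs[["wRC"]] * wRC +
    coefs[["PA_G"]] * PA_G + coefs[["Games"]] * Games
}

# corrVOL = VOL / expectedVOL * 100; 100 means exactly as volatile as expected.
corr_vol <- function(vol, expected) vol / expected * 100

corr_vol(0.7, 0.5)   # 140, the first hypothetical hitter in the text
corr_vol(0.3, 0.5)   # 60, the second

# Rickey Henderson, 1992: 84.1 wRC, 4.3 PA per game, 116 games, VOL 0.504.
henderson_exp <- expected_vol(wRC = 84.1, PA_G = 4.3, Games = 116)
round(henderson_exp, 3)                   # 0.609
round(corr_vol(0.504, henderson_exp), 1)  # 82.8
```

In practice you would call `predict()` on the fitted model instead of hard-coding coefficients. Writing them out here just keeps the arithmetic visible.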
Generally speaking, corrVOL is normally distributed across hitters in any given season (1981 and 1994 have been excluded given the strikes in those years):

The new metric does achieve what I was hoping for: it is not simply a reflection of a player’s playing time and overall run creation. Here’s the same correlation matrix from earlier, now with corrVOL added. As you can see, we’ve essentially wiped out any dependence on playing time and run creation:

Of course, one of the first things we want to know is who had the most and least volatile seasons according to the new metric. Let’s start with the least volatile, or most consistent, hitters:

Least Volatile Individual Seasons, 1974-2016

| Name | Season | Games | PA per Game | wRC | wRC+ | VOL | expectedVOL | corrVOL |
|---|---|---|---|---|---|---|---|---|
| Rickey Henderson | 1992 | 116 | 4.3 | 84.1 | 158 | 0.504 | 0.609 | 82.8 |
| Kevin Seitzer | 1988 | 149 | 4.3 | 91.2 | 125 | 0.525 | 0.619 | 84.8 |
| Willie Randolph | 1991 | 122 | 4.2 | 76.2 | 131 | 0.530 | 0.624 | 85.0 |
| Orlando Palmeiro | 2006 | 103 | 1.2 | 9.2 | 59 | 0.617 | 0.721 | 85.6 |
| Rickey Henderson | 1989 | 150 | 4.5 | 102.8 | 138 | 0.518 | 0.603 | 85.9 |
| Tony Phillips | 1993 | 151 | 4.7 | 117.7 | 136 | 0.503 | 0.583 | 86.3 |
| Toby Harrah | 1985 | 123 | 4.2 | 83.0 | 133 | 0.532 | 0.615 | 86.5 |
| Miguel Dilone | 1980 | 128 | 4.4 | 79.9 | 123 | 0.539 | 0.621 | 86.8 |
| Rickey Henderson | 1995 | 112 | 4.3 | 80.1 | 130 | 0.532 | 0.612 | 87.0 |
| Tony Gwynn | 1994 | 110 | 4.3 | 94.8 | 166 | 0.514 | 0.591 | 87.0 |
| Tim Raines | 1986 | 151 | 4.4 | 111.0 | 149 | 0.518 | 0.594 | 87.3 |
| Tony Gwynn | 1984 | 158 | 4.3 | 102.6 | 144 | 0.532 | 0.610 | 87.3 |
| Paul Molitor | 1987 | 118 | 4.6 | 112.3 | 165 | 0.498 | 0.571 | 87.3 |
| Orlando Palmeiro | 2007 | 101 | 1.2 | 9.7 | 67 | 0.630 | 0.719 | 87.6 |
| Chuck Knoblauch | 1994 | 108 | 4.6 | 78.3 | 117 | 0.534 | 0.610 | 87.6 |
| Joe Morgan | 1975 | 146 | 4.4 | 127.8 | 176 | 0.498 | 0.568 | 87.6 |
| Miguel Cabrera | 2011 | 161 | 4.3 | 135.5 | 177 | 0.498 | 0.568 | 87.7 |
| Lenny Harris | 2001 | 103 | 1.4 | 7.5 | 44 | 0.634 | 0.722 | 87.8 |
| Rod Carew | 1979 | 109 | 4.5 | 70.8 | 127 | 0.545 | 0.621 | 87.8 |
| Ken Griffey | 1981 | 101 | 4.4 | 56.6 | 121 | 0.558 | 0.635 | 87.8 |

Rickey Henderson’s 1992 season comes in as the least volatile based on this method.
Henderson put up some great numbers that year, including the third-best wRC+ of his career, with a VOL of 0.504 and an expectedVOL of 0.609. His corrVOL was 82.8, meaning he was slightly more than 17 percent less volatile than we would expect.

To put that season into perspective, below is a scatter plot of all players with an expectedVOL between 0.607 and 0.611. All of Henderson’s seasons are shaded red.

Henderson’s actual volatility was far lower than that of other players with similar expectations. His rate of production, adjusted for league and park, was also one of the best. Henderson shows up in the top 20 three times, more than any other player.

We also can compare Henderson’s 1992 season visually to the nearly 10,000 other seasons since 1974 using Lorenz curves:

Even without making corrections for his playing time and wRC, it’s clear Henderson’s season was incredibly consistent.

What about the most volatile?

Most Volatile Individual Seasons, 1974-2016

| Name | Season | Games | PA per Game | wRC | wRC+ | VOL | expectedVOL | corrVOL |
|---|---|---|---|---|---|---|---|---|
| Manny Ramirez | 1998 | 150 | 4.4 | 123.6 | 144 | 0.643 | 0.576 | 111.6 |
| Matt Holliday | 2016 | 110 | 3.9 | 56.1 | 109 | 0.719 | 0.645 | 111.5 |
| Miguel Cabrera | 2016 | 158 | 4.3 | 120.3 | 152 | 0.644 | 0.586 | 109.8 |
| Jose Canseco | 1994 | 111 | 4.5 | 89.4 | 134 | 0.656 | 0.597 | 109.8 |
| Vinny Castilla | 2001 | 146 | 4.0 | 71.9 | 93 | 0.708 | 0.645 | 109.7 |
| Aramis Ramirez | 2010 | 124 | 4.1 | 57.9 | 94 | 0.713 | 0.650 | 109.7 |
| Ryan Braun | 2012 | 154 | 4.4 | 125.7 | 159 | 0.632 | 0.576 | 109.7 |
| Albert Belle | 1992 | 153 | 4.3 | 86.3 | 121 | 0.689 | 0.628 | 109.7 |
| Sammy Sosa | 1995 | 144 | 4.4 | 89.0 | 115 | 0.676 | 0.618 | 109.3 |
| Troy Glaus | 2006 | 153 | 4.1 | 96.9 | 119 | 0.672 | 0.616 | 109.2 |
| Sammy Sosa | 2001 | 160 | 4.4 | 156.4 | 186 | 0.588 | 0.539 | 109.1 |
| Tommy Medica | 2014 | 102 | 2.5 | 27.3 | 97 | 0.750 | 0.688 | 109.1 |
| Mike Schmidt | 1975 | 158 | 4.3 | 107.7 | 142 | 0.658 | 0.603 | 109.1 |
| Chris Parmelee | 2013 | 101 | 3.3 | 32.3 | 84 | 0.736 | 0.675 | 109.0 |
| Russell Branyan | 2001 | 109 | 3.3 | 48.0 | 106 | 0.718 | 0.659 | 108.9 |
| Harold Baines | 1984 | 147 | 4.3 | 103.9 | 143 | 0.655 | 0.601 | 108.9 |
| John Olerud | 1996 | 125 | 3.8 | 73.8 | 116 | 0.687 | 0.631 | 108.8 |
| Rafael Palmeiro | 2001 | 160 | 4.5 | 127.3 | 139 | 0.628 | 0.577 | 108.8 |
| Cecil Fielder | 1994 | 109 | 4.4 | 70.1 | 111 | 0.677 | 0.622 | 108.8 |
| Dan Johnson | 2007 | 117 | 4.2 | 63.2 | 106 | 0.693 | 0.638 | 108.7 |

Manny Ramirez’s 1998 season takes the title here with a corrVOL of 111.6. Manny put up his usual strong numbers (a 144 wRC+), but his actual VOL was well above his expectedVOL (0.643 versus 0.576).

We also can get a sense of how much more or less consistent two players were by looking at their Lorenz curves together. Let’s compare Henderson’s 1992 season to Ramirez’s 1998 season:

Order their games from least wRC to most. Henderson had already accumulated a positive wRC after his first 40 percent of games played, while Ramirez didn’t cross that threshold until his first 57 percent. Ramirez had more big games than Henderson: 17 percent of his games resulted in a wRC at or above 2.0, compared to Henderson’s 3.5 percent. Even so, Henderson distributed the runs he created far more evenly. In the end, that’s what VOL and corrVOL are trying to capture. Does a player produce the same or similar offensive value from day to day, or is he more “boom or bust” from game to game?

Would I take Ramirez’s 123.6 wRC in 1998? You bet I would. But I might have preferred Larry Walker’s 120 wRC, given that his corrVOL was 89.2, one of the best marks historically. Walker provided similar production to Ramirez but was more consistent from game to game.
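The two Lorenz-curve comparisons above can be computed from a single season’s game log. The first is the share of games needed before cumulative wRC turns positive, and the second is the share of games at or above 2.0 wRC. The R sketch below is one way to compute both, assuming games are ordered from least to most wRC as in the Lorenz curve. The function name `lorenz_summary`, the default threshold argument, and the ten-game log are hypothetical illustrations, not Henderson’s or Ramirez’s actual data.

```r
# Summarize one hitter-season of game-level wRC the way the Lorenz-curve
# comparison does: order games from least to most wRC, then find the share
# of games after which cumulative wRC first turns positive, and the share of
# "big" games at or above a wRC threshold.
lorenz_summary <- function(wrc, big = 2.0) {
  running <- cumsum(sort(wrc))
  # Sorted ascending, the running total falls while games are negative and
  # rises afterward, so once it is positive it stays positive. Returns NA
  # if the season total is not positive.
  first_positive <- which(running > 0)[1]
  c(share_until_positive = first_positive / length(wrc),
    share_big_games      = mean(wrc >= big))
}

# Hypothetical ten-game log (not real data).
wrc <- c(1.2, -0.6, 0.3, 2.5, -0.4, 0.1, 0.8, -0.2, 2.1, 0.5)
lorenz_summary(wrc)   # share_until_positive 0.7, share_big_games 0.2
```

A lower first number and a lower second number together describe the Henderson-style profile: runs arrive early in the ordered season and rarely in bunches.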
In terms of players with the least volatile production over longer stretches, here are the top five and bottom five since 1974. The list is limited to players with at least five seasons of 100 or more games played, and the averages are weighted by each season’s PA:

Most and Least Volatile Careers, 1974-2016

Least Volatile

| Name | Seasons | Average corrVOL | Average wRC+ |
|---|---|---|---|
| Rickey Henderson | 20 | 94.3 | 134 |
| Brett Butler | 14 | 94.5 | 118 |
| Rod Carew | 10 | 94.6 | 139 |
| Tony Gwynn | 16 | 94.7 | 133 |
| Willie Randolph | 15 | 94.8 | 111 |

Most Volatile

| Name | Seasons | Average corrVOL | Average wRC+ |
|---|---|---|---|
| Hank Blalock | 5 | 105.3 | 99 |
| Jay Bruce | 9 | 104.7 | 107 |
| Ron Kittle | 5 | 104.5 | 103 |
| Lee Stevens | 7 | 104.4 | 96 |
| Cecil Fielder | 8 | 104.3 | 120 |

Henderson not only has the best season on record in terms of consistency, but he’s also the best over the course of a career. Seemingly boom-or-bust sluggers like Jay Bruce and Cecil Fielder appear in the bottom five, which, based on my own observations, seems to pass the smell test.

Wrapping Up

As I’ve stated previously, being consistent isn’t inherently good or bad. There is some evidence that, from an offensive standpoint, it may be better to be consistent, all other things being equal. For individual players, though, being a poor hitter who distributes that poor performance evenly over the course of the year likely isn’t desirable. Players with lesser offensive skills who are more volatile may be preferable to similar players who are more consistent, since the former offer potential upside in a handful of games during the season.

Front offices may use VOL and corrVOL as another piece of information to consider about a player when making roster decisions. The most useful comparison is likely among players with similar offensive abilities, looking at whether they tend to distribute their production consistently or not. As we’ve seen above, and as is evident in the historical data, players who are similar in almost every other way can differ in their volatility.
Deciding whether to go after a player based primarily on volatility is not advisable, but using it as one way to choose among a number of similar players could be advantageous.

Beyond front offices, VOL and corrVOL could also be useful for fantasy baseball. Owners building rosters around different strategies may want another way to gauge whether a player will give them similar production on a daily basis or be more of a boom-or-bust performer. As with front offices, VOL and corrVOL can help tip the scales toward one player over another when their projected annual production is similar.

References & Resources

All code for this work can be found on GitHub. Note that the raw game-by-game data is not included, as it was acquired from the FanGraphs database.
Code and materials from my Saber Seminar presentation in 2015.
A sortable table of all seasons from 1974 to the present can be found here.
Career averages for corrVOL can be found here.

Thanks again to Sean Dolinar for the Gini function. Thanks as well to all the generous attendees at Saber Seminar 2015 who offered suggestions to improve this project. Special thanks to Keith Woolner for all his thoughts and ideas, which spurred me to create the corrVOL measure and approach. Any and all issues with this approach are mine alone.