Introducing PrOPS
Have you ever followed a team and noticed players who seem to be continually lucky or unlucky? Maybe there’s a player who continually dinks the ball between the infield and the outfield. And what about the guy who’s spraying liners all over the field but can’t crack the Mendoza line, because there always seems to be a defender in the right spot? There’s a good chance that most of the noisy outs and swinging bunts come out in the laundry by season’s end, though they may not entirely cancel out for every player. Until then, the numbers we look at to evaluate hitters, such as OPS, are tainted. And while there is no such thing as a perfect metric to evaluate how well a player is playing absent luck, I think there may be a way to get a better grasp on how well players are performing than just relying on the standard raw statistics.
Early in the season, things haven’t had a chance to even out yet. We see historically poor players putting up good numbers (Brian Roberts), and historically good players putting up bad numbers (Bernie Williams). How much of a player’s April OPS is a product of chance, and how much reflects the quality of play? Unfortunately, most of the data we use to evaluate players is based on scorebook outcomes (single, double, walk, etc.), and therefore the numbers themselves reflect both random chance and the quality of play. A double gained on an outfielder’s untied shoes counts the same as a liner to the gap in the stats. And we would expect players who are putting up the latter type of doubles are better than those reaping the benefits of funny bounces and bad personnel decisions. The recent influx of new data provided by Baseball Info Solutions through The Hardball Times provides a possible way to separate out good/bad play from lucky/unlucky outcomes.
I set out to estimate the impact of certain areas of player performance on season OPS using the 2004 season. In particular, I was curious about the types of batted balls (line drives, flyballs, etc.) players were hitting. Is there a correlation between these variables and hitting success? If so, maybe we can learn something about how “good” players are actually playing by looking at this data. To begin, I looked at several different combinations of variables to find the model that best predicted player OPS in 2004 using linear regression estimation. This model uses estimated weights of hitting performances that are not necessarily officially scored outcomes to generate a predicted OPS, or PrOPS. With this model I can evaluate players by the process with which they reached these outcomes, thereby, I hope, separating useful information from the noise of raw statistics.
The model that best predicted a player’s OPS in 2004 included the following variables:
- Line drives per batted ball
- Groundball-to-flyball ratio
- Walk rate
- Hit-by-pitch rate
- Strikeout rate
- Home run rate
- Home park of the player
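As a rough illustration of the kind of fit described here, the sketch below runs an ordinary least squares regression on synthetic data. Everything in it is an assumption for illustration: the helper names (`fit_ols`, `predict`, `r_squared`), the made-up weights in `TRUE_WEIGHTS`, and the randomly generated stat lines are not the actual model or data. Home park is left out for brevity; in a real fit it would presumably enter as a set of indicator variables.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations hit both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least squares via the normal equations (X'X) b = X'y, with an intercept."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, row):
    """Predicted OPS: intercept plus weighted sum of the peripherals."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def r_squared(coefs, rows, y):
    """Share of the variation in y explained by the fitted model."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - predict(coefs, r)) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Made-up intercept and weights, used only to generate fake OPS values.
TRUE_WEIGHTS = [0.45, 0.9, -0.03, 1.0, 0.8, -0.4, 6.0]


def fake_player():
    """One synthetic stat line (hypothetical ranges, not real data)."""
    return [
        random.uniform(0.12, 0.26),  # line drives per batted ball
        random.uniform(0.6, 2.2),    # groundball-to-flyball ratio
        random.uniform(0.04, 0.16),  # walk rate
        random.uniform(0.0, 0.03),   # hit-by-pitch rate
        random.uniform(0.08, 0.28),  # strikeout rate
        random.uniform(0.0, 0.08),   # home run rate
    ]


random.seed(0)
rows = [fake_player() for _ in range(300)]
ops = [predict(TRUE_WEIGHTS, r) + random.gauss(0, 0.03) for r in rows]

coefs = fit_ols(rows, ops)                # estimated weights
props = [predict(coefs, r) for r in rows]  # predicted OPS for each fake player
r2 = r_squared(coefs, rows, ops)
print("R^2 on synthetic data:", round(r2, 3))
```

The fitted weights play the role of the "estimated weights" in the text: multiplying a player's peripherals by them gives that player's predicted OPS. The random noise added to the synthetic OPS stands in for luck.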
While many of these variables are official scorebook outcomes, we know that players do have skills in these areas, and that these skills translate directly and indirectly into a player’s OPS. I am most concerned with the random bounces of batted balls in play, which is why I included line drives and the groundball-to-flyball ratio in the model. It turns out that these variables are important in predicting a hitter’s OPS. The R² of the overall regression model was .81, which indicates that about 80% of the differences in OPS from player to player were explained by differences in the included variables. And while we think of luck canceling out over the course of the season, here are lists of the top-25 under/over-performers of 2004 measured as a percent of the player’s actual OPS (minimum 400 plate appearances).
OPS: Actual OPS for 2004
PrOPS: Predicted OPS
PrOPS+: The raw difference, OPS minus PrOPS. A positive PrOPS+ indicates observed performance better than predicted, while a negative PrOPS+ indicates observed performance worse than predicted.
PrOPS%: The difference between OPS and PrOPS expressed as a percent of OPS.
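The two definitions above can be written out as a small sketch. The function names and the `rank` helper are assumptions made for illustration, and the plate-appearance figures in the sample are placeholders rather than real totals. The OPS and PrOPS inputs are the rounded values from the tables, so the results can differ slightly from the published columns, which were presumably computed from unrounded numbers.

```python
def props_plus(ops, props):
    """PrOPS+: actual OPS minus predicted OPS."""
    return ops - props


def props_pct(ops, props):
    """PrOPS%: the same difference as a percent of actual OPS."""
    return 100.0 * (ops - props) / ops


def rank(players, min_pa=400, top=25):
    """Return (under-performers, over-performers), most extreme first."""
    eligible = [p for p in players if p["pa"] >= min_pa]
    by_pct = sorted(eligible, key=lambda p: props_pct(p["ops"], p["props"]))
    return by_pct[:top], by_pct[::-1][:top]


# OPS and PrOPS are taken from the 2004 tables; "pa" values are placeholders.
sample = [
    {"name": "Desi Relaford", "pa": 500, "ops": 0.601, "props": 0.708},
    {"name": "Barry Bonds", "pa": 500, "ops": 1.422, "props": 1.537},
    {"name": "J.T. Snow", "pa": 500, "ops": 0.958, "props": 0.846},
    # Hypothetical player, included only to show the plate-appearance cutoff.
    {"name": "Small Sample", "pa": 12, "ops": 1.500, "props": 0.700},
]

under, over = rank(sample)
for p in under:
    diff = props_plus(p["ops"], p["props"])
    pct = props_pct(p["ops"], p["props"])
    print(f'{p["name"]}: PrOPS+ {diff:+.3f}, PrOPS% {pct:+.2f}%')
```

Because PrOPS% divides by actual OPS, a weak hitter with the same raw gap as a strong hitter lands higher on the list, which is why Relaford ranks ahead of Bonds among the under-performers even though Bonds has the larger PrOPS+.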
2004 Top-25 Under-Performers
| Rank | First | Last | OPS | PrOPS | PrOPS+ | PrOPS% |
|---|---|---|---|---|---|---|
| 1 | Desi | Relaford | 0.601 | 0.708 | -0.106 | -17.72% |
| 2 | Scott | Spiezio | 0.634 | 0.740 | -0.105 | -16.63% |
| 3 | Rafael | Palmeiro | 0.796 | 0.898 | -0.102 | -12.86% |
| 4 | Jason | Phillips | 0.624 | 0.702 | -0.079 | -12.66% |
| 5 | Brad | Ausmus | 0.631 | 0.704 | -0.073 | -11.52% |
| 6 | David | Eckstein | 0.671 | 0.737 | -0.066 | -9.80% |
| 7 | Chipper | Jones | 0.847 | 0.930 | -0.083 | -9.79% |
| 8 | Tony | Batista | 0.726 | 0.793 | -0.066 | -9.11% |
| 9 | Joe | Crede | 0.717 | 0.781 | -0.064 | -8.91% |
| 10 | Barry | Bonds | 1.422 | 1.537 | -0.115 | -8.11% |
| 11 | Rob | Mackowiak | 0.739 | 0.799 | -0.060 | -8.05% |
| 12 | Craig | Counsell | 0.648 | 0.700 | -0.052 | -8.04% |
| 13 | Jose | Castillo | 0.665 | 0.718 | -0.053 | -7.95% |
| 14 | Aaron | Miles | 0.697 | 0.751 | -0.054 | -7.81% |
| 15 | Placido | Polanco | 0.786 | 0.847 | -0.061 | -7.72% |
| 16 | Steve | Finley | 0.828 | 0.891 | -0.063 | -7.63% |
| 17 | Ramon | Hernandez | 0.818 | 0.879 | -0.062 | -7.56% |
| 18 | A.J. | Pierzynski | 0.729 | 0.783 | -0.055 | -7.50% |
| 19 | Toby | Hall | 0.666 | 0.716 | -0.050 | -7.45% |
| 20 | Alex | Gonzalez | 0.689 | 0.739 | -0.050 | -7.24% |
| 21 | Dmitri | Young | 0.816 | 0.875 | -0.059 | -7.21% |
| 22 | Sammy | Sosa | 0.849 | 0.909 | -0.060 | -7.03% |
| 23 | Bill | Mueller | 0.811 | 0.868 | -0.056 | -6.93% |
| 24 | Orlando | Cabrera | 0.631 | 0.672 | -0.042 | -6.60% |
| 25 | Matt | Lawton | 0.787 | 0.839 | -0.052 | -6.59% |
2004 Top-25 Over-Performers
| Rank | First | Last | OPS | PrOPS | PrOPS+ | PrOPS% |
|---|---|---|---|---|---|---|
| 1 | J.T. | Snow | 0.958 | 0.846 | 0.112 | 11.72% |
| 2 | Ichiro | Suzuki | 0.869 | 0.774 | 0.095 | 10.93% |
| 3 | Melvin | Mora | 0.981 | 0.895 | 0.086 | 8.74% |
| 4 | Jack | Wilson | 0.794 | 0.728 | 0.066 | 8.34% |
| 5 | Erubiel | Durazo | 0.919 | 0.842 | 0.076 | 8.30% |
| 6 | Aaron | Rowand | 0.905 | 0.830 | 0.075 | 8.26% |
| 7 | Lyle | Overbay | 0.862 | 0.792 | 0.070 | 8.17% |
| 8 | Todd | Helton | 1.088 | 1.003 | 0.085 | 7.84% |
| 9 | David | Newhan | 0.814 | 0.753 | 0.062 | 7.57% |
| 10 | Carlos | Guillen | 0.921 | 0.853 | 0.069 | 7.46% |
| 11 | Travis | Hafner | 0.993 | 0.919 | 0.074 | 7.42% |
| 12 | Mark | Loretta | 0.886 | 0.822 | 0.064 | 7.21% |
| 13 | Lance | Berkman | 1.016 | 0.944 | 0.072 | 7.07% |
| 14 | Chone | Figgins | 0.770 | 0.717 | 0.052 | 6.79% |
| 15 | Juan | Rivera | 0.828 | 0.772 | 0.056 | 6.76% |
| 16 | Alexis | Rios | 0.720 | 0.674 | 0.047 | 6.49% |
| 17 | Carl | Crawford | 0.781 | 0.732 | 0.050 | 6.34% |
| 18 | Ivan | Rodriguez | 0.893 | 0.837 | 0.056 | 6.23% |
| 19 | Jimmy | Rollins | 0.803 | 0.753 | 0.050 | 6.19% |
| 20 | Ray | Durham | 0.848 | 0.798 | 0.050 | 5.89% |
| 21 | Jason | Bay | 0.907 | 0.855 | 0.052 | 5.78% |
| 22 | Joe | Randa | 0.751 | 0.708 | 0.043 | 5.76% |
| 23 | Juan | Uribe | 0.833 | 0.786 | 0.046 | 5.58% |
| 24 | Bobby | Abreu | 0.971 | 0.918 | 0.053 | 5.50% |
| 25 | Albert | Pujols | 1.072 | 1.013 | 0.059 | 5.49% |
Desi Relaford wins the award for worst luck of 2004, while J.T. Snow had the best luck. Now, when I say “luck” I want to be clear as to what I mean. PrOPS tells us the OPS that MLB players with the same values for the variables in the regression model posted, on average. You can think of PrOPS as similar to DIPS for pitchers. It is entirely possible that some of these players got lucky with hitting line drives, striking out, etc.; however, given their actual numbers for these events we would have expected them to perform much differently.
Now that I have established a baseline impact for the batting statistics on player OPS, I can apply the model to 2005. With only about a fifth of games finished for the season, it’s much less likely that good and bad bounces have had time to even out. The model can help us pull out how well players are actually playing by removing some luck. I’m not saying this is perfect, and players may be getting lucky with their hit types, but it’s all we’ve got to work with at the moment.
Now, let’s use the model to tell us what OPS a player ought to have in the current season based on their hitting peripherals. I have calculated PrOPS stats for every MLB player with at least one plate appearance. You can view the stats by team for the AL and NL. Here are the lists of the top under/over performers for 2005.
2005 Top-25 Under-Performers
| Rank | First | Last | OPS | PrOPS | PrOPS+ | PrOPS% |
|---|---|---|---|---|---|---|
| 1 | Tike | Redman | 0.454 | 0.806 | -0.353 | -77.75% |
| 2 | Aaron | Boone | 0.457 | 0.751 | -0.294 | -64.44% |
| 3 | Jose | Molina | 0.470 | 0.770 | -0.300 | -63.82% |
| 4 | Luis | Rivas | 0.477 | 0.769 | -0.293 | -61.35% |
| 5 | Miguel | Olivo | 0.362 | 0.568 | -0.206 | -56.77% |
| 6 | Nomar | Garciaparra | 0.405 | 0.619 | -0.215 | -53.13% |
| 7 | Keith | Ginter | 0.554 | 0.842 | -0.288 | -51.96% |
| 8 | Wilson | Valdez | 0.466 | 0.685 | -0.219 | -46.94% |
| 9 | Jay | Payton | 0.579 | 0.851 | -0.272 | -46.91% |
| 10 | Jack | Wilson | 0.466 | 0.675 | -0.209 | -44.80% |
| 11 | James | Hardy | 0.457 | 0.639 | -0.182 | -39.74% |
| 12 | J.D. | Closser | 0.470 | 0.649 | -0.179 | -38.15% |
| 13 | Ty | Wigginton | 0.523 | 0.703 | -0.180 | -34.48% |
| 14 | John | Buck | 0.474 | 0.637 | -0.163 | -34.39% |
| 15 | Quinton | McCracken | 0.515 | 0.682 | -0.166 | -32.31% |
| 16 | Jason | Kendall | 0.573 | 0.754 | -0.180 | -31.48% |
| 17 | Jose | Hernandez | 0.567 | 0.741 | -0.174 | -30.66% |
| 18 | Richard | Hidalgo | 0.562 | 0.730 | -0.169 | -30.00% |
| 19 | Yadier | Molina | 0.475 | 0.616 | -0.140 | -29.53% |
| 20 | Placido | Polanco | 0.592 | 0.766 | -0.174 | -29.43% |
| 21 | Jason | LaRue | 0.536 | 0.685 | -0.149 | -27.79% |
| 22 | Casey | Blake | 0.669 | 0.851 | -0.183 | -27.30% |
| 23 | Marcus | Thames | 0.654 | 0.833 | -0.178 | -27.27% |
| 24 | Eric | Byrnes | 0.633 | 0.803 | -0.170 | -26.82% |
| 25 | Brad | Ausmus | 0.567 | 0.710 | -0.143 | -25.26% |
I would look for these guys to rebound. It’s not that we really needed a regression model to tell us this, but we can see that players who put up similar peripherals in 2004 performed much better than these guys have shown in their stats so far this season. So hang in there, Tike, because better days are coming if you keep hitting like you have been.
What about players who may be headed for a fall?
2005 Top-25 Over-Performers
| Rank | First | Last | OPS | PrOPS | PrOPS+ | PrOPS% |
|---|---|---|---|---|---|---|
| 1 | Jason | Ellison | 1.005 | 0.754 | 0.251 | 24.99% |
| 2 | Carlos | Guillen | 1.015 | 0.772 | 0.243 | 23.97% |
| 3 | Bill | Hall | 0.796 | 0.622 | 0.174 | 21.84% |
| 4 | Brad | Wilkerson | 0.851 | 0.678 | 0.173 | 20.32% |
| 5 | Alex | Sanchez | 0.829 | 0.669 | 0.160 | 19.32% |
| 6 | Ryan | Freel | 0.917 | 0.745 | 0.172 | 18.78% |
| 7 | Ricky | Ledee | 0.907 | 0.740 | 0.167 | 18.46% |
| 8 | Craig | Biggio | 0.873 | 0.716 | 0.156 | 17.89% |
| 9 | Vinny | Castilla | 0.901 | 0.745 | 0.156 | 17.27% |
| 10 | Shea | Hillenbrand | 0.894 | 0.747 | 0.147 | 16.48% |
| 11 | Derrek | Lee | 1.224 | 1.022 | 0.202 | 16.47% |
| 12 | Justin | Morneau | 1.289 | 1.077 | 0.212 | 16.44% |
| 13 | Freddy | Sanchez | 0.787 | 0.659 | 0.127 | 16.17% |
| 14 | Carlos | Beltran | 0.881 | 0.752 | 0.129 | 14.69% |
| 15 | Rob | Mackowiak | 0.770 | 0.659 | 0.110 | 14.31% |
| 16 | Nook | Logan | 0.816 | 0.700 | 0.115 | 14.12% |
| 17 | Kenny | Lofton | 0.938 | 0.806 | 0.132 | 14.12% |
| 18 | Cliff | Floyd | 1.047 | 0.900 | 0.146 | 13.99% |
| 19 | Mike | Sweeney | 1.016 | 0.878 | 0.138 | 13.56% |
| 20 | Brandon | Inge | 0.877 | 0.760 | 0.118 | 13.39% |
| 21 | Nick | Johnson | 0.924 | 0.801 | 0.124 | 13.38% |
| 22 | Frank | Catalanotto | 0.768 | 0.666 | 0.101 | 13.19% |
| 23 | Clint | Barmes | 1.082 | 0.940 | 0.142 | 13.16% |
| 24 | Jacque | Jones | 0.990 | 0.860 | 0.130 | 13.09% |
| 25 | Chipper | Jones | 1.111 | 0.967 | 0.145 | 13.01% |
Jason Ellison, with his .500 BABIP, certainly won’t continue at his same pace. And when Alex Sanchez returns to being Alex Sanchez by the All-Star break, let’s not say it was the steroids wearing off. These are the guys you may want to try to unload in your fantasy league.
What about two guys who are off to hot and cold starts, Brian Roberts and Bernie Williams? Are these just statistical anomalies? Well, they might be, but they would have to be anomalies in the batting peripherals that go into the model. Roberts is putting up numbers a little better than predicted (OPS = 1.111, PrOPS = 1.058), but he’s still playing well. Bernie is playing poorly (PrOPS = 0.703), but not as badly as his stats show (OPS = 0.607).
Feel free to take a look around on the stats pages and tell me what you see. This is all new stuff, and I will make changes to the model based on any discoveries people make. Personally, I’m just happy to see Johnny Estrada is due for a little rebound (PrOPS = 0.746 versus OPS = 0.632).