Introducing: Quality of Opponent adjustments and CAPS (for pitchers)

by Derek Carty
January 5, 2009

Johan Santana shakes off one of his 33 home runs in 2007. Can the batters he faced explain his out-of-the-ordinary 14.5% HR/FB? (Icon/SMI)

A few weeks ago, I discussed how we need to adjust stats based on players switching leagues. I didn’t explicitly say it at the time, but when analyzing baseball players, it is of the utmost importance to understand the context in which stats are accumulated. As I’m sure anyone reading this knows, it is far from enough to simply look at ERA. What some of you may not fully realize is that it is no longer enough to analyze peripheral stats in a vacuum, either.

Lots of fantasy players (and websites, for that matter) are starting to realize that stats like K/9 and BB/9 are better indicators of a pitcher’s true skill than ERA or WHIP, so we, as fantasy owners, need to take things a step further to keep our advantage over them. The context in which pitchers post these stats is something I’ve yet to see any other website address (aside from Baseball Prospectus, although all they provide is BA/OBP/SLG, which is not of much use to fantasy owners). Are they accumulated in the American League or the National League? A “pitcher’s” ballpark or a “hitter’s” ballpark? Against good batters or poor batters?

That last one is the one that I’ll discuss today. If a pitcher faces a disproportionate number of Adam Dunn-type hitters, he is going to strike out and walk more batters than he otherwise would. Because a pitcher has no ability to control the batters he faces, we can’t consider this a repeatable skill and must, therefore, neutralize a pitcher’s stat line based on the opposition he faces.

Method

To calculate the quality of opposition faced, I took the aggregate year-end Marcels projection of every batter the pitcher faced in a given year. I then compared this to league average to arrive at a ‘quality of opposition index’ for each pitcher.
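To make the index concrete, here is a minimal Python sketch of one way it could be computed. It assumes that “aggregate” means a plate-appearance-weighted average of each opposing batter’s projected rate; the weighting scheme, the function name, and every number below are hypothetical placeholders, not real Marcels output or my actual implementation.

```python
def opposition_index(matchups, league_avg_rate):
    """Return a quality-of-opposition index for one pitcher and one stat.

    matchups: list of (plate_appearances_vs_pitcher, batter_projected_rate)
    league_avg_rate: league-average value of the same rate stat

    An index above 1.00 means the batters faced were, on average, more
    prone to the outcome (strikeouts, home runs, ...) than the league.
    Weighting by plate appearances is an assumption of this sketch.
    """
    total_pa = sum(pa for pa, _ in matchups)
    if total_pa == 0:
        raise ValueError("pitcher faced no batters")
    weighted_rate = sum(pa * rate for pa, rate in matchups) / total_pa
    return weighted_rate / league_avg_rate


# Hypothetical inputs: (PA against this pitcher, projected K rate per PA).
sample_matchups = [(12, 0.26), (9, 0.15), (15, 0.18), (4, 0.22)]
league_k_rate = 0.17  # made-up league-average strikeout rate

index = opposition_index(sample_matchups, league_k_rate)
print(f"Strikeout opposition index: {index:.2f}")
```

In this made-up case the pitcher saw a strikeout-prone set of hitters, so the index comes out above 1.00 and his raw K/9 would be adjusted downward.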
I repeated this process for every stat that we care about for pitchers.

I considered using Sal Baxamusa’s daily Marcels method to estimate true talent, but eventually landed on using year-end values. I think fellow THTer Colin Wyers put it best when he said, “In the vast majority of cases, on the size of a single season, you’re not going to have a lot of cases where a player’s true-talent level drastically changes midseason. And you’d get better results for rookies: a day-by-day Marcels of, say, Evan Longoria is going to be very inaccurate to begin the season.”

That’s really all there is to this method. If you have any questions you think I didn’t address, feel free to let me know and I’d be happy to answer them. Now let’s check out our quality of opposition adjustments in action.

Johan Santana

+------+---------+-------+-------+--------+-------+-------+
|      |         |       |       | Actual | Adj.  | Adj.  |
| YEAR | LAST    | FIRST | IP    | HR/FB  | HR/FB | Index |
+------+---------+-------+-------+--------+-------+-------+
| 2004 | Santana | Johan | 228.0 |   11.9 |  11.8 |  1.01 |
| 2005 | Santana | Johan | 231.7 |    9.9 |   9.6 |  1.03 |
| 2006 | Santana | Johan | 233.7 |   11.4 |  11.0 |  1.03 |
| 2007 | Santana | Johan | 219.0 |   14.5 |  12.9 |  1.11 |
| 2008 | Santana | Johan | 234.3 |   10.9 |  10.3 |  1.06 |
+------+---------+-------+-------+--------+-------+-------+

One of the concerns some Mets fans had coming into 2008 was that Santana was very prone to the long ball in 2007. Luckily he bounced back in 2008, as we should have expected based on the unstable nature of HR/FB, but if we had had these stats back then, we could have written off nearly all of 2007’s HR/FB as bad luck. Santana was one of the unluckiest pitchers in baseball in 2007 in terms of opposition HR/FB, seeing inflation of 11 percent. If we neutralize his HR/FB, it’s still a little high at 12.9%, but that is much closer to his career line and much easier to chalk up to random variation.
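The neutralizing step can be sketched in a few lines of Python. This assumes the adjusted figure is simply the actual rate divided by the index, which reproduces the Santana table to within rounding; the published values presumably come from unrounded indexes, so small differences (2007 comes out near 13.1 here rather than 12.9) are expected. The function name is made up for illustration.

```python
def neutralize(actual_rate, opposition_index):
    """Strip the opposition effect out of a rate stat.

    Assumption of this sketch: adjusted = actual / index.
    """
    return actual_rate / opposition_index


# (year, actual HR/FB %, index) copied from the Santana table above.
santana = [
    (2004, 11.9, 1.01),
    (2005, 9.9, 1.03),
    (2006, 11.4, 1.03),
    (2007, 14.5, 1.11),
    (2008, 10.9, 1.06),
]

for year, actual, index in santana:
    adjusted = neutralize(actual, index)
    print(f"{year}: actual {actual:.1f}%  adjusted {adjusted:.1f}%")
```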
Dan Haren

+------+-------+-------+-----------+-------+--------+------+-------+
|      |       |       |           |       | Actual | Adj. | Adj.  |
| YEAR | LAST  | FIRST | TEAM      | IP    | K/9    | K/9  | Index |
+------+-------+-------+-----------+-------+--------+------+-------+
| 2005 | Haren | Dan   | Athletics | 217.0 |    6.8 |  7.0 |  0.97 |
| 2006 | Haren | Dan   | Athletics | 223.0 |    7.1 |  7.5 |  0.94 |
| 2007 | Haren | Dan   | Athletics | 222.7 |    7.8 |  8.3 |  0.93 |
| 2008 | Haren | Dan   | D'Backs   | 216.0 |    8.6 |  8.8 |  0.97 |
+------+-------+-------+-----------+-------+--------+------+-------+

Haren is an incredible example of why it’s important to consider context. Looking solely at his K/9, we would have thought he experienced a huge jump in 2008 and, in projecting 2009, would expect a sizable regression. Bill James has him at 7.5, Marcels at 8.1, and Ron Shandler at 8.2.

If we look at his adjusted numbers, though, we see that his strikeout rate has actually been increasing steadily over the past four years, culminating in an 8.8 adjusted K/9 in his age-27 season. And if we were to apply the 0.57 K/9 AL-to-NL adjustment to his Oakland seasons, his 2007 figure would have matched his 2008 one, and he’d have been over 8.0 three years in a row. A simple three-year weighted average would put his K/9 at 8.6, which is much more optimistic, and I believe more accurate, than the three projections listed above.
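The Haren arithmetic can be checked with a short Python sketch. Two things here are assumptions: the league adjustment is applied as a flat +0.57 K/9 to each AL season, and the three-year weights are the Marcel-style 5/4/3 (most recent season first). I haven’t specified the weights above, so treat 5/4/3 as one plausible choice that happens to land on roughly 8.6.

```python
AL_TO_NL_K9 = 0.57  # league-switch adjustment quoted in the text

# Adjusted K/9 and league for Haren's last three seasons, from the table.
haren_adj_k9 = {
    2006: (7.5, "AL"),
    2007: (8.3, "AL"),
    2008: (8.8, "NL"),
}


def to_nl_equivalent(k9, league):
    """Express a K/9 figure in NL terms (additive shift is assumed)."""
    return k9 + AL_TO_NL_K9 if league == "AL" else k9


def weighted_average(values, weights):
    """Plain weighted mean; values and weights are in the same order."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Most recent season first, to line up with the 5/4/3 weights.
nl_k9 = [to_nl_equivalent(*haren_adj_k9[year]) for year in (2008, 2007, 2006)]
projection = weighted_average(nl_k9, (5, 4, 3))  # hypothetical weights

print("NL-equivalent adjusted K/9:", [round(k, 2) for k in nl_k9])
print(f"Three-year weighted K/9: {projection:.2f}")
```

A 3/2/1 weighting would give a slightly higher figure, so the exact projection is sensitive to this choice.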
Javier Vazquez

+------+-------+------+------+------+---------+------+-------+-------+
| YEAR | IP    | QERA | K/9  | BB/9 | K/BB RI | xGB% | BABIP | HR/FB |
+------+-------+------+------+------+---------+------+-------+-------+
| 2006 | 202.7 | 3.84 |  8.2 |  2.5 |    0.59 |   40 | 0.311 |  10.7 |
| 2006 | 202.7 | 3.35 |  9.5 |  2.4 |    0.96 |   40 | 0.301 |   8.0 |
+------+-------+------+------+------+---------+------+-------+-------+
| 2007 | 216.7 | 3.34 |  8.8 |  2.1 |    0.84 |   38 | 0.294 |  12.1 |
| 2007 | 216.7 | 2.86 | 10.4 |  2.0 |    1.25 |   38 | 0.286 |   8.8 |
+------+-------+------+------+------+---------+------+-------+-------+
| 2008 | 208.3 | 3.76 |  8.6 |  2.6 |    0.62 |   39 | 0.320 |  11.3 |
| 2008 | 208.3 | 3.25 |  9.9 |  2.5 |    0.99 |   39 | 0.310 |   8.4 |
+------+-------+------+------+------+---------+------+-------+-------+

Note: To read this table, the first line for each year is Vazquez’s actual numbers. The second line is his adjusted line based on quality of opposition, ballpark factors, and the league change.

Vazquez is often talked about as an unlucky pitcher, but very few analysts notice that Vazquez has also been unlucky in the batters he’s faced. His strikeout numbers have been depressed by four, five and three percent, respectively, from 2006 to 2008. Those numbers aren’t huge in and of themselves, but when you consider the five percent swing from U.S. Cellular to Turner Field and the 0.57 K/9 increase from switching leagues, Vazquez’s adjusted numbers are monstrous.

No matter how much bad luck he faces in terms of HR/FB (which will greatly improve moving away from Chicago), BABIP, or LOB%, I can’t see Vazquez’s ERA being held above 4.00, as it has been in four of the last five years. In fact, his adjusted QERA hasn’t been higher than 3.35 over the past three years, and there’s a good chance his actual ERA ends up there in 2009. Plus, with the strikeout adjustments, he could strike out over 230 batters if he reaches his usual innings total. Huge fantasy value to be had here.
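Here is a hedged Python sketch of how the three strikeout adjustments might stack for Vazquez. The order of operations (opposition first, then a multiplicative park factor, then an additive league shift) is an assumption; it tracks the adjusted K/9 column closely (9.5, 10.3 and 9.9 against the table’s 9.5, 10.4 and 9.9) but is not guaranteed to be the exact calculation. The opposition indexes are inferred from the “four, five and three percent” figures, and the 210-inning workload is a hypothetical round number.

```python
PARK_SWING = 1.05    # U.S. Cellular -> Turner Field strikeout swing
AL_TO_NL_K9 = 0.57   # league-switch adjustment in K/9


def caps_k9(actual_k9, opp_index=1.0, park_factor=1.0, league_shift=0.0):
    """Context-adjusted K/9; adjustments that don't apply keep defaults."""
    return actual_k9 / opp_index * park_factor + league_shift


def projected_strikeouts(k9, innings):
    """Convert a per-nine-innings rate into a strikeout total."""
    return k9 * innings / 9


# year: (actual K/9, opposition index inferred from the text)
vazquez = {2006: (8.2, 0.96), 2007: (8.8, 0.95), 2008: (8.6, 0.97)}

adjusted = {
    year: caps_k9(k9, idx, PARK_SWING, AL_TO_NL_K9)
    for year, (k9, idx) in vazquez.items()
}

for year, k9 in adjusted.items():
    print(f"{year}: CAPS K/9 = {k9:.1f}")

# Hypothetical 210-inning season at the 2008 adjusted rate.
strikeouts = projected_strikeouts(adjusted[2008], 210)
print(f"Projected strikeouts over 210 IP: {strikeouts:.0f}")
```

Because every adjustment is an optional argument, the same function covers a pitcher who needs only the opposition adjustment as well as one who needs all three.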
Caveats

As of right now, the quality of opposition indexes are based on the overall MLB average. I’ve yet to break it down by league (AL/NL) or division, though I may do that sometime in the future. There are a few logistical hurdles I need to clear first, and I’m unsure whether division quality is stable enough from year to year to be useful as a predictive stat.

CAPS

Since we’re working with so many adjustments now, and since not all of them will apply to every player, I’m simply going to call our new stat Context Adjusted Pitching Statistics (CAPS). When you see me refer to, say, Context Adjusted K/9 (or CAPS K/9), it means that I’ve included all adjustments that apply. For some pitchers, that might only be the quality of opposition adjustment. For others, it might mean the opposition adjustment and a ballpark adjustment. For others, like Vazquez, it could mean all three.

To present the results of these adjustments, I’m really leaning towards the format I used for Vazquez above. I think it allows me to include all of the important stats in an easy-to-understand format. I’d really appreciate feedback on this, though, so if you guys don’t agree, please let me know.

Furthermore, while I’ve still got some cool stuff in the works behind the scenes, if you have any ideas for future adjustments you’d like me to tackle, or any other questions about these new adjustments, please don’t hesitate to contact me.

References and resources

The listing of batters each pitcher faced was generated from files from the incomparable Retrosheet. I also received help from fellow THT writers Sal Baxamusa, David Gassko, and, in particular, Colin Wyers. Colin’s willingness to help was incredible for a guy new on the THT scene, so many, many thank-yous, Colin.