Why Oliver Loves Yu
Yu Darvish, 2012 Oliver projection
WAR  ERA   WHIP  W   L  IP   H    HR  BB  SO   HR/9  BB/9  SO/9
6.2  2.57  0.97  16  4  185  138  8   41  198  0.4   2.0   9.6
It looks like Yu broke Oliver. That’s Yu Darvish; Oliver is the engine behind The Hardball Times Forecasts. It’s not the first time this has happened, but when a player so dominates his non-major league competition that his derived major league true talent exceeds generally accepted norms, it offers an opportunity to examine the system and make some changes for the better.
Darvish’s performance against batters in Nippon Professional Baseball (NPB), the world’s second-best professional league, is indeed mind-boggling: consistently low hit, home run and walk totals, with more than a strikeout an inning.
Patrick Newman of npbtracker shows pitch type, velocity and usage rate for pitchers in that league. This past year, Darvish’s fastball sat at 94 to 95 mph, with a low-80s slider and a high-80s change-up. He also mixes in a low-90s cut fastball, a forkball, a shuuto and a slow curve.
Newman also pointed me to Pro Yakyu Nuru Data Okijyo from which I was able to get Darvish’s ground ball rates.
Year  Age  ERA   W   L  IP   H    HR  BB  SO   GB%
2007  20   1.82  15  5  208  123  9   49  210  59.9
2008  21   1.88  16  4  201  136  11  44  208  57.8
2009  22   1.73  15  5  182  118  9   45  167  59.2
2010  23   1.78  12  8  202  158  5   47  222  57.4
2011  24   1.44  18  6  232  156  5   36  276  60.0
Still, the question remains: how accurately can that performance be translated into a major league equivalent? The standard process is to find as many players as possible who have played in both leagues and compare their performance, as a group, in both settings.
If, for example, starting pitchers translate differently from relievers, players can be divided into groups that better fit their role and profile, but at the risk of basing the comparisons on smaller, and thus less reliable, samples.
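A minimal sketch of the matched-pairs idea, assuming a simple record per pitcher. The `Pair` layout, its `role` field and the harmonic-mean weighting are illustrative assumptions (a common choice for matched pairs, not necessarily what Oliver does), and the numbers are made up. Splitting by role shows the trade-off described above: each subgroup’s factor rests on fewer pairs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Pair:
    """One pitcher seen in both leagues (hypothetical record layout)."""
    npb_pa: int       # batters faced in Japan
    npb_events: int   # events of one type (e.g. walks) in Japan
    mlb_pa: int       # batters faced in North America
    mlb_events: int   # the same event type in North America
    role: str = "SP"  # hypothetical grouping label, e.g. "SP" or "RP"


def translation_factor(pairs):
    """Group MLB rate divided by group NPB rate.

    Each pitcher is weighted by the harmonic mean of his two sample sizes,
    so a pair with a tiny sample on either side counts for little.
    """
    num = den = 0.0
    for p in pairs:
        w = 2 * p.npb_pa * p.mlb_pa / (p.npb_pa + p.mlb_pa)
        num += w * p.mlb_events / p.mlb_pa
        den += w * p.npb_events / p.npb_pa
    return num / den


def factors_by_group(pairs, key):
    """One factor per subgroup; smaller groups give noisier factors."""
    groups = defaultdict(list)
    for p in pairs:
        groups[key(p)].append(p)
    return {g: translation_factor(ps) for g, ps in groups.items()}


# Made-up pairs: two starters and one reliever.
pairs = [
    Pair(1000, 250, 500, 100),
    Pair(400, 100, 400, 80),
    Pair(300, 90, 300, 60, "RP"),
]
factors = factors_by_group(pairs, key=lambda p: p.role)  # SP ≈ 0.8, RP ≈ 0.667
```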
Oliver’s Japanese translations are based on the performances of 260 pitchers who have performed on both sides of the Pacific from 1998 to 2011. Of these, 185 have been North American players who have gone to Japan, with 75 Japanese pitchers coming here, but only 28 of those 75 appearing in the major leagues. Since 1998, only five pitchers who were starters in Japan were given starting roles in the majors.
Oliver is rule-based. Given a supply of play-by-play and seasonal data, I write code that describes how different parts of the data relate to one another. If I believe Darvish’s translations are too strong, adjusting the code will also affect every other Japanese pitcher, so changes must balance the performances of everyone in the group. There did appear to be differences depending on whether a pitcher started his career in North America or Japan, and whether he was a starter or a reliever. After adjustments were made for those, Darvish’s projection hardly budged.
With a projected 2.57 ERA, give or take a few tenths, Oliver puts Darvish ahead of every current major league starting pitcher. The Texas Rangers were willing to commit $111 million over the next six years to procure his services, but can he realistically be expected to outperform this projected list of 2012’s top 15 starting pitchers?
ERA   Name
2.75  Clayton Kershaw
2.79  Stephen Strasburg
2.88  Justin Verlander
2.97  Roy Halladay
3.05  Cliff Lee
3.05  Josh Johnson
3.15  Matt Cain
3.16  Jered Weaver
3.17  Felix Hernandez
3.25  Ian Kennedy
3.25  Mat Latos
3.25  Adam Wainwright
3.26  Cole Hamels
3.28  Tim Lincecum
3.33  Michael Pineda
Let’s look at how Oliver’s past projections for Japanese starting pitchers compare with their actual performances. Note that the major league performance is a weighted mean of the player’s first three seasons in the majors, with the first season weighted at 1.0, the second at 0.7 and the third at 0.5. This is the reverse of the ordering used to weight past seasons when generating the projections. No minor league data are included. Also, the projected ERA is based on the expected wOBA allowed, while the major league ERA is the actual figure and is not park-adjusted.
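The three-season weighting can be sketched as follows. The weights come from the paragraph above; summing weighted counts and weighted batters faced, rather than averaging three separate rates, is an assumption, and the walk totals are hypothetical.

```python
# Weights for a pitcher's first, second and third MLB seasons, per the text.
WEIGHTS = (1.0, 0.7, 0.5)


def weighted_rate(seasons):
    """Weighted rate over the first three MLB seasons.

    seasons: list of (events, batters_faced) tuples in chronological order.
    zip() stops at the shorter input, so a pitcher with only one or two
    MLB seasons simply uses the weights that apply to him.
    """
    num = sum(w * events for w, (events, _) in zip(WEIGHTS, seasons))
    den = sum(w * bf for w, (_, bf) in zip(WEIGHTS, seasons))
    return num / den


# Hypothetical walk totals: 60 in 600 BF, 40 in 500 BF, 20 in 400 BF.
bb_rate = weighted_rate([(60, 600), (40, 500), (20, 400)])
print(round(bb_rate, 3))  # 0.085
```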
Kei Igawa
                 Size  ERA   BH%    HR%    BB%    SO%
Projection       1788  3.89  0.297  0.046  0.072  0.218
MLB 1st 3 years  330   6.54  0.317  0.064  0.109  0.161
Igawa was signed by the Yankees in 2007 and was expected to provide an above-average number of strikeouts, although accompanied by a few extra home runs. Maybe the pressure of working for George Steinbrenner was too much; Igawa allowed far too many walks and long balls and lasted only 12 starts that year and one the next before returning to Japan.
Kaz Ishii
                 Size  ERA   BH%    HR%    BB%    SO%
Projection       1547  3.96  0.284  0.048  0.119  0.246
MLB 1st 3 years  1525  4.25  0.279  0.042  0.144  0.191
Ishii signed with the Dodgers in 2002, spending three years in their rotation. After one more with the Mets, he also returned to Japan. Wild in Japan, he walked even more here and also underperformed his projected strikeout rate, although the ERA projection was fairly close.
Kenshin Kawakami
                 Size  ERA   BH%    HR%    BB%    SO%
Projection       1381  3.50  0.284  0.044  0.046  0.205
MLB 1st 3 years  943   4.22  0.295  0.032  0.071  0.157
Kawakami joined the Braves in 2009 and had a respectable 3.86 ERA, but suffered through a 1-10, 5.15 year in 2010, then spent the entire 2011 season in the minors. He walked more and struck out fewer than projected (I’m beginning to notice a pattern).
Hiroki Kuroda
                 Size  ERA   BH%    HR%    BB%    SO%
Projection       1685  3.54  0.278  0.037  0.048  0.167
MLB 1st 3 years  1520  3.65  0.283  0.025  0.045  0.170
Kuroda delivered four quality seasons for the Dodgers from 2008 to 2011, almost exactly matching his projection, and just signed a one-year, $10 million deal with the Yankees.
Daisuke Matsuzaka
                 Size  ERA   BH%    HR%    BB%    SO%
Projection       1630  2.77  0.273  0.030  0.061  0.245
MLB 1st 3 years  1517  4.01  0.295  0.039  0.105  0.221
The Japanese import everyone loves to hate, Matsuzaka did have two solid seasons for the Red Sox, in 2007 and 2008, but injuries have kept him sidelined or ineffective for the past three years. After showing fine control in his last two years in Japan, he has issued an above-average number of walks in the majors.
Hideki Irabu
                 Size  ERA   BH%    HR%    BB%    SO%
Projection       1658  3.19  0.281  0.028  0.100  0.258
MLB 1st 3 years  1125  4.94  0.283  0.058  0.085  0.187

Hideo Nomo
                 Size  ERA   BH%    HR%    BB%    SO%
Projection       1707  4.40  0.291  0.040  0.157  0.243
MLB 1st 3 years  1884  3.16  0.269  0.035  0.094  0.275

Colby Lewis
                 Size  ERA   BH%    HR%    BB%    SO%
Projection       1479  3.26  0.302  0.034  0.039  0.230
MLB 1st 3 years  1431  4.03  0.273  0.046  0.072  0.220
I looked at three more pitchers: Hideo Nomo and Hideki Irabu from the 1990s, and Colby Lewis, who, after never experiencing any success in the majors, spent 2008 and 2009 in Japan before returning to pitch the past two years for the Rangers.
Irabu issued fewer walks but also fewer strikeouts than expected, and couldn’t avoid the long ball. Nomo was very wild in Japan but pitched much better than expected in the major leagues. Lewis’ strikeout rates were as expected, but his walks jumped up.
Hisanori Takahashi
                 Size  ERA   BH%    HR%    BB%    SO%
Projection       1355  4.27  0.292  0.047  0.066  0.175
MLB 1st 3 years  713   3.60  0.294  0.037  0.068  0.215

Ken Takahashi
                 Size  ERA   BH%    HR%    BB%    SO%
Projection       940   5.28  0.293  0.052  0.088  0.133
MLB 1st 3 years  116   2.96  0.280  0.026  0.113  0.200

Koji Uehara
                 Size  ERA   BH%    HR%    BB%    SO%
Projection       872   3.65  0.290  0.050  0.037  0.201
MLB 1st 3 years  522   3.34  0.282  0.043  0.036  0.248

Keiichi Yabu
                 Size  ERA   BH%    HR%    BB%    SO%
Projection       1030  4.30  0.284  0.041  0.076  0.149
MLB 1st 3 years  262   4.50  0.330  0.033  0.089  0.170
These last four were all primarily starting pitchers in Japan, but did most or all of their major league pitching out of the bullpen. All showed better-than-expected strikeout rates, with Uehara almost doubling his rate after the Orioles removed him from the rotation.
It is well established that, on average, pitchers perform better out of the bullpen. Tango calls it his rule of 15: home runs and walks down 15 percent, strikeouts up 15 percent. I believe I can improve the Japanese translation factors by putting stats compiled as a starter and as a reliever on the same baseline before compiling sets of matched pairs. Where I have play-by-play data from Gameday, I am able to tabulate how each pitcher has performed as a starter and as a reliever; those splits then need to be regressed toward the standard splits. However, the available season-level stats from Japan do not offer this breakdown. The number of innings pitched as a starter and as a reliever can be estimated, but the Japanese leagues have not published games started for the past three seasons.
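A sketch of putting starting and relief work on one baseline before building matched pairs. It treats the 15 percent figures from the paragraph above as adjustable parameters, skips the regression toward standard splits that the text mentions, and uses a hypothetical function name and made-up inputs.

```python
# Relief performance relative to starting, per the rule quoted above:
# HR and BB about 15% lower, SO about 15% higher. Treat these as parameters.
RELIEF_EFFECT = {"HR": 0.85, "BB": 0.85, "SO": 1.15}


def starter_equivalent_rate(stat, sp_events, sp_bf, rp_events, rp_bf,
                            effect=RELIEF_EFFECT):
    """Pool starting and relief work into one starter-baseline rate.

    Relief events are divided by the relief effect, so, for example,
    115 relief strikeouts count as 100 starter-equivalent strikeouts.
    """
    rp_as_sp = rp_events / effect[stat]
    return (sp_events + rp_as_sp) / (sp_bf + rp_bf)


# Hypothetical swingman: 200 SO in 1,000 BF starting, 150 SO in 500 BF in relief.
so_rate = starter_equivalent_rate("SO", 200, 1000, 150, 500)
print(round(so_rate, 3))  # 0.22
```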
The records for the eight starting pitchers above suggest that the translation factors Oliver currently uses are too generous. As a group, the ratios of the eight pitchers’ observed major league performance to their projections were 0.99 for base hits (BABIP), 1.11 for home runs, 1.24 for walks and 0.91 for strikeouts. But how much more should we trust the record of eight starting pitchers in the majors than that of the 75 Japanese pitchers who have pitched in the minors and majors over the past 13 seasons? And how different should we expect them to be from the 185 pitchers who have left here for Japan?
Yu Darvish
                 Size  ERA   BH%    HR%    BB%    SO%
Projection       1799  2.57  0.280  0.019  0.058  0.272
Adjusted                     0.278  0.021  0.071  0.248
The first line is Darvish’s current Oliver projection; the second shows his rate stats adjusted by the ratios observed for those eight starters (still very good).
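The adjustment can be sketched as a simple rescaling: multiply each projected rate by the observed-to-projected ratio for the eight starters. That this is exactly how the Adjusted row was produced is an assumption; the dictionary names are hypothetical, and the values are taken from the text and table.

```python
# Observed MLB / projected ratios for the eight starters, as reported above.
GROUP_RATIOS = {"BH": 0.99, "HR": 1.11, "BB": 1.24, "SO": 0.91}

# Darvish's current Oliver projection (rates per the table above).
DARVISH = {"BH": 0.280, "HR": 0.019, "BB": 0.058, "SO": 0.272}


def adjust(projection, ratios):
    """Scale each projected rate by the group's observed/projected ratio."""
    return {stat: rate * ratios[stat] for stat, rate in projection.items()}


adjusted = adjust(DARVISH, GROUP_RATIOS)
for stat, rate in adjusted.items():
    print(stat, round(rate, 3))
```

The results land within about 0.001 of the Adjusted row (for example, 0.272 times 0.91 is about 0.248); the small gaps are consistent with the ratios and rates having been rounded before publication.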
These are Darvish’s top comparables using his current projection. Their composite ERA is higher than 2.57, but the top five still put him right at the top with Kershaw and Strasburg, while the larger samples of comps still rate high enough to rank him fifth or sixth in the major leagues.
Rank  Name                Season  ERA   BH%    HR%    BB%    SO%
1     Martinez, Pedro     2004    2.55  0.288  0.020  0.056  0.285
2     Verlander, Justin   2012    2.87  0.281  0.033  0.064  0.263
3     Johnson, Randy      2005    2.96  0.290  0.034  0.054  0.272
4     Santana, Johan      2007    2.78  0.274  0.039  0.056  0.269
5     Kershaw, Clayton    2012    2.75  0.284  0.024  0.078  0.274
6     Prior, Mark         2003    3.19  0.302  0.032  0.073  0.278
7     Schmidt, Jason      2004    2.97  0.283  0.028  0.074  0.247
8     Peavy, Jake         2008    3.47  0.304  0.034  0.063  0.254
9     Greinke, Zack       2010    3.20  0.307  0.029  0.058  0.253
10    Lincecum, Tim       2012    3.27  0.300  0.030  0.084  0.268
11    Schilling, Curt     2005    3.02  0.292  0.039  0.042  0.248
12    Matsuzaka, Daisuke  2008    3.29  0.283  0.038  0.072  0.243
13    Hamels, Cole        2008    3.52  0.290  0.043  0.070  0.246
14    Bedard, Erik        2008    3.39  0.303  0.031  0.079  0.250
Top 5                             2.78  0.283  0.030  0.062  0.273
Top 10                            3.00  0.291  0.030  0.066  0.266
All                               3.09  0.292  0.033  0.066  0.261
Now, using the adjusted projection, the composite ERA of the top five comps again puts Darvish fifth or sixth, while the larger lists drop him closer to 15th.
Rank  Name                Season  ERA   BH%    HR%    BB%    SO%
1     Schmidt, Jason      2004    2.97  0.283  0.028  0.074  0.247
2     Martinez, Pedro     2006    3.01  0.281  0.032  0.059  0.243
3     Matsuzaka, Daisuke  2008    3.29  0.283  0.038  0.072  0.243
4     Verlander, Justin   2012    2.87  0.281  0.033  0.064  0.263
5     Latos, Mat          2012    3.25  0.290  0.032  0.069  0.234
6     Hanson, Tommy       2012    3.43  0.285  0.036  0.072  0.233
7     Peavy, Jake         2010    3.40  0.298  0.034  0.076  0.243
8     Lester, Jon         2011    3.34  0.298  0.029  0.083  0.241
9     Hamels, Cole        2008    3.52  0.290  0.043  0.070  0.246
10    Kennedy, Ian        2012    3.24  0.277  0.036  0.071  0.226
11    Jimenez, Ubaldo     2012    3.49  0.295  0.024  0.091  0.240
12    Scherzer, Max       2011    3.59  0.296  0.038  0.084  0.249
13    Kershaw, Clayton    2012    2.75  0.284  0.024  0.078  0.274
14    Bedard, Erik        2009    3.57  0.296  0.036  0.083  0.237
15    Beckett, Josh       2005    3.50  0.303  0.031  0.077  0.238
16    Santana, Johan      2009    3.37  0.286  0.043  0.059  0.235
Top 5                             3.08  0.284  0.032  0.068  0.246
Top 10                            3.23  0.287  0.034  0.071  0.242
All                               3.29  0.289  0.033  0.074  0.243
For the final set of comparable projections, I used a defense-independent approach, using only ground ball, walk and strikeout rates. Assuming that major league baseball has a slightly lower ground ball rate than NPB, I found Darvish’s top comps using a ground ball rate of 0.55, a walk rate of 0.071 and a strikeout rate of 0.248. There’s little difference among the different-sized groups, each with a composite ERA outside major league baseball’s top 15, but much of the ERA difference between this and the previous sets of comps is in the home run rate, which is almost 50 percent higher here than in Oliver’s projection.
Rank  Name                Season  ERA   GB%   BH%    HR%    BB%    SO%
1     Liriano, Francisco  2007    3.58  0.53  0.304  0.037  0.087  0.254
2     Hernandez, Felix    2011    3.16  0.54  0.287  0.026  0.071  0.219
3     Burnett, A.J.       2008    3.81  0.55  0.295  0.037  0.082  0.217
4     Jimenez, Ubaldo     2011    3.18  0.52  0.284  0.020  0.097  0.240
5     Lester, Jon         2011    3.34  0.51  0.298  0.029  0.083  0.241
6     Wainwright, Adam    2011    3.12  0.51  0.295  0.028  0.061  0.226
7     Garcia, Jaime       2012    3.64  0.54  0.310  0.027  0.069  0.201
8     Carpenter, Chris    2006    3.27  0.54  0.292  0.031  0.052  0.205
9     Zambrano, Carlos    2006    3.23  0.51  0.276  0.023  0.088  0.215
10    Chacin, Jhoulys     2012    3.61  0.52  0.271  0.033  0.105  0.213
11    Halladay, Roy       2012    2.96  0.52  0.305  0.024  0.034  0.216
12    Wilson, C.J.        2012    3.47  0.51  0.290  0.024  0.089  0.212
Top 5                             3.41  0.53  0.294  0.030  0.084  0.234
Top 10                            3.39  0.53  0.291  0.029  0.079  0.223
All                               3.36  0.53  0.292  0.028  0.076  0.221
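A rough sketch of a defense-independent comp search: rank seasons by a scaled distance to Darvish’s target ground ball, walk and strikeout rates, then average the ERAs of the closest group. The distance metric and the `SCALE` spreads are hypothetical, so this will not necessarily reproduce the ranking above. Averaging the ERAs of the first five listed seasons does give 3.414, matching the Top 5 row, which suggests the composites are simple unweighted means.

```python
import math

# Target rates for Darvish, from the paragraph above.
TARGET = {"GB": 0.55, "BB": 0.071, "SO": 0.248}
# Hypothetical spreads used to put the three rates on a common scale.
SCALE = {"GB": 0.05, "BB": 0.015, "SO": 0.03}


def distance(stats, target=TARGET, scale=SCALE):
    """Scaled Euclidean distance between a season and the target rates."""
    return math.sqrt(sum(((stats[k] - target[k]) / scale[k]) ** 2 for k in target))


def top_comps(seasons, n):
    """Return the n seasons closest to the target, nearest first."""
    return sorted(seasons, key=lambda s: distance(s[1]))[:n]


def composite_era(comps):
    """Unweighted mean ERA of a group of comparable seasons."""
    return sum(stats["ERA"] for _, stats in comps) / len(comps)


# First five rows of the table above.
SEASONS = [
    ("Liriano 2007", {"GB": 0.53, "BB": 0.087, "SO": 0.254, "ERA": 3.58}),
    ("Hernandez 2011", {"GB": 0.54, "BB": 0.071, "SO": 0.219, "ERA": 3.16}),
    ("Burnett 2008", {"GB": 0.55, "BB": 0.082, "SO": 0.217, "ERA": 3.81}),
    ("Jimenez 2011", {"GB": 0.52, "BB": 0.097, "SO": 0.240, "ERA": 3.18}),
    ("Lester 2011", {"GB": 0.51, "BB": 0.083, "SO": 0.241, "ERA": 3.34}),
]

print(round(composite_era(SEASONS), 2))  # 3.41
```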
Yu Darvish is clearly a very talented pitcher, talented enough that the Texas Rangers were willing to put $51 million down and $60 million over the next six years to have him in their starting rotation. Just how well his future major league performance can be projected is as much art as science: the available methods differ, and even small changes in estimated base hits allowed can move the ERA estimate by a few tenths. Oliver has had a good record so far, as with Stephen Strasburg and Ian Kennedy. However, players show some natural variance from year to year, as well as changes in their true talent.
Examining several sets of comparable pitchers yields an expected ERA for Darvish anywhere from 2.78 to 3.40, ranging from excellent down to merely very good, but no recent major league pitcher has the combination of Darvish’s expected home run, walk and strikeout rates. Looking at those comparables and Darvish’s pitch metrics gives me a personal opinion: I would compare him to Felix Hernandez with more strikeouts, or Ubaldo Jimenez with fewer walks.
Meanwhile, as these customized estimates all gave a higher ERA projection than Oliver, I’ll retreat to my office, where the first items on the drawing board are using ground ball rates to set the regression means for base hit and home run rates, and considering pitching as a starter and as a reliever separately.
Great article, Brian! However, Kei did not return to Japan. He still pitched in the Yankees org and is now a FA looking for work. He reportedly does not want to go back to the NPB and wants to play MLB.
Tom Tango’s rule for converting relief performance to starter performance is actually a “rule of 17,” not a “rule of 15.”
http://www.insidethebook.com/ee/index.php/site/comments/starter_v_relief_1953_2008/
Yeah, it’s the rule of 17 and the walk rate doesn’t change – only BABIP, K/PA, and HR/PA.
Really interesting article. Looking forward to the Oliver updates.
Sorry, should have googled for Tango’s rule. 15 was in my head because that’s what my own research found, but I didn’t find as much effect on BABIP, which was -4% for relievers, HR -15%, BB +2%, SO +14%
The problem with Darvish is not his physical skills, but between his ears. One has to live in Japan to see just how entitled the players feel about themselves. Darvish is nowhere near Matsuzaka’s arrogance level, and he seems to be much smarter, but that really isn’t saying that much. He also gets away with a lot of mistake pitches that MLB players should tee off on. If Darvish really takes coaching well, he could be great, sure. The Rangers are a really good fit for him, especially with Nolan Ryan around.
If Darvish’s rate of getting away with mistake pitches is the same as other pitchers in Japan, then it’s part of the translation factors already. The problem in projecting is when a pitcher does something like that consistently differently than others.
It would be very impressive if Darvish can produce a 0.4 HR/9 in Arlington. His season ought to be fascinating to follow.
I love seeing that my NL-only fantasy team contains four of the top 15 projected MLB ERA leaders (Strasburg, J. Johnson, Latos and Wainwright), all for a total of less than $20. I sure hope those numbers come to pass.
What was his HR% in Japan?
Brian,
Thanks for the great article and groundball info which has been impossible to find.
Here is an issue I see with the numbers, though. Your BB% and SO% for the Darvish projection in the middle of the page seem to be based on a different number of total batters faced when compared against the projection at the top of the page.
In other words you have him listed at 198 K’s and a 9.6 K/9, which would suggest around 729 batters faced to come up with a SO% of .272. However, the 41 BBs and 2.0 BB/9 rate suggest that it is based on a total batter count of about 708 batters to get the BB% of .058. Is this just an odd quirk of Oliver in the way that the percentages are calculated or are they supposed to be based on the same number of total batters and therefore incorrect?
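The commenter’s check reduces to dividing each projected count by its projected rate, using the values from the projection line at the top of the article. The helper names are hypothetical. As the reply explains, the rates were adjusted separately for park and league, so the two implied totals need not agree.

```python
def implied_batters_faced(count, rate):
    """Batters faced implied by an event count and its per-PA rate."""
    return count / rate


def per_nine(count, innings):
    """Events per nine innings."""
    return 9 * count / innings


so_bf = implied_batters_faced(198, 0.272)  # about 728
bb_bf = implied_batters_faced(41, 0.058)   # about 707
print(round(so_bf), round(bb_bf), round(per_nine(198, 185), 1))  # 728 707 9.6
```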
The projected BB% and SO% will differ from the raw original data, as they’ve been adjusted for park and league. I then compared the translated data to actual MLB performance to check the accuracy of the translation.
Darvish’s unadjusted rate stats
Year HR/BC BB/PA SO/PA HR/9 BB/9 SO/9
2007 .017 .061 .266 0.4 2.1 9.1
2008 .022 .058 .272 0.5 2.0 9.3
2009 .019 .064 .238 0.4 2.2 8.3
2010 .010 .058 .276 0.2 2.1 9.9
2011 .009 .041 .312 0.2 1.4 10.7
NPB adopted a new ball standard in 2011, which dropped the HR% to 64% of its previous level, so the .009 HR/BC in 2011 is equivalent to about .014 in the other seasons.
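The ball adjustment is a single division by the 64 percent factor quoted above; the constant and function names here are hypothetical.

```python
# 2011 NPB home run rate relative to earlier seasons after the ball change,
# per the reply above.
BALL_FACTOR_2011 = 0.64


def normalize_2011_hr(rate_2011, factor=BALL_FACTOR_2011):
    """Express a 2011 HR rate on the pre-2011 ball's scale."""
    return rate_2011 / factor


print(round(normalize_2011_hr(0.009), 3))  # 0.014
```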
When I do all the projections, I use a set of 260 pitchers who have pitched in both Japan and the US from 1998 to 2011. The factors are calculated so that the average error is zero, with the plus errors cancelling out the minus errors. Regression helps reduce the total error (which does not care whether an error is high or low) by pulling all the projections closer to the center, thus reducing the outliers.
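A toy sketch of the two ideas in this reply: a factor chosen so that errors cancel on average, and regression toward the group mean, which pulls in outliers without moving the average. The data, the 0.7 weight and the function names are hypothetical; Oliver’s actual weighting and regression scheme are not described here. Regression helps in this example because the toy outcomes are less spread out than the projections, and it would not help if the projections were already appropriately spread.

```python
from statistics import mean


def zero_bias_factor(projected, actual):
    """Scale factor f such that mean(f * p - a) is zero."""
    return mean(actual) / mean(projected)


def regress(values, weight):
    """Keep `weight` (0..1) of each value's distance from the group mean."""
    center = mean(values)
    return [center + weight * (v - center) for v in values]


def mean_abs_error(pred, actual):
    """Average size of the errors, ignoring whether they are high or low."""
    return mean(abs(p - a) for p, a in zip(pred, actual))


# Toy rates in which actual outcomes are less spread out than projections.
projected = [0.10, 0.20, 0.30]
actual = [0.13, 0.24, 0.29]

f = zero_bias_factor(projected, actual)
scaled = [f * p for p in projected]
shrunk = regress(scaled, weight=0.7)  # hypothetical regression weight

print(round(mean_abs_error(scaled, actual), 4), round(mean_abs_error(shrunk, actual), 4))  # 0.0267 0.0133
```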