Minor league run environments

Minor league run environments vary substantially from league to league.  As a result, any time we evaluate a minor leaguer’s hitting or pitching stats, we need to consider the context of those performances.  Alex Pedicini had a brief but nice series at Hardball Times breaking down these league differences, but I wanted to take a deeper look at run environments, including an investigation of how they would model using Base Runs.

More on Base Runs later.  Let’s start with a graph looking at how the run environments of the minor leagues (and major leagues) vary:

[Figure: runs per game by league, major and minor leagues, 2007-2009]

The first thing to take from that graph is how much the run environments of these leagues vary.  The NL and AL are fairly intermediate, while the minor leagues vary by about a half-run per game in either direction.  The Florida State League (high Single-A) is a notorious pitchers’ league, but I was surprised to see the International League (Triple-A) virtually tied with the Gulf Coast League (low Rookie) for the second-lowest run environment.  Maybe I need to give the Reds’ Triple-A Louisville hitters a bit more credit for their production (and be more cautious about their Triple-A pitchers!).  At the other end of the spectrum are the high-scoring leagues: the California League (high Single-A), the Arizona Summer League (Rookie), and the Pioneer League (Rookie) all averaged more than five runs per game from 2007-2009.

Here is a sampling of offensive statistics from each of these leagues so that we can see where the differences come from (again, using 2007-2009 data).

 

League Level R/G BsR/G AVG OBP SLG HR% niBB% E% SBA/OPP DER
American MLB 4.8 4.7 0.268 0.334 0.424 2.7% 8.1% 1.5% 7.1% 0.698
National MLB 4.6 4.6 0.262 0.328 0.415 2.6% 8.2% 1.6% 6.8% 0.701
Pacific Coast AAA 5.1 5.0 0.276 0.342 0.433 2.5% 8.5% 2.1% 7.9% 0.684
International AAA 4.4 4.4 0.262 0.327 0.398 2.1% 8.1% 2.2% 9.5% 0.693
Mexican AAA 5.2 5.2 0.295 0.361 0.426 2.0% 8.4% 2.3% 7.2% 0.675
Southern AA 4.5 4.5 0.260 0.332 0.391 1.9% 9.0% 2.4% 9.6% 0.693
Eastern AA 4.5 4.5 0.261 0.332 0.395 2.0% 8.7% 2.4% 8.0% 0.695
Texas AA 4.8 4.7 0.267 0.337 0.403 2.1% 8.7% 2.4% 9.2% 0.692
Carolina A+ 4.6 4.6 0.259 0.330 0.390 1.8% 8.6% 2.8% 12.2% 0.694
Florida State A+ 4.2 4.2 0.256 0.324 0.374 1.6% 8.2% 2.8% 10.3% 0.695
California A+ 5.3 5.1 0.271 0.339 0.418 2.2% 8.4% 2.9% 10.3% 0.677
Midwest A 4.4 4.3 0.254 0.323 0.372 1.6% 8.2% 3.2% 11.6% 0.691
South Atlantic A 4.7 4.6 0.258 0.326 0.383 1.7% 8.1% 3.2% 11.7% 0.688
Northwest ss-A 5.0 5.0 0.259 0.340 0.379 1.5% 9.5% 3.4% 11.4% 0.680
NY-PA ss-A 4.4 4.3 0.250 0.323 0.360 1.3% 8.5% 3.4% 11.2% 0.689
Pioneer Rook 5.5 5.5 0.274 0.344 0.417 2.0% 8.5% 3.8% 11.2% 0.665
Gulf Coast Rook 4.4 4.3 0.248 0.324 0.350 1.1% 8.7% 3.9% 12.7% 0.693
Appalachian Rook 4.9 4.8 0.259 0.329 0.383 1.7% 8.1% 4.1% 11.5% 0.679
Arizona Smr Rook 5.5 5.3 0.264 0.342 0.377 1.1% 9.2% 4.6% 14.5% 0.664
Venezuelan Smr F-Rook 4.9 5.0 0.259 0.341 0.355 1.1% 8.9% 4.8% 12.5% 0.695
Dominican Smr F-Rook 4.8 4.8 0.239 0.335 0.324 0.7% 10.6% 5.2% 16.0% 0.700

 

These are substantial differences.  For example, both the Florida State League and the California League are high Single-A leagues, and thus (probably) have roughly equivalent talent levels.  What these data are saying, therefore, is that you could take an average hitter from the Florida State League (hitting .256/.324/.374) and move him to the California League, where he’d “improve” to .271/.339/.418.  The reason?  Nothing to do with the player himself.  Rather, it’s probably some combination of environmental factors (humidity, altitude) and ballparks.  Nevertheless, it means that you have to be very careful about how you interpret a player’s statistics coming from these leagues.
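One common way to put lines from different leagues on the same footing is to express them relative to the league average, in the style of an unadjusted OPS+.  This is only a sketch and not part of the analysis here: it assumes the league OBP and SLG figures from the table above, ignores park effects and any talent gap between leagues, and the function name is purely illustrative.

```python
# League-average OBP and SLG, 2007-2009, copied from the table above.
LEAGUE_AVG = {
    "Florida State": (0.324, 0.374),
    "California": (0.339, 0.418),
}


def league_relative_ops(obp: float, slg: float, league: str) -> float:
    """Unadjusted OPS+-style index: 100 means league average.

    No park factor is applied, so this only corrects for the league
    run environment, not for an individual ballpark.
    """
    lg_obp, lg_slg = LEAGUE_AVG[league]
    return 100.0 * (obp / lg_obp + slg / lg_slg - 1.0)


# The same .339 OBP / .418 SLG line is league average in the California
# League but clearly above average in the Florida State League.
in_cal = league_relative_ops(0.339, 0.418, "California")
in_fsl = league_relative_ops(0.339, 0.418, "Florida State")
```

The index puts the two leagues on a common scale, which is the kind of context adjustment the raw slash lines hide.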

It’s also the case that hitter leagues don’t “achieve” those run environments in the same way. The Pacific Coast League gets its five-plus runs per game thanks to the highest minor league home run rate and the second-highest OBP, despite relatively low error rates. The Arizona Summer League, in contrast, has pretty high AVG and OBP and very high error rates (better only than the foreign rookie leagues), but relatively weak power totals (including one of the lowest home run rates in the minors at 1.1 percent of plate appearances). The Dominican Summer League is one of the more interesting: the worst AVG, SLG, and HR percentage in baseball, but also the highest walk rate, error rate, and stolen base attempt rate you will find. Small ball is alive and well in the Dominican Republic!

Speaking of error rates, another interesting finding from this table can be seen in the error rate (E%) column.  This shows the percentage of plate appearances that involve an error, and you can see that there is a virtually perfect relationship between error rate and quality of play. This has been observed before (see also this related piece by Harry), but I was surprised how strong the relationship is. Better-quality leagues have fewer errors.  You see similar, though not as strong, relationships between league quality and stolen bases attempted per opportunity (SBA/OPP; more attempts at younger levels, almost without regard to run environment), home run rate (HR%; better leagues have more home runs, probably a reflection of hitter quality), and (perhaps) unintentional walk rate (niBB%; more walks at lower levels, probably due to young pitchers with lousy command).

 

The final thing I’d like you to take away from this is that “my” Base Runs model is doing a nice job of estimating the runs produced within each league.  What is Base Runs?  It’s the best available run estimator today.  If you’re not familiar with it, Patriot wrote probably the best introduction, while Tango’s series really demonstrates its power over other run estimators.  Briefly, though, Base Runs is a simple approach to modeling run scoring in baseball, and written in English looks like:

 

    [Baserunners] * [Baserunner Scoring Rate] + [Home Runs] = Runs

 

The major innovation of Base Runs over earlier estimators like Runs Created was the special treatment given to the home run.  This helps it handle a much wider range of offensive environments than any other run estimator.

What I did was start with a Base Runs equation (from this work by Tango) and tweak it (equation shown in the resources below) to accurately predict linear weights values for MLB 2007-2008 (kindly provided by Colin Wyers via e-mail).  With that equation in hand, I then ran it on 2007-2009 totals for both of the major leagues as well as all of their affiliates.  I am using the exact same equation in each league, and yet the square root of the mean square error is just 0.06 runs per game.  That figure is after adjusting for an average shortfall of about 0.05 runs per game.  I’m including as many events as I can find (passed balls, wild pitches, errors, etc.), but there are ways to score runs that I don’t have in my minor league data set, and perhaps as a result I’m missing about eight runs per 162 games from each league.
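The shortfall and the adjusted error can be roughly re-created from the R/G and BsR/G columns of the table above.  This is a sketch under one assumption: that “adjusting for the shortfall” means subtracting the mean error before taking the root mean square.  Because the table values are rounded to one decimal, the result only approximates the reported figures: the shortfall comes out near 0.05 runs per game and the adjusted error near 0.07, a little above the 0.06 obtained from unrounded data.

```python
import math

# (league, actual R/G, Base Runs R/G), rounded values from the table above.
LEAGUES = [
    ("American", 4.8, 4.7), ("National", 4.6, 4.6),
    ("Pacific Coast", 5.1, 5.0), ("International", 4.4, 4.4),
    ("Mexican", 5.2, 5.2), ("Southern", 4.5, 4.5),
    ("Eastern", 4.5, 4.5), ("Texas", 4.8, 4.7),
    ("Carolina", 4.6, 4.6), ("Florida State", 4.2, 4.2),
    ("California", 5.3, 5.1), ("Midwest", 4.4, 4.3),
    ("South Atlantic", 4.7, 4.6), ("Northwest", 5.0, 5.0),
    ("NY-PA", 4.4, 4.3), ("Pioneer", 5.5, 5.5),
    ("Gulf Coast", 4.4, 4.3), ("Appalachian", 4.9, 4.8),
    ("Arizona Smr", 5.5, 5.3), ("Venezuelan Smr", 4.9, 5.0),
    ("Dominican Smr", 4.8, 4.8),
]


def shortfall_and_adjusted_rmse(rows):
    """Return (mean shortfall, RMSE after removing that shortfall)."""
    errors = [actual - estimate for _, actual, estimate in rows]
    bias = sum(errors) / len(errors)  # average underestimate per game
    mse = sum((e - bias) ** 2 for e in errors) / len(errors)
    return bias, math.sqrt(mse)


bias, rmse = shortfall_and_adjusted_rmse(LEAGUES)
runs_per_162 = bias * 162  # shortfall expressed per 162 games
```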

 

One of the nice things about having a good-performing Base Runs equation like this is that one can use it to produce league-specific linear weights.  With the aid of Patriot’s spreadsheet that automates this process, here are the linear weights (in absolute runs) in a Google Spreadsheet.  How well they work depends on how well the Base Runs equation is working: we’re getting good matches in overall runs estimates, but the truth is that I don’t really know whether the values at the individual-event level are correct (except that they are probably very close for the major leagues).  However, in the absence of play-by-play data (maybe someday I’ll try to get there with Gameday), this is probably the next best approach to linear weight generation.  At least for players who don’t have particularly unique skill sets, using these linear weights should give you a good estimate of their absolute runs production.
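One standard way to pull linear weights out of a Base Runs equation is to take the partial derivative of BsR with respect to each event count, evaluated at the league totals.  The sketch below assumes the form BsR = A * B / (B + C) + D given in the resources; the lowercase symbols for a single event’s coefficients and n_i for its count are illustrative notation.  Whether Patriot’s spreadsheet uses exactly this derivative or the closely related “add one event and recompute” difference is not stated here.

```latex
\[
A = \sum_i a_i n_i,\quad
B = \sum_i b_i n_i,\quad
C = \sum_i c_i n_i,\quad
D = \sum_i d_i n_i
\]
\[
\mathrm{LW}_i
  = \frac{\partial\,\mathrm{BsR}}{\partial n_i}
  = \frac{a_i\,B\,(B + C) + A\,C\,b_i - A\,B\,c_i}{(B + C)^2} + d_i
\]
```

Because A, B, and C are league totals, the same event gets a different weight in each league, which is what makes the weights league-specific.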

References & Resources

All minor league data were pulled from baseball-reference.com.

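The Base Runs equation I used, BsR = A * B / (B + C) + D, was modified from one by Tango.  Each of A, B, C, and D is the sum, over all events, of the event count times the coefficient in that column: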

 

EVENT A B C D
1B 1 0.726 0 0
2B 1 2.050 0 0
3B 1 3.280 0 0
HR 0 1.850 0 1
SB 0 0.790 0 0
CS 0 -0.825 1 0
NIBB 1 0.100 0 0
SO 0 -0.057 1 0
GDP 0 -0.825 1 0
HBP 1 0.220 0 0
SH 0.08 0.727 0.92 0
IBB 1 -0.450 0 0
nonK out 0 -0.004 1 0
BK 0 1.060 0 0
WP 0 1.125 0 0
ROE 1 0.950 0 0
OthrE 0 1.138 0 0
PB 0 1.150 0 0

 

The major differences are that I separated the error terms (“Error” in Tango’s work follows Retrosheet and refers only to Reached On Errors as far as I can tell), added GDPs, and tweaked the B coefficients. The largest tweak was on non-intentional walks, which saw their coefficient nearly doubled. I don’t claim that my approach was particularly scientific here, but again, the equation now does a good job of matching Colin Wyers’ empirically derived 2007-2008 linear weights when run on 2007-2008 MLB data, and in general it seems to get you to within an average of 0.11 runs per game across all leagues. Of that, 0.05 is a systematic underestimate of runs; I could force the B term to match actual production, but my preference was to avoid doing this in this case.
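For anyone who wants to run the equation, here is a minimal sketch in Python.  The coefficients are copied from the table above; the dictionary layout, the function name, and the sample season totals are illustrative assumptions, and the totals in particular are made up rather than taken from any league.

```python
# (A, B, C, D) coefficients per event, copied from the table above.
COEF = {
    "1B": (1, 0.726, 0, 0),
    "2B": (1, 2.050, 0, 0),
    "3B": (1, 3.280, 0, 0),
    "HR": (0, 1.850, 0, 1),
    "SB": (0, 0.790, 0, 0),
    "CS": (0, -0.825, 1, 0),
    "NIBB": (1, 0.100, 0, 0),
    "SO": (0, -0.057, 1, 0),
    "GDP": (0, -0.825, 1, 0),
    "HBP": (1, 0.220, 0, 0),
    "SH": (0.08, 0.727, 0.92, 0),
    "IBB": (1, -0.450, 0, 0),
    "nonK out": (0, -0.004, 1, 0),
    "BK": (0, 1.060, 0, 0),
    "WP": (0, 1.125, 0, 0),
    "ROE": (1, 0.950, 0, 0),
    "OthrE": (0, 1.138, 0, 0),
    "PB": (0, 1.150, 0, 0),
}


def base_runs(counts):
    """Estimate runs from a dict of event counts: BsR = A*B/(B+C) + D."""
    a = b = c = d = 0.0
    for event, n in counts.items():
        ca, cb, cc, cd = COEF[event]
        a += ca * n  # baserunners
        b += cb * n  # advancement
        c += cc * n  # outs
        d += cd * n  # home runs score automatically
    return a * b / (b + c) + d


# Hypothetical team-season totals, invented for illustration only.
example = {
    "1B": 950, "2B": 280, "3B": 30, "HR": 150, "NIBB": 480, "IBB": 40,
    "HBP": 50, "SO": 1050, "nonK out": 2900, "GDP": 120, "SB": 100,
    "CS": 40, "SH": 50, "ROE": 60, "WP": 50, "PB": 10, "BK": 5,
}
estimated_runs = base_runs(example)
```

Dividing the estimate by games played gives a BsR/G figure comparable to the one in the league table.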


4 Comments
obsessivegiantscompulsive
14 years ago

Great article!

FYI, even within a league, there can be huge variations.  For example, San Jose is in the California League, a notorious hitter’s league (as you note), but a study by MLB.com’s Jonathan Mayo a few years back showed that San Jose’s strikeout rate was the highest in all the minors.  A player noted in an interview that the background there makes it hard for hitters to see the pitches, so they swing and miss a lot more often.

Baseball Prospectus’ park factor for SJ probably would drop the run production from the extreme to somewhere around 4.8 runs, near the middle and the American League.

jinaz
14 years ago

Thanks! 

I’m sure that you’re right about individual park effects.  Coors Field in MLB, of course, is a fairly extreme hitters’ park, while Petco Park is an extreme pitchers’ park.  Not only are both in the NL, they are both in the NL West!  I think I did see some good minor league park factors published at BBTF a while back, probably by Szymborski if memory serves.

That said, my primary goal in this article was just to get a handle on variation across the individual leagues.  That’s a large enough effect in and of itself.

Cheers,
Justin

jinaz
14 years ago

Sure, it’s sort of an add-on.  You can read the article without it and get the main point about differences in offensive environments.  I’m not sure that it really adds confusion (or, at least, I try to minimize confusion) as I largely ignore it until the end of the article.  But sure, it adds a bit of complexity.

In truth, the entire reason I put this together was to see for myself how Base Runs would perform in the different run environments of the different leagues (quite well), and by extension to see how much the linear weights differed (not much).  It was useful to me, and probably to some others, who are interested in a base runs equation that works well.  It also supplies us with linear weights that we can easily apply to players within the various leagues to get good wRC-like estimates.  One could use these to cross-check against the wRC reported at FanGraphs, though I have yet to do that.
-j

Gen3blue
14 years ago

It looks to me like the quantity BsR/G in this case only adds confusion and complexity to the useful point here. I appreciate the revealing of the basic offensive nature of each league, but it is useful in itself, before we go to a more “sophisticated stat.”