# Circle the Wagons: Running the Bases Part I

*“Who says there’s an unemployment problem in this country? Just take the five percent unemployed and give them a baseball stat to follow.”*

–Outfielder Andy Van Slyke

Way back in 1984 Bill James wrote in *The Baseball Abstract*:

“Baserunning is perfectly measurable; it can be easily defined and, given properly maintained scoresheets, easily researched. Our lack of knowledge on the subject is attributable entirely to record-keeping decisions that were made a little over a century ago and have never been intelligently or systematically reviewed.”

In other words, our lack of knowledge about baserunning is a matter of historical contingency. In 1845 the first box score appeared in the *New York Morning News*; it recorded only runs and outs, or “hands lost” in the nomenclature of the day. By the end of the 1850s, box scores included nine additional columns per player, among them foul outs and put outs, the latter including balls caught on one bounce (which at that time counted as outs).

A few years later, as documented in Alan Schwarz’s wonderful book *The Numbers Game*, a cricket enthusiast named Henry Chadwick invented the nine-by-nine grid and the system of letters and numbers that became the standard scoresheet and scoring system for recording the events of a game. But although Chadwick, the acknowledged father of baseball statistics, codified the definitions of the base hit, total bases, unearned runs, batting average and more, neither his scoring system nor his plethora of statistics captured how baserunners advanced around the bases, beyond recording whether a runner reached base and whether that runner scored.

An attempt to capture it was made, however, beginning in 1886, when stolen bases were tracked under a definition that included both traditional stolen bases and “extra” bases gained on hits. For example, a runner going from first to third on a single was credited with a stolen base. Under these rules Hugh Nicol was credited with 138 stolen bases in 1887. The modern stolen base definition was adopted in 1898 (although the AL didn’t count caught stealing until after 1919 and the NL until after 1950), and with it went any attempt at quantifying baserunning. Things have remained pretty much the same ever since.

These and other problems with the traditional scoring system led James and John Dewan (who later went on to co-found STATS, Inc.) to create Project Scoresheet in the mid-1980s. Under Project Scoresheet, volunteers across the country used a new scoresheet format and scoring system that better captured the information that tends to fall through the cracks of Chadwick’s method. The essence of the system is that each of Chadwick’s cells representing a plate appearance is subdivided into three sections: pre-events, the primary event, and post-events. Capturing this level of detail makes it possible to track play-by-play data. The codes developed by Project Scoresheet now show up both in the event files published by Retrosheet and in the scoring used by DataCasters like me for MLB.com’s Gameday system.

All that to say that, with the appropriate scoring system in place and the data available, last winter I took a stab on my blog at quantifying baserunning using the play-by-play data for 2003 and 2004, the only seasons I could obtain at the time. Shortly after, James Click at Baseball Prospectus did the same and published his excellent analysis in the *2005 Baseball Prospectus* under the title “Station to Station: The Expensive Art of Baserunning”. Now that Retrosheet has made the event files for 2000 to 2004 available, I thought I’d update my own framework and report the results in this week’s and next week’s articles.

### The Methodology

The methodology I used when creating my baserunning framework is really quite simple. First, I examined the following scenarios:

- Runner on first, second not occupied, and the batter singles
- Runner on second, third not occupied, and the batter singles
- Runner on first, second not occupied, and the batter doubles

Although these are not the only possible scenarios that might be used to measure baserunning, I chose them because I assumed they were fairly common and could provide a baseline for measuring the magnitude of the difference between good and bad baserunners.

Second, I calculated the number of bases that runners advanced in each scenario, broken down by the number of outs and by which fielder fielded the ball. Third, I used these aggregates to create a matrix of expected outcomes. The expected outcomes, expressed as proportions, for the five-year period from 2000 to 2004 are shown in the three tables below.

**Runner on first, second not occupied, and the batter singles**

| Outs | Where | Opp | To2nd | To3rd | Score | OA |
|---|---|---|---|---|---|---|
| 0 | Other | 1611 | 0.779 | 0.196 | 0.012 | 0.014 |
| 0 | Left | 2696 | 0.854 | 0.136 | 0.004 | 0.006 |
| 0 | Center | 2393 | 0.731 | 0.255 | 0.005 | 0.008 |
| 0 | Right | 3114 | 0.570 | 0.413 | 0.007 | 0.011 |
| 1 | Other | 1709 | 0.737 | 0.239 | 0.008 | 0.016 |
| 1 | Left | 3453 | 0.851 | 0.135 | 0.006 | 0.009 |
| 1 | Center | 3018 | 0.705 | 0.281 | 0.004 | 0.010 |
| 1 | Right | 4060 | 0.562 | 0.419 | 0.007 | 0.012 |
| 2 | Other | 1945 | 0.714 | 0.258 | 0.014 | 0.014 |
| 2 | Left | 2886 | 0.839 | 0.143 | 0.009 | 0.009 |
| 2 | Center | 3219 | 0.664 | 0.312 | 0.013 | 0.011 |
| 2 | Right | 3278 | 0.522 | 0.459 | 0.013 | 0.006 |

**Runner on second, third not occupied, and the batter singles**

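| Outs | Where | Opp | To3rd | Score | OA |
|---|---|---|---|---|---|
| 0 | Other | 1078 | 0.718 | 0.198 | 0.011 |
| 0 | Left | 1189 | 0.583 | 0.405 | 0.009 |
| 0 | Center | 1267 | 0.361 | 0.617 | 0.019 |
| 0 | Right | 1219 | 0.555 | 0.433 | 0.011 |
| 1 | Other | 1339 | 0.618 | 0.269 | 0.011 |
| 1 | Left | 2199 | 0.449 | 0.515 | 0.031 |
| 1 | Center | 2191 | 0.239 | 0.725 | 0.033 |
| 1 | Right | 2109 | 0.388 | 0.578 | 0.032 |
| 2 | Other | 1791 | 0.621 | 0.336 | 0.038 |
| 2 | Left | 2315 | 0.098 | 0.832 | 0.069 |
| 2 | Center | 2690 | 0.030 | 0.932 | 0.037 |
| 2 | Right | 2220 | 0.076 | 0.864 | 0.062 |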

**Runner on first, second not occupied, and the batter doubles**

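| Outs | Where | Opp | To3rd | Score | OA |
|---|---|---|---|---|---|
| 0 | Other | 509 | 0.566 | 0.432 | 0.002 |
| 0 | Left | 1028 | 0.699 | 0.281 | 0.019 |
| 0 | Center | 385 | 0.460 | 0.522 | 0.018 |
| 0 | Right | 728 | 0.709 | 0.283 | 0.008 |
| 1 | Other | 660 | 0.552 | 0.447 | 0.002 |
| 1 | Left | 1333 | 0.683 | 0.290 | 0.028 |
| 1 | Center | 533 | 0.336 | 0.585 | 0.081 |
| 1 | Right | 976 | 0.686 | 0.288 | 0.026 |
| 2 | Other | 641 | 0.359 | 0.640 | 0.002 |
| 2 | Left | 1264 | 0.498 | 0.441 | 0.063 |
| 2 | Center | 499 | 0.164 | 0.788 | 0.048 |
| 2 | Right | 900 | 0.449 | 0.492 | 0.062 |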

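The “Other” category includes balls fielded by all other positions. Opp is the number of opportunities; the remaining columns give the proportion of those opportunities in which the runner stopped at second or third, scored, or was thrown out advancing (OA).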

From these tables it is immediately obvious that both the number of outs and the fielder who handles the ball play a large role in determining how far the runner advances, hence the need to take both into account. For example, with two outs, a man on first, and the batter doubling, the runner scores about 79% of the time when the ball is fielded by the center fielder but just 44% of the time when it is fielded by the left fielder. Using the position that fielded the ball also takes into account the differences between left-handed and right-handed hitters batting behind particular baserunners.
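The aggregation step that produces these tables can be sketched in Python as below. This is a minimal sketch, not the code actually used for this article: the `(outs, fielder, outcome)` tuple format, fielder labels such as `"Short"`, and the function names are all hypothetical. Because the player comparisons are made with all positions enumerated, a `collapse=False` switch keeps each fielder separate instead of folding non-outfielders into “Other”.

```python
from collections import Counter, defaultdict

# Outcome columns for the "runner on first, batter singles" scenario.
OUTCOMES = ("To2nd", "To3rd", "Score", "OA")
OUTFIELDERS = {"Left", "Center", "Right"}


def where(fielder, collapse=True):
    """Map a fielder to a table row; non-outfielders become 'Other' when collapsing."""
    if collapse and fielder not in OUTFIELDERS:
        return "Other"
    return fielder


def outcome_matrix(events, collapse=True):
    """Build {(outs, where): (opportunities, {outcome: proportion})}.

    `events` is an iterable of hypothetical (outs, fielder, outcome) tuples,
    one per opportunity in a single scenario.
    """
    counts = defaultdict(Counter)
    for outs, fielder, outcome in events:
        counts[(outs, where(fielder, collapse))][outcome] += 1

    matrix = {}
    for key, tally in counts.items():
        opp = sum(tally.values())
        # Counter returns 0 for outcomes never seen, so every column is present.
        matrix[key] = (opp, {o: tally[o] / opp for o in OUTCOMES})
    return matrix


events = [
    (2, "Center", "To3rd"),
    (2, "Center", "To3rd"),
    (2, "Center", "Score"),
    (2, "Short", "To2nd"),
]
opp, probs = outcome_matrix(events)[(2, "Center")]
print(opp, probs)
```

Each row of the tables above corresponds to one key of this dictionary: the opportunity count plus the share of opportunities ending in each outcome.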

Finally, for each individual player I compared their performance in each scenario to the matrix above (with all positions enumerated, and for each season) and calculated not only how many bases they advanced but how that differed from the expected number of bases gained. For example, the expected number of bases gained with two outs and the runner on first when the batter doubles to left is:

2.19 = (.513 * 2) + (.428 * 3) + (.059 * -2)

Note that the runner is penalized two bases for getting thrown out in this situation, since I’m assuming they would otherwise have advanced two bases. So if Carlos Beltran scored in this situation he would be credited with .81 bases (3 - 2.19).
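The expected-bases arithmetic in this example can be checked with a short Python sketch. The outcome-to-bases mapping (2 for reaching third, 3 for scoring, -2 for being thrown out) follows the example for a runner starting on first; the probabilities are the ones quoted in the formula, and the function names are hypothetical.

```python
# Bases credited when the runner starts on first and the batter doubles:
# reaching third = 2, scoring = 3, thrown out advancing = -2 (as in the example).
BASES = {"To3rd": 2, "Score": 3, "OA": -2}


def expected_bases(probs, bases=BASES):
    """Probability-weighted bases for one opportunity in a given situation."""
    return sum(p * bases[outcome] for outcome, p in probs.items())


def incremental_bases(actual_outcome, probs, bases=BASES):
    """Bases actually gained minus the bases expected in the situation."""
    return bases[actual_outcome] - expected_bases(probs, bases)


# Two outs, runner on first, batter doubles to left (values from the example).
probs = {"To3rd": 0.513, "Score": 0.428, "OA": 0.059}
print(round(expected_bases(probs), 2))              # 2.19
print(round(incremental_bases("Score", probs), 2))  # 0.81
```

A runner thrown out in the same spot would be charged about -4.19 bases (-2 - 2.19), which is why outs on the bases weigh so heavily in the totals.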

I call the sum of the differences between the actual number of bases and the expected number of bases (EB) Incremental Bases (IB), and I christened the ratio of actual bases to EB the Incremental Base Percentage (IBP). An IBP greater than 1.0 is good, since it means that the runner advanced more bases than expected given the situations they found themselves in, while an IBP of less than 1.0 indicates a bit of a plodder.
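A minimal Python sketch of the two statistics follows. The published tables are consistent with IBP being total bases divided by EB (for Juan Pierre, 411 / 367 ≈ 1.12, while his IB is 44), so that is what the sketch computes; the function name and the `(actual, expected)` pair format are hypothetical.

```python
def ib_and_ibp(opportunities):
    """Return (IB, IBP) for a list of (actual_bases, expected_bases) pairs.

    IB  = total actual bases - total expected bases
    IBP = total actual bases / total expected bases (matches the tables)
    """
    bases = sum(actual for actual, _ in opportunities)
    eb = sum(expected for _, expected in opportunities)
    return bases - eb, bases / eb


# Sanity check against Juan Pierre's 2000-2004 totals, passed as one summary pair.
ib, ibp = ib_and_ibp([(411, 367)])
print(ib, round(ibp, 2))  # 44 1.12
```

In practice each pair would be one opportunity, with the expected value computed from that season's matrix, and the sums taken over a player's season or career.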

### The Results

Enough of the preliminaries. First, here are the top 10 baserunners with 100 or more opportunities from 2000-2004, ranked by total incremental bases. Here OA stands for “out advancing” and records the number of times the runner was thrown out in these scenarios.

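| Name | Opp | Bases | EB | IB | OA | IBP |
|---|---|---|---|---|---|---|
| Juan Pierre | 259 | 411 | 367 | 44 | 4 | 1.12 |
| Luis Castillo | 272 | 450 | 410 | 40 | 4 | 1.10 |
| Mike Cameron | 168 | 286 | 252 | 34 | 2 | 1.14 |
| Cristian Guzman | 209 | 349 | 315 | 34 | 3 | 1.11 |
| Ray Durham | 203 | 334 | 301 | 33 | 0 | 1.11 |
| David Eckstein | 216 | 352 | 319 | 33 | 3 | 1.10 |
| Carlos Beltran | 208 | 346 | 313 | 33 | 0 | 1.10 |
| Johnny Damon | 256 | 420 | 390 | 30 | 3 | 1.08 |
| Edgar Renteria | 210 | 338 | 310 | 28 | 4 | 1.09 |
| Jay Payton | 168 | 280 | 252 | 28 | 3 | 1.11 |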

And the bottom 10…

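| Name | Opp | Bases | EB | IB | OA | IBP |
|---|---|---|---|---|---|---|
| Mike Lieberthal | 130 | 165 | 192 | -27 | 4 | 0.86 |
| Richie Sexson | 151 | 203 | 233 | -30 | 7 | 0.87 |
| J.T. Snow | 150 | 202 | 233 | -31 | 5 | 0.87 |
| Edgar Martinez | 178 | 238 | 269 | -31 | 3 | 0.89 |
| Ben Molina | 138 | 177 | 208 | -31 | 4 | 0.85 |
| Dmitri Young | 139 | 185 | 217 | -32 | 9 | 0.85 |
| Rafael Palmeiro | 207 | 273 | 307 | -34 | 8 | 0.89 |
| John Olerud | 195 | 248 | 285 | -37 | 7 | 0.87 |
| Bill Mueller | 191 | 256 | 293 | -37 | 10 | 0.87 |
| Carlos Delgado | 237 | 324 | 362 | -38 | 8 | 0.89 |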

From these lists it is pretty apparent that the system is indeed measuring something, at the very least raw speed, since the top list is dominated by speedsters and the bottom list by plodders. In addition, the plodders were thrown out on the bases a significant number of times, which drives down their incremental bases dramatically, since the runner is credited with negative bases in those cases.

As you can see, the spread here runs from about +40 to about -40 bases over five years. In other words, the best baserunners are worth about sixteen bases more per year than the worst (roughly 80 bases over five seasons). Looking at an individual year, the spread is about +15 to -15, a span of 30 bases, as illustrated by the leaders and trailers for each season (with 25 or more opportunities).

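**Season leaders**

| Year | Name | Opp | Bases | EB | IB | OA | IBP |
|---|---|---|---|---|---|---|---|
| 2004 | Rafael Furcal | 59 | 101 | 87 | 14 | 0 | 1.17 |
| 2003 | Brian Roberts | 57 | 94 | 81 | 14 | 0 | 1.17 |
| 2002 | Ray Durham | 40 | 73 | 59 | 14 | 0 | 1.23 |
| 2001 | David Eckstein | 51 | 91 | 77 | 14 | 1 | 1.18 |
| 2000 | Luis Castillo | 57 | 105 | 86 | 19 | 0 | 1.22 |

**Season trailers**

| Year | Name | Opp | Bases | EB | IB | OA | IBP |
|---|---|---|---|---|---|---|---|
| 2004 | Bill Mueller | 47 | 56 | 74 | -18 | 3 | 0.76 |
| 2003 | Juan Encarnacion | 41 | 46 | 62 | -16 | 5 | 0.74 |
| 2002 | Frank Thomas | 40 | 46 | 60 | -14 | 4 | 0.77 |
| 2001 | Luis Gonzalez | 51 | 63 | 77 | -14 | 2 | 0.82 |
| 2000 | Joe Randa | 50 | 64 | 79 | -15 | 2 | 0.81 |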

But because incremental bases is akin to a counting statistic like RBI, it is heavily influenced both by the number of times the runner was on base and by how often the hitters behind him in the order got hits. Therefore, a better way to rank the runners may be the rate statistic, IBP.
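To see how the choice of statistic reshuffles the list, here is a small Python sketch that ranks four players by IB and by IBP using the 2000-2004 figures reported in this article. The `rank` helper and the tuple layout are hypothetical, and IBP is computed as bases divided by EB, which is how the tabulated values work out.

```python
# (name, opportunities, bases, EB) taken from the 2000-2004 tables in this article.
PLAYERS = [
    ("Juan Pierre", 259, 411, 367),
    ("Luis Castillo", 272, 450, 410),
    ("Jack Wilson", 117, 203, 179),
    ("Raul Mondesi", 112, 190, 168),
]


def rank(players, key, min_opp=100):
    """Names of players with at least `min_opp` opportunities, best first."""
    eligible = [p for p in players if p[1] >= min_opp]
    return [p[0] for p in sorted(eligible, key=key, reverse=True)]


by_ib = rank(PLAYERS, key=lambda p: p[2] - p[3])   # counting stat: IB
by_ibp = rank(PLAYERS, key=lambda p: p[2] / p[3])  # rate stat: IBP = bases / EB
print(by_ib)   # ['Juan Pierre', 'Luis Castillo', 'Jack Wilson', 'Raul Mondesi']
print(by_ibp)  # ['Jack Wilson', 'Raul Mondesi', 'Juan Pierre', 'Luis Castillo']
```

Pierre and Castillo lead on volume, but Wilson and Mondesi gained more per expected base in their fewer chances.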

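**Top 10 by IBP**

| Name | Opp | Bases | EB | IB | OA | IBP |
|---|---|---|---|---|---|---|
| Jack Wilson | 117 | 203 | 179 | 24 | 1 | 1.14 |
| Mike Cameron | 168 | 286 | 252 | 34 | 2 | 1.14 |
| Raul Mondesi | 112 | 190 | 168 | 22 | 1 | 1.13 |
| Chris Singleton | 112 | 192 | 171 | 21 | 2 | 1.12 |
| Juan Pierre | 259 | 411 | 367 | 44 | 4 | 1.12 |
| Vernon Wells | 104 | 175 | 157 | 18 | 4 | 1.12 |
| Jay Payton | 168 | 280 | 252 | 28 | 3 | 1.11 |
| Ray Durham | 203 | 334 | 301 | 33 | 0 | 1.11 |
| Torii Hunter | 170 | 279 | 252 | 27 | 3 | 1.11 |
| Cristian Guzman | 209 | 349 | 315 | 34 | 3 | 1.11 |

**Bottom 10 by IBP**

| Name | Opp | Bases | EB | IB | OA | IBP |
|---|---|---|---|---|---|---|
| Edgar Martinez | 178 | 238 | 269 | -31 | 3 | 0.89 |
| Bill Mueller | 191 | 256 | 293 | -37 | 10 | 0.87 |
| John Olerud | 195 | 248 | 285 | -37 | 7 | 0.87 |
| Richie Sexson | 151 | 203 | 233 | -30 | 7 | 0.87 |
| J.T. Snow | 150 | 202 | 233 | -31 | 5 | 0.87 |
| Fred McGriff | 110 | 140 | 161 | -21 | 2 | 0.87 |
| Frank Thomas | 121 | 158 | 182 | -24 | 5 | 0.87 |
| Mike Lieberthal | 130 | 165 | 192 | -27 | 4 | 0.86 |
| Dmitri Young | 139 | 185 | 217 | -32 | 9 | 0.85 |
| Ben Molina | 138 | 177 | 208 | -31 | 4 | 0.85 |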

So are Jack Wilson and Mike Cameron really the best baserunners of the last five years and are Ben Molina and Dmitri Young really the worst? I don’t know for sure, but once again it seems like the framework produces reasonable results.

### Odds and Ends

In looking at the aggregate data a few interesting nuggets turned up:

- Bobby Abreu, whom one would expect to be a good baserunner, was thrown out 10 times during the five-year period, which drove his IB to -5 and his IBP to .99, dropping him to 117th out of 177 players with 100 or more opportunities. Bill Mueller and Juan Encarnacion were also thrown out 10 times
- The leader in getting thrown out in a season was Juan Encarnacion, who was nabbed five times in 2003: twice when he was on first and the following batter doubled, and three times when he was on second and the batter singled
- Larry Walker, not a particularly fast man, is ranked 17th in IBP at 1.10, validating his reputation as a good baserunner
- The most opportunities in a single season was 91 by Juan Pierre in 2003. Ichiro was a close second with 89 in 2004
- Speaking of Ichiro, it is strange that a player with such a good baserunning reputation comes in with an IBP of only 1.04, good for 65th place out of 177 players with 100 or more opportunities. Still, that’s better than average, and his yearly IBPs were a consistent 1.08, 1.04, 1.04, and 1.01
- Carlos Beltran had the most opportunities, 208, without ever being thrown out. Ray Durham and Garrett Anderson both had 203 chances without being nailed, but while Beltran and Durham have IBPs over 1.0, Anderson is a conservative runner with an IBP of .96

### Next

But of course this isn’t the end of the story. In addition to counting incremental bases and calculating IBP, there are, as Click discussed, team effects to consider, including the influence of managers and third base coaches, as well as park factors. And there is also the issue of converting these incremental bases into the number of runs gained or lost. I’ll explore these and other issues next week.

**References & Resources**

*The Numbers Game: Baseball’s Lifelong Fascination with Statistics*, by Alan Schwarz
