Evaluating Talent Distribution on Rosters

by John LaRue
December 4, 2017

Roger Clemens and the 1988 Red Sox represented the best-case scenario for the “Stars and Scrubs” method. (via Steve Lipofsky)

Much has been written during the last calendar year about the rise of the super team in baseball. Over the last two seasons, the Dodgers, Astros, Cubs, Cleveland, and the Nationals have mixed some combination of years of effective draft and development, wise free agent acquisitions, enormous roster depth, superstars who are frequently young and cost-controlled, and (in a few cases) large payrolls to build division-winning titans capable of 100+ wins.

However, I come to bury these Caesars, not to praise them. I’m not interested in how these types of teams were built. Rather, I’m more interested in determining which other talent distributions are most effective. Not every team can amass the combination of depth and superstars the super teams have.

Obviously, it’s best to be both deep and talented. But if being both deep and star-laden isn’t an option, is it better to be extremely deep? Or is it better to possess lots of top-end talent, even if it’s surrounded by subpar talent, the so-called Stars-and-Scrubs model? Let’s take a look at MLB roster compositions and see which non-super team methodologies work best.

Methodology

To start testing out the effectiveness of talent distributions, let’s first establish some parameters. I’ve chosen to collect data since the 1988 season. This gets us a nice round number with 30 years of data. It also gets us past the collusion issue from the mid-1980s, as the ruling that owners had colluded came down in September of 1987.

Most importantly, the streams with which teams acquire talent have not changed in that time frame. Free agency, trades with a late-July/early-August trade deadline and waiver trades afterwards, the Rule 5 draft, international markets, and the waiver wire are all methods that have been used since 1988. Talent evaluation has changed significantly in that time, and the amount of use for each of those methods of talent acquisition has ebbed and flowed, but the mechanics are almost exactly the same today as they were in 1988.

I’ve collected WAR for all position players with 50 plate appearances or more, and all pitchers with 20 innings pitched or more, since 1988. Then I’ve determined the percentage of a team’s playing time that went to each player and pitcher. To make that determination, I’ve divided each hitter’s total plate appearances by his team’s total number of non-pitcher plate appearances. For pitchers, it’s the same formula with innings pitched substituted for plate appearances. Next, that percentage is multiplied by 0.57 for hitters and 0.43 for pitchers, based on Neil Weinberg’s calculations.

For a real-world example, let’s honor the recently retired Carlos Beltran. In 2013, Beltran took 600 of the Cardinals’ total 5,830 non-pitcher plate appearances. That’s 10.29 percent of all plate appearances. Using the 57 percent rule, we multiply his 10.29 percent of all plate appearances by 0.57. Beltran’s total contribution to the 2013 Cardinals was 5.87 percent.
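
The Beltran arithmetic can be sketched in a few lines of Python. This is only an illustration of the calculation described above; the function and constant names are invented for the example, and the 0.57/0.43 weights are the ones quoted in the text.

```python
# Weights quoted in the article: 57 percent of team playing time is
# credited to hitters and 43 percent to pitchers.
HITTER_WEIGHT = 0.57
PITCHER_WEIGHT = 0.43


def playing_time_share(player_total, team_total, weight):
    """Return a player's share of all team playing time, in percent.

    player_total / team_total are plate appearances for hitters or
    innings pitched for pitchers (hypothetical helper, not a real API).
    """
    return player_total / team_total * weight * 100


# Carlos Beltran, 2013 Cardinals: 600 of 5,830 non-pitcher plate appearances.
beltran_share = playing_time_share(600, 5830, HITTER_WEIGHT)
print(round(beltran_share, 2))  # 5.87
```

A pitcher would be handled the same way, passing innings pitched and `PITCHER_WEIGHT` instead.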

After determining how much of a team’s available playing time each player absorbed, it was then time to categorize how each player performed. Again, using WAR and FanGraphs’ definitions, I classified the players using the table found here:

WAR Groupings

Player Type	WAR
Scrub	0 – 0.9 WAR
Role Player	1 – 1.9 WAR
Solid Starter	2 – 2.9 WAR
Good Player	3 – 3.9 WAR
All-Star	4 – 4.9 WAR
Superstar	5 – 5.9 WAR
MVP	6+ WAR
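
As a rough sketch, the table can be turned into a lookup function. Two assumptions are mine rather than the article's: WAR is treated as a continuous number (so 0.95 WAR is still below the 1.0 cutoff), and negative WAR is lumped in with Scrubs, in line with the "under 1.0 WAR" definition used later.

```python
# Labels ordered by the whole-number WAR floor they start at (0 through 6).
PLAYER_TYPES = [
    "Scrub",          # below 1.0 WAR (negative WAR included by assumption)
    "Role Player",    # 1.0 to 1.9
    "Solid Starter",  # 2.0 to 2.9
    "Good Player",    # 3.0 to 3.9
    "All-Star",       # 4.0 to 4.9
    "Superstar",      # 5.0 to 5.9
    "MVP",            # 6.0 and up
]


def player_type(war):
    """Map a season WAR total to the player type from the table."""
    if war < 1.0:
        return "Scrub"
    # int() truncates toward zero, which is a floor for positive WAR;
    # anything at 6.0 or above is capped at the MVP label.
    return PLAYER_TYPES[min(int(war), 6)]


print(player_type(3.0))   # Good Player
print(player_type(6.4))   # MVP
print(player_type(-0.5))  # Scrub
```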

Relief pitchers pose their own problem, as relief pitcher WAR is depressed. Using the above table, a 3-WAR reliever would be merely a Good Player. However, a 3-WAR reliever is one of the best at his craft in all of baseball. Just 29 individual reliever seasons since 2000 have reached the 3-WAR barrier.

To remedy this, I’ve instead used WPA to rank individual reliever seasons. Using the order of relievers from top to bottom, I then distributed all relievers across the player types in the same frequency with which hitters and starting pitchers were distributed. Of all hitters and starting pitchers, 1.67 percent produced 6+ WAR (an MVP designation). As such, the top 1.67 percent of relievers by WPA were given the MVP designation. Scrubs accounted for 57.47 percent of all starting pitchers and hitters, so the bottom 57.47 percent of relievers by WPA were designated as Scrubs, and so forth.
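
One way to picture the reliever adjustment is as a rank-and-slice routine. The sketch below is a guess at the mechanics, not the code behind the article. Only the MVP (1.67 percent) and Scrub (57.47 percent) shares appear in the text, so the five middle shares are placeholders, and the reliever data is synthetic.

```python
def assign_reliever_tiers(wpa_by_reliever, tier_shares):
    """Label relievers by WPA rank so each tier holds its target share.

    wpa_by_reliever: dict of reliever-season name -> WPA.
    tier_shares: list of (label, fraction), best tier first;
                 the fractions should sum to 1.
    """
    ranked = sorted(wpa_by_reliever, key=wpa_by_reliever.get, reverse=True)
    total = len(ranked)
    tiers = {}
    start = 0
    cumulative = 0.0
    for label, share in tier_shares:
        cumulative += share
        end = min(total, round(cumulative * total))
        for name in ranked[start:end]:
            tiers[name] = label
        start = end
    # Anyone left over from rounding falls into the last (lowest) tier.
    for name in ranked[start:]:
        tiers[name] = tier_shares[-1][0]
    return tiers


# MVP and Scrub shares are from the article; the rest are PLACEHOLDERS
# chosen only so that the shares total 100 percent.
TIER_SHARES = [
    ("MVP", 0.0167),
    ("Superstar", 0.0300),      # placeholder
    ("All-Star", 0.0500),       # placeholder
    ("Good Player", 0.0800),    # placeholder
    ("Solid Starter", 0.1100),  # placeholder
    ("Role Player", 0.1386),    # placeholder
    ("Scrub", 0.5747),
]

# Synthetic data: 1,000 reliever seasons with distinct made-up WPA values.
relievers = {f"reliever_{i}": i / 100 for i in range(1000)}
tiers = assign_reliever_tiers(relievers, TIER_SHARES)
mvp_count = sum(1 for label in tiers.values() if label == "MVP")
scrub_count = sum(1 for label in tiers.values() if label == "Scrub")
print(mvp_count, scrub_count)  # 17 575
```

Working from cumulative shares rather than rounding each tier separately keeps the tier sizes from drifting when the individual shares do not divide the pool evenly.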

From this point, I added all of the percent contributions for every player on each team since 1988, broken out by player type (MVP, Superstar, All-Star, Good, Solid, Role, Scrub). Moreover, I needed a good working definition for three new categories: Depth, Star, and Scrub. The Scrub designation was obviously the easiest, simply using the same definition offered by FanGraphs (players with < 1.0 WAR). For Depth players, I added the percent contributions received for players between 1.0 and 3.9 WAR (or the corresponding WPA for relievers). Finally, Star players are defined as all players at 4+ WAR.

This gave me a talent distribution for each team: the total percentage of available playing time each team gave to Scrubs, Depth players, and Stars. For some perspective, the average team since 1988 gave 14.63 percent of its playing time to 4+ WAR players (“Stars”), 44.84 percent to players between 1.0 and 3.9 WAR (“Depth”), and 37.79 percent to players under 1.0 WAR (“Scrubs”).
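
A minimal sketch of that roll-up, assuming each player has already been reduced to a WAR total and a weighted playing-time share. The roster below is invented, and relievers would need their WPA-based tier mapped to a bucket rather than raw WAR, which this sketch does not attempt.

```python
def bucket(war):
    """Collapse the seven player types into Stars, Depth, or Scrubs."""
    if war >= 4.0:
        return "Stars"
    if war >= 1.0:
        return "Depth"
    return "Scrubs"


def talent_distribution(players):
    """Sum weighted playing-time shares (in percent) by bucket.

    players: iterable of (war, share_pct) pairs for one team-season.
    """
    totals = {"Stars": 0.0, "Depth": 0.0, "Scrubs": 0.0}
    for war, share_pct in players:
        totals[bucket(war)] += share_pct
    return totals


# Hypothetical partial roster: (WAR, percent of team playing time).
roster = [(5.2, 6.0), (4.0, 5.0), (2.1, 6.5), (0.3, 3.0)]
distribution = talent_distribution(roster)
print(distribution)  # {'Stars': 11.0, 'Depth': 6.5, 'Scrubs': 3.0}
```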

The final step was to define which teams were deep and which teams were top-heavy (Stars and Scrubs). I assigned each team a percentile rank for all three categories, Stars, Depth, and Scrubs. And I’ve carved out the following samples:

Deep teams: upper third in percentage of playing time from depth players, lower third from star players and scrub players (41 teams qualify)

Stars and Scrubs: upper third in percentage of playing time from star players and scrub players, lower third from depth players (34 teams qualify)

Deep Plus: upper third in depth players, lower third in scrubs, and middle third in stars. This is the model for deep teams listed above, but with slightly more playing time from star players. (29 teams)

Stars and Scrubs Plus: upper third from star players, lower third from depth players, and middle third from scrubs. This is the same model for Stars and Scrubs above, but with slightly less playing time for scrubs. (31 teams)

Deep with Stars: second-highest quartile in percentage of playing time from star players and depth players, lower than 50th percentile from scrubs (52 teams)
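
The five definitions above can be written as a small rule set. This is a sketch under several assumptions of mine: percentile rank means the percentage of teams with a strictly smaller value, the thirds split at 33.3 and 66.7, and "2nd quartile" means the 50th to 75th percentile. Because the Deep Plus and Deep with Stars rules can overlap, the function returns every category a team matches.

```python
from bisect import bisect_left


def percentile_ranks(values):
    """Percent of teams with a strictly smaller value, one per team."""
    ordered = sorted(values)
    count = len(ordered)
    return [100 * bisect_left(ordered, value) / count for value in values]


def third(pct):
    """Place a percentile rank in the lower, middle, or upper third."""
    if pct >= 200 / 3:
        return "upper"
    if pct < 100 / 3:
        return "lower"
    return "middle"


def categories(stars_pct, depth_pct, scrubs_pct):
    """Return every sample a team qualifies for, given percentile ranks."""
    stars, depth, scrubs = third(stars_pct), third(depth_pct), third(scrubs_pct)
    found = []
    if depth == "upper" and stars == "lower" and scrubs == "lower":
        found.append("Deep")
    if stars == "upper" and scrubs == "upper" and depth == "lower":
        found.append("Stars and Scrubs")
    if depth == "upper" and scrubs == "lower" and stars == "middle":
        found.append("Deep Plus")
    if stars == "upper" and depth == "lower" and scrubs == "middle":
        found.append("Stars and Scrubs Plus")
    # Assumed reading of "2nd quartile": 50th up to the 75th percentile.
    if 50 <= stars_pct < 75 and 50 <= depth_pct < 75 and scrubs_pct < 50:
        found.append("Deep with Stars")
    return found


print(percentile_ranks([10, 30, 20, 40]))  # [0.0, 50.0, 25.0, 75.0]
print(categories(20, 80, 10))              # ['Deep']
print(categories(90, 10, 85))              # ['Stars and Scrubs']
```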

There are many reasons for including these categories. First and foremost, I want to determine if either the Depth model or the Stars and Scrubs model is better. Second, I want to illustrate just how much more effective it is to have both depth AND stars. And finally, I want to illustrate how much of a difference it makes for a deep team to add some more star power, and how much of a difference it makes for a Stars and Scrubs team to decrease their scrub playing time.

Results

First, let’s take a look at the average composition of teams in these categories. The following graph shows the percent of playing time the average team from each category gave to players designated as Stars, Depth, and Scrubs. I’m also including the average of all teams in the sample (870 total since 1988) for context.

You’ll note the total percentages add up to between 97 and 98 percent. This is because, depending on the individual team, approximately two percent of playing time goes to players who didn’t amass 50 plate appearances or 20 innings pitched.

As far as some of the differences in percentages between the various categories, a good rule of thumb is that a full-time position player or a 200-inning starting pitcher receives between six and seven percent–approximately–of a team’s overall playing time. The difference in composition between the Deep and Deep Plus category, essentially, is that the average Deep Plus team has one more regular contributor in the Star category and one fewer in the Depth category.

Now let’s take a look at how these teams fared, using the collective Pythagorean record for all teams in each category. This is presented as a box plot. The top and bottom dots represent the maximum and minimum Pythagorean record, respectively, for teams in each category. The top box illustrates the range of the second-highest quartile for Pythagorean records, and the bottom box is the third quartile.

To help illustrate the type of teams that fit the edges of these categories, I’ve identified the maximum and minimum teams. Finally, to show the full range, I’m including a dot plot for each team in each respective category. These are listed to the left of the box plot for the category.
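
For readers who want to reproduce the box plot numbers, the five values it draws can be computed with Python's standard library. This is a sketch with invented winning percentages. The quartile method is my choice (`method="inclusive"`), and a plotting tool may interpolate slightly differently.

```python
from statistics import quantiles


def box_summary(records):
    """Return the minimum, quartile cut points, and maximum of a sample.

    The span from median to q3 is the "second-highest quartile" box;
    the span from q1 to median is the "third quartile" box.
    """
    q1, median, q3 = quantiles(records, n=4, method="inclusive")
    return {
        "min": min(records),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(records),
    }


# Hypothetical Pythagorean winning percentages for one category.
sample = [0.440, 0.475, 0.502, 0.519, 0.531, 0.548, 0.590]
summary = box_summary(sample)
print(summary["min"], summary["max"])  # 0.44 0.59
```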

We are admittedly dealing with some small sample sizes, but the Deep teams clearly outperform the Stars and Scrubs model. The difference in the collective Pythagorean record between the two categories is .038, or 6.16 wins over the course of a season.
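
The conversion from a winning-percentage gap to wins is a single multiplication. The sketch below assumes a 162-game schedule, which is what the 6.16 figure implies. The helper name is made up.

```python
GAMES_PER_SEASON = 162  # assumed full-season schedule


def pct_gap_to_wins(gap, games=GAMES_PER_SEASON):
    """Convert a gap in winning percentage into wins over a season."""
    return gap * games


deep_vs_stars_and_scrubs = pct_gap_to_wins(0.038)
print(round(deep_vs_stars_and_scrubs, 2))  # 6.16
```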

And lest you think it’s a function of a few particularly strong teams in the Deep category, we can answer that by looking at the second and third quartiles. The bottom of the third quartile range for Deep teams is just a smidge less than the top range of the second quartile of Stars and Scrubs teams. The 1991 Cardinals are the 27th percentile of Deep teams, and they come in with a .502 Pythagorean winning percentage. The 1990 Padres are the 76th percentile of Stars and Scrubs teams, and they carried a nearly identical .500 Pythagorean winning percentage. We have our first answer. It’s clearly better to be a Deep team than a Stars and Scrubs team.

Now let’s take a look at our Deep with Stars category to get an idea of how much more effective those teams are compared to the other categories. The collective Pythagorean record (.538) is .019 better than the Deep category (3.08 wins), and .057 better than the Stars and Scrubs teams, a whopping 9.23 wins per season. Where the Deep with Stars model really proves itself is in the second and third quartiles. The bottom of the third quartile is the 2014 Royals, with a .519 Pythagorean winning percentage.

Put another way, take all the teams since 1988 who gave between 13.8 and 20.35 percent of their playing time to stars, between 45 and 51.8 percent to depth players, and between 25 and 36.3 percent to scrub players. Of all of those teams, 75 percent of them were better by Pythagorean winning percentage than a World Series runner-up. The top of the second quartile is the 1989 Giants, another World Series runner-up, this time with a Pythagorean winning percentage of .569 (92 wins). Other categories have single outliers that outperform the maximum of the Deep with Stars group, but there’s so much quality packed into the Deep with Stars set that the outliers are completely moot.

It’s also worth noting that this group doesn’t even represent the best possible version of a Deep with Stars model. The list of teams reaching the top quartile of Star playing time and the top quartile of Depth playing time (not listed in the graph) is a who’s who of juggernauts and World Series participants. Their collective Pythagorean winning percentage is .570, a 92-win pace. Depending on the degree to which teams can collect stars and depth, it’s worth nine to 14 more wins than a Stars and Scrubs team, and three to eight more than a Deep team.

An interesting pattern emerges when we infuse the Deep teams with more playing time from star-quality players and trim a little scrub playing time from the Stars and Scrubs teams. The Deep teams receive a Pythagorean bump of .012, or 1.9 wins. The Stars and Scrubs teams, however, leap all the way up from .481 to .520 in their collective Pythagorean winning percentage. That’s an increase of .039, or 6.3 wins. The bottom of the third quartile for Stars and Scrubs Plus is higher than the top of the regular Stars and Scrubs model. That’s a significant jump.

When the Deep teams become Deep Plus teams (slightly more playing time for star-quality players), the collective bump isn’t that significant. However, the floor rises considerably. Five of the 41 Deep teams had lower Pythagorean records than the worst team in the Deep Plus group. And the second-worst team in the Deep Plus group–the 2004 Tigers and their .491 Pythagorean winning percentage–fared better than nine of the 41 Deep teams.

There’s a lesson to be learned from the way the Deep and Stars and Scrubs groups reacted to their adjustments. The Deep adjustment to Deep Plus involved adding star-quality players. The Stars and Scrubs adjustment to Stars and Scrubs Plus involved decreasing the playing time contribution of scrubs.

I Don’t Want No Scrubs

Once I noticed the impact decreasing scrubs had on the Stars and Scrubs group, I decided to test out each individual category (Stars, Depth, Scrubs) in a linear regression with Pythagorean winning percentage as the dependent variable. The Scrub category (playing time given to players with less than 1.0 WAR) returned the highest adjusted R-squared value by far, coming in at 0.5985. The Star category registered an adjusted R-squared of 0.4319, and the Depth group came in at just a 0.075 adjusted R-squared.
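
For anyone wanting to run the same kind of test, a single-predictor regression and its adjusted R-squared can be computed by hand. The sketch below uses ordinary least squares in plain Python. The data points are invented and will not reproduce the values quoted above.

```python
def adjusted_r_squared(x, y, predictors=1):
    """Fit y = intercept + slope * x by least squares; return adjusted R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares give the ordinary R^2.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    # Penalize for the number of predictors relative to the sample size.
    return 1 - (1 - r_squared) * (n - 1) / (n - predictors - 1)


# Hypothetical team-seasons: percent of playing time given to Scrubs,
# paired with Pythagorean winning percentage.
scrub_pct = [25.0, 30.0, 35.0, 40.0, 45.0, 50.0]
pythag_pct = [0.580, 0.545, 0.520, 0.490, 0.470, 0.430]
fit_quality = adjusted_r_squared(scrub_pct, pythag_pct)
print(fit_quality > 0.9)  # True for this made-up, nearly linear data
```

With one predictor and 870 team-seasons, the adjustment barely moves the number, so adjusted and ordinary R-squared would be nearly identical in the article's setting.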

The Scrub relationship is strong. Let’s take a look at that relationship in a simple scatter plot.

Depth has its purposes, and it’s preferable to Stars and Scrubs. And giving a high volume of playing time to star-quality players obviously helps any team. But the quickest route a franchise can take in building a winner is simply to avoid giving playing time to scrubs, players under 1.0 WAR.

This is easier said than done, of course. There are more considerations than you can count when it comes to roster construction: payroll, the quality of a farm system and the proximity of its talent in relation to the major league level, where players are on the aging curve, and many more factors. But the data still stand. The most impactful route to a winner isn’t in improving its depth or its stars. It’s in raising the floor of the roster, avoiding the scrubs.

References & Resources

Travis Sawchik, FanGraphs, “Has the Era of the Super Team Arrived?”
Neil Paine, FiveThirtyEight, “Think The 1927 Yankees Were Great? Meet The 2017 Dodgers And Astros”
Travis Sawchik, FanGraphs, “What If More Teams Follow the Astros’ Extreme Roadmap?”
Neil Weinberg, FanGraphs Library, “Calculating Position Player WAR, A Complete Example”
FanGraphs Library, “What is WAR?”

John LaRue is a graphic designer, former minor league baseball media relations director, and data visualization enthusiast. His work has been featured in The Best American Infographics 2013 and I Love Charts: The Book. Follow him on Twitter @tdylf.

10 Comments


Paul Moehringer

7 years ago

How did someone like Justin Verlander from last year rate?

Was his entire WAR counted towards what his “role” would be considered to be, or just what he did with the Tigers/Astros?

John LaRue

7 years ago

Reply to Paul Moehringer

I left it strictly in his production for each specific team. So in his case, it was a 1.1 WAR with Houston and 3.0 for Detroit. Both teams have him as a Depth player. In reality, his 4+ total WAR should have counted as Star quality. That’s a great catch, and I did not adjust for it.

If I get a little time today, I’ll see if I can parse those out and see where it might make a difference with some teams.

Sam Sharpe

7 years ago

Really enjoyed the analysis. Did you compare these different categories looking at batter vs pitcher? For example, I think it would be interesting if you had a heat map with the y axis being the team category for pitchers and the x axis being the team category for batters and the measure being their pythag-Win%. Maybe star pitchers with scrubs and a deep lineup performs better? It might thin out the data, but just a thought.

John LaRue

7 years ago

Reply to Sam Sharpe

I didn’t compare pitcher depth to hitter depth (or stars or scrubs), but I think that’s a great idea for a future article. I can play around with it and see what turns out.

channelclemente

7 years ago

How do the Giants teams of 2010,12,14 stack up?

John LaRue

7 years ago

Reply to channelclemente

The 2012 squad lands in the Deep with Stars sample. 2013 does as well. The 2014 team had above average star power, but was very ordinary in their depth. They did do a good job of minimizing scrub playing time. Essentially 2014 was an average team in talent distribution except they swapped out 5-7% of scrub playing time for star playing time.

The 2010 team had lots of star power (4+ WAR players)- 31.6% of their playing time went to star-quality players, good for the 94th percentile. Their scrub playing time percentile was also very positive- 86th percentile (85% of teams gave more playing time to scrubs).

Dave T

7 years ago

There’s an interesting idea here, but I wonder if some of this analysis is question begging because it’s an after the fact look back at how much WAR players compiled during a year.

One clarifying question: are these WAR totals for player quality something like WAR / 600 (or WAR / 180 innings for pitchers) or simply raw WAR compiled by players during the season?

I ask because I see a few paths to giving a large amount of playing time to sub – 1 WAR players, especially if that’s judged on what they compiled for the year.

One way is to have a lot of average-ish (2 WAR) or slightly below players get hurt, with extended absences, so they’re compiling a fair amount of playing time in the aggregate but not enough individually to surpass 1 WAR. That’s obviously exacerbated if their replacements don’t perform very well. This could help highlight the importance of depth to cover for injuries, but at some point a team is only going to have so much depth if it suffers a rash of injuries (to starting pitchers, for example).

Another is by having players who do play a high percentage of the season individually but still compile less than 1 WAR. That could be a conscious roster decision to focus resources on paying for star players at the expense of other positions, or it could just mean that the roster looked pretty good at the start of the season and then a big chunk of the average-ish players badly underperformed their projections.

None of these are good outcomes, and there’s some varying level of blame that can be assigned to a front office depending on the exact path to these bad outcomes.

I’m sensing, however, an underlying tone from the author that a lot of these “scrubs” could have and should have been identified as such prior to the season, and I’m questioning whether that’s the case in a lot of circumstances where a team ends up with a “stars and scrubs” roster when looking back at the season.

And, if I understand the definitions correctly, if a team gambles on risky options to fill out starting positions, and those risky players do perform, then it rises out of the “stars and scrubs” category to a category that we don’t see in these charts. Similarly, if there are injuries and the bench players or minor league call-ups play well, including better than we’d expect before the fact, then it’s defined out of the “stars and scrubs” category.

In other words, a roster reasonably is full of players who each have some expected variance around their projections. Some of these categories seem to be selecting after the fact to find when certain types of rosters underperformed but leaving out the observations when those types of rosters performed to expectations or overperformed.

John LaRue

7 years ago

Reply to Dave T

For clarification, it’s raw WAR. I thought about the WAR/600 or WAR/180 options but wanted to avoid mediocrities who pulled a hot hand for 75 at-bats (Pete Kozma circa 2012, for instance, comes to mind). Instead I opted to use raw WAR but then weight the contribution based on playing time (using the Kozma example, his 1.3 fWAR lifted him to the Depth category, but his contribution was only .78% of all playing time).

You raise good points, and it’d be interesting- if I have the time and can find the data assets- to plot these by pre-season projections rather than actual results. Theoretically, that would scrub out failed expectations, unexpected injuries (and in some cases, it might capture the risk of injury with some high ceiling/low floor injury risk players), etc… And that would be the way to capture what you’re referring to at the end- the underperforming or overperforming rosters.

You do understand the definitions correctly, as well. Some of this gives a face to the risks incurred by, say, playing lots of young players, or taking on injury risks. And it helps define the importance of draft, development, and having a steady MiLB pipeline to provide depth- to lift the bottom of your roster past the scrub level. And properly spending in free agency- acquiring players with higher certainty rather than gambling on cheaper alternatives… and knowing when to cut bait on the cheaper alternatives. But, I think we both agree- there’s no doubt that luck and simply a wide range of outcomes will always be a factor, regardless of a team’s farm system or how properly they’ve vetted their free agent acquisitions.

Dave T

7 years ago

Reply to John LaRue

Thanks for the response. Hope that I didn’t come across as being overly critical – these results are interesting and I’m sure it took a lot of work. Sounds like we are on a similar page as to what this analysis tells us.

Perhaps one other thought of an additional direction to go is to look at some measure of “sustainability” or “improvement potential” – how did these same teams perform in year n+1 and (maybe) n+2? I can see the counters to my own suggestion, however, as a host of other factors (age of key players, free agents added or lost, prospects graduated to the majors) would impact these comparisons, not just performance variance (including injuries).

Dominikk85

7 years ago

Shouldn’t negative WAR be considered? I think negative-WAR guys are a factor that additionally hurts Stars and Scrubs teams. The difference between a minus-1 and a zero-WAR guy is the same as 1 vs. 2 WAR or 2 vs. 3. Long term, negative-WAR players usually don’t stick, but in a single season having a Matt Wieters (not sure if he was that bad) or Albert Pujols with -1.5 WAR can have a pretty bad effect because it means you need a solid bench player to even get back to zero.

For bad teams those guys are a pretty big factor. For example, the Padres had more than 8 negative WAR in 2017. Replacing them with zeros doesn’t make them a contender, but it improves the team.
