MLB Revenues, Population and Social Media

It's no surprise California is the state with the highest revenue in baseball.

We often hear about the woes of “small-market” teams that can’t generate the requisite revenue to compete with the “large-market” teams. Market size is usually approximated by population, which ignores the fact that St. Louis generates 50 percent more revenue than Tampa Bay despite equivalent Census Metropolitan population levels. As another example, the Red Sox have a relatively average metro population but have a dominant revenue stream.

Let’s take a deeper look at some of the statistical indicators that a team has a strong revenue base. I’ve excluded my hometown Toronto Blue Jays due to differences in culture and currency. (Beer-can incident aside, the harshest attack ad in our most recent election read “Justin, He’s Just not Ready.”)

Map of MLB Revenues by Census Metropolitan Area



We begin with a map of revenues across the United States, with filled-in metro areas that generate MLB revenues; colors go from green (low revenue) to red (high revenue). Each metro area includes all revenue for that area, i.e. New York includes the Mets and Yankees, and Oakland-San Francisco includes the A’s and the Giants. Using census metro areas may not be perfect, but for the purposes of this piece we will assume the U.S. Census Bureau had good reasons to group things the way it did.

It would appear that just based on empty spots on the map, Portland and Las Vegas would be good potential landing spots for MLB expansion, when and if that ever happens. We see something similar in the state view, where it would appear there are coverage gaps in the Carolinas, Oregon and Nevada:

Map of MLB Revenues by State


The Link Between Revenue and Population

Let’s tackle the traditional variable attributed to large-revenue teams, specifically the size of their general market, as measured by population. What’s more important, metro area population or state population? What happens when teams share a market? Do they split the potential, or does the pie become larger? Let’s take a look:

What drives revenue – Local Metro Population or State Population?

We see a much stronger signal when we compare revenue at the metro level, which isn’t all that surprising. What we do see, though, is that it appears the more popular city-sharing team gets the larger share of the pie, leaving the other team to underperform based on population. However, if we were to add the two revenue numbers, the combined revenue of the two teams is significantly higher than the trend line would suggest.

Let’s take a look at the data in a slightly different way, where instead of breaking out the individual teams, we’ll amalgamate all the baseball revenue the metro area generates and see how strong a correlation we get.

Metro Area Populations with Revenue from all MLB Teams in Area | R-Squared 0.7

Here we see a very strong correlation between general market size and the overall baseball revenue the market can sustain, with a few outliers, specifically the Bay Area, which outperforms its population by a very significant margin (mostly thanks to the Giants). However, I’m still not convinced there really is that strong of a linkage between revenue and market size outside the four biggest markets, and here’s why:

Metro Area Populations with Revenue from all MLB Teams in Area | R-Squared 0.08 (Top 4 Metros Excluded)

We see only a very loose correlation in the smaller markets; clearly the large markets are skewing the correlation with such a small sample size. This would suggest that other metrics such as wins and fan intensity play a larger role in determining franchise revenues. Let’s take a look at wins first to determine if we can see any connection there.
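For readers who want to reproduce this kind of check, here is a minimal sketch of the comparison: total up revenue per metro area, then compute R-squared once on every metro and once with the largest few removed. It is not the workflow behind the charts in this piece, and the function names and tuple layout are assumptions made for illustration; you would feed it your own population and revenue figures.

```python
from collections import defaultdict


def aggregate_by_metro(teams):
    """Sum team revenue per metro area.

    `teams` is an iterable of (metro, population, revenue) tuples; teams that
    share a metro are assumed to carry the same population figure.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # metro -> [population, revenue]
    for metro, population, revenue in teams:
        totals[metro][0] = population
        totals[metro][1] += revenue
    return {metro: (pop, rev) for metro, (pop, rev) in totals.items()}


def r_squared(xs, ys):
    """R-squared of a one-variable least-squares fit (squared Pearson r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("R-squared is undefined when either variable is constant")
    return (sxy * sxy) / (sxx * syy)


def r_squared_without_largest(metros, drop=4):
    """R-squared of revenue on population after dropping the `drop` largest metros.

    `metros` maps metro name -> (population, revenue), as returned by
    aggregate_by_metro.
    """
    ranked = sorted(metros.values(), key=lambda pr: pr[0], reverse=True)
    kept = ranked[drop:]
    pops = [p for p, _ in kept]
    revs = [r for _, r in kept]
    return r_squared(pops, revs)
```

Comparing `r_squared` on the full set against `r_squared_without_largest(metros, drop=4)` shows how much of the fit depends on a handful of very large markets, which is the point the two charts above are making.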

The Link Between Revenue and Recent Wins

The following four pictures show, in sequential order, the growing relative importance of recent wins to franchise revenue, providing evidence that wins provide a short-term boost in revenue but not necessarily a long-term one. The 2016 season wins were pulled around the same time the Forbes revenue list came out.

We see a progression in the quality of our R-Squared correlations from 0.01 in 2013, 0.06 in 2014, 0.18 in 2015 to 0.21 in 2016. I’m not totally convinced this is that strong of an indicator, especially since it quite possibly could be the other way around, i.e. large-revenue teams are finding ways to better leverage their resources and produce more wins. The Cubs would be a prime example of that.

Let’s step away from wins and market size and look at fan engagement/intensity, about which we can make gross generalizations (e.g., Yankees are waaay more popular than the Mets). But do we have a way to quantify this, other than looking at revenue?

The Link Between Revenue and Social Media

One could posit that if a team is really popular, then it should have a much stronger following on Twitter and Facebook than its competitors. Based on that hunch, I gathered Facebook Likes for each team’s official Facebook page as well as Twitter followers for the team’s official Twitter account. Interestingly, /Giants on Twitter is for the football Giants, whereas on Facebook it is for the baseball version. I digress; however, it is interesting to note that this does speak a little to cross-league competition.

Before we test the relationship to revenue, let’s take a look at the link between Facebook and Twitter:

Unsurprisingly, we see a very strong linkage between the two metrics, with the Phillies exhibiting a much stronger Twitter following than Facebook. The Yankees are stronger on Facebook and also far ahead of any other team on social media, as evidenced by the following graph:

What we see here is an independent measure of fan intensity with respect to the Boston Red Sox, who do not have anywhere near the metro area population of their peers. The Tigers are an interesting one for me, since their FB and Twitter followings would suggest a rabid fan base, held back only by a city that has had financial turmoil.

Facebook Likes and Revenue | R-Squared 0.75

We see an extremely tight linkage between Facebook likes and revenue. The small sample correlation is stronger than with Twitter, depicted below…

Twitter Followers and Revenue | R-Squared 0.70

However, if we look at the picture and ignore the Phillies as a clear exception, the teams are much more tightly clustered around the trend line. Removing the largest metro areas from the sample does not affect the strength of the relationship, suggesting it is a more stable predictor of revenue than market size. Further, fan intensity, as measured by Twitter + Facebook, only has a 0.28 R-squared correlation to metro size, implying there is huge room for growth, irrespective of the market size as measured by population. Here’s the cross-chart:

Notice how the Red Sox, Giants and Yankees are very far above the trend line, whereas the Miami Marlins are far below the trend line.

Building a Simple Revenue Regression Model for No Good Reason

What happens when we take our three inputs (market size, social media and 2015 wins) and put them in the regression blender? Will it predict revenue with greater accuracy than any of the variables on their own? The model spits out the following formula: $78.6M + $3.15/Capita + $57/Twitter + $20/Facebook + $1.3M/2015Win.

The Texas Rangers, as per this model, generate approximately $100M from fan intensity, about $18M from the sheer size of the market and about $10M from their 2015 incremental wins. The remainder comes from just being a member of the exclusive 30-team club. According to this model, increasing fan intensity by 20 percent will result in a $20 million increase in revenue, all else being equal.
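To make the formula concrete, here is a small sketch that splits a prediction into the same pieces discussed above. The coefficients are the ones quoted in the formula; the function names and the input numbers you pass in are hypothetical. One assumption to flag: the article describes the wins term in terms of "incremental" wins without stating the baseline, so `wins` here is simply whatever win count the fitted model used.

```python
# Coefficients as quoted in the article's fitted formula (US dollars).
INTERCEPT = 78.6e6            # baseline for being one of the 30 clubs
PER_CAPITA = 3.15             # per person in the metro area
PER_TWITTER_FOLLOWER = 57.0   # per Twitter follower
PER_FACEBOOK_LIKE = 20.0      # per Facebook like
PER_WIN_2015 = 1.3e6          # per 2015 win, as the model counts wins


def revenue_components(population, twitter, facebook, wins):
    """Break a predicted revenue figure into the formula's four terms."""
    return {
        "baseline": INTERCEPT,
        "market_size": PER_CAPITA * population,
        "fan_intensity": (PER_TWITTER_FOLLOWER * twitter
                          + PER_FACEBOOK_LIKE * facebook),
        "wins": PER_WIN_2015 * wins,
    }


def predicted_revenue(population, twitter, facebook, wins):
    """Total predicted revenue: the sum of the four terms."""
    return sum(revenue_components(population, twitter, facebook, wins).values())
```

Because the model is linear, scaling both social media counts by 20 percent raises the prediction by exactly 20 percent of the `fan_intensity` term and leaves the other terms unchanged, which is where the $20 million figure for a team with about $100M of fan-intensity revenue comes from.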

The model’s accuracy improves from 0.75 just by using Facebook data to 0.84 when combining the various data points and spits out this cross-chart:

We see a tight cluster of teams all around the trend line, with only the Giants and Dodgers significantly outperforming their predicted revenue.

Concluding Thoughts / Rant on Rampant and Obtrusive Advertising

A team with a relatively small market can generate a disproportionate amount of revenue simply by cultivating a strong, deep fan base. In fact, what the simple data in this article show is that market size is really only predictive for the large metros, whereas fan intensity (at least as measured by Twitter followers and Facebook likes) is far more important to generating revenue. There is a lot of talk lately about why NFL ratings are down, with reasons ranging from the outrageous (Kaepernick, seriously? Ratings are down 12 percent because of a political statement made by a backup quarterback?) to more plausible (warmer weather, people watching endless hours of political coverage). I have a much simpler theory, which will tie back to baseball:

There are far too many commercial advertisements for me to watch a game straight through. I used to watch more NFL football. However, it feels like every time there is a scoring play, it is followed by a five-minute commercial break, then a kick-off (usually just kicked out of bounds), followed by another five minutes of commercial advertisements.

Now, in the MLB playoffs, I’ve noticed they’ve introduced a soccer-style tactic of shrinking the game and putting advertisements on the side. I can’t overstate how annoying and atmosphere-killing it is. For something like soccer, NASCAR or Formula 1, I get it. There is continuous action, and you have to have some form of revenue generation, ideally without having to turn away from the live action. With baseball, you have so many guaranteed spots, it’s almost ridiculous, not to mention all the pitching changes that occur at the latter stages of the game when audiences are more captive.

No sport, no team should take the loyalty of its fan base for granted. Teams and leagues have far more to gain by ensuring fans enjoy the product than they do by short-term cash grabs via jammed-in advertising.

  • Revenue data as per Forbes; demographic data from the 2011 US Census.

Eli Ben-Porat is a Senior Manager of Reporting & Analytics for Rogers Communications. The views and opinions expressed herein are his own. He builds data visualizations in Tableau, and builds baseball data in Rust. Follow him on Twitter @EliBenPorat, though you may be subjected to (polite) Canadian politics.
5 years ago

Very interesting, especially that chart showing Twitter and Facebook correlation and the independent revenue graphs that follow. I think you are onto something in terms of measuring fan intensity that people haven’t been able to do using just TV ratings.

CSA may give a better representation of teams’ fan base populations in general and certainly the Red Sox fandom’s population. It is a regional team, with most of New England supporting it. State boundaries and even metro area won’t reflect the population fanbase as well as the CSA will. I would be interested to see how using CSAs would look compared to Metro Areas on your Twitter and Facebook v Metro chart.

5 years ago
Reply to  mustbunique

I’ve never bought the “it’s a regional team” argument. All teams are regional and generate fans outside of their CSAs. For example, should fans in Toledo be counted in the Tigers’ territory? The ability of a team like the Red Sox or Cardinals to generate revenue outside of the immediate area is a testament to their marketing and should be viewed on par with teams’ ability to identify talent cheaply.

5 years ago
Reply to  Scott

I agree that all teams are regional. 100%. And yes, teams certainly generate fans outside their CSAs. And fans move outside their CSAs and remain fans of teams. All true. I would be interested to know what % of any fanbase lives within their team’s CSA. My guess would be most of the fanbase lives in the CSA.

As far as the regional argument goes, I was pointing out (maybe not well) that drawing lines at state boundaries will not be useful for smaller states like MA, or even for other cities near state borders, and that the whole region needs to be considered. Use of a Metro area is one way to accomplish that. I think the CSA in all cases is a more accurate reflection of a fanbase’s population than Metro areas. CSAs are more inclusive than Metro areas. Using CSA instead of Metro may increase the r2 results.

5 years ago
Reply to  mustbunique

What is CSA? A quick Google search was not helpful.

5 years ago
Reply to  mark

Combined Statistical Area. It’s kind of an extended metropolitan area that includes “satellite” cities that are too large and too far from the central city to really be called a “suburb” of the central city:

As an example, the Los Angeles metro area (what the Census Bureau calls a metropolitan statistical area, or “MSA”) doesn’t include places like Riverside, San Bernardino, and Ventura, but they are included in the Los Angeles CSA. Similarly, the Philadelphia MSA doesn’t include Atlantic City NJ, Dover DE or Reading PA, but they are included in the Philadelphia CSA.

Some large cities that are close together are combined into a single CSA. San Jose, which has its own MSA, is included in the San Francisco-Oakland CSA. Washington and Baltimore, which each have their own MSA, are combined into a single CSA, although they weren’t always – IIRC, the Census Bureau combined their CSAs about ten or fifteen years ago, and the decision to do so was somewhat controversial.

The Washington-Baltimore thing aside, I agree with ‘mustbunique’ that it would be better to use CSAs for this type of exercise than MSAs. The article suggests that Boston is not really that large of a market, with only a “relatively average” metro population. But its CSA is the 6th largest in the country, exceeded only by New York, Los Angeles, Chicago, and the combined Bay Area and Washington-Baltimore CSAs. The Boston CSA includes places like Providence RI, Worcester MA, Manchester NH, and Cape Cod, each of which has its own MSA separate from Boston’s.

5 years ago

I wonder what you’d find if, instead of population, you factored in measures of metro areas’ wealth, income or (perhaps best of all) business revenue. I’m from Tampa, where the median household income is dead last among the 25 largest metro areas. Maybe the Rays’ woeful attendance record comes down to a simple matter of dollars and cents for fans. Further, it surely seems to me that too few businesses are shelling out for luxury suites or big blocks of prime seats at the Trop. It’s possible that the Rays just failed to tickle the fancy of enough local lawyers, doctors and business execs. But it seems more likely that local businesses simply are too few in number and/or can’t see a return on investment from treating their clients to a day at the old ball yard. St. Louis may be comparable in size to the Tampa-St. Pete-Clearwater area, but median income there is a good 15% higher. Presumably, that means the Cards can draw from a comparably larger pool of disposable income. Meanwhile, the Marlins are struggling in a market that ranks next-to-last in household income.

5 years ago

I’m a little late, but as you’ve mentioned, regional wealth is a driver. Competition from other pro sports franchises for entertainment dollars is too. On the demographic side, MLB fandom skews white and older compared to the MLB metros as a whole, so maybe metro white population is a better fit than general population (some multicollinearity between income and race/age though). Finally, I think there is something to be said for the time in which a franchise has been in a particular market. Teams are brands, and the longer and more established that brand is, the greater its acceptance in the market. This probably isn’t linear though. Incremental acceptance decreases with age, so you’d need to transform this variable to something else like a log function.
