How are wins, attendance and payroll all related?
by Dan Lependorf
February 2, 2012

“…there are rich teams and there are poor teams. Then there’s fifty feet of crap. And then there’s us.” (Moneyball, 2011)

The release of Moneyball helped bring baseball economics into the mainstream. Given the subject matter, a lot of the attention has centered on the plight of the small-market team, and how small-market general managers need to be smarter to survive. A common refrain I hear is that small-market teams are often pulled into a death spiral where payroll, attendance, and the quality of the team drag each other down, sinking the team into the gutter with no easy way out. And it intuitively makes sense. But is it really that simple? How strong are the links between those three variables?

Thankfully, testing the strength of the links is fairly easy. I created a data set covering all MLB teams from 2000 to 2011, consisting of team payroll at the beginning of the season (via USA Today), per-game attendance figures (via Baseball-Reference), and winning percentage.

Sure, it seems intuitive that all three of the possible pairings of payroll, attendance, and wins would show noticeable correlations. More money should buy more wins, wins should go hand-in-hand with more people in seats, and attendance should mean more revenue dollars for the front office to play with. But instead of relying on conjecture, why not actually test it?

Before I continue, I need to touch on a very important point. Correlation does not necessarily imply causation. Two variables may be correlated, but the existence of a correlation does not mean that one of the variables caused the other. The percentage of US households with a television over the last 50 years correlates with the price of a gallon of milk, as both have increased over time, but the correlation certainly doesn’t mean that the price of milk caused more US families to purchase TVs.
Correlation can certainly point in the direction of causation, but proving causation is a rather tricky proposition that requires research in a controlled environment. So think of this as a loose suggestion, rather than anything that’s set in stone.

Now, onto business. What conclusions can we draw?

By far, the biggest correlation is between payroll and attendance.

Variable Pair         R²
Payroll/Wins          0.16
Attendance/Wins       0.27
Payroll/Attendance    0.54

The R² between payroll and attendance is 0.54, which is a fancy, statistical way of saying that 54 percent of the variation in one can be explained by variation in the other. (The figure is the same whichever of the two you treat as the starting point, so by itself it says nothing about which one drives the other.) The R² figures for payroll/wins and attendance/wins are far lower, indicating a lesser degree of correlation. It makes sense that the two other pairs should be related, but the links between those pairs don’t seem to be quite as strong as payroll/attendance.

But is it possible to shed a little more light on the correlation? Sure, A and B share a correlation, but does A influence B more than B influences A? It’s easy enough to test by correlating one variable with another variable’s value from the previous year.

Again, before delving into this, I have to give a similar warning as above. Another common logical fallacy is to assume that if one event happened after another, the first caused the second. After writing this article, I made myself a sandwich, but typing about baseball didn’t cause me to get hungry. An event that follows another can certainly be caused by the first, but it’s not necessarily true. So again, nothing here is set in stone as an emphatic “this is how it is” conclusion.

Attendance follows wins, not the other way around.

Variable Pair                   R²
Attendance/Wins                 0.27
Attendance/Last Year’s Wins     0.30
Wins/Last Year’s Attendance     0.13

If we compare attendance with the previous year’s wins, the R² jumps up a little from the same-year correlation.
But when reversed, when wins are compared to last year’s attendance, the R² falls to 0.13. This seems to suggest that a fan base shows up in larger numbers if the team is doing well, but the reverse effect of a team doing well because of a large fan base (more revenue from ticket sales) doesn’t generally exist.

Payrolls expand after a team does well more often than the reverse.

Variable Pair                R²
Payroll/Wins                 0.16
Payroll/Last Year’s Wins     0.25
Wins/Last Year’s Payroll     0.12

Again, we see the R² change as these variables are moved around in time: payroll tracks last year’s wins (0.25) better than wins track last year’s payroll (0.12). This suggests that teams generally expand payroll to push a talented team over the edge, instead of using payroll to give a bad team a talent spike. Nothing surprising here, but it’s nice to have it in black and white numbers.

Payroll and attendance are far “stickier” than wins.

Variable Pair                         R²
Payroll/Last Year’s Payroll           0.83
Attendance/Last Year’s Attendance     0.80
Wins/Last Year’s Wins                 0.31

For this set, I ran a correlation analysis between each variable and its value in the previous year. (To borrow a term from economics, a variable is called “sticky” when it isn’t very susceptible to change over time.) The year-to-year correlations for payroll and attendance are extremely high, whereas wins are much more volatile.

So what does it all mean, as far as small-market teams are concerned? Having a healthy fan base and a strong payroll is extremely important for the health of a franchise. Wins are volatile, but payroll and attendance will stick around for a while. Of course, the problem is that wins lead to payroll and attendance, so there seems to be truth in the death spiral idea after all.

It’s also interesting to note that the correlation between payroll and wins isn’t nearly as strong as one might expect. An R² of only 0.16 is tiny, indicating a relationship that is far from ironclad. (That’s your cue to gloat or sulk, Rays and Cubs fans.)
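
For readers who want to try this kind of comparison on their own numbers, the calculation can be sketched in a few lines of Python. This is only a rough illustration, not the tooling behind the figures above. It assumes that R² here means the squared Pearson correlation between two columns, and the field names (team, year, payroll, attendance, win_pct), the helper names r_squared and lagged_pairs, and the sample rows are all made up for the example.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


def lagged_pairs(rows, current_key, previous_key):
    """Pair each team-season's `current_key` value with the same team's
    `previous_key` value from the season before. Seasons with no prior
    year in the data are skipped."""
    by_team_year = {(row["team"], row["year"]): row for row in rows}
    previous_values, current_values = [], []
    for (team, year), row in by_team_year.items():
        prev = by_team_year.get((team, year - 1))
        if prev is not None:
            previous_values.append(prev[previous_key])
            current_values.append(row[current_key])
    return previous_values, current_values


# Hypothetical rows for two invented teams. NOT real MLB data.
rows = [
    {"team": "AAA", "year": 2000, "payroll": 40.0, "attendance": 18000, "win_pct": 0.432},
    {"team": "AAA", "year": 2001, "payroll": 45.0, "attendance": 19500, "win_pct": 0.506},
    {"team": "AAA", "year": 2002, "payroll": 55.0, "attendance": 23000, "win_pct": 0.543},
    {"team": "BBB", "year": 2000, "payroll": 90.0, "attendance": 35000, "win_pct": 0.580},
    {"team": "BBB", "year": 2001, "payroll": 95.0, "attendance": 36000, "win_pct": 0.494},
    {"team": "BBB", "year": 2002, "payroll": 92.0, "attendance": 33000, "win_pct": 0.525},
]

# Same-year comparison: payroll vs. attendance.
same_year = r_squared(
    [row["payroll"] for row in rows],
    [row["attendance"] for row in rows],
)
print(f"Payroll/Attendance R^2: {same_year:.2f}")

# Lagged comparison: attendance vs. last year's wins.
last_wins, attendance = lagged_pairs(rows, "attendance", "win_pct")
print(f"Attendance/Last Year's Wins R^2: {r_squared(last_wins, attendance):.2f}")

# Reversed lag: wins vs. last year's attendance.
last_att, wins = lagged_pairs(rows, "win_pct", "attendance")
print(f"Wins/Last Year's Attendance R^2: {r_squared(last_att, wins):.2f}")
```

With a real data set, the rows would be loaded from a file, one per team-season. Each team’s first season in the data drops out of the lagged comparisons, since it has no previous year to pair with, so the lagged figures rest on slightly fewer data points than the same-year ones.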
References & Resources
USA Today MLB Salary Database
Baseball-Reference Attendance Database
My data set, with payroll, attendance, and win percentage figures from 2000-2011.