Are High-Scoring Games Making a Comeback?

by Jonathan Luman
November 4, 2015

Higher-scoring baseball games could become more normal. (via Andrew Malone)

Early in September, Jon Roegele pointed out that scoring in August was way up, countering a longstanding downward trend. My first thought was, "Hooey! This is no more than the result of random variation." This idea is supported by observing that the monthly scoring rates were roughly normally distributed and that the August value was two standard deviations above the mean, as you can see in Figure 1. The August scoring rate was high, but a normal distribution predicts there will be some extreme values like this.

Stated another way, assuming that the run environment had changed in August appeared to me to be the base rate fallacy. The base rate fallacy can occur when an unlikely explanation (August scoring was high due to random sampling, P ≈ 0.02) is dismissed, even though the alternative (an increase in the run environment, P unknown, presumed to be << 0.02) is even less likely.

But then scoring in September was just as high as it was in August. The likelihood of two consecutive months sampling at extreme values is very low if due to randomness alone (P ≈ 0.02 × 0.02). This suggests something causal entered the picture in August and September.

So my second thought was that the August and September average scoring rate was elevated by a rash of high-scoring games. If this were the case, it would be observable in the distribution of game scores (i.e., the actual tally of runs).

Figure 2 shows the distribution of 2015 game scores, all 4,858 of them (2,430 games, two scores each game, one Detroit-Cleveland game not played). The bars are the actual distribution; the line is a fitted negative binomial distribution (R² = 0.99). A negative binomial fit is derived from the mean and variance of the empirically sampled game scores. Sean Dolinar provided a really clear explanation of this math in a baseball context on his blog before joining FanGraphs.

This distribution is about what you'd expect: the most probable scores are two or three runs (30 percent of all games), the existence of high-scoring games draws the mean game score up to 4.25 runs, and about five percent of games see a team score 10 or more runs.

The probability mass function is defined so that the sum of probabilities is equal to 1:

$$\sum_i p_i = 1$$

In practice, each probability is just $p_i = n_i / N$, where $n_i$ is the number of games in which a team scored i runs and N is the total number of games played.

The mean game score (4.25 runs in 2015) is the product of the game score and the probability of that score, summed across all possible scores:

$$\bar{s} = \sum_i i \, p_i$$

The mean score is really just a weighted average of the game scores (weighted by the probability of occurrence).

Figure 3 is a comparison of the April-July and Aug-Sept distributions, now shown as lines because they overlap. Observe that in the latter months low-scoring games (zero to two runs) occur with reduced frequency and that scores of three to nine runs occur with increased frequency.

Consider that the April-July mean score is:

$$\bar{s}_{Apr\text{-}Jul} = \sum_i i \, q_i$$

And the Aug-Sept mean score is:

$$\bar{s}_{Aug\text{-}Sep} = \sum_i i \, p_i$$

Then the difference between these mean scores is:

$$\Delta\bar{s} = \sum_i i \, (p_i - q_i)$$

where $p_i$ corresponds to the probability of i runs in the August-September period and $q_i$ to the probability of i runs in the April-July period. There is a "problem" in this definition: p and q tend to be very similar in magnitude, so taking their difference tends to amplify sampling uncertainty.
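If you'd like to play with these quantities yourself, here is a minimal sketch of the method-of-moments negative binomial fit and the weighted-average mean described above. This is not the code used for this article; it is a Python illustration that assumes numpy and scipy are available, and the game_scores array is an invented stand-in for the real 2015 data.

```python
# A rough sketch of the method-of-moments negative binomial fit described above.
# game_scores is a made-up stand-in; the real analysis used all 4,858 team-scores from 2015.
import numpy as np
from scipy import stats

def fit_negative_binomial(scores):
    """Fit a negative binomial by matching the sample mean and variance."""
    mu = np.mean(scores)
    var = np.var(scores)          # requires var > mu (over-dispersed counts)
    p = mu / var                  # NB "success" probability
    r = mu * mu / (var - mu)      # NB size parameter
    return stats.nbinom(r, p)

# Hypothetical sample of team runs scored in individual games.
game_scores = np.array([0, 2, 3, 3, 4, 1, 5, 2, 7, 3, 6, 2, 4, 10, 3, 5, 0, 8, 2, 4])

dist = fit_negative_binomial(game_scores)

# Empirical probability of each score: p_i = n_i / N, so the probabilities sum to 1.
scores, counts = np.unique(game_scores, return_counts=True)
p_i = counts / counts.sum()

# Mean game score as a probability-weighted average: sum over i of i * p_i.
mean_score = np.sum(scores * p_i)
print(f"sample mean = {mean_score:.2f}, fitted NB mean = {dist.mean():.2f}")
```

Because the fit matches the sample mean and variance exactly, the fitted distribution's mean comes out equal to the weighted-average mean computed from the empirical probabilities.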
The sampling error can be mitigated, without changing the mean and variance, by substituting fit distributions for the sample distributions. This only "works" if the empirical data truly conforms to a negative binomial distribution. Fortunately, the quality of fit (R²) is very high (>0.99) for both sample periods.

Figure 4 repeats the distributions shown in Figure 3 and adds negative binomial fit distributions ghosted behind the data curves. Also shown, in a grey curve plotted on the right axis, is the run-value-weighted difference between the two "half"-season fit distributions (a short numerical sketch of this calculation appears at the end of the article). As expected, the integral of the grey curve is equal to the difference in mean scores between the April-July and Aug-Sept periods, 0.35 runs (4.47 – 4.12).

The point of the grey curve is that it shows us which run totals contributed most to the elevated run environment. The fat part of the grey curve is where teams made the "extra" runs: in six- to 10-run games. We can tell the blowouts (say, 15-plus-run games) weren't major contributors to the change in run environment, because the area under the curve for big scores isn't all that large.

Figure 5 repeats the grey curve from Figure 4 and adds the run-value-weighted difference between the sample distributions as vertical bars. The agreement between the two data sets is a little rough (R² = 0.11). Upon close inspection, the discrepancies between the two sets correspond to where the sampled data deviate most from the fitted values.

I thought the increased run production came in bunches (say, a handful of 15-plus-run games), but it turns out the blowouts didn't move the needle much. Had it been otherwise, it would have required looking for a different phenomenon than the one Jon was searching for. Instead, it looks like teams were pushing across an extra run a couple of times a week (two runs / six games ≈ 0.35 runs per game).

So where did these extra couple of runs per week come from? That's a great topic for a follow-up study. I'd suggest at least two areas to explore, focusing on context-specific metrics. Did teams get better at scoring runners from third? Did hitters get better with runners in scoring position? I could be talked into one of those two (probably), but obviously there are other potential explanations besides these two.

Who knows if this increased scoring will continue? More runs are good, right? (Well, to a point.) I seem to recall that was a theme in new commissioner Rob Manfred's comments last offseason.
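Finally, for anyone who wants to reproduce the grey-curve arithmetic from Figure 4, here is a rough sketch of the run-value-weighted difference between two fitted "half"-season distributions. Again, this is not the article's actual code: the two score arrays are randomly generated stand-ins, and the fitting helper is the same method-of-moments approach sketched earlier.

```python
# A rough sketch of the run-value-weighted difference behind the grey curve in Figure 4.
# april_july and aug_sept are invented stand-ins for the two periods' game scores.
import numpy as np
from scipy import stats

def fit_nb(scores):
    """Method-of-moments negative binomial fit (same approach as the earlier sketch)."""
    mu, var = np.mean(scores), np.var(scores)
    return stats.nbinom(mu * mu / (var - mu), mu / var)

rng = np.random.default_rng(0)
april_july = rng.negative_binomial(5, 0.55, size=3000)   # placeholder data
aug_sept   = rng.negative_binomial(5, 0.52, size=1800)   # placeholder data

q = fit_nb(april_july)   # q_i: probability of i runs, April-July
p = fit_nb(aug_sept)     # p_i: probability of i runs, Aug-Sept

runs = np.arange(0, 26)                              # run totals 0 through 25
weighted_diff = runs * (p.pmf(runs) - q.pmf(runs))   # i * (p_i - q_i), the grey curve

# Summing the curve recovers (approximately) the difference in mean game score
# between the two periods, the analogue of the 0.35-run gap discussed above.
print(f"difference in mean score ≈ {weighted_diff.sum():.2f} runs per game")
```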