The Probability of Streaks

Just what was the probability of Cleveland’s 22-game winning streak two seasons ago? (via Erik Drost)

On September 15, 2017, Cleveland lost a close game to Kansas City. Cleveland would go on to clinch the division the next day while the Royals were hovering around .500, a long shot to qualify for a Wild Card spot. It wasn’t the type of game that would typically catch much attention, except for one exceptional circumstance: Cleveland had not lost a game in over three weeks.

Cleveland’s streak ended that day at 22 games, an AL record and second in major league history only to a 26-game winning streak by the 1916 Giants. This is obviously a remarkable feat, but before we start talking about the probability of something like this happening, I want to start with something a little simpler: flipping coins.

Say we want to flip a coin to land heads 22 times in a row. Assuming we have a fair coin, half the time it will land heads and our streak will keep going, and half the time it will land tails and the streak will end.  No matter how long our streak continues, each additional flip carries a 50/50 chance of ending the streak.

This means that if we’re aiming for 22 successful flips in a row, our chances of success get cut in half 22 times, or $0.5^{22}$. The general formula for this is $p^k$, where $p$ is the probability of success in one flip and $k$ is the length of streak you are aiming for.

Baseball teams aren’t coins, but the same logic applies. If a team is expected to win half its games, then each game has about a 50% chance of ending a streak. Things get more complicated with baseball teams, because the probabilities continually change from game to game — a team might have a 60% chance of winning with their ace at home but only a 40% chance with their fifth starter on the road, for example — but for now, let’s stick with this basic formula.

Let’s say that our coin represents an average team. A 22-game streak would be incredibly unlikely ($0.5^{22}$ is about one in 4.2 million), so let’s look at something a bit more modest. If we instead flip the coin 10 times, the chances of it coming up heads each time are $0.5^{10}$, or about 0.098%. Still a long shot, but we’d expect it to happen every thousand or so attempts.
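The arithmetic here is easy to check with a few lines of Python. This is only a minimal sketch; the function name `streak_probability` is an illustrative label, not something taken from the code used for this article.

```python
def streak_probability(p: float, k: int) -> float:
    """Chance of k straight successes when each trial succeeds with probability p."""
    return p ** k


if __name__ == "__main__":
    for k in (10, 22):
        prob = streak_probability(0.5, k)
        print(f"k={k}: {prob:.5%}, about 1 in {round(1 / prob):,}")
```

For a fair coin the one-in-N figure is simply 2 raised to the streak length, which is where the 4.2 million comes from.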

We keep repeating this so that every ten flips of our coin represent the first ten games of one team-season. Since the American League became a Major League in 1901, there have been 2,526 team-seasons in MLB (not counting the short-lived Federal League, which is sometimes but not always considered a Major League). If we expect 0.098% of those to start with a ten-game winning streak, we should only see it actually happen around two or three times (0.00098 * 2526 ≈ 2.5). In reality, there have been six 10-game season-opening winning streaks: the 1955 Dodgers, 1962 Pirates, 1966 Cleveland, 1981 A’s, 1982 Atlanta, and 1987 Brewers.

Now let’s say we repeat the coin-flipping experiment, only instead of flipping one coin over and over, we alternate between two coins every ten flips. And instead of using two fair coins, we alternate between a two-headed coin and a two-tailed coin. Since the two-headed coin will always (or almost always) come up heads and the two-tailed coin always tails, we would end up with 1,263 successful streaks and 1,263 unsuccessful streaks.

This is obviously an extreme example, but it illustrates that having a group of coins whose average probability of landing heads is 50% is not necessarily the same as if every flip has a 50% chance of landing heads. This is important because baseball teams on average win half their games, but some teams are clearly better than others.

Instead of two coins, then, let’s say we have 2,526 different coins, and instead of them being fair or two-headed or two-tailed, they’re all weighted differently. And let’s say that these coins are weighted following a normal distribution with a mean of .500 and a standard deviation of .072, which is pretty close to how the talent levels of MLB teams are distributed historically.

For $k=10$, the average value of the $p^k$ formula across all 2,526 coins is about 0.22%. This means that if we flipped each coin ten times, we’d expect to see around 5.5 (.0022 * 2526) successful streaks, or almost exactly what we observe in MLB.
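One way to reproduce that average is to integrate $p^k$ against the normal talent distribution numerically. The sketch below does this with a plain midpoint rule; the function name, the six-standard-deviation cutoff, and the step count are choices made for this illustration rather than details of the original calculation.

```python
import math


def mean_streak_probability(k: int, mean: float = 0.5, sd: float = 0.072,
                            steps: int = 4000) -> float:
    """Average p**k over p ~ Normal(mean, sd) using midpoint-rule integration.

    Integrates over mean +/- 6 sd. Any p outside [0, 1] is clipped, which is
    an assumption of this sketch (it does not matter for the values used here).
    """
    lo = mean - 6 * sd
    width = 12 * sd / steps
    norm = sd * math.sqrt(2 * math.pi)
    total = 0.0
    for i in range(steps):
        p = lo + (i + 0.5) * width
        density = math.exp(-0.5 * ((p - mean) / sd) ** 2) / norm
        total += min(max(p, 0.0), 1.0) ** k * density * width
    return total


if __name__ == "__main__":
    avg = mean_streak_probability(10)
    print(f"average p^10: {avg:.3%}")
    print(f"expected season-opening streaks in 2,526 team-seasons: {avg * 2526:.1f}")
```

The average comes out more than twice as large as $0.5^{10}$ because $p^{10}$ rises much faster above .500 than it falls below it, so the good teams dominate the total.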

When we ensure that the probabilities for each individual coin match up pretty well with the talent levels of actual MLB teams, they give a result very close to reality. This suggests that the simplified probability of flipping coins, while not a perfect description of the actual probabilities of baseball teams, still works pretty well for predicting streaks as long as we have a reasonable estimate of a team’s overall talent level.

Counting Opportunities

This works well when looking at streaks that begin a season, but it gets more complicated when you start looking for streaks at any point in the year. The $p^k$ formula is only designed to give the probability of a streak over exactly $k$ games; to calculate the probability of a streak appearing over a larger sample of games, we need to know how many independent opportunities a team gets to start a streak.

There are 153 different ten-game windows where a streak could occur in a 162-game season (games 1-10, 2-11,…153-162). So it might seem that each team has 153 opportunities to go on a ten-game winning streak. This is sort of true, except for the fact that these windows are not independent. Since they overlap, a single loss can eliminate any chance of a streak in several ten-game windows at once, and multiple windows can contribute to the same streak. As a result, we can’t treat these as 153 independent opportunities.

If we think about how streaks work, though, we can estimate how many chances a team gets to start a winning streak. The first game of a winning streak will always either be the first game of the season or the first game immediately following a loss because any win that follows another win will not start a new streak but rather continue an existing one. This means we can estimate the number of opportunities a team will have to start a streak by estimating how many losses we expect them to have.

In a 162-game season, any ten-game streak will have to start by the 153rd game at the latest, so any loss in the first 152 games can potentially be followed by a ten-game winning streak. For a .500 team, we expect 76 losses over the first 152 games, plus the first game (which can always start a streak). That means a .500 team will have, on average, about 77 opportunities to begin a winning streak of at least ten games.

We can test this estimate against actual MLB streaks. To avoid dealing with different season lengths, we’ll limit ourselves to the years 1962-2018 and exclude the strike years of 1981, 1994, and 1995.  That gives us 1,436 team-seasons.

If we repeat the above experiment using 1,436 coins to represent the team-seasons in our new sample (and lower the standard deviation to .060 since team talent levels have narrowed compared to the first half of the twentieth century), we find that we would expect about 0.17% of opportunities to end in successful streaks.  If we have 1,436 teams with 77 opportunities each, that would mean we’d expect to see about 190 such streaks over this period.

This is a slight overestimate because better teams, which make up the bulk of successful streaks, lose fewer games and therefore have fewer opportunities to start a new streak. We can account for this by using a new formula which incorporates the number of opportunities into the $p^k$ formula:

Number of successful streaks expected:

$$N \cdot p^k$$

$p$ = probability of success (team’s true-talent W%)
$k$ = length of streak
$G$ = number of games in a season
$N$ = estimated opportunities to begin a streak, where $N = (G-k)(1-p) + 1$

Using this instead of just multiplying the $p^k$ formula by 77 for every team drops the expected number of streaks to about 170. In reality, teams have had 179 ten-game winning streaks over this period.
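The comparison between a flat 77 opportunities and talent-adjusted opportunities can be sketched as follows. The helper names and the midpoint-rule integration over the normal talent distribution are assumptions made for this illustration; only the formulas and the inputs (1,436 team-seasons, standard deviation .060) come from the text.

```python
import math


def opportunities(p: float, k: int, games: int = 162) -> float:
    """Estimated chances to begin a streak: N = (G - k) * (1 - p) + 1."""
    return (games - k) * (1 - p) + 1


def expected_streaks(p: float, k: int, games: int = 162) -> float:
    """Expected number of streaks of at least k wins: N * p**k."""
    return opportunities(p, k, games) * p ** k


def league_total(per_team, team_seasons: int, sd: float,
                 mean: float = 0.5, steps: int = 4000) -> float:
    """Sum per_team(p) over team_seasons teams with p ~ Normal(mean, sd).

    Midpoint-rule integration over mean +/- 6 sd (a choice for this sketch).
    """
    lo = mean - 6 * sd
    width = 12 * sd / steps
    norm = sd * math.sqrt(2 * math.pi)
    total = 0.0
    for i in range(steps):
        p = lo + (i + 0.5) * width
        density = math.exp(-0.5 * ((p - mean) / sd) ** 2) / norm
        total += per_team(p) * density * width
    return team_seasons * total


if __name__ == "__main__":
    flat = league_total(lambda p: 77 * p ** 10, 1436, 0.060)
    adjusted = league_total(lambda p: expected_streaks(p, 10), 1436, 0.060)
    print(f"77 opportunities for every team: {flat:.0f}")
    print(f"opportunities scaled by talent:  {adjusted:.0f}")
```

Under these assumptions the flat version lands a little above 190 and the adjusted version a little below 170, close to the rounded figures quoted here.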

This tells us how many successful streaks we would expect to see from a team of a given talent level, but it doesn’t tell us the probability of a team having a successful streak. That’s because it’s possible for one team to have multiple successful streaks.

The problem becomes apparent if we try to calculate the probability of a team having a two-game winning streak. The above formula gives a value of about 20 for a .500 team, meaning an average team would expect to see around 20 two-game streaks over the course of a season. This obviously can’t be the probability, since probabilities are constrained to the range [0,1].

So we need to approach the formula differently if we want to instead estimate the probability. To do this, we switch out the probability of success for the probability of failure. Instead of trying to calculate the chance of success in at least one opportunity, we calculate the chance of failing in every opportunity.

The probability of a successful streak in one opportunity is $p^k$, which means the probability of failure is $(1-p^k)$. Now we can calculate the probability of a streak of failures covering all $N$ opportunities. That gives us the probability of not having any successful streaks, so the probability of at least one streak will be one minus this value:

Probability of at least one successful streak:

$$1 - (1-p^k)^N$$

The distinction between these two formulas is entirely due to the chance of a team having multiple streaks, which means the longer the streak, the less difference it makes which formula you use.  For a 22-game streak, for example, both formulas will give nearly identical values because the likelihood of a team having more than one 22-game winning streak in a season is negligible.
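To see how the two formulas diverge for short streaks and converge for long ones, here is a small sketch that evaluates both side by side. The function names are illustrative; the formulas are the ones given above, with $N = (G-k)(1-p) + 1$.

```python
def opportunities(p: float, k: int, games: int = 162) -> float:
    """Estimated chances to begin a streak: N = (G - k) * (1 - p) + 1."""
    return (games - k) * (1 - p) + 1


def expected_streaks(p: float, k: int, games: int = 162) -> float:
    """Expected number of streaks per season: N * p**k."""
    return opportunities(p, k, games) * p ** k


def prob_at_least_one(p: float, k: int, games: int = 162) -> float:
    """Estimated chance of at least one streak: 1 - (1 - p**k)**N."""
    return 1 - (1 - p ** k) ** opportunities(p, k, games)


if __name__ == "__main__":
    for p, k in ((0.5, 2), (0.5, 6), (0.5, 10), (0.6, 22)):
        print(f"p={p:.3f} k={k:>2}: expected count {expected_streaks(p, k):9.5f}, "
              f"probability {prob_at_least_one(p, k):.5%}")
```

For the two-game case the expected count is about 20 while the probability is essentially 100%; for the 22-game case both values are about 0.00075.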

Exact Probability for the Coin-Flipping Example

The $p^k$ formula gives an exact probability as long as $p$ is known and constant, but the last two formulas we derived are only estimates based on an expected number of opportunities. The actual number of opportunities will vary based on a team’s results, which can have an asymmetrical effect on the true probability. It is possible to compute a true probability (again, assuming a known and constant value of $p$), but it is a much more convoluted process, so if we can show that these estimates are reasonable approximations, their ease of use can save us a lot of trouble.

The site Ask a Mathematician gives one approach for the exact calculation. It’s a lot to follow, but the general idea is that you can start by calculating the simplest parts of the solution and then keep recursively plugging those values back into the formula until you eventually work out the whole thing. In the end, it collapses to something reasonably compact using summation notation, which doesn’t really make it any less arduous to calculate manually, but it does allow us to program the solution into a computer so we can check our formula values. 
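For readers who want to check the exact values themselves, one alternative to the summation formula is a small dynamic program that tracks the length of the current run game by game. This is a different technique from the Ask a Mathematician derivation and is offered only as a sketch; it should give the same exact probability for a constant $p$.

```python
def exact_streak_probability(p: float, k: int, n: int = 162) -> float:
    """Exact chance of at least one run of k successes in n independent trials."""
    # state[j] = probability that no run of k has happened yet and the
    # current run of consecutive successes has length j (0 <= j < k).
    state = [1.0] + [0.0] * (k - 1)
    for _ in range(n):
        new_state = [0.0] * k
        new_state[0] = (1 - p) * sum(state)  # a failure resets the run to zero
        for j in range(1, k):
            new_state[j] = p * state[j - 1]  # a success extends the run by one
        # A success from run length k-1 completes a streak, so that
        # probability leaves the "no streak yet" states for good.
        state = new_state
    return 1 - sum(state)


if __name__ == "__main__":
    print(f".500 team, 6 games:  {exact_streak_probability(0.5, 6):.3%}")
    print(f".600 team, 22 games: {exact_streak_probability(0.6, 22):.6%}")
```

Each game costs about $k$ operations, so a full 162-game season is effectively instantaneous to compute.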

The following graph shows how well the formula estimates the exact calculation for a variety of values of p (the different blue lines) and k:

The formula tends to slightly underestimate the probability of a streak, especially on streaks that are pretty likely but not guaranteed to occur (the error generally peaks for streaks that have about an 80% chance of occurring). For example, our formula estimates that a .500 team would have about a 71.2% chance of putting together a win streak of at least six games at some point in the season, whereas the exact calculation gives 72.8%. Once the probability of a streak drops below about 30%, though, this difference becomes negligible, so for long streaks, the formula gives a fairly precise estimate.  For example, if we assume Cleveland was a true-talent .600 team, our formula gives a 0.074997% chance of a 22-game winning streak, while the exact calculation gives 0.075004%.

Simulations

So far, we’ve shown that our formula works pretty well for estimating the exact probability of a streak given a constant probability of winning each game. If we take a weighted coin that comes up heads 60% of the time and flip it 162 times, there’s a 0.075% chance of seeing a streak of 22 heads at some point. What we really care about, though, is whether that is also true of a baseball team that we expect to win 60% of its games.

Our earlier coin-flipping example showed that coins can give reasonable estimates for baseball teams as long as the overall probabilities are similar, but we also know that having a collection of teams that averages .500 is not the same as each team being .500. It’s possible, then, that one team that averages .600 works differently from a team that is always .600. A team’s chances of winning will fluctuate depending on whom they’re facing, who is pitching, where the game is, etc., so to see if our formulas still work under these conditions, we can compare them to simulations that mimic these changing probabilities.

The most obvious factor that changes from game to game is the starting pitcher. Let’s consider the Cleveland team that won 22 in a row. If we stick with our estimate that Cleveland was a true-talent .600 team overall, that could range from something like a .660 team when Corey Kluber is on the mound to a .540 team with Josh Tomlin pitching.  We’ll simulate this using a five-man rotation where each pitcher adds or subtracts a set amount to the team’s true-talent W% using the following values:

Simulated Rotation Values
#1 +.060
#2 +.030
#3 .000
#4 -.030
#5 -.060

After simulating one million seasons for a true-talent .600 team that ranges from .540 to .660 depending on the starting pitcher, 0.071% of them had a streak of at least 22 wins. Our formula gives 0.075%.
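A rotation-only sim along these lines can be sketched in a few lines of Python. This is not the R code used for the article: the function names are illustrative, the rotation is assumed to cycle strictly in order starting with the #1 starter, and the default season count is kept small so it runs quickly (which also means a 22-game streak will rarely show up without far more seasons).

```python
import random

# Per-start adjustments to the team's true-talent W%, #1 through #5 starter.
ROTATION = (0.060, 0.030, 0.000, -0.030, -0.060)


def has_streak(win_probs, k, rng) -> bool:
    """Play one season and report whether it contains k straight wins."""
    run = 0
    for p in win_probs:
        if rng.random() < p:
            run += 1
            if run >= k:
                return True
        else:
            run = 0
    return False


def simulate_rotation(base: float = 0.600, k: int = 22, seasons: int = 20_000,
                      games: int = 162, seed: int = 0) -> float:
    """Share of simulated seasons with at least one winning streak of k games."""
    rng = random.Random(seed)
    # Assumption: the five starters take turns in a fixed order all season.
    win_probs = [base + ROTATION[g % 5] for g in range(games)]
    hits = sum(has_streak(win_probs, k, rng) for _ in range(seasons))
    return hits / seasons


if __name__ == "__main__":
    print(f"10-game streak, .600 team: {simulate_rotation(k=10):.1%}")
```

Because 162 is not a multiple of five, the top two starters get one extra start each in this sketch, which nudges the season average very slightly above .600.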

We can also add whether the game is at home or on the road to our sim.  Home teams win about 54% of the time, so for each home game we’ll add .040 to the team’s true-talent W%, and for each away game we’ll subtract .040. To approximate MLB’s schedule, we’ll randomly arrange 27 three-game home series and 27 three-game road series for each simulated season in chunks of six series at a time (to avoid things like the occasional 50-game home stand you get from randomizing the whole schedule at once). Repeating the sim with home-field advantage included, we end up with 0.075% of seasons including a streak of at least 22 wins.

Finally, we can add varying opponents to the sim. To keep things simple, we’ll have our team face eighteen different opponents in three series each, and each opponent shifts our team’s expected W% by an amount between -.085 and +.085, spaced at .010 intervals. This isn’t exactly how MLB schedules work, but using numbers that divide evenly makes it easier to ensure our team’s overall true-talent W% stays at exactly .600, and since the point is just to see what happens when we add variance to a team’s day-to-day win probabilities, it shouldn’t make much difference for our purposes.

In the final sim, our team’s expected W% ranges anywhere from .415 (when they have their fifth starter on the road against a strong opponent) up to .785 (ace pitching at home against a weak opponent). This time, the sim gives us a 0.078% chance of seeing a 22-game winning streak.
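Putting the three adjustments together, a full version of the sim might look something like the sketch below. Again, this is an illustration rather than the R code behind the article's numbers: the names are hypothetical, the rotation is assumed to cycle in fixed order, and opponents are assumed to be assigned to series in a fully random order.

```python
import random

ROTATION = (0.060, 0.030, 0.000, -0.030, -0.060)  # #1 through #5 starter
HOME_EDGE = 0.040                                  # added at home, subtracted on the road
# Eighteen opponents whose adjustments run from -.085 to +.085 in steps of .010.
OPPONENTS = [round(-0.085 + 0.010 * i, 3) for i in range(18)]


def season_win_probs(base: float, rng: random.Random) -> list:
    """Build the 162 game-by-game win probabilities for one simulated season."""
    # Venue: nine chunks of six series (three home, three road), shuffled per chunk.
    venues = []
    for _ in range(9):
        chunk = [HOME_EDGE] * 3 + [-HOME_EDGE] * 3
        rng.shuffle(chunk)
        venues.extend(chunk)
    # Opponents: each of the eighteen appears in three of the 54 series.
    opponents = OPPONENTS * 3
    rng.shuffle(opponents)
    probs = []
    for series in range(54):
        for game_in_series in range(3):
            game = 3 * series + game_in_series
            probs.append(base + ROTATION[game % 5] + venues[series] + opponents[series])
    return probs


def simulate(base: float = 0.600, k: int = 22, seasons: int = 10_000,
             seed: int = 0) -> float:
    """Share of simulated seasons with at least one winning streak of k games."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(seasons):
        run = 0
        for p in season_win_probs(base, rng):
            if rng.random() < p:
                run += 1
                if run >= k:
                    hits += 1
                    break
            else:
                run = 0
    return hits / seasons


if __name__ == "__main__":
    print(f"10-game streak, .600 team: {simulate(k=10):.1%}")
```

Estimating something as rare as a 22-game streak needs on the order of a million seasons, so the default here is only suitable for shorter streaks.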

Even with significant day-to-day variations in expected winning percentage, our simple formula using the overall true-talent W% still works pretty well, at least for long streaks. Testing it out with some other combinations of team talent level and streak lengths, we can see that the formula tends to slightly underestimate simulated streaks under these conditions, but unless you need a high level of precision, both our formulas continue to do pretty well:

Probability of Streak

| Overall W% | Streak Length | Sim | Formula |
|---|---|---|---|
| .600 | 22 | 0.078% | 0.075% |
| .600 | 15 | 2.90% | 2.77% |
| .600 | 10 | 32.7% | 31.3% |
| .500 | 15 | 0.24% | 0.23% |
| .500 | 10 | 7.63% | 7.25% |
| .500 | 5 | 93.9% | 92.0% |

Number of Streaks per 100 Team-Seasons

| Overall W% | Streak Length | Sim | Formula |
|---|---|---|---|
| .600 | 22 | 0.078 | 0.075 |
| .600 | 15 | 2.94 | 2.81 |
| .600 | 10 | 38.5 | 37.4 |
| .500 | 15 | 0.24 | 0.23 |
| .500 | 10 | 7.89 | 7.52 |
| .500 | 5 | 253 | 248 |

It’s worth noting that one could possibly set up our sim to give results almost identical to the formula by tweaking the parameters — for example, simply increasing the spread of talent in the starting rotation by 50% puts the simulated probabilities slightly below what the formula gives. (As a rule of thumb, if favorable conditions have a tendency to clump together, like games in a homestand or a string of weak opponents in the schedule, it increases the likelihood of a streak, and if they don’t, like your starting rotation cycling through each pitcher in order, it decreases the likelihood of a streak.)

This means the formula isn’t necessarily less precise than a simulation if you don’t have precise parameters to feed into the sim.  Simulations have the potential to handle more detailed information and can be customized to a specific situation, but that also means that without that detailed information, they don’t necessarily offer many advantages.

Hitting Streaks

While we’ve only talked about winning streaks, the same math can apply to any kind of streak. Say, for example, we want to know the probability of Mike Trout having a 56-game hitting streak this season.

The probability of hitting safely in a game depends on how many at-bats a hitter gets, so we need to find the probability for each number of at-bats separately and then take a weighted average to find an overall probability. Steamer projected Trout to hit .302 coming into the year, so if Trout gets one AB, the probability of hitting safely is 30.2%. For two ABs, it’s $1-(1-.302)^2$, or about 51.3%. Continuing, we get:

Probability of Trout Hitting Safely

| ABs | Probability of Hit | % of Trout’s games started |
|---|---|---|
| 1 | 30.2% | 1.2% |
| 2 | 51.3% | 8.3% |
| 3 | 66.0% | 30.2% |
| 4 | 76.3% | 45.1% |
| 5 | 83.4% | 13.4% |
| 6 | 88.4% | 1.5% |
| 7 | 91.9% | 0.1% |
| 8 | 94.4% | 0.2% |
| Overall | 71.7% | 100% |

Based on Steamer’s projection and Trout’s historical AB distribution, we get a 71.7% chance of Trout hitting safely in a randomly selected game. Coming into this season, Trout has hit safely in 71.0% of the games he’s started, so that is probably a sensible estimate.
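The weighted average behind that 71.7% can be reproduced directly from the AB distribution. In this sketch the dictionary and function names are illustrative, and each at-bat is assumed to be an independent trial at the projected .302 average.

```python
# Share of Trout's started games by number of at-bats, from the table above.
AB_SHARE = {1: 0.012, 2: 0.083, 3: 0.302, 4: 0.451,
            5: 0.134, 6: 0.015, 7: 0.001, 8: 0.002}


def hit_prob_given_abs(avg: float, at_bats: int) -> float:
    """Chance of at least one hit in a game with the given number of at-bats."""
    return 1 - (1 - avg) ** at_bats


def hit_prob_per_game(avg: float, ab_share: dict) -> float:
    """Weighted average of the per-game hit probability over the AB distribution."""
    total_share = sum(ab_share.values())
    weighted = sum(share * hit_prob_given_abs(avg, ab)
                   for ab, share in ab_share.items())
    return weighted / total_share


if __name__ == "__main__":
    print(f"{hit_prob_per_game(0.302, AB_SHARE):.1%}")
```

Dividing by the total share keeps the result a proper average even if the rounded percentages do not sum to exactly 100%.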

Using p=.717 and k=56, our formula gives about a 0.000025% chance of Trout recording a 56-game hitting streak in a 162-game season, or about one in 3.9 million (for comparison, a true-talent .302 hitter has about a 1 in 780 thousand chance of hitting .400 over 500 ABs). With a probability that low, even simulating a million seasons with a true-talent .302 AVG and Trout’s AB distribution might not show any successes, which is in fact what happened. I repeated the sim for another million seasons and ended up with one successful streak.
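For probabilities this small, it helps to evaluate the formula in a numerically careful way. The sketch below computes $1 - (1-p^k)^N$ using `math.log1p` and `math.expm1` so that precision is not lost when $p^k$ is tiny; the function name is illustrative, and the calculation assumes the hitter starts all 162 games.

```python
import math


def prob_at_least_one(p: float, k: int, games: int = 162) -> float:
    """Estimated chance of at least one streak of k: 1 - (1 - p**k)**N."""
    n = (games - k) * (1 - p) + 1  # estimated opportunities to start a streak
    # Same value as 1 - (1 - p**k)**n, written to stay accurate for tiny p**k.
    return -math.expm1(n * math.log1p(-(p ** k)))


if __name__ == "__main__":
    prob = prob_at_least_one(0.717, 56)
    print(f"{prob:.6%}, about 1 in {1 / prob:,.0f}")
```

With the rounded input of .717 the result is roughly one in four million; small changes in the per-game hit probability move that figure noticeably because the probability is raised to the 56th power.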

Since even a hitter like Trout is such a long shot to match DiMaggio’s streak, let’s look at a hypothetical batter who has a true-talent .350 average and averages about 4 ABs per game rather than Trout’s 3.67, which is probably around the limit of what we could expect to appear in MLB. This hitter would have about a 0.013% chance of a 56-game hitting streak according to our formula, or 0.012% in the sim. If we push even further and give our hypothetical batter a true-talent .400 average, that still only goes up to 0.26% in both the formula and the sim.

Conclusion

Being able to estimate the probability of streaks gives us the context to properly appreciate them.  Knowing just how hard it is to win 22 in a row or hit safely in 56 straight games puts those records in perspective, as does being able to quantify all the streaks that fall short but are nonetheless impressive.  It also gives us a solid baseline to compare to if, for example, we want to test if “streaky” teams or hitters actually end up with more or longer streaks than we’d expect.

The problem is that these calculations get messy quickly, even for a simple process like flipping coins.  When you add in the factors that complicate probability in a sport like baseball, those calculations go from cumbersome to Sisyphean. The typical strategy for dealing with this issue is to run simulations that model those complications, but this takes programming knowledge as well as time and effort to write or customize a sim for different scenarios.

Instead, we can use a relatively compact formula that works pretty well to estimate the probability of streaks, even with the varying probabilities seen in baseball.  The true probability of a streak does depend to some degree on the specifics of a team or hitter’s circumstances, so a simulation is still preferable if you need a high level of precision, but the error in the formula tends to diminish the longer and more unlikely the streak.

References & Resources

Q: What’s the chance of getting a run of K or more successes (heads) in a row in N Bernoulli trials (coin flips)? – Ask a Mathematician

What Happens When Maths Goes Wrong? – Matt Parker/The Royal Institution

R code for sims used in this article – Github


Adam Dorhauer grew up a third-generation Cardinals fan in Missouri, and now lives in Ohio. His writing on baseball focuses on the history of the game, as well as statistical concepts as they apply to baseball. Visit his website, 3-D Baseball.