A Combination of Coronavirus and the Flaw of Averages Can Drive You Nuts

by Sam L. Savage

All infectious disease epidemics start life with exponential growth. For example, suppose that each person infected with a disease in the first month infects one other person by the second month. Then those two people will infect two more, and so on, and the number infected will double with each time period. This exponential growth obviously can’t go on forever because eventually you run out of people, or at least susceptible people. So, in the end, the total number infected over time resembles an S curve as shown in Figure 1 below.
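The S curve can be sketched in a few lines of Python. This is only an illustration, assuming a simple logistic difference equation in which new infections shrink as the susceptible fraction shrinks. The function name `s_curve` and its parameter values are hypothetical and are not taken from the downloadable model.

```python
def s_curve(population=1_000_000, initial=1.0, growth_factor=2.0, periods=40):
    """Total infected per period under an assumed discrete logistic model.

    New infections are proportional to the current total, scaled down by
    the fraction of the population that is still susceptible.
    """
    infected = [initial]
    for _ in range(periods):
        current = infected[-1]
        susceptible_fraction = 1.0 - current / population
        new_cases = (growth_factor - 1.0) * current * susceptible_fraction
        infected.append(current + new_cases)
    return infected


path = s_curve()
# Early on the total roughly doubles each period; later it levels off
# as the pool of susceptible people runs out.
for period in (0, 1, 2, 10, 20, 30, 40):
    print(period, round(path[period]))
```

With a growth factor of 2, the first few periods are almost pure doubling, and the curve flattens once most of the assumed population has been infected.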

Figure 1: The Number of People Infected Over Time given “Average” assumptions.

A model like this assumes that you know the initial infection or growth rate. Epidemiologists refer to this as the Reproductive Ratio or R0 (R naught), and it would have been 2 in the example above. That is, in each time period the total number infected is 2 times the number in the previous period. Of course, R0 can’t be known with certainty, so a statistical estimate is used. This results in both an “Average” value of R0 and a standard error reflecting its uncertainty, the latter of which is usually flushed down the toilet. Because the number infected is a nonlinear function of R0, the path calculated from the “Average” R0 is not the average of the possible paths. This is a classic case of the Flaw of Averages, also known as Jensen’s Inequality.
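A tiny numerical illustration of the Flaw of Averages for compounding growth follows. It assumes, purely for the sake of example, that R0 is equally likely to be 1.5 or 2.5 and that growth is unchecked for 10 periods. Neither the two values nor the helper `cases_after` come from the original model.

```python
from statistics import mean


def cases_after(growth_factor, periods):
    """Cases after `periods` steps of unchecked growth from a single case."""
    return growth_factor ** periods


# Hypothetical uncertainty: R0 is equally likely to be 1.5 or 2.5,
# so its "Average" value is 2.0.
possible_r0 = [1.5, 2.5]
periods = 10

cases_at_average_r0 = cases_after(mean(possible_r0), periods)
average_cases = mean(cases_after(r0, periods) for r0 in possible_r0)

print("Cases at the average R0:", cases_at_average_r0)
print("Average cases over both R0 values:", average_cases)
```

Plugging in the average R0 gives 1,024 cases, while averaging the two possible outcomes gives about 4,797. Growth is a convex function of R0, so the high scenario pulls the average up far more than the low scenario pulls it down.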

When someone brought this problem to my attention during the Ebola scare, I built a Monte Carlo simulation, available on our Models page, which reflects the uncertainty in R0 as shown in Figure 2.
Figure 2 - Simulated Disease Trajectories Given the Uncertainty in R0

The uncertainty in a parameter in a model such as R0 is often referred to as Model Risk. That is, each of the paths above is a potential model of the epidemic, like parallel universes in a Rick and Morty cartoon. The risk is that we don’t know which one is correct, so it makes sense to pick the “Average” of all 1,000 paths, which The Flaw of Averages tells us will be different from the path associated with the “Average” R0 in Figure 1. When I ran the simulation, I was surprised to see that the average path is systematically different from the path of the “Average” R0, regardless of disease, as shown in Figure 3.
Figure 3 – Path of the Average R0 vs Average Path

For all infectious diseases, the flawed path associated with the average growth rate systematically underestimates the severity early in the epidemic (before month 12 in the above example) and overestimates the severity later in the epidemic (after month 12). For this example, in month 9, you would expect 5% of the population to be infected, but on average you would observe 10%. Then at month 18, you would expect 35% but observe only 30%. Although a single case does not win a statistical argument, the Ebola epidemic of 2014-2015 fits the bill perfectly. It started out with fears of “We’re all going to die! We’re all going to die!” and ended up with the development of effective medications, and the realization that “they did not have many cases left to test it on.”
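The comparison between the path of the “Average” R0 and the average path can be reproduced with a rough Monte Carlo sketch. Everything here is an assumption made for illustration: the logistic difference equation, the normal distribution for R0 with mean 1.5 and standard error 0.15, the clamping limits, the seed, and the periods chosen as "early" and "late". The numbers will not match the figures from the original spreadsheet model.

```python
import random


def epidemic_path(r0, periods=30, initial_fraction=0.001):
    """Fraction of the population infected each period (assumed logistic model)."""
    path = [initial_fraction]
    for _ in range(periods):
        x = path[-1]
        path.append(x + (r0 - 1.0) * x * (1.0 - x))
    return path


def sample_r0(rng, mean_r0=1.5, std_error=0.15):
    """Draw one R0, clamped so every trial stays between 0% and 100% infected."""
    return min(max(rng.gauss(mean_r0, std_error), 1.05), 1.95)


rng = random.Random(2020)  # fixed seed so the sketch is repeatable
trials = [epidemic_path(sample_r0(rng)) for _ in range(2000)]

# Path of the "Average" R0 versus the average of all simulated paths.
path_of_average = epidemic_path(1.5)
average_path = [sum(step) / len(step) for step in zip(*trials)]

early, late = 8, 25  # hypothetical periods before and after the crossover
for period in (early, late):
    print(f"period {period}: path of average R0 = {path_of_average[period]:.3f}, "
          f"average path = {average_path[period]:.3f}")
```

Under these assumptions the average path sits above the path of the average R0 in the early period and below it in the late period, which is the same pattern as the one described above.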

Remember, this says that if you average over all epidemics, you will underestimate the early growth and overestimate the late growth. And although the uncertainty in R0 is only one of many, it seems unwise to leave any systematic error in calculations with this much social impact.

As to the model, like all nonlinear difference equations, this one can go chaotic for some input values. This model was inspired by James Gleick’s book “Chaos: Making a New Science.”
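To get a feel for what "going chaotic" means, here is a small sketch using an assumed logistic difference equation. Whether the downloadable model uses this exact equation is an assumption, and the growth rates 0.5 and 2.8 are hypothetical values chosen to show tame and wild behavior. They are not the parameter values suggested with the model.

```python
def simulate(growth_rate, start, periods=200):
    """Iterate the assumed difference equation x -> x + r * x * (1 - x)."""
    path = [start]
    for _ in range(periods):
        x = path[-1]
        path.append(x + growth_rate * x * (1.0 - x))
    return path


def largest_gap(growth_rate, start=0.001, nudge=1e-9):
    """Largest difference between two runs whose starting points differ by `nudge`."""
    first = simulate(growth_rate, start)
    second = simulate(growth_rate, start + nudge)
    return max(abs(a - b) for a, b in zip(first, second))


tame_gap = largest_gap(0.5)  # well-behaved S curve
wild_gap = largest_gap(2.8)  # hypothetical value inside the chaotic range

print(f"growth rate 0.5: largest gap between runs = {tame_gap:.2e}")
print(f"growth rate 2.8: largest gap between runs = {wild_gap:.2e}")
```

At the lower growth rate, a one-in-a-billion change in the starting point makes no visible difference. At the higher rate the two runs end up bearing no resemblance to each other, and the infected fraction even overshoots 100%, a sign that the equation has stopped describing an epidemic at all.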

For a chaotic time, download the model and set the parameters to the values suggested.

© Copyright, Sam L. Savage 2020