You Think You Understand Exponential Growth?

Most of us think we do by now. But even most mathematicians and statisticians seem to overlook the same characteristics, and those characteristics invalidate all projections being made about how bad this pandemic is going to be. Everyone knows by now how the exponential growth of Covid-19 cases affects how widely it spreads. But exponential growth is a devil in a few other ways as well. Any numbers you may have heard or read about how many deaths there will be by the time this is over, or when we will see a peak, or how high it will be, are almost assuredly dramatically wrong. There may be a couple of lucky guesses. At this time we just do not know which ones are the lucky ones. This article aims to help the reader judge the predictions being made, and recognize and weigh overly optimistic as well as overly pessimistic predictions. In short: if the prediction is a single number (say 50,000 deaths) you can throw it straight in the garbage. Do not base important decisions on it.

Note that this article copies liberally from a discussion following another article by Mark Chockalingam. Disclaimer: this will contain copious self-plagiarism.

Unlike my normal hobby horse, I will NOT make a case here for probabilistic forecasting. That needs enough reliable historical data and a controlled response to estimated parameters to function properly, and we do not have either at this stage of the pandemic. We are grasping in the dark about pretty much every piece of data we need. The data we have is incomplete, dirty, biased and noisy. Under normal forecasting conditions, this leads to significant inaccuracy. But we are not dealing with normal forecasting conditions currently. We are dealing with exponential growth, and forecasting that is orders of magnitude more sensitive to bad data. This is where the applied mathematician and the theoretical mathematician will get into a violent argument (a stare-down of each other's shoes).

Exponential growth is not only the cause of society losing control of this thing, it is also the dominant cause of mathematical models losing control. Something akin to the butterfly effect.

An epidemic will go through a couple of phases with different mathematical behavior. First, there is the exponential growth phase. If we consider the number of new cases over time, it may look like the curve below, of which only the first section exhibits exponential growth:

Figure 1: only the first part of the new case curve shows exponential growth.

During this exponential growth phase, every day brings more new cases than any day before. You can compare this to a car that keeps accelerating. Not only that, the acceleration keeps increasing. The driver keeps pushing down on the gas pedal harder and harder. But at some point, the rate of acceleration stops increasing, yet the car is still accelerating. It just stabilized. This is what I call the leveling off point. The car is still going at great speed and keeps going faster and faster since the acceleration is still positive, but from that point onwards the rate of acceleration starts to decrease. The driver is taking his foot a little off the gas. Until at one point the car stops accelerating altogether. This is the peak. The time when the car is going at its maximum speed. From that point onwards the driver starts to push on the brake a little and the car starts to slow down. It will still be quite some time before the car reaches lower speeds, but at least it is under control. The trajectory of new pandemic cases follows this same pattern, going through the very same phases.

But what is this curve in figure 1? Is it new infections, new tested cases, new positive tested cases, new hospitalizations, new recoveries, new deaths? Well, it could be any of them. They each will show a similar pattern. And all these patterns will be related in some way. The next graph shows how this may occur:

Figure 2: comparing the various "new" cases curves.

This set of curves is roughly similar to New York's and Italy's situations. The number of infections doubles roughly every 3 days, reaching a maximum of 10,000 new infections (blue curve) per day at its peak. However, the number of new symptomatic cases (yellow curve) is both lower (only 80% get symptoms) and delayed (symptoms start on average a week after the contraction of the virus). The number of positive tests recorded (black curve) is lower and delayed even further because not everyone gets tested and most tests are performed sometime after we suspect it may be Covid-19. How much lower this curve is than actual infected cases is unknown and depends greatly on diligence in testing. In this case, we assume 50% of infected cases are tested at some point. Not everyone who gets infected, nor everyone who tests positive, will need to be hospitalized. Here, we assume 20% of infected cases will at some point be hospitalized (orange curve) and will show an average delay of 28 days from time of infection to hospitalization. Finally, the number of new deaths (red curve) will be roughly 10% of new infections with a further delay of 21 days after hospitalization.

The new cases we can measure underestimate the real cases and any measured slow down does not mean hospitalizations and deaths will slow down soon.
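The relationships described above can be sketched in a few lines of Python. This is a toy illustration, not the model behind figure 2. The bell-shaped infection curve, its peak day and width, the function names, and the 10-day testing delay are all assumptions made for the example; the fractions and the other delays are the ones stated in the text (reading the 28 as days).

```python
import math

def new_infections(day, peak_day=60, peak_height=10_000, width=12):
    """Hypothetical bell-shaped curve of new infections per day (an assumption)."""
    return peak_height * math.exp(-((day - peak_day) ** 2) / (2 * width ** 2))

def derived_curve(day, fraction, delay_days):
    """A downstream curve: a fraction of the infections from delay_days earlier."""
    return fraction * new_infections(day - delay_days)

# (fraction of infections, delay in days since infection)
CURVES = {
    "symptomatic":  (0.80, 7),
    "tested":       (0.50, 10),       # the 10-day delay is an assumption
    "hospitalized": (0.20, 28),
    "deaths":       (0.10, 28 + 21),  # 21 days after hospitalization
}

def peak_day_of(fraction, delay_days, horizon=200):
    """Day on which a derived curve reaches its maximum."""
    return max(range(horizon), key=lambda d: derived_curve(d, fraction, delay_days))

for name, (fraction, delay) in CURVES.items():
    day = peak_day_of(fraction, delay)
    print(f"{name:13s} peaks on day {day} at {derived_curve(day, fraction, delay):,.0f} per day")
```

With these assumptions the measurable curve (tested) peaks on day 70, while hospitalizations keep rising until day 88 and deaths until day 109. A slowdown in the curve we can measure says little about how soon the curves we care about will slow down.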

From the above, we share an understanding of which new curves exist and how they are related. We may differ on what we believe the correct growth factors, conversion factors, and delays are, and those constitute the assumptions that are put into every model that aims to predict the future path of the pandemic. But these curves are not the only ones that matter. We also care about totals, such as total deaths, and running totals, such as currently contagious people and concurrent hospitalizations:

Figure 3: comparing recorded new cases to total deaths and concurrent hospital load.

You will notice the heights of the different curves start to get further apart. The blue and black curves are the same as before, new infections and new positive tests respectively. We care about the blue one, but can only measure the black one. The highest curve is the gray total positive tested cases to date. This is the total cases we see on every news channel. Unfortunately, it is useless, other than for the bean counters. You can tell nothing from this curve that matters and you cannot use it to determine any other meaningful information. The way to calculate this curve is just to keep adding new positive tested cases and never subtract any that recover or die. If we apply the same logic to new deaths we get the total deaths shown with the red curve. This is what we really care about. This is what we need to minimize. It looks like total deaths are 20% of the total cases in this graph. That is a distortion that we all encounter every day in the news.

The problem is that reported total cases is not really total cases. It is only total positive tested cases.

Since this model assumes only 50% of infections get tested, the total infected cases (not shown) would be twice as high as the highest gray curve. And total deaths (when the pandemic is over, i.e. the far right of the graph) is the assumed 10% of total infections. We can see this distortion for example in the Italy data, where at the peak they reported 18% fatality but official estimates are at 10.8%. The 18% is of known cases. The 10.8% is of the estimated total cases that were infected but no longer (either died or recovered).
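The arithmetic behind this distortion is simple enough to write down. The sketch below assumes that under-testing is the only source of the gap and that all deaths are counted, which is certainly not the whole story in practice. The function names are made up for illustration.

```python
def apparent_fatality(true_fatality, tested_fraction):
    """Deaths divided by positive tested cases instead of by all infections."""
    return true_fatality / tested_fraction

def implied_known_fraction(estimated_fatality, reported_fatality):
    """Share of infections that were known, if under-testing alone explains the gap."""
    return estimated_fatality / reported_fatality

# The model in this article: 10% of infections die, 50% of infections get tested.
print(f"apparent fatality: {apparent_fatality(0.10, 0.50):.0%}")

# The Italian figures quoted above: 18% of known cases, 10.8% of estimated total cases.
print(f"implied known share: {implied_known_fraction(0.108, 0.18):.0%}")
```

The first line reproduces the 20% seen in the graph. The second suggests that, under this simplified reading, the Italian numbers would correspond to roughly 60% of resolved cases being known.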

The other extremely important curve is the dark green shaded area that represents the number of people simultaneously being hospitalized. This is the curve that needs to be flattened when we talk about flattening the curve. This curve determines how many people need beds and ventilators at the same time. These numbers represent how many are contending for the same limited resources. As Wuhan, Italy and Spain have clearly shown, once the peak of that curve exceeds the level at which healthcare systems can cope, death rates go through the roof. Those rates were 10 to 15 times greater than normal death rates!

If minimizing the number of deaths is the objective, then flattening this curve is the means.

Two more curves are important to get right. The light green shaded area represents that part of the current active infections that are not hospitalized. This is the number of people out in public potentially infecting other people. To gain control of how fast the virus is spreading we need to control that curve. And the last one, not shown, is total people recovered. Under the assumptions used, this one would reach twice as high as the highest curve. That curve would not only tell us the portion of the population that will contribute to herd immunity but would also indicate how many people can safely go back to work to restart the economy.

How To Determine the Curves That Matter

So we know the new positive tested cases, total positive tested cases to date and total deaths to date (to some degree). None of these are the critical ones we need. We still need to determine the ones we care about through some translations from the curves we have and a whole bunch of assumptions for which we can only get bad data. This is where it gets hairy. The graph below highlights some of the translation issues:

Figure 4: It is not easy to translate new cases to real infections or to hospital load.

If this graph conveys anything it should be that this is complicated. The black curve is the one we know (or at least the part that has already happened). The exponential growth that matters has already occurred by the time we can measure it. We are chasing the facts, weeks after they have occurred. The peak of simultaneous hospitalizations, which is what we really care about, is delayed. By the time our measured new cases peak, the number of hospitalizations will continue to grow for quite a while longer. We may very well still be in the exponential growth phase of that curve with no way of knowing for sure. And the difference in height of the curves also matters. The green curve is more than 6 times higher than the black curve. This means any error we make in estimating the black curve will become 6 times greater in the curve we care about. This factor of 6 is a result of the assumptions that went into this model. It is unknown how large it will be in practice.

But it gets worse. To determine a running tally of active hospitalizations we cannot just compare it directly to the black curve. The current hospital load is the sum of all people that were newly hospitalized minus the ones that were released and the ones that died. The number of deaths can be estimated and shows as the red curve in figure 3. Naturally, the accuracy of this is highly dependent on a proper estimate of the death rate, the testing rate, and the hospitalization rate, none of which are known. With an assumed death rate, we can also assume the total number of patients released and we have all the pieces to calculate hospital load. All these pieces are highly uncertain and at least once transformed from the one curve we can measure.
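The running tally described above can be sketched as follows. This is a minimal illustration under a strong simplifying assumption: every patient leaves the hospital (released or deceased) exactly `stay_days` after admission. The admission numbers are invented for the example, not real data.

```python
def hospital_load(new_admissions, stay_days):
    """Concurrent patients per day: admissions so far minus everyone who has left.

    Assumes each patient leaves exactly stay_days after admission,
    whether released or deceased (a simplification).
    """
    load, current = [], 0
    for day, admitted in enumerate(new_admissions):
        current += admitted
        if day >= stay_days:
            current -= new_admissions[day - stay_days]  # this cohort leaves today
        load.append(current)
    return load

# Invented daily admissions that rise, level off, and fall again.
admissions = [10, 20, 40, 80, 80, 40, 20, 10, 0, 0, 0, 0]
load = hospital_load(admissions, stay_days=4)

print("admissions:", admissions)
print("load:      ", load)
```

In this toy series admissions top out at 80 per day on days 3 and 4, but the load keeps climbing to 240 on day 5. The curve we care about peaks later and several times higher than the curve it is derived from, and every assumption (here, the length of stay) shifts both the timing and the height.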

Let it be clear that every model is based on a large number of assumptions, all with high uncertainty.

It does not matter if it is the CDC model, the WHO model, some university model or a hobbyist's model. They are all based on big assumptions about unknowable information, some more informed than others.

What Kind Of Error Are We Making?

So, now we know where the exponential growth occurs and which of all the possible curves are important and how they are related. But up to this point, all assumptions have linear error implications. Now let's talk about exponential errors.

Whilst we are in the exponential growth period of a pandemic it is impossible to predict the time when we reach the peak. For New York, we can venture a guess since they seem to be leveling off. With growth rates being a guess, it is also impossible to determine the height of the peak. The two factors that dominate any projection are impossible to know at this time. Everything else is minuscule compared to these. Yes, that includes under-testing and underreporting. Anything that leads to linear error is dwarfed by the exponential error.

Death rates have a linear effect. Case growth rates have an exponential effect.

Getting death rates right will lead to a difference between 10K deaths or 100K deaths projected. Getting growth rates right will lead to a difference between 10K deaths or 10 Million deaths projected. Whilst death rates are what everyone seems to give most thought to in their models, growth rates are what we should be focusing on. If you get the growth rate wrong, you cannot fix it by getting the death rate right.
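A deliberately crude sketch makes the difference between the two kinds of error concrete. It assumes unchecked exponential growth from 1,000 cases over 30 days and ignores delays, saturation, and interventions; all numbers are illustrative, not estimates.

```python
def projected_deaths(initial_cases, doubling_days, horizon_days, death_rate):
    """Deaths implied by unchecked exponential growth over horizon_days.

    Deliberately crude: no delays, no saturation, no interventions.
    """
    cases = initial_cases * 2 ** (horizon_days / doubling_days)
    return cases * death_rate

baseline = projected_deaths(1_000, doubling_days=3, horizon_days=30, death_rate=0.01)
doubled_death_rate = projected_deaths(1_000, doubling_days=3, horizon_days=30, death_rate=0.02)
halved_growth = projected_deaths(1_000, doubling_days=6, horizon_days=30, death_rate=0.01)

# Doubling the death rate doubles the projection: a linear effect.
print("death rate x2  -> projection x", round(doubled_death_rate / baseline, 1))

# Doubling the doubling time changes the projection by 2**5: an exponential effect.
print("doubling time x2 -> projection /", round(baseline / halved_growth, 1))
```

Being off by a factor of 2 on the death rate moves the result by a factor of 2. Being off by a factor of 2 on the doubling time moves it by a factor of 32 over 30 days, and that factor keeps growing the further out the projection goes.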

There is one big exception to the otherwise linear effect of death rates. Death rates are vastly different when critical cases can get proper treatment compared to when they cannot. When healthcare systems can cope it seems death rates are around 2%. When they lose control (lack of ICU beds, ventilators, etc) death rates seem to hover around 20%. That is a factor of roughly 10. One example everyone is pretty familiar with is Italy, where the death rate spiked to over 18% during the period where they could not cope. The linear effect in such cases becomes a quadratic effect. Not quite exponential, but still very bad.

So, keeping the peak of the number of simultaneous cases below the healthcare system's ability to cope has the potential to reduce the total number of deaths by an order of magnitude (a factor of roughly 10). This is assuming no change in the number of cases. This is why "flattening the curve" is so very important.
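To illustrate, here is a toy comparison of a sharp and a flattened curve with the same total number of cases. The 2% and 20% death rates are the rough figures mentioned above; the capacity, the case numbers, and the rule that an overwhelmed day applies the high rate to all of that day's cases are assumptions made purely for the example.

```python
def total_deaths(daily_cases, capacity, rate_coping=0.02, rate_overwhelmed=0.20):
    """Total deaths when the death rate jumps on days that exceed capacity.

    Assumption: on a day above capacity, all of that day's cases face the
    higher rate. Real systems degrade more gradually.
    """
    deaths = 0.0
    for cases in daily_cases:
        rate = rate_coping if cases <= capacity else rate_overwhelmed
        deaths += rate * cases
    return deaths

sharp = [100, 400, 1000, 400, 100]  # 2,000 cases, one day far above capacity
flat = [400, 400, 400, 400, 400]    # the same 2,000 cases, spread out
capacity = 500

print("sharp curve:", round(total_deaths(sharp, capacity)))
print("flat curve: ", round(total_deaths(flat, capacity)))
```

Same number of cases, about 220 deaths versus 40. The further the peak overshoots capacity, the closer the ratio gets to the full factor of 10.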

The accuracy of any model hinges on whether we can stay below that coping threshold or not.

If the model assumes lockdowns stay in place until it is permanently under control (due to a vaccine developed and/or herd immunity) then those lockdowns better stay in place or the model can be dumped.

How Can We Estimate The Continuation Of Exponential Curves?

The problem with determining the time of the peak has to do with both the exponential behavior and the dirtiness of the data. A peak can only occur sometime after the exponential behavior has ended. The number of new cases will keep increasing, then level off, and finally start to decrease. When it levels off, we have broken the exponential growth stage and now are in linear growth. It will still be some time before that leveling off has also occurred for hospitalizations. When it hits that linear growth it may still be higher than what healthcare systems can cope with if it is larger than the number of people recovering or dying at the same time. But mathematically we can start to attempt to make guesstimates of when the peak will be. This happens when the number of new cases matches the sum of the number of new recoveries and new deaths. Once the number of new cases is lower than that sum we are past the peak.

One problem with determining when we hit the leveling off point is that it is gradual and needs very smooth data to determine. If the data is complete, pure, unbiased, and smooth we can determine a second derivative and predict exactly when that leveling off point will be reached (when 2nd derivative is zero) and more so determine when the peak will be reached (first derivative is zero). But the data is none of those. It is incomplete, dirty, biased and noisy.
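The contrast between theory and practice can be shown with day-over-day differences standing in for the derivatives. The sketch below uses a hypothetical bell-shaped new-case curve and then multiplies each day by a random factor between 0.8 and 1.2 to mimic noisy reporting; both the curve and the noise level are assumptions for illustration.

```python
import math
import random

def bell(day, peak_day=50, height=10_000, width=10):
    """Hypothetical smooth curve of new cases per day (an assumption)."""
    return height * math.exp(-((day - peak_day) ** 2) / (2 * width ** 2))

def differences(series):
    """Day-over-day change, a discrete stand-in for the derivative."""
    return [b - a for a, b in zip(series, series[1:])]

def sign_changes(series):
    """How often consecutive values switch between positive and negative."""
    return sum(1 for a, b in zip(series, series[1:]) if a * b < 0)

smooth = [bell(d) for d in range(100)]
first = differences(smooth)
second = differences(first)

# On clean data both points are unambiguous.
leveling_off_day = max(range(len(first)), key=first.__getitem__)  # steepest rise
peak_day = max(range(len(smooth)), key=smooth.__getitem__)        # highest value
print("leveling off near day", leveling_off_day, "| peak on day", peak_day)
print("sign changes in 2nd difference (clean):", sign_changes(second))

# The same curve with +/-20% reporting noise on each day.
random.seed(0)
noisy = [x * random.uniform(0.8, 1.2) for x in smooth]
print("sign changes in 2nd difference (noisy):", sign_changes(differences(differences(noisy))))
```

On the clean curve the second difference crosses zero exactly twice, once at each leveling off point. On the noisy curve it flips sign almost daily, so "the day the second derivative hits zero" stops being a usable signal without heavy smoothing, and smoothing adds yet more delay.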

If today's number of new cases is a lot lower than yesterday's, did we hit the peak? Likely not. If we get two days in a row of lower new cases did we hit the peak? Again, likely not. But maybe, very maybe, we hit the leveling off point. Too soon to tell. And certainly too soon to predict when the peak will be. And if this pattern continues we will start feeling more confident. But while new case data is still showing growth, determining the leveling off point is much more difficult still.

We are looking for a minuscule change in the growth rate itself based on noisy dirty biased data.

It really can only be determined after the fact. This is one of those situations where practice is different from theory even though in theory they are the same.

Figure 5: many valid projections are possible based on this noisy US new case data.

In this graph, the big question is what happened with those last 2 recorded new case levels. We have a few sequential decreases in new cases and may start feeling more confident. But what caused the decrease? That is the critical question to answer before we can determine if we can use it to predict a leveling off and a peak. Did they run out of tests in some places? Did testing go down even while there was capacity? Was there a change in accounting for cases? Did one epicenter get past its peak, but are other ones still expected to accelerate? Without knowing that, any of the 4 projections in figure 5 are equally valid.
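One way to see why several projections can fit the same data is to estimate the doubling time from different stretches of a noisy series and project each estimate forward. The sketch below uses synthetic data (true doubling every 3 days, with an assumed +/-30% reporting noise), a plain least-squares fit on the log of the cases, and made-up function names. It is an illustration, not the method behind figure 5.

```python
import math
import random

def fit_doubling_time(daily_cases):
    """Least-squares slope of log2(cases) against day, returned as days per doubling.

    Requires strictly positive case counts and a non-zero slope.
    """
    n = len(daily_cases)
    xs = list(range(n))
    ys = [math.log2(c) for c in daily_cases]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return 1 / slope

def project(last_value, doubling_days, days_ahead):
    """Extend the fitted exponential days_ahead into the future."""
    return last_value * 2 ** (days_ahead / doubling_days)

# Synthetic data: true doubling every 3 days, reported with +/-30% noise.
random.seed(1)
true_cases = [100 * 2 ** (day / 3) for day in range(21)]
reported = [c * random.uniform(0.7, 1.3) for c in true_cases]

for window in (5, 7, 14):
    doubling = fit_doubling_time(reported[-window:])
    print(f"last {window:2d} days: doubling every {doubling:.2f} days, "
          f"30-day projection {project(reported[-1], doubling, 30):,.0f}")
```

Each window gives a defensible estimate of the doubling time, and the small differences between them turn into very different numbers 30 days out. And this toy data never even leaves the exponential phase; real data with testing shortages and accounting changes is far less forgiving.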

You need to do all kinds of sanity checks before you can mark the model as having broken the exponential phase. Like proper journalism, you need to find alternate sources. For New York, the number of hospitalizations has also come down. That is important, measurable, verifiable supporting data. For New York, this makes it plausible that they hit the leveling off point. And data on patients being discharged can provide a reasonably likely timing of when it will hit its peak. New York being the largest epicenter of the virus also means its projection will have a measurable impact on the total projection for the country in figure 5.

But what about all the other states? All but Colorado still show exponential growth in at least part of the state. And we can be hopeful that some states that have not seen a spike in cases may avert it altogether.

However, it is lunacy to assume we can predict the timing of the peak of the rest.

Some Control Over The Exponential Behavior

All is not lost. There are two dampening factors we could use on a model. One is the total population, the other is that when it gets hotter towards summer it will slow down. The first is the most reliable piece of information we have. The second is wishful thinking.

Many hope, as do I, that when it gets hotter infection rates will slow down. However, countries in the (sub)tropics and in the southern hemisphere (where summer just ended) are also seeing a lot of Covid-19 cases. There are too many unknowns to make a clear correlation between cases in the south versus north and to determine the impact temperature may have.

It may be worth doing an exercise to see how local temperatures correlate to local cases, probably with a lag of around 14 days.
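A minimal sketch of such an exercise, assuming you have two aligned daily series per location, is a lagged Pearson correlation. The function names are made up, and the data below is synthetic and constructed so that cases depend perfectly on the temperature 14 days earlier; real data would be nowhere near this clean and a correlation alone would not prove causation.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def lagged_correlation(temperatures, cases, lag_days=14):
    """Correlate the temperature on day t with the cases on day t + lag_days."""
    n = len(cases) - lag_days
    return pearson(temperatures[:n], cases[lag_days:])

# Synthetic example: cases fall as the temperature 14 days earlier rises.
temperatures = [10 + (day % 7) for day in range(40)]
cases = [0] * 14 + [100 - 2 * t for t in temperatures[:26]]

print(round(lagged_correlation(temperatures, cases, lag_days=14), 3))
```

Running this across a range of lags and across many locations would show whether any lag stands out consistently, which is a more convincing signal than a single correlation at an assumed 14 days.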

It would be extremely useful information for planning how to tackle the upcoming months if there were more clarity on whether the virus will subside. If anyone knows of such research, please comment. Or is anyone interested to take this on?

The total population will tell us the total number of people that can get infected. But also, once herd immunity occurs it will tell us when a natural peak would occur if we stop isolation at some point. The going assumption currently is that when 60% of the living population has recovered from the virus a meaningful herd immunization effect will be achieved and the infection rate will go down.
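For context on where a number like 60% can come from: the standard textbook approximation for a well-mixed population puts the herd immunity threshold at 1 - 1/R0, where R0 is the basic reproduction number. The sketch below uses that approximation. It is an assumption that this is the reasoning behind the 60% figure, and real populations are not well mixed.

```python
def herd_immunity_threshold(r0):
    """Share of the population that must be immune, using 1 - 1/R0.

    Textbook approximation for a well-mixed population; 0 when R0 <= 1
    because the infection then dies out on its own.
    """
    if r0 <= 1:
        return 0.0
    return 1 - 1 / r0

def implied_r0(threshold):
    """The R0 that corresponds to a given threshold under the same approximation."""
    return 1 / (1 - threshold)

print(f"R0 implied by a 60% threshold: {implied_r0(0.60):.2f}")
for r0 in (2.0, 2.5, 3.0, 4.0):
    print(f"R0 = {r0}: threshold {herd_immunity_threshold(r0):.0%}")
```

Under this approximation 60% corresponds to an R0 of about 2.5. Since R0 itself is one of the uncertain growth assumptions, the threshold inherits that uncertainty.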

The Big Catch-22

Last but not least, by far the biggest factor impacting all the rates is what governments and societies do. Whether lockdowns stay in place for a week, a month, a whole summer. Whether enough people follow guidelines and stay home when told to do so. This makes it a chicken-and-egg problem. The models make assumptions based on behavior and behavior changes based on what the models say.

The relatively low number of deaths currently is in large part due to draconian measures. If you compare that to the seasonal flu, we are now approaching its typical mortality across a whole season, with no lockdown. The flu season starts in late September and ends in May, a good 8 months, with its peak in months 4 and 5. We are only a good 3 months into this pandemic.

Can anyone really fathom how bad it would have been if we had not acted as drastically as we did, when we did?

Millions of deaths in the US alone would certainly be realistic, with both a much higher infection rate and death rate.

All this to say, we need to run scenarios with various assumptions along multiple dimensions. What-iffing on steroids with every assumption sanity-checked a dozen ways. We need to verify the chance they would occur. And be very open about which assumptions lead to which predictions. If you reach similar conclusions when coming at it from very different angles that will give confidence that results are plausible. If you cannot get different approaches to agree, there is a problem. If you use only a single approach you have no clue as to the validity of the projections.
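As a bare-bones illustration of what-iffing across multiple dimensions, the sketch below enumerates every combination of a few assumptions and reports the spread of outcomes. The projection formula is deliberately crude (unchecked exponential growth), and the assumption values are illustrative picks, some taken from figures mentioned in this article and some invented.

```python
from itertools import product

def scenario_deaths(doubling_days, death_rate, days_unchecked, initial_cases=1_000):
    """Crude projection: unchecked exponential growth, then a flat death rate."""
    cases = initial_cases * 2 ** (days_unchecked / doubling_days)
    return cases * death_rate

# Each dimension lists the alternatives to combine (illustrative values).
ASSUMPTIONS = {
    "doubling_days": [3, 5, 7],
    "death_rate": [0.02, 0.10, 0.20],
    "days_unchecked": [14, 28],
}

# One scenario per combination of assumptions.
scenarios = [dict(zip(ASSUMPTIONS, values)) for values in product(*ASSUMPTIONS.values())]
results = sorted(((scenario_deaths(**s), s) for s in scenarios), key=lambda r: r[0])

print(len(results), "scenarios")
print("lowest: ", round(results[0][0]), results[0][1])
print("highest:", round(results[-1][0]), results[-1][1])
```

Listing the assumptions next to each result is the point: the reader can see which combination produced which number and judge how plausible that combination is, rather than being handed a single figure.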

Conclusion

Summarizing, the key factors:

  1. Exponential growth assumptions dwarf every other assumption we could possibly make.
  2. What society does has the biggest impact on those exponential rates.
  3. Death rates get dramatically higher when healthcare systems cannot cope. Flattening the curve to stay below that coping level is critical.
  4. Before we hit a slowing point of new cases it is impossible to predict when the peak will happen.
  5. Together with the uncertainty of the rate of exponential growth it also makes it impossible to predict the height of the peak.
  6. All the data is bad. Any change can be just a blip. Do not assume new trends.
  7. Scenario planning is a minimum requirement due to all the unknown and uncontrollable factors.
  8. Running a simulation across a single set of assumptions is meaningless. Running simulations for each scenario is better than just deterministic scenario planning.
  9. No predictions are remotely close to certain. List assumptions when making them.

We know everyone is really eager to restart the economy. And we have witnessed how governments and groups of people around the world have demonstrated a total ignorance of both the gravity of the situation and the uncontrollable nature of exponential growth behavior. This leads me to believe that there are many realistic scenarios where world leaders and local leaders will ease off too early, and growth will increase again, and in some places will get completely out of control again. We'll exceed healthcare coping ability in those places and the number of deaths there will skyrocket as a result. Let us pray sane minds will prevail...

Stefan de Kok

Supply Chain Innovator

4 years ago

Here's another very interesting perspective, reproductive values per US state (Rt, which is like R0 but also includes transmission time): https://rt.live/

Stefan de Kok

Supply Chain Innovator

4 years ago

HIGHLY RECOMMENDED: here is a really insightful video explaining a way to identify if a region has bucked the exponential trend (https://www.youtube.com/watch?v=54XLXg4fYsc). And here is a chart that uses this visualization for the US states:

Stefan de Kok

Supply Chain Innovator

4 years ago

And a very short follow-up, explaining why the death rates reported in the media are all wrong: https://www.dhirubhai.net/pulse/we-using-wrong-death-rates-stefan-de-kok/ Exponential growth strikes again!

Rob Garrett

Service-Driven Supply Chain | ToolsGroup

4 years ago

I just learned more in 15 minutes reading this than a month of media coverage.
