The US Just Surpassed Its Previous COVID-19 Peak Case Rate: Why Are We So Bad at This?
On June 25, the US hit yet another grim milestone in our languishing fight against COVID-19, having reached a new 7-day moving average case rate peak of 33,035 cases per day, and having surpassed the previous peak of 31,564 cases per day set on April 7. With continued rhetoric and conjecture about this being a so-called “second wave,” “second peak,” or “resurgence,” decomposition of the national COVID-19 case rate trendline helps demonstrate how individual states (and counties) are contributing to this overall rise in cases. This graph understandably causes panic in some and consternation in others who question why the US inexplicably cannot control its COVID-19 case rate.
No, not all states have increasing COVID-19 case rates, as the more densely populated Northeast has predominantly continued its decreasing trend after an initial bout as the national epicenter of the outbreak. And within those states that do have increasing case rates, many counties are successfully combating COVID-19 at the local level. Despite these individual successes, however, the national trend of increasing case rates is alarming and was inarguably preventable.
Heartfelt thanks to JHU for creating and continually updating the COVID-19 GitHub repository (https://github.com/CSSEGISandData/COVID-19) whose raw data comprised the substrate of this analysis. JHU simultaneously maintains the COVID-19 international tracker (https://coronavirus.jhu.edu/map.html), and both resources continue to be instrumental to the researchers, epidemiologists, medical professionals, and others combating the spread of the virus.
Case Rate Peaks: Informative Yet Imprecise
The current peak case rate may be numerically similar to the April 7 peak, although several distinctions should be noted, all of which contribute to understanding case rates (and death rates) and their respective peaks and troughs within the JHU data sets. Many factors collectively cause reported cases to “shift to the right,” that is, to appear on later dates than those on which they occurred, because reporting was delayed, especially earlier in the pandemic. Notwithstanding these shifts in data, as well as the known limitation that only a subset of actual COVID-19-positive individuals are ever tested or reported, case rates and case rate peaks are still critical to understanding the spread of COVID-19 and to defining data-driven policy.
First, raw case rates (i.e., the number of COVID-19-positive and presumed positive cases reported on a single day) have a high degree of variability, some of which can be mitigated by evaluating the 7-day moving average for cases rather than raw case rates. The 7-day moving average is used exclusively throughout this analysis, and is explained more thoroughly in a previous post: https://www.dhirubhai.net/pulse/combating-covid-19-data-case-moving-averages-troy-hughes/?trackingId=wOKB5GJeieS7umh23p93LA%3D%3D. Seven-day moving averages eliminate some unwanted variability that results from counties and states that systematically report fewer cases on specific days of the week (e.g., weekends) due to reduced staffing or other resources. This variability was greater earlier in the pandemic, while counties and states were still ramping up their staffing, so the previous April 7 peak demonstrates higher variability in case rate than is currently experienced.
For example, New Mexico had a relatively low case rate on April 7 at only 78 cases per day (7-day moving average), yet the variance of these data was greater due to fractured reporting at the local level, with a negative case rate even erroneously reported on April 9. And although the New Mexico case rate has more than doubled since April 7, the consistency of reporting has increased, as evidenced in part by the reduced variance in reported cases from one day to the next.
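The smoothing described above can be sketched in a few lines of pandas. This is a minimal illustration rather than the scripts used for this analysis: it assumes that the input is a cumulative case count (the form in which the JHU time series are published), that the 7-day window is centered on each date, and the function name and the sample numbers are hypothetical.

```python
import pandas as pd

def centered_moving_average(cumulative: pd.Series, window: int = 7) -> pd.Series:
    """Smooth daily new cases with a centered moving average.

    Assumption: `cumulative` holds running case totals indexed by date.
    Daily values can be negative if a jurisdiction revises its total downward.
    """
    daily = cumulative.diff().dropna()  # cumulative totals -> daily new cases
    return daily.rolling(window=window, center=True).mean()

# Hypothetical data: 100 cases reported on weekdays, 30 on weekend days.
dates = pd.date_range("2020-06-01", periods=15, freq="D")
daily_pattern = [0] + [100, 100, 100, 100, 100, 30, 30] * 2
cumulative = pd.Series(daily_pattern, index=dates).cumsum()

smoothed = centered_moving_average(cumulative)
print(smoothed.dropna())  # every full window spans one whole week
```

Because every full window contains exactly one of each weekday, the weekend dip disappears from the smoothed series. A centered window also leaves the first and last three days undefined.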
Second, JHU data are not updated retroactively, thus a “case” reflects the date on which a positive (or presumed positive) test was reported, which may correspond with neither the date of detection (i.e., positive COVID-19 test) nor the date of onset of symptoms. For this reason, longitudinal graphs that rely on JHU data should be interpreted as charting the reporting of COVID-19 cases rather than the true incidence thereof. This delay is most observable when counties or states have failed to report data to JHU in a timely manner and then “dump” multiple days of data at once. For example, Mississippi recently failed to report COVID-19 cases (and deaths) on four consecutive days, both to its own dashboard (https://msdh.ms.gov/msdhsite/_static/14,0,420.html) and to JHU. Mississippi has retroactively corrected these delayed cases on its dashboard, yet all analyses and visualizations that rely on JHU data will reflect the following omitted dates and June 22 spike (illustrating again why moving averages are essential to COVID-19 data analysis).
Similar delays in case reporting were more common earlier in the pandemic, and contribute to cases (and possibly case rate peaks) shifting to the right, but do still occasionally occur.
Third, as counties and states experienced exponential or high growth of cases early in the pandemic, staff were overrun—not only healthcare providers but also those involved in recording and reporting COVID-19 data. This contributed to a backlog of data processing in many jurisdictions, which again caused a delay in reporting not retroactively corrected by JHU. In some instances, cases were reported at the state level yet lacked county-level attribution. For example, as depicted below, at its peak on April 2, New Jersey reported nearly 5,000 cases in which the county was “Unassigned.” New Jersey did retroactively correct these records by reporting the county for each case, but the graph of “Unassigned” cases helps visualize the bottleneck of cases the state faced during its peak.
Not all states utilize the “Unassigned” nomenclature, as some states simply delay their reporting of cases (when overrun) rather than reporting data initially and correcting those data subsequently. Thus, a state like California (which does not report Unassigned) may have a significant bottleneck of past cases not yet reported, yet 1) there will be no way to identify current bottlenecks by viewing JHU data alone, and 2) there will be no way to identify past bottlenecks because any delays from diagnosis date to reporting date are not reported within the JHU data structure. More information about JHU case and death records in which the county is missing can be found in a previous post: https://www.dhirubhai.net/pulse/county-less-covid-19-how-unassigned-out-state-counties-troy-hughes/?trackingId=kyMwLeJOuP9OlH276Q8IPw%3D%3D.
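As a rough sketch of how the “Unassigned” bottleneck could be quantified, the following computes the share of each state's cases that lack a county. The `Province_State` and `Admin2` column names follow the JHU US files; the `cases` column, the function name, and all numbers are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical records for a single reporting date (not real JHU values).
records = pd.DataFrame({
    "Province_State": ["New Jersey", "New Jersey", "New Jersey", "California"],
    "Admin2": ["Bergen", "Essex", "Unassigned", "Los Angeles"],
    "cases": [120, 95, 400, 210],
})

def unassigned_share(df: pd.DataFrame) -> pd.Series:
    """Return, per state, the fraction of cases whose county is 'Unassigned'."""
    totals = df.groupby("Province_State")["cases"].sum()
    unassigned = (
        df[df["Admin2"].eq("Unassigned")]
        .groupby("Province_State")["cases"]
        .sum()
    )
    # States that never use the label get a share of zero.
    return unassigned.reindex(totals.index, fill_value=0) / totals

share = unassigned_share(records)
print(share)
```

A share of zero does not mean that a state has no backlog. As noted above, a state that simply delays reporting leaves no trace in these data.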
Fourth, a paucity of testing availability, especially earlier in the pandemic, caused the reporting of cases to be delayed, as individuals struggled to receive COVID-19 tests, and as most jurisdictions limited testing to only those presenting the most severe symptoms. For example, criteria to be tested in hardest hit New York City were more stringent when New York and New Jersey were peaking in March and early April than today. Thus, many of the reported cases on April 7 likely represent individuals who were unable to be tested days and even weeks prior, again contributing to a shift of cases to the right.
Fifth, throughout the pandemic, states have revised their definitions of COVID-19 “cases,” typically to better align with current CDC definitions. In doing so, some states have reported hundreds and even thousands of cases in a single day, which JHU data sets fail to retroactively redistribute across prior dates. These data dumps appear as outliers and may create false peaks because they are not distributed across the preceding weeks or months. For example, on June 5, Michigan began reporting “probable” COVID-19 cases, which caused a tremendous spike of 5,536 cases (https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data). Were these cases to be retroactively distributed, the April 4 peak observed within Michigan could have been shifted, and the June 5 “peak” observed in the following figure likely would not have been observed.
Sixth, public policy and recommendations from the CDC and other agencies have shifted throughout the pandemic, from initial guidance that asymptomatic individuals not be tested (so that the most severe cases could be prioritized for testing) to current guidance that asymptomatic individuals be tested to help diagnose less severe cases and to identify and prevent community spread. For this reason, cases reported on April 7 were more likely to be severe and more likely to result in hospitalization, ventilation, and death than cases reported today. The decreasing mean age of individuals who test positive, together with the decreasing death rate, helps demonstrate that equivalent case rates (at two points in time) do not necessarily denote equivalent severity of COVID-19 in the nation. The following figure demonstrates that the US death rate has consistently decreased since its peak on April 16.
Note that the tremendous spike in deaths on June 25 can be attributed to New Jersey having begun reporting probable deaths on this date, thus these deaths represent previous deaths that JHU will not retroactively distribute. This outlier is demonstrated when New Jersey deaths are isolated, but without drilling down to state-level (and often county-level) trends, false conclusions can be drawn by evaluating national-level data only in aggregate.
Despite these and other issues that threaten COVID-19 data integrity, especially where more accurate data are now known yet not corrected retroactively within JHU data sets, case rates nevertheless provide invaluable insight into the incidence and persistence of COVID-19 at the national, state, and local level. Moreover, although case rate peaks should be viewed as an approximation—both in date and number of cases—rather than a precise measurement, peaks nevertheless provide further insight into case rate trends. Finally, those counties and states that are now or recently peaking demonstrate inexplicable and inexcusable increases in COVID-19 spread, given the months of preparation, increased health awareness, and public awareness available this late in the pandemic.
States with Increasing Case Rates: Choropleths
Various metrics can be utilized to illustrate the extent to which state case rates are increasing or decreasing. One method examines the number of days for which the case rate 7-day moving average was increasing in the past two-week period. On April 7, for example, the map of the US showed that all states were either stable or increasing.
By comparison, the current map of the US depicts a path toward recovery from COVID-19 throughout the Northeast and North Central regions. However, it also demonstrates that states like Florida, Arizona, Texas, and California have failed to learn from harder hit states and instead have allowed their case rates to continue steadily increasing.
More granular detail is exposed when the case rates of individual counties are evaluated. On April 7, COVID-19 hotspots were most clearly concentrated in the densely populated Northeast.
The current county-level choropleth similarly demonstrates a shift in increasing case rates from the Northeast to the South and Southwest.
For example, Florida has had unprecedented increases in case rates over the past month, as indicated by increases in the case rate moving average on 86.7 percent (or 26 of 30) of days. This substantial and consistent increase contributes to Florida’s dark red on the state-level choropleth map as it summarily fails to control the spread of COVID-19.
When viewed at the county level, however, it is clear that at least a few Florida counties are green, which the state-level map fails to capture. Overall, however, the state's county case rates are overwhelmingly increasing.
These choropleths are useful in showing case rate directionality—which counties or states are getting better and which are getting worse—although they neither show the current severity (e.g., per capita cases) of states or counties nor the degree of acceleration of case rates for those regions that are increasing. This information, in addition to identification of states that are peaking, can be distilled from case rate trendlines.
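The directionality metric behind these choropleths (the share of recent days on which the 7-day moving average increased) can be sketched as follows. This is an illustrative reading of the metric, not the code used to produce the maps; the function name, the default 14-day lookback, and the sample series are assumptions.

```python
import pandas as pd

def share_of_days_increasing(moving_avg: pd.Series, lookback_days: int = 14) -> float:
    """Fraction of the last `lookback_days` day-over-day changes that were increases.

    Assumption: `moving_avg` holds at least lookback_days + 1 smoothed values.
    """
    changes = moving_avg.dropna().diff().iloc[-lookback_days:]
    return float((changes > 0).mean())

# Hypothetical smoothed case rates for 15 consecutive days (14 changes).
values = [10, 11, 12, 13, 14, 15, 14, 13, 14, 15, 16, 17, 16, 15, 16]
series = pd.Series(values, index=pd.date_range("2020-06-01", periods=15, freq="D"))

result = share_of_days_increasing(series)
print(f"{result:.1%} of the past 14 days were increasing")

# The same arithmetic reproduces the Florida figure cited above: 26 of 30 days.
florida_pct = round(26 / 30 * 100, 1)
print(florida_pct)
```

A metric of this kind counts only the direction of each daily change, which is why it cannot convey severity or acceleration.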
States with Increasing Case Rates: Trendlines
A more precise method to differentiate the April 7 and June 25 case rate peaks instead compares the case rates themselves on some equivalent scale. For example, by displaying the per capita case rates (e.g., cases per million state residents), all states can be compared on equal footing and in one figure. I explain per capita methodology more fully in a previous post: https://www.dhirubhai.net/pulse/7-states-log-peak-covid-19-case-rates-weekendwhy-scaling-troy-hughes/?trackingId=LdkwaAr%2BDtxx8S%2FekQlZ3w%3D%3D.
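The per capita scaling amounts to a single division. The sketch below assumes cases per million residents as the unit; the state labels, case rates, and populations are placeholders rather than values from the JHU or Census data.

```python
import pandas as pd

def cases_per_million(case_rate: pd.Series, population: pd.Series) -> pd.Series:
    """Scale a daily case rate to cases per one million residents."""
    return case_rate / population * 1_000_000

# Hypothetical 7-day moving averages and resident populations.
rates = pd.Series({"State A": 3000, "State B": 4800})
populations = pd.Series({"State A": 7_000_000, "State B": 40_000_000})

per_million = cases_per_million(rates, populations).sort_values(ascending=False)
print(per_million)
```

In this made-up example, State B reports more cases per day in absolute terms, yet State A has the higher per capita rate, which is the distinction the per capita figures are meant to expose.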
On April 7, the highest per capita case rates were clustered in the Northeast with the exception of Michigan and Louisiana.
The following figure illustrates the seven states having the highest current case rate 7-day averages, with Arizona faring the worst at nearly double the next highest state, South Carolina. Also note that the states displaying the highest current per capita case rates are in the South and Southwest.
These two figures comparatively help depict that the US is in fact in its first wave; the states that had the highest per capita case rates on April 7, with the exception of Louisiana, have all tremendously reduced those case rates in the following months. Conversely, those states that now have the highest per capita case rates have not peaked in the past but rather have climbed consistently throughout the pandemic, even as states in the Northeast peaked, stabilized, and began and continued to eradicate the virus.
Also note that on April 7, some states with the highest per capita case rates had already begun decreasing, which contributed to April 7 being the previous high-water mark. Conversely, all of the states that currently have the highest per capita case rates also have accelerating case rates, indicating that the case rates are not only peaking but will continue to do so until federal, state, and local authorities finally take action and residents finally take personal responsibility to protect themselves and others.
Most discouraging about the states that are currently peaking is the fact that these increases (and the respective deaths that result) were largely avoidable. For example, on April 7, Arizona had only 2,870 total cases and 73 total deaths, with cases increasing by “only” 168 per day. Arizona now has 63,281 total cases and 1,495 deaths, with cases increasing by 2,834 per day. Arizona had the opportunity to learn from the mistakes of other states that preceded it, yet chose not to.
States That Have Peaked in the Past Week
Another method to evaluate national case rate peaks is to examine those states that are themselves peaking during the national peak. For example, if California peaked in the past week (which it did on June 22 with a 7-day moving average of 4,861 cases), then it most definitely contributed to the national peak that occurred coterminously. On June 22, the most recent day for which a 7-day moving average can be calculated, 15 states had experienced their highest ever case rate 7-day moving average within the preceding seven days.
This cluster of 15 states can also be observed on the far right of the following figure, which maps state abbreviations to the point at which each state reached its maximum case rate 7-day moving average.
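Identifying the states that set an all-time high within the most recent week can be sketched as below. The layout (one column per state, one row per date), the function name, and the exact boundary (a peak on any of the last seven dates with a valid moving average) are assumptions made for illustration.

```python
import pandas as pd

def peaked_within(moving_avg: pd.DataFrame, days: int = 7) -> list:
    """Return the states whose all-time maximum falls in the last `days` dates.

    Assumption: rows are dates, columns are states, values are 7-day averages.
    """
    valid = moving_avg.dropna(how="all")
    peak_dates = valid.idxmax()  # date of each state's maximum
    cutoff = valid.index.max() - pd.Timedelta(days=days)
    return sorted(peak_dates[peak_dates > cutoff].index)

# Hypothetical data: "AA" is still climbing, "BB" peaked on the first day.
dates = pd.date_range("2020-06-01", periods=10, freq="D")
frame = pd.DataFrame(
    {"AA": list(range(10)), "BB": list(range(10, 0, -1))},
    index=dates,
)

recent_peaks = peaked_within(frame)
print(recent_peaks)
```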
States that have peaked in the past week include: Arizona, Arkansas, California, Florida, Georgia, Idaho, Missouri, Mississippi, North Carolina, Nevada, Oklahoma, Oregon, South Carolina, Tennessee, and Texas. Two figures are displayed for each state—the first displaying all previous case rate maximums (i.e., “local maxima,” for fellow calculus enthusiasts) that have been observed, and the second displaying the counties within each state that have the highest per capita case rates.
Local maxima are arbitrarily defined as case rate 7-day moving averages that are higher than every moving average in the two weeks before and the two weeks after them. In other words, each local maximum is guaranteed to be the highest case rate moving average in at least a 29-day period, except for those maxima that occur within the two weeks preceding the current date, for which fewer than 14 subsequent days are available.
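A minimal sketch of this local-maximum rule follows. It assumes a strict comparison (ties do not count as maxima) and simply truncates the comparison window at either end of the series; the function name and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd

def local_maxima(moving_avg: pd.Series, half_window: int = 14) -> pd.Series:
    """Keep values strictly greater than all neighbors within `half_window` days.

    Assumption: near the start or end of the series, only the neighbors
    that exist are compared.
    """
    s = moving_avg.dropna()
    values = s.to_numpy()
    keep = []
    for i, v in enumerate(values):
        lo = max(0, i - half_window)
        neighbors = np.concatenate([values[lo:i], values[i + 1:i + 1 + half_window]])
        keep.append(bool((v > neighbors).all()))
    return s[np.array(keep, dtype=bool)]

# Hypothetical series: rise to 50, fall back to 10, then climb to 55 at the end.
values = (
    [10 + 4 * i for i in range(11)]
    + [50 - 2 * (i - 10) for i in range(11, 31)]
    + [10 + 3 * (i - 30) for i in range(31, 46)]
)
series = pd.Series(values, index=pd.date_range("2020-04-01", periods=46, freq="D"))

peaks = local_maxima(series)
print(peaks)
```

The final value is flagged even though nothing follows it, which mirrors the exception noted above for maxima in the two weeks preceding the current date.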
1. Arizona
2. Arkansas
3. California
4. Florida
5. Georgia
6. Idaho
7. Missouri
8. Mississippi
9. North Carolina
10. Nevada
11. Oklahoma
12. Oregon
13. South Carolina
14. Tennessee
15. Texas
Data Sources and Analysis
All COVID-19 data were downloaded from the Johns Hopkins University (JHU) COVID-19 GitHub repository (https://github.com/CSSEGISandData/COVID-19) on 6-26-2020 and are current as of 6-25-2020.
State-level population statistics relied on US Census 2019 estimates (https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/state/detail/).
County-level population statistics relied on US Census 2019 estimates (https://www.census.gov/data/tables/time-series/demo/popest/2010s-counties-total.html).
State-level shapefiles were downloaded from the US Department of Transportation (https://data-usdot.opendata.arcgis.com/datasets/states) and rendered in Python using the Matplotlib, Shapely, and Geopandas modules.
County-level shapefiles were downloaded from the US Department of Transportation (https://data-usdot.opendata.arcgis.com/datasets/counties) and rendered in Python using the Matplotlib, Shapely, and Geopandas modules.
All data download, cleaning, aggregation, transformation, and visualization were performed in Python 3.7 through automated scripts that generate 4K-resolution data products nightly when the JHU GitHub repository is updated. These scripts scale to produce thousands of graphs, maps, and videos daily, of which a handful were selected for this analysis.