7 States Log Peak COVID-19 Case Rates This Weekend—Why Scaling These Peaks Matters
As states continued to implement their phased reopening strategies this weekend, seven states marked the end of May with another grim milestone. Between May 29 and 31, California, Texas, Arizona, Wisconsin, Mississippi, South Carolina, and Utah each logged their highest-ever COVID-19 case rates, according to Johns Hopkins University (JHU) data. Graphs convey this critical information, but their scale must be considered: inconsistent scale, like that in the figure above, can lead to false conclusions.
Heartfelt thanks to JHU for creating and continually updating the COVID-19 GitHub repository (https://github.com/CSSEGISandData/COVID-19), whose raw data formed the basis of this analysis. JHU also maintains the COVID-19 international tracker (https://coronavirus.jhu.edu/map.html), and both resources continue to be instrumental to the researchers, epidemiologists, medical professionals, and others combating the spread of the virus.
New York Times “Coronavirus in the U.S.” Dashboard
In addition to the JHU repository, the NYTimes' freely available COVID-19 dashboard (https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html) has been a lifeline for researchers and others seeking insight into the pace and spread of the virus, and has underpinned hundreds of excellent, data-driven articles published on the pandemic. One of the dashboard's first sections broadly categorizes states into those whose daily COVID-19 case rates are increasing, decreasing, or remaining stable. The following figure shows the 16 states that the NYTimes described as increasing on June 1.
Credit: NYTimes, updated 0200 E.T. June 1, 2020
The graphs are useful because the dark-red 7-day moving averages (overlaying the lighter bar graphs of raw cases) immediately indicate rising case rates. States such as California, Virginia, North Carolina, and Mississippi unmistakably show continuously increasing case rates throughout the pandemic, whereas Vermont, Montana, and Alaska illustrate prior peaks from which they have recovered (and hope not to re-summit).
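To make the smoothing concrete, the following is a minimal sketch of a trailing 7-day moving average in plain Python. It assumes a trailing (rather than centered) window, which may differ from how the NYTimes computes its trendline, and the function name and sample values are illustrative rather than taken from any dashboard.

```python
from typing import List, Optional


def moving_average(daily_cases: List[float], window: int = 7) -> List[Optional[float]]:
    """Trailing moving average; None is returned until a full window exists."""
    averages: List[Optional[float]] = []
    running_sum = 0.0
    for i, value in enumerate(daily_cases):
        running_sum += value
        if i >= window:
            # Drop the value that just fell out of the trailing window.
            running_sum -= daily_cases[i - window]
        if i >= window - 1:
            averages.append(running_sum / window)
        else:
            averages.append(None)
    return averages


# Hypothetical daily new-case counts for ten days (not real data).
example_daily = [10, 12, 9, 30, 11, 14, 12, 40, 13, 15]
smoothed = moving_average(example_daily)
print(smoothed)
```

The single-day spikes (30 and 40) are spread across the window, which is why the averaged line reveals the trend more readily than the raw bars.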
The abscissa (X axis) is both straightforward and consistent among all 16 graphs, representing March 1 through May 30. But below the dashboard, the NYTimes responsibly notes, "Scales are adjusted for each state to make the curve more readable." Herein lies the issue: the ordinate (Y axis) scaling is inconsistent among graphs. This lets each graph fill its thumbnail more fully, yet it confounds readers who naturally seek to compare the heights or slopes of the graphs to each other.
For example, inspection of the curves alone might lead one to believe that California and Mississippi are faring equivalently. Yet the cumulative case counts reveal that California and Mississippi have 113,114 and 15,501 cases, respectively. So are they increasing similarly...or not? Consistent scaling can provide more context and support comparison among individual graphs.
Scaling Context into Graphs
The following graphs recreate the NYTimes figure, with states ordered (from California in the upper left to Alaska in the lower right) by the number of cumulative COVID-19 cases, which is displayed below each state name. NYTimes and JHU case numbers differ slightly, although negligibly so for this and most analyses.
Because readers are naturally drawn to compare a matrix of graphs, consistently scaled X and Y axes can help mitigate both misrepresentation and misinterpretation. This consistency requires an axis extensive enough for all values (from all graphs) to fit. In other words, California's high case count cannot scale to West Virginia's graph, so West Virginia's curves must scale to California's axis.
The following figure now consistently scales all graphs by case count, without having modified the underlying data. Note that California now proportionally dwarfs all other states, accurately representing that it has roughly seven times as many cumulative COVID-19 cases as Mississippi. Yet, despite this proportionality, the figure lacks utility because several states can barely be seen; Alaska could quadruple its case rate and this massive increase might be overlooked.
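The shared-axis requirement, and the loss of visibility it causes, can be sketched in a few lines of Python. This is a hedged illustration, not the scripts used for this analysis: the function names, the state labels, and the values are all hypothetical.

```python
from typing import Dict, List


def shared_y_limit(series_by_state: Dict[str, List[float]], padding: float = 0.05) -> float:
    """Upper Y limit large enough for every state's series, plus headroom."""
    overall_max = max(max(values) for values in series_by_state.values())
    return overall_max * (1 + padding)


def axis_fraction_used(series_by_state: Dict[str, List[float]], y_limit: float) -> Dict[str, float]:
    """Fraction of the shared axis height that each state's peak occupies."""
    return {state: max(values) / y_limit for state, values in series_by_state.items()}


# Hypothetical daily case counts (not real data) for a large and a small state.
example = {
    "Large state": [1500, 2100, 2600],
    "Small state": [4, 9, 12],
}
limit = shared_y_limit(example, padding=0.0)
fractions = axis_fraction_used(example, limit)
print(limit, fractions)
```

Here the small state's peak occupies less than 1 percent of the shared axis, so even a fourfold increase would remain nearly invisible. In matplotlib, a comparable effect can be obtained by passing `sharey=True` to `plt.subplots` or by calling `set_ylim(0, limit)` on each axes object.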
Utility can often be improved through population scaling, in which per capita case totals and case rates are graphed in lieu of raw values. For example, the earlier comparison of California's trends to Mississippi's did not consider California's status as the most populous state. Is this population large enough to offset its commensurately high COVID-19 case counts? Per capita case rates can help answer this and other questions while facilitating comparison of graphs on shared axes.
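The per capita scaling described above amounts to a single division. The sketch below uses the cumulative case counts quoted earlier in this article, but the populations are rounded approximations assumed for illustration (not the exact Census values used in the analysis), so the output will not exactly reproduce the figures shown in the graphs.

```python
from typing import Dict


def cases_per_100k(cases: float, population: float) -> float:
    """Scale a raw case count to cases per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000


# Cumulative case counts quoted earlier in the article.
cumulative_cases: Dict[str, int] = {"California": 113_114, "Mississippi": 15_501}

# ASSUMPTION: rounded, approximate state populations for illustration only.
approximate_population: Dict[str, int] = {"California": 39_500_000, "Mississippi": 2_980_000}

per_capita = {
    state: cases_per_100k(cumulative_cases[state], approximate_population[state])
    for state in cumulative_cases
}
for state, rate in per_capita.items():
    print(f"{state}: {rate:.0f} cases per 100,000 residents")
```

Under these assumed populations, California's raw count is far larger than Mississippi's, yet its per capita value is lower, which is the reversal that population scaling is meant to expose.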
The updated figure shows that when California's nearly 40 million residents are accounted for, its cumulative per capita COVID-19 case count is 283 per 100,000 residents. This is still much higher than in more rural Alaska, Montana, and West Virginia, but lower than in Virginia, Mississippi, Wisconsin, Utah, and Alabama. Moreover, because the Y axes are identical among the following graphs, you can compare current per capita case rates by evaluating the most recent Y values of the 7-day moving average trendlines.
7 States Log Peak COVID-19 Case Rates Revisited
With this brief introduction, it's apparent that for some purposes, consistently scaled per capita curves can be more useful than graphs having independent axes. Especially where multiple graphs are arranged in a matrix, readers should be expected to compare states, so the visualization should support that comparison. The final graph shows per capita COVID-19 cases per 100,000 state residents. For reference, Alaska was added to the "7" given that its May 31 case rate nearly matched its highest case rate.
Data Sources and Analysis
All COVID-19 data were downloaded from the Johns Hopkins University (JHU) COVID-19 GitHub repository (https://github.com/CSSEGISandData/COVID-19) on 5-31-2020 and are current as of 5-31-2020.
State-level population statistics relied on US Census 2019 estimates (https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/state/detail/).
All data download, cleaning, aggregation, transformation, and visualization were performed in Python 3.7 through automated scripts that generate results and 4K-resolution data products nightly when the JHU GitHub repository is updated. These scripts scale to produce thousands of graphs and maps daily, of which a handful were selected for this analysis.
Additional 3-Month Case Rate Graphs
Graphs for each of the NYTimes' 16 states (those having increasing case rates) are included without comment as a reference. Because the bar graphs depict raw (uncorrected) case rates, negative values may appear where a state's cumulative case count was reduced in JHU data from one day to the next.
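The negative bars arise naturally when daily counts are derived by differencing cumulative totals. The following minimal sketch shows the mechanism; the function name and values are hypothetical, and treating the first cumulative value as the first day's count is an assumption made for simplicity.

```python
from typing import List


def daily_from_cumulative(cumulative: List[int]) -> List[int]:
    """Day-over-day differences; the first value is kept as the first day's count."""
    if not cumulative:
        return []
    daily = [cumulative[0]]
    for previous, current in zip(cumulative, cumulative[1:]):
        daily.append(current - previous)
    return daily


# Hypothetical cumulative totals (not real data) with a downward revision on the fourth day.
example_cumulative = [100, 130, 170, 165, 200]
daily = daily_from_cumulative(example_cumulative)
negative_days = [index for index, value in enumerate(daily) if value < 0]
print(daily, negative_days)
```

A downward revision of the cumulative total (170 to 165 here) produces a negative daily value, which an uncorrected bar graph will draw below zero.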