California COVID-19 Case Rates Just Surpassed New York and New Jersey: Why This Is Troubling, Why It Is Not, and Why Derivatives Matter

COVID-19 per capita case rates (i.e., cases per million state residents) for California just overtook those of New York and New Jersey for the first time. This convergence highlights not only the current COVID-19 incidence in these states but also the dramatically different courses they have taken over the past three months. New York and New Jersey are examined because, of all states, they have experienced both the highest per capita case rates and the highest case rate accelerations. Yet, despite this severity, both have continued to bend the curve. But are these bends enough, and how do they compare with those of other states?

This analysis examines four states with decreasing case rates (New York, New Jersey, Illinois, and Pennsylvania) and four states with increasing case rates (California, Texas, Michigan, and Florida), and explores the use of case rate derivatives to support interstate comparison and visualization. It concludes with an animation (linked here https://www.dhirubhai.net/posts/troy-hughes-27a998a8_coronavirus-datavisualization-covid19-activity-6675732943920594944-VsLG) that shows the progression of cases, and their acceleration or deceleration, over time.

Heartfelt thanks to Johns Hopkins University (JHU) for creating and continually updating the COVID-19 GitHub repository (https://github.com/CSSEGISandData/COVID-19) whose raw data comprised the substrate of this analysis. JHU simultaneously maintains the COVID-19 international tracker (https://coronavirus.jhu.edu/map.html), and both resources continue to be instrumental to the researchers, epidemiologists, medical professionals, and others combating the spread of the virus. 

Comparison Requires Equivalent Scale

A concept often lost in COVID-19 graphs of individual states is the importance of comparing performance among states. In my last post (https://www.dhirubhai.net/pulse/7-states-log-peak-covid-19-case-rates-weekendwhy-scaling-troy-hughes/), I demonstrated how equivalently scaled axes support interstate comparison, for example by graphing case rates per million state residents.

The NYTimes “Coronavirus in the U.S.” dashboard (https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html) is continually updated and, among other information, displays thumbnails for states whose cases are increasing, decreasing, or remaining stable. For example, on June 7, the dashboard reported that 21 states were increasing, the first four of which are shown here.

Credit: NYTimes “Coronavirus in the U.S.” dashboard, updated June 7 (8:39 pm ET)

Similarly, the dashboard reported that 19 states were decreasing, the first four of which are shown here.

Credit: NYTimes “Coronavirus in the U.S.” dashboard, updated June 7 (8:39 pm ET)

The NYTimes thumbnails scale each graph's Y axis individually to maximize the visible area under the curve, a common technique that has both advantages and disadvantages. This figure is recreated below using only the raw JHU data, with a superimposed 7-day moving average. From the figure alone, one might conclude that California and Texas have dramatically higher current case rates than New York and New Jersey; however, because each of the graphs below uses a separate Y axis, this visual comparison cannot validly be made, and that conclusion is inaccurate.

To facilitate visual comparison of the individual graphs, identical axes should be used; this can be accomplished by scaling the Y axes to represent COVID-19 cases per million residents, using population estimates from the 2019 US Census state projections. Not only are the graphs now displayed on equivalent scales, but each also reports its 7-day moving average case rate (the mean of daily case rates from June 1 through June 7) as text, affording an additional, non-visual comparison.
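
The per-capita scaling and smoothing described above can be sketched in a few lines of Python. This is a minimal sketch, not the author's pipeline: it assumes daily new cases are derived by differencing JHU's cumulative confirmed counts, it assumes a trailing 7-day window (a centered window is another common choice), and the function names, case counts, and population below are hypothetical.

```python
from typing import List


def daily_new_cases(cumulative: List[int]) -> List[int]:
    """Convert a cumulative confirmed-case series into daily new cases.

    JHU time series report running totals, so each day's new cases are the
    difference from the previous day. The first day has no predecessor and
    is dropped.
    """
    return [today - yesterday for yesterday, today in zip(cumulative, cumulative[1:])]


def per_million(daily: List[float], population: int) -> List[float]:
    """Scale daily case counts to cases per million residents."""
    return [count * 1_000_000 / population for count in daily]


def moving_average(values: List[float], window: int = 7) -> List[float]:
    """Trailing moving average, emitting one value per complete window."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical state: 2,000,000 residents and nine days of cumulative counts.
cumulative = [0, 10, 30, 60, 100, 150, 210, 280, 360]
rates = per_million(daily_new_cases(cumulative), population=2_000_000)
smoothed = moving_average(rates)
print(smoothed)  # [20.0, 25.0]
```

Because every state is divided by its own population, a single Y axis in cases per million can be shared across all thumbnails.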

It is now clear that although California has surpassed New York and New Jersey in daily case rate, the gap is not nearly as large as the previous figure suggested: California has 67 cases per million residents, compared with 53 for New York and 59 for New Jersey. Moreover, Texas and New York have equivalent per capita case rates, in stark contrast to the previous figure, which seemed to indicate that Texas had long ago surpassed both New York and New Jersey.

Another benefit of these graphs is their ability to show total COVID-19 cases (the integral of the case rate, or the area under each curve) in addition to case rates. Thus, beyond the cumulative case count printed on each graph, the cumulative number of cases per capita can be evaluated as the area under each curve. For example, although California and Illinois have roughly equivalent total numbers of cases (130,615 and 127,757, respectively), the area under the Illinois curve is significantly larger. This disparity arises because California is more than three times as populous as Illinois, and because Illinois reached (at its COVID-19 peak) a per capita case rate more than three times that of California.
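
With one observation per day, the "area under the curve" reduces to a running sum of daily per-million rates. The sketch below illustrates this under that assumption; the function name and the two states are hypothetical, chosen so that both have the same total case count but one has four times the other's population.

```python
from itertools import accumulate
from typing import List


def cumulative_per_million(daily_per_million: List[float]) -> List[float]:
    """Running total of daily per-million case rates.

    With one observation per day, the area under the daily-rate curve is a
    sum of one-day-wide bars, so the integral reduces to a cumulative sum.
    """
    return list(accumulate(daily_per_million))


# Hypothetical states with the same daily counts (300 cases in total)
# but different populations: State A has four times State B's residents.
daily_cases = [50, 100, 150]
state_a = [c * 1_000_000 / 2_000_000 for c in daily_cases]  # 2,000,000 residents
state_b = [c * 1_000_000 / 500_000 for c in daily_cases]    # 500,000 residents

print(cumulative_per_million(state_a)[-1])  # 150.0
print(cumulative_per_million(state_b)[-1])  # 600.0
```

Equal raw totals thus produce areas that differ by the inverse of the population ratio, which is the same effect that separates the California and Illinois curves.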

Another method of comparison superimposes state graphs upon one another. The following graph shows the cumulative number of COVID-19 cases per million state residents, with states whose case rates are currently increasing shown in red and those whose case rates are decreasing shown in green. Although the downward-trending states show marked improvement, their cumulative cases per capita typically exceed those of other states.

For other purposes, the cumulative case count may matter less than the daily case rate. The following figure depicts the daily case rates of eight states, with each state's label placed at the maximum case rate it has observed during the pandemic. Note that neither California nor Texas has peaked yet, whereas all other states have; however, a past peak does not prevent a state from regressing and reaching subsequent peaks, evincing the much-dreaded bimodal or multimodal behavior that would ultimately infect and kill more people.

This superimposition makes clear that current daily case rates fall within a relatively narrow range, compared with the tremendous range observed in April when many states were peaking. Case rates for all states are graphed in the following figure. The number of states huddled around the right axis is alarming, as these states have recorded their peak daily case rates within the past week.
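
Placing each label at a state's maximum, and counting the states whose maximum is recent, both come down to locating the peak of a smoothed series. This is a minimal sketch assuming 7-day-average per-million series; the helper names, the 7-day "recent" window, and the two example states are hypothetical.

```python
from typing import Dict, List


def peak_index(rates: List[float]) -> int:
    """Index of the highest case rate; ties resolve to the earliest day."""
    return max(range(len(rates)), key=lambda i: rates[i])


def peaked_recently(rates: List[float], days: int = 7) -> bool:
    """True if the series maximum falls within the last `days` observations.

    A maximum on the final day means the state has not yet shown a peak.
    """
    return peak_index(rates) >= len(rates) - days


# Hypothetical smoothed per-million series for two states.
series: Dict[str, List[float]] = {
    "State X": [10, 30, 50, 40, 30, 25, 20, 18, 17, 16],  # peaked early
    "State Y": [5, 6, 8, 9, 12, 14, 15, 17, 19, 21],      # still climbing
}
for name, rates in series.items():
    print(name, peak_index(rates), peaked_recently(rates))
recent = sum(peaked_recently(rates) for rates in series.values())
```

The index returned by `peak_index` is also where a label would be anchored in the eight-state figure.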

Notwithstanding their movement away from the unimaginably high case rate peaks of April, the deceleration in states such as New York, New Jersey, Massachusetts, Rhode Island, and Louisiana is slowing: on average, these states are bending the curve less each day, while other states continue to see case rates rise day after day. Case rate derivatives can add even more context to this narrative.

Graphing Derivatives

The first derivative of the COVID-19 case rate represents the acceleration or deceleration of cases; in other words, the derivative is positive when daily case rates are increasing, negative when they are decreasing, and zero when they are stable. Thus, when assessing which states have increasing case rates and which have decreasing ones (as in the matrix of thumbnails in the NYTimes dashboard), the sign of the derivative provides a mathematical basis for the classification.
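
The article does not state how the derivative was computed; for daily data, a common choice is the day-over-day difference of the 7-day moving average, which is assumed here. Because real data rarely produce an exactly zero derivative, the sketch also assumes a small, hypothetical tolerance band for "stable". The function names are illustrative.

```python
from typing import List


def first_difference(values: List[float]) -> List[float]:
    """Day-over-day change: a discrete stand-in for the first derivative."""
    return [today - yesterday for yesterday, today in zip(values, values[1:])]


def trend(derivative: float, tolerance: float = 0.5) -> str:
    """Classify a state by the sign of its latest derivative.

    `tolerance` is a hypothetical band (cases per million per day, per day)
    inside which the change is treated as stable.
    """
    if derivative > tolerance:
        return "increasing"
    if derivative < -tolerance:
        return "decreasing"
    return "stable"


# Hypothetical 7-day moving averages (cases per million) for one state.
smoothed = [20.0, 22.0, 23.0, 21.0]
derivatives = first_difference(smoothed)  # [2.0, 1.0, -2.0]
print(trend(derivatives[-1]))  # decreasing
```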

The following graph shows New York daily case rates (in blue and silver), their 7-day moving average (in red), and the first derivative of that moving average (in white). Note that although New York was initially bending the curve sharply, its progress has slowed over time; this may be due to a number of factors, such as reduced social distancing (i.e., quarantine fatigue), increased testing, greater availability of testing to less symptomatic patients, or other factors not explored in this analysis.
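
"Bending the curve less each day" can be read as a negative derivative that is creeping back toward zero. The sketch below detects that pattern by comparing the latest derivative with one a week earlier; the function name, the 7-day lookback, and the example values are hypothetical, and the check identifies only the pattern, not any of the causes listed above.

```python
from typing import List


def deceleration_slowing(derivatives: List[float], lookback: int = 7) -> bool:
    """True when case rates are still falling, but less steeply than before.

    Compares the latest derivative with the one `lookback` days earlier: a
    negative derivative that has moved closer to zero means the curve is
    still bending downward, but by less each day.
    """
    if len(derivatives) <= lookback:
        raise ValueError("need more than `lookback` derivative values")
    latest = derivatives[-1]
    earlier = derivatives[-1 - lookback]
    return earlier < latest < 0


# Hypothetical derivatives of a 7-day moving average (per million per day).
history = [-8.0, -7.0, -6.0, -5.0, -4.0, -3.0, -3.0, -2.0]
print(deceleration_slowing(history))  # True
```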

Whereas a case rate describes COVID-19 incidence at a point in time, its derivative describes the direction of that rate: whether it is trending up or down. For this reason, it is important to assess both the rate and its derivative; however, when they are viewed longitudinally (as above), comparing rates and derivatives among states simultaneously can be difficult. For example, in the earlier figure graphing case rates for all states throughout the pandemic, the trend lines become a jumble as they converge on the right axis.

This limitation can be overcome by plotting a case rate against its own first derivative, which enables a single point to represent both the speed and the acceleration of COVID-19 cases. The following scatterplot graphs daily case rates against case rate derivatives; the four designated states with increasing case rates (i.e., positive derivatives) are shown in red, and the four with decreasing case rates (i.e., negative derivatives) are shown in green. All remaining states appear gray.

Note that both axes are scaled to per capita case rates (per million state residents) so that all points can be compared equivalently. Although only the four designated "increasing" states are shown in red, every state that falls to the right of 0 has a positive derivative and thus an increasing case rate. For example, Arizona's case rate is currently increasing fastest, and Virginia's is decreasing fastest. Yet although these two states are headed in opposite directions, their daily per capita case rates are currently fairly similar.
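
The scatterplot's coordinates can be prepared as shown below: the derivative on the X axis, the case rate on the Y axis, and a color taken from each state's designated group. This sketch assumes day-over-day differences of 7-day-average per-million series; the names and the three example states are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (state, x = derivative, y = case rate, color)
Point = Tuple[str, float, float, str]


def latest_point(series: List[float]) -> Tuple[float, float]:
    """Return (derivative, rate) for the most recent day of a smoothed series."""
    return series[-1] - series[-2], series[-1]


def scatter_points(
    series_by_state: Dict[str, List[float]],
    increasing: Set[str],
    decreasing: Set[str],
) -> List[Point]:
    """One scatterplot point per state, colored by its designated trend group."""
    points: List[Point] = []
    for state, series in series_by_state.items():
        x, y = latest_point(series)
        if state in increasing:
            color = "red"
        elif state in decreasing:
            color = "green"
        else:
            color = "gray"
        points.append((state, x, y, color))
    return points


# Hypothetical per-million 7-day averages for the last two days.
demo = {"State A": [40.0, 44.0], "State B": [60.0, 56.0], "State C": [10.0, 10.0]}
for point in scatter_points(demo, increasing={"State A"}, decreasing={"State B"}):
    print(point)
```

To draw the figure, the X and Y values and colors could be passed to matplotlib's `Axes.scatter` (for example `ax.scatter(xs, ys, c=colors)`), with `ax.axvline(0)` marking zero acceleration. Color reflects the designated group, while position relative to that line reflects the sign of the derivative, so a gray point can still sit on the increasing side.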

When state case rates do stabilize, it is critical to understand whether they have stabilized at a low, acceptable rate or a high, unacceptable one. For example, Iowa and Maine both have derivatives near zero, indicating more or less stable rates on this date; however, Iowa currently has roughly three times as many cases per capita, placing these states in very different positions in their recovery.
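
Reading both axes together can be expressed as a small classifier that reports direction and level at once. The sketch below is illustrative only: the function name, the 20-per-million rate threshold, and the stability tolerance are hypothetical cutoffs, not public-health benchmarks, and the two example states are invented.

```python
def recovery_position(
    derivative: float,
    rate: float,
    rate_threshold: float = 20.0,
    tolerance: float = 0.5,
) -> str:
    """Describe a state by both direction (derivative) and level (rate).

    `rate_threshold` (cases per million per day) and `tolerance` are
    hypothetical cutoffs for illustration, not public-health benchmarks.
    """
    level = "high" if rate >= rate_threshold else "low"
    if derivative > tolerance:
        direction = "increasing"
    elif derivative < -tolerance:
        direction = "decreasing"
    else:
        direction = "stable"
    return f"{direction} at a {level} rate"


# Two hypothetical states with near-zero derivatives but different rates.
print(recovery_position(0.1, 10.0))   # stable at a low rate
print(recovery_position(-0.2, 30.0))  # stable at a high rate
```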

Scatterplots such as this show case rates and their derivatives but, in doing so, omit all historical trends. Snail trails can be added to show the full history of a state's case rates. For example, the following scatterplot graphs the June 1 case rate (7-day moving average) against the case rate derivative, with New York's trail shown in blue and New Jersey's in yellow. Note that California and Texas, despite steadily increasing case rates over the past three months, remain relatively centered among other states.

For reference, these points are identical to those in the previous graph; however, the X and Y axes have been extended to show the tragic paths that New York and New Jersey have taken throughout the pandemic. The rightmost points on the two lines mark the days on which those states' case rates were accelerating most rapidly, and the topmost points mark the days on which their daily case rates were highest. By showing the full extent of maximum case rates and maximum derivatives, this scatterplot puts into perspective how much worse off New York and New Jersey were than other states are now, in both case rate and case acceleration.
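
A snail trail is simply the (derivative, rate) pair for every day rather than only the latest one, and its rightmost and topmost points are the extremes described above. This is a minimal sketch assuming day-over-day differences of a smoothed per-million series; the function names and the example series are hypothetical.

```python
from typing import List, Tuple

TrailPoint = Tuple[float, float]  # (x = derivative, y = case rate)


def snail_trail(series: List[float]) -> List[TrailPoint]:
    """(derivative, rate) for every day after the first: a state's full path."""
    return [(today - yesterday, today) for yesterday, today in zip(series, series[1:])]


def extremes(trail: List[TrailPoint]) -> Tuple[TrailPoint, TrailPoint]:
    """Rightmost point (fastest acceleration) and topmost point (highest rate)."""
    rightmost = max(trail, key=lambda point: point[0])
    topmost = max(trail, key=lambda point: point[1])
    return rightmost, topmost


# Hypothetical smoothed per-million series that rises, peaks, and declines.
trail = snail_trail([10.0, 30.0, 60.0, 80.0, 90.0, 85.0, 70.0])
print(extremes(trail))  # ((30.0, 60.0), (10.0, 90.0))
```

In this toy series the fastest acceleration comes before the highest rate, so the trail reaches its rightmost point before its topmost one. The trail could be drawn with matplotlib's `ax.plot(xs, ys)` beneath the single-day scatter points.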

From this broader perspective, it is clear that although Arizona is experiencing both a relatively high case rate and rate of case acceleration (i.e., relative to other states on this date), Arizona nevertheless is not yet near the COVID-19 rates or acceleration experienced by New York and New Jersey as they passed through their most critical periods of the pandemic.

Still, this graph shows only a single point in time for states without snail trails. Where additional trending is required, these data can be visualized through animation. An example is published here (https://www.dhirubhai.net/posts/troy-hughes-27a998a8_coronavirus-datavisualization-covid19-activity-6675732943920594944-VsLG), which graphs the New York and New Jersey trails alongside other states.

NYTimes Dashboard Revisited

For some purposes, knowing only whether a state's COVID-19 case rate is increasing, decreasing, or staying the same is sufficient; oftentimes, however, more detailed information is required. How quickly is the state's case rate increasing or decreasing? If stable, is the state's case rate at an acceptably low level? And how do these dimensions of pandemic recovery compare among states? Understanding and applying derivatives can tell a more contextual story of recovery (or disaster), and these metrics can be visualized as simply as with static or animated scatterplots.

Data Sources and Analysis

All COVID-19 data were downloaded from the Johns Hopkins University (JHU) COVID-19 GitHub repository (https://github.com/CSSEGISandData/COVID-19) on 5-31-2020 and reflect data current as of 5-31-2020.

State-level population statistics relied on US Census 2019 estimates (https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/state/detail/).

All data download, cleaning, aggregation, transformation, and visualization were performed in Python 3.7 through automated scripts that generate results and 4K-resolution data products nightly when the JHU GitHub repository is updated. These scripts scale to produce thousands of graphs, maps, and videos daily, of which a handful were selected for this analysis.

More articles by Troy Martin Hughes
