Is there a pattern in COVID cases?
Dr. Manoranjan Pattanayak (Manu)
Economics and Public Policy Practitioner
After the first couple of weeks, I stopped looking at COVID statistics, be it active cases, cured/discharged or lives lost. It was depressing. This is not a cricket match and lives are not wickets. These are the saddest days of my life: I am concerned about my elderly parents, who live with me, and about my daughter, and equally about all the precious lives we are losing and the livelihoods we have lost.
Besides, what would I do with those running statistics? Can I help govt. in any manner? How?
Since quite some time has passed since the 24th of March, when the lockdown started, I sat down with the statistics to look for a pattern.
I had one question: is the spread widespread, does it follow a consistent pattern, or is it still occurring in a random fashion?
What I noticed is no Newton’s moment, but it is worth taking note of.
I found that States with high population density have more COVID cases. What does that mean? It suggests that, given the nature of the spread, densely populated places pose a bigger threat.
Let’s look at the statistics.
I downloaded the data from the Ministry of Health and Family Welfare website; it is updated up to the 13th of June 2020. I considered 32 states/UTs, excluding J&K, Ladakh, and Dadra and Nagar Haveli and Daman and Diu. I took the population density data from the RBI site, which is as per the 2011 census. For Telangana and Andhra Pradesh, I did a Google search and adjusted the density (the 2011 census predates the 2014 bifurcation of Andhra Pradesh).
I arranged the data in order after calculating two ranks for each state: its rank by Total Confirmed Cases and its rank by population density. As the graph callout (figure 1) shows, the rank correlation varies between 89% and 97% (that is, 0.89 to 0.97) depending on how many states I consider: with 23 of the 32 States it was 89%, and it rose to 97% when I reduced the set to 15 states.
All the rank correlations are statistically significant against the null hypothesis that population density and Total Confirmed Cases are completely independent. In each case, I found Prob > |t| = 0.0000, that is, a p-value that rounds to zero at four decimal places. The measure I calculated is the simple Spearman rank correlation.
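For readers who want to try the calculation themselves, here is a minimal sketch of a Spearman rank correlation in plain Python. The numbers in `density` and `cases` are made up purely for illustration; they are not the MoHFW or RBI figures, and the function names are my own. The sketch ranks both series (1 = largest), then takes the ordinary Pearson correlation of the ranks, which is what the Spearman coefficient is. It also computes the t statistic that a "Prob > |t|" figure is usually based on.

```python
from math import sqrt

def average_ranks(values):
    """Rank values so that 1 = largest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend 'end' over a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

def pearson(x, y):
    """Ordinary Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def t_statistic(rho, n):
    """t statistic for testing rho = 0, with n - 2 degrees of freedom.
    Undefined when rho is exactly +1 or -1."""
    return rho * sqrt((n - 2) / (1 - rho ** 2))

# Hypothetical illustrative data for six imaginary states (NOT real figures).
density = [1100, 800, 550, 400, 300, 120]    # persons per sq. km
cases = [9000, 4000, 5000, 1500, 2000, 300]  # total confirmed cases

rho = spearman(density, cases)
t = t_statistic(rho, len(density))
print(f"rho = {rho:.4f}, t = {t:.4f}")
```

Turning the t statistic into a p-value requires the t distribution with n - 2 degrees of freedom, which the standard library does not provide; a statistics package such as SciPy (`scipy.stats.spearmanr`) reports the coefficient and the p-value together.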
Someone could argue that such a small sample is not amenable to a rigorous statistical test. I agree. One could test the relationship at the sub-State level with a larger sample, or use tests designed for small samples. My focus was on seeing a pattern and a correlation, hence I didn’t venture into a full-fledged statistical exercise.
Why didn’t I consider all 32 states in the rank correlation?
Because there are States at both ends: States with high density but few COVID cases, as well as States with many COVID cases but low density. You need to take care of those extreme cases to find an average relationship.
Let’s look at those States. Rajasthan (in Figure 2) ranks 22nd in population density but 6th in COVID cases amongst the 32 states. The resulting difference is therefore 22-6=16. These are the States where the spread of the disease is higher than their density rank would suggest. I have not shown all such states here; Odisha, Uttarakhand, Himachal Pradesh etc. exhibit a similar pattern, though of lower magnitude (see figure 4 for the full picture).
Figure 3 shows the States whose COVID rank is better relative to their population density rank. For example, Puducherry ranks 3rd in population density, but its COVID case rank is 26 out of 32 States/UTs. The difference is therefore 3-26=-23. For the full picture, please see figure 4.
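The rank difference used in figures 2 and 3 can be written as a tiny helper. This is only a sketch; the function name `rank_gap` is my own, and the two entries are the Rajasthan and Puducherry ranks quoted in the text.

```python
def rank_gap(density_rank, case_rank):
    """Density rank minus COVID case rank.
    Positive: more cases than the density rank alone would suggest.
    Negative: fewer cases than the density rank alone would suggest."""
    return density_rank - case_rank

# (density rank, COVID case rank) as quoted in the article.
ranks = {
    "Rajasthan": (22, 6),
    "Puducherry": (3, 26),
}

gaps = {state: rank_gap(d, c) for state, (d, c) in ranks.items()}
print(gaps)  # {'Rajasthan': 16, 'Puducherry': -23}
```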
This raises the question: why are these States (at least the top ones: Puducherry, Chandigarh, Kerala, Goa, Jharkhand etc.) showing such a different pattern? Have they done something differently, is it just a matter of time, or is it a combination of both?
Now let’s get back to figure 1.
When I calculated the rank correlation for 23 states, I left out 4 states from figure 2 (Rajasthan, MP, AP and Tamil Nadu, whose COVID rank is far higher than their density rank) and, from the bottom, States like Puducherry, Chandigarh, Kerala, Goa and Jharkhand (whose COVID rank is better than their density rank). In statistical analysis it is common to set aside a few extreme values; otherwise you may underestimate or overestimate the statistic.
In each subsequent correlation, I left out one more State from the top and one more from the bottom of this ordering.
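The trimming procedure can be sketched as follows. This is a simplified illustration, not the exact calculation behind figure 1: it drops one state from each end in every round (the first cut described above removed more than that), it assumes there are no tied values, and the seven states and their numbers are invented. All names are hypothetical.

```python
def ranks_desc(values):
    """Rank values so that 1 = largest. Assumes no ties."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman_no_ties(x, y):
    """Spearman rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid without ties."""
    rx, ry = ranks_desc(x), ranks_desc(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

def trim_and_correlate(rows, min_size):
    """rows: list of (name, density, cases).
    Order states by rank gap (density rank - case rank) on the full sample,
    then repeatedly drop the state at each end, recording rho every round."""
    rd = ranks_desc([r[1] for r in rows])
    rc = ranks_desc([r[2] for r in rows])
    gaps = [a - b for a, b in zip(rd, rc)]
    paired = sorted(zip(gaps, rows), key=lambda p: p[0], reverse=True)
    ordered = [row for _, row in paired]
    results = []
    while len(ordered) >= min_size:
        rho = spearman_no_ties([r[1] for r in ordered], [r[2] for r in ordered])
        results.append((len(ordered), rho))
        ordered = ordered[1:-1]  # drop one state from the top and one from the bottom
    return results

# Hypothetical data for seven imaginary states (NOT real figures).
rows = [
    ("A", 700, 2000),  # dense, but few cases
    ("B", 600, 7000),
    ("C", 500, 5000),
    ("D", 400, 4000),
    ("E", 300, 3000),
    ("F", 200, 1000),
    ("G", 100, 6000),  # sparse, but many cases
]

results = trim_and_correlate(rows, min_size=3)
for n, rho in results:
    print(n, round(rho, 4))
```

Because the states removed in each round are exactly those with the largest rank gaps, the correlation on the remaining states tends to rise as the sample shrinks, so the trimmed coefficients are best read as describing the states that already follow the pattern.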
The final set consists of 15 States: Uttarakhand, Himachal Pradesh, Manipur, Telangana, Arunachal Pradesh, Chhattisgarh, Mizoram, Uttar Pradesh, Nagaland, Andaman and Nicobar Islands, Sikkim, Assam, NCT of Delhi, Haryana and West Bengal.
Whether I took 15 states or 23 states, the rank correlation is quite high, varying from 89% to 97%. We all know that correlation is not causation. From this analysis, we cannot say whether high density is causing the spread of the disease. But what we can say with some confidence is that, leaving aside the outlier states, there is a pattern between density and the spread of COVID.
Therefore, this analysis raises two questions:
- Why do some of these states (good or bad) stand out from the group and show a vastly different pattern?
- Since there is an exhibited pattern between density and the spread of the disease, what can we do now to contain it?
A better analysis would be possible with micro-level data. Unfortunately, not much data is available in the public domain. If we can get hold of village-level or block-level data on the spread of the disease, as well as on other covariates, we can confirm these patterns much better.
Note: I have done this analysis out of my own curiosity. It is not conclusive, nor should anyone draw any conclusion from it. I just wanted to understand whether there is a pattern. You should do your own analysis and verify it independently.
____________________
Investment Promotion, Ease of doing Business, Regulatory Affairs and Legal in Power, Mgmt Consulting
4 years ago: Manu, inclusion of Odisha in your study would have helped a lot.
Practitioner turned Academic
4 years ago: Manu, some important factors that I think could explain this better are the state of medical infrastructure, past emergencies that show a State's disaster management capabilities (or at least the opportunity it had to improve them), and how well community-level initiatives have done (a surrogate for the sense of community needed to ensure last-mile compliance). Typically, I have seen that villages are much better at containing the spread than urban areas.
Senior Account Manager
4 years ago: As you said, I believe the same study at sub-State or micro level could really help the government realise which areas to focus on and where more funds are needed to contain the virus. Really insightful.