COVID 19 – HOW TO AND HOW NOT TO INTERPRET DATA AND STATISTICS
Dr. Senthil Nathan
Higher Education Management Expert | Managing Director at Edu Alliance
Public health experts and economists use data analysis to understand past trends, make decisions for today and create policies for tomorrow. Covid-19 has created an instinctive awareness of data even among the general public. But how should data, statistics and probability be used, and not be used, in decision and policy making?
As a long-time practitioner of data-driven decision making, I have always kept in view the very first words of my professor and thesis advisor, Prof. Loren D. Lutes of Rice University, Houston, in the first class of his famous graduate course on Probability, Statistics and Decision for Engineers: “Probability and Statistics help us quantify our ignorance”.
The disciplines of data analysis, probability and statistics have a deep mathematical underpinning; mastery of these areas requires mastery of a wide range of topics in mathematics. It is often tempting for an expert working in areas such as public health, economics, weather forecasting and the like to get carried away with intricate math and elaborate data analysis and miss the forest for the trees in the process.
It is absolutely essential for analysts and statisticians to develop and apply a deep understanding and appreciation of the subject matter under analysis – limitations, assumptions, common-sense observations, vagaries and unusual and specific situations surrounding the collection of data – in order to appropriately compile, analyze and interpret the data for relevant decision and policy making.
The current Covid-19 data set, which is updated daily, may be used to illustrate Prof. Lutes’ assertion about the quantification of our ignorance and some of the fallacies that may arise from simplistic interpretation of data sets and statistics. Data from worldometers.info {a} as of April 11, 2020, 16:00 GMT are used in the illustrations below.
Number of cases / new cases: This statistic is not comparable even from day to day within a country or a region of a country, let alone between countries, because it depends heavily on the number of tests done in the preceding few days. For example, India, the second most populous nation on earth, has a total of only 8,000 cases, with 875 new cases in the past 24 hours. Compared with many other nations, this may at first look like intriguing but encouraging data for Indians. Even though experts are looking at the rate of growth, the number of days taken to double the number of cases and the like, the real challenge in comparing these numbers for India and for many other countries in Asia, Africa and Latin America lies in the number of tests being administered overall.
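The doubling time mentioned above can be derived from just the two India figures quoted here. The following is a minimal sketch, assuming that the last day's growth factor simply continues unchanged (a strong assumption) and that the reported counts reflect real infections rather than testing volume, which is exactly what this article questions. The function name is illustrative only.

```python
import math


def doubling_time_days(total_cases: int, new_cases_last_day: int) -> float:
    """Days needed to double the case count, assuming the most recent
    daily growth factor stays constant (an assumption, not a forecast)."""
    previous_total = total_cases - new_cases_last_day
    if previous_total <= 0 or new_cases_last_day <= 0:
        raise ValueError("need a positive previous total and positive new cases")
    daily_growth_factor = total_cases / previous_total
    return math.log(2) / math.log(daily_growth_factor)


# Figures quoted in the article for India (April 11, 2020).
india_doubling = doubling_time_days(total_cases=8_000, new_cases_last_day=875)
print(f"Implied doubling time: {india_doubling:.1f} days")
```

With these inputs the implied doubling time is roughly six days. If testing were doubled overnight, the same formula would report a much shorter doubling time without any change in the underlying epidemic.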
Number of tests per million: This statistic is also attracting significant attention among government leaders and their critics. The numbers of tests per million population for the most populous countries, given below, reveal clear anomalies:
The number of confirmed COVID-19 cases reported in these countries seems to be related to this number: countries with inordinately low testing also show a very low number of cases per million.
In comparison, the United Arab Emirates (UAE) has carried out 59,967 tests per million, one of the highest rates in the world, yet only 378 cases per million have been confirmed in the UAE.
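Because both UAE figures are expressed per million population, dividing one by the other gives the share of tests that came back positive. A small sketch of that arithmetic follows; it assumes each confirmed case corresponds to one positive test, which the source data do not guarantee, and the function name is illustrative only.

```python
def positives_per_test(cases_per_million: float, tests_per_million: float) -> float:
    """Confirmed cases divided by tests; the per-million population
    denominators cancel, leaving a share of tests that were positive."""
    if tests_per_million <= 0:
        raise ValueError("tests_per_million must be positive")
    return cases_per_million / tests_per_million


# UAE figures quoted in the article.
uae_share = positives_per_test(cases_per_million=378, tests_per_million=59_967)
print(f"UAE confirmed cases per test: {uae_share:.2%}")
```

The result is well under one percent, which suggests that a low case count backed by very wide testing carries different information from a low case count backed by very few tests.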
Insights into the workings of government machinery and the transparency of information are essential for appropriate comparisons between countries. For example, the total number of cases in Russia is given as 93 per million; in China as 57 per million; and in the USA as 1,529 per million. A good understanding of the socio-political systems in the respective countries would help experts judge the validity of these stats. It may be more meaningful to compare COVID-19 statistics from open, transparent and democratic nations in Asia, Europe and the Americas.
Incompetence and/or lack of resources may explain the low number of tests in some of the other highly populous nations. Accurate reporting of illnesses and deaths due to COVID-19 should also be a major concern for the WHO and similar organizations.
Ratio of number of deaths and number of cases to number of tests: Almost 20% of the tests in the USA have resulted in positive cases. In Italy it is 16%; Spain 46%; France 37%; UK 24%; Germany 9%; and South Korea 2%.
All these countries have comparably transparent systems, and their total case counts are among the highest in the world. So why do these ratios of number of cases to number of tests differ so widely? Insights from front-line practitioners into the practical policies on administering tests would be important to interpret and appropriately compare such stats. For example, in most states of the USA only those showing strong symptoms are given these tests. In Germany and South Korea, the standards for administering tests may be very different.
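The effect of testing policy on the positive-test ratio can be illustrated with purely hypothetical numbers. In the sketch below, the infection shares, test counts and all names are assumptions made up for illustration, not data from any country; the point is only that the same underlying situation yields very different ratios depending on who gets tested.

```python
from dataclasses import dataclass

# Hypothetical shares of truly infected people within each tested group.
# These are illustrative assumptions, not measured values.
INFECTED_SHARE_SYMPTOMATIC = 0.30
INFECTED_SHARE_OTHERS = 0.01


@dataclass
class Policy:
    name: str
    tests_on_symptomatic: int  # tests given to people with strong symptoms
    tests_on_others: int       # tests given to contacts or the wider public


def expected_positivity(policy: Policy) -> float:
    """Expected share of positive tests under a given testing policy."""
    total_tests = policy.tests_on_symptomatic + policy.tests_on_others
    if total_tests == 0:
        raise ValueError("policy must include at least one test")
    expected_positives = (
        policy.tests_on_symptomatic * INFECTED_SHARE_SYMPTOMATIC
        + policy.tests_on_others * INFECTED_SHARE_OTHERS
    )
    return expected_positives / total_tests


narrow = Policy("symptomatic only", tests_on_symptomatic=10_000, tests_on_others=0)
broad = Policy("broad testing", tests_on_symptomatic=10_000, tests_on_others=90_000)

for policy in (narrow, broad):
    print(f"{policy.name}: {expected_positivity(policy):.1%} of tests positive")
```

Under these assumptions the narrow policy reports 30% positive tests and the broad policy under 4%, even though both test the same 10,000 symptomatic people.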
Number of deaths out of the total cases is another ratio that has attracted attention from the public, the media and governments. It is 12.8% for Italy; 12.5% UK; 10.5% France; 10% Spain; and 3.9% USA. It has already been noted that the average age of the population is a factor. Where each country stands on the timeline of its COVID-19 outbreak is also important for the death count, as patients move into critical stages in week 2 or later. Hence this ratio for the USA cannot yet be compared with that of Italy and Spain.
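The timeline effect described above can be sketched with a hypothetical outbreak. The series below is invented for illustration: cases double every two days, and exactly 5% of cases are assumed to die seven days after confirmation. Dividing today's deaths by today's cases then understates that 5%, while dividing by the case count from seven days earlier recovers it. Real delays vary from patient to patient, so this is a simplification rather than an epidemiological method.

```python
# Hypothetical cumulative confirmed cases over 15 days (doubling every 2 days).
cum_cases = [100, 140, 200, 280, 400, 560, 800, 1120,
             1600, 2240, 3200, 4480, 6400, 8960, 12800]

LAG_DAYS = 7  # assumed delay from confirmation to death (illustrative)

# Assume 1 in 20 cases (5%) dies exactly LAG_DAYS after confirmation.
cum_deaths = [0] * LAG_DAYS + [c // 20 for c in cum_cases[:-LAG_DAYS]]


def naive_fatality_ratio(deaths: list[int], cases: list[int]) -> float:
    """Latest cumulative deaths divided by latest cumulative cases."""
    return deaths[-1] / cases[-1]


def lagged_fatality_ratio(deaths: list[int], cases: list[int], lag_days: int) -> float:
    """Latest cumulative deaths divided by cumulative cases lag_days earlier."""
    if lag_days >= len(cases):
        raise ValueError("lag_days must be shorter than the series")
    return deaths[-1] / cases[-1 - lag_days]


naive = naive_fatality_ratio(cum_deaths, cum_cases)
lagged = lagged_fatality_ratio(cum_deaths, cum_cases, LAG_DAYS)
print(f"Naive ratio:  {naive:.2%}")
print(f"Lagged ratio: {lagged:.2%}")
```

In this made-up series the naive ratio is under half a percent while the lagged ratio returns the assumed 5%, which is why a country early in its outbreak can show a deceptively low death ratio.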
Underlying Factors of Ignorance: While all of the above issues could be addressed to a reasonably satisfactory extent in data analysis, the fundamental unknowns of Covid-19 remain significant at this stage. This should explain why public health experts in open societies are reluctant to give definitive timelines for recovery, projections of cases, deaths and the like. Virologists, healthcare experts and public health researchers are still working on several unanswered questions {b}: How exactly does the virus spread? Can people become reinfected? How many cases are there actually in each country? How deadly is the virus? Is it seasonal? Why are children not getting sick? What role do children play in the spread of this virus? When will it end, and how? Will it become endemic?
Even the timelines for planned human interventions, such as the discovery of a successful vaccine, drugs and antibody treatments, are currently only gross estimates, which complicates medium-term projections.
Conclusion: At present, Covid-19 datasets and statistical or probabilistic projections may seem imprecise and speculative to a lay observer. However, keeping in view the basic definition of probability and statistics as the quantification of our ignorance, this level of imprecision in projections and estimates is directly proportional to the level of ignorance in the scientific community about this new and deadly public health menace. More assertive inferences based on statistics can be made only at the risk of neglecting the lack of clarity on the underlying socio-political factors as well as the current gaps in our knowledge of the epidemiology of Covid-19.
Sources:
{a} https://www.worldometers.info/coronavirus/#countries