Artificial Intelligence (AI) / Machine Learning (ML) experts supporting Epidemiologists. Let's think more locally than globally!
George Gvishiani
Machine Learning & AI Researcher | President of the Wharton Club of the United Kingdom | Board Director | Board Advisor | Non Executive Director | Quant | CEO | Founder of Wharton Alumni AI Studio
I hope you and all your loved ones are well.
Over the last few weeks, I have had the opportunity to brainstorm many issues and ideas with several leading Epidemiologists based in different countries, and to think about how AI (Artificial Intelligence) / ML (Machine Learning) approaches could help us better manage the fight against COVID-19.
For a non-domain expert (in Epidemiology) like me, it is easy to get overwhelmed by the vast number of research articles and the contradictory information. At this stage, we need to direct our intellectual capacities towards solving the most relevant issues. I also believe it is vital to keep the holistic picture in mind and to distinguish major problems from details. So, I wanted to share with you my view of the major families of COVID-19 related issues (and possible models for solving them) that I encountered.
As many of you have probably noticed, two major trends make research results less effective:
1. Variables related to the virus are evaluated without considering the context of the model. A simple example would be evaluating R0 in a region X without considering the severity and expansion of the illness in that same region X (and instead arbitrarily inferring several parameters of the SIR model).
2. The results of a model are extrapolated to the largest possible population. This makes a model that could have been very useful for a specific geography or community completely useless, or even risky, for decision-making anywhere.
Starting from these two basic issues, I think we need to keep the fundamental goals of our models in view and keep any extrapolation of a model under control.
(I) Major Goals
The two fundamental goals of everything we do when managing an epidemic / pandemic (COVID-19 or other) are to:
(1) Maintain continuous delivery of medical services, so that no individual is denied this service.
(2) Identify the most vulnerable part of society and provide them with adequate prevention.
(II) Extrapolation problem
There are many variables to consider when building the SIR or any other model (access to healthcare, local hygiene, dominant public transportation modes, “version” of a virus, etc.). These variables are moving targets; RNA viruses mutate fast.
Hence, the model needs to be dynamic, or retrained as frequently as needed, and even more importantly, it must target a specific segment of a population (especially a specific geography).
So we need to be extremely careful when extrapolating. In fact, if possible, we need to avoid any extrapolation across geographies and/or communities. Similarities are very misleading. It is better to limit findings to specific communities / geographies.
Many relevant variables are much more stable within the same geography / community. Again, it is better to create models that help limited geographies / communities. This dramatically increases the effectiveness of the model / simulations / predictions / inferences.
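To make the extrapolation risk concrete, here is a minimal sketch of a discrete-time SIR model run for two hypothetical regions. The function names, the population size, and the beta / gamma values are assumptions chosen purely for illustration; they are not fitted to any real data. The only point is that a parameter set tuned to one region can give a very different epidemic peak when the contact rate differs elsewhere.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Discrete-time (daily step) SIR model; returns a list of (S, I, R) tuples."""
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        # New infections cannot exceed the number of people still susceptible.
        new_infections = min(beta * s * i / population, s)
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


def peak_infected(history):
    """Largest number of simultaneously infected people in a simulation."""
    return max(i for _, i, _ in history)


# Hypothetical regions: same population and recovery rate, different contact rate.
region_a = simulate_sir(1_000_000, 10, beta=0.30, gamma=0.10, days=300)
region_b = simulate_sir(1_000_000, 10, beta=0.20, gamma=0.10, days=300)

print("Peak infected, region A:", round(peak_infected(region_a)))
print("Peak infected, region B:", round(peak_infected(region_b)))
```

In practice, beta and gamma would have to be re-estimated per region and re-estimated over time, which is exactly why a model tuned to one geography should not be carried over to another without checking.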
With these two major goals and the extrapolation problem in mind, I tried to classify all the important challenges I saw on a daily basis into three groups:
1. Keep medical system operational
1.1. Predict whether a patient entering the hospital will need intubation (Ventilator, ER, ICU).
1.2. Estimate the time lapse from the entry point (at hospital) to intubation for severe cases.
This is different from only predicting whether a patient is ill / affected by the virus.
The purpose is to know with high accuracy (and with a stated confidence / credible interval) whether the patient will need ventilator assistance and, if so, when.
These are also very important proxies (especially in the post-quarantine stage), in addition to the standard variables used in epidemiology (e.g. R0 (basic reproductive ratio), the denominator for the death rate, etc.), as we need to know how strict Government measures need to be to ensure continuity of medical services.
If the number of patients in need of a ventilator approaches the capacity limit of a medical system, then we need to increase the severity of social distancing. Again, there is very little utility in knowing R0 if we do not know the severity of the illness caused by the transmitted virus in a specific geographic region.
These two points (1.1. and 1.2.) allow Governments, in collaboration with all relevant domain experts and specialists, to estimate with high precision the degree and severity of measures needed to:
- prevent and avoid overload at hospitals.
- ensure the continuity and efficiency of medical services.
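As a rough sketch of how the output of a model for 1.1 could feed such a decision, the snippet below sums per-patient intubation probabilities into an expected ventilator demand and compares it with capacity. Everything here is hypothetical: the function names, the probabilities, the ventilator counts, and the 80% alert threshold are illustrative assumptions, not clinical guidance.

```python
def expected_ventilator_demand(intubation_probabilities):
    """Expected number of newly admitted patients who will need a ventilator."""
    return sum(intubation_probabilities)


def capacity_alert(intubation_probabilities, ventilators_in_use,
                   ventilator_capacity, alert_fraction=0.8):
    """Return (alert, projected_use).

    alert is True when projected ventilator use reaches alert_fraction of
    capacity. alert_fraction is an assumed policy threshold.
    """
    projected_use = ventilators_in_use + expected_ventilator_demand(
        intubation_probabilities
    )
    return projected_use >= alert_fraction * ventilator_capacity, projected_use


# Hypothetical model outputs for four newly admitted patients.
probabilities = [0.1, 0.5, 0.9, 0.5]
alert, projected = capacity_alert(probabilities, ventilators_in_use=7,
                                  ventilator_capacity=10)
print("Projected ventilator use:", projected)
print("Tighten measures:", alert)
```

A time-to-intubation estimate (1.2) would refine this by saying when that demand is expected to arrive, not only how large it is.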
2. Protect the Most Vulnerable while developing Herd Immunity
2.1. We can relatively safely assume that we can build herd immunity at least for the coming 2-3 years (if the virus does not mutate faster than the Epidemiologists predict).
This is probably the most likely scenario with a virus of this kind. The information about patients being re-infected in South Korea appears to have been erroneous.
You can consult this reference: https://newsinfo.inquirer.net/1266758/tests-in-recovered-patients-in-s-korea-found-false-positives-not-reinfections-experts-say?fbclid=IwAR3AI0v4yxKSw8Mk1wiU0i3Bde_yP7L5DDijhl_Va5XkHqPY_NxD3CC5TZ0.
What the tests detected were remains of "dead" RNA in many patients; these patients were NOT re-infected.
Hence, we need to think about dynamic models / simulations that help devise strategies for building herd immunity efficiently (to reach a 60% - 70% infection rate in the general population), such that we still protect the most vulnerable part of the population.
In other words, we need to be able to keep the most vulnerable individuals within the 30-40% of the population that hopefully will not be infected by the virus at all, or not at this stage, and who hopefully will later be protected by the herd immunity built within society or by a vaccine / medication.
So, ideally, we would manage to break the chain of transmission, preventing the virus from affecting the most vulnerable while also gaining herd immunity among the general population.
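For readers wondering where a 60% - 70% figure can come from: under the classical homogeneous-mixing assumption, the herd immunity threshold is 1 - 1/R0. The sketch below applies that textbook formula to a few assumed R0 values; the R0 values are illustrative, not estimates for any region, and real populations do not mix homogeneously.

```python
def herd_immunity_threshold(r0):
    """Classical herd immunity threshold 1 - 1/R0 (homogeneous mixing).

    Returns 0.0 when R0 <= 1, since the epidemic does not sustain itself.
    """
    if r0 <= 1:
        return 0.0
    return 1.0 - 1.0 / r0


# Assumed R0 values, for illustration only.
for r0 in (2.5, 3.0, 3.5):
    print(f"R0 = {r0}: threshold = {herd_immunity_threshold(r0):.0%}")
```

With these assumed values the threshold lands roughly in the 60% - 70% range, which shows how sensitive the target is to the local R0 estimate.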
2.2. If we cannot develop herd immunity, then medicine is the only relevant remaining solution.
(This is a less likely scenario; to date, everything points in the opposite direction.)
3. Models for drug discovery / medicine research
These are, unfortunately, medium to long term goals. But they are very important.
3.1. Alleviate / neutralize the complications that could eventually be lethal.
3.2. Help medicine / vaccine research.
Keeping these points (1 to 3) always in mind helped me a lot in remembering how the output of one model can serve as input to several more complex models. Hence, I quickly shared them here.
4. I would also like to add some personal notes:
4.1. If we follow a strict Bayesian approach, then we need to avoid assigning easy-to-compute but otherwise enigmatic priors to everything that looks difficult to tackle.
A huge amount of data is already available about many key pain points. So, we need to use the available data fully.
If our model, on our validation set, predicts a number of deaths to date that is inaccurate (too high or too low), it is better to check and correct the assumptions than to move forward with the prediction and publish the results.
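A minimal sketch of such a sanity check follows. The function names, the 20% tolerance, and the death counts are hypothetical; the appropriate tolerance depends on the model and on the quality of the reported data.

```python
def relative_error(predicted, observed):
    """Signed relative error of a prediction against an observed count."""
    if observed <= 0:
        raise ValueError("observed must be positive")
    return (predicted - observed) / observed


def needs_assumption_review(predicted_deaths, observed_deaths, tolerance=0.20):
    """True when the back-test misses observed deaths by more than tolerance.

    tolerance is an assumed threshold, expressed as a fraction of observed deaths.
    """
    return abs(relative_error(predicted_deaths, observed_deaths)) > tolerance


# Hypothetical validation figures for one region.
print(needs_assumption_review(predicted_deaths=1500, observed_deaths=1000))
print(needs_assumption_review(predicted_deaths=1050, observed_deaths=1000))
```

The check works in both directions, so a model that badly under-predicts deaths to date is flagged just like one that over-predicts them.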
4.2. If we have extra time, we need to take the initiative; we should not wait for someone to call on our expertise. Again, several data sources are freely available online.
4.3. There are many apparently contradictory claims that, in reality, are not that contradictory (at least for modelling purposes).
A good example that comes to mind is the pair of claims that "deaths are under-counted" and "deaths are over-counted". I personally think both can be true at the same time. Last year or before, few doctors would have thought about or documented the fact that a patient who had an obvious cardiovascular problem and died from cardiovascular disease also probably had asymptomatic flu, which could have contributed to or caused the death. This year, COVID-19 is the number 1 possible cause of death on the list of the majority of doctors worldwide. On the other hand, many COVID-19 deaths may have gone uncounted, due to the chaos caused by the extremely fast spread of the disease, among other reasons.
With that, I wish you all the best in your research. Thanks for reading,
George Gvishiani