Visualize/Analyze progression of COVID-19 (Part 1 of 2)
Much like the COVID-19 outbreak that started in China, London faced a widespread cholera outbreak back in 1854, when it was emerging as the first modern city of the world. There was no cure, and containment was the only means to stop the disease from spreading. Dr. John Snow, a respected physician, vehemently refuted the established miasma theory (the belief that cholera spread through the air) and came up with an ingenious idea: mark the locations of all known cholera deaths on a map of London. To put the idea into practice, he went door to door and surveyed the entire city, recording the deaths as stacks of bars on the map, with each bar representing one death. Because the stacks of bars were concentrated near water pumps, he was able to prove to the civic authorities that cholera was in fact spreading through contaminated water and not through the air, which eventually helped contain the outbreak.
COVID-19 has affected billions of people and changed our way of life across the world in a matter of days. Dr. John Snow's story inspired me to visualize the progression of this pandemic from the time the outbreak started spreading outside of China in January 2020, using Qubole Jupyter Notebooks and animated visualizations built with Matplotlib and Plotly.
Data Ingestion & Curation:
For the task at hand, we will leverage the Johns Hopkins University GitHub data repository for COVID-19, which is refreshed daily (https://github.com/CSSEGISandData/COVID-19). As the dataset is not large enough to count as big data, a Python kernel that acquires and processes the data on a single compute node will suffice. The routines for data acquisition and curation are sourced from my GitHub repo: https://github.com/Pradeep39/covid19-analytics/blob/master/utilities/covid19.py
The code snippet below downloads the sourced py file and executes it, then calls its ingestion routines, ingest() and get_covid_ts(), to retrieve the historical data from the referenced COVID-19 data repository. These routines run a series of transformations to curate a dataset that is conducive to visualizing the progress of COVID-19.
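import requests

# Download the helper script and execute it in the notebook's namespace
url = "https://raw.githubusercontent.com/Pradeep39/covid19-analytics/master/utilities/covid19.py"
exec(requests.get(url).text)

# Run the ingestion routines defined in the helper script
covid_pdf = ingest()
covid_ts_pdf = get_covid_ts(covid_pdf)
covid_ts_pdf.head(3)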
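For readers curious about what such curation can involve: the JHU time-series CSV files use a wide layout, with the columns Province/State, Country/Region, Lat and Long followed by one column per date. The sketch below reshapes a tiny, made-up sample in that layout into one row per country and date. It only illustrates the idea and is not the actual ingest() or get_covid_ts() code from covid19.py; the function name wide_to_long and the sample values are hypothetical.

```python
import pandas as pd


def wide_to_long(wide_df, value_name):
    """Reshape a JHU-style wide table (one column per date) into long form."""
    id_cols = ["Province/State", "Country/Region", "Lat", "Long"]
    long_df = wide_df.melt(id_vars=id_cols, var_name="Date", value_name=value_name)
    # Sum provinces/states so that each country has a single row per date.
    long_df = (
        long_df.groupby(["Country/Region", "Date"], as_index=False)[value_name]
        .sum()
        .rename(columns={"Country/Region": "Country"})
    )
    # Parse the m/d/yy column labels so that rows sort chronologically.
    long_df["Date"] = pd.to_datetime(long_df["Date"], format="%m/%d/%y")
    return long_df.sort_values(["Date", "Country"]).reset_index(drop=True)


# Made-up sample values, for illustration only (not real case counts).
sample = pd.DataFrame({
    "Province/State": ["P1", "P2", None],
    "Country/Region": ["CountryA", "CountryA", "CountryB"],
    "Lat": [10.0, 11.0, 20.0],
    "Long": [30.0, 31.0, 40.0],
    "1/22/20": [1, 2, 0],
    "1/23/20": [3, 4, 5],
})

confirmed_long = wide_to_long(sample, "Confirmed")
print(confirmed_long)
```

The long layout is what animated charts typically want: every frame of the animation is simply the subset of rows that share one date.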
Now that we have curated the COVID-19 dataset, we will move on to the next step: visualizing its progression.
Visualization 1: Visualize the Progression using a Racing Bar Chart.
The code snippet below visualizes the progression of confirmed cases across the world using an animated racing bar chart built with Matplotlib. The draw_barchart routine, which does the heavy lifting here, is defined in the py file sourced from https://github.com/Pradeep39/covid19-analytics/blob/master/utilities/covid19.py
import os
from datetime import date

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import animation
from IPython.display import HTML

os.chdir('/tmp/')

fig, ax = plt.subplots(figsize=(15, 12))
sdate = date(2020, 1, 22)       # start date
edate = date.today()            # end date (today itself is not included in the range)
periods = (edate - sdate).days  # number of days from start date to end date
rng = pd.date_range(sdate, periods=periods, freq='D').strftime('%m/%d/%y')

# draw_barchart is defined in the sourced covid19.py
animator = animation.FuncAnimation(
    fig, draw_barchart, frames=rng, interval=800, repeat=False,
    fargs=("Date", "Confirmed", "Country", covid_ts_pdf, 20, "COVID-19 Racing Bar Chart"))
HTML(animator.to_jshtml())
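To give a feel for what a per-frame drawing routine has to do, here is a minimal sketch of the idea behind a racing bar chart: for each date, keep the top N countries and redraw them as horizontal bars. This is not the draw_barchart implementation from covid19.py; the functions top_n_for_date and draw_top_n, their signatures, and the sample values are all hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd


def top_n_for_date(df, frame_date, date_col, value_col, n):
    """Return the n largest rows for one date, smallest first (barh draws bottom-up)."""
    one_day = df[df[date_col] == frame_date]
    return one_day.nlargest(n, value_col).sort_values(value_col)


def draw_top_n(frame_date, ax, df, date_col, value_col, label_col, n, title):
    """Redraw the axes for a single animation frame."""
    ranked = top_n_for_date(df, frame_date, date_col, value_col, n)
    ax.clear()  # wipe the previous frame before drawing the new bars
    ax.barh(ranked[label_col], ranked[value_col])
    ax.set_title(f"{title} ({frame_date})")
    return ranked


# Made-up values, for illustration only (not real case counts).
toy = pd.DataFrame({
    "Date": ["01/22/20"] * 3 + ["01/23/20"] * 3,
    "Country": ["CountryA", "CountryB", "CountryC"] * 2,
    "Confirmed": [1, 5, 2, 4, 6, 9],
})

fig, ax = plt.subplots()
ranked = draw_top_n("01/23/20", ax, toy, "Date", "Confirmed", "Country", 2, "Toy chart")
print(ranked)
```

Because FuncAnimation calls its function with the current frame value followed by the fargs tuple, a routine shaped like draw_top_n could be animated with fargs=(ax, toy, "Date", "Confirmed", "Country", 2, "Toy chart").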
Visualization 2: Visualize the progression of Deaths, Recovered & Confirmed cases using an animated Scatter Plot.
Using the high-level Plotly Express visualization library, the simple code snippet below renders an animated scatter plot that shows the progression of deaths, recovered cases and confirmed cases.
import plotly.express as px

fig = px.scatter(covid_pdf, x="Deaths", y="Recovered",
                 animation_frame="Date", animation_group="Country",
                 size="Confirmed", color="Country", hover_name="Country",
                 range_x=[-500, 3500], log_x=False, log_y=True,
                 height=800, size_max=150)
fig.show()  # render the figure in the notebook
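The scatter plot above uses log_y=True, and a logarithmic axis cannot place a value of zero. It can also help to hand the animation rows that are already in chronological order. The sketch below shows one way to prepare a frame for that. It assumes the Date column holds strings in the %m/%d/%y format used for the racing bar chart; the helper name prepare_for_log_animation and the sample values are hypothetical.

```python
import pandas as pd


def prepare_for_log_animation(df, date_col, log_col, date_format="%m/%d/%y"):
    """Drop rows a log axis cannot show and order the rest chronologically."""
    out = df[df[log_col] > 0].copy()
    # Sort on parsed dates: sorting the raw strings would put 01/02/21 before 12/31/20.
    out["_parsed"] = pd.to_datetime(out[date_col], format=date_format)
    out = out.sort_values("_parsed", kind="stable").drop(columns="_parsed")
    return out.reset_index(drop=True)


# Made-up values, for illustration only (not real case counts).
toy = pd.DataFrame({
    "Date": ["01/02/21", "12/31/20", "12/31/20"],
    "Country": ["CountryA", "CountryA", "CountryB"],
    "Recovered": [7, 3, 0],
})

cleaned = prepare_for_log_animation(toy, "Date", "Recovered")
print(cleaned)
```

The cleaned frame keeps the original string dates, so it can be passed to px.scatter with animation_frame="Date" in the same way as before.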
Visualization 3: Visualize the progression on an animated scatter map.
Using the same Plotly Express library, the simple code snippet below renders an animated scatter map that shows the progression of deaths, recovered cases and confirmed cases across the world. It reads from covid_lat_long_pdf, which needs latitude and longitude columns in addition to the case counts.
import plotly.express as px

fig = px.scatter_mapbox(covid_lat_long_pdf, lat="latitude", lon="longitude",
                        animation_frame="Date", animation_group="Country",
                        hover_data=["Country", "Confirmed", "Deaths", "Recovered"],
                        size="Confirmed", color="Country",
                        color_continuous_scale=px.colors.cyclical.IceFire,
                        size_max=150, zoom=1.5, height=950)
# "open-street-map", "carto-positron", "carto-darkmatter",
# "stamen-terrain", "stamen-toner" or "stamen-watercolor"
fig.update_layout(mapbox_style="stamen-watercolor")
Observations:
Based on the three visualizations, countries like South Korea, Bahrain, and Taiwan did a commendable job in containing the spread of COVID-19. When looking for what these countries did differently, I found a chart published at https://ourworldindata.org/covid-testing, which indicates that the ability to offer and perform more tests per 1 million population correlates with effectiveness in containing the spread of the outbreak.
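The "tests per 1 million population" measure is simple arithmetic: the number of tests multiplied by 1,000,000 and divided by the population. The sketch below computes it, along with a Pearson correlation against cases per million, on a made-up table. The column names and all numbers are hypothetical, so the result says nothing about any real country; it only shows how such a check could be set up.

```python
import pandas as pd

# Made-up values, for illustration only (not real statistics).
stats = pd.DataFrame({
    "Country": ["CountryA", "CountryB", "CountryC"],
    "population": [1_000_000, 2_000_000, 500_000],
    "tests": [1_000, 8_000, 5_000],
    "confirmed": [500, 400, 50],
})

# Normalize both counts by population so countries of different sizes are comparable.
stats["tests_per_million"] = stats["tests"] * 1_000_000 / stats["population"]
stats["cases_per_million"] = stats["confirmed"] * 1_000_000 / stats["population"]

# Pearson correlation between testing intensity and case load (pandas default method).
r = stats["tests_per_million"].corr(stats["cases_per_million"])
print(stats[["Country", "tests_per_million", "cases_per_million"]])
print(f"Pearson r = {r:.2f}")
```

A negative r on real data would be consistent with the pattern in the chart, although a correlation on its own does not establish that testing caused the containment.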
Summary:
Assuming the correlation found above holds, I hope other countries will follow suit and implement the measures taken by countries like Bahrain, South Korea and Taiwan to make COVID-19 testing widely available to the general public, and help drive this pandemic back into the wild.
* Concluding Part of this article: https://www.dhirubhai.net/pulse/visualizeanalyze-progression-covid-19-part-2-pradeep-reddy/
**Note: The above content was curated using Qubole's Big Data Platform, which offers a choice of cloud, big data engines, tools and technologies to activate data in the cloud. At Qubole, we are excited about the launch of JupyterLab, where this content was curated. You may test drive Qubole free for 14 days at https://www.qubole.com/lp/testdrive/