Visualize/Analyze progression of COVID-19 (Part 1 of 2)

Long before the COVID-19 outbreak that started in China, London faced a crisis of its own. In 1854, when London was emerging as the first modern city of the world, a widespread cholera outbreak struck. There was no cure, and containment was the only means to stop it from spreading. Dr. John Snow, a respected physician, vehemently rejected the established miasma theory (the belief that cholera spread through the air) and came up with an ingenious idea: mark on a map of London the locations of all known cholera deaths. To put the idea into practice, he went door to door, surveyed the affected neighborhood, and recorded the deaths as stacks of bars on the map, with each bar representing one death. Because the stacks of bars were concentrated around a public water pump, he was able to show the civic authorities that cholera was in fact spreading through contaminated water and not through the air, which eventually helped contain the outbreak.

COVID-19 has affected billions of people and changed our way of life across the world in a matter of days. Dr. John Snow's story inspired me to visualize the progression of this pandemic since it began spreading outside China in January 2020, using Qubole Jupyter Notebooks and animated visualizations built with Matplotlib and Plotly.

Data Ingestion & Curation:

For the task at hand, we will use the Johns Hopkins University COVID-19 data repository on GitHub, which is refreshed daily (https://github.com/CSSEGISandData/COVID-19). Because the data is small and does not qualify as big data, a Python kernel on a single compute node is sufficient to acquire and process it. The routines for data acquisition and curation are sourced from my GitHub repo https://github.com/Pradeep39/covid19-analytics/blob/master/utilities/covid19.py.

In the code snippet below, the exec call loads the sourced py file, and the ingest() and get_covid_ts() calls then run its ingestion routines to retrieve the historical data from the referenced COVID-19 data repository. These routines run a series of transformations to curate a dataset suited to visualizing the progress of COVID-19.

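import requests

# Download the helper module and run it in the notebook namespace.
# exec() runs the fetched code as-is, so review the file before executing it.
url = "https://raw.githubusercontent.com/Pradeep39/covid19-analytics/master/utilities/covid19.py"
response = requests.get(url)
response.raise_for_status()  # stop here if the download failed
exec(response.text)

covid_pdf = ingest()                     # historical data from the COVID-19 repository
covid_ts_pdf = get_covid_ts(covid_pdf)   # curated dataset used by the visualizations
covid_ts_pdf.head(3)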
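
The curation code itself lives in the sourced py file and is not reproduced here. As a rough illustration of the kind of transformation involved, the sketch below reshapes a wide, one-column-per-day table into one row per country and date. It is not the author's implementation: the wide_to_long helper is hypothetical, the column layout is assumed to match the repository's global time-series CSV files, and the sample values are made up.

```python
import io
import pandas as pd

# Tiny inline sample in a wide layout (one row per region, one column per day).
# The layout is an assumption and the numbers are made up for illustration.
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,Freedonia,10.0,20.0,0,1,3
North,Sylvania,30.0,40.0,2,2,5
South,Sylvania,31.0,41.0,1,4,4
"""

def wide_to_long(wide_df, value_name):
    """Hypothetical helper: reshape to one row per (Country, Date)."""
    id_cols = ["Province/State", "Country/Region", "Lat", "Long"]
    long_df = wide_df.melt(id_vars=id_cols, var_name="Date", value_name=value_name)
    # Parse the m/d/yy column labels into timestamps so they sort chronologically.
    long_df["Date"] = pd.to_datetime(long_df["Date"], format="%m/%d/%y")
    long_df = long_df.rename(columns={"Country/Region": "Country"})
    # Sum provinces/states up to country level.
    return (long_df.groupby(["Country", "Date"], as_index=False)[value_name]
                   .sum()
                   .sort_values(["Date", "Country"])
                   .reset_index(drop=True))

confirmed_long = wide_to_long(pd.read_csv(io.StringIO(SAMPLE_CSV)), "Confirmed")
print(confirmed_long)
```

In practice the inline sample would be replaced by the CSV files read from the data repository, and the same reshape would be repeated for deaths and recoveries before joining the three results.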

Now that we have curated the COVID-19 dataset, we will move to the next step of visualizing its progress using various visualizations.

Visualization 1: Visualize the Progression using a Racing Bar Chart.

The code snippet below visualizes the progression of confirmed cases across the world as an animated racing bar chart built with Matplotlib. The draw_barchart routine, which does the heavy lifting here, is defined in the py file sourced from https://github.com/Pradeep39/covid19-analytics/blob/master/utilities/covid19.py

import os
from datetime import date

import matplotlib.pyplot as plt
import pandas as pd
from IPython.display import HTML
from matplotlib import animation

os.chdir('/tmp/')
fig, ax = plt.subplots(figsize=(15, 12))
sdate = date(2020, 1, 22)        # start date
edate = date.today()             # end date (not included in the range)
periods = (edate - sdate).days   # number of days, one animation frame each
# One mm/dd/yy string per day, passed to draw_barchart as the frame value
rng = pd.date_range(sdate, periods=periods, freq='D').strftime('%m/%d/%y')
animator = animation.FuncAnimation(fig, draw_barchart, frames=rng,
                                   interval=800, repeat=False,
                                   fargs=("Date", "Confirmed",
                                          "Country", covid_ts_pdf,
                                          20, "COVID-19 Racing Bar Chart"))
HTML(animator.to_jshtml())
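
The draw_barchart source is not shown in this article. To give a sense of what a per-frame drawing callback for a racing bar chart can look like, here is a minimal sketch. The draw_top_n function, its argument order, and the demo numbers are all hypothetical and are not taken from the sourced py file.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

def draw_top_n(frame, ax, df, date_col, value_col, label_col, top_n, title):
    """Hypothetical callback: redraw `ax` with the top_n rows for date `frame`."""
    day = (df[df[date_col] == frame]
           .sort_values(value_col, ascending=True)
           .tail(top_n))
    ax.clear()  # each animation frame starts from an empty axes
    bars = ax.barh(day[label_col], day[value_col])
    # Print the value at the end of each bar.
    for bar, value in zip(bars, day[value_col]):
        ax.text(bar.get_width(), bar.get_y() + bar.get_height() / 2,
                f" {int(value):,}", va="center")
    ax.set_title(f"{title} ({frame})")
    return bars

# Made-up numbers, purely to exercise the function.
demo = pd.DataFrame({
    "Date": ["01/22/20"] * 3 + ["01/23/20"] * 3,
    "Country": ["A", "B", "C"] * 2,
    "Confirmed": [5, 1, 3, 6, 9, 4],
})
fig, ax = plt.subplots()
bars = draw_top_n("01/23/20", ax, demo, "Date", "Confirmed", "Country", 2, "Demo")
bar_widths = [int(bar.get_width()) for bar in bars]
```

FuncAnimation calls the callback once per frame, passing the frame value first and the fargs tuple after it, so a function shaped like this one would be wired up with fargs=(ax, demo, "Date", "Confirmed", "Country", 2, "Demo").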

Visualization 2: Visualize the progression of Deaths, Recovered & Confirmed cases using an animated Scatter Plot.

Using the high-level Plotly Express visualization library, the simple code snippet below renders an animated scatter plot that shows the progression of Deaths, Recovered, and Confirmed cases.

import plotly.express as px

# Bubble position shows Deaths vs. Recovered; bubble size shows Confirmed cases
fig = px.scatter(covid_pdf, x="Deaths", y="Recovered",
                 animation_frame="Date", animation_group="Country",
                 size="Confirmed", color="Country", hover_name="Country",
                 range_x=[-500, 3500],
                 log_x=False, log_y=True,
                 height=800,
                 size_max=150)
fig.show()
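
Plotly Express builds animation frames in the order the frame values appear in the data, and an animation can behave unexpectedly when a group is missing from some frames. The sketch below shows one way to prepare a frame-friendly table, assuming one row per date and country and dates stored as mm/dd/yy strings. The prepare_animation_frames helper and the demo values are hypothetical and not part of the sourced utilities.

```python
import pandas as pd

def prepare_animation_frames(df, date_col="Date", group_col="Country"):
    """Hypothetical helper: one row per (date, group), in chronological order."""
    out = df.copy()
    # mm/dd/yy strings do not sort chronologically across years, so parse them first.
    out["_ts"] = pd.to_datetime(out[date_col], format="%m/%d/%y")
    full_index = pd.MultiIndex.from_product(
        [out["_ts"].drop_duplicates().sort_values(),
         out[group_col].drop_duplicates().sort_values()],
        names=["_ts", group_col],
    )
    # Add the missing (date, group) pairs, filling their numeric values with 0.
    out = (out.drop(columns=[date_col])
              .set_index(["_ts", group_col])
              .reindex(full_index, fill_value=0)
              .reset_index())
    out[date_col] = out["_ts"].dt.strftime("%m/%d/%y")
    return out.drop(columns="_ts")

# Made-up numbers: country B has no row on the first date.
demo = pd.DataFrame({
    "Date": ["01/01/21", "12/31/20", "01/01/21"],
    "Country": ["A", "A", "B"],
    "Confirmed": [2, 1, 5],
})
frames = prepare_animation_frames(demo)
print(frames)
```

Filling gaps with 0 is a simplification. For cumulative counts, carrying the previous value forward may be the better choice.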

Visualization 3: Visualize the progression on an animated scatter map.

Using the same high-level Plotly Express visualization library, the simple code snippet below renders an animated scatter map that shows the progression of Deaths, Recovered, and Confirmed cases across the world.

import plotly.express as px

# covid_lat_long_pdf is not built in this article; it needs latitude and longitude columns
fig = px.scatter_mapbox(covid_lat_long_pdf, lat="latitude", lon="longitude",
                        animation_frame="Date", animation_group="Country",
                        hover_data=["Country", "Confirmed", "Deaths", "Recovered"],
                        size="Confirmed", color="Country",
                        color_continuous_scale=px.colors.cyclical.IceFire,
                        size_max=150, zoom=1.5, height=950)

# Other styles: "open-street-map", "carto-positron", "carto-darkmatter", "stamen-terrain", "stamen-toner" or "stamen-watercolor"
fig.update_layout(mapbox_style="stamen-watercolor")
fig.show()
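
The snippet above expects a DataFrame with latitude and longitude columns, and the article does not show how covid_lat_long_pdf is built. One plausible approach, sketched below under that assumption, is to join the country-level data with a coordinate lookup table. The attach_coordinates helper, the lookup table, and all values are hypothetical.

```python
import pandas as pd

def attach_coordinates(cases_df, coords_df, key="Country"):
    """Hypothetical helper: add latitude/longitude to each row via a lookup table."""
    # validate="many_to_one" raises if the lookup lists a country more than once.
    merged = cases_df.merge(coords_df, on=key, how="left", validate="many_to_one")
    # Report countries without coordinates instead of dropping them silently.
    missing = merged.loc[merged["latitude"].isna(), key].unique().tolist()
    located = merged.dropna(subset=["latitude", "longitude"]).reset_index(drop=True)
    return located, missing

# Made-up case counts and coordinates for illustration only.
cases = pd.DataFrame({
    "Date": ["01/22/20"] * 3,
    "Country": ["A", "B", "C"],
    "Confirmed": [1, 2, 3],
})
coords = pd.DataFrame({
    "Country": ["A", "B"],
    "latitude": [10.0, 20.0],
    "longitude": [30.0, 40.0],
})
located, missing = attach_coordinates(cases, coords)
print(located)
print("No coordinates for:", missing)
```

Returning the unmatched countries makes naming mismatches between the two tables visible before they turn into missing bubbles on the map.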

Observations:

Based on the three visualizations, countries such as South Korea, Bahrain, and Taiwan did a commendable job of containing the spread of COVID-19. When I looked for what these countries did differently, I found the chart below, published at https://ourworldindata.org/covid-testing. It suggests that the ability to offer and carry out more tests per million people correlates with effectiveness in containing the spread of the outbreak.

[Chart: COVID-19 testing per million people, from https://ourworldindata.org/covid-testing]

Summary:

Assuming the correlation found above holds, I hope other countries will follow suit and implement the measures taken by countries like Bahrain, South Korea, and Taiwan to make COVID-19 testing widely available to the general public, and help bring this pandemic under control.

* The concluding part of this article: https://www.dhirubhai.net/pulse/visualizeanalyze-progression-covid-19-part-2-pradeep-reddy/

**Note: The above content was created using Qubole's Big Data Platform, which offers a choice of clouds, big data engines, tools, and technologies to activate data in the cloud. At Qubole, we are excited about the launch of JupyterLab, where this content was created. You can test drive Qubole free for 14 days at https://www.qubole.com/lp/testdrive/
