Accelerate analyzing the influence of regional factors on COVID-19 using IBM PAIRS & Jupyter Notebooks.

Marc Fiammante

Inventor at Paris Brain Institute / AP-HP - leader of Newborn Neurodigital AI Convergence project. Retired IBM Fellow.

Published: June 12, 2020

Authors (alphabetical): @Marc Fiammante, @Merijn Weiss, @Wiktor Mazin, PhD, MMT

Introduction

Recent studies have looked globally at the relationship between various climatic factors and occurrences of COVID-19 ([uk uvindex], [indonesia sunlight], [global temp, humidity, latitude]). Even though these studies show some correlation, consistently interpreting or even reproducing the results is a challenge.

The available COVID-19 data, the measures to contain the spread and the behaviour of people all change over time. They differ between countries and even between regions within a country. In addition, the granularity of the available data differs greatly.

However, because the spread and impact of COVID-19 depend on local factors, there is a need for multi-factor, fine-grained (regional) data: data that can be consolidated and analyzed in an iterative and agile fashion, so that influences can be detected more precisely.

Together with my colleagues Merijn and Wiktor, I decided to start exploratory research into assets that can help data scientists analyze the influence of geospatial-temporal data on the current pandemic.

In a series of articles, we will share our progress as we continue our exploration. Re-usable assets will be made available where applicable. We will discuss data access, review some of the recent studies and explain the analytical approaches included (e.g. Spearman, GAM). These studies and correlations will feed into practical examples of how to gather, analyze and visualize results.

Our exploration is only to identify & create assets for data scientists to explore geospatial-temporal data. The examples should not be taken as any interpretation of the results. We are not trained epidemiologists and therefore leave all interpretations to those that have the professional expertise.

IBM PAIRS

Climatic data can be found online from diverse sources, but getting access to all possible influencing factors, including non-climatic, often is a lengthy process. Many sources need to be integrated, coordinates aligned, and licensing must be taken care of.

However, a consistent, fine-grained dataset of this kind is a prerequisite for any geospatial-temporal analysis, and one that IBM fulfills with the IBM PAIRS Geoscope platform.

IBM PAIRS Geoscope is a platform specifically designed for massive geospatial-temporal data. Data is ingested from a wide variety of sources and prepared for search-friendly access. The platform provides access to a rich, diverse, and growing catalog of continually updated, geospatial-temporal aligned information.

The current catalog has over 4 petabytes of data collected, curated, and ready to use, available in various categories, aligned geographically with consolidated resolution and coordinates.

Application: Agriculture, Rapid response, Wildfire
Domain: Atmosphere, Land surface, Oceans/lakes/rivers, Urban
Sector: Animals/livestock, Economic, Energy, Geologic/soil, Political, Social, Transportation/infrastructure, Vegetation/crops, Weather/climate
Source: (IoT) sensor, Aerial/drone, Radar, Satellite, Survey
Type: Analytics product, Data product, Forecast, Measurement/survey

The platform is available on IBM Cloud and accessible via a GUI and API. A free edition with a subset of data is available to everyone via the GUI on https://ibmpairs.mybluemix.net/. For API access a Python SDK is available on https://github.com/IBM/ibmpairs
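As a rough illustration of what preparing an API request might involve, the sketch below assembles a point query as a plain Python dictionary. The field names (`layers`, `spatial`, `temporal`) are an assumption about the JSON query layout and should be checked against the SDK documentation. The helper `build_point_query` and the layer ID `00000` are hypothetical placeholders, not part of the SDK.

```python
import json
from datetime import date


def build_point_query(layer_id, lat, lon, start, end):
    """Assemble a PAIRS-style point query as a plain dict (hypothetical helper).

    The key names are an assumed layout; verify them against the SDK docs.
    """
    return {
        "layers": [{"type": "raster", "id": str(layer_id)}],
        "spatial": {"type": "point", "coordinates": [str(lat), str(lon)]},
        "temporal": {
            "intervals": [{"start": start.isoformat(), "end": end.isoformat()}]
        },
    }


# "00000" is a placeholder: look up the real layer ID in the PAIRS catalog.
query = build_point_query("00000", 48.8566, 2.3522, date(2020, 3, 1), date(2020, 5, 31))
print(json.dumps(query, indent=2))
```

Keeping the query as plain data makes it easy to generate one request per region and time window before handing it to whichever client is used to submit it.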

In our exploration we will use IBM PAIRS for climatic factors such as UV Index, Temperature, Humidity and Wind Speed.

Accessing COVID-19 pandemic country data

IBM PAIRS also includes data from Johns Hopkins University on the global spread of COVID-19. This data includes confirmed cases and deaths, but it is tracked only at the country level.

In our approach we wanted data at the regional level and, where available, daily hospitalization and intensive care figures. We found that countries differ widely in the metrics they track and in the granularity, definition and quality of those metrics. Nevertheless, we attempt to harmonize the data where possible.

Ideally, our exploration obtains the following metrics at a regional level:

confirmed: individual tested positive for COVID-19
hospitalized: individual admitted to a general hospital and tested positive for COVID-19
hospitalized_icu: individual admitted to an ICU in the hospital and tested positive for COVID-19
recovered: individual confirmed to have recovered from COVID-19
deceased: individual confirmed to have passed away with COVID-19 infection
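One way to hold these metrics in code is a small record type in which every metric is optional, since not every country reports every figure. The sketch below is only an illustration: the class name `RegionalRecord`, its fields beyond the five metric names, and the sample values are assumptions, not the schema used in the project.

```python
from dataclasses import dataclass
from typing import Optional

METRICS = ("confirmed", "hospitalized", "hospitalized_icu", "recovered", "deceased")


@dataclass
class RegionalRecord:
    """One harmonized row: one region on one day (illustrative schema)."""
    country: str
    region: str
    date: str  # ISO 8601, e.g. "2020-04-01"
    confirmed: Optional[int] = None
    hospitalized: Optional[int] = None
    hospitalized_icu: Optional[int] = None
    recovered: Optional[int] = None
    deceased: Optional[int] = None

    def available_metrics(self):
        """Return the names of the metrics this row actually carries."""
        return [m for m in METRICS if getattr(self, m) is not None]


# Made-up numbers for illustration only, not real surveillance data.
row = RegionalRecord("FR", "Bretagne", "2020-04-01", hospitalized=100, hospitalized_icu=20)
print(row.available_metrics())
```

Using `None` for a missing metric, instead of zero, keeps "not reported" distinct from "no cases" when sources are later compared.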

The current sources we are using are:

France

Official Open Data: https://www.data.gouv.fr/fr/datasets/chiffres-cles-concernant-lepidemie-de-covid19-en-france/
Data path: https://raw.githubusercontent.com/opencovid19-fr/data/master/dist/chiffres-cles.csv
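A minimal sketch of harmonizing this file could look as follows. The column names (`granularite`, `maille_nom`, `cas_confirmes`, `hospitalises`, `reanimation`, `gueris`, `deces`) are assumed from the dataset's layout and should be verified against the current file. The function `harmonize_fr` and the sample rows are hypothetical.

```python
import csv
import io

# Assumed mapping from the French column names to the harmonized metric names.
FR_COLUMNS = {
    "cas_confirmes": "confirmed",
    "hospitalises": "hospitalized",
    "reanimation": "hospitalized_icu",
    "gueris": "recovered",
    "deces": "deceased",
}


def harmonize_fr(csv_text, granularity="region"):
    """Keep rows at one granularity and rename metrics; empty cells become None."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        if raw.get("granularite") != granularity:
            continue
        row = {"date": raw["date"], "region": raw["maille_nom"]}
        for source_name, metric in FR_COLUMNS.items():
            value = raw.get(source_name, "")
            row[metric] = int(value) if value else None
        rows.append(row)
    return rows


# Tiny made-up sample in the assumed layout (not real figures).
sample = (
    "date,granularite,maille_nom,cas_confirmes,hospitalises,reanimation,gueris,deces\n"
    "2020-04-01,pays,France,,,,,\n"
    "2020-04-01,region,Bretagne,,50,10,30,5\n"
)
records = harmonize_fr(sample)
print(records)
```

To run this against the real file, the text could be fetched first, for example with `urllib.request.urlopen(url).read().decode("utf-8")`, and then passed to `harmonize_fr`.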

Netherlands

Open Source Data Initiative: https://github.com/J535D165/CoronaWatchNL
Data Path: https://raw.githubusercontent.com/J535D165/CoronaWatchNL/master/data-json/data-provincial/RIVM_NL_provincial_latest.json

Denmark

Official data (zip file) from Statens Serum Institut: https://www.ssi.dk/sygdomme-beredskab-og-forskning/sygdomsovervaagning/c/covid19-overvaagning/arkiv-med-overvaagningsdata-for-covid19

Sweden

Official data (zip file) from the European Data Portal: https://www.europeandataportal.eu/data/datasets/https-free-entryscape-com-store-360-resource-12 ("Number of cases of coV-19 in Sweden per day and region")

We are in the process of adding more countries and have colleagues looking at regions in other continents.

Coming next…

The first analysis we will look at in the next article is a Spearman correlation. We will explore the correlation between a single predictor, UV Index, and various outcomes, such as the incidence of hospitalized COVID-19 patients.
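For readers unfamiliar with the measure, Spearman's coefficient is the Pearson correlation computed on the ranks of the two series, so it captures monotonic rather than strictly linear relationships. The from-scratch sketch below illustrates that definition with made-up numbers. It is not the code used in the follow-up article, and it does not compute significance. A library routine such as `scipy.stats.spearmanr` returns both the coefficient and a p-value.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the tie
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up series: the outcome falls monotonically as the predictor rises.
uv_index = [1.0, 2.5, 3.0, 4.2, 6.1]
hospitalized = [90, 70, 65, 40, 12]
rho = spearman(uv_index, hospitalized)
print(round(rho, 6))
```

Because only the ordering matters, the coefficient here is at its negative extreme even though the made-up relationship is not a straight line.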

We will make use of public data sources, IBM PAIRS, Jupyter Notebooks and various Python libraries to ingest the data, calculate the Spearman correlation coefficient, assess its significance and visualize the outcomes.

In the next article we look at Spearman's correlation coefficient for different countries and under different Time Slices, Rolling Windows and Time Shifts.
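To make the terms concrete: a rolling window smooths a series over the preceding days, and a time shift pairs the predictor on one day with the outcome some days later. The sketch below shows one possible reading of those two operations on made-up data. The function names and the choice of a trailing mean are assumptions, not the implementation from the follow-up article.

```python
def rolling_mean(values, window):
    """Trailing mean over `window` points; None until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


def shift_pairs(predictor, outcome, shift):
    """Pair predictor[t] with outcome[t + shift] for shift >= 0.

    Points without a partner, or with a None value, are dropped.
    """
    pairs = []
    for t in range(len(predictor) - shift):
        p, o = predictor[t], outcome[t + shift]
        if p is not None and o is not None:
            pairs.append((p, o))
    return pairs


# Made-up daily series for one region (not real measurements).
uv_index = [2, 4, 6, 8, 10, 12]
hospitalized = [50, 48, 44, 40, 30, 20]

smoothed = rolling_mean(uv_index, window=3)
pairs = shift_pairs(smoothed, hospitalized, shift=2)
print(smoothed)
print(pairs)
```

The resulting pairs can be passed to any rank-correlation routine. Repeating the calculation over a range of shifts and window sizes shows how sensitive the coefficient is to those choices.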

A following article dives into the code and points to the public GitHub repository with sample data for testing, along with details on how to get a 30-day free trial of PAIRS.

#ibm, #ibmpairs, #datascience, #resuableassets
