
Lecture: Applied Data Science - Covid 19 prototype

Frank Kienle

Digital Strategy Manager - Roche Diagnostics Operations Mannheim

Published: April 10, 2020

In recent weeks in quarantine, I started to restructure my 'Applied Data Science' lecture, given at TU Kaiserslautern, Germany, with a focus on COVID-19 datasets.

The lecture is now accessible via the Udemy platform. Check it out if you are interested; the outline follows below.

Outline: Applied Data Science - COVID 19 Data Prototype

The goal of this lecture is to convey industry best practices in data science while developing a COVID-19 analysis prototype. Students should learn the modeling process and a methodology for approaching a business problem. For this, we will introduce the CRISP-DM process. The outline of this lecture follows the CRISP-DM structure:

Business understanding
Data Understanding
Data Preparation
Modeling (statistical and machine learning)
Evaluation
Deployment

The entire lecture follows the development flow of a rapid prototype project. The focus is for students to develop their own prototype (source code snippets are provided).

Topics and the corresponding Python programming are introduced step by step in a very compact way, with the goal of showing a full walkthrough. (Additional sources might be necessary to follow all steps.)

The outline, with the individual lectures and learning topics, is as follows:

Section 1: Introduction

1. Introduction

2. Learning Goals and Content Overview

3. Used Python Resources

Section 2: Business Understanding

Applied data science should follow a process. We will use the CRISP-DM process to set up the project.

4. Introduction to Data Science

5. CRISP-DM

6. Terminology Data Science

7. Python Project Setup (Anaconda, Python cookiecutter)

Section 3: Data understanding

Data gathering can be a cumbersome task. We will access the Johns Hopkins COVID-19 data set and other sources via API calls.

8. Introduction Data Understanding

9. Data Gathering - Johns Hopkins GitHub (project storage setup, GitHub access)

10. Data Gathering Web Scraping Example (Python requests, BeautifulSoup)

11. Data Gathering API call (COVID-19 data Germany, JSON format)

12. Data Gathering REST API call (REST service definition, COVID-19 data USA via smartable.ai)

13. Data Gathering wrap up
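As a minimal sketch of the API-based gathering step, the snippet below separates the network call from the parsing, so the parsing can be tried on an offline sample. The function names, the payload layout (a `features` list of `attributes` objects), and the field names are illustrative assumptions, not the lecture's code or the exact schema of any service. The standard-library `urllib` is used here instead of `requests` only to keep the example free of dependencies.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """Download a JSON document from an API endpoint (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def records_from_payload(payload):
    """Flatten an assumed {'features': [{'attributes': {...}}, ...]} payload
    into a plain list of dictionaries, one per record."""
    return [feature["attributes"] for feature in payload.get("features", [])]


# Offline sample with made-up values, shaped like the assumed payload.
sample_payload = json.loads("""
{"features": [
    {"attributes": {"state": "A", "cases": 120}},
    {"attributes": {"state": "B", "cases": 45}}
]}
""")

records = records_from_payload(sample_payload)
total_cases = sum(record["cases"] for record in records)
print(records)
print(total_cases)
```

Keeping `fetch_json` separate from `records_from_payload` means the parsing logic can be checked without network access, which helps when an API is slow or rate-limited.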

Section 4: Data Preparation

Data manipulation and transformation are essential skills for any data scientist. We will transform the data into clean data structures for testing and feature construction purposes.

14. Initial Data Preparation (Python pandas, first test data flat table)

15. Conversion of Date Object (Python datetime class)

16. Relational Data Structure (Primary Key and relational data model)
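The preparation steps listed above could look roughly like the following sketch. It assumes the wide layout of the Johns Hopkins time series files (one row per region, one column per date written as month/day/two-digit year). The sample values, the output column names, and the variable names are made up for illustration.

```python
import io

import pandas as pd

# Tiny made-up sample in the assumed wide layout: one row per region,
# one column per date (month/day/two-digit year).
raw_csv = io.StringIO(
    "Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20\n"
    ",Germany,51.0,9.0,0,0,1\n"
    ",Italy,43.0,12.0,0,2,3\n"
)
wide = pd.read_csv(raw_csv)

# Reshape to a relational (long) table: one row per country and date.
tidy = wide.drop(columns=["Province/State", "Lat", "Long"]).melt(
    id_vars="Country/Region", var_name="date", value_name="confirmed"
)
tidy = tidy.rename(columns={"Country/Region": "country"})

# Convert the date strings into proper datetime objects.
tidy["date"] = pd.to_datetime(tidy["date"], format="%m/%d/%y")

# (country, date) acts as the primary key of the relational table,
# so no combination may appear twice.
tidy = tidy.sort_values(["country", "date"]).reset_index(drop=True)
assert not tidy.duplicated(subset=["country", "date"]).any()
print(tidy)
```

The long format makes later steps such as grouping by country or joining additional tables on the (country, date) key straightforward.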

Section 5: Explorative Data Analysis - Dynamic Dashboards

Visualizing the first results, moving from static plots to dynamic dashboards.

17. Introduction - Plotting with Matplotlib (static plots with matplotlib / seaborn)

18. Dynamic Plots with Plot.ly

19. First Dynamic COVID 19 visualization (callbacks example and first prototype)

20. Dynamic Plots via Dash (Dash example, client-server setup)

Section 6: Modeling

The modeling phase is where the various mathematical transformations take place. You will learn about exponential slopes, linear regression, rolling linear regression, and signal filtering techniques.

21. Modeling Introduction

22. Modeling start with helper functions (functions for quick visualization and data prep)

23. Exponential Slopes (calculating doubling rates, pandas apply function)

24. Machine Learning Basics Introduction (terminology, learning and prediction pipeline)

25. Scikit-Learn Linear Regression (scikit-learn library, ordinary least squares)

26. ML Model Hypothesis (learning/fit pipeline and prediction pipeline)

27. Log Feature Transformation (applying linear regression on exponential data)

28. Piecewise linear regression (regression over a sliding window, approximating doubling rates)

29. Filtering the COVID Input and Doubling Rates (Python scipy, Savitzky-Golay filtering)
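One way to combine the log feature transformation, sliding-window regression, and filtering named in this section is sketched below. Fitting a straight line to log2(cases) turns exponential growth into a linear problem, and the inverse of the slope is the doubling time in days. The function names, window sizes, and the synthetic series are assumptions chosen for illustration. This is not the lecture's own implementation.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.linear_model import LinearRegression


def doubling_time(cases):
    """Estimate the doubling time (in days) of a short window of case counts
    by fitting a straight line to log2(cases) with ordinary least squares."""
    days = np.arange(len(cases)).reshape(-1, 1)  # feature matrix, shape (n, 1)
    log_cases = np.log2(np.asarray(cases, dtype=float))
    slope = LinearRegression().fit(days, log_cases).coef_[0]
    return 1.0 / slope  # the slope is "doublings per day"


def rolling_doubling_time(cases, window=3):
    """Apply doubling_time over a sliding window; the first window - 1
    entries are NaN because no full window is available yet."""
    result = np.full(len(cases), np.nan)
    for end in range(window, len(cases) + 1):
        result[end - 1] = doubling_time(cases[end - window:end])
    return result


# Synthetic series that doubles exactly every 4 days (not real data).
days = np.arange(20)
cases = 100 * 2 ** (days / 4)

# Savitzky-Golay filter: local polynomial smoothing of the input signal.
smoothed = savgol_filter(cases, window_length=5, polyorder=1)

print(rolling_doubling_time(cases, window=3)[-1])
print(rolling_doubling_time(smoothed, window=3)[-1])
```

On real data the counts are noisy and can stagnate, so a window with no growth yields a slope near zero and a very large or infinite doubling time. Smoothing the input first is one way to reduce such spikes.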

Section 7: Evaluation - Full Walkthrough

The goal is to update, process, and visualize all data with one click. We will stitch all parts together on the full data set. All results will be visualized in the final Dash prototype.

30. Preparing the full Walkthrough - Minimum Viable Product

31. Groupby apply on test data set (groupby apply on the relational data set)

32. Merging the full dataset (Python pandas left join on full data set)

33. Automated Feature Transformation (push from IPython notebooks to scripts)

34. Finalizing the Minimum Viable Product (visualizing and adapting the worldwide dataset)
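The groupby-apply and left-join steps of the walkthrough might be sketched as follows. The table contents, the `daily_growth_factor` helper, and the column names are hypothetical. The point is only to show a per-country computation followed by a merge back onto the full relational table.

```python
import pandas as pd

# Made-up relational table: one row per (country, date).
dates = pd.date_range("2020-03-01", periods=4, freq="D")
cases = pd.DataFrame({
    "country": ["A"] * 4 + ["B"] * 4,
    "date": list(dates) * 2,
    "confirmed": [10, 20, 40, 80, 5, 5, 10, 10],
})


def daily_growth_factor(confirmed):
    """Ratio of each day's count to the previous day's count."""
    return confirmed / confirmed.shift(1)


# groupby + apply: run the helper separately for every country so that
# values never leak across country boundaries.
growth = (
    cases.sort_values(["country", "date"])
    .groupby("country", group_keys=False)["confirmed"]
    .apply(daily_growth_factor)
)

# Build a feature table; the first day per country has no predecessor,
# so its growth factor is NaN and the row is dropped here.
features = cases[["country", "date"]].copy()
features["growth_factor"] = growth  # aligned on the original row index
features = features.dropna(subset=["growth_factor"])

# Left join: every row of the full data set is kept, even where the
# feature table has no matching (country, date) entry.
merged = cases.merge(features, on=["country", "date"], how="left")
print(merged)
```

A left join is a cautious choice here because a missing feature row shows up as NaN in the result instead of silently removing the original observation.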

Section 8: Deployment

We will look at the next steps toward professional application delivery and wrap up the prototyping with an overview of best practices.

35. Prepare for Professional Software Delivery (functional vs. non-functional requirements, agile development)

36. Summary Best Practices (working in a company - it is all about value delivery)

Section 9: Predictive Machine Modeling

Modeling predictive applications is one key activity for data scientists. In this section, we will have a look at time series forecasting with Facebook Prophet, how to train the machine learning model, and how to evaluate the results.

37. Forecasting / Predictions Overview (terminology forecasts vs. predictions, which technique for which problem)

38. Overfitting Introduction

39. Overfitting data preparation (prepare the COVID data)

40. Overfitting demo and metrics (Polynomial regression, demo, and metrics)

41. Cross-validation Explained (time series cross-validation)

42. Forecasts Programming Introduction (time-series data setup)

43. Forecasts with Facebook Prophet (predict COVID data, what is a horizon)

44. FB Prophet Cross-validation (validation function and interpretation)

45. Controlling Results and trivial model (understand and interpret forecast, trivial moving average model)

46. Selection Bias and Variance (variance in data or in the model, link to Bayesian theory)
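To illustrate time series cross-validation together with a trivial baseline, here is a small sketch that uses only NumPy. It does not use Facebook Prophet. The splitter, the `initial` and `horizon` parameter names, and the synthetic series are assumptions made for this example. The key property is that each training window ends before its test window begins, so the model is never evaluated on data it has already seen.

```python
import numpy as np


def expanding_window_splits(n_samples, initial, horizon):
    """Yield (train_idx, test_idx) pairs where the training set always ends
    before the test set begins, so the model never sees the future."""
    cutoff = initial
    while cutoff + horizon <= n_samples:
        yield np.arange(cutoff), np.arange(cutoff, cutoff + horizon)
        cutoff += horizon


def moving_average_forecast(history, horizon, window=3):
    """Trivial baseline: repeat the mean of the last `window` observations."""
    return np.full(horizon, np.mean(history[-window:]))


# Synthetic series with a linear trend (not real data): 1, 2, ..., 12.
series = np.arange(1.0, 13.0)

errors = []
for train_idx, test_idx in expanding_window_splits(len(series), initial=6, horizon=3):
    forecast = moving_average_forecast(series[train_idx], horizon=len(test_idx))
    mean_abs_error = np.mean(np.abs(series[test_idx] - forecast))
    errors.append(float(mean_abs_error))

print(errors)  # one mean absolute error per fold
```

A forecasting model is only worth its complexity if it beats such a trivial baseline on the same folds, which is why comparing against one is a useful sanity check.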

Section 10: Simulation of SIR compartmental model

Compartmental models are a technique used to simplify the mathematical modeling of infectious diseases. In this section, we will understand and simulate the SIR model, including a curve-fitting approach.
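As a rough sketch of what simulating the SIR model can look like, the snippet below integrates the standard SIR equations (dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I) with SciPy. The parameter values, population size, and variable names are illustrative assumptions and are not fitted to any data.

```python
import numpy as np
from scipy.integrate import solve_ivp


def sir_rhs(t, y, beta, gamma, population):
    """Right-hand side of the coupled SIR differential equations."""
    susceptible, infected, recovered = y
    new_infections = beta * susceptible * infected / population
    new_recoveries = gamma * infected
    return [-new_infections, new_infections - new_recoveries, new_recoveries]


# Illustrative parameter values only; they are not fitted to any data.
population = 1_000_000
beta, gamma = 0.4, 0.1            # infection and recovery rate per day
y0 = [population - 100, 100, 0]   # initial S, I, R
t_eval = np.arange(0, 161)        # daily output for 160 days

solution = solve_ivp(
    sir_rhs,
    (0, 160),
    y0,
    t_eval=t_eval,
    args=(beta, gamma, population),
    rtol=1e-8,
    atol=1e-8,
)
susceptible, infected, recovered = solution.y

# Day on which the number of infected individuals is highest.
peak_day = int(t_eval[np.argmax(infected)])
print(peak_day)
```

For the curve-fitting step, a function that runs this simulation for given `beta` and `gamma` could be handed to an optimizer such as `scipy.optimize.curve_fit`, with the observed infection counts as the target.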

47. SIR modeling of infectious disease (coupled differential equations)

48. Simulating the SIR curves (Monte Carlo simulation of infection rates)

49. Curve fitting of SIR parameters (Python scipy, integration of ordinary differential equation, model posterior fit)

50. Dynamic SIR Simulation Example (adapting and showing a dynamic infection rate)

51. Thank you
