Lecture: Applied Data Science - Covid 19 prototype

In the last weeks in quarantine, I started to restructure my 'Applied Data Science' lecture, given at TU Kaiserslautern, Germany, with a focus on COVID-19 datasets.

The lecture is now available on the Udemy platform. Check it out if you are interested; the outline is attached below.


Outline: Applied Data Science - COVID 19 Data Prototype

The goal of this lecture is to convey industry best practices of data science while developing a COVID-19 analysis prototype. Students should learn the modeling process and a methodology for approaching a business problem. For this, we will introduce the CRISP-DM process. The outline of this lecture follows the CRISP-DM structure:

  • Business Understanding
  • Data Understanding
  • Data Preparation
  • Modeling (statistical and machine learning)
  • Evaluation
  • Deployment

The entire lecture follows the development flow of a rapid prototype project. The focus is for students to develop their own prototype (source code snippets are provided).

Topics and the corresponding Python programming are introduced step by step in a very compact way, with the goal of showing a full walkthrough. (Additional sources might be necessary to follow all steps.)

The outline with the individual lectures and learning topics is as follows:

Section 1: Introduction

  1. Introduction
  2. Learning Goals and Content Overview
  3. Used Python Resources  

Section 2: Business Understanding

Applied data science should follow a process. We will use the CRISP-DM process to set up the project.

4. Introduction to Data Science  

5. CRISP-DM           

6. Terminology Data Science    

7. Python Project Setup   (Anaconda, Python cookiecutter)

Section 3: Data Understanding

Data gathering can be a cumbersome task. We will access the Johns Hopkins COVID-19 data set via GitHub and other sources via web scraping and API calls.

8. Introduction Data Understanding       

9. Data Gathering - Johns Hopkins GitHub    (project storage setup, GitHub access)

10. Data Gathering Web Scraping Example    (Python requests, BeautifulSoup)

11. Data Gathering API call  (COVID-19 data Germany, JSON format)

12. Data Gathering REST API call  (REST service definition, COVID-19 data USA via smartable.ai)

13. Data Gathering wrap up          

Section 4: Data Preparation 

Data manipulation and transformation are essential skills for any data scientist. We will transform the data into clean data structures for testing and feature construction purposes.

14. Initial Data Preparation  (Python pandas, first test data flat table)

15. Conversion of Date Object  (Python datetime class)

16. Relational Data Structure  (Primary Key and relational data model)
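The preparation steps listed above can be sketched roughly as follows. This is a minimal illustration, not the course's own code: the tiny `wide` table and its numbers are invented, and the column names (`Country/Region`, `confirmed`) and the `%m/%d/%y` date format are assumptions about what a wide, one-column-per-date source table might look like.

```python
import pandas as pd

# Hypothetical miniature of a "wide" table: one row per country,
# one column per date. The numbers are invented for illustration.
wide = pd.DataFrame(
    {
        "Country/Region": ["Germany", "Italy"],
        "1/22/20": [0, 0],
        "1/23/20": [1, 2],
        "1/24/20": [3, 5],
    }
)

# Melt into a flat (long) table: one row per (country, date) observation.
long = wide.melt(id_vars="Country/Region", var_name="date", value_name="confirmed")
long = long.rename(columns={"Country/Region": "country"})

# Convert the date strings into proper datetime objects.
long["date"] = pd.to_datetime(long["date"], format="%m/%d/%y")

# (country, date) acts as the primary key of the relational table,
# so it must be unique.
long = long.sort_values(["country", "date"]).reset_index(drop=True)
assert not long.duplicated(subset=["country", "date"]).any()

print(long)
```

A long table with a unique (country, date) key is convenient later on, because per-country features can be computed with a groupby and joined back on the same key.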

Section 5: Exploratory Data Analysis - Dynamic Dashboards

Visualizing the first results, from static plots to dynamic dashboards.

17. Introduction - Plotting with Matplotlib (static plots with matplotlib / seaborn)

18. Dynamic Plots with Plotly

19. First Dynamic COVID 19 visualization   (callbacks example and first prototype)

20. Dynamic Plots via Dash  (Dash example, client-server setup)

Section 6: Modeling 

In the modeling phase, the different mathematical transformations take place. You will learn about exponential slopes, linear regression, rolling linear regression, and signal filtering techniques.

21. Modeling Introduction 

22. Modeling start with helper functions (functions for quick visualization and data prep)

23. Exponential Slopes   (calculating doubling rates, pandas apply function)

24. Machine Learning Basics Introduction  (terminology, learning and prediction pipeline)

25. Scikit-Learn Linear Regression     (scikit-learn library, ordinary least squares)

26. ML model Hypothesis        (learning/fit pipeline and prediction pipeline)

27. Log Feature Transformation       (applying linear regression on exponential data)

28. Piecewise linear regression       (regression over a sliding window, approximating doubling rates)

29. Filtering the COVID Input and Doubling Rates (Python scipy, Savitzky-Golay filtering)
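To make the link between the log transformation, sliding-window regression, and doubling rates concrete, here is a small sketch under stated assumptions. If cases grow as y(t) = y0 * 2^(t/T), then log y is linear in t with slope b = ln(2)/T, so T = ln(2)/b. The helper `doubling_time`, the window size, and the synthetic series are illustrative choices and not taken from the course material, which may fit the regression differently.

```python
import numpy as np
from scipy.signal import savgol_filter


def doubling_time(cases, window=3):
    """Estimate the doubling time (in days) for each sliding window.

    Fits a straight line to log(cases) inside every window of `window`
    consecutive days; for exponential growth the slope b gives
    T = ln(2) / b.
    """
    log_cases = np.log(np.asarray(cases, dtype=float))
    days = np.arange(window)
    result = []
    for start in range(len(log_cases) - window + 1):
        slope, _intercept = np.polyfit(days, log_cases[start:start + window], 1)
        result.append(np.log(2) / slope)
    return np.array(result)


# Synthetic series (not real data): cases double every 4 days.
t = np.arange(20)
cases = 100 * 2 ** (t / 4)

# Optional smoothing step before the fit; window length and polynomial
# order are arbitrary illustrative values.
smoothed = savgol_filter(cases, window_length=5, polyorder=1)

print(doubling_time(cases))
```

On real, noisy counts the slope can be close to zero or negative, in which case the estimated doubling time becomes very large or meaningless; smoothing the input first is one way to reduce such artifacts.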

Section 7: Evaluation - Full Walkthrough   

The goal is to update, process, and visualize all data with one click. We will put all parts together on the full data set. All results will be visualized in the final Dash prototype.

30. Preparing the full Walkthrough - Minimum Viable Product 

31. Groupby apply on test data set (groupby apply on the relational data set)

32. Merging the full dataset     (Python pandas left join on full data set)

33. Automated Feature Transformation  (push from IPython notebooks to scripts)

34. Finalizing the Minimum Viable Product (visualizing and adapting the worldwide dataset)
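The groupby-apply and left-join steps named in this section might look like the sketch below. The table `base`, its values, and the helper `growth_factor` are hypothetical; the course computes different features (for example doubling rates), so treat this only as an illustration of the pattern.

```python
import pandas as pd

# Hypothetical relational table; values are invented for illustration.
base = pd.DataFrame(
    {
        "country": ["Germany"] * 4 + ["Italy"] * 4,
        "date": list(pd.date_range("2020-03-01", periods=4)) * 2,
        "confirmed": [10, 20, 40, 80, 5, 15, 45, 135],
    }
)


def growth_factor(group):
    """Day-over-day growth factor within one country."""
    return group / group.shift(1)


# Apply the function per country; group_keys=False keeps the original
# row index so the result aligns with `base`.
features = base[["country", "date"]].copy()
features["growth"] = (
    base.groupby("country", group_keys=False)["confirmed"].apply(growth_factor)
)
features = features.dropna(subset=["growth"])

# Left join: every row of `base` is kept, even where no feature exists.
full = base.merge(features, on=["country", "date"], how="left")
print(full)
```

The left join keeps the first day of each country in the result with a missing growth value, which is usually preferable to silently dropping rows from the full data set.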

Section 8: Deployment

We will have a look at the next steps toward professional application delivery and wrap up the prototyping with an overview of best practices.

35. Prepare for Professional Software Delivery (functional vs. non-functional requirements, agile development)

36. Summary Best Practices (working in a company - it is all about value delivery)

Section 9: Predictive Machine Modeling   

Modeling predictive applications is one key activity for data scientists. In this section, we will have a look at time series forecasting with Facebook Prophet, how to train the machine learning model, and how to evaluate the results.

37. Forecasting / Predictions Overview (terminology forecasts vs. predictions, which technique for which problem)

38. Overfitting Introduction      

39. Overfitting data preparation    (prepare the COVID data)

40. Overfitting demo and metrics    (Polynomial regression, demo, and metrics)

41. Cross-validation Explained    (time series cross-validation)

42. Forecasts Programming Introduction (Time-series data setup )

43. Forecasts with Facebook Prophet   (predict COVID data, what is a horizon)

44. FB Prophet Cross-validation  (validation function and interpretation)

45. Controlling Results and trivial model (understand and interpret forecast, trivial moving average model)  

46. Selection Bias and Variance     (variance in data or in the model, link to Bayesian theory)
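The overfitting topic above can be illustrated with a short polynomial regression sketch. It uses only NumPy rather than Facebook Prophet, a single time-ordered hold-out instead of full time series cross-validation, and an invented synthetic series; the degrees, noise level, and split point are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic series (not real data): linear trend plus noise.
t = np.linspace(0.0, 1.0, 40)
y = 2.0 * t + rng.normal(scale=0.1, size=t.size)

# Time-ordered split: train on the past, evaluate on the "future".
t_train, y_train = t[:30], y[:30]
t_test, y_test = t[30:], y[30:]


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Fit a simple and a flexible polynomial and compare both errors.
errors = {}
for degree in (1, 7):
    coeffs = np.polyfit(t_train, y_train, degree)
    errors[degree] = (
        rmse(y_train, np.polyval(coeffs, t_train)),  # training error
        rmse(y_test, np.polyval(coeffs, t_test)),    # hold-out error
    )

for degree, (train_err, test_err) in errors.items():
    print(f"degree {degree}: train RMSE {train_err:.3f}, test RMSE {test_err:.3f}")
```

The more flexible model never has a higher training error than the simpler one, but it typically extrapolates much worse on the held-out future points, which is the behavior that validation on unseen time periods is meant to reveal.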

Section 10: Simulation of SIR compartmental model

Compartmental models are a technique used to simplify the mathematical modeling of infectious diseases. In this section, we will understand and simulate the SIR model, including a curve-fitting approach.

47. SIR modeling of infectious disease (coupled differential equations)

48. Simulating the SIR curves (Monte Carlo simulation of infection rates)

49. Curve fitting of SIR parameters (Python scipy, integration of ordinary differential equations, model posterior fit)

50. Dynamic SIR Simulation Example (adapting and showing a dynamic infection rate)

51. Thank you
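As a rough illustration of the SIR section, the sketch below integrates the standard SIR equations, dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I, and then recovers beta and gamma by curve fitting. The population size, initial values, rates, and starting guess are all hypothetical, the "observations" are generated by the model itself, and the course may use a different SciPy integrator.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import curve_fit

N = 1_000_000  # hypothetical population size
I0 = 100       # hypothetical number of initially infected people


def sir_rhs(t, y, beta, gamma):
    """Right-hand side of the coupled SIR differential equations."""
    S, I, R = y
    new_infections = beta * S * I / N
    recoveries = gamma * I
    return [-new_infections, new_infections - recoveries, recoveries]


def simulate(t, beta, gamma):
    """Integrate the SIR model; returns an array with rows S, I, R."""
    y0 = [float(N - I0), float(I0), 0.0]
    sol = solve_ivp(sir_rhs, (t[0], t[-1]), y0, t_eval=t,
                    args=(beta, gamma), rtol=1e-8, atol=1e-8)
    return sol.y


def infected_curve(t, beta, gamma):
    """Model function for curve fitting: infected compartment only."""
    return simulate(t, beta, gamma)[1]


t = np.arange(0, 60, dtype=float)

# Synthetic "observations" produced with known parameters (not real data).
observed = infected_curve(t, 0.4, 0.1)

# Fit beta and gamma, starting from a deliberately wrong initial guess.
(beta_fit, gamma_fit), _cov = curve_fit(
    infected_curve, t, observed, p0=[0.35, 0.12], bounds=(0, 1)
)
print(beta_fit, gamma_fit)
```

With real case counts the fit is far less well behaved than in this noise-free setting, so the starting guess and parameter bounds usually need more care.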

