
Day14 of #100DaysOfPython

Surya Singh

AI Consultant | MS in Machine Learning & AI | ex-EY

Published: April 12, 2024


Today, we're diving into types of missing values in a dataset!

Q. What are the different types of missing data?

1. Missing Completely at Random (MCAR): Observations in a feature are said to be missing completely at random when the probability of a particular observation being missing is the same as the probability of any other observation being missing from that feature. There is no relationship between the missingness and any other values, observed or missing. Disregarding such cases from the data would not bias the model or the inferences made, although it does reduce the sample size.

2. Missing Not at Random (MNAR): Systematic missing values; the probability that a value is missing depends on the unobserved value itself, possibly in addition to other observed data within the dataset.

3. Missing at Random (MAR): MAR is generally considered more common and realistic than MCAR. In this case the probability of a value being missing depends on other observed variables in the dataset, but not on the missing value itself.

For example, a company surveys its employees about the level of depression they experience at work. If men are less likely to fill in a survey about depression, the resulting missing values are related to gender (which is observed) and have nothing to do with their level of depression.
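To make the three mechanisms concrete, here is a minimal simulation sketch of the survey example. Everything in it is an assumption made for illustration: the function names (`simulate_survey`, `observed_mean`, `missing_rate`), the score distribution, and the missingness probabilities are all hypothetical, not taken from any real survey.

```python
import random
from statistics import mean


def simulate_survey(n=20_000, seed=0):
    """Return (rows, masks).

    rows  : list of (gender, depression_score) tuples (synthetic data).
    masks : dict mapping mechanism name -> list of bools (True = missing).
    """
    rng = random.Random(seed)
    rows = [(rng.choice("MF"), rng.gauss(5.0, 2.0)) for _ in range(n)]
    masks = {
        # MCAR: every response has the same 30% chance of being missing.
        "MCAR": [rng.random() < 0.3 for _ in rows],
        # MAR: missingness depends only on the observed gender column.
        "MAR": [rng.random() < (0.5 if g == "M" else 0.1) for g, _ in rows],
        # MNAR: missingness depends on the unobserved score itself.
        "MNAR": [rng.random() < (0.6 if s > 5.0 else 0.1) for _, s in rows],
    }
    return rows, masks


def observed_mean(rows, mask):
    """Mean score over the responses that are NOT missing."""
    return mean(s for (_, s), miss in zip(rows, mask) if not miss)


def missing_rate(rows, mask, gender):
    """Share of missing responses within one gender."""
    flags = [miss for (g, _), miss in zip(rows, mask) if g == gender]
    return sum(flags) / len(flags)


rows, masks = simulate_survey()
print("true mean:", round(mean(s for _, s in rows), 2))
for name, mask in masks.items():
    print(
        name,
        "observed mean:", round(observed_mean(rows, mask), 2),
        "missing rate M:", round(missing_rate(rows, mask, "M"), 2),
        "missing rate F:", round(missing_rate(rows, mask, "F"), 2),
    )
```

Under these assumptions, the MCAR and MAR masks should leave the observed mean close to the true mean, the MAR mask should show a clearly higher missing rate for men, and the MNAR mask should pull the observed mean down because high scores are dropped more often.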

Link to the dataset

Let's use this dataset to observe examples of the above-mentioned missing value types:

1. Initializing the Titanic dataset and observing the missing values to understand which features classify as MNAR or MCAR

2. Observing rows that will classify as MCAR

3. Observing a case of MNAR for 'Cabin' feature in the Titanic dataset
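The three steps above can be sketched with pandas roughly as follows. This is not the original code from the post. The helper names (`missing_summary`, `missing_rate_by_group`) are hypothetical, and the `toy` frame contains invented rows that only borrow Titanic-style column names (`Survived`, `Age`, `Cabin`, `Embarked`) so the example runs on its own.

```python
import pandas as pd


def missing_summary(df):
    """Fraction of missing values per column, largest first."""
    return df.isna().mean().sort_values(ascending=False)


def missing_rate_by_group(df, column, by):
    """Share of rows where `column` is missing, within each level of `by`."""
    return df[column].isna().groupby(df[by]).mean()


# Toy rows invented for illustration; these are NOT real Titanic records.
toy = pd.DataFrame({
    "Survived": [0, 0, 0, 1, 1, 1],
    "Age": [22.0, None, 35.0, 28.0, 4.0, None],
    "Cabin": [None, None, None, "C85", None, "E46"],
    "Embarked": ["S", "S", "Q", "C", None, "S"],
})

# Step 1: which features have missing values, and how many?
print(missing_summary(toy))

# Step 2: inspect the rows with a missing 'Embarked' value; a handful of
# gaps with no visible pattern would be a candidate for MCAR.
print(toy[toy["Embarked"].isna()])

# Step 3: compare how often 'Cabin' is missing by survival outcome; a large
# gap between the groups hints that the missingness is systematic.
print(missing_rate_by_group(toy, "Cabin", "Survived"))
```

To run this on the real data, replace `toy` with a frame loaded via `pd.read_csv` from wherever the dataset is saved. Note that a difference in missing rates between groups only suggests a systematic mechanism; the data alone cannot prove whether a feature is MNAR.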

Stay tuned for a complete deep dive on feature engineering on GitHub!
