Day 14 of #100DaysOfPython

Today, we're diving into types of missing values in a dataset!

Q. What are the different types of missing data?

1. Missing Completely at Random (MCAR): Observations in a feature are missing completely at random when every observation has the same probability of being missing as any other observation in that feature. There is no relationship between the missingness and any other values, observed or missing. Dropping such cases from the data does not bias the model or the inferences made, although it does shrink the sample.

2. Missing Not at Random (MNAR): Systematic missing values; the probability of a value being missing depends on the missing value itself or on other unobserved data, even after accounting for the observed data in the dataset.

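3. Missing at Random (MAR): MAR is more common and realistic than MCAR. In this case the probability of a value being missing depends on other observed values in the dataset, but not on the missing value itself. As a result, the observed data and the missing values need not come from the same distribution.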

For example, a company surveys its employees about the level of depression they experience at work. Men are less likely to fill in a survey on depression, so their answers are more often missing. The missingness depends on gender, which is observed, and has nothing to do with their actual level of depression.
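To make the three mechanisms concrete, here is a minimal simulation sketch using only the standard library. Everything in it is an assumption made for illustration: the synthetic `survey` data, the `mask` helper, and the missingness probabilities (30%, 50%/10%, 60%/10%) are hypothetical and not taken from any real survey.

```python
import random

random.seed(14)  # fixed seed so the sketch is repeatable

# Hypothetical survey: (gender, depression score from 0 to 10) per employee.
survey = [(random.choice("MF"), random.randint(0, 10)) for _ in range(10_000)]

def mask(rows, prob_missing):
    """Blank out each score with probability prob_missing(gender, score)."""
    return [(g, None if random.random() < prob_missing(g, s) else s)
            for g, s in rows]

# MCAR: every score has the same 30% chance of being missing.
mcar = mask(survey, lambda g, s: 0.3)
# MAR: missingness depends only on the observed gender column.
mar = mask(survey, lambda g, s: 0.5 if g == "M" else 0.1)
# MNAR: missingness depends on the score itself, which is what goes missing.
mnar = mask(survey, lambda g, s: 0.6 if s >= 7 else 0.1)

def missing_rate(rows, gender):
    """Fraction of missing scores among respondents of the given gender."""
    scores = [s for g, s in rows if g == gender]
    return sum(s is None for s in scores) / len(scores)

def observed_mean(rows):
    """Mean of the scores that were actually recorded."""
    seen = [s for _, s in rows if s is not None]
    return sum(seen) / len(seen)

true_mean = sum(s for _, s in survey) / len(survey)
print("true mean:", round(true_mean, 2))
for name, data in [("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]:
    print(name,
          "missing M:", round(missing_rate(data, "M"), 2),
          "missing F:", round(missing_rate(data, "F"), 2),
          "observed mean:", round(observed_mean(data), 2))
```

With these assumed settings, MCAR should show similar missing rates for both genders, MAR a clearly higher missing rate for men, and MNAR an observed mean noticeably below the true mean because high scores are the ones that tend to disappear.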

Link to the dataset

Let's use this dataset to observe examples of the above-mentioned missing value types:

1. Initializing the Titanic dataset and observing the missing values to understand which features classify as MNAR/MCAR
2. Observing rows that classify as MCAR
3. Observing a case of MNAR for 'Cabin' feature in the Titanic dataset
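
The three steps above could be sketched in pandas roughly as follows. This is only an illustration under stated assumptions: the tiny `df` below is a hand-made, hypothetical sample with Titanic-style column names (not the real data), treating `Embarked` as the MCAR candidate is an assumption, and grouping the `Cabin` gaps by `Pclass` is just one possible way to look for a systematic pattern.

```python
import pandas as pd

# Step 1: initialize the data and count missing values per feature.
# Hypothetical mini-sample; replace with pd.read_csv(...) on the real file.
df = pd.DataFrame({
    "Pclass": [1, 1, 3, 3, 3, 2, 3, 1],
    "Age": [38.0, 35.0, None, 22.0, None, 27.0, 4.0, 54.0],
    "Cabin": ["C85", "C123", None, None, None, None, "G6", "E46"],
    "Embarked": ["C", "S", "Q", "S", "S", None, "S", "S"],
})
missing_counts = df.isnull().sum()
print(missing_counts)

# Step 2: look at the rows where a sparsely missing feature is empty.
# Assumption: Embarked is the feature being inspected as a MCAR candidate.
embarked_missing = df[df["Embarked"].isnull()]
print(embarked_missing)

# Step 3: check whether missing Cabin values follow a pattern.
# The mean of a boolean mask is the share of missing values in each group.
cabin_missing_by_class = df["Cabin"].isnull().groupby(df["Pclass"]).mean()
print(cabin_missing_by_class)
```

If the share of missing `Cabin` values differs strongly between groups, the gaps are systematic rather than completely random. Note that this kind of check can rule out MCAR, but telling MAR from MNAR generally needs knowledge of how the data were collected.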

Stay tuned for a complete deep dive on feature engineering on GitHub!
