Day 14 of #100DaysOfPython
Today, we're diving into types of missing values in a dataset!
Q. What are the different types of missing data?
1. Missing Completely at Random (MCAR): Observations in a feature are missing completely at random when every observation has the same probability of being missing as any other observation in that feature. There is no relationship between the missingness and any other values, observed or missing. Dropping such cases from the data does not bias the model or the inferences made, although it does reduce the sample size.
2. Missing Not at Random (MNAR): Systematic missing values; the probability that a value is missing depends on the unobserved value itself, even after accounting for the observed data within the dataset.
3. Missing at Random (MAR): MAR is more common and realistic than MCAR. In this case the probability that a value is missing depends on other observed values in the dataset, but not on the missing value itself.
For example, a company surveys its employees about the level of depression they experience at work. If men are less likely to fill in a survey about depression, the resulting missing values depend on gender (which is observed) and have nothing to do with their level of depression.
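To make the three mechanisms concrete, here is a minimal, dependency-free sketch that simulates them on synthetic survey data. Everything in it is an assumption made for illustration: the column names `gender` and `score`, the missingness probabilities, and the helper names `mask`, `observed_mean`, and `missing_rate`. It is a stand-in, not the dataset referenced in this post.

```python
import random
from statistics import mean

random.seed(14)

# Synthetic survey (hypothetical): gender is always observed,
# score (0-10) is the value that may go missing.
n = 10_000
people = [
    {"gender": random.choice(["M", "F"]), "score": random.uniform(0, 10)}
    for _ in range(n)
]

def mask(people, p_missing):
    """Return the scores with None wherever a value is dropped.

    p_missing maps a person to that person's probability of being missing.
    """
    return [
        None if random.random() < p_missing(p) else p["score"]
        for p in people
    ]

# MCAR: the same probability for everyone.
mcar = mask(people, lambda p: 0.3)
# MAR: the probability depends only on the observed gender column.
mar = mask(people, lambda p: 0.5 if p["gender"] == "M" else 0.1)
# MNAR: the probability depends on the score that ends up hidden.
mnar = mask(people, lambda p: 0.6 * p["score"] / 10)

def observed_mean(scores):
    """Mean of the values that survived masking."""
    return mean(s for s in scores if s is not None)

def missing_rate(scores, gender):
    """Share of missing scores among respondents of one gender."""
    rows = [s for s, p in zip(scores, people) if p["gender"] == gender]
    return sum(s is None for s in rows) / len(rows)

true_mean = mean(p["score"] for p in people)
print(f"true mean: {true_mean:.2f}")
for name, scores in [("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]:
    print(
        f"{name:<4} observed mean: {observed_mean(scores):.2f}  "
        f"missing M/F: {missing_rate(scores, 'M'):.2f}/"
        f"{missing_rate(scores, 'F'):.2f}"
    )
```

In this sketch the MAR case shows very different missing rates for men and women, while the MNAR case pulls the observed mean below the true mean because higher scores are dropped more often. The MAR mean stays close to the true mean only because the simulated scores are drawn independently of gender; that independence is a simplification of this toy setup.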
Let's use this dataset to observe examples of the above-mentioned missing value types:
Stay tuned for a complete deep dive on feature engineering on GitHub!