Day 94 of #100DaysOfPython
Today, we're diving into another technique for handling missing values known as Random Sample Imputation!
Here's an example of imputing the missing observations in a feature with values picked at random from that feature's observed values:
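A minimal sketch of the idea, using only the standard library. The `ages` list and the `random_sample_impute` helper are made up for illustration and are not taken from the dataset behind the plot. Each missing entry is replaced by a value drawn at random from the observed entries of the same feature:

```python
import random


def random_sample_impute(values, seed=0):
    """Replace each None with a value drawn at random from the observed entries."""
    rng = random.Random(seed)  # fixed seed so the imputation is reproducible
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values to sample from")
    # Observed entries are kept as-is; only the gaps are filled
    return [v if v is not None else rng.choice(observed) for v in values]


# Hypothetical Age column with three missing observations
ages = [22, None, 38, 26, None, 35, 54, None, 2, 27]
imputed_ages = random_sample_impute(ages, seed=42)
print(imputed_ages)
```

With pandas, the same idea is usually written by sampling from `Series.dropna()` with a fixed `random_state` and assigning the draws to the missing positions.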
Interpretation of the plot:
When the missing values in the Age feature are imputed with the median, the standard deviation drops from 14.5 to 13 and most of the observations end up close to the median (represented by the blue line). The disadvantage is that this changes the variance of the dataset and distorts the data distribution.
However, when the missing observations in the Age feature are imputed with randomly picked values, the standard deviation barely changes, so the variance and the shape of the distribution are preserved (represented by the red and green lines in the plot).
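The comparison can be reproduced roughly on synthetic data. This is a sketch under assumptions: the ages below are generated from a normal distribution and about 20% are removed at random, so the numbers will not match the 14.5 and 13 from the plot. It only shows the same pattern, where median imputation shrinks the standard deviation and random sample imputation keeps it close to the original:

```python
import random
import statistics

rng = random.Random(0)

# Synthetic ages (an assumption, not the dataset from the plot)
ages = [rng.gauss(30, 14) for _ in range(800)]
# Mark roughly 20% of the values as missing
with_gaps = [None if rng.random() < 0.2 else a for a in ages]
observed = [a for a in with_gaps if a is not None]

# Median imputation: every gap gets the same value
median = statistics.median(observed)
median_filled = [median if a is None else a for a in with_gaps]

# Random sample imputation: every gap gets a random observed value
random_filled = [rng.choice(observed) if a is None else a for a in with_gaps]

std_observed = statistics.stdev(observed)
std_median = statistics.stdev(median_filled)
std_random = statistics.stdev(random_filled)

print(f"observed only: {std_observed:.2f}")
print(f"median filled: {std_median:.2f}")
print(f"random filled: {std_random:.2f}")
```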
Advantage:
Disadvantage: