Missing Values Guide
First, let us understand what a missing value really is.
Missing values occur when no value is stored for a variable in an observation. Missing data are common and can have a significant effect on the conclusions drawn from the data. They can arise for many reasons, such as accidental omission or data entry errors.
Let us also try to understand why missing values in a dataset are a problem, and what impact they have on our machine learning models.
Missing data are problematic for two reasons. Firstly, most statistical procedures require a value for each variable. Secondly, missing data can bias the estimation of parameters and reduce the accuracy of our machine learning model. Hence missing values are harmful to a predictive model, and we have to treat them in the most appropriate way so that they do not become a threat to it.
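To make the first point concrete, here is a minimal sketch using only the Python standard library. The `records` data and the use of `None` as the missing marker are assumptions made for illustration; in practice a library such as pandas would usually handle this.

```python
import math

# Hypothetical toy dataset; None marks a missing entry.
records = [
    {"age": 25, "income": 40_000},
    {"age": None, "income": 52_000},
    {"age": 41, "income": None},
    {"age": 33, "income": None},
]

# Count the missing entries in each column.
missing_counts = {
    column: sum(1 for row in records if row[column] is None)
    for column in ("age", "income")
}
print("missing per column:", missing_counts)

# A plain average cannot be computed while a value is absent.
incomes = [row["income"] for row in records]
try:
    sum(incomes) / len(incomes)
    mean_failed = False
except TypeError:
    mean_failed = True
print("mean failed:", mean_failed)

# Encoding the gap as NaN does not help: the result silently becomes NaN.
nan_mean = sum(float("nan") if v is None else v for v in incomes) / len(incomes)
print("mean is NaN:", math.isnan(nan_mean))
```

Either the computation stops with an error or it returns a meaningless result, which is why the gaps have to be treated before modelling.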
Now let us hop onto the types of Missing Values.
There are three types: MCAR, which stands for "Missing Completely at Random"; MAR, which stands for "Missing at Random"; and MNAR, which stands for "Missing Not at Random".
Now let us try to understand each of these types in detail, one by one.
> MCAR : These are values that are missing without any reason or pattern. They are missing randomly across the dataset, and the missingness has no association with any other factor, observed or unobserved.
> MAR : The missingness in this category is associated with other observed features of the dataset, but not with the missing value itself. For example, if older respondents skip an income question more often, income is MAR given age.
> MNAR : These values are missing for a specific reason that relates to the missing value itself. For example, people with high incomes may be less likely to report their income.
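The three mechanisms can be simulated to see how they differ. The sketch below is an illustration under stated assumptions: the `ages` and `incomes` data, the missingness probabilities (0.3, 0.6, 0.1) and the helper names `mask` and `observed_mean` are all hypothetical, and `None` stands in for a missing entry.

```python
import random
import statistics

random.seed(0)

# Hypothetical population: income rises with age, plus noise.
n = 10_000
ages = [random.uniform(20, 60) for _ in range(n)]
incomes = [20_000 + 800 * a + random.gauss(0, 5_000) for a in ages]
median_income = statistics.median(incomes)


def mask(values, prob_missing):
    """Replace each value with None with its own probability of being missing."""
    return [None if random.random() < p else v
            for v, p in zip(values, prob_missing)]


def observed_mean(values):
    """Mean of the entries that are not missing."""
    return statistics.fmean([v for v in values if v is not None])


# MCAR: every income has the same chance of being missing.
mcar = mask(incomes, [0.3] * n)
# MAR: the chance depends on age, which is always observed.
mar = mask(incomes, [0.6 if a > 40 else 0.1 for a in ages])
# MNAR: the chance depends on the income value that goes missing.
mnar = mask(incomes, [0.6 if v > median_income else 0.1 for v in incomes])

true_mean = statistics.fmean(incomes)
print("true mean:", round(true_mean))
for name, data in (("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)):
    print(name, "observed mean:", round(observed_mean(data)))
```

In this setup the MCAR mean should stay close to the true mean, while the MAR and MNAR means drift downward because higher incomes go missing more often. The difference is that under MAR the drift can be explained by the observed `ages`, whereas under MNAR it cannot be recovered from the observed data alone.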
Now let us see which methods can be used to handle each of the above missing value types:
i) For MAR
- Maximum Likelihood
- Expectation Maximization
- Listwise Deletion
- Regression Imputation
ii) For MCAR
- Maximum Likelihood
- Expectation Maximization
- Listwise Deletion
- Regression Imputation
- Pairwise Deletion
- Mean/Median Imputation
- Hot/Cold Deck Imputation
- Case Substitution
- Prior Knowledge
iii) For MNAR
- Listwise Deletion
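A few of the methods listed above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the `rows` data is hypothetical, `None` marks a missing income, and the regression is an ordinary least-squares line of income on age fitted to the complete rows only.

```python
import statistics

# Hypothetical toy data as (age, income); None marks a missing income.
rows = [(25, 40_000), (30, 44_000), (35, None),
        (40, 52_000), (45, None), (50, 60_000)]

# Listwise deletion: drop every row that has a missing value.
complete = [(a, y) for a, y in rows if y is not None]

# Mean / median imputation: fill each gap with one summary value.
observed = [y for _, y in complete]
mean_value = statistics.fmean(observed)
median_value = statistics.median(observed)
mean_filled = [(a, mean_value if y is None else y) for a, y in rows]
median_filled = [(a, median_value if y is None else y) for a, y in rows]

# Regression imputation: fit income = intercept + slope * age on the
# complete rows, then predict the missing incomes from age.
mean_age = statistics.fmean([a for a, _ in complete])
slope = (sum((a - mean_age) * (y - mean_value) for a, y in complete)
         / sum((a - mean_age) ** 2 for a, _ in complete))
intercept = mean_value - slope * mean_age
regression_filled = [(a, intercept + slope * a if y is None else y)
                     for a, y in rows]

print("listwise deletion:", complete)
print("mean imputation:", mean_filled)
print("median imputation:", median_filled)
print("regression imputation:", regression_filled)
```

Mean and median imputation give every gap the same value, while regression imputation uses the relationship with another variable, so the filled values differ from row to row.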