How To Deal With Missing Values In A Dataset To Build An Unbiased ML Model
Every business wants to leverage AI/ML technology to reap the maximum benefits and stay ahead in the market, but to do that, it needs to build an unbiased ML model!
Handling missing values appropriately is an important part of building such a model. If missing information is not handled correctly, you could end up with a biased machine learning model that produces misleading results. Missing data also makes any statistical analysis of the dataset less precise.
Below are a few simple ways to deal with missing values in a dataset:
1. Rows or columns that contain missing values can simply be removed from the dataset. A column may be excluded from the analysis if more than half of its rows have null values, and a row may be dropped when more than 50% of its columns have missing values. However, when missing values are spread across many rows and columns, this tactic discards too much data to be useful.
2. If a column with missing values has a numeric data type, the missing values can be filled in with the median or the mode of the remaining values in that column.
3. If a column holds categorical data, its missing values can be replaced with the most frequent category. If more than half of the column's values are empty, they can instead be replaced with a new category, such as "Unknown".
4. Missing values can also be predicted from the other columns in the dataset: a regression model can estimate missing numeric values, while a classification model can estimate missing categorical values.
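The row and column removal in step 1 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a definitive implementation: each row is assumed to be a dict in which `None` marks a missing value, and the function name `drop_sparse` and its `threshold` parameter are hypothetical. The sketch drops sparse columns first and then sparse rows, so each row is judged only on the columns that survived.

```python
from typing import Any, Optional

# Assumption: each row is a dict of column name -> value, and None marks a missing value.
Row = dict[str, Optional[Any]]


def drop_sparse(rows: list[Row], threshold: float = 0.5) -> list[Row]:
    """Hypothetical helper: drop columns, then rows, whose share of missing values exceeds `threshold`."""
    if not rows:
        return []
    columns = list(rows[0].keys())
    # Keep a column only if at most `threshold` of its values are missing.
    kept_cols = [
        col for col in columns
        if sum(row.get(col) is None for row in rows) / len(rows) <= threshold
    ]
    if not kept_cols:
        return []
    cleaned = []
    for row in rows:
        trimmed = {col: row.get(col) for col in kept_cols}
        missing = sum(value is None for value in trimmed.values())
        # Keep a row only if at most `threshold` of its remaining columns are missing.
        if missing / len(kept_cols) <= threshold:
            cleaned.append(trimmed)
    return cleaned


rows = [
    {"age": 25, "income": None, "city": "Paris"},
    {"age": None, "income": None, "city": "Rome"},
    {"age": 40, "income": None, "city": None},
    {"age": 31, "income": 52000, "city": "Oslo"},
    {"age": None, "income": 1000, "city": None},
]
print(drop_sparse(rows))
# → [{'age': 25, 'city': 'Paris'}, {'age': None, 'city': 'Rome'}, {'age': 40, 'city': None}, {'age': 31, 'city': 'Oslo'}]
```

Here the `income` column is 60% empty and is dropped, and the last row, which has no values left in the surviving columns, is dropped as well. With pandas, a comparable filter is `df.loc[:, df.isna().mean() <= 0.5]` for columns followed by `df[df.isna().mean(axis=1) <= 0.5]` for rows.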
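The numeric fill in step 2 might look like this in plain Python. It is a sketch that assumes a numeric column is represented as a list where `None` marks a missing entry. The function name `fill_numeric` and its `strategy` parameter are hypothetical and are used only for illustration. The calculation itself relies on the standard-library `statistics` module.

```python
import statistics
from typing import Optional


def fill_numeric(values: list[Optional[float]], strategy: str = "median") -> list[float]:
    """Hypothetical helper: replace None entries with the median or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    if strategy == "median":
        # The median is less sensitive to outliers than the mean.
        fill = statistics.median(observed)
    elif strategy == "mode":
        # Since Python 3.8, statistics.mode returns the first most common value on ties.
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


incomes = [32000, None, 41000, 39000, None, 250000]
print(fill_numeric(incomes))
# → [32000, 40000.0, 41000, 39000, 40000.0, 250000]
```

The outlier of 250000 barely affects the median fill value of 40000.0, whereas the mean would have been pulled far higher. In practice, the fill value should be computed on the training data and reused for validation and test data, so that information from those sets does not leak into the model. In pandas, the same fill is typically written as `df[col].fillna(df[col].median())`.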
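The categorical fill in step 3 can be sketched under similar assumptions, with a column represented as a list where `None` marks a missing entry. The function name `fill_categorical` and the default label `"Unknown"` are hypothetical. The 50% cut-off follows the rule of thumb given in step 3.

```python
from collections import Counter
from typing import Optional


def fill_categorical(values: list[Optional[str]], new_label: str = "Unknown") -> list[str]:
    """Hypothetical helper: fill None entries with the most frequent category,
    or with `new_label` when more than half of the column is missing."""
    observed = [v for v in values if v is not None]
    missing_share = (len(values) - len(observed)) / len(values) if values else 0.0
    if not observed or missing_share > 0.5:
        # Too little information: treat "missing" as its own category.
        fill = new_label
    else:
        # most_common(1) returns [(category, count)] for the most frequent category.
        fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


colors = ["red", None, "blue", "red", None]
print(fill_categorical(colors))
# → ['red', 'red', 'blue', 'red', 'red']
```

`Counter.most_common` breaks ties by order of first occurrence. When categories are evenly split, the chosen fill value therefore depends on row order.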
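The prediction approach in step 4 can be illustrated with the simplest regression model. An ordinary least-squares line is fitted on one fully observed predictor column `x` and then used to predict the missing entries of `y`. This is only a minimal sketch. `impute_by_regression` is a hypothetical helper, and real datasets usually call for several predictors and a proper ML library. Missing categorical values would need a classifier instead, which this sketch does not cover.

```python
from typing import Optional


def impute_by_regression(x: list[float], y: list[Optional[float]]) -> list[float]:
    """Hypothetical helper: fit y = intercept + slope * x by ordinary least squares
    on rows where y is observed, then predict y where it is missing."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two observed rows to fit a line")
    n = len(pairs)
    mean_x = sum(xi for xi, _ in pairs) / n
    mean_y = sum(yi for _, yi in pairs) / n
    sxx = sum((xi - mean_x) ** 2 for xi, _ in pairs)
    if sxx == 0:
        raise ValueError("predictor is constant on the observed rows")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Keep observed values; replace only the missing ones with the model's prediction.
    return [intercept + slope * xi if yi is None else yi for xi, yi in zip(x, y)]


years_experience = [1, 2, 3, 4, 5]
salary_k = [30, 35, None, 45, None]
print(impute_by_regression(years_experience, salary_k))
```

The observed rows lie on the line y = 25 + 5x, so the missing salaries are predicted as roughly 40 and 50. For real projects, scikit-learn provides ready-made model-based imputers, such as `KNNImputer` and the experimental `IterativeImputer`.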