Main Challenges to Machine Learning
Pratyush Singh
Trainee Associate @Western Union | Ex-Intern @ Capgemini | Aspiring ML Engineer
Our main task in machine learning is to select an algorithm and train it on some data. So, the two things that can go wrong here are a "bad machine learning algorithm" and/or "bad data".
BAD DATA
Insufficient quantity of training data
Machine learning algorithms typically require thousands of examples even for fairly simple problems, and for complex problems like image or speech recognition we may require millions of training examples.
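To get a feel for this, here is a minimal sketch showing how validation accuracy typically climbs as the training set grows. The digits dataset and logistic regression model are just illustrative choices, not part of any particular recipe:

```python
# Sketch: validation accuracy usually improves as training data grows.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = load_digits(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Train the same model on growing fractions of the data and
# measure cross-validated accuracy at each size.
train_sizes, _, val_scores = learning_curve(
    model, X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

for size, score in zip(train_sizes, val_scores.mean(axis=1)):
    print(f"{size:4d} training examples -> mean CV accuracy {score:.3f}")
```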
Non-representative training data
It is important to use a training set that is representative of the cases we want to generalize to. If the sample is too small, we may get non-representative data simply as a result of chance (called sampling noise), and even very large samples can be non-representative if the sampling method is flawed (called sampling bias).
It is also crucial to watch out for nonresponse bias (which occurs when the individuals willing to take part in a study differ from those who are unwilling or unable to take part) during sampling.
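One common safeguard when carving a training set out of collected data is stratified sampling, which preserves the proportions of important categories in each split. A minimal sketch, assuming scikit-learn and an invented imbalanced toy dataset:

```python
# Sketch: stratified sampling keeps label proportions representative
# in a split, guarding against sampling noise on imbalanced data.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))                    # toy features
y = rng.choice([0, 1], size=1000, p=[0.9, 0.1])   # rare positive class

# stratify=y preserves the ~90/10 label ratio in both subsets, so a
# small test set cannot misrepresent the rare class by chance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

print("positives in train:", y_train.mean())
print("positives in test: ", y_test.mean())
```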
Poor quality data
If the training data is full of errors, missing values, outliers, and noise, it becomes harder for the system to detect the underlying patterns during training, so the system is less likely to perform well. It is often well worth the effort to spend time cleaning up the training data.
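As a rough illustration of that cleanup step, here is a minimal sketch using pandas; the toy table, the median-fill strategy, and the 0-120 age range are all assumptions made purely for demonstration:

```python
# Sketch: two simple cleanup steps - fill missing values, drop outliers.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age":    [25, 32, np.nan, 41, 29, 250],   # 250 looks like an entry error
    "income": [40000, 52000, 61000, np.nan, 45000, 48000],
})

# Fill missing values with each column's median (one simple strategy).
df = df.fillna(df.median(numeric_only=True))

# Drop rows whose age falls outside a plausible human range.
df = df[df["age"].between(0, 120)]

print(df)
```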
Irrelevant features
A machine learning system can only learn if the training data contains enough relevant features and not too many irrelevant ones. Coming up with a good set of features to train on is called feature engineering, and it involves (a short sketch of the first two steps follows the list):

Feature selection - selecting the most useful features to train on among the existing features.
Feature extraction - combining existing features to produce a more useful one (dimensionality reduction can help here).
Creating new features - gathering new data to build new features.
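Here is a minimal sketch of feature selection and feature extraction using scikit-learn; the breast-cancer dataset and the choice of keeping 10 features/components are arbitrary illustrative choices:

```python
# Sketch: feature selection vs. feature extraction in scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_breast_cancer(return_X_y=True)

# Feature selection: keep the 10 existing features most related to the target.
X_selected = SelectKBest(f_classif, k=10).fit_transform(X, y)

# Feature extraction: combine all 30 features into 10 new components.
X_extracted = PCA(n_components=10).fit_transform(X)

print(X.shape, X_selected.shape, X_extracted.shape)  # (569, 30) (569, 10) (569, 10)
```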
BAD ALGORITHM
Overfitting the training data
Overfitting - It means that the model performs well on the training data but does not generalize well to new instances.
Overfitting happens when the model is too complex relative to the amount and noisiness of the training data, so the model ends up learning patterns in the noise itself.
Possible solutions-
Constraining the model to make it simpler and reduce the risk of overfitting is called regularization. We need to find the right balance between fitting the training data perfectly and keeping the model simple enough to ensure that it generalizes well.
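As a hedged illustration, the sketch below fits the same overly flexible polynomial model twice, once unregularized and once with L2 (ridge) regularization; the toy data, degree, and alpha value are all assumptions chosen for demonstration:

```python
# Sketch: the same flexible model with and without L2 (ridge) regularization.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = 0.5 * X.ravel() ** 2 + rng.normal(scale=1.0, size=40)  # noisy quadratic

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Degree-15 polynomial features are far too flexible for 20 noisy points;
# only the second model constrains (regularizes) its weights.
for name, reg in [("unregularized", LinearRegression()),
                  ("ridge (alpha=1)", Ridge(alpha=1.0))]:
    model = make_pipeline(PolynomialFeatures(degree=15, include_bias=False),
                          StandardScaler(), reg)
    model.fit(X_train, y_train)
    print(f"{name}: train R^2 {model.score(X_train, y_train):.2f}, "
          f"test R^2 {model.score(X_test, y_test):.2f}")
```

Typically the unregularized model scores near-perfectly on the training split but poorly on the test split, while the ridge model gives up a little training fit in exchange for better generalization.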
Underfitting the training data
Underfitting is the opposite of overfitting. It means that our model is too simple to learn the underlying patterns in the data.
Possible solutions-

Select a more powerful model with more parameters.
Feed better features to the learning algorithm.
Reduce the constraints on the model (for example, reduce the regularization).

A minimal sketch of the first remedy appears below.
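In the sketch, a straight line underfits a quadratic pattern, while adding polynomial features gives the model enough capacity; the toy data and the degree-2 choice are assumptions for illustration:

```python
# Sketch: a straight line underfits a quadratic pattern; adding
# polynomial features gives the model enough capacity to learn it.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.3, size=200)  # quadratic target

too_simple = LinearRegression().fit(X, y)
powerful = make_pipeline(PolynomialFeatures(degree=2),
                         LinearRegression()).fit(X, y)

# The line cannot capture the curvature, so its R^2 stays near zero.
print("linear    R^2:", round(too_simple.score(X, y), 2))
print("quadratic R^2:", round(powerful.score(X, y), 2))
```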