9 Steps for solving any machine learning problem
Ibrahim Sobh - PhD
Senior Expert of Artificial Intelligence, Valeo Group | LinkedIn Top Voice | Machine Learning | Deep Learning | Data Science | Computer Vision | NLP | Developer | Researcher | Lecturer
In this article, we will present a universal blueprint that we can use to attack and solve any machine-learning problem.
1) Defining the problem
First, you must define the problem. What is your input, and what are you trying to predict? For example, you can learn to classify images only if you have an annotated dataset; the same holds for predicting the sentiment of movie reviews. What type of problem are you facing? Depending on the application, it may be binary classification, multiclass classification, multilabel classification, scalar regression, or vector regression. It may also be something else entirely, such as unsupervised learning (for example, clustering) or reinforcement learning.
One class of unsolvable problems we should be aware of is nonstationary problems.
For example, using machine learning trained on past data to predict the future is making the assumption that the future will behave like the past. That often isn’t the case.
2) Choosing a measure of success
To attain success, you must first define what success means to you. Are you looking for precision, recall, or customer retention rate? The success metric is what you want your model to maximize or minimize. It usually guides the choice of loss function, and it should be aligned with your higher-level business goals.
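As an illustration of two of the metrics named above, here is a minimal sketch that computes precision and recall for a binary problem. The function name `precision_recall` and the toy labels are assumptions made for this example; they do not come from the original article.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels, invented purely for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```

Precision penalizes false positives while recall penalizes false negatives, so which one you favor depends on which kind of error costs more in your application.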
3) Data Splitting
Split the available data into three sets: training, validation, and test. The model is trained on the training data, then evaluated and tuned on the validation data. Once the model is ready for prime time, you test it one final time on the test data.
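The three-way split can be sketched in a few lines of plain Python. The function name `train_val_test_split`, the 60/20/20 proportions, and the fixed seed are assumptions chosen for illustration, not values prescribed by the article.

```python
import random


def train_val_test_split(samples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle a copy of `samples` and cut it into three disjoint lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


data = list(range(100))  # stand-in for real samples
train, val, test = train_val_test_split(data)
print(len(train), len(val), len(test))
```

Random shuffling assumes that the order of the samples carries no meaning. For data ordered in time, you would more likely keep the order and take the most recent samples as the test set.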
4) Deciding on an evaluation protocol
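We must establish how progress will be measured. Three evaluation protocols are common: maintaining a hold-out validation set, K-fold cross-validation, and iterated K-fold validation with shuffling.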
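As an illustration of one widely used protocol, K-fold cross-validation, here is a minimal sketch. The helper names `k_fold_indices` and `cross_validate` are hypothetical, and `fake_score` is a stand-in for "train a model on the training indices and score it on the validation indices".

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for each of k contiguous folds."""
    fold_size = n_samples // k
    for fold in range(k):
        start = fold * fold_size
        # The last fold absorbs any remainder so every sample is validated once.
        stop = n_samples if fold == k - 1 else start + fold_size
        val_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, val_idx


def cross_validate(train_and_score, n_samples, k=4):
    """Average the validation score over the k folds."""
    scores = [train_and_score(tr, va) for tr, va in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# Hypothetical scorer: returns the validation share instead of a real model score.
def fake_score(train_idx, val_idx):
    return len(val_idx) / (len(train_idx) + len(val_idx))


mean_score = cross_validate(fake_score, n_samples=10, k=4)
print(mean_score)
```

Each sample is used for validation exactly once, which makes this protocol attractive when there is too little data for a single hold-out set to be representative.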
5) Data Preparation
6) Developing a model that does better than a baseline
Develop a model that has statistical power
The goal at this stage is to achieve statistical power: develop a small model that is capable of beating a dumb (random) baseline. In the MNIST digit-classification example, which has 10 classes, anything that achieves an accuracy greater than 0.1 can be said to have statistical power. It's not always possible to achieve statistical power. The reason can be that the answer to the question you're asking isn't present in the input data.
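To make the random baseline concrete, the sketch below guesses uniformly among 10 classes on balanced synthetic labels and measures the resulting accuracy. The function names, the synthetic labels, and the `margin` used in `has_statistical_power` are assumptions for illustration; the article itself only states the 0.1 threshold.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def random_baseline_accuracy(y_true, classes, seed=0):
    """Accuracy of a dummy model that guesses a class uniformly at random."""
    rng = random.Random(seed)
    guesses = [rng.choice(classes) for _ in y_true]
    return accuracy(y_true, guesses)


def has_statistical_power(model_accuracy, baseline_accuracy, margin=0.02):
    """True if the model beats the baseline by more than `margin` (assumed value)."""
    return model_accuracy > baseline_accuracy + margin


classes = list(range(10))
labels = [i % 10 for i in range(10_000)]  # balanced synthetic labels, 10 classes
baseline = random_baseline_accuracy(labels, classes)
print(baseline)  # close to 0.1 for 10 balanced classes
```

On imbalanced data, always predicting the most frequent class is usually a stronger dummy baseline than uniform guessing, so it is worth comparing against both.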
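Three key choices to build your first working model: the last-layer activation, the loss function, and the optimization configuration (the optimizer and its learning rate).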
7) Developing a model that overfits
Once we obtain a model that has statistical power, the question becomes: is the model sufficiently powerful? To figure out how big a model we need, we must develop a model that overfits by adding layers, making the layers bigger, and training for more epochs. When we see that the model's performance on the validation data begins to degrade while its performance on the training data is still improving, we have achieved overfitting.
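One simple way to spot this point is to look for the epoch after which the validation loss stops improving. The sketch below is a simplification that inspects only the validation loss; the function name `overfitting_epoch`, the `patience` parameter, and the loss values are assumptions invented for the example.

```python
def overfitting_epoch(val_losses, patience=2):
    """Return the 1-based epoch with the best validation loss once the loss has
    failed to improve for `patience` consecutive epochs; return None otherwise."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            return best_epoch  # validation loss has been degrading since here
    return None


# Invented validation losses: they fall until epoch 4, then rise again.
val_losses = [0.95, 0.70, 0.58, 0.55, 0.57, 0.61, 0.66]
best = overfitting_epoch(val_losses)
print(best)
```

If the function returns None, the model has not yet started to overfit within the epochs observed, which suggests it could be made larger or trained for longer.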
8) Regularizing the model and tuning the hyperparameters
This step will take the most time: repeatedly modify your model, train it, evaluate it on the validation data, and modify it again, until the model is as good as it can get. These are some things you should try: add dropout, add or remove layers, add L1 and/or L2 regularization, adjust the learning rate of the optimizer, and so on.
Every time you use feedback from your validation process to tune your model, you leak information about the validation process into the model.
Repeated just a few times, this is safe; but done systematically over many iterations, it will eventually cause your model to overfit to the validation process (even though no model is directly trained on any of the validation data). This makes the evaluation process less reliable.
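The modify, train, and evaluate loop can be sketched as a plain grid search over a few of the settings mentioned above. Everything here is illustrative: `grid_search` is a hypothetical helper, the candidate values are arbitrary, and `fake_validation_score` stands in for actually training a model and scoring it on the validation data.

```python
from itertools import product


def grid_search(evaluate, search_space):
    """Try every combination in `search_space`; return (best_config, best_score)."""
    names = sorted(search_space)
    best_config, best_score = None, float("-inf")
    for values in product(*(search_space[name] for name in names)):
        config = dict(zip(names, values))
        score = evaluate(config)  # validation score, higher is better
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Arbitrary candidate values, chosen only for illustration.
search_space = {
    "dropout": [0.0, 0.3, 0.5],
    "l2": [0.0, 1e-4, 1e-3],
    "learning_rate": [1e-2, 1e-3],
}


# Hypothetical stand-in for "train with this config, score on validation data".
def fake_validation_score(config):
    return (1.0
            - abs(config["dropout"] - 0.3)
            - 100 * abs(config["l2"] - 1e-4)
            - 10 * abs(config["learning_rate"] - 1e-3))


best_config, best_score = grid_search(fake_validation_score, search_space)
print(best_config, best_score)
```

This grid consults the validation data 18 times, once per configuration. Each consultation leaks a little information, which is why the final judgment should come from the untouched test set.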
9) Testing and production
Once we have a satisfactory model configuration, we train the final production model on all the available data (training and validation) and evaluate it one last time on the never-seen test set.
If it turns out that performance on the test set is significantly worse than the performance measured on the validation data, this may mean either that your validation procedure wasn't reliable after all, or that you began overfitting to the validation data while tuning the model.
In this case, you may want to review the model tuning process and/or switch to a more reliable evaluation protocol (such as iterated K-fold validation).
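Iterated K-fold validation repeats K-fold several times, reshuffling the data before each pass, and averages all the scores. The sketch below is a minimal illustration under assumed names: `iterated_k_fold` is a hypothetical helper and `fake_score` replaces real training and scoring.

```python
import random


def iterated_k_fold(train_and_score, samples, k=4, iterations=3, seed=0):
    """Average validation scores over `iterations` shuffled K-fold passes.
    Returns (mean_score, number_of_models_trained)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(iterations):
        shuffled = list(samples)
        rng.shuffle(shuffled)  # a new random partition for every pass
        folds = [shuffled[i::k] for i in range(k)]
        for i, val_fold in enumerate(folds):
            train_part = [x for j, fold in enumerate(folds) if j != i for x in fold]
            scores.append(train_and_score(train_part, val_fold))
    return sum(scores) / len(scores), len(scores)


# Hypothetical scorer: fraction of even numbers in the validation fold.
def fake_score(train_part, val_fold):
    return sum(x % 2 == 0 for x in val_fold) / len(val_fold)


mean_score, n_models = iterated_k_fold(fake_score, list(range(20)))
print(mean_score, n_models)
```

The extra reliability comes at a price: with `iterations` passes of `k` folds, you train and evaluate `iterations * k` models, which is 12 in this example.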
That's it :)
This article is a summary of Chapter 4, "Fundamentals of machine learning", of: Chollet, François. Deep Learning with Python. 2017. François Chollet works on deep learning at Google. He is the creator of the Keras deep-learning library, as well as a contributor to the TensorFlow machine learning framework. He also does deep-learning research, with a focus on computer vision and the application of machine learning to formal reasoning.
Another useful blog post is "A Recipe for Training Neural Networks" by Andrej Karpathy.
Best Regards