9 Steps for solving any machine learning problem

In this article, we will present a universal blueprint that we can use to attack and solve any machine-learning problem.

1) Defining the problem

First, you must define the problem. What is your input, and what are you trying to predict? For example, you can learn to classify images only if you have an annotated data set; the same goes for predicting the sentiment of movie reviews. What type of problem are you facing? Depending on the application, it could be binary classification, multiclass classification, multilabel classification, scalar or vector regression, unsupervised learning such as clustering, or reinforcement learning.

One class of unsolvable problems we should be aware of is nonstationary problems.

For example, using machine learning trained on past data to predict the future is making the assumption that the future will behave like the past. That often isn’t the case.

2) Choosing a measure of success

To attain success, you must first define what success means to you. Are you optimizing for precision, recall, or customer retention rate? The choice of loss function (what the model minimizes during training) usually follows from the success metric, which in turn should be aligned with the higher-level business goals.
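As a small illustration of why the metric matters, precision and recall can disagree sharply on the same predictions. The labels and predictions below are made up for the example:

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0]  # a conservative model: predicts few positives
p, r = precision_recall(y_true, y_pred)
# Perfect precision (1.0) but poor recall (1/3): which one "success" means
# depends entirely on the business goal.
```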

3) Data Splitting

Split the available data into three sets: training, validation, and test. The model is trained on the training data and evaluated and tuned on the validation data. Once the model is ready for prime time, you test it one final time on the test data.
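As a sketch, assuming the data fits in memory and that the split fractions (70/20/10 here) are up to you, a simple shuffled split might look like:

```python
import random

def split_data(samples, val_frac=0.2, test_frac=0.1, seed=42):
    """Shuffle and split samples into train/validation/test sets.
    The fractions are illustrative, not prescriptive."""
    samples = samples[:]                      # avoid mutating the caller's list
    random.Random(seed).shuffle(samples)      # fixed seed for reproducibility
    n = len(samples)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = samples[:n_test]
    val = samples[n_test:n_test + n_val]
    train = samples[n_test + n_val:]
    return train, val, test

train, val, test = split_data(list(range(100)))
# 70 training, 20 validation, 10 test samples, with no overlap between sets
```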

4) Deciding on an evaluation protocol

We must establish how progress will be measured, usually with one of three common evaluation protocols:

  • Maintaining a hold-out validation set: The way to go when you have plenty of data
  • Doing K-fold cross-validation: The right choice when you have too few samples for hold-out validation to be reliable
  • Doing iterated K-fold validation: For performing highly accurate model evaluation when little data is available
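As an illustration of the second protocol, K-fold splitting can be sketched in plain Python; the index helper below is a hypothetical minimal version, not taken from the book:

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for each of k folds.
    Every sample appears in exactly one validation fold."""
    # Distribute any remainder across the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(10, 3))
# Train a model per fold and average the k validation scores.
```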

5) Data Preparation

  • Data should be formatted as vectors (tensors) for neural networks
  • The values should be scaled to a small range: for example, [-1, 1] or [0, 1]
  • For heterogeneous data, where different features take values in different ranges, the data should be normalized (zero mean, unit variance).
  • If needed, conduct feature engineering, especially for small-data problems
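A minimal NumPy sketch of the normalization step, using synthetic data; note that the mean and standard deviation are computed on the training data only and then reused for the test data, so no test-set information leaks into preprocessing:

```python
import numpy as np

# Synthetic heterogeneous-ish data for illustration only
rng = np.random.default_rng(0)
x_train = rng.normal(loc=50.0, scale=10.0, size=(1000, 3))
x_test = rng.normal(loc=50.0, scale=10.0, size=(100, 3))

# Per-feature statistics from the TRAINING set only
mean = x_train.mean(axis=0)
std = x_train.std(axis=0)

x_train_norm = (x_train - mean) / std
x_test_norm = (x_test - mean) / std  # reuse training statistics
```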

6) Developing a model that does better than a baseline

Develop a model that has statistical power

Achieve statistical power: develop a small model that is capable of beating a dummy (random) baseline. In the MNIST 10-digit classification example, anything that achieves an accuracy greater than 0.1 can be said to have statistical power. It isn't always possible to achieve statistical power; the reason may be that the answer to the question you're asking isn't present in the input data.
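The dummy baseline itself is easy to compute: it is the accuracy of always predicting the most frequent class. A sketch with hypothetical, perfectly balanced labels:

```python
from collections import Counter

def dummy_baseline_accuracy(y):
    """Accuracy of a model that always predicts the most frequent class."""
    most_common_count = Counter(y).most_common(1)[0][1]
    return most_common_count / len(y)

# Hypothetical balanced 10-class labels, as in MNIST
labels = [i % 10 for i in range(1000)]
baseline = dummy_baseline_accuracy(labels)
# For balanced 10-class data the baseline is 0.1; any model scoring above
# it has (some) statistical power.
```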

Three key choices to build your first working model:

  1. Last-layer activation: for instance, sigmoid is used for IMDB binary classification; for general regression, no last-layer activation is used.
  2. Loss function: for instance, binary_crossentropy for IMDB, and mean squared error (MSE) for regression.
  3. Optimization configuration: optimizer and learning rate, for example, rmsprop or Adam with the default learning rate.
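To make the first two choices concrete, here is a minimal pure-NumPy sketch (with made-up logits and labels) of how the sigmoid last-layer activation and the binary cross-entropy loss pair up for a binary-classification problem like IMDB:

```python
import numpy as np

def sigmoid(z):
    """Squash raw logits into probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; eps-clipping avoids log(0)."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_prob)
                          + (1 - y_true) * np.log(1 - y_prob)))

logits = np.array([2.0, -1.0, 0.5])          # hypothetical model outputs
probs = sigmoid(logits)                      # probabilities of the positive class
loss = binary_crossentropy(np.array([1.0, 0.0, 1.0]), probs)
```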

7) Developing a model that overfits

Once we obtain a model that has statistical power, the question becomes: is your model sufficiently powerful? To figure out how big a model we need, we must develop a model that overfits, by adding layers, making the layers bigger, and training for more epochs. When the model's performance on the validation data begins to degrade while its performance on the training data is still improving, we have achieved overfitting.
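One way to spot that turning point programmatically is to scan the validation-loss curve for the first epoch where it stops improving. The helper and the loss values below are illustrative, not from the book:

```python
def first_overfit_epoch(val_losses, patience=1):
    """Return the index of the first epoch where validation loss has failed
    to improve for `patience` consecutive epochs; -1 if it never degrades."""
    best = float("inf")
    worse_streak = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            worse_streak = 0
        else:
            worse_streak += 1
            if worse_streak >= patience:
                return epoch
    return -1

# Hypothetical validation-loss curve: improves, then starts to degrade
val_losses = [0.9, 0.7, 0.55, 0.5, 0.52, 0.6]
epoch = first_overfit_epoch(val_losses, patience=1)  # degradation starts at epoch 4
```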

8) Regularizing the model and tuning the hyperparameters

This step will take the most time: repeatedly modify your model, train it, evaluate it on the validation data, modify it again, and repeat until the model is as good as it can get. These are some things you should try: adding dropout, adding or removing layers, L1 and/or L2 regularization, tuning the learning rate of the optimizer, and so on.
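As one concrete example among these options, L2 regularization adds a penalty proportional to the sum of squared weights to the loss, pushing the weights toward small values. A minimal sketch with made-up weights and an illustrative regularization strength:

```python
import numpy as np

def l2_regularized_loss(base_loss, weights, lam=0.01):
    """Add an L2 penalty lam * sum(w^2) over all weight arrays to the loss."""
    penalty = lam * sum(float(np.sum(w ** 2)) for w in weights)
    return base_loss + penalty

weights = [np.array([1.0, -2.0]), np.array([0.5])]   # hypothetical model weights
loss = l2_regularized_loss(0.3, weights, lam=0.01)
# 0.3 + 0.01 * (1 + 4 + 0.25) = 0.3525: larger weights mean a larger penalty
```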

Every time you use feedback from your validation process to tune your model, you leak information about the validation process into the model.

Repeated just a few times, this is safe; but done systematically over many iterations, it will eventually cause your model to overfit to the validation process (even though no model is directly trained on any of the validation data). This makes the evaluation process less reliable.

9) Testing and production

Once we have a satisfactory model configuration, we train the final production model on all the available data (training and validation) and evaluate it one last time on the never-seen test set.

If it turns out that performance on the test set is significantly worse than the performance measured on the validation data, this may mean that:

  1. The validation procedure wasn’t reliable
  2. The model began overfitting to the validation data while you were tuning the hyperparameters.

In this case, you may want to revisit the model-tuning process and/or switch to a more reliable evaluation protocol (such as iterated K-fold validation).


That's it :)


This article is a summary of Chapter 4, "Fundamentals of machine learning," of Deep Learning with Python by François Chollet (2017). François Chollet works on deep learning at Google. He is the creator of the Keras deep-learning library, as well as a contributor to the TensorFlow machine learning framework. He also does deep-learning research, with a focus on computer vision and the application of machine learning to formal reasoning.

Another useful read is the blog post "A Recipe for Training Neural Networks" by Andrej Karpathy.

