9 Steps for solving any machine learning problem

In this article, we will present a universal blueprint that we can use to attack and solve any machine-learning problem.

1) Defining the problem

First, you must define the problem. What is your input, and what are you trying to predict? For example, you can learn to classify images only if you have an annotated data set; the same goes for predicting the sentiment of movie reviews. What type of problem are you facing? Depending on the application, it could be binary classification, multiclass classification, multilabel classification, scalar or vector regression, unsupervised learning such as clustering, or reinforcement learning.

One class of unsolvable problems we should be aware of is nonstationary problems.

For example, using machine learning trained on past data to predict the future is making the assumption that the future will behave like the past. That often isn’t the case.

2) Choosing a measure of success

To attain success, you must first define what success means to you. Are you optimizing for precision, recall, or customer retention rate? The choice of loss function (what the model minimizes during training) usually follows from the success metric, which in turn should be aligned with the higher-level business goals.
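As a small illustration of why the metric matters, precision and recall can disagree sharply on the same predictions. The labels and predictions below are made up for the example:

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0]  # a conservative model: predicts few positives
p, r = precision_recall(y_true, y_pred)
# Perfect precision (1.0) but poor recall (1/3): which one "success" means
# depends entirely on the business goal.
```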

3) Data Splitting

Split the available data into three sets: training, validation, and test. The model is trained on the training data and evaluated and tuned on the validation data. Once the model is ready for prime time, you test it one final time on the test data.
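As a sketch, assuming the data fits in memory and that the split fractions (70/20/10 here) are up to you, a simple shuffled split might look like:

```python
import random

def split_data(samples, val_frac=0.2, test_frac=0.1, seed=42):
    """Shuffle and split samples into train/validation/test sets.
    The fractions are illustrative, not prescriptive."""
    samples = samples[:]                      # avoid mutating the caller's list
    random.Random(seed).shuffle(samples)      # fixed seed for reproducibility
    n = len(samples)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = samples[:n_test]
    val = samples[n_test:n_test + n_val]
    train = samples[n_test + n_val:]
    return train, val, test

train, val, test = split_data(list(range(100)))
# 70 training, 20 validation, 10 test samples, with no overlap between sets
```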

4) Deciding on an evaluation protocol

We must establish how progress will be measured, usually with one of three common evaluation protocols:

  • Maintaining a hold-out validation set: The way to go when you have plenty of data
  • Doing K-fold cross-validation: The right choice when you have too few samples for hold-out validation to be reliable
  • Doing iterated K-fold validation: For performing highly accurate model evaluation when little data is available
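As an illustration of the second protocol, K-fold splitting can be sketched in plain Python; the index helper below is a hypothetical minimal version, not taken from the book:

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for each of k folds.
    Every sample appears in exactly one validation fold."""
    # Distribute any remainder across the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(10, 3))
# Train a model per fold and average the k validation scores.
```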

5) Data Preparation

  • Data should be formatted as vectors (tensors) for neural networks
  • The values should be scaled to a small range: for example, [-1, 1] or [0, 1]
  • For heterogeneous data, where different features take values in different ranges, the data should be normalized (zero mean, unit variance).
  • If needed, conduct feature engineering, especially for small-data problems
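A minimal NumPy sketch of the normalization step, using synthetic data; note that the mean and standard deviation are computed on the training data only and then reused for the test data, so no test-set information leaks into preprocessing:

```python
import numpy as np

# Synthetic heterogeneous-ish data for illustration only
rng = np.random.default_rng(0)
x_train = rng.normal(loc=50.0, scale=10.0, size=(1000, 3))
x_test = rng.normal(loc=50.0, scale=10.0, size=(100, 3))

# Per-feature statistics from the TRAINING set only
mean = x_train.mean(axis=0)
std = x_train.std(axis=0)

x_train_norm = (x_train - mean) / std
x_test_norm = (x_test - mean) / std  # reuse training statistics
```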

6) Developing a model that does better than a baseline

Develop a model that has statistical power

Achieve statistical power: develop a small model that is capable of beating a dummy (random) baseline. In the MNIST 10-digit classification example, anything that achieves an accuracy greater than 0.1 can be said to have statistical power. It isn't always possible to achieve statistical power; the reason may be that the answer to the question you're asking isn't present in the input data.
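The dummy baseline itself is easy to compute: it is the accuracy of always predicting the most frequent class. A sketch with hypothetical, perfectly balanced labels:

```python
from collections import Counter

def dummy_baseline_accuracy(y):
    """Accuracy of a model that always predicts the most frequent class."""
    most_common_count = Counter(y).most_common(1)[0][1]
    return most_common_count / len(y)

# Hypothetical balanced 10-class labels, as in MNIST
labels = [i % 10 for i in range(1000)]
baseline = dummy_baseline_accuracy(labels)
# For balanced 10-class data the baseline is 0.1; any model scoring above
# it has (some) statistical power.
```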

Three key choices to build your first working model:

  1. Last-layer activation: for instance, sigmoid is used for IMDB binary classification; for general regression, no last-layer activation is used.
  2. Loss function: for instance, binary_crossentropy for IMDB, and mean squared error (MSE) for regression.
  3. Optimization configuration: optimizer and learning rate, for example, rmsprop or Adam with the default learning rate.
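To make the first two choices concrete, here is a minimal pure-NumPy sketch (with made-up logits and labels) of how the sigmoid last-layer activation and the binary cross-entropy loss pair up for a binary-classification problem like IMDB:

```python
import numpy as np

def sigmoid(z):
    """Squash raw logits into probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; eps-clipping avoids log(0)."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_prob)
                          + (1 - y_true) * np.log(1 - y_prob)))

logits = np.array([2.0, -1.0, 0.5])          # hypothetical model outputs
probs = sigmoid(logits)                      # probabilities of the positive class
loss = binary_crossentropy(np.array([1.0, 0.0, 1.0]), probs)
```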

7) Developing a model that overfits

Once we obtain a model that has statistical power, the question becomes: is your model sufficiently powerful? To figure out how big a model we need, we must develop a model that overfits, by adding layers, making the layers bigger, and training for more epochs. When the model's performance on the validation data begins to degrade while its performance on the training data is still improving, we have achieved overfitting.
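One way to spot that turning point programmatically is to scan the validation-loss curve for the first epoch where it stops improving. The helper and the loss values below are illustrative, not from the book:

```python
def first_overfit_epoch(val_losses, patience=1):
    """Return the index of the first epoch where validation loss has failed
    to improve for `patience` consecutive epochs; -1 if it never degrades."""
    best = float("inf")
    worse_streak = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            worse_streak = 0
        else:
            worse_streak += 1
            if worse_streak >= patience:
                return epoch
    return -1

# Hypothetical validation-loss curve: improves, then starts to degrade
val_losses = [0.9, 0.7, 0.55, 0.5, 0.52, 0.6]
epoch = first_overfit_epoch(val_losses, patience=1)  # degradation starts at epoch 4
```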

8) Regularizing the model and tuning the hyperparameters

This step will take the most time: repeatedly modify your model, train it, evaluate it on the validation data, modify it again, and repeat until the model is as good as it can get. These are some things you should try: adding dropout, adding or removing layers, L1 and/or L2 regularization, tuning the learning rate of the optimizer, and so on.
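As one concrete example among these options, L2 regularization adds a penalty proportional to the sum of squared weights to the loss, pushing the weights toward small values. A minimal sketch with made-up weights and an illustrative regularization strength:

```python
import numpy as np

def l2_regularized_loss(base_loss, weights, lam=0.01):
    """Add an L2 penalty lam * sum(w^2) over all weight arrays to the loss."""
    penalty = lam * sum(float(np.sum(w ** 2)) for w in weights)
    return base_loss + penalty

weights = [np.array([1.0, -2.0]), np.array([0.5])]   # hypothetical model weights
loss = l2_regularized_loss(0.3, weights, lam=0.01)
# 0.3 + 0.01 * (1 + 4 + 0.25) = 0.3525: larger weights mean a larger penalty
```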

Every time you use feedback from your validation process to tune your model, you leak information about the validation process into the model.

Repeated just a few times, this is safe; but done systematically over many iterations, it will eventually cause your model to overfit to the validation process (even though no model is directly trained on any of the validation data). This makes the evaluation process less reliable.

9) Testing and production

Once we have a satisfactory model configuration, we train the final production model on all the available data (training and validation) and evaluate it one last time on the never-seen test set.

If it turns out that performance on the test set is significantly worse than the performance measured on the validation data, this may mean that:

  1. The validation procedure wasn’t reliable
  2. The model began overfitting to the validation data while you were tuning the hyperparameters.

In this case, you may want to revisit the model-tuning process and/or switch to a more reliable evaluation protocol (such as iterated K-fold validation).


That's it :)


This article is a summary of Chapter 4, "Fundamentals of machine learning," of Deep Learning with Python by François Chollet (2017). François Chollet works on deep learning at Google. He is the creator of the Keras deep-learning library, as well as a contributor to the TensorFlow machine learning framework. He also does deep-learning research, with a focus on computer vision and the application of machine learning to formal reasoning.

Another useful read is the blog post "A Recipe for Training Neural Networks" by Andrej Karpathy.

