How to Approach Model Optimization for AutoML

Since I started my career in machine learning, I have worked hard to automate every aspect of my work. If I couldn't produce a fully production-ready machine learning model at the click of a button, I felt I was doing something wrong! I find it funny how you can recognize a senior machine learning engineer by how little they work to achieve the same results as a junior engineer working ten times as hard!

AutoML has always been a subject dear to my heart, and I wanted to talk today about how we should approach the model optimization problem from an automation standpoint. I want to address the different angles we should consider when building an AutoML pipeline and show you a basic example of a pipeline.


Make sure to watch the video.

The optimization space

Model selection is the component of the pipeline that involves the ML algorithmic choices. When we talk about “model selection”, we mean searching for the optimal model for a specific training dataset. If we have features X and a target Y, we would like to learn the optimal transformation F from the data:

Y = F(X)

The term “optimal” implies we have a model performance metric, and the “optimal” model is the one that maximizes that metric.
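As a concrete illustration, here is a minimal sketch of what “maximizing that metric” looks like in practice, assuming a scikit-learn-style workflow (the dataset and candidate models are purely illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Illustrative data; in practice X and y come from your training set.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# "Optimal" is only defined relative to a metric, here cross-validated accuracy.
candidates = {
    "naive_bayes": GaussianNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
}
scores = {name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
          for name, model in candidates.items()}

# The "optimal" model F is the one that maximizes the metric.
best_name = max(scores, key=scores.get)
print(scores, "->", best_name)
```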

There are different axes we can consider to optimize our model:

  • The model parameter space: this is the “space” we optimize when we “train” a model through statistical learning. The parameters are learned using an optimization principle such as maximum likelihood estimation.

  • The model paradigm space: many supervised learning algorithms could be used to solve the same problem. Algorithms like Naive Bayes, XGBoost, or neural networks can perform very differently depending on the specific dataset.

  • The hyperparameter space: these are the settings we cannot optimize through statistical learning; they are choices we need to make to set up the training run (see the sketch after this list).

  • The model architecture space: this is mostly relevant for neural networks. The model architecture can be characterized by a set of hyperparameters, but it tends to be a more complex search than typical hyperparameter tuning. The search space can contain as many as 10^40 candidate architectures.

  • The feature space: we also need to select the right features to feed to our model. Different models will react differently depending on the features we use. Too many features and we may overfit; too few features and we may underfit.

  • The feature transformation space: we could consider many transformations to improve our model's performance, such as feature encoding or a Box-Cox transformation.
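To make the difference between the first and third axes concrete, here is a minimal sketch assuming a scikit-learn workflow (the dataset and search ranges are illustrative): the hyperparameters are chosen by an outer search, while the parameters are learned inside each fit() call.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Hyperparameter space: choices made BEFORE training; they cannot be learned
# through statistical learning, so we search over them explicitly.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": [50, 100, 200],
                         "max_depth": [3, 5, 10, None]},
    n_iter=8, cv=3, random_state=0,
)

# Parameter space: the split variables and thresholds inside each tree,
# learned from the data by every fit() call during the search.
search.fit(X, y)
print(search.best_params_)
```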

The optimization strategies

Considering the complexity of those different subspaces, it is often impractical to attempt to solve the problem exactly, and we need to find ways to select a suitable model quickly.

Optimizing in sequence

The typical model optimization strategy is to optimize each axis separately, in sequence. Modularizing the different optimization problems makes it easier for multiple people or teams to work on different aspects without stepping on each other's toes.

A typical sequence of steps is as follows (a minimal end-to-end sketch follows the list):

  • Optimizing the feature transformation space: this allows the potential injection of new features before selecting the right feature space.
  • Optimizing the feature space: now that the features are “better” because of the previous step, we can select the best feature subset.
  • Optimizing the model paradigm space: now that we have the “right” data, we can choose the right model.
  • Optimizing the model architecture space: if the model paradigm chosen in the previous step is a neural network, we need to optimize its architecture. Depending on how much flexibility we allow in the search, it is often easier to optimize the architecture first, independently from the hyperparameter search.
  • Optimizing the hyperparameter space: once we have a model paradigm and its architecture, it becomes easier to focus on the hyperparameters.
  • Optimizing the model parameter space: training the final model is the last step.
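
Below is a minimal end-to-end sketch of this sequence, assuming a scikit-learn workflow. The dataset, transformations, and candidate models are illustrative, and the architecture step is skipped since neither candidate is a neural network:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=6,
                           random_state=0)

# Steps 1-2: feature transformation (Yeo-Johnson, a Box-Cox-style power
# transform), then feature selection; both are frozen before the model search.
prep = Pipeline([("transform", PowerTransformer()),
                 ("select", SelectKBest(f_classif, k=8))])
X_prep = prep.fit_transform(X, y)

# Step 3: model paradigm selection, evaluated on the prepared features.
paradigms = {"logistic": LogisticRegression(max_iter=1000),
             "tree": DecisionTreeClassifier(random_state=0)}
best_name = max(paradigms,
                key=lambda n: cross_val_score(paradigms[n], X_prep, y, cv=5).mean())

# Step 5: hyperparameter search for the chosen paradigm only.
grids = {"logistic": {"C": [0.01, 0.1, 1.0, 10.0]},
         "tree": {"max_depth": [3, 5, 10, None]}}
search = GridSearchCV(paradigms[best_name], grids[best_name], cv=5)

# Step 6: the final fit() optimizes the model parameter space.
search.fit(X_prep, y)
print(best_name, search.best_params_)
```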

Joint-optimization

Optimizing in sequence will result in a suboptimal model because we are approximating the search. The feature selection module will select the “best” features in general, and the model paradigm module will then determine the best model paradigm given the features chosen in the previous step. However, it is entirely possible that another model paradigm would have performed better had different features been selected.

Therefore, we could consider jointly optimizing different axes together. For example, it is not uncommon to jointly optimize the feature space and the architecture space.

As always with optimization problems, there is a balance between search accuracy and computational complexity. Each optimization space has a specific dimension. If there are N possible feature sets and M possible architectures, we need to search an N x M overall space to optimize both axes jointly. However, if we optimize in sequence, the search complexity is only N + M. If, for example, we have 1M possible feature sets and 1M possible architectures, N x M = 10^12 while N + M = 2 x 10^6. This means it would take 500,000 times (= 10^12 / (2 x 10^6)) longer to find the exact optimal feature-architecture pair than an approximate one.
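A quick sanity check of that arithmetic in code, using the numbers from the example above:

```python
# N feature sets, M architectures, as in the example above.
N = M = 1_000_000

joint = N * M           # evaluate every (feature set, architecture) pair
sequential = N + M      # evaluate each axis once, in sequence

print(joint)                 # 1000000000000 = 10^12
print(sequential)            # 2000000 = 2 x 10^6
print(joint // sequential)   # 500000: exact joint search costs ~500K times more
```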

Many optimization processes have an iterative implementation, and we can use this to design pseudo-joint-optimization processes. For example, Recursive Feature Elimination (RFE) is a typical feature selection technique, while evolutionary algorithms such as genetic algorithms (GA) are often used for architecture search. Both methods are iterative and converge slowly toward an optimal solution.

We could merge those iterative processes to obtain a pseudo-joint optimization.

The search is still approximate (and therefore fast), but the feature search now considers the results from the architecture search, and vice versa.
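Here is a minimal sketch of such a pseudo-joint loop, assuming a scikit-learn setup. To keep it short, the architecture space is collapsed to a single hidden-layer width (so a toy mutate-and-keep step stands in for a full genetic algorithm), and the elimination step is wrapper-style: drop the feature whose removal hurts the cross-validated score least. Everything here is illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=0)

def score(mask, width):
    """Cross-validated score of a width-`width` MLP on the masked features."""
    model = MLPClassifier(hidden_layer_sizes=(width,), max_iter=500,
                          random_state=0)
    return cross_val_score(model, X[:, mask], y, cv=3).mean()

mask = np.ones(X.shape[1], dtype=bool)  # feature search state: start with all
width = 16                              # architecture search state
best = score(mask, width)

for step in range(5):
    # (a) One RFE-style iteration: given the CURRENT architecture, drop the
    # feature whose removal costs the least (>= keeps pruning on ties).
    if mask.sum() > 3:
        trials = []
        for i in np.where(mask)[0]:
            trial = mask.copy()
            trial[i] = False
            trials.append((score(trial, width), trial))
        s, trial = max(trials, key=lambda t: t[0])
        if s >= best:
            mask, best = trial, s

    # (b) One GA-style iteration: given the CURRENT feature subset, mutate
    # the architecture and keep the mutation only if it improves the score.
    new_width = max(4, int(width * rng.choice([0.5, 2.0])))
    s = score(mask, new_width)
    if s > best:
        width, best = new_width, s

print(f"{mask.sum()} features kept, width={width}, CV score={best:.3f}")
```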

Watch the video for more information.

