Machine Learning Blog – 8
Mahtab Syed
Data and AI Leader | AI Solutions | Cloud Architecture(Azure, GCP, AWS) | Data Engineering, Generative AI, Artificial Intelligence, Machine Learning and MLOps Programs | Coding and Kaggle
Multi-Layer Stacking Ensemble and Optuna Hyperparameter Tuning
In this blog I illustrate, and link to the code for, a Multi-Layer Stacking Ensemble (also called Model Stacking) applied to a Kaggle competition problem. It took me some time to grasp this concept, so I will explain it with a simple diagram in the hope that it is easier to follow. And, as usual, the best way to learn is to write the code and try it yourself.
Input data and output predictions
1. We are given a training set with a few features (x) and a label (y).
a. Divide this training set into a training set (xtrain, ytrain) and a validation set (xvalid, yvalid).
b. Train a model on xtrain, ytrain and validate against xvalid, yvalid to get preds_valid.
c. For good cross-validation, the best approach is to use folds, which generate a different xtrain, ytrain and xvalid, yvalid in each fold.
2. We are also given a test set with no label (y).
a. After we train and validate a model, we can predict the answers for the test set as test_preds.
b. We then submit test_preds to the competition.
3. For a given model and training set we need to optimise the hyperparameters to get the best results, for which I used Optuna hyperparameter tuning (a minimal sketch follows this list).
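To make step 3 concrete, here is a minimal Optuna tuning sketch for an XGBRegressor. The synthetic data, parameter search ranges and trial count are my stand-ins for illustration, not the exact settings from the competition notebook.

import optuna
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Synthetic data stands in for the competition's training features (x) and label (y)
x, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
xtrain, xvalid, ytrain, yvalid = train_test_split(x, y, test_size=0.2, random_state=42)

def objective(trial):
    # Search ranges are illustrative assumptions, not the blog's exact settings
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000),
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
    }
    model = XGBRegressor(**params, random_state=42)
    model.fit(xtrain, ytrain)
    preds_valid = model.predict(xvalid)
    return mean_squared_error(yvalid, preds_valid) ** 0.5  # validation RMSE

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print("Best params:", study.best_params)

Optuna then suggests new parameter combinations trial by trial, and study.best_params holds the best combination found, which is used to train the final model.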
Single-Layer Model – GitHub code
This is a simple single-model approach: train on xtrain, ytrain and validate against xvalid, yvalid across 5 folds, then generate test_preds and submit them to the competition.
XGBRegressor gave the best results.
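Below is a minimal sketch of this single-layer setup: 5 folds, one XGBRegressor per fold, and test predictions averaged across folds for the submission. The synthetic data and hyperparameters are placeholders rather than the exact ones in the linked code.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold
from xgboost import XGBRegressor

# Placeholders for the competition's train and test data
x, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
x_test, _ = make_regression(n_samples=200, n_features=10, noise=0.1, random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
test_preds = np.zeros(len(x_test))

for fold, (train_idx, valid_idx) in enumerate(kf.split(x)):
    xtrain, xvalid = x[train_idx], x[valid_idx]
    ytrain, yvalid = y[train_idx], y[valid_idx]

    model = XGBRegressor(n_estimators=500, learning_rate=0.05, random_state=42)
    model.fit(xtrain, ytrain)

    preds_valid = model.predict(xvalid)
    rmse = mean_squared_error(yvalid, preds_valid) ** 0.5
    print(f"Fold {fold}: RMSE = {rmse:.4f}")

    # Average each fold model's test predictions for the final submission
    test_preds += model.predict(x_test) / kf.n_splits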
Multi-Layer Ensemble (using multiple models) – GitHub code
The concept is to train multiple models on exactly the same training data. The key is that the models should be sufficiently diverse, meaning each model is good at some parts of the data and weaker at others. If we stack them, the combined result of the 5 models (a strong learner) can be better than any individual model (a weak learner).
On top of this we can have multiple layers, where the output of one layer serves as the input for the next.
It's best to use folds for cross-validation and do hyperparameter tuning at each layer.
Layer 0 – 5 models: XGBRegressor, LGBMRegressor, CatBoostRegressor, RandomForestRegressor, LinearRegression
Layer 1 – 3 models: XGBRegressor, LGBMRegressor, CatBoostRegressor
Layer 2 – final model: XGBRegressor
And in every layer I used Optuna hyperparameter tuning to get the best results.
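To make the data flow concrete, here is a minimal stacking sketch along these lines: each layer's out-of-fold (OOF) predictions become the training features of the next layer, and the matching averaged test predictions become that layer's test input. The models use default hyperparameters and synthetic placeholder data here, whereas the blog's code tunes every layer with Optuna.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from catboost import CatBoostRegressor

# Placeholders for the competition's train and test data
x, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
x_test, _ = make_regression(n_samples=200, n_features=10, noise=0.1, random_state=0)

def oof_predictions(models, x, y, x_test, n_splits=5):
    """Return OOF train predictions and fold-averaged test predictions per model."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=42)
    oof = np.zeros((len(x), len(models)))
    test = np.zeros((len(x_test), len(models)))
    for m, model in enumerate(models):
        for train_idx, valid_idx in kf.split(x):
            model.fit(x[train_idx], y[train_idx])
            oof[valid_idx, m] = model.predict(x[valid_idx])
            test[:, m] += model.predict(x_test) / n_splits
    return oof, test

# Layer 0: five diverse models trained on the original features
layer0 = [XGBRegressor(random_state=42), LGBMRegressor(random_state=42),
          CatBoostRegressor(verbose=0, random_state=42),
          RandomForestRegressor(random_state=42), LinearRegression()]
oof0, test0 = oof_predictions(layer0, x, y, x_test)

# Layer 1: three models trained on Layer 0's predictions
layer1 = [XGBRegressor(random_state=42), LGBMRegressor(random_state=42),
          CatBoostRegressor(verbose=0, random_state=42)]
oof1, test1 = oof_predictions(layer1, oof0, y, test0)

# Layer 2: a single final model produces the submission predictions
final_model = XGBRegressor(random_state=42)
final_model.fit(oof1, y)
test_preds = final_model.predict(test1)

Training each layer on out-of-fold predictions, rather than on predictions for data the lower-layer models have already seen, is what keeps the higher layers from simply memorising leaked labels.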
Mahtab Syed, Melbourne 20 Nov 2021
Acknowledgements:
Thanks Abhishek Thakur and Aurélien Géron – I think I understand Model Stacking now. I need to find more diverse models to get better results.