Machine Learning Blog – 8

Multi-Layer Stacking Ensemble and Optuna Hyperparameter Tuning

In this blog I will illustrate, and link to the code for, a Multi-Layer Stacking Ensemble (also called Model Stacking) applied to a problem from a Kaggle competition. It took me some time to grasp this concept, so I will explain it with a simple diagram, hoping that makes it easier to follow. And, as usual, the best way to learn is to write the code and try it.

Input data and output predictions

1. We are given a training set with a few features (x) and a label (y).

a. Divide this training set into a training set (xtrain, ytrain) and a validation set (xvalid, yvalid).

b. Train a model on xtrain, ytrain and validate against xvalid, yvalid to get preds_valid.

c. For good cross-validation, the best approach is to use folds, which generate a different xtrain, ytrain and xvalid, yvalid in each fold.

2. And we are given a test set with no label (y).

a. After we train and validate a model, we can predict the answers for the test set into test_preds.

b. We then submit test_preds to the competition.

3. For a given model and training set we need to optimize the hyperparameters to get the best results, for which I used Optuna Hyperparameter Tuning.
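The train/validate/predict flow above can be sketched with scikit-learn. This is a minimal sketch on synthetic data, not the competition code (which is linked below); the variable names follow the text:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 4))                                # features (x)
y = x @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=200)  # label (y)
x_test = rng.normal(size=(50, 4))                            # test set: no label

# 1a. divide the training set into train and validation parts
xtrain, xvalid, ytrain, yvalid = train_test_split(x, y, test_size=0.2, random_state=42)

# 1b. train on xtrain/ytrain, validate against xvalid/yvalid
model = LinearRegression().fit(xtrain, ytrain)
preds_valid = model.predict(xvalid)
print("validation RMSE:", mean_squared_error(yvalid, preds_valid) ** 0.5)

# 2a. predict the answers for the unlabelled test set
test_preds = model.predict(x_test)
```

Step 1c replaces the single `train_test_split` with a loop over folds, as the cross-validated examples further down show.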


Single-Layer Model – Github code

This is a simple single-model setup: train on xtrain, ytrain and validate against xvalid, yvalid across 5 folds, then generate test_preds and submit to the competition.

XGBRegressor gave the best results.
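A minimal sketch of that 5-fold loop, with the per-fold test predictions averaged for the submission. It uses scikit-learn's GradientBoostingRegressor as a stand-in for XGBRegressor (the mechanics are identical) and synthetic data in place of the competition data:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error

# synthetic stand-in: first 300 rows are the labelled training set,
# the last 100 play the role of the unlabelled test set
x_all, y_all = make_regression(n_samples=400, n_features=5, noise=0.1, random_state=0)
x, y = x_all[:300], y_all[:300]
x_test = x_all[300:]

kf = KFold(n_splits=5, shuffle=True, random_state=42)
fold_test_preds = []
for fold, (tr_idx, va_idx) in enumerate(kf.split(x)):
    xtrain, xvalid = x[tr_idx], x[va_idx]
    ytrain, yvalid = y[tr_idx], y[va_idx]
    model = GradientBoostingRegressor(random_state=0).fit(xtrain, ytrain)
    preds_valid = model.predict(xvalid)
    print(f"fold {fold}: RMSE {mean_squared_error(yvalid, preds_valid) ** 0.5:.3f}")
    fold_test_preds.append(model.predict(x_test))

# average the 5 per-fold predictions to build the submission
test_preds = np.mean(fold_test_preds, axis=0)
```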


Multi-Layer Ensemble (Using multiple Models) – Github code

The concept is to train multiple models on exactly the same training data. The key is that the models should be sufficiently diverse, meaning each model will be good on some of the data and poor on other parts. If we stack them, it is possible that the combined result of the 5 models (a strong learner) will be better than any individual model (a weak learner).

And on top of this we can have multiple layers, where the output of one layer serves as input for the next.

It's best to use folds for cross-validation and do hyperparameter tuning at each layer.
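One common way to wire a single stacking layer is with out-of-fold predictions: each base model predicts on the validation fold it never trained on, and those predictions become the features for the next layer. This is a minimal sketch with two scikit-learn base models standing in for the five used in the competition code:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold

x, y = make_regression(n_samples=300, n_features=5, noise=0.1, random_state=0)

base_models = [RandomForestRegressor(n_estimators=50, random_state=0),
               LinearRegression()]
kf = KFold(n_splits=5, shuffle=True, random_state=42)

# layer-0 output: one out-of-fold prediction column per base model
oof = np.zeros((len(x), len(base_models)))
for j, model in enumerate(base_models):
    for tr_idx, va_idx in kf.split(x):
        model.fit(x[tr_idx], y[tr_idx])
        oof[va_idx, j] = model.predict(x[va_idx])

# the next layer trains on layer 0's predictions, not the raw features
meta = Ridge().fit(oof, y)
print("stacked R^2:", meta.score(oof, y))
```

Adding more layers repeats the same pattern: the out-of-fold predictions of one layer become the training features of the next.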



Layer 0 - 5 models : XGBRegressor, LGBMRegressor, CatBoostRegressor, RandomForestRegressor, LinearRegression

Layer 1 - 3 models : XGBRegressor, LGBMRegressor, CatBoostRegressor

Layer 2 - final model : XGBRegressor

And in every layer I used Optuna Hyperparameter Tuning to get the best results.


TODO:

  • Layer 0's RandomForestRegressor and LinearRegression are not giving good results
  • Swap them for other models, hyper-tune, then rerun all layers and check the final score

Mahtab Syed, Melbourne 20 Nov 2021

Acknowledgements:

  1. Aurélien Géron – Book: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
  2. Abhishek Thakur – Videos on YouTube (https://www.youtube.com/c/AbhishekThakurAbhi) and Kaggle code (https://www.kaggle.com/abhishek)

