Machine Learning Blog – 9
Image credit - www.istockphoto.com

Machine Learning in 3 ways - Full Code vs. No Code vs. Automated ML

I have been coding my models using open-source technologies (Python, pandas, NumPy, matplotlib, scikit-learn and TensorFlow) in a Jupyter notebook on Google Colab, using CPU or GPU. I am now trying to build an enterprise-grade application using MLOps (Azure Cloud, Azure DevOps and MLflow).

I had heard of "No Code" and "Auto ML", and I thought: let's try them on the same data and compare their prediction accuracy against "Full code", where we have full control of the model.

  • Full code (the stack above, with all code written by me using known algorithms) vs
  • No Code (Azure ML Designer) vs
  • Auto ML (Azure Automated ML)

Data

  • Bulldozers Regression Kaggle problem
  • The data has 412,698 rows and 104 columns, which is a good size
  • Many columns had more than 50% of their values missing, the date was bundled into a single column, and only a few columns were numerical; most were object (string) columns that had to be converted to categories
  • The transformed data set used as input to training was created from this dataset (https://www.kaggle.com/c/bluebook-for-bulldozers/data) through data transformation and feature engineering (the GitHub code includes this transformation part)
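The transformation described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function name `preprocess`, the derived column names (`saleYear`, `_is_missing` and so on) and the tiny sample frame are assumptions made for the example, and filling missing numbers with the median is one common choice rather than a confirmed detail of the author's pipeline.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame, date_col: str = "saledate") -> pd.DataFrame:
    """Return a numeric-only copy of df (hypothetical helper for illustration)."""
    df = df.copy()

    # Split the single date column into separate numeric parts.
    dates = pd.to_datetime(df[date_col])
    df["saleYear"] = dates.dt.year
    df["saleMonth"] = dates.dt.month
    df["saleDay"] = dates.dt.day
    df["saleDayOfWeek"] = dates.dt.dayofweek
    df = df.drop(columns=[date_col])

    # Iterate over a snapshot of the columns, since new ones are added below.
    for col in list(df.columns):
        if pd.api.types.is_numeric_dtype(df[col]):
            if df[col].isna().any():
                # Flag the missing rows, then fill them with the column median.
                df[col + "_is_missing"] = df[col].isna()
                df[col] = df[col].fillna(df[col].median())
        else:
            # Strings become category codes; +1 moves the missing code (-1) to 0.
            df[col + "_is_missing"] = df[col].isna()
            df[col] = pd.Categorical(df[col]).codes + 1
    return df


# Tiny made-up sample, only to show the shape of the result.
sample = pd.DataFrame({
    "saledate": ["2011-11-16", "2004-03-26", "2008-05-15"],
    "MachineHoursCurrentMeter": [68.0, np.nan, 2838.0],
    "UsageBand": ["Low", None, "High"],
})
clean = preprocess(sample)
print(clean.dtypes)
```

The extra `_is_missing` columns keep the information that a value was absent, which can matter when more than half of a column is empty.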

ML Model - Trained a regression model in 3 ways

  1. Full code (Python, scikit-learn with XGBRegressor)
  2. No Code (Azure ML Designer with 2 models)
  3. Auto ML (Azure ML)

1. Full code (scikit-learn with XGBRegressor)

  • GitHub code here (https://github.com/mahtabsyed/Machine-Learning-Full-code-vs-No-Code-vs-Automated-ML/blob/main/Kaggle_Bulldozers_Regression.ipynb)
  • Python code
  • XGBRegressor
  • Using KFold for cross-validation
  • Hyperparameter tuning using Optuna (the model without hyperparameter tuning proved to be better) - check the code (link above)
  • Evaluation - best MAE and RMSE so far:
  • # MAE: 57.002996
  • # RMSE: 258.555955
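As a rough sketch of how k-fold cross-validation produces MAE and RMSE figures like the ones above: the helper name `cross_validate`, the synthetic data and the default settings are assumptions for illustration, and the actual notebook (linked above) may be organised differently. If xgboost is not installed, the example falls back to scikit-learn's GradientBoostingRegressor purely as a stand-in.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import KFold

try:
    from xgboost import XGBRegressor as Regressor  # the model named in this post
except ImportError:
    # Stand-in so the sketch still runs without xgboost installed.
    from sklearn.ensemble import GradientBoostingRegressor as Regressor


def cross_validate(X, y, n_splits=5, seed=42):
    """Return (mean MAE, mean RMSE) across the k validation folds."""
    maes, rmses = [], []
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, valid_idx in kfold.split(X):
        model = Regressor(random_state=seed)  # default hyperparameters
        model.fit(X[train_idx], y[train_idx])
        preds = model.predict(X[valid_idx])
        maes.append(mean_absolute_error(y[valid_idx], preds))
        # RMSE is the square root of the mean squared error.
        rmses.append(np.sqrt(mean_squared_error(y[valid_idx], preds)))
    return float(np.mean(maes)), float(np.mean(rmses))


# Synthetic data, used only so the example is self-contained.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
mae, rmse = cross_validate(X, y)
print(f"MAE: {mae:.3f}  RMSE: {rmse:.3f}")
```

RMSE penalises large errors more heavily than MAE, so a wide gap between the two usually points to a few badly mispredicted rows.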

2. No Code (Azure ML Designer)

Models cheat sheet https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-cheat-sheet?

  • Azure Machine Learning Pipeline - Model 1 and 2
  • Data - same
  • Model - Boosted Decision Tree Regression
  • Evaluation - Not good compared to my full code XGBRegressor, whose MAE is about 100 times lower and whose RMSE is about 33 times lower
  • # MAE: 5731.167302
  • # RMSE: 8571.87339
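The size of the gap between the two approaches can be checked with simple arithmetic on the metrics reported in this post. The dictionary and variable names below are only for illustration.

```python
# Metrics as reported in this post.
full_code = {"MAE": 57.002996, "RMSE": 258.555955}
no_code = {"MAE": 5731.167302, "RMSE": 8571.87339}

# How many times larger the Designer error is than the full-code error.
ratios = {name: no_code[name] / full_code[name] for name in full_code}
for name, ratio in ratios.items():
    print(f"{name}: Designer error is {ratio:.1f}x the full-code error")
# MAE: Designer error is 100.5x the full-code error
# RMSE: Designer error is 33.2x the full-code error
```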

3. Automated ML (Azure ML)

  • This is quite easy to use
  • Specify the data and the label
  • Specify the compute cluster and the cross-validation method (k-fold)
  • It identifies the problem as a regression task on its own
  • It runs for quite a long time (about 6 hours). Note that I provisioned a CPU compute cluster, not a GPU one
  • Evaluation - Both MAE and RMSE are worse than No Code and quite poor compared to Full Code

So, for now (using vanilla model training), Full code wins…


Melbourne, 07 Oct 2022
