
Column Transformer and Pipelines in Machine Learning

Zuhaib Ashraf

Innovating Today, Shaping Tomorrow: AI Solutions For Every Field. Let's talk about Artificial intelligence| Machine Learning | Deep Learning | Computer Vision | AIOps | MLOps | GDSC AI/ML Lead

Published: July 14, 2023

Introduction:

When starting out or participating in competitions, it may seem convenient to pre-process data in separate, ad hoc stages. In a production machine learning project, however, every new batch of data must go through exactly the same preprocessing before it is fed to the model. Rewriting all the preprocessing steps each time is time-consuming and error-prone. To save time and effort, pipelines are employed. A machine learning pipeline executes all the preprocessing steps in sequence, and with the help of a Column Transformer the per-column transformations can be declared in a single object and applied with a single call.

Column Transformer:

Column Transformer (ColumnTransformer) is a tool in scikit-learn that lets us apply different transformations to different columns of our data, for example handling numerical and categorical columns separately. To use it, we pass a list of tuples, where each tuple holds a name for the step, a transformer object, and the column or columns that transformer should be applied to.

To demonstrate the Column Transformer, I use a toy COVID data set.

We will build the following transformations:

Missing value imputation
Ordinal Encoding
One Hot Encoding

Now we will compare in detail the code written with and without a Column Transformer:

Figure 1: Importing libraries and data set

Without Column Transformer
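As a rough sketch of what the manual approach tends to look like, the snippet below applies the three transformations one by one and then joins the results by hand. The small DataFrame and its column names (age, gender, fever, cough, city) are assumptions standing in for the toy COVID data set, and the choice of mean imputation is also an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical stand-in for the toy COVID data set (column names are assumptions).
df = pd.DataFrame({
    "age": [60, 27, 42, 31, 65, 84],
    "gender": ["Male", "Male", "Female", "Female", "Male", "Female"],
    "fever": [103.0, 100.0, np.nan, 98.0, 101.0, np.nan],
    "cough": ["Mild", "Mild", "Strong", "Mild", "Strong", "Mild"],
    "city": ["Kolkata", "Delhi", "Delhi", "Kolkata", "Mumbai", "Mumbai"],
})

# 1. Missing value imputation: fill gaps in fever with the column mean.
fever = SimpleImputer(strategy="mean").fit_transform(df[["fever"]])

# 2. Ordinal encoding: cough has a natural order (Mild < Strong).
cough = OrdinalEncoder(categories=[["Mild", "Strong"]]).fit_transform(df[["cough"]])

# 3. One-hot encoding: gender and city have no order; drop the first level of each.
gender_city = OneHotEncoder(drop="first").fit_transform(df[["gender", "city"]]).toarray()

# 4. Pull out the untouched column and glue every piece back together by hand.
age = df[["age"]].to_numpy()
X_manual = np.concatenate([age, fever, cough, gender_city], axis=1)

print(X_manual.shape)  # (6, 6)
```

Every transformer is fitted separately and the final concatenation has to be kept in sync with the column order by hand, which is the bookkeeping a Column Transformer removes.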

With Column Transformer
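A minimal sketch of the three transformations listed earlier, declared in one ColumnTransformer. The DataFrame and its column names are assumptions standing in for the toy COVID data set, and the step names (impute_fever and so on) are illustrative labels, not part of any fixed API.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical stand-in for the toy COVID data set (column names are assumptions).
df = pd.DataFrame({
    "age": [60, 27, 42, 31, 65, 84],
    "gender": ["Male", "Male", "Female", "Female", "Male", "Female"],
    "fever": [103.0, 100.0, np.nan, 98.0, 101.0, np.nan],
    "cough": ["Mild", "Mild", "Strong", "Mild", "Strong", "Mild"],
    "city": ["Kolkata", "Delhi", "Delhi", "Kolkata", "Mumbai", "Mumbai"],
})

# Each tuple is (step name, transformer object, columns to apply it to).
preprocessor = ColumnTransformer(
    transformers=[
        ("impute_fever", SimpleImputer(strategy="mean"), ["fever"]),
        ("encode_cough", OrdinalEncoder(categories=[["Mild", "Strong"]]), ["cough"]),
        ("encode_gender_city", OneHotEncoder(drop="first"), ["gender", "city"]),
    ],
    remainder="passthrough",  # keep columns not listed above (here: age)
    sparse_threshold=0,       # always return a dense NumPy array
)

# One call fits every transformer and stitches the outputs together.
X = preprocessor.fit_transform(df)

print(X.shape)  # (6, 6)
```

The output columns follow the order of the transformers list, with the passthrough columns appended at the end, so age is the last column here.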

Machine Learning Pipelines:

A machine learning pipeline is a series of connected steps, where the result of each step is passed to the next one. It is similar to a neural network, where the output of one layer becomes the input of the next layer. Just as a physical pipeline carries water from one place to another, a machine learning pipeline carries data through each step until the final output is produced. Using machine learning pipelines also shortens the production code.
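The chaining idea can be sketched with scikit-learn's Pipeline. The steps chosen here (mean imputation, standard scaling, logistic regression) and the tiny single-feature data are assumptions picked only to show how data flows from one step to the next.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Tiny made-up data: one feature with a missing value, and a binary label.
X = np.array([[98.0], [np.nan], [101.0], [103.0], [99.0], [104.0]])
y = np.array([0, 0, 1, 1, 0, 1])

# Each step is (name, estimator). The output of one step is the input of the next.
pipe = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])

# fit() runs fit_transform on every step except the last, then fits the model.
pipe.fit(X, y)

# predict() pushes new data through the same fitted steps, in the same order.
preds = pipe.predict(np.array([[np.nan], [105.0]]))
print(preds)
```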

Demonstration of Machine Learning Pipelines:

1. Without using machine learning pipelines:

Production side code without using pipelines:

As we can see, the production code has to re-implement every feature engineering step, which is hectic because the steps must be applied in exactly the same sequence as on the training side.
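To make the point above concrete, here is a hedged sketch of the manual approach. The data, the column names (fever, city, has_covid), the LogisticRegression model, and the helper predict_manual are all hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data (column names are assumptions).
train = pd.DataFrame({
    "fever": [103.0, 100.0, np.nan, 98.0, 101.0, np.nan, 99.0, 104.0],
    "city": ["Kolkata", "Delhi", "Delhi", "Kolkata", "Mumbai", "Mumbai", "Delhi", "Kolkata"],
    "has_covid": [1, 0, 0, 0, 1, 1, 0, 1],
})

# Training side: every fitted object has to be kept (and saved) separately.
imputer = SimpleImputer(strategy="mean").fit(train[["fever"]])
encoder = OneHotEncoder(handle_unknown="ignore").fit(train[["city"]])
X_train = np.hstack([
    imputer.transform(train[["fever"]]),
    encoder.transform(train[["city"]]).toarray(),
])
model = LogisticRegression().fit(X_train, train["has_covid"])


def predict_manual(new_rows):
    """Production side: repeat every step by hand, in the training order."""
    fever = imputer.transform(new_rows[["fever"]])
    city = encoder.transform(new_rows[["city"]]).toarray()
    # Swapping the order of fever and city here would silently corrupt predictions.
    return model.predict(np.hstack([fever, city]))


new_patient = pd.DataFrame({"fever": [np.nan], "city": ["Delhi"]})
print(predict_manual(new_patient))
```

Three separate objects (imputer, encoder, model) have to be shipped to production, and the column order inside predict_manual must match the training code exactly.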

2. Using machine learning pipelines:

Production code using pipelines:
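A minimal sketch of how the production side might look when preprocessing and model live in one pipeline. The data, column names, and LogisticRegression model are assumptions. The pipeline is serialized with pickle.dumps to keep the example self-contained; in practice it would typically be written to a file on the training side and loaded on the production side.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data (column names are assumptions).
train = pd.DataFrame({
    "fever": [103.0, 100.0, np.nan, 98.0, 101.0, np.nan, 99.0, 104.0],
    "city": ["Kolkata", "Delhi", "Delhi", "Kolkata", "Mumbai", "Mumbai", "Delhi", "Kolkata"],
    "has_covid": [1, 0, 0, 0, 1, 1, 0, 1],
})
features = ["fever", "city"]

# Training side: preprocessing and model are bundled into one object.
preprocessor = ColumnTransformer([
    ("impute_fever", SimpleImputer(strategy="mean"), ["fever"]),
    ("encode_city", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])
pipe = Pipeline([("preprocess", preprocessor), ("model", LogisticRegression())])
pipe.fit(train[features], train["has_covid"])

# Only one artifact needs to be saved and shipped.
saved = pickle.dumps(pipe)

# Production side: load the pipeline and call predict on raw data.
loaded = pickle.loads(saved)
new_patient = pd.DataFrame({"fever": [np.nan], "city": ["Delhi"]})
prediction = loaded.predict(new_patient)
print(prediction)
```

The production code shrinks to loading one object and calling predict, and the order of the preprocessing steps is fixed inside the pipeline rather than re-typed by hand.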

Conclusion:

By using column transformers and machine learning pipelines, we can reduce the number of lines of code, which makes the code easier to read and makes the production-side code simpler.
