Column Transformer and Pipelines in Machine Learning
Zuhaib Ashraf
Innovating Today, Shaping Tomorrow: AI Solutions For Every Field. Let's talk about Artificial intelligence| Machine Learning | Deep Learning | Computer Vision | AIOps | MLOps | GDSC AI/ML Lead
Introduction:
When starting out or participating in competitions, it may seem convenient to pre-process data in separate, ad hoc stages. However, when the goal is a complete machine learning project that runs in production, every new piece of data must go through the same preprocessing before it reaches the model. Rewriting all the preprocessing steps on the production side is time-consuming and error-prone. Pipelines solve this problem. A machine learning pipeline executes all the preprocessing steps sequentially, and a Column Transformer lets us declare which transformation applies to which column in one place, so the whole preprocessing can be run with very little code.
Column Transformer:
Column Transformer (`ColumnTransformer`) is a tool in scikit-learn that lets us apply different transformations to different columns of our data, for example handling numerical and categorical columns separately. To use it, we pass a list of tuples, where each tuple contains a name, a transformer object, and the column or columns that transformer should be applied to.
To demonstrate the column transformer, I use a toy COVID data set.
The transformation we will build for:
Now we will compare in detail the code written with and without a column transformer:
Without Column Transformer
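The manual approach can be sketched as follows. This is a minimal illustration, not the original code: the data frame, its column names (`age`, `gender`, `fever`, `cough`, `city`) and the chosen transformers are assumptions made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy data; column names and values are invented for illustration.
df = pd.DataFrame({
    "age": [60, 27, 42, 31, 65, 84],
    "gender": ["Male", "Female", "Male", "Female", "Female", "Male"],
    "fever": [103.0, 100.0, np.nan, 98.0, 101.0, np.nan],
    "cough": ["Mild", "Mild", "Strong", "Mild", "Strong", "Mild"],
    "city": ["Kolkata", "Delhi", "Delhi", "Mumbai", "Kolkata", "Mumbai"],
})

# Each transformer is fitted and applied by hand on its own columns.
fever = SimpleImputer(strategy="mean").fit_transform(df[["fever"]])
cough = OrdinalEncoder(categories=[["Mild", "Strong"]]).fit_transform(df[["cough"]])
onehot = OneHotEncoder(drop="first").fit_transform(df[["gender", "city"]]).toarray()
age = df[["age"]].to_numpy()

# The pieces then have to be glued back together in a fixed column order.
X_manual = np.concatenate([age, fever, cough, onehot], axis=1)
```

Every column group needs its own fit, its own transform, and a final concatenation whose order we have to remember ourselves.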
With Column Transformer
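The same preprocessing can be declared in one object. Again this is only a sketch under assumptions: the toy data frame and the step names (`impute_fever`, `encode_cough`, `onehot`) are hypothetical, and `sparse_threshold=0` is set only to force a dense NumPy array as output.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy data; column names and values are invented for illustration.
df = pd.DataFrame({
    "age": [60, 27, 42, 31, 65, 84],
    "gender": ["Male", "Female", "Male", "Female", "Female", "Male"],
    "fever": [103.0, 100.0, np.nan, 98.0, 101.0, np.nan],
    "cough": ["Mild", "Mild", "Strong", "Mild", "Strong", "Mild"],
    "city": ["Kolkata", "Delhi", "Delhi", "Mumbai", "Kolkata", "Mumbai"],
})

# Each tuple is (name, transformer, columns).
ct = ColumnTransformer(
    transformers=[
        ("impute_fever", SimpleImputer(strategy="mean"), ["fever"]),
        ("encode_cough", OrdinalEncoder(categories=[["Mild", "Strong"]]), ["cough"]),
        ("onehot", OneHotEncoder(drop="first"), ["gender", "city"]),
    ],
    remainder="passthrough",  # keep untouched columns (here: age) at the end
    sparse_threshold=0,       # always return a dense array
)

# One call fits and applies every transformation.
X_ct = ct.fit_transform(df)
```

The output columns follow the order of the `transformers` list, with the passthrough columns appended last, so `age` ends up in the final column here.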
Machine Learning Pipelines:
Machine learning pipelines are a series of connected steps, where the result of each step is passed to the next one. This is similar to a neural network, where the output of one layer becomes the input of the next layer. Just as a pipeline carries water from one place to another, a machine learning pipeline carries data through each step until the final output is produced. Using machine learning pipelines also shortens the production code.
Demonstration of Machine Learning Pipelines:
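As a small illustration of steps feeding into each other, here is a sketch of a scikit-learn `Pipeline`. The numbers, the step names and the choice of `LogisticRegression` are assumptions for the example only.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric data: [age, fever], with some missing fever readings.
X = np.array([
    [25.0, 98.0], [60.0, np.nan], [42.0, 103.0],
    [31.0, 99.0], [70.0, 104.0], [55.0, np.nan],
])
y = np.array([0, 1, 1, 0, 1, 0])

# Steps run in order; the output of one step is the input of the next.
pipe = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])

pipe.fit(X, y)           # fits every step in sequence
preds = pipe.predict(X)  # applies the fitted transforms, then predicts
```

Calling `predict` on the pipeline replays the fitted imputer and scaler automatically, so raw data with missing values can be passed in directly.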
Production side code without using pipelines:
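A sketch of what this looks like in practice, assuming a hypothetical toy data set and a `DecisionTreeClassifier` chosen only for illustration. In-memory `pickle` bytes stand in for the files that would normally be saved to disk.

```python
import pickle
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# --- Training side (hypothetical data) ---
train = pd.DataFrame({
    "age": [60, 27, 42, 31, 65, 84],
    "fever": [103.0, 100.0, np.nan, 98.0, 101.0, np.nan],
    "city": ["Kolkata", "Delhi", "Delhi", "Mumbai", "Kolkata", "Mumbai"],
    "has_covid": [1, 0, 1, 0, 1, 0],
})
imputer = SimpleImputer(strategy="mean").fit(train[["fever"]])
encoder = OneHotEncoder(handle_unknown="ignore").fit(train[["city"]])
X_train = np.concatenate([
    train[["age"]].to_numpy(),
    imputer.transform(train[["fever"]]),
    encoder.transform(train[["city"]]).toarray(),
], axis=1)
model = DecisionTreeClassifier(random_state=0).fit(X_train, train["has_covid"])

# Every fitted object has to be saved separately.
saved = {"imputer": pickle.dumps(imputer),
         "encoder": pickle.dumps(encoder),
         "model": pickle.dumps(model)}

# --- Production side: load each object and repeat every step in the same order ---
imputer_p = pickle.loads(saved["imputer"])
encoder_p = pickle.loads(saved["encoder"])
model_p = pickle.loads(saved["model"])

new = pd.DataFrame({"age": [50], "fever": [np.nan], "city": ["Delhi"]})
X_new = np.concatenate([
    new[["age"]].to_numpy(),
    imputer_p.transform(new[["fever"]]),
    encoder_p.transform(new[["city"]]).toarray(),
], axis=1)
pred = model_p.predict(X_new)
```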
As we can see, without a pipeline the production code has to re-implement every feature engineering step. This is hectic, because the steps must be applied in exactly the same sequence as on the training side.
2. Using a Machine Learning Pipeline:
Production code using pipelines:
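With a pipeline, the training side saves a single object and the production side only loads it and calls `predict`. The sketch below assumes a hypothetical toy data set, made-up step names, and a `DecisionTreeClassifier` as a stand-in model. In-memory `pickle` bytes stand in for a saved file.

```python
import pickle
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# --- Training side (hypothetical data) ---
train = pd.DataFrame({
    "age": [60, 27, 42, 31, 65, 84],
    "fever": [103.0, 100.0, np.nan, 98.0, 101.0, np.nan],
    "city": ["Kolkata", "Delhi", "Delhi", "Mumbai", "Kolkata", "Mumbai"],
    "has_covid": [1, 0, 1, 0, 1, 0],
})

pipe = Pipeline(steps=[
    ("preprocess", ColumnTransformer(
        transformers=[
            ("impute_fever", SimpleImputer(strategy="mean"), ["fever"]),
            ("onehot_city", OneHotEncoder(handle_unknown="ignore"), ["city"]),
        ],
        remainder="passthrough",
    )),
    ("model", DecisionTreeClassifier(random_state=0)),
])
pipe.fit(train.drop(columns="has_covid"), train["has_covid"])

# Only one object is saved: preprocessing and model travel together.
saved = pickle.dumps(pipe)

# --- Production side: load once, predict on raw data ---
pipe_p = pickle.loads(saved)
new = pd.DataFrame({"age": [50], "fever": [np.nan], "city": ["Delhi"]})
pred = pipe_p.predict(new)
```

The order of the preprocessing steps is stored inside the pipeline itself, so the production code cannot apply them in the wrong sequence.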
Conclusion:
By using column transformers and machine learning pipelines, we reduce the number of lines of code, which makes the code easier to read and keeps the production-side code simple.