Implementing End-to-End Machine Learning Pipelines Using Scikit-Learn and Python

Pipelines in Scikit-learn streamline machine learning model development by chaining multiple steps, from preprocessing to model training, into one cohesive workflow. This modular approach improves code readability and maintainability, and it ensures that transformations fitted on the training data are applied in the same way to the test data. By mastering pipelines, we can build and deploy robust machine learning solutions more efficiently.


In this article we will delve into pipelines and machine learning using Scikit-learn and Python.


Pipelines in Scikit-learn allow us to sequentially apply a list of transforms and a final estimator. This means we can chain multiple processes, from data preprocessing to the final model application, into one streamlined workflow.


Example 1: Basic Pipeline

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

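# Each step is a (name, estimator) pair. Every step except the last must be a
# transformer; the last step is the final estimator.
steps = [('scaler', StandardScaler()), ('classifier', LogisticRegression())]

pipe = Pipeline(steps)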

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

pipe.fit(X_train, y_train)

# predict() applies the already-fitted scaler to X_test, then classifies.
y_pred = pipe.predict(X_test)
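
To make the chaining concrete, the sketch below repeats Example 1 by hand and checks that both routes give the same predictions. It is a minimal illustration rather than a description of scikit-learn internals; the names `manual_pred`, `same_predictions`, and `test_accuracy` are introduced here for the comparison and are not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Pipeline version: one fit call, one predict call.
pipe = Pipeline([("scaler", StandardScaler()), ("classifier", LogisticRegression())])
pipe.fit(X_train, y_train)
pipe_pred = pipe.predict(X_test)

# Manual version: the scaler learns its statistics from the training split
# only, and the same fitted scaler is reused on the test split.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
clf = LogisticRegression()
clf.fit(X_train_scaled, y_train)
manual_pred = clf.predict(X_test_scaled)

# Both routes should agree; score() reports accuracy on the held-out split.
same_predictions = np.array_equal(pipe_pred, manual_pred)
test_accuracy = pipe.score(X_test, y_test)
print(same_predictions, round(test_accuracy, 3))
```

The manual version is where mistakes tend to creep in, for example calling `fit_transform` on the test split. The pipeline removes that opportunity because the test data only ever passes through `transform`.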
        

Visualizing Pipelines

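from sklearn import set_config

# Render estimators as HTML diagrams in a Jupyter notebook.
set_config(display='diagram')

# In a notebook, evaluating the pipeline as the last expression of a cell
# displays the diagram.
pipe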
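
The diagram only renders in a notebook. As a rough text-based alternative, a pipeline can also be inspected programmatically. The sketch below rebuilds the pipeline from Example 1 so that it runs on its own; the value `0.5` for `C` is an arbitrary choice for illustration.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scaler", StandardScaler()), ("classifier", LogisticRegression())])

# Step names in execution order.
step_names = [name for name, _ in pipe.steps]
print(step_names)

# A step can be looked up by name or by position.
scaler = pipe.named_steps["scaler"]
classifier = pipe[-1]
print(type(scaler).__name__, type(classifier).__name__)

# Nested parameters follow the "<step name>__<parameter>" convention.
pipe.set_params(classifier__C=0.5)
print(pipe.get_params()["classifier__C"])
```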
        

Example 2: Complex Pipeline with Dimensionality Reduction

from sklearn.decomposition import PCA
from sklearn.svm import SVC

steps = [
    ('scaler', StandardScaler()), 
    ('pca', PCA(n_components=3)), 
    ('classifier', SVC())
]
pipe2 = Pipeline(steps)

# fit() scales the data, projects it onto 3 principal components, then trains the SVC.
pipe2.fit(X_train, y_train)
y_pred2 = pipe2.predict(X_test)
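
To check what the intermediate steps produce, one option is to slice the fitted pipeline and look at the fitted PCA step. The sketch below assumes the same synthetic data as Example 1 and recreates it so the block is self-contained; `X_test_reduced` and `explained` are names chosen here for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipe2 = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=3)),
    ("classifier", SVC()),
])
pipe2.fit(X_train, y_train)

# pipe2[:-1] is a sub-pipeline with every step except the final classifier,
# so transform() returns the data exactly as the SVC receives it.
X_test_reduced = pipe2[:-1].transform(X_test)
print(X_test_reduced.shape)

# Share of the (scaled) variance captured by each retained component.
explained = pipe2.named_steps["pca"].explained_variance_ratio_
print(explained, explained.sum())
```

If the three components capture only a small share of the variance, that is a hint that `n_components=3` may be discarding information the classifier could use.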

Example 3: Column Transformer

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

numerical_features = ['num_feature1', 'num_feature2']
categorical_features = ['cat_feature1', 'cat_feature2']

numerical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler())
])

categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, numerical_features),
        ('cat', categorical_transformer, categorical_features)
    ]
)

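final_pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', LogisticRegression())
])

# The ColumnTransformer selects columns by name, so the input must be a pandas
# DataFrame containing those columns; the NumPy arrays from Example 1 would not work.
# The small DataFrame below is made-up data, used only for illustration.
import numpy as np
import pandas as pd

X_df = pd.DataFrame({
    'num_feature1': [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
    'num_feature2': [10.0, np.nan, 30.0, 40.0, 50.0, 60.0],
    'cat_feature1': ['a', 'b', 'a', np.nan, 'b', 'a'],
    'cat_feature2': ['x', 'y', 'x', 'y', np.nan, 'x']
})
y_df = [0, 1, 0, 1, 1, 0]

final_pipeline.fit(X_df, y_df)
# Predicting on the training rows only to show the call; use a held-out split in practice.
final_predictions = final_pipeline.predict(X_df)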
        

Written by Nasir Uddin Ahmed