Feature Scaling in Machine Learning: A Comprehensive Guide

Feature scaling is a crucial step in preparing your data for machine learning. When different features in a dataset have varying scales (e.g., age in years vs. income in dollars), it can significantly impact the performance of certain algorithms. Two popular methods for feature scaling are standardization (Z-score normalization) and min-max scaling. These techniques ensure that the data is on a consistent scale, which is essential for algorithms like Principal Component Analysis (PCA), K-Nearest Neighbors (KNN), SVMs, and neural networks.

In this guide, we’ll explain the concepts of standardization and min-max scaling using simple code examples. Then, we’ll see how scaling affects the performance of PCA and how that impacts classification accuracy.

By the end, you’ll understand when and why to apply these scaling techniques and how they improve model performance in real-world tasks.

Why Does Feature Scaling Matter?

Algorithms like K-Nearest Neighbors (KNN) and Principal Component Analysis (PCA) are sensitive to the scale of input features. Without scaling, features with larger numeric ranges (salary in dollars, say, compared with age in years) can dominate the model's learning process, leading to poor performance.


If you’re working with a dataset where age is between 0 and 100, and income is between 0 and 100,000, the income feature will completely dominate the distance calculations in KNN, making the model less sensitive to changes in the age feature.

By scaling the data, we ensure that all features contribute equally to the algorithm’s learning process.
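
To make the age-versus-income point concrete, here is a minimal sketch with three made-up people. The values and the `euclidean` helper are assumptions for illustration only; the ranges (age 0 to 100, income 0 to 100,000) are the ones used above.

```python
import numpy as np

# Hypothetical people described by (age in years, income in dollars)
a = np.array([25.0, 50_000.0])
b = np.array([65.0, 50_500.0])  # very different age, similar income
c = np.array([26.0, 52_000.0])  # similar age, somewhat different income

def euclidean(u, v):
    """Euclidean distance, the default metric in many KNN implementations."""
    return float(np.linalg.norm(u - v))

# Raw features: the income difference swamps the age difference
d_ab_raw = euclidean(a, b)
d_ac_raw = euclidean(a, c)

# Min-max scale with the assumed ranges (minimum 0 for both features)
ranges = np.array([100.0, 100_000.0])
d_ab_scaled = euclidean(a / ranges, b / ranges)
d_ac_scaled = euclidean(a / ranges, c / ranges)

print("raw:   ", d_ab_raw, d_ac_raw)
print("scaled:", d_ab_scaled, d_ac_scaled)
```

On the raw features, the 40-years-older person `b` counts as the nearer neighbor of `a`. After scaling, `c` is nearer, which matches intuition.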

Feature Scaling Methods

1. Standardization (Z-score Normalization)

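Standardization transforms each feature so that it has a mean of 0 and a standard deviation of 1: z = (x - mean) / std. It’s a common default when features are roughly bell-shaped or when an algorithm benefits from centered, unit-variance inputs. Note that it rescales the data but does not make it normally distributed.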

Code Example:

Let’s load the Wine Dataset and apply standardization using scikit-learn.

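import pandas as pd
from sklearn.preprocessing import StandardScaler

# Load Wine dataset (first three columns: class label, alcohol, malic acid).
# NOTE: the file location is missing from the original snippet. WINE_DATA_PATH
# is a placeholder; point it at your own copy of the Wine data.
WINE_DATA_PATH = 'wine.data'
df = pd.read_csv(
    WINE_DATA_PATH,
    header=None, usecols=[0,1,2])

df.columns = ['Class label', 'Alcohol', 'Malic acid']

# Standardization: fit() learns each column's mean and std, transform() applies them
scaler = StandardScaler().fit(df[['Alcohol', 'Malic acid']])
df_std = scaler.transform(df[['Alcohol', 'Malic acid']])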
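
To see what the scaler is doing under the hood, here is a small sketch that standardizes a column by hand with NumPy. The numbers are made up rather than taken from the Wine dataset.

```python
import numpy as np

# Made-up feature values (not taken from the Wine dataset)
x = np.array([12.0, 13.5, 14.2, 11.8, 13.0])

mu = x.mean()
sigma = x.std()  # population std (ddof=0), the convention StandardScaler uses

# z-score: subtract the mean, divide by the standard deviation
z = (x - mu) / sigma

print(z.mean(), z.std())  # approximately 0 and 1, up to floating-point error
```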


When to Use Standardization?

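  • When your model is sensitive to feature scale or benefits from zero-centered, unit-variance inputs, as with SVM, regularized logistic regression, and PCA. None of these strictly requires normally distributed data.
  • When you want every feature to carry comparable weight in distance-based or variance-based algorithms like KNN and PCA.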

2. Min-Max Scaling (Normalization)

Min-max scaling transforms data to fit within a specified range, typically [0, 1]. It’s commonly used when features don’t follow a normal distribution but still need to be on the same scale.

Code Example:

Let’s apply Min-Max scaling to the same dataset.

from sklearn.preprocessing import MinMaxScaler

# Min-Max Scaling
minmax_scaler = MinMaxScaler().fit(df[['Alcohol', 'Malic acid']])
df_minmax = minmax_scaler.transform(df[['Alcohol', 'Malic acid']])


When to Use Min-Max Scaling?

  • When your data doesn't follow a Gaussian distribution.
  • When working with neural networks or image data, where pixel values are commonly rescaled to [0, 1].

Visualizing the Impact of Scaling

Let’s visualize how standardization and min-max scaling change the feature distributions.

import matplotlib.pyplot as plt

# Function to visualize scaling effects
def plot_scaling(df, df_std, df_minmax):
    plt.figure(figsize=(8, 6))

    # Same two features, plotted in their original and rescaled units
    plt.scatter(df['Alcohol'], df['Malic acid'], color='green', label='Original Scale', alpha=0.5)
    plt.scatter(df_std[:, 0], df_std[:, 1], color='red', label='Standardized', alpha=0.5)
    plt.scatter(df_minmax[:, 0], df_minmax[:, 1], color='blue', label='Min-Max Scaled', alpha=0.5)

    plt.title('Feature Scaling on Wine Dataset')
    plt.xlabel('Alcohol')
    plt.ylabel('Malic Acid')
    plt.legend(loc='upper left')
    plt.tight_layout()
    plt.show()

plot_scaling(df, df_std, df_minmax)

The Impact of Scaling on Principal Component Analysis (PCA)

What is PCA?

PCA is a technique used to reduce the dimensionality of data by projecting it onto a set of orthogonal axes (principal components) that explain the maximum variance in the data.

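However, PCA is highly sensitive to feature scaling. Without scaling, features measured in larger units have larger variance and therefore dominate the leading principal components, even when that variance says nothing about how informative the feature is.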
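
Here is a rough sketch of that effect on synthetic data. The two features (`age`, `income`) and their distributions are assumptions for illustration. The explained-variance ratio is computed directly from the covariance matrix with NumPy rather than through scikit-learn.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Two independent synthetic features on very different scales
age = rng.normal(40, 10, n)             # std around 10
income = rng.normal(50_000, 15_000, n)  # std around 15,000
X = np.column_stack([age, income])

def explained_variance_ratio(X):
    """Share of total variance along each principal axis, largest first."""
    cov = np.cov(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh returns ascending order
    return eigvals / eigvals.sum()

ratio_raw = explained_variance_ratio(X)

# Standardize each column, then repeat
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
ratio_std = explained_variance_ratio(X_std)

print("raw:         ", ratio_raw)
print("standardized:", ratio_std)
```

On the raw data the first component is almost entirely the income axis. After standardization the two independent features share the variance roughly evenly.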

Practical Example: Standardization and PCA

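In this example, we will perform PCA on both standardized and non-standardized data to observe the effect of scaling. Because we only have two input features, n_components=2 keeps every dimension, so PCA here re-orients the data along its directions of greatest variance rather than reducing it.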

from sklearn.decomposition import PCA

# Perform PCA on non-standardized data
pca = PCA(n_components=2).fit(df[['Alcohol', 'Malic acid']])
df_pca = pca.transform(df[['Alcohol', 'Malic acid']])

# Perform PCA on standardized data
pca_std = PCA(n_components=2).fit(df_std)
df_std_pca = pca_std.transform(df_std)

Visualizing PCA with and without Standardization:

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(10,5))

# Plot non-standardized PCA
ax1.scatter(df_pca[:, 0], df_pca[:, 1], color='blue', alpha=0.5)
ax1.set_title('PCA on Non-Standardized Data')
ax1.set_xlabel('1st Principal Component')
ax1.set_ylabel('2nd Principal Component')

# Plot standardized PCA
ax2.scatter(df_std_pca[:, 0], df_std_pca[:, 1], color='red', alpha=0.5)
ax2.set_title('PCA on Standardized Data')
ax2.set_xlabel('1st Principal Component')
ax2.set_ylabel('2nd Principal Component')

plt.tight_layout()
plt.show()



  • Without standardization: The principal components are dominated by the features with the largest variance, which often reflects their units rather than meaningful patterns.
  • With standardization: Every feature has unit variance, so the components reflect how the features vary together rather than which one is measured in bigger numbers.

Training a Classifier After PCA: Naive Bayes Example

Let’s train a simple Naive Bayes classifier to compare the performance of PCA with and without standardization.

from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Train Naive Bayes on PCA of non-standardized data
gnb = GaussianNB().fit(df_pca, df['Class label'])
predictions_pca = gnb.predict(df_pca)

# Train Naive Bayes on PCA of standardized data
gnb_std = GaussianNB().fit(df_std_pca, df['Class label'])
predictions_std_pca = gnb_std.predict(df_std_pca)

# Accuracy (computed on the same data used for training, so it is optimistic)
print("Accuracy on Non-Standardized PCA:", accuracy_score(df['Class label'], predictions_pca))
print("Accuracy on Standardized PCA:", accuracy_score(df['Class label'], predictions_std_pca))


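  • Without Standardization: Accuracy is often lower, because the components are driven by whichever feature has the largest raw variance.
  • With Standardization: Accuracy is often higher, because the components reflect the structure shared across features. Compare the two printed numbers to confirm this on your data.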
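
The comparison above scores each model on the data it was trained on. For a more realistic estimate, here is a sketch of the same comparison on a held-out test set, with the scaler and PCA fitted on the training split only. It makes several assumptions that differ from the code above: it uses scikit-learn's bundled copy of the Wine data (`load_wine`) instead of the CSV file, keeps only the first two columns (alcohol and malic acid), and wraps the steps with `make_pipeline`. The 70/30 split and `random_state=0` are arbitrary choices.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled Wine data stands in for the CSV used above
X, y = load_wine(return_X_y=True)
X = X[:, :2]  # first two columns: alcohol, malic acid

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Each pipeline fits its preprocessing steps on the training split only
raw_model = make_pipeline(PCA(n_components=2), GaussianNB())
std_model = make_pipeline(StandardScaler(), PCA(n_components=2), GaussianNB())

acc_raw = raw_model.fit(X_train, y_train).score(X_test, y_test)
acc_std = std_model.fit(X_train, y_train).score(X_test, y_test)

print("Test accuracy without standardization:", acc_raw)
print("Test accuracy with standardization:", acc_std)
```

Putting the scaler inside the pipeline means the test data never influences the learned mean and standard deviation, which is how the model would be used on genuinely new samples.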


Feature scaling is a critical preprocessing step that can significantly improve the performance of machine learning algorithms, especially those sensitive to the scale of input data. Standardization and min-max scaling are two popular techniques that ensure features are on a consistent scale.

Scaling is particularly important in tasks like PCA because it ensures that all features contribute equally to the analysis, improving both dimensionality reduction and subsequent classification accuracy.

Key Takeaways:

  • Standardize your data before using scale-sensitive algorithms like PCA, SVM, or KNN.
  • Min-max scaling is ideal when working with neural networks or datasets where the data does not follow a normal distribution.
  • Scaling often improves the performance of scale-sensitive classifiers, especially in high-dimensional tasks.

By understanding these scaling techniques, you can significantly improve the performance of your machine-learning models.

