The Ultimate Guide to Feature Scaling in Data Science
Abdulla Pathan
In data science, feature scaling (often called normalization) is more than a routine preprocessing step: it can make or break the performance of a machine learning model. Whether you work in K-12 education, higher education, or any other domain, understanding when and how to scale features is essential for accurate and reliable results. This guide covers why scaling matters, the main techniques, and how to implement each one in Python.
Why Normalize?
Many machine learning algorithms are sensitive to the scale of their inputs. Distance-based methods such as k-nearest neighbors, K-Means, and SVMs let large-scale features dominate the distance calculation, and gradient-based optimizers converge more slowly when features span very different ranges. Regularized models such as ridge and lasso also penalize coefficients unevenly when features are not on comparable scales. Scaling puts every feature on a comparable footing so the model learns from the signal, not the units.
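To make that concrete, here is a minimal sketch (the income and experience values are made up for illustration) of how one large-scale feature swamps a Euclidean distance calculation, the same distortion that affects k-NN, K-Means, and SVMs:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature matrix: annual income (large scale) and
# years of experience (small scale)
X = np.array([
    [50000.0, 2.0],
    [52000.0, 20.0],
    [90000.0, 3.0]
])

# The raw distance between the first two rows is ruled by income,
# even though they differ by 18 years of experience
print("Unscaled distance:", np.linalg.norm(X[0] - X[1]))  # ~2000.08

# After scaling, both features contribute on a comparable footing
X_scaled = MinMaxScaler().fit_transform(X)
print("Scaled distance:", np.linalg.norm(X_scaled[0] - X_scaled[1]))  # ~1.00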
Types of Normalization
Min-Max Scaling: rescales each feature to a fixed range, typically [0, 1].
Z-Score Standardization: centers each feature at zero and scales it to unit variance.
Robust Scaling: centers on the median and scales by the interquartile range, limiting the influence of outliers.
Max Abs Scaling: divides each feature by its maximum absolute value, mapping it into [-1, 1] and preserving sparsity.
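For reference, these are the standard formulas each technique applies to a feature column x:

\[
\begin{aligned}
\text{Min-Max:}\quad & x' = \frac{x - \min(x)}{\max(x) - \min(x)} \\
\text{Z-Score:}\quad & z = \frac{x - \mu}{\sigma} \\
\text{Robust:}\quad & x' = \frac{x - \operatorname{median}(x)}{Q_3 - Q_1} \\
\text{Max Abs:}\quad & x' = \frac{x}{\max(\lvert x \rvert)}
\end{aligned}
\]

where \(\mu\) and \(\sigma\) are the column mean and standard deviation, and \(Q_1\), \(Q_3\) are the first and third quartiles.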
Implementation in Python with Scikit-Learn
Min-Max Scaling
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Sample Data
data = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12]
])

# Instantiate the MinMaxScaler
scaler = MinMaxScaler()

# Fit the scaler to the data and transform it
scaled_data = scaler.fit_transform(data)

# Displaying the original and scaled data
print("Original Data:\n", data)
print("\nScaled Data:\n", scaled_data)

# Function to scale new data using the already fitted scaler
def scale_new_data(new_data, fitted_scaler):
    scaled_new_data = fitted_scaler.transform(new_data)
    return scaled_new_data

# Example new data to scale
new_data = np.array([
    [2, 3, 4],
    [5, 6, 7]
])

# Scaling new data
scaled_new_data = scale_new_data(new_data, scaler)
print("\nNew Data:\n", new_data)
print("\nScaled New Data:\n", scaled_new_data)
Practical Considerations
Min-max scaling preserves the shape of the original distribution, but it is highly sensitive to outliers: a single extreme value compresses every other observation into a narrow band of the output range. It suits algorithms that expect bounded inputs, such as neural networks. As with any scaler, fit it on the training data only and reuse the fitted scaler for validation, test, and production data, as the code above does with scale_new_data.
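Here is a quick sketch of that outlier sensitivity, using a single column with an artificial extreme value:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# One feature with an artificial outlier at 1000
column = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])

scaled = MinMaxScaler().fit_transform(column)
print(scaled.ravel())
# [0.       0.001001 0.002002 0.003003 1.      ]
# The four ordinary values are crushed into well under 1% of the output range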
Z-Score Standardization
import numpy as np
from sklearn.preprocessing import StandardScaler

# Sample Data
data = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12]
])

# Instantiate the StandardScaler
scaler = StandardScaler()

# Fit the scaler to the data and transform it
scaled_data = scaler.fit_transform(data)

# Displaying the original and scaled data
print("Original Data:\n", data)
print("\nScaled Data:\n", scaled_data)

# Function to scale new data using the already fitted scaler
def scale_new_data(new_data, fitted_scaler):
    scaled_new_data = fitted_scaler.transform(new_data)
    return scaled_new_data

# Example new data to scale
new_data = np.array([
    [2, 3, 4],
    [5, 6, 7]
])

# Scaling new data
scaled_new_data = scale_new_data(new_data, scaler)
print("\nNew Data:\n", new_data)
print("\nScaled New Data:\n", scaled_new_data)

# Function to inverse transform scaled data back to original scale
def inverse_transform_data(scaled_data, fitted_scaler):
    original_data = fitted_scaler.inverse_transform(scaled_data)
    return original_data

# Inverse transforming the scaled data
original_data_from_scaled = inverse_transform_data(scaled_data, scaler)
print("\nInverse Transformed Data (from scaled data back to original):\n", original_data_from_scaled)
Practical Considerations
Standardization does not bound values to a fixed range, but it is less distorted by moderate outliers than min-max scaling because it relies on the mean and standard deviation rather than the extremes. It is the usual default for linear and logistic regression, SVMs, and PCA, which work best with centered features. The inverse transform shown above is handy when predictions or coefficients need to be reported back in the original units.
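As a quick sanity check, here is a short sketch (reusing the same toy data) confirming that the standardized columns really do have approximately zero mean and unit variance:

import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12]
], dtype=float)

scaled = StandardScaler().fit_transform(data)

# Each column should now have mean ~0 and standard deviation ~1
print("Column means:", scaled.mean(axis=0))  # ~[0. 0. 0.]
print("Column stds: ", scaled.std(axis=0))   # ~[1. 1. 1.]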
Robust Scaling
import numpy as np
from sklearn.preprocessing import RobustScaler

# Sample Data
data = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12]
])

# Instantiate the RobustScaler
scaler = RobustScaler()

# Fit the scaler to the data and transform it
scaled_data = scaler.fit_transform(data)

# Displaying the original and scaled data
print("Original Data:\n", data)
print("\nScaled Data:\n", scaled_data)

# Function to scale new data using the already fitted scaler
def scale_new_data(new_data, fitted_scaler):
    scaled_new_data = fitted_scaler.transform(new_data)
    return scaled_new_data

# Example new data to scale
new_data = np.array([
    [2, 3, 4],
    [5, 6, 7]
])

# Scaling new data
scaled_new_data = scale_new_data(new_data, scaler)
print("\nNew Data:\n", new_data)
print("\nScaled New Data:\n", scaled_new_data)

# Function to inverse transform scaled data back to original scale
def inverse_transform_data(scaled_data, fitted_scaler):
    original_data = fitted_scaler.inverse_transform(scaled_data)
    return original_data

# Inverse transforming the scaled data
original_data_from_scaled = inverse_transform_data(scaled_data, scaler)
print("\nInverse Transformed Data (from scaled data back to original):\n", original_data_from_scaled)
Practical Considerations
Robust scaling is the technique to reach for when the data contains outliers: the median and interquartile range are far less affected by extreme values than the mean and standard deviation. Note that the output is neither bounded to a fixed range nor guaranteed to have unit variance.
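The following sketch (again with an artificial outlier) compares how an extreme value distorts standardized output far more than robust-scaled output:

import numpy as np
from sklearn.preprocessing import StandardScaler, RobustScaler

# One feature with an artificial outlier at 1000
column = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])

# The outlier inflates the mean and standard deviation, so the four
# ordinary values all collapse to nearly the same standardized score
print("Standard:", StandardScaler().fit_transform(column).ravel())
# ~[-0.504 -0.501 -0.499 -0.496  2.0]

# The median and IQR largely ignore the outlier, so the ordinary
# values keep their spacing; only the outlier lands far from zero
print("Robust:  ", RobustScaler().fit_transform(column).ravel())
# [-1.  -0.5  0.   0.5  498.5]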
Max Abs Scaling
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Sample Data
data = np.array([
    [1, -2, 3],
    [4, 5, -6],
    [-7, 8, 9],
    [10, -11, 12]
])

# Instantiate the MaxAbsScaler
scaler = MaxAbsScaler()

# Fit the scaler to the data and transform it
scaled_data = scaler.fit_transform(data)

# Displaying the original and scaled data
print("Original Data:\n", data)
print("\nScaled Data:\n", scaled_data)

# Function to scale new data using the already fitted scaler
def scale_new_data(new_data, fitted_scaler):
    scaled_new_data = fitted_scaler.transform(new_data)
    return scaled_new_data

# Example new data to scale
new_data = np.array([
    [2, -3, 4],
    [5, 6, -7]
])

# Scaling new data
scaled_new_data = scale_new_data(new_data, scaler)
print("\nNew Data:\n", new_data)
print("\nScaled New Data:\n", scaled_new_data)

# Function to inverse transform scaled data back to original scale
def inverse_transform_data(scaled_data, fitted_scaler):
    original_data = fitted_scaler.inverse_transform(scaled_data)
    return original_data

# Inverse transforming the scaled data
original_data_from_scaled = inverse_transform_data(scaled_data, scaler)
print("\nInverse Transformed Data (from scaled data back to original):\n", original_data_from_scaled)
Practical Considerations
Max abs scaling divides by the maximum absolute value without shifting or centering the data, so zero entries stay exactly zero. That makes it the natural choice for sparse data such as TF-IDF or count matrices, where centering would destroy sparsity. Like min-max scaling, it is sensitive to large outliers.
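Here is a minimal sketch of why that matters for sparse data (the small matrix is made up, standing in for a slice of a TF-IDF or count matrix): MaxAbsScaler accepts a SciPy sparse matrix directly and leaves its zeros intact.

import numpy as np
from scipy import sparse
from sklearn.preprocessing import MaxAbsScaler

# A small sparse matrix with mostly zero entries
X = sparse.csr_matrix(np.array([
    [0.0, 2.0, 0.0],
    [4.0, 0.0, 0.0],
    [0.0, 0.0, 6.0]
]))

scaled = MaxAbsScaler().fit_transform(X)

# Zero entries are untouched, so the result is still a sparse matrix
print(type(scaled).__name__)  # csr_matrix
print(scaled.toarray())
# [[0. 1. 0.]
#  [1. 0. 0.]
#  [0. 0. 1.]]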
Conclusion
Feature scaling is a cornerstone of effective data preprocessing. Scaling your features appropriately typically yields faster, more stable training and more reliable results, particularly for distance-based and gradient-based models. Whether you're a data scientist in K-12 education, higher education, or any other field, mastering these techniques is crucial, and implementing them with tools like Scikit-Learn keeps the workflow simple and reproducible.
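One last sketch worth adopting in practice: wrapping a scaler in a scikit-learn Pipeline (shown here with the built-in iris dataset and a logistic regression, both illustrative choices) guarantees the scaler is fitted only on the training portion of each cross-validation fold, which prevents data leakage automatically.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# The pipeline refits the scaler on each training fold only, so no
# information from the held-out fold leaks into the scaling statistics
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(model, X, y, cv=5)
print("Cross-validated accuracy:", scores.mean())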