Performing Data Normalization and Scaling with NumPy
Mohamed Riyaz Khan
Data normalization and scaling are essential preprocessing steps in data analysis and machine learning. Both techniques bring the independent variables (features) of a dataset onto a comparable range, so that no feature dominates the analysis simply because of its units or magnitude. In this article, we'll explore how to perform data normalization and scaling using NumPy, with practical examples and easy-to-follow instructions.
What are Data Normalization and Scaling?
Using NumPy for Data Normalization and Scaling
NumPy's array arithmetic and reductions such as np.min, np.max, np.mean, and np.std let you normalize or scale a whole dataset in a few lines of vectorized code.
Step-by-Step Guide
First, you need to import the NumPy library.
import numpy as np
You can either generate random data or use your own dataset. For simplicity, let's create some example data.
# Example data
data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])
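If you would rather generate random data, as mentioned above, the following is a minimal sketch. The seed, the array shape, and the per-feature means and spreads are arbitrary values chosen for illustration, not part of the original example.

```python
import numpy as np

# Seeded generator so the example is reproducible (the seed value is arbitrary).
rng = np.random.default_rng(42)

# 5 samples (rows) by 3 features (columns), each feature on a different scale.
# The loc/scale values below are illustrative assumptions.
random_data = rng.normal(loc=[0.0, 50.0, 1000.0],
                         scale=[1.0, 10.0, 250.0],
                         size=(5, 3))

print(random_data.shape)
```

Features on very different scales like these are the case where normalization and scaling matter most.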
Normalization
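Normalization (more precisely, min-max normalization) rescales each feature to the range [0, 1] by subtracting the column minimum and dividing by the column range. Passing axis=0 computes the minimum and maximum per column.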
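Written as a formula, the column-wise computation performed by the code in this step is the following, where the symbols $x'$, $x_{\min}$, and $x_{\max}$ are notation introduced here for illustration:

```latex
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
```

For the first column of the example data, [1, 4, 7], the value 4 maps to (4 - 1) / (7 - 1) = 0.5.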
# Normalize the data
data_min = np.min(data, axis=0)
data_max = np.max(data, axis=0)
normalized_data = (data - data_min) / (data_max - data_min)
# Print the normalized data
print("Normalized data:\n", normalized_data)
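One caveat: if a column is constant, its maximum equals its minimum and the division above yields NaN. The sketch below shows one possible guard; the helper name min_max_normalize and the choice to map constant columns to 0 are assumptions made for illustration.

```python
import numpy as np

def min_max_normalize(data):
    """Rescale each column to [0, 1]; constant columns map to 0 (illustrative choice)."""
    data = np.asarray(data, dtype=float)
    data_min = data.min(axis=0)
    data_range = data.max(axis=0) - data_min
    # Replace a zero range with 1 so constant columns give 0 instead of NaN.
    safe_range = np.where(data_range == 0, 1.0, data_range)
    return (data - data_min) / safe_range

# The middle column is constant on purpose.
example = np.array([[1, 5, 3],
                    [4, 5, 6],
                    [7, 5, 9]])
result = min_max_normalize(example)
print(result)
```

The first and last columns become 0, 0.5, 1, while the constant middle column becomes all zeros.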
Scaling
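Scaling, in the sense used here, adjusts each feature to have a mean of 0 and a standard deviation of 1. This particular form of scaling is commonly called standardization or z-score scaling.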
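You can scale the data using the following formula:
$$ z = \frac{x - \mu}{\sigma} $$
Where:
$x$ is the original value
$\mu$ is the mean of the feature (column)
$\sigma$ is the standard deviation of the feature (column)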
# Scale the data
data_mean = np.mean(data, axis=0)
data_std = np.std(data, axis=0)
scaled_data = (data - data_mean) / data_std
# Print the scaled data
print("Scaled data:\n", scaled_data)
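Two details are worth knowing here. np.std uses the population standard deviation by default (ddof=0), and a constant column has a standard deviation of 0, which makes the division fail. The sketch below wraps the same computation in a hypothetical helper, standardize, that exposes ddof and guards against a zero standard deviation; the helper and its handling of constant columns are assumptions for illustration.

```python
import numpy as np

def standardize(data, ddof=0):
    """Center each column on 0 and divide by its standard deviation.

    ddof=0 gives the population standard deviation (np.std's default);
    ddof=1 gives the sample standard deviation.
    """
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    std = data.std(axis=0, ddof=ddof)
    # Avoid dividing by zero for constant columns (they become all zeros).
    safe_std = np.where(std == 0, 1.0, std)
    return (data - mean) / safe_std

example = np.array([[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9]])

population = standardize(example)      # divides by sqrt(6), about 2.449
sample = standardize(example, ddof=1)  # divides by 3

print(population)
print(sample)
```

With ddof=1 the example columns become exactly -1, 0, 1, while the default reproduces the ±1.22474487 values of the original code.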
Example
Here's a complete example with detailed explanations.
import numpy as np
# Example data
data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])
# Normalization
data_min = np.min(data, axis=0)
data_max = np.max(data, axis=0)
normalized_data = (data - data_min) / (data_max - data_min)
print("Normalized data:\n", normalized_data)
# Scaling
data_mean = np.mean(data, axis=0)
data_std = np.std(data, axis=0)
scaled_data = (data - data_mean) / data_std
print("Scaled data:\n", scaled_data)
Output:
Normalized data:
 [[0.  0.  0. ]
 [0.5 0.5 0.5]
 [1.  1.  1. ]]
Scaled data:
 [[-1.22474487 -1.22474487 -1.22474487]
 [ 0.          0.          0.        ]
 [ 1.22474487  1.22474487  1.22474487]]
In this example:
The normalized_data array contains the original values rescaled column by column to the range [0, 1], so each column's minimum is 0 and its maximum is 1.
The scaled_data array contains the original values standardized column by column, so each column has a mean of 0 and a standard deviation of 1.
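These properties can be checked directly rather than read off the printed output. The following sketch recomputes both arrays from the example data and verifies the claims with np.allclose; the checks dictionary is just an illustrative way to organize the results.

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

# Same computations as in the complete example.
normalized_data = (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))
scaled_data = (data - data.mean(axis=0)) / data.std(axis=0)

# Each check compares a per-column statistic against its expected value.
checks = {
    "normalized column minimum is 0": np.allclose(normalized_data.min(axis=0), 0.0),
    "normalized column maximum is 1": np.allclose(normalized_data.max(axis=0), 1.0),
    "scaled column mean is 0": np.allclose(scaled_data.mean(axis=0), 0.0),
    "scaled column std is 1": np.allclose(scaled_data.std(axis=0), 1.0),
}

for name, passed in checks.items():
    print(f"{name}: {passed}")
```

np.allclose is used instead of == because floating-point arithmetic can leave tiny rounding errors in the means and standard deviations.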
Applications
Conclusion
Data normalization and scaling are fundamental preprocessing steps that can improve the performance of data analysis and machine learning models. With NumPy, each one takes only a few lines: np.min and np.max for normalization, np.mean and np.std for scaling. By following the steps outlined in this guide, you can easily normalize and scale your data for better analysis and modeling.
Happy preprocessing!