Normalization in Machine Learning

Sandeepkumar Belamagi

Data Analyst | Machine Learning | Python & ML Pipelines | Power BI Expert | MLOps Learner | Transitioning to Data Science / ML Engineer

Published: September 11, 2022

What is Normalization in Machine Learning?

Normalization is a scaling technique applied during data preparation in machine learning. It rescales the numeric columns of a dataset so that they share a common scale. Not every dataset needs it: normalization is required only when the features of a model have different ranges.

Methods of Data Normalization:

1) Decimal Scaling

2) Min-Max Normalization

3) Z-Score Normalization (zero-mean normalization)

Implementing different methods of data normalization:

1. Decimal Scaling

Decimal scaling normalizes a value by shifting its decimal point. The number of places to shift is determined by the maximum absolute value in the data set. If V_i is a value of attribute A, then the normalized value U_i is given by:

Decimal Scale Normalization formula:

$$U_i = \frac{V_i}{10^{j}}$$

where j is the smallest integer such that max|U_i| < 1.
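The decimal scaling rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `decimal_scale` and the sample values are hypothetical, and j is restricted to non-negative integers, so data that already lies inside (-1, 1) is left unchanged.

```python
def decimal_scale(values):
    """Divide every value by 10**j, where j is the smallest
    non-negative integer that makes max|u| < 1 (assumption: j >= 0)."""
    max_abs = max(abs(v) for v in values)
    j = 0
    # Increase j until the largest magnitude drops below 1.
    while max_abs / (10 ** j) >= 1:
        j += 1
    return [v / (10 ** j) for v in values], j


# Hypothetical sample data; the largest magnitude is 701.
sample = [-10, 201, 301, -401, 501, 601, 701]
scaled_sample, shift = decimal_scale(sample)
print(shift)          # number of decimal places shifted
print(scaled_sample)  # every magnitude is now below 1
```

Because the largest magnitude here has three digits, the decimal point moves three places and every value ends up strictly between -1 and 1.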

2. Min-Max Normalization

Also known as min-max scaling or rescaling, this is the simplest method: it rescales the range of each feature to [0, 1] or [−1, 1]. The choice of target range depends on the nature of the data. The general formula for a min-max rescaling to [0, 1] is:

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$

where x is an original value and x' is the normalized value. For example, suppose we have students' weight data, and the weights span the range from 160 pounds to 200 pounds. To rescale this data, we first subtract 160 from each student's weight and then divide the result by 40 (the difference between the maximum and minimum weights). A student who weighs 180 pounds maps to 0.5.
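The student-weight example can be checked with a short Python sketch. The function name `min_max_scale`, its `new_min`/`new_max` parameters, and the individual weights between 160 and 200 are assumptions made for illustration; only the 160 to 200 pound range comes from the text.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly map values from [min(values), max(values)]
    onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # The formula divides by (max - min), so a constant feature
        # cannot be rescaled this way.
        raise ValueError("min-max scaling is undefined when all values are equal")
    span = hi - lo
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]


# Hypothetical weights in pounds, spanning 160 to 200.
weights = [160, 170, 180, 200]
weights_01 = min_max_scale(weights)               # target range [0, 1]
weights_pm1 = min_max_scale(weights, -1.0, 1.0)   # target range [-1, 1]
print(weights_01)
print(weights_pm1)
```

The minimum always maps to the lower bound of the target range and the maximum to the upper bound, which is why a single extreme outlier can squeeze all other values into a narrow band.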

3. Z-Score Normalization / Standardization (zero-mean normalization)

Feature standardization rescales each feature so that its values have zero mean and unit variance. This method is widely used with many machine learning algorithms (e.g., support vector machines, logistic regression, and artificial neural networks). The calculation has three steps: determine the mean and standard deviation of each feature, subtract the mean from every value of that feature, and then divide the mean-centered values by the feature's standard deviation.

$$x' = \frac{x - \bar{x}}{\sigma}$$

where x is the original feature vector, \bar{x} = average(x) is the mean of that feature vector, and \sigma is its standard deviation.
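The standardization steps can be sketched with Python's built-in `statistics` module. The article does not say whether the population or the sample standard deviation is meant; this sketch assumes the population form (`statistics.pstdev`). The function name `z_score` and the sample data are hypothetical.

```python
import statistics


def z_score(values):
    """Subtract the mean, then divide by the standard deviation.
    Assumption: population standard deviation (divides by n, not n - 1)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant feature has no spread to standardize.
        raise ValueError("standard deviation is zero; cannot standardize")
    return [(v - mean) / std for v in values]


# Hypothetical feature values with mean 5 and population std 2.
feature = [2, 4, 4, 4, 5, 5, 7, 9]
standardized = z_score(feature)
print(standardized)
print(statistics.fmean(standardized), statistics.pstdev(standardized))
```

After the transformation the feature has mean 0 and standard deviation 1, up to floating-point rounding. Unlike min-max scaling, the result is not confined to a fixed interval.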

Some machine learning algorithms benefit from normalization and standardization, particularly those that rely on Euclidean distance. For example, if one of the variables used by K-Nearest Neighbors (KNN) is in the 1000s and another is in the 0.1s, the first variable will dominate the distance almost completely. In this scenario, normalization or standardization can help.
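The effect on Euclidean distance can be illustrated with a small Python sketch. All points and feature ranges below are hypothetical values chosen for the demonstration, and the helper names `euclidean` and `rescale` are not from any library.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def rescale(point, ranges):
    """Min-max rescale each coordinate to [0, 1] using the given
    (min, max) range for that feature."""
    return tuple((v - lo) / (hi - lo) for v, (lo, hi) in zip(point, ranges))


# Hypothetical points: (feature in the 1000s, feature in the 0.1s).
query = (2000.0, 0.1)
same_small = (2050.0, 0.1)   # identical small feature, 50 away in the large one
same_large = (2010.0, 0.9)   # opposite end of the small feature, 10 away in the large one

# Raw distances: the large feature decides which point looks closer.
raw_to_same_small = euclidean(query, same_small)
raw_to_same_large = euclidean(query, same_large)

# Assumed feature ranges: 1000..3000 and 0.1..0.9.
ranges = [(1000.0, 3000.0), (0.1, 0.9)]
scaled_to_same_small = euclidean(rescale(query, ranges), rescale(same_small, ranges))
scaled_to_same_large = euclidean(rescale(query, ranges), rescale(same_large, ranges))

print(raw_to_same_small, raw_to_same_large)
print(scaled_to_same_small, scaled_to_same_large)
```

On the raw data the nearest neighbor is `same_large`, even though it sits at the opposite end of the small feature's range. After rescaling, both features contribute on the same footing and `same_small` becomes the nearest neighbor.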

When to use normalization and standardization:

When you don’t know the distribution of your data, or when you know it is not Gaussian, normalization is a sensible approach. Normalization is useful when your features have different scales and the technique you’re employing, such as k-nearest neighbors or artificial neural networks, doesn’t make assumptions about the distribution of your data.
The assumption behind standardization is that your data follows a Gaussian (bell curve) distribution. This isn’t strictly required, but the approach works better if your attribute distribution is Gaussian. Standardization is useful when your features have different scales and the technique you’re using (such as logistic regression, linear regression, or linear discriminant analysis) does make assumptions about the distribution of your data.
We normalize training data to make the model easier to train. Giving the various features similar value ranges (feature scaling) helps gradient descent converge faster.

Thank you...!!!
