Normalization in Machine Learning

What is Normalization in Machine Learning?

Normalization is a scaling technique in machine learning applied during data preparation to bring the values of numeric columns in a dataset onto a common scale. It is not necessary for every dataset; it is required only when the features of a model have different ranges.

Methods of Data Normalization:

1) Decimal Scaling

2) Min-Max Normalization

3) Z-Score Normalization (zero-mean Normalization)

Implementing the different methods of data normalization:

1. Decimal Scaling

Decimal scaling is a normalization method in which a value is normalized by moving its decimal point. The number of decimal places to move is determined by the maximum absolute value in the given set of data. If v_i is a value of attribute A, then the normalized value u_i is given as:

Decimal scaling normalization formula:

u_i = v_i / 10^j

where j is the smallest integer such that max(|u_i|) < 1.

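The original worked example was embedded as images and could not be recovered. As a minimal sketch, decimal scaling can be implemented with NumPy as below; the helper name decimal_scale and the sample data are hypothetical:

```python
import numpy as np

def decimal_scale(values):
    """Decimal scaling: divide by 10^j, where j is the smallest
    integer such that every scaled value satisfies |u| < 1."""
    values = np.asarray(values, dtype=float)
    max_abs = np.abs(values).max()
    if max_abs == 0:
        return values  # all zeros: nothing to scale
    j = int(np.floor(np.log10(max_abs))) + 1
    return values / (10 ** j)

# Made-up sample data; the maximum absolute value is 701, so j = 3.
data = np.array([10, 201, 301, -401, 501, 601, 701])
print(decimal_scale(data))
# [ 0.01   0.201  0.301 -0.401  0.501  0.601  0.701]
```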

2. Min-Max Normalization

Also known as min-max scaling or min-max normalization, rescaling is the simplest method: it rescales the range of a feature to [0, 1] or [-1, 1]. Selecting the target range depends on the nature of the data. The general formula for a min-max of 0 to 1 is:

x' = (x - min(x)) / (max(x) - min(x))

where x is an original value and x' is the normalized value. For example, suppose we have students' weight data, and the weights span 160 to 200 pounds. To rescale this data, we first subtract 160 from each student's weight and then divide the result by 40 (the difference between the maximum and minimum weights).

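As with the previous section, the original code images were lost. A minimal NumPy sketch follows; the min_max_scale helper and the weight values are hypothetical, chosen to match the 160-to-200-pound example above:

```python
import numpy as np

def min_max_scale(x, new_min=0.0, new_max=1.0):
    """Linearly rescale x so its values span [new_min, new_max]."""
    x = np.asarray(x, dtype=float)
    x_std = (x - x.min()) / (x.max() - x.min())
    return x_std * (new_max - new_min) + new_min

weights = np.array([160, 170, 180, 190, 200])  # made-up weights in pounds
print(min_max_scale(weights))
# [0.   0.25 0.5  0.75 1.  ]
```

The equivalent transformation is available in scikit-learn as sklearn.preprocessing.MinMaxScaler, which also remembers the training-set minimum and maximum so the same rescaling can be applied to new data.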

3. Z-Score Normalization / Standardization (zero-mean Normalization):

Feature standardization makes the values of each feature in the data have zero mean and unit variance. This method is widely used for normalization in many machine learning algorithms (e.g., support vector machines, logistic regression, and artificial neural networks). The general method of calculation is to determine the mean and standard deviation of each feature, subtract the mean from each value, and then divide by the feature's standard deviation.

x' = (x - x̄) / σ


where x is the original feature vector, x̄ = average(x) is the mean of that feature vector, and σ is its standard deviation.

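Again the original images are unrecoverable; here is a minimal sketch of z-score standardization in NumPy (the z_score helper and the sample feature are hypothetical):

```python
import numpy as np

def z_score(x):
    """Standardize x to zero mean and unit variance."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

feature = np.array([10.0, 20.0, 30.0, 40.0, 50.0])  # made-up feature values
z = z_score(feature)
print(z.mean(), z.std())  # ~0.0 and 1.0
```

In scikit-learn the same transformation is provided by sklearn.preprocessing.StandardScaler.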

Some machine learning algorithms benefit from normalization and standardization, particularly those that use Euclidean distance. For example, if one of the features used by k-nearest neighbors (KNN) is in the 1000s and another is in the 0.1s, the first feature will dominate the distance rather strongly. In this scenario, normalization and standardization are beneficial, as the toy example below shows.
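A toy illustration with made-up numbers: two points described by a large-scale feature and a small-scale one, compared before and after min-max scaling each feature:

```python
import numpy as np

# Two points: (income in dollars, rating in [0, 1]) -- made-up values.
a = np.array([1000.0, 0.9])
b = np.array([3000.0, 0.1])

# Raw Euclidean distance: the income axis dominates almost entirely.
print(np.linalg.norm(a - b))  # ~2000.0

# After min-max scaling each feature across the two points,
# both features contribute to the distance.
a_scaled = np.array([0.0, 1.0])
b_scaled = np.array([1.0, 0.0])
print(np.linalg.norm(a_scaled - b_scaled))  # ~1.414
```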

When to use normalization and standardization:

  • When you don't know the distribution of your data, or when you know it isn't Gaussian, normalization is a smart approach to apply. Normalization is useful when your data has features on different scales and the technique you're employing, such as k-nearest neighbors or artificial neural networks, makes no assumptions about the distribution of your data.
  • The assumption behind standardization is that your data follows a Gaussian (bell curve) distribution. This isn't strictly required; however, the approach works better when the attribute distribution is Gaussian. Standardization is useful when your data has features on different scales and the technique you're using assumes a Gaussian distribution (like logistic regression, linear regression, or linear discriminant analysis).
  • We normalize training data to ease the model's learning: we make sure the various features have similar value ranges (feature scaling) so that gradient descent can converge faster, as the sketch after this list illustrates.
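Why scaling speeds up gradient descent can be seen through the conditioning of the problem. The sketch below, with made-up feature scales, compares the condition number of XᵀX, which governs the convergence rate of gradient descent on least-squares problems, before and after standardization:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(1000, 3000, size=100)  # made-up large-scale feature
x2 = rng.uniform(0.0, 0.2, size=100)    # made-up small-scale feature
X = np.column_stack([x1, x2])

def condition_number(X):
    """Ratio of largest to smallest eigenvalue of X^T X."""
    eigvals = np.linalg.eigvalsh(X.T @ X)
    return eigvals.max() / eigvals.min()

X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(condition_number(X))      # very large: ill-conditioned, slow convergence
print(condition_number(X_std))  # close to 1: well-conditioned, fast convergence
```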

Thank you...!!!
