Feature Scaling

  1. STANDARDIZATION

Standardization, or Z-score normalization, is the transformation of a feature by subtracting the mean and dividing by the standard deviation; the result is often called the Z-score. Geometrically, standardization translates the data so that its mean vector sits at the origin and then squishes or expands it by a factor of the standard deviation.

Standardization involves two steps: 1. mean centering, and 2. scaling by a factor of the standard deviation.

  • The mean and standard deviation are used for scaling
  • X' = (X - mean) / standard deviation
  • It is used when we want to ensure zero mean and unit standard deviation
  • It is much less affected by outliers than min-max scaling
  • The preprocessing module of scikit-learn provides the StandardScaler

[Image: after scaling, the mean is zero and the standard deviation is one]
[Image: after scaling, the data is much less affected by outliers]
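As a minimal sketch of the above (the toy data is assumed for illustration), standardization with scikit-learn's StandardScaler:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy single-feature matrix; the values are illustrative only.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # (X - mean) / standard deviation

# After scaling, the feature has zero mean and unit standard deviation.
print(round(X_scaled.mean(), 6), round(X_scaled.std(), 6))
```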


2. NORMALIZATION

Types of normalization:

A. Min-max scaling

  • Scikit-Learn provides a transformer called MinMaxScaler
  • X_new = (X - X_min) / (X_max - X_min)
  • This scales the range to [0, 1]
  • Geometrically speaking, the transformation squishes the n-dimensional data into an n-dimensional unit hypercube
  • It is strongly affected by outliers, so it is only useful when there are none
  • It is used when features are on different scales

[Image: use of MinMaxScaler; after scaling, the data is squished between 0 and 1]
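A short sketch of min-max scaling with MinMaxScaler (toy data assumed):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data on an arbitrary scale.
X = np.array([[10.0], [20.0], [25.0], [40.0]])

scaler = MinMaxScaler()  # default feature_range is (0, 1)
X_scaled = scaler.fit_transform(X)  # (X - X_min) / (X_max - X_min)

# The minimum maps to 0 and the maximum maps to 1.
print(X_scaled.min(), X_scaled.max())
```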

B. Mean normalization

  • x' = (x - x_mean) / (X_max - X_min)
  • This gives a range of [-1, 1]
  • In mean normalization, we center the variable at zero and rescale the distribution to the value range. This involves subtracting the mean from each observation and then dividing the result by the difference between the maximum and minimum values
  • If a value is less than the mean, we get a negative result
  • If a value is more than the mean, we get a positive result
  • It helps where we need centered data
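Scikit-learn has no dedicated transformer for mean normalization, so the sketch below uses a hypothetical NumPy helper (`mean_normalize` is not a library function) on toy data:

```python
import numpy as np

# Hypothetical helper: scikit-learn has no built-in mean normalizer,
# so the formula is implemented directly.
def mean_normalize(X):
    # (x - mean) / (max - min), computed per column
    return (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
X_mn = mean_normalize(X)

# Below-mean values come out negative, above-mean values positive,
# and everything lies within [-1, 1].
print(X_mn.ravel())
```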

C. Max absolute scaling

  • X' = X / max(|X|)
  • sklearn.preprocessing.MaxAbsScaler
  • Scales each feature by its maximum absolute value
  • This estimator scales each feature individually such that the maximal absolute value of each feature in the training set will be 1.0. It does not shift/center the data, and thus does not destroy any sparsity
  • It is used where we have sparse data, i.e. data in which most of the values are zero
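A minimal sketch of MaxAbsScaler on sparse-style toy data (values assumed for illustration):

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Sparse-style toy data: most entries are zero.
X = np.array([[0.0, -4.0],
              [2.0,  0.0],
              [0.0,  2.0]])

scaler = MaxAbsScaler()
X_scaled = scaler.fit_transform(X)  # X / max(|X|), per column

# Each column's maximum absolute value becomes 1.0; zeros stay zero,
# so sparsity is preserved.
print(X_scaled)
```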

D. Robust scaling

  • X' = (X - X_median) / IQR
  • sklearn.preprocessing.RobustScaler
  • Scales features using statistics that are robust to outliers
  • This scaler removes the median and scales the data according to the quantile range (defaults to the IQR: interquartile range). The IQR is the range between the 1st quartile (25th percentile) and the 3rd quartile (75th percentile)
  • Performs better on data with outliers
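A short sketch of RobustScaler on toy data containing one outlier (values assumed for illustration):

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Toy data with one large outlier (100.0).
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

scaler = RobustScaler()  # centers on the median, scales by the IQR
X_scaled = scaler.fit_transform(X)  # (X - median) / IQR

# The median maps to 0, and the non-outlier points stay in a narrow
# band even though the raw outlier is far away.
print(X_scaled.ravel())
```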


#machinelearning #featureengineering #featurescaling #datacleaning #eda
