Standardization, or Z-score normalization, is the transformation of features by subtracting the mean and dividing by the standard deviation. The result is often called the Z-score. Standardization translates the data so that the mean vector of the original data sits at the origin, then squishes or expands it so that each feature has unit standard deviation.
Standardization has two steps: 1. mean centering 2. scaling by the standard deviation
- the mean and standard deviation are used for scaling
- X' = (X - mean) / standard deviation
- it is used when we want to ensure zero mean & unit standard deviation
- it is much less affected by outliers than min-max scaling
- The preprocessing module of Scikit-Learn provides the StandardScaler
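The two steps above (mean centering, then scaling by the standard deviation) can be sketched as follows. This is a minimal illustration, assuming NumPy and scikit-learn are installed; the toy matrix `X` is made up for the example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix (illustrative values): 4 samples, 2 features.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])

# Manual z-score: subtract the column mean, divide by the column std.
# np.std defaults to the population std (ddof=0), which matches StandardScaler.
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

# The same transformation with the scikit-learn transformer.
X_scaled = StandardScaler().fit_transform(X)

print(np.allclose(X_manual, X_scaled))
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```

After scaling, each column should have a mean of (approximately) zero and a standard deviation of one, regardless of its original scale.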
- For min-max scaling, Scikit-Learn provides a transformer called MinMaxScaler
- X_new = (X - X_min) / (X_max - X_min)
- This scales the range to [0, 1]
- Geometrically speaking, the transformation squishes the n-dimensional data into an n-dimensional unit hypercube
- useful when there are no outliers, as it cannot cope with them
- It is strongly affected by outliers: a single extreme value sets X_min or X_max and compresses all other values into a narrow band
- It is used when features are of different scales
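A small sketch of min-max scaling and its sensitivity to outliers, assuming NumPy and scikit-learn are available. The data, including the outlier value 100, is invented for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# One feature with a single large outlier (100.0); values are illustrative.
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Manual min-max scaling: (X - X_min) / (X_max - X_min), per column.
X_min = X.min(axis=0)
X_max = X.max(axis=0)
X_manual = (X - X_min) / (X_max - X_min)

# The same transformation with the scikit-learn transformer (default range [0, 1]).
X_scaled = MinMaxScaler().fit_transform(X)

print(np.allclose(X_manual, X_scaled))
print(X_scaled.ravel())
```

The outlier maps to 1 and the minimum to 0, while the four ordinary values are squeezed into roughly the bottom 3% of the range, which is the weakness noted in the list above.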
- Mean normalization: X' = (X - X_mean) / (X_max - X_min)
- this gives a range between -1 and 1
- In mean normalization, we center the variable at zero and rescale the distribution to the value range. This procedure involves subtracting the mean from each observation and then dividing the result by the difference between the minimum and maximum values
- if the value is less than the mean, we get a negative value
- if the value is more than the mean, we get a positive value
- it helps where we need centered data
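Mean normalization can be written directly with NumPy. To my knowledge scikit-learn does not ship a dedicated transformer for it, so the sketch below computes the formula by hand; the data is illustrative.

```python
import numpy as np

# One feature, illustrative values. Mean = 3.0, range = 6.0 - 1.0 = 5.0.
X = np.array([[1.0], [2.0], [3.0], [6.0]])

mean = X.mean(axis=0)
value_range = X.max(axis=0) - X.min(axis=0)

# Mean normalization: center at zero, then divide by the value range.
X_norm = (X - mean) / value_range

print(X_norm.ravel())  # -0.4, -0.2, 0.0, 0.6
```

Values below the mean come out negative, values above it positive, and every result lies between -1 and 1.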
- Max-abs scaling: X' = X / max(|X|)
- sklearn.preprocessing.MaxAbsScaler
- Scale each feature by its maximum absolute value.
- This estimator scales and translates each feature individually such that the maximal absolute value of each feature in the training set will be 1.0. It does not shift/center the data, and thus does not destroy any sparsity.
- it is used where we have sparse data, that is, data in which most of the values are zero
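The sparsity-preserving behaviour can be checked on a small sparse matrix. This is a sketch assuming NumPy, SciPy and scikit-learn are installed; the matrix values are invented for the example.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import MaxAbsScaler

# Mostly-zero matrix stored in CSR format (illustrative values).
# Column-wise maximum absolute values are 2, 4 and 5.
X = sparse.csr_matrix(np.array([[0.0, -4.0, 0.0],
                                [2.0,  0.0, 0.0],
                                [0.0,  2.0, 5.0]]))

# Each column is divided by its maximum absolute value; nothing is shifted.
X_scaled = MaxAbsScaler().fit_transform(X)

print(sparse.issparse(X_scaled), X_scaled.nnz == X.nnz)
print(X_scaled.toarray())
```

Because the data is only divided and never shifted, zeros stay zero and the result remains sparse. The second column also shows why the divisor is the maximum of the absolute values: its largest value is 2, but its largest magnitude is 4.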
- Robust scaling: X' = (X - X_median) / IQR
- sklearn.preprocessing.RobustScaler
- Scale features using statistics that are robust to outliers.
- This scaler removes the median and scales the data according to the quantile range (defaults to IQR: Interquartile Range). The IQR is the range between the 1st quartile (25th percentile) and the 3rd quartile (75th percentile).
- performs better than the other scalers on data with outliers
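A short sketch of robust scaling on data with one outlier, assuming NumPy and scikit-learn are available. The values are illustrative, and the manual version uses NumPy's default linear-interpolation percentiles.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# One feature with a single large outlier (100.0); values are illustrative.
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Manual robust scaling: subtract the median, divide by the IQR (Q3 - Q1).
median = np.median(X, axis=0)                 # 3.0
q1, q3 = np.percentile(X, [25, 75], axis=0)   # 2.0 and 4.0
X_manual = (X - median) / (q3 - q1)

# The same transformation with the scikit-learn transformer (default IQR).
X_scaled = RobustScaler().fit_transform(X)

print(np.allclose(X_manual, X_scaled))
print(X_scaled.ravel())  # -1.0, -0.5, 0.0, 0.5, 48.5
```

The median and IQR are computed from the middle of the distribution, so the outlier does not affect how the ordinary values are spread; it simply ends up far from them after scaling.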