Feature Scaling

  1. STANDARDIZATION

Standardization, or Z-score normalization, is the transformation of a feature by subtracting the mean and dividing by the standard deviation; the result is often called the Z-score. Geometrically, standardization translates the data so that its mean vector sits at the origin and then squishes or expands it by a factor of the standard deviation.

Standardization involves two steps: 1. mean centering, and 2. scaling by a factor of the standard deviation.

  • The mean and standard deviation are used for scaling
  • X' = (X - mean) / standard deviation
  • It is used when we want to ensure zero mean and unit standard deviation
  • It is much less affected by outliers than min-max scaling
  • The preprocessing module of scikit-learn provides the StandardScaler

[Image: after scaling, the mean is zero and the standard deviation is one]
[Image: after scaling, the data is much less affected by outliers]
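As a minimal sketch of the above (the toy data is assumed for illustration), standardization with scikit-learn's StandardScaler:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy single-feature matrix; the values are illustrative only.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # (X - mean) / standard deviation

# After scaling, the feature has zero mean and unit standard deviation.
print(round(X_scaled.mean(), 6), round(X_scaled.std(), 6))
```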


2. NORMALIZATION

Types of normalization:

A. Min-max scaling

  • Scikit-Learn provides a transformer called MinMaxScaler
  • X_new = (X - X_min) / (X_max - X_min)
  • This scales the range to [0, 1]
  • Geometrically speaking, the transformation squishes the n-dimensional data into an n-dimensional unit hypercube
  • It is strongly affected by outliers, so it is only useful when there are none
  • It is used when features are on different scales

[Image: use of MinMaxScaler; after scaling, the data is squished between 0 and 1]
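A short sketch of min-max scaling with MinMaxScaler (toy data assumed):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data on an arbitrary scale.
X = np.array([[10.0], [20.0], [25.0], [40.0]])

scaler = MinMaxScaler()  # default feature_range is (0, 1)
X_scaled = scaler.fit_transform(X)  # (X - X_min) / (X_max - X_min)

# The minimum maps to 0 and the maximum maps to 1.
print(X_scaled.min(), X_scaled.max())
```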

B. Mean normalization

  • x' = (x - x_mean) / (X_max - X_min)
  • This gives a range of [-1, 1]
  • In mean normalization, we center the variable at zero and rescale the distribution to the value range. This involves subtracting the mean from each observation and then dividing the result by the difference between the maximum and minimum values
  • If a value is less than the mean, we get a negative result
  • If a value is more than the mean, we get a positive result
  • It helps where we need centered data
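Scikit-learn has no dedicated transformer for mean normalization, so the sketch below uses a hypothetical NumPy helper (`mean_normalize` is not a library function) on toy data:

```python
import numpy as np

# Hypothetical helper: scikit-learn has no built-in mean normalizer,
# so the formula is implemented directly.
def mean_normalize(X):
    # (x - mean) / (max - min), computed per column
    return (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
X_mn = mean_normalize(X)

# Below-mean values come out negative, above-mean values positive,
# and everything lies within [-1, 1].
print(X_mn.ravel())
```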

C. Max absolute scaling

  • X' = X / max(|X|)
  • sklearn.preprocessing.MaxAbsScaler
  • Scales each feature by its maximum absolute value
  • This estimator scales each feature individually such that the maximal absolute value of each feature in the training set will be 1.0. It does not shift/center the data, and thus does not destroy any sparsity
  • It is used where we have sparse data, i.e. data in which most of the values are zero
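A minimal sketch of MaxAbsScaler on sparse-style toy data (values assumed for illustration):

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Sparse-style toy data: most entries are zero.
X = np.array([[0.0, -4.0],
              [2.0,  0.0],
              [0.0,  2.0]])

scaler = MaxAbsScaler()
X_scaled = scaler.fit_transform(X)  # X / max(|X|), per column

# Each column's maximum absolute value becomes 1.0; zeros stay zero,
# so sparsity is preserved.
print(X_scaled)
```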

D. Robust scaling

  • X' = (X - X_median) / IQR
  • sklearn.preprocessing.RobustScaler
  • Scales features using statistics that are robust to outliers
  • This scaler removes the median and scales the data according to the quantile range (defaults to the IQR: interquartile range). The IQR is the range between the 1st quartile (25th percentile) and the 3rd quartile (75th percentile)
  • Performs better on data with outliers
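A short sketch of RobustScaler on toy data containing one outlier (values assumed for illustration):

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Toy data with one large outlier (100.0).
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

scaler = RobustScaler()  # centers on the median, scales by the IQR
X_scaled = scaler.fit_transform(X)  # (X - median) / IQR

# The median maps to 0, and the non-outlier points stay in a narrow
# band even though the raw outlier is far away.
print(X_scaled.ravel())
```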


#machinelearning #featureengineering #featurescaling #datacleaning #eda
