Feature Transformation Techniques

Introduction:

Data preprocessing is an essential step in machine learning projects. Real-world data is often messy and unorganized, so we need to clean it up before feeding it to our models. Feature transformation is one such preprocessing technique: by reshaping the data, it helps our models, especially distribution-sensitive ones like linear models, perform better.

What is feature transformation?

Feature transformation is a technique we use to boost the performance of a machine learning algorithm with the help of mathematical formulas. We apply mathematical functions to features to transform them into a form that directly improves the algorithm's performance.

How does feature transformation increase the performance of a machine learning algorithm?

Often the distribution of our data is not normal, which has a very large impact on linear models such as linear regression and logistic regression. Feature transformation uses mathematical formulas to make the distribution closer to normal, and in that way it boosts the performance of these algorithms.

Before applying feature transformation:

Figure 1: Before applying feature transformation

After applying feature transformation:

Figure 2: After applying feature transformation

How does a normal distribution boost the performance of a machine learning algorithm?

Statistics is the foundation of machine learning. When a statistician sees a normal distribution, they see an easy way of solving a particular problem, and the same holds for machine learning algorithms. When we give normally distributed data to an algorithm, the calculations it has to make become simpler, so training takes less time and accuracy improves.

Accuracy before applying feature transformation:

Figure 3: Before applying feature transformation model accuracy

Accuracy after applying feature transformation:

Figure 4: After applying feature transformation model accuracy

As we can see, there is a clear boost in the accuracy of logistic regression.

Types of transformers:

There are three types of transformers available in the sklearn library:

  • Function transformer
  • Power transformer
  • Quantile transformer
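As a quick orientation, here is a minimal sketch of where these three transformers live in scikit-learn and how they are invoked; the toy column X is made up for illustration:

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer, PowerTransformer, QuantileTransformer

# A made-up, positively skewed feature column (shape: n_samples x 1).
X = np.array([[1.0], [10.0], [100.0], [1000.0]])

ft = FunctionTransformer(np.log1p)                                    # apply an arbitrary function
pt = PowerTransformer(method="yeo-johnson")                           # Box-Cox / Yeo-Johnson
qt = QuantileTransformer(n_quantiles=4, output_distribution="normal") # map quantiles to a normal

X_ft = ft.fit_transform(X)
X_pt = pt.fit_transform(X)
X_qt = qt.fit_transform(X)
print(X_ft.ravel())
```

All three follow the usual fit/transform API, so they can be dropped into a sklearn pipeline.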

Function Transformer:

With the FunctionTransformer, there are multiple transforms we can apply. The most commonly used are:

  • Log transform
  • Reciprocal transform
  • Square transform

Log transform:

In the log transform, we apply the logarithm to every value of a particular column to make its distribution closer to normal, so that the performance of the machine learning algorithm improves.

Where to use?

  • When the column has only positive values, because we can't take the log of zero or negative values.
  • When the data is positively skewed.

How it works?

Some columns have a much larger scale than others. Applying the log transform compresses the large values into a range comparable with the rest of the data, and in the process the distribution becomes closer to normal.
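A minimal sketch of this idea on a made-up, positively skewed column; np.log1p (log(1 + x)) is used instead of np.log so that zeros are handled safely:

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import FunctionTransformer

# A made-up, heavily right-skewed column (e.g. incomes).
x = np.array([[1.0], [2.0], [3.0], [10.0], [500.0], [10_000.0]])

# log1p(x) = log(1 + x): safe for zeros, still requires values > -1.
log_tf = FunctionTransformer(np.log1p, validate=True)
x_log = log_tf.fit_transform(x)

print("skew before:", skew(x.ravel()))
print("skew after :", skew(x_log.ravel()))
```

The skewness after the transform is much smaller, i.e. the distribution has moved closer to symmetric.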

Before applying log transformation:

Figure 5: Before applying log transform

Accuracy of machine learning model before log transformation:

Figure 6: Model Accuracy

Applying log transformation:

Figure 7: Importing FunctionTransformer from sklearn
Figure 8 : Applying log transform

Results after applying log transform:

Figure 9: Little improvements in the distribution
Figure 10: Machine Learning Algorithm accuracy

We can clearly see the improvement after applying the log transform.

Reciprocal transform 1/x:

In the reciprocal transform, we replace every value of a particular column with its reciprocal to make the distribution closer to normal, so that the performance of the machine learning algorithm improves.

When to use?

  • Skewed Data: If your feature exhibits a heavily skewed distribution, with a long tail of large values, taking the reciprocal can help normalize the distribution and reduce the impact of extreme values.
  • Proportional Relationships: In some cases, the relationship between the feature and the target variable may be inversely proportional. Taking the reciprocal can help capture this relationship more accurately and improve model performance.
  • Stabilizing Variance: The reciprocal transform can help stabilize the variance of a feature, particularly if the variance increases as the feature values increase. This can be helpful in models that assume constant variance, such as linear regression.

How it works?

  • Skewness correction: If the original feature has a skewed distribution with a long tail of large values, taking the reciprocal can compress these larger values. This helps in making the distribution more symmetrical and reducing the impact of extreme outliers.
  • Value scaling: The reciprocal transform can effectively scale down larger values and scale up smaller values. This is because the reciprocal of a large value is smaller, while the reciprocal of a small value is larger. This can be useful when the range of values in the feature is very large, allowing for better representation of values across the feature space.
  • Proportional relationship capture: In some cases, a reciprocal relationship may exist between the feature and the target variable. By taking the reciprocal of the feature values, this inverse relationship can be captured more effectively. For example, if the feature represents time, and the target variable decreases as time increases, the reciprocal transform can help model this relationship more accurately.

It's important to note that the reciprocal transform may not be suitable for all types of data or all situations. It should be applied judiciously and with an understanding of the underlying data characteristics and the specific problem at hand. Additionally, it's crucial to handle potential issues that may arise, such as division by zero or close-to-zero values, which can impact the effectiveness of the reciprocal transform.
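A minimal sketch of the reciprocal transform with FunctionTransformer, using made-up strictly positive values and the guard against zeros mentioned above:

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Made-up strictly positive values; the reciprocal is undefined at zero,
# so guard against zeros before applying it.
x = np.array([[0.5], [1.0], [2.0], [4.0], [100.0]])
assert np.all(x != 0), "reciprocal transform requires non-zero values"

recip_tf = FunctionTransformer(np.reciprocal, validate=True)
x_recip = recip_tf.fit_transform(x)
print(x_recip.ravel())  # large values are compressed toward 0, small ones blown up
```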

Applying reciprocal transform before and after:

Figure 11: Before and after applying reciprocal transform

Square transform:

The square transform computes the square of each feature value. For a given feature value x, the square transform is calculated as x^2.

Where to Use:

The square transform can be useful in various scenarios:

  • Non-linear Relationships: If there is a non-linear relationship between a feature and the target variable, applying the square transform can help capture this relationship. It can enable a linear model to better fit the data, as it can model curved or quadratic patterns.
  • Scaling Differences: The square transform can be employed when there are significant differences in the scale or magnitude of feature values. Squaring the values can help balance out these differences and bring them to a more comparable range.
  • Variance Stabilization: If the variance of a feature increases as the values increase, applying the square transform can help stabilize the variance. This can be beneficial in situations where modeling assumptions, such as constant variance, need to be met.

How it Works:

When the square transform is applied to a feature, it has several effects on the data:

  • Non-linearity: The squared values introduce non-linearity into the relationship between the feature and the target variable. This allows for modeling more complex patterns and capturing curved relationships that a linear model might struggle to represent.
  • Magnitude Amplification: The square transform amplifies the differences between smaller values, while compressing the differences between larger values. This can be useful in situations where the magnitude of the feature values carries important information.
  • Impact on Outliers: The square transform can magnify the impact of outliers, as squaring extremely large or small values can result in even larger values. This effect should be taken into consideration and handled appropriately.

As with any feature transformation technique, the square transform should be applied thoughtfully and in consideration of the data characteristics and the specific problem at hand. It may not always be suitable or beneficial, and it's important to evaluate its impact on the data and model performance.
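A minimal sketch of the square transform with FunctionTransformer; the values are illustrative:

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Illustrative values; note that squaring maps negatives to positives.
x = np.array([[-3.0], [-1.0], [0.0], [2.0], [5.0]])

square_tf = FunctionTransformer(np.square, validate=True)
x_sq = square_tf.fit_transform(x)
print(x_sq.ravel())  # -> [ 9.  1.  0.  4. 25.]
```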

Applying square transform before and after:

Figure 12: Square transform

Custom transformer:

Figure 13: Custom transformer

You can use this piece of code as a starting point for building a custom mathematical transformer.
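A sketch of one way such a custom transformer can look; cube_root here is a hypothetical example function, not necessarily the one shown in Figure 13:

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Hypothetical custom function; substitute any mathematical formula you need.
def cube_root(X):
    return np.cbrt(X)

custom_tf = FunctionTransformer(cube_root, validate=True)

x = np.array([[8.0], [27.0], [-64.0]])
print(custom_tf.fit_transform(x).ravel())  # -> [ 2.  3. -4.]
```

Because FunctionTransformer wraps an arbitrary function, the same pattern works for any element-wise formula.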

Power Transformer:

The power transformer is used when we want the output to be closer to a Gaussian distribution. The power transform has two variants:

  • Box-Cox transform
  • Yeo-Johnson transform

Box-Cox transform:

Box-Cox requires the data to be strictly positive; it does not even accept zeros. The formula on which the Box-Cox transform works is:

x(λ) = (x^λ − 1) / λ,  if λ ≠ 0
x(λ) = ln(x),          if λ = 0
Figure 14: Formula for Box-Cox

The exponent here is a parameter called λ that varies over the range −5 to 5, and the search examines the candidate values of λ. Finally, we choose the optimal value (the one resulting in the best approximation to a normal distribution) for that particular feature.

Scope:

Applied only to values greater than zero (positive values only; zero excluded).

Internal working techniques:

  • Maximum likelihood
  • Bayesian statistics
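The λ search described above can be sketched with scikit-learn's PowerTransformer, which fits λ by maximum likelihood; the log-normal sample here is synthetic:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
# Strictly positive, right-skewed synthetic data (log-normal).
x = rng.lognormal(mean=0.0, sigma=1.0, size=(500, 1))

# method="box-cox" requires data > 0; lambda is fitted by maximum likelihood,
# and standardize=True rescales the result to zero mean and unit variance.
pt = PowerTransformer(method="box-cox", standardize=True)
x_bc = pt.fit_transform(x)

print("fitted lambda:", pt.lambdas_[0])
```

For exactly log-normal data the fitted λ comes out close to 0, which corresponds to the pure log transform.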

Distribution Before applying Box-Cox:

Figure 15: Distribution Before applying Box-Cox

Algorithm Accuracy before applying Box-Cox:

Figure 16: Model Accuracy before applying Box-Cox

Distribution after applying Box-Cox:

Figure 17: After applying Box-Cox

Algorithm Accuracy after applying Box-Cox:

Figure 18: Model Accuracy after applying Box-Cox

Yeo-Johnson transform:

This transformation is an extension of the Box-Cox transform: Yeo-Johnson can also be applied to zero and negative values. The formula for Yeo-Johnson is:

y = ((x + 1)^λ − 1) / λ,               if λ ≠ 0, x ≥ 0
y = ln(x + 1),                         if λ = 0, x ≥ 0
y = −((−x + 1)^(2−λ) − 1) / (2 − λ),   if λ ≠ 2, x < 0
y = −ln(−x + 1),                       if λ = 2, x < 0
Figure 19: Formula for Yeo Johnson
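A sketch of Yeo-Johnson on synthetic mixed-sign data, which Box-Cox could not handle:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(1)
# Mixed-sign, skewed synthetic data: Box-Cox would fail here, Yeo-Johnson does not.
x = np.concatenate([rng.exponential(2.0, 400),
                    -rng.exponential(0.5, 100)]).reshape(-1, 1)

pt = PowerTransformer(method="yeo-johnson", standardize=True)
x_yj = pt.fit_transform(x)
print("fitted lambda:", pt.lambdas_[0])
```

Note that "yeo-johnson" is the default method of PowerTransformer precisely because it works on any real-valued data.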

Distribution before applying Yeo-Johnson:

Figure 20: Before applying Yeo-Johnson

Algorithm Accuracy before applying Yeo-Johnson:

Figure 21: Model Accuracy before applying Yeo-Johnson

Distribution after applying Yeo-Johnson:

Figure 22: Distribution after applying Yeo-Johnson

Algorithm Accuracy after applying Yeo-Johnson:

Figure 23: Model Accuracy after applying Yeo-Johnson

Conclusion:

When working with linear models, it is important to normalize the distribution of the data for better performance. The feature transformation toolbox has many variants available for this task; at the end of the day, it is up to us which approach to choose for a given dataset.
