Day 12 of #100DaysOfPython
Today, we're diving into a feature engineering technique for better machine learning models: Frequency Encoding!
Frequency Encoding is a method in which we replace each label in a feature with its frequency, i.e. how often it appears. It is used in place of One Hot Encoding for features with high cardinality.
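For intuition, here's a tiny toy example (the labels are made up, just to show the idea):

import pandas as pd

# Toy column: 'a' appears 3 times, 'b' twice, 'c' once
s = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])
freq = s.value_counts()        # a: 3, b: 2, c: 1
print(s.map(freq).tolist())    # [3, 2, 3, 1, 3, 2]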
Let's take a look at how we can implement Frequency Encoding and why One Hot Encoding wouldn't be the best solution in this case:
One Hot Encoding results in 69 additional features, significantly increasing the dimensionality of the dataset.
If we look at the labels, we can see that feature 'X1' has 27 labels and 'X2' has 44, which is what drives up the dimensionality of the dataset.
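The dataset itself isn't reproduced here, so here's a rough sketch with a stand-in DataFrame (the 27/44/69 numbers come from the dataset described above, not from this toy data):

import pandas as pd

# Stand-in for the actual dataset (not shown in this post)
df = pd.DataFrame({'X1': list('abcabc'), 'X2': list('xyzxyw')})

print(df['X1'].nunique(), df['X2'].nunique())  # 27 and 44 in the post's dataset

one_hot = pd.get_dummies(df[['X1', 'X2']])  # one binary column per label
# With 27 + 44 labels this would yield 71 columns,
# i.e. 69 more features than the original 2
print(one_hot.shape)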
Let's take feature 'X2', which has 44 labels, and perform Frequency Encoding:
During Frequency Encoding, each label in feature 'X2' is replaced with its frequency, protecting the dataset from the curse of dimensionality.
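In pandas, that can look like this (a minimal sketch; the train/test split and the fillna(0) default for unseen labels are my assumptions, not from the post):

import pandas as pd

# Hypothetical data standing in for the 44-label column 'X2'
train = pd.DataFrame({'X2': ['a', 'b', 'a', 'c', 'a', 'b']})
test = pd.DataFrame({'X2': ['b', 'c', 'd']})

# Learn the label -> frequency mapping on the training data only
freq_map = train['X2'].value_counts().to_dict()

train['X2'] = train['X2'].map(freq_map)
test['X2'] = test['X2'].map(freq_map).fillna(0)  # 'd' was never seen in training

print(train['X2'].tolist())  # [3, 2, 3, 1, 3, 2]
print(test['X2'].tolist())   # [2.0, 1.0, 0.0]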
While Frequency Encoding is simple, easy to implement, and reduces dimensionality, it has the following disadvantages:
- If two labels appear with the same frequency, they receive the same encoded value, so the model can no longer tell them apart (demonstrated below).
- The encoded values impose an ordering of labels by frequency, which may have no relationship to the target.
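A quick demonstration of that first point, again with toy data:

import pandas as pd

# 'b' and 'c' both appear twice, so after encoding they are indistinguishable
s = pd.Series(['a', 'a', 'a', 'b', 'b', 'c', 'c'])
print(s.map(s.value_counts()).tolist())  # [3, 3, 3, 2, 2, 2, 2]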
Have you used Frequency Encoding while preparing a dataset for a model? If so, let me know in the comments how it impacted the model and what challenges you encountered.