Day12 of #100DaysOfPython

Today, we're diving into feature engineering for optimized machine learning models through Frequency Encoding!

Frequency Encoding replaces each label in a categorical feature with how often that label appears in the dataset. It is often used in place of One Hot Encoding for features with high cardinality, i.e. features with many distinct labels.

Let's take a look at how we can implement Frequency Encoding and why One Hot Encoding isn't the best solution in this case:

Dataset: https://www.kaggle.com/datasets/yasserh/mercedesbenz-greener-manufacturing-dataset
1. Reading the Mercedes dataset from Kaggle and loading the 'X1' & 'X2' features, which have high cardinality
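A minimal sketch of this step, assuming pandas. The helper name `load_features`, the file name `train.csv`, and the tiny in-memory CSV (made-up rows, used only so the snippet runs on its own) are all assumptions, not the original code:

```python
import io

import pandas as pd


def load_features(source, columns=("X1", "X2")):
    """Read only the requested columns from a CSV path or file-like object."""
    return pd.read_csv(source, usecols=list(columns))


# Made-up stand-in rows so the sketch is self-contained (not real Kaggle data).
sample_csv = io.StringIO(
    "ID,X0,X1,X2\n"
    "1,k,aa,p\n"
    "2,k,ab,q\n"
    "3,m,aa,p\n"
    "4,m,b,r\n"
)

df = load_features(sample_csv)
print(df.head())

# With the downloaded dataset (assumed file name):
# df = load_features("train.csv")
```

Reading only the two columns keeps the example focused on the features being encoded.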


2. Performing One Hot Encoding on the dataset

One Hot Encoding results in 69 more features (the 2 original columns are replaced by 27 + 44 = 71 indicator columns), which significantly increases the dimensionality of the dataset.

If we look at the labels, we can see that feature 'X1' has 27 labels and 'X2' has 44 labels, which is the cause of the increase in dimensionality.
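Here is a rough illustration of the same effect on a toy DataFrame, assuming `pd.get_dummies` is used for One Hot Encoding (the post doesn't say which function was used). The toy labels are made up, with 3 labels in 'X1' and 4 in 'X2':

```python
import pandas as pd

# Toy data (made up): 'X1' has 3 distinct labels, 'X2' has 4.
df = pd.DataFrame({
    "X1": ["a", "b", "c", "a", "b", "a"],
    "X2": ["p", "q", "p", "r", "s", "p"],
})

# One indicator column is created per distinct label.
one_hot = pd.get_dummies(df, columns=["X1", "X2"])

n_labels = df["X1"].nunique() + df["X2"].nunique()
extra_features = one_hot.shape[1] - df.shape[1]

print(df.nunique())       # distinct labels per feature
print(one_hot.shape)      # (rows, indicator columns)
print(extra_features)     # columns gained compared with the original frame
```

The number of columns grows with the number of distinct labels, which is why high-cardinality features blow up so quickly.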


Let's take feature 'X2', which has 44 labels, and perform Frequency Encoding:

3. Performing Frequency Encoding

During Frequency Encoding, each label in feature 'X2' is replaced with its frequency. The feature stays a single numeric column instead of expanding into 44 indicator columns, which protects the dataset from the curse of dimensionality.
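One common way to do this in pandas is sketched below, using `value_counts()` to build a label-to-count mapping and `map()` to apply it. The toy data and the new column name `X2_freq` are assumptions for illustration:

```python
import pandas as pd

# Toy stand-in for the 'X2' feature (made-up labels).
df = pd.DataFrame({"X2": ["p", "q", "p", "r", "s", "p"]})

# Count how many times each label appears: {label: count}.
freq_map = df["X2"].value_counts().to_dict()

# Replace every label with its count; still a single column.
df["X2_freq"] = df["X2"].map(freq_map)

print(freq_map)
print(df)
```

If relative frequencies are preferred over raw counts, `value_counts(normalize=True)` can be used to build the mapping instead.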

While Frequency Encoding is simple to implement and avoids the increase in dimensionality, it has the following disadvantages:

  1. If two labels have the same frequency, both are replaced with the same number, so the model can no longer tell them apart and valuable information is lost.
  2. The frequency of a label can be arbitrary and may contribute nothing to the predictive power of the model.
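The first disadvantage is easy to check before encoding. The sketch below uses only the standard library and a made-up list of labels; it groups labels by their count and keeps the groups where more than one label shares the same count:

```python
from collections import Counter, defaultdict

# Made-up labels: 'p' and 't' both appear 3 times; 'q', 'r', 's' appear once.
labels = ["p", "q", "p", "r", "s", "p", "t", "t", "t"]

counts = Counter(labels)

# Group labels by how often they occur.
by_count = defaultdict(list)
for label, count in counts.items():
    by_count[count].append(label)

# Keep only the counts shared by more than one label.
collisions = {c: sorted(ls) for c, ls in by_count.items() if len(ls) > 1}
print(collisions)
```

Every group in `collisions` would collapse into a single value after Frequency Encoding, so it is worth checking how many labels are affected.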


Have you used Frequency Encoding while preparing a dataset for a model? If so, let me know in the comments how it impacted the model and what challenges you encountered.




