Day 12 of #100DaysOfPython
Today, we're diving into a feature engineering technique for better machine learning models: Frequency Encoding!
Frequency Encoding is a method in which we replace each label in a feature with its frequency, i.e. how often it appears. It is used in place of One Hot Encoding for features with high cardinality.
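For intuition, here's a tiny toy example (the labels are made up, just to show the idea):

import pandas as pd

# Toy column: 'a' appears 3 times, 'b' twice, 'c' once
s = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])
freq = s.value_counts()        # a: 3, b: 2, c: 1
print(s.map(freq).tolist())    # [3, 2, 3, 1, 3, 2]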
Let's take a look at how we can implement Frequency Encoding and why One Hot Encoding wouldn't be the best solution in this case:
One Hot Encoding results in 69 additional features, significantly increasing the dimensionality of the dataset.
If we look at the labels, we can see that feature 'X1' has 27 labels and 'X2' has 44, which is what drives up the dimensionality of the dataset.
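The dataset itself isn't reproduced here, so here's a rough sketch with a stand-in DataFrame (the 27/44/69 numbers come from the dataset described above, not from this toy data):

import pandas as pd

# Stand-in for the actual dataset (not shown in this post)
df = pd.DataFrame({'X1': list('abcabc'), 'X2': list('xyzxyw')})

print(df['X1'].nunique(), df['X2'].nunique())  # 27 and 44 in the post's dataset

one_hot = pd.get_dummies(df[['X1', 'X2']])  # one binary column per label
# With 27 + 44 labels this would yield 71 columns,
# i.e. 69 more features than the original 2
print(one_hot.shape)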
Let's take feature 'X2', which has 44 labels, and perform Frequency Encoding:
During Frequency Encoding, each label in feature 'X2' is replaced with its frequency, protecting the dataset from the curse of dimensionality.
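In pandas, that can look like this (a minimal sketch; the train/test split and the fillna(0) default for unseen labels are my assumptions, not from the post):

import pandas as pd

# Hypothetical data standing in for the 44-label column 'X2'
train = pd.DataFrame({'X2': ['a', 'b', 'a', 'c', 'a', 'b']})
test = pd.DataFrame({'X2': ['b', 'c', 'd']})

# Learn the label -> frequency mapping on the training data only
freq_map = train['X2'].value_counts().to_dict()

train['X2'] = train['X2'].map(freq_map)
test['X2'] = test['X2'].map(freq_map).fillna(0)  # 'd' was never seen in training

print(train['X2'].tolist())  # [3, 2, 3, 1, 3, 2]
print(test['X2'].tolist())   # [2.0, 1.0, 0.0]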
While Frequency Encoding is simple, easy to implement, and reduces dimensionality, it has the following disadvantages:
- If two labels appear with the same frequency, they receive the same encoded value, so the model can no longer tell them apart (demonstrated below).
- The encoded values impose an ordering of labels by frequency, which may have no relationship to the target.
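A quick demonstration of that first point, again with toy data:

import pandas as pd

# 'b' and 'c' both appear twice, so after encoding they are indistinguishable
s = pd.Series(['a', 'a', 'a', 'b', 'b', 'c', 'c'])
print(s.map(s.value_counts()).tolist())  # [3, 3, 3, 2, 2, 2, 2]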
Have you used Frequency Encoding while preparing a dataset for a model? If so, let me know in the comments how it impacted the model and what challenges you encountered.