Day65 of #100DaysOfPython

Surya Singh

Sr. AI/ML Consultant & Team Lead @Accenture Strategy | ex-ZS, EY | MS in ML & AI

Published: July 16, 2024

Today, we're diving into One hot encoding in Machine Learning!

One hot encoding is a technique for converting categorical variables into numerical variables that can be efficiently utilised by a Machine Learning model!

During One hot encoding, each label of a categorical variable is converted to a column of binary values representing the presence (1) or absence (0) of that label in a given row.
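To make the idea concrete, here is a minimal sketch in plain Python. The helper name `one_hot_encode` and the toy `colours` column are hypothetical and used only for illustration; in practice a library routine such as `pandas.get_dummies` would usually do this job.

```python
def one_hot_encode(values):
    """Return (labels, rows) with one binary column per distinct label."""
    # Sort the distinct labels so the column order is deterministic.
    labels = sorted(set(values))
    # Each row gets a 1 in the column of its own label and 0 elsewhere.
    rows = [[1 if value == label else 0 for label in labels] for value in values]
    return labels, rows


# Hypothetical toy column, not taken from the Mercedes dataset.
colours = ["red", "green", "blue", "green"]
labels, encoded = one_hot_encode(colours)

print(labels)
for row in encoded:
    print(row)
```

Every row contains exactly one 1, because each row item carries exactly one label of the variable.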

While this is easily doable when each categorical variable has a limited number of labels, how do we handle cases where there are multiple categorical variables with a large and varying number of labels?

In such cases, One hot encoding can be limited to, say, the top 10 most frequent labels of each variable. Here, the top 10 could just as well be the top 15 or 20, depending upon the data available and the domain knowledge.
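A rough sketch of this frequency-limited variant is shown below, again in plain Python. The function name `top_k_one_hot`, the parameter `k`, and the sample column are assumptions made for illustration; this is not the code from the linked notebook.

```python
from collections import Counter


def top_k_one_hot(values, k=10):
    """One-hot encode only the k most frequent labels of a column."""
    counts = Counter(values)
    # most_common(k) returns the k labels with the highest counts.
    top_labels = [label for label, _ in counts.most_common(k)]
    # Labels outside the top k get a 0 in every column.
    rows = [[1 if value == label else 0 for label in top_labels] for value in values]
    return top_labels, rows


# Hypothetical column: "a" and "b" are frequent, "c" and "d" are rare.
column = ["a", "b", "a", "c", "a", "b", "d"]
top_labels, encoded = top_k_one_hot(column, k=2)

print(top_labels)
for value, row in zip(column, encoded):
    print(value, row)
```

With `k=2`, only `a` and `b` receive their own columns, so the rows holding `c` and `d` are encoded as all zeros.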

Advantages:

Straightforward to implement
Saves hours of variable exploration
Does not massively expand the feature space

Disadvantages:

Does not add any information to the data that makes the variable more predictive
Does not retain the information of the ignored labels
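The trade-off between a smaller feature space and lost information can be seen on a small example. The column below is made up for illustration (three frequent labels and five rare ones), and `k = 3` is an arbitrary choice.

```python
from collections import Counter

# Hypothetical column: 3 frequent labels and 5 rare labels.
column = ["x"] * 5 + ["y"] * 4 + ["z"] * 3 + ["r1", "r2", "r3", "r4", "r5"]
k = 3

# Keep only the k most frequent labels as columns.
top_labels = [label for label, _ in Counter(column).most_common(k)]
encoded = [tuple(int(value == label) for label in top_labels) for value in column]

# Columns needed by a full encoding versus the limited one.
full_width = len(set(column))
limited_width = len(top_labels)

# Rows whose label was ignored end up as all zeros.
all_zero_rows = sum(1 for row in encoded if not any(row))
ignored_labels = {value for value, row in zip(column, encoded) if not any(row)}

print(full_width, limited_width, all_zero_rows, sorted(ignored_labels))
```

The limited encoding needs 3 columns instead of 8, but the five rare labels all map to the same all-zero row, so the model can no longer tell them apart.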

Let's dive into the implementation of One hot encoding on the Mercedes dataset from Kaggle: https://github.com/Surya8Singh/Feature-Engineering/blob/main/One_Hot_Encoding/One_Hot_Encoding.ipynb
