Encode Categorical Variables to Numeric Variables: Label encoder v/s One hot encoder

MANIDIPA CHAKRAVARTI

Senior Manager | Procurement Professional | SAP Ariba | Data Science Enthusiast

Published: September 17, 2020

Label encoder v/s One hot encoder

Typically, a structured data set includes multiple columns, a combination of numerical and categorical variables. Most machine learning algorithms work only with numbers, not text, so categorical values have to be converted to numbers before a model can use them. This conversion is called categorical encoding.

Categorical data describes categories or groups. One example is car brands such as Mercedes, BMW and Audi; another is car body types such as Hatchback, Convertible and Sedan.

In this article we compare in detail, with statistical analysis, two very popular encoding techniques:

Label Encoding

Encode each label as an integer between 0 and n_classes-1, where n_classes is the number of distinct labels. A label that repeats gets the same value it was assigned earlier.

For example: 0 - Hatchback, 1 - Convertible, 2 - Sedan, and so on.
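The mapping described above can be sketched in a few lines of plain Python. This is only an illustration, not the article's own code: the helper name `label_encode` and the sample list are assumptions, and codes are handed out in order of first appearance so that they match the example above. Library encoders may order the codes differently; scikit-learn's `LabelEncoder`, for instance, sorts the classes first, so it would give Convertible the code 0.

```python
def label_encode(values):
    """Map each distinct label to an integer in 0..n_classes-1.

    Hypothetical helper: codes are assigned in order of first appearance,
    and a repeated label reuses the code it was given earlier.
    """
    mapping = {}
    codes = []
    for value in values:
        if value not in mapping:
            # A new label gets the next unused integer.
            mapping[value] = len(mapping)
        codes.append(mapping[value])
    return codes, mapping


# Sample body types (made up for illustration).
body_types = ["Hatchback", "Convertible", "Sedan", "Hatchback", "Sedan"]
codes, mapping = label_encode(body_types)

print(codes)    # [0, 1, 2, 0, 2]
print(mapping)  # {'Hatchback': 0, 'Convertible': 1, 'Sedan': 2}
```

Note that the integers imply an order (Sedan > Convertible > Hatchback) that the categories themselves do not have, which is one reason the two encodings can lead to different model errors.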

One-Hot Encoding

Add a dummy variable (a new column) for each unique category. Each row gets a 1 in the column for its own category and a 0 in all the others.
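A minimal sketch of the idea in plain Python follows. The helper name `one_hot_encode` and the sample values are assumptions made for illustration; in practice a library routine such as `pandas.get_dummies` or scikit-learn's `OneHotEncoder` would usually do this step.

```python
def one_hot_encode(values):
    """Return the sorted categories and one 0/1 row per input value.

    Hypothetical helper: each row has exactly one 1, in the column
    that corresponds to that value's category.
    """
    categories = sorted(set(values))
    rows = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, rows


# Sample body types (made up for illustration).
body_types = ["Hatchback", "Convertible", "Sedan", "Hatchback"]
categories, rows = one_hot_encode(body_types)

print(categories)  # ['Convertible', 'Hatchback', 'Sedan']
for row in rows:
    print(row)
# [0, 1, 0]
# [1, 0, 0]
# [0, 0, 1]
# [0, 1, 0]
```

Unlike label encoding, no category is numerically "larger" than another here; the cost is one extra column per category.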

We will pick the winner by comparing the RMSE (root mean squared error) of the two models, one trained on each encoding; the lower RMSE wins.

Below is the source code from GitHub. Happy Encoding :)
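To make the comparison concrete, here is a rough sketch of how such an RMSE comparison could look. It is not the author's experiment: the prices are invented toy numbers, the error is measured on the training rows themselves, and both "models" are plain least-squares fits written by hand. With a single categorical feature, the one-hot least-squares fit reduces to predicting each category's mean, which is the shortcut used here.

```python
import math
from collections import defaultdict


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy data, made up for illustration only.
body = ["Hatchback", "Convertible", "Sedan", "Hatchback", "Convertible", "Sedan"]
price = [10.0, 30.0, 18.0, 12.0, 34.0, 20.0]

# Model A: label encoding, then a straight line price = intercept + slope * code.
code_of = {"Hatchback": 0, "Convertible": 1, "Sedan": 2}
x = [code_of[b] for b in body]
x_mean = sum(x) / len(x)
y_mean = sum(price) / len(price)
slope = (
    sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, price))
    / sum((xi - x_mean) ** 2 for xi in x)
)
intercept = y_mean - slope * x_mean
pred_label = [intercept + slope * xi for xi in x]

# Model B: one-hot encoding. Least squares with one dummy column per
# category predicts the mean price of that category.
prices_by_body = defaultdict(list)
for b, p in zip(body, price):
    prices_by_body[b].append(p)
mean_price = {b: sum(ps) / len(ps) for b, ps in prices_by_body.items()}
pred_onehot = [mean_price[b] for b in body]

rmse_label = rmse(price, pred_label)
rmse_onehot = rmse(price, pred_onehot)
print(f"RMSE with label encoding:   {rmse_label:.3f}")
print(f"RMSE with one-hot encoding: {rmse_onehot:.3f}")
```

For a linear fit evaluated on its own training rows, the one-hot model can never do worse than the label-encoded one, because per-category means include every straight line through the codes as a special case. On held-out data, or with tree-based models, the outcome can differ, so the comparison should be run on the actual data set.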
