Data Encoding in Machine Learning

GOKUL . S

AI Intern at Infosys Springboard | Communication | Creativity | Team Management | Active Listening | Problem Solving

Published: October 25, 2024

Data encoding plays a crucial role in machine learning, especially when dealing with categorical data or text data that cannot be directly fed into a model. Proper data encoding ensures that the data is in a numerical format that the machine learning algorithm can understand and learn from effectively.

Nominal or One-Hot Encoding:

One-hot encoding is a technique for representing categorical data as numerical data, which is more suitable for machine learning algorithms.
In this method, each category is represented as a binary vector in which each bit corresponds to a unique category.
For example, consider a feature "Color" with three categories: "Red", "Green", and "Blue". One-hot encoding replaces it with three new binary features, one per category. In a row where the color is "Red", the "Red" feature is 1 and the "Green" and "Blue" features are 0.
The main disadvantage of one-hot encoding is that a feature with a large number of categories produces an equally large number of new features.
Another disadvantage is that the result is a sparse matrix consisting mostly of 0s. With n categories, each row has a single 1 and n - 1 zeros, and the added dimensionality can contribute to overfitting.
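
The "Color" example above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the sample values and the helper name `one_hot_encode` are assumptions made for the example.

```python
# Hypothetical sample values for the "Color" feature (assumed for illustration).
colors = ["Red", "Green", "Blue", "Green"]


def one_hot_encode(values):
    """Return (categories, rows), where each row is a binary vector."""
    # One new feature per unique category, in alphabetical order.
    categories = sorted(set(values))
    # For each value, put a 1 in the column of its own category and 0 elsewhere.
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


color_categories, color_vectors = one_hot_encode(colors)
print(color_categories)  # ['Blue', 'Green', 'Red']
for value, row in zip(colors, color_vectors):
    print(value, row)  # e.g. Red [0, 0, 1]
```

Each row contains exactly one 1, so the number of columns grows with the number of categories. In practice, libraries such as pandas (`get_dummies`) or scikit-learn (`OneHotEncoder`) provide this transformation.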

Label Encoding:

Label encoding assigns a unique numerical value to each category in the feature.
The labels are usually assigned in alphabetical order or based on the frequency of each category.
For example, consider a feature "Color" with the categories "Green", "Blue", and "Red". Applying label encoding in alphabetical order gives Blue = 1, Green = 2, Red = 3.
The main disadvantage of label encoding shows up with ordinal data, where the categories have a meaningful ranking. Label encoding assigns values alphabetically or by frequency rather than by that ranking, so the encoded numbers can misrepresent the true order and reduce the model's accuracy.
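
A minimal sketch of alphabetical label encoding for the "Color" example follows. The sample values and the helper name `label_encode` are assumptions for illustration, and the labels start at 1 only to match the example above.

```python
# Hypothetical sample values for the "Color" feature (assumed for illustration).
color_values = ["Green", "Blue", "Red", "Blue"]


def label_encode(values, start=1):
    """Map each unique category to an integer, in alphabetical order."""
    mapping = {category: label
               for label, category in enumerate(sorted(set(values)), start=start)}
    return mapping, [mapping[value] for value in values]


color_mapping, color_labels = label_encode(color_values)
print(color_mapping)  # {'Blue': 1, 'Green': 2, 'Red': 3}
print(color_labels)   # [2, 1, 3, 1]
```

The numbers reflect alphabetical position only. scikit-learn's `LabelEncoder` behaves similarly but starts its labels at 0.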

Ordinal Encoding:

Ordinal encoding is used to encode categorical data that has an intrinsic order or ranking.
In this technique, each category is assigned a numerical value based on its position in that order.
For example, consider a feature "Educational Qualification" with the categories "Graduate", "Post Graduate", and "High School". Applying ordinal encoding gives "High School" = 1, "Graduate" = 2, "Post Graduate" = 3.
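
The "Educational Qualification" example can be sketched as below. The key difference from label encoding is that the order is supplied explicitly. The sample values and the helper name `ordinal_encode` are assumptions for illustration.

```python
# Hypothetical sample values (assumed for illustration).
education = ["Graduate", "Post Graduate", "High School", "Graduate"]

# The ranking is stated explicitly, from lowest to highest.
education_order = ["High School", "Graduate", "Post Graduate"]


def ordinal_encode(values, order, start=1):
    """Replace each category with its position in the given order."""
    ranks = {category: rank for rank, category in enumerate(order, start=start)}
    # A category missing from `order` raises KeyError instead of being guessed.
    return [ranks[value] for value in values]


education_codes = ordinal_encode(education, education_order)
print(education_codes)  # [2, 3, 1, 2]
```

scikit-learn's `OrdinalEncoder` accepts an explicit category order through its `categories` parameter and produces 0-based values.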

Target-Guided Ordinal Encoding:

Target-guided ordinal encoding is a technique for encoding a categorical variable based on its relationship with the target variable.
It is useful when a categorical feature has a large number of unique categories.
Each unique category is replaced with a numerical value based on the mean or median of the target variable for that category.
This creates a monotonic relationship between the encoded values and the target variable, which can improve the predictive power of the model.
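
A minimal sketch of this idea, using the mean of the target per category, is shown below. The "city" feature, the price values, and the helper name `target_guided_encode` are hypothetical. Converting the means into ranks is one common variant and is an assumption here; the means themselves can also be used directly as the encoded values.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data (assumed for illustration): a categorical feature and a numeric target.
cities = ["A", "B", "A", "C", "B", "C"]
prices = [100, 200, 120, 300, 220, 340]


def target_guided_encode(categories, targets):
    """Return (mean target per category, rank of each category by that mean)."""
    grouped = defaultdict(list)
    for category, target in zip(categories, targets):
        grouped[category].append(target)
    category_means = {category: mean(values) for category, values in grouped.items()}
    # Order categories by ascending mean target, then number them 1, 2, 3, ...
    ordered = sorted(category_means, key=category_means.get)
    ranks = {category: rank for rank, category in enumerate(ordered, start=1)}
    return category_means, ranks


city_means, city_ranks = target_guided_encode(cities, prices)
print(city_ranks)                            # {'A': 1, 'B': 2, 'C': 3}
print([city_ranks[city] for city in cities])  # [1, 2, 1, 3, 2, 3]
```

Because the encoding is derived from the target, the means are usually computed on the training data only and then reused for validation and test data, which helps avoid leaking target information.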
