BxD Primer Series: Linear Discriminant Analysis (LDA) for Dimensionality Reduction
Hey there!
Welcome to the BxD Primer Series, where we cover topics such as machine learning models, neural nets, GPT, ensemble models, and hyper-automation in a ‘one-post-one-topic’ format. Today’s post is on LDA for Dimensionality Reduction. Let’s get started:
The What:
LDA (Linear Discriminant Analysis) is a supervised learning technique used for dimensionality reduction in machine learning. The goal of LDA is to project the data onto a lower-dimensional space that maximizes the separation between the classes while minimizing the overlap between them.
Like PCA (Principal Component Analysis), LDA is based on solving for eigenvectors and eigenvalues, but it has several advantages over PCA. In particular, LDA takes the class labels into account, which can improve the separation between classes.
Comparison with PCA:
Purpose: PCA is an unsupervised dimensionality reduction technique that seeks to maximize the variance in the data while reducing the number of features. LDA, on the other hand, is a supervised technique that aims to maximize the separation between the classes by projecting the data onto a lower-dimensional space.
Assumptions: PCA assumes that the data is linearly related and normally distributed. LDA assumes that the data within each class is normally distributed and that the classes have equal covariance matrices.
Input: PCA operates on the entire dataset without regard to the class labels. LDA requires the class labels to be known and operates on the labeled portion of the data.
Objective: PCA seeks the directions of maximum variance in the data, which are called principal components. LDA seeks the linear combinations of features that best separate the classes.
Interpretability: PCA produces new features that do not have a clear interpretation in terms of the original variables. LDA produces features that are chosen specifically to maximize separation between the classes, which makes them more interpretable for the specific classification problem.
Performance: PCA is generally faster and more robust than LDA, but it may not be as effective at separating the classes. LDA can produce highly discriminative features, but it is sensitive to class imbalance and other factors. A short code sketch comparing the two follows below.
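To make the comparison concrete, here is a minimal sketch using scikit-learn; the Iris dataset and the two-component setting are illustrative assumptions, not something prescribed by LDA itself. Both methods reduce the same data to two dimensions, but only LDA uses the labels y:

```python
# Minimal sketch: PCA vs LDA as 2-D projections of the same data.
# Assumes scikit-learn is available; the Iris dataset is used only for illustration.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)          # 150 samples, 4 features, 3 classes

# PCA: unsupervised, ignores y, keeps directions of maximum variance
X_pca = PCA(n_components=2).fit_transform(X)

# LDA: supervised, uses y, keeps directions that best separate the classes
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)            # (150, 2) (150, 2)
```

Plotting X_pca and X_lda side by side typically shows the LDA projection separating the classes more cleanly, because its directions are chosen with the labels in mind.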
The How:
The goal of LDA is to find a linear transformation w of the input data that maximizes the separation between classes. The cost function used in LDA, also known as Fisher's criterion, is defined as the ratio of “between-class variance” to “within-class variance”:

J(w) = (w^T S_B w) / (w^T S_W w)

Where S_B is the between-class scatter matrix and S_W is the within-class scatter matrix, both defined in the steps below.
Let X be the input data matrix of size N x D, where N is the number of samples and D is the number of features, and assume there are K classes in the data. Here is a step-by-step explanation of how LDA works on this data:
Step 1: Calculate the mean vector of each class:

μ_k = (1 / N_k) Σ_{i ∈ class k} x_i

Where N_k is the number of samples in class k and x_i is the i-th sample in class k.
Step 2: Calculate the within-class scatter matrix:

S_W = Σ_{k=1..K} Σ_{i ∈ class k} (x_i − μ_k)(x_i − μ_k)^T
Step 3: Calculate the between-class scatter matrix:

S_B = Σ_{k=1..K} N_k (μ_k − μ)(μ_k − μ)^T

Where μ is the overall mean vector of the whole dataset.
Step 4: The goal of LDA is to find the projection vector w that maximizes the cost function J(w). This can be achieved by solving the generalized eigenvalue problem:

S_B w = λ S_W w, or equivalently S_W⁻¹ S_B w = λ w

Where λ is an eigenvalue and w is the corresponding eigenvector. The eigenvectors represent the directions onto which the data should be projected in order to maximize the separation between the classes, and each eigenvalue measures how much class separation (the ratio of between-class to within-class scatter) is captured along its eigenvector.
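In practice, step 4 is usually handled by a generalized eigensolver rather than by inverting S_W explicitly. Here is a minimal sketch with SciPy; the small scatter matrices are toy values, used only so the snippet runs on its own:

```python
# Minimal sketch of step 4: solve S_B w = lambda * S_W w with a generalized eigensolver.
import numpy as np
from scipy.linalg import eigh

S_W = np.array([[2.0, 0.3], [0.3, 1.5]])   # within-class scatter (toy values)
S_B = np.array([[4.0, 1.0], [1.0, 0.5]])   # between-class scatter (toy values)

eigvals, eigvecs = eigh(S_B, S_W)           # generalized symmetric eigenproblem
order = np.argsort(eigvals)[::-1]           # largest eigenvalue first
w = eigvecs[:, order[0]]                    # most discriminative direction
print(eigvals[order], w)
```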
Step 5: Select the top k eigenvectors based on their corresponding eigenvalues: The number of eigenvectors selected equals the desired dimensionality of the new subspace. Note that at most K − 1 eigenvalues are non-zero, so LDA can produce at most K − 1 useful directions. We have already covered the topic of selecting k in the previous edition on PCA (check here).
Step 6: Project the original data onto the k selected eigenvectors: Transform your original data matrix X of size N x D into a new matrix X_new of size N x k by taking the product of the original data matrix X and the matrix of selected eigenvectors W of size D x k:
X_new = XW
The resulting transformed data matrix X_new represents the data projected onto the new subspace spanned by the selected eigenvectors. Each row of X_new corresponds to a sample in the original data matrix X, but with the number of dimensions reduced from D to k.
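To tie steps 1–6 together, here is a minimal from-scratch NumPy sketch; the function name, the toy data, and the use of a pseudo-inverse are illustrative assumptions rather than a canonical implementation:

```python
import numpy as np

def lda_fit_transform(X, y, k):
    """Follow steps 1-6: scatter matrices, eigen decomposition, projection."""
    D = X.shape[1]
    classes = np.unique(y)
    mu = X.mean(axis=0)                       # overall mean vector (used in step 3)

    S_W = np.zeros((D, D))                    # within-class scatter
    S_B = np.zeros((D, D))                    # between-class scatter
    for c in classes:
        X_c = X[y == c]
        mu_c = X_c.mean(axis=0)               # Step 1: mean vector of class c
        diff = X_c - mu_c
        S_W += diff.T @ diff                  # Step 2: within-class scatter
        gap = (mu_c - mu).reshape(-1, 1)
        S_B += X_c.shape[0] * (gap @ gap.T)   # Step 3: between-class scatter

    # Step 4: solve S_W^-1 S_B w = lambda w (pseudo-inverse for numerical safety)
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)

    # Step 5: keep the k eigenvectors with the largest eigenvalues
    order = np.argsort(eigvals.real)[::-1][:k]
    W = eigvecs[:, order].real                # D x k projection matrix

    return X @ W                              # Step 6: X_new = XW, shape N x k

# Toy usage: three Gaussian blobs in 5 dimensions, reduced to k = 2
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, size=(50, 5)) for m in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 50)
X_new = lda_fit_transform(X, y, k=2)
print(X_new.shape)                            # (150, 2)
```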
The output of LDA can be used directly as input to a classifier, for example as sketched below.
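As one possible setup (the dataset and classifier choice are assumptions for illustration), LDA can be dropped into a scikit-learn pipeline as the dimensionality-reduction step in front of a classifier:

```python
# Minimal sketch: LDA as a dimensionality-reduction step feeding a classifier.
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_wine(return_X_y=True)                  # 13 features, 3 classes

pipe = make_pipeline(
    LinearDiscriminantAnalysis(n_components=2),    # reduce 13 features to 2
    LogisticRegression(max_iter=1000),             # classify in the 2-D LDA space
)
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```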
The Why:
Consider using LDA for the reasons below:
- It is supervised: the class labels guide the projection, so the reduced dimensions are chosen specifically to separate the classes rather than just to preserve variance.
- The resulting features are more interpretable for the classification problem at hand than generic variance-maximizing components.
- It is computationally simple: fitting reduces to computing two scatter matrices and solving an eigenvalue problem.
- Its output can be fed directly into a downstream classifier, as shown above.
The Why Not:
You might not want to use LDA for the reasons below:
- It assumes the data in each class is roughly normally distributed with equal class covariance matrices; when these assumptions are badly violated, the projection can be poor.
- It is sensitive to class imbalance and, because it relies on means and covariance estimates, to outliers.
- It is a linear method and can produce at most K − 1 components, which may be too restrictive for complex, non-linearly separable data.
- It requires labeled data, unlike unsupervised techniques such as PCA.
Alternatives to LDA:
Other techniques for supervised dimensionality reduction include, for example, Partial Least Squares (PLS), Neighborhood Components Analysis (NCA), and non-linear extensions of LDA such as Kernel Discriminant Analysis.
Time for you to support:
In the coming posts, we will cover one more dimensionality-reduction model: t-SNE.
After that, we will start with recommendation models such as Collaborative Filtering, Content-based Filtering, Knowledge-based Systems, Matrix Factorization, and Hybrid Recommender Systems.
Let us know your feedback!
Until then,
Have a great time!