BxD Primer Series: Linear Discriminant Analysis (LDA) for Dimensionality Reduction

Hey there!

Welcome to the BxD Primer Series, where we cover topics such as machine learning models, neural nets, GPT, ensemble models, and hyper-automation in a ‘one-post-one-topic’ format. Today’s post is on LDA for Dimensionality Reduction. Let’s get started:

The What:

LDA (Linear Discriminant Analysis) is a supervised learning technique used for dimensionality reduction in machine learning. The goal of LDA is to project the data onto a lower-dimensional space that maximizes the separation between classes while minimizing the overlap between them.

Like PCA (Principal Component Analysis), LDA is based on solving for eigenvectors and eigenvalues, but it has several advantages over PCA: in particular, LDA takes the class labels into account, which can improve the separation between classes.

Comparison with PCA:

Purpose: PCA is an unsupervised dimensionality reduction technique that seeks to maximize the variance in the data while reducing the number of features. LDA, on the other hand, is a supervised technique that aims to maximize the separation between the classes by projecting the data onto a lower-dimensional space.

Assumptions: PCA assumes that the data is linearly related and normally distributed. LDA assumes that the data is normally distributed and that the classes have equal covariance matrices.

Input: PCA operates on the entire dataset, without regard to the class labels. LDA requires the class labels to be known and operates on the labeled subset of the data.

Objective: PCA seeks to find the directions of maximum variance in the data, which are called principal components. LDA seeks to find the linear combinations of the features that best separate the classes.

Interpretability: PCA produces new features that do not have a clear interpretation in terms of the original variables. LDA produces features that are chosen specifically to maximize separation between the classes, which makes them more interpretable for the specific classification problem.

Performance: PCA is generally faster and more robust than LDA, but it may not be as effective at separating the classes. LDA can produce highly discriminative features, but it is sensitive to class imbalance and other factors.
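
To make this contrast concrete, here is a minimal scikit-learn sketch; the Iris dataset and the choice of two components are illustrative assumptions, not part of the comparison above:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)              # 150 samples, 4 features, 3 classes

# PCA: unsupervised, fit on X alone, keeps the directions of maximum variance
X_pca = PCA(n_components=2).fit_transform(X)

# LDA: supervised, fit on X and y, keeps the directions of maximum class separation
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)                # both (150, 2)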

The How:

The goal of LDA is to find a linear transformation (w) of input data that maximizes the separation between classes. The cost function used in LDA, also known as Fisher's criterion, is defined as the ratio of “between-class variance” to “within-class variance”. This cost function is as below:

J(w) = (w^T S_b w) / (w^T S_w w)

Where,

  • w is the projection vector that maximizes separation between the classes
  • S_b is the between-class scatter matrix
  • S_w is the within-class scatter matrix
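
As a rough sanity check, this criterion can be evaluated for any candidate direction with a few lines of NumPy. The function below is a minimal sketch that assumes S_b and S_w have already been computed, as in the steps that follow:

import numpy as np

def fisher_criterion(w, S_b, S_w):
    """Fisher's criterion J(w): between-class over within-class variance along w."""
    w = np.asarray(w, dtype=float).reshape(-1, 1)   # treat w as a (D, 1) column vector
    return float((w.T @ S_b @ w) / (w.T @ S_w @ w))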

Let X be the input data matrix of size N x D, where N is the number of samples and D is the number of features. We assume that there are K classes in the data. Here is a step-by-step explanation of how LDA will work on this data:

Step 1: Calculate the mean vector of each class:

μ_k = (1 / N_k) Σ_{i=1}^{N_k} x_i

Where N_k is the number of samples in class k and x_i is the i-th sample in class k.
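
A minimal NumPy sketch of this step, assuming X is an (N, D) array and y is a vector of class labels (the variable and function names are illustrative):

import numpy as np

def class_means(X, y):
    """Step 1: return a (K, D) array whose k-th row is the mean vector of class k."""
    return np.vstack([X[y == k].mean(axis=0) for k in np.unique(y)])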

Step 2: Calculate the within-class scatter matrix:

S_w = Σ_{k=1}^{K} Σ_{x_i ∈ class k} (x_i − μ_k)(x_i − μ_k)^T
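
A matching sketch for the within-class scatter matrix, under the same assumptions about X and y:

import numpy as np

def within_class_scatter(X, y):
    """Step 2: S_w, the summed scatter of each class around its own mean."""
    S_w = np.zeros((X.shape[1], X.shape[1]))
    for k in np.unique(y):
        X_k = X[y == k]
        diff = X_k - X_k.mean(axis=0)          # deviations of class-k samples from mu_k
        S_w += diff.T @ diff                   # adds (x_i - mu_k)(x_i - mu_k)^T for class k
    return S_w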

Step 3: Calculate the between-class scatter matrix:

S_b = Σ_{k=1}^{K} N_k (μ_k − μ)(μ_k − μ)^T

Where μ is the overall mean vector of the whole data.
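
And a sketch for the between-class scatter matrix, again assuming X and y as above:

import numpy as np

def between_class_scatter(X, y):
    """Step 3: S_b, the scatter of class means around the overall mean, weighted by class size."""
    mu = X.mean(axis=0)                        # overall mean vector of the whole data
    S_b = np.zeros((X.shape[1], X.shape[1]))
    for k in np.unique(y):
        X_k = X[y == k]
        d = (X_k.mean(axis=0) - mu).reshape(-1, 1)
        S_b += X_k.shape[0] * (d @ d.T)        # N_k * (mu_k - mu)(mu_k - mu)^T
    return S_b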

Step 4: The goal of LDA is to find the projection vector w that maximizes the cost function J(w). This can be achieved by solving the generalized eigenvalue problem:

S_w^{-1} S_b w = λ w

Where λ is an eigenvalue and w is the corresponding eigenvector. The eigenvectors represent the directions onto which the data should be projected in order to maximize the separation between the classes, and the eigenvalues measure how much class separation (between-class variance relative to within-class variance) is captured along each direction.
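
One way to sketch this step in NumPy is to eigendecompose S_w^{-1} S_b directly; this assumes S_w is invertible, which may require regularization in practice:

import numpy as np

def lda_eigendecomposition(S_b, S_w):
    """Step 4: solve S_w^{-1} S_b w = lambda * w (assumes S_w is invertible)."""
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_w) @ S_b)
    # The product is real but not symmetric, so NumPy may return complex values
    # with negligible imaginary parts; keep only the real components.
    return eigvals.real, eigvecs.real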

Step 5: Select the top k eigenvectors based on their corresponding eigenvalues: The number of eigenvectors selected is equal to the desired dimensionality of the new subspace. Note that S_b has rank at most K − 1, so LDA yields at most K − 1 meaningful discriminant directions, and k is chosen accordingly. We have already covered the topic of selecting k in the previous edition on PCA (check here).
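
Continuing the sketch, the selection step might look like this (k is whatever dimensionality you chose):

import numpy as np

def top_k_eigenvectors(eigvals, eigvecs, k):
    """Step 5: form the (D, k) projection matrix W from the k largest eigenvalues."""
    order = np.argsort(eigvals)[::-1]          # eigenvalue indices, largest first
    return eigvecs[:, order[:k]]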

Step 6: Project the original data onto the k selected eigenvectors: Transform your original data matrix X of size N x D into a new matrix X_new of size N x k by taking the product of the original data matrix X and the matrix of selected eigenvectors W of size D x k:

X_new = XW

The resulting transformed data matrix X_new represents the data projected onto the new subspace spanned by the selected eigenvectors. Each row of X_new corresponds to a sample in the original data matrix X, but with the number of dimensions reduced from D to k.
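
In NumPy, continuing the running sketch, the projection is a single matrix product:

# X is the (N, D) data matrix and W is the (D, k) matrix of selected eigenvectors.
X_new = X @ W                                  # X_new has shape (N, k)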

The output of LDA can be used as the input to a classifier.
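
In practice you rarely implement these steps by hand; a minimal scikit-learn sketch of LDA feeding a downstream classifier could look like the following (the Iris dataset, logistic regression, and the two-component setting are illustrative choices, not prescriptions):

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)

# LDA reduces 4 features to 2 discriminant directions; a classifier is then fit on them.
model = make_pipeline(LinearDiscriminantAnalysis(n_components=2),
                      LogisticRegression(max_iter=1000))
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())                           # cross-validated accuracy using LDA features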

The Why:

Consider using LDA for the reasons below:

  1. Reduces dimensionality of data: LDA reduces the number of features in the input data, making it easier for a classifier to identify the underlying patterns in the data.
  2. Removes irrelevant information: LDA removes noise and irrelevant information from the input data, making it easier for a classifier to focus on the most important features.
  3. Handles multicollinearity: Highly correlated features can negatively affect the performance of a classifier. LDA handles multicollinearity by collapsing redundant, correlated features into a smaller set of discriminant directions.
  4. Provides a lower-dimensional representation of data: This makes it easier to visualize and interpret the data.

The Why Not:

You might not want to use LDA for the reasons below:

  1. Assumes normally distributed data within each class: This may not be true in all cases.
  2. Assumes equal covariance matrices across classes: This may not be true in all cases. If the covariance matrices are not equal, LDA will not be able to accurately estimate the transformation matrix.
  3. Does not work well with small datasets: LDA requires a sufficient number of data points to accurately estimate the covariance matrices.
  4. Does not work well with highly imbalanced datasets: It will not be able to accurately capture the patterns in the minority class.
  5. May introduce bias when its assumptions are not met: For example, if the data is not normally distributed or the covariance matrices are not equal, the results of LDA will be biased.

Alternatives to LDA:

Other techniques for supervised dimensionality reduction are listed below:

  1. Quadratic Discriminant Analysis (QDA): QDA assumes that each class has its own covariance matrix, unlike LDA, which assumes that all classes share the same covariance matrix.
  2. Regularized Discriminant Analysis (RDA): RDA is a variant of LDA that incorporates a regularization term to deal with high-dimensional datasets where the number of features exceeds the number of samples.
  3. Linear Support Vector Machines (SVM): A classification technique that can also be used for dimensionality reduction by projecting the data onto a lower-dimensional subspace that maximally separates the classes. We covered this in a previous edition (check here).
  4. Decision Trees: A popular supervised classification technique that can also be used for feature selection by identifying the most informative features for the classification problem. We covered this in a previous edition (check here).
  5. Random Forests: An ensemble learning technique that combines multiple decision trees to improve the accuracy and robustness of the classification. Random forests can also be used for feature selection by identifying the most important features.

Time for you to support:

  1. Reply to this email with your question
  2. Forward/Share to a friend who can benefit from this
  3. Chat on Substack with BxD (here)
  4. Engage with BxD on LinkedIn (here)

In the coming posts, we will cover one more dimensionality reduction model: t-SNE.

After that, we will start with recommendation models such as Collaborative Filtering, Content-based Filtering, Knowledge-based Systems, Matrix Factorization, and Hybrid Recommender Systems.

Let us know your feedback!

Until then,

Have a great time!

#businessxdata #bxd #LDA #Dimensionality #Reduction #primer
