BxD Primer Series: Linear Discriminant Analysis (LDA) for Dimensionality Reduction
Hey there!
Welcome to the BxD Primer Series, where we cover topics such as machine learning models, neural nets, GPT, ensemble models, and hyper-automation in a ‘one-post-one-topic’ format. Today’s post is on LDA for Dimensionality Reduction. Let’s get started:
The What:
LDA (Linear Discriminant Analysis) is a supervised learning technique used for dimensionality reduction in machine learning. The goal of LDA is to project the data onto a lower-dimensional space that maximizes the separation between the classes while minimizing the overlap between them.
Like PCA (Principal Component Analysis), LDA is based on solving for eigenvectors and eigenvalues, but it has several advantages over PCA. In particular, LDA takes the class labels into account, which can improve the separation between classes.
Comparison with PCA:
Purpose: PCA is an unsupervised dimensionality reduction technique that seeks to maximize the variance in the data while reducing the number of features. LDA, on the other hand, is a supervised technique that aims to maximize the separation between the classes by projecting the data onto a lower-dimensional space.
Assumptions: PCA assumes that the data is linearly related and normally distributed. LDA assumes that the data within each class is normally distributed and that the classes have equal covariance matrices.
Input: PCA operates on the entire dataset without regard to the class labels. LDA requires the class labels to be known and operates on the labeled portion of the data.
Objective: PCA seeks the directions of maximum variance in the data, which are called principal components. LDA seeks the linear combinations of features that best separate the classes.
Interpretability: PCA produces new features that do not have a clear interpretation in terms of the original variables. LDA produces features that are chosen specifically to maximize separation between the classes, which makes them more interpretable for the specific classification problem.
Performance: PCA is generally faster and more robust than LDA, but it may not be as effective at separating the classes. LDA can produce highly discriminative features, but it is sensitive to class imbalance and other factors. A short code sketch comparing the two follows below.
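To make the comparison concrete, here is a minimal sketch using scikit-learn; the Iris dataset and the two-component setting are illustrative assumptions, not something prescribed by LDA itself. Both methods reduce the same data to two dimensions, but only LDA uses the labels y:

```python
# Minimal sketch: PCA vs LDA as 2-D projections of the same data.
# Assumes scikit-learn is available; the Iris dataset is used only for illustration.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)          # 150 samples, 4 features, 3 classes

# PCA: unsupervised, ignores y, keeps directions of maximum variance
X_pca = PCA(n_components=2).fit_transform(X)

# LDA: supervised, uses y, keeps directions that best separate the classes
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)            # (150, 2) (150, 2)
```

Plotting X_pca and X_lda side by side typically shows the LDA projection separating the classes more cleanly, because its directions are chosen with the labels in mind.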
The How:
The goal of LDA is to find a linear transformation w of the input data that maximizes the separation between classes. The cost function used in LDA, also known as Fisher's criterion, is defined as the ratio of “between-class variance” to “within-class variance”:

J(w) = (w^T S_B w) / (w^T S_W w)

Where S_B is the between-class scatter matrix and S_W is the within-class scatter matrix, both defined in the steps below.
Let X be the input data matrix of size N x D, where N is the number of samples and D is the number of features, and assume there are K classes in the data. Here is a step-by-step explanation of how LDA works on this data:
Step 1: Calculate the mean vector of each class:

μ_k = (1 / N_k) Σ_{i ∈ class k} x_i

Where N_k is the number of samples in class k and x_i is the i-th sample in class k.
Step 2: Calculate the within-class scatter matrix:

S_W = Σ_{k=1..K} Σ_{i ∈ class k} (x_i − μ_k)(x_i − μ_k)^T
Step 3: Calculate the between-class scatter matrix:

S_B = Σ_{k=1..K} N_k (μ_k − μ)(μ_k − μ)^T

Where μ is the overall mean vector of the whole dataset.
Step 4: The goal of LDA is to find the projection vector w that maximizes the cost function J(w). This can be achieved by solving the generalized eigenvalue problem:

S_B w = λ S_W w, or equivalently S_W⁻¹ S_B w = λ w

Where λ is an eigenvalue and w is the corresponding eigenvector. The eigenvectors represent the directions onto which the data should be projected in order to maximize the separation between the classes, and each eigenvalue measures how much class separation (the ratio of between-class to within-class scatter) is captured along its eigenvector.
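In practice, step 4 is usually handled by a generalized eigensolver rather than by inverting S_W explicitly. Here is a minimal sketch with SciPy; the small scatter matrices are toy values, used only so the snippet runs on its own:

```python
# Minimal sketch of step 4: solve S_B w = lambda * S_W w with a generalized eigensolver.
import numpy as np
from scipy.linalg import eigh

S_W = np.array([[2.0, 0.3], [0.3, 1.5]])   # within-class scatter (toy values)
S_B = np.array([[4.0, 1.0], [1.0, 0.5]])   # between-class scatter (toy values)

eigvals, eigvecs = eigh(S_B, S_W)           # generalized symmetric eigenproblem
order = np.argsort(eigvals)[::-1]           # largest eigenvalue first
w = eigvecs[:, order[0]]                    # most discriminative direction
print(eigvals[order], w)
```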
Step 5: Select the top k eigenvectors based on their corresponding eigenvalues: The number of eigenvectors selected equals the desired dimensionality of the new subspace. Note that at most K − 1 eigenvalues are non-zero, so LDA can produce at most K − 1 useful directions. We have already covered the topic of selecting k in the previous edition on PCA (check here).
Step 6: Project the original data onto the k selected eigenvectors: Transform your original data matrix X of size N x D into a new matrix X_new of size N x k by taking the product of the original data matrix X and the matrix of selected eigenvectors W of size D x k:
X_new = XW
The resulting transformed data matrix X_new represents the data projected onto the new subspace spanned by the selected eigenvectors. Each row of X_new corresponds to a sample in the original data matrix X, but with the number of dimensions reduced from D to k.
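To tie steps 1–6 together, here is a minimal from-scratch NumPy sketch; the function name, the toy data, and the use of a pseudo-inverse are illustrative assumptions rather than a canonical implementation:

```python
import numpy as np

def lda_fit_transform(X, y, k):
    """Follow steps 1-6: scatter matrices, eigen decomposition, projection."""
    D = X.shape[1]
    classes = np.unique(y)
    mu = X.mean(axis=0)                       # overall mean vector (used in step 3)

    S_W = np.zeros((D, D))                    # within-class scatter
    S_B = np.zeros((D, D))                    # between-class scatter
    for c in classes:
        X_c = X[y == c]
        mu_c = X_c.mean(axis=0)               # Step 1: mean vector of class c
        diff = X_c - mu_c
        S_W += diff.T @ diff                  # Step 2: within-class scatter
        gap = (mu_c - mu).reshape(-1, 1)
        S_B += X_c.shape[0] * (gap @ gap.T)   # Step 3: between-class scatter

    # Step 4: solve S_W^-1 S_B w = lambda w (pseudo-inverse for numerical safety)
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)

    # Step 5: keep the k eigenvectors with the largest eigenvalues
    order = np.argsort(eigvals.real)[::-1][:k]
    W = eigvecs[:, order].real                # D x k projection matrix

    return X @ W                              # Step 6: X_new = XW, shape N x k

# Toy usage: three Gaussian blobs in 5 dimensions, reduced to k = 2
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, size=(50, 5)) for m in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 50)
X_new = lda_fit_transform(X, y, k=2)
print(X_new.shape)                            # (150, 2)
```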
The output of LDA can be used directly as input to a classifier, for example as sketched below.
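As one possible setup (the dataset and classifier choice are assumptions for illustration), LDA can be dropped into a scikit-learn pipeline as the dimensionality-reduction step in front of a classifier:

```python
# Minimal sketch: LDA as a dimensionality-reduction step feeding a classifier.
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_wine(return_X_y=True)                  # 13 features, 3 classes

pipe = make_pipeline(
    LinearDiscriminantAnalysis(n_components=2),    # reduce 13 features to 2
    LogisticRegression(max_iter=1000),             # classify in the 2-D LDA space
)
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```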
The Why:
Consider using LDA for the reasons below:
- It is supervised: the class labels guide the projection, so the reduced dimensions are chosen specifically to separate the classes rather than just to preserve variance.
- The resulting features are more interpretable for the classification problem at hand than generic variance-maximizing components.
- It is computationally simple: fitting reduces to computing two scatter matrices and solving an eigenvalue problem.
- Its output can be fed directly into a downstream classifier, as shown above.
The Why Not:
You might not want to use LDA for the reasons below:
- It assumes the data in each class is roughly normally distributed with equal class covariance matrices; when these assumptions are badly violated, the projection can be poor.
- It is sensitive to class imbalance and, because it relies on means and covariance estimates, to outliers.
- It is a linear method and can produce at most K − 1 components, which may be too restrictive for complex, non-linearly separable data.
- It requires labeled data, unlike unsupervised techniques such as PCA.
Alternatives to LDA:
Other techniques for supervised dimensionality reduction include, for example, Partial Least Squares (PLS), Neighborhood Components Analysis (NCA), and non-linear extensions of LDA such as Kernel Discriminant Analysis.
Time for you to support:
In the coming posts, we will cover one more dimensionality-reduction model: t-SNE.
After that, we will start with recommendation models such as Collaborative Filtering, Content-based Filtering, Knowledge-based Systems, Matrix Factorization, and Hybrid Recommender Systems.
Let us know your feedback!
Until then,
Have a great time!