Linear Discriminant Analysis
José Jaime Comé
Information Management Associate @ UNHCR | Data Specialist/Statistician (Python||R||SQL||PowerBI||Excel) | YouTube: 15K+ subscribers
Linear discriminant analysis (LDA) assigns observations to known categories, so the technique is used both for classification and for dimensionality reduction. LDA is built from a discriminant function (or, for more than two groups, a set of discriminant functions). Each function is a linear combination of the independent variables, much like a multiple regression equation, chosen to separate the categories as well as possible. The resulting model can be used for prediction, that is, for assigning new cases to the defined groups.
Suppose we have data in 2D space. The data consist of two classes, initially plotted without distinguishing between them.
The researcher can then look for a direction (a vector) that best discriminates between the two classes.
PCA aims to find the most accurate representation of the data in a lower-dimensional space spanned by the directions of maximum variance, but those directions do not always separate the classes. Discriminant analysis, on the other hand, represents the data in a lower dimension while preserving the information that discriminates between the classes of the dataset.
In short, PCA looks for the directions of greatest variation in the data and ignores the class labels, whereas LDA tries to maximize the separation between known categories.
To make this concrete, imagine that a researcher needs to classify students based on their achievement. For each enrolled student, information such as 'test score' is collected. The researcher combines this information into a discriminant function to determine how well the students can be separated into groups. At the end, the students are assigned to groups, and we can also report the percentage of students correctly classified. A newly enrolled student can then be classified using the resulting model.
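To see the difference between the two directions, here is a minimal sketch in Python with NumPy. The data are synthetic and chosen as an assumption for illustration: two classes that are stretched along one diagonal but differ along the other. The helper `separation` is a hypothetical name; it measures the distance between the projected class means in units of the pooled standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): both classes share a
# covariance stretched along (1, 1), but their means differ along (1, -1).
cov = np.array([[3.0, 2.5], [2.5, 3.0]])
X0 = rng.multivariate_normal([0.0, 0.0], cov, size=300)
X1 = rng.multivariate_normal([2.0, -2.0], cov, size=300)
X = np.vstack([X0, X1])

# PCA direction: eigenvector of the total covariance with the largest
# eigenvalue. Class labels are not used at all.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
w_pca = eigvecs[:, -1]  # eigh returns eigenvalues in ascending order

# LDA (Fisher) direction: proportional to Sw^{-1} (m1 - m0), where Sw is
# the within-class scatter matrix.
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
w_lda = np.linalg.solve(Sw, m1 - m0)
w_lda = w_lda / np.linalg.norm(w_lda)


def separation(w):
    """Distance between projected class means, in pooled-SD units."""
    z0, z1 = X0 @ w, X1 @ w
    pooled_sd = np.sqrt((z0.var(ddof=1) + z1.var(ddof=1)) / 2)
    return abs(z1.mean() - z0.mean()) / pooled_sd


sep_pca = separation(w_pca)
sep_lda = separation(w_lda)
print(f"separation along PCA direction: {sep_pca:.2f}")
print(f"separation along LDA direction: {sep_lda:.2f}")
```

On data like this, the PCA direction follows the long axis of the cloud, where the classes overlap, while the LDA direction follows the axis along which the class means differ.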
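The student example can be sketched with scikit-learn's `LinearDiscriminantAnalysis`. Everything about the data here is an assumption made for illustration: the second variable (study hours), the two groups ("low" and "high" achievement), and the simulated values are not from a real study.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(42)

# Simulated students (hypothetical): columns are test score and weekly
# study hours, with the same covariance in both groups.
cov = [[64.0, 8.0], [8.0, 9.0]]
low = rng.multivariate_normal([50.0, 5.0], cov, size=100)
high = rng.multivariate_normal([80.0, 15.0], cov, size=100)
X = np.vstack([low, high])
y = np.array(["low"] * 100 + ["high"] * 100)

lda = LinearDiscriminantAnalysis().fit(X, y)

# The discriminant function is a linear combination of the predictors.
print("coefficients:", lda.coef_[0])
print("intercept:", lda.intercept_[0])

# Percentage of students correctly classified (on the same data used to
# fit the model, so it is an optimistic estimate).
pct_correct = 100 * lda.score(X, y)
print(f"correctly classified: {pct_correct:.1f}%")

# Classify a newly enrolled student.
new_student = np.array([[85.0, 16.0]])
predicted_group = lda.predict(new_student)[0]
print("new student assigned to group:", predicted_group)
```

For a fairer estimate of the percentage correctly classified, the model would normally be evaluated on students that were held out from the fitting step.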
Assumptions
The assumptions are the same as those for MANOVA. LDA is quite sensitive to outliers. The independent variables must be normally distributed within each group. The variances of the variables must be the same across groups. LDA assumes that the group covariance matrices are equal; when they are not, Quadratic Discriminant Analysis (QDA) may be used instead. The sample is randomly selected, and each observation's scores are assumed to be independent of the scores of all other observations. Group membership must be mutually exclusive (a case can't belong to more than one group).
LDA may still be reliable when dichotomous variables are used, even though multivariate normality is often violated in that case.
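To illustrate why the equal-covariance assumption matters, here is a small sketch comparing LDA with QDA in scikit-learn. The data are an extreme, made-up case: two classes with the same mean but very different spread, so a straight-line boundary has almost nothing to work with.

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

rng = np.random.default_rng(0)

# Two classes with (almost) the same mean but unequal covariances:
# a tight cluster surrounded by a wide one. Purely illustrative.
X0 = rng.multivariate_normal([0.0, 0.0], 0.25 * np.eye(2), size=300)
X1 = rng.multivariate_normal([0.0, 0.0], 9.0 * np.eye(2), size=300)
X = np.vstack([X0, X1])
y = np.repeat([0, 1], 300)

# A quick look at the assumption: per-class covariance matrices.
print("covariance, class 0:\n", np.cov(X0, rowvar=False))
print("covariance, class 1:\n", np.cov(X1, rowvar=False))

# LDA pools the covariances; QDA estimates one per class.
lda_acc = LinearDiscriminantAnalysis().fit(X, y).score(X, y)
qda_acc = QuadraticDiscriminantAnalysis().fit(X, y).score(X, y)
print(f"LDA accuracy: {lda_acc:.2f}")
print(f"QDA accuracy: {qda_acc:.2f}")
```

Real datasets are rarely this extreme, so comparing the per-class covariance matrices first is a reasonable informal check before choosing between the two methods.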
The steps in LDA
· Formulate the problem before the analysis.
· Estimate the discriminant function coefficients.
· Determine the significance of the discriminant functions.
· Interpret the results obtained.
· Assess the validity of the results.
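For the significance step, one common choice is Wilks' lambda with Bartlett's chi-square approximation. The article does not say which test is intended, so treat the following NumPy sketch as one possible approach. The function name `wilks_lambda` and the simulated three-group data are assumptions made for illustration.

```python
import numpy as np


def wilks_lambda(X, y):
    """Return Wilks' lambda, Bartlett's chi-square statistic and its df.

    Lambda = det(W) / det(T), where W is the within-group scatter matrix
    and T is the total scatter matrix. Values near 0 suggest that the
    groups are well separated; values near 1 suggest little separation.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, p = X.shape
    groups = np.unique(y)
    g = len(groups)

    centered = X - X.mean(axis=0)
    T = centered.T @ centered  # total scatter

    W = np.zeros((p, p))  # within-group scatter
    for k in groups:
        Xk = X[y == k]
        dk = Xk - Xk.mean(axis=0)
        W += dk.T @ dk

    lam = np.linalg.det(W) / np.linalg.det(T)
    # Bartlett's approximation: compare chi2 with a chi-square
    # distribution on p * (g - 1) degrees of freedom.
    chi2 = -(n - 1 - (p + g) / 2) * np.log(lam)
    df = p * (g - 1)
    return lam, chi2, df


rng = np.random.default_rng(1)
means = [[0.0, 0.0], [2.0, 1.0], [4.0, 0.0]]  # hypothetical group means
X = np.vstack([rng.normal(m, 1.0, size=(30, 2)) for m in means])
y = np.repeat([0, 1, 2], 30)

lam, chi2, df = wilks_lambda(X, y)
print(f"Wilks' lambda = {lam:.3f}, chi-square = {chi2:.1f}, df = {df}")
```

This tests all discriminant functions jointly. A p-value can be obtained by looking up the statistic in a chi-square table with the reported degrees of freedom.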
When to use LDA
· When the classes are well separated, LDA is preferable to logistic regression, because the parameter estimates for logistic regression become unstable.
· When n is small and the distribution of the predictors is approximately normal in each class.
· When there are more than two response classes, because LDA also provides low-dimensional views of the data.
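The last point, a low-dimensional view of data with more than two classes, can be sketched as follows. The Iris dataset bundled with scikit-learn is used here only as a convenient three-class example; it is not mentioned in the article.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# 150 flowers, 4 measurements each, 3 species (classes).
X, y = load_iris(return_X_y=True)

# With 3 classes there are at most 3 - 1 = 2 discriminant functions,
# so the 4 original variables are reduced to a 2D view.
lda = LinearDiscriminantAnalysis(n_components=2)
Z = lda.fit_transform(X, y)

accuracy = lda.score(X, y)
print("projected shape:", Z.shape)
print("share of between-class variance per function:",
      lda.explained_variance_ratio_)
print(f"correctly classified: {accuracy:.1%}")
```

Plotting the two columns of `Z` against each other, colored by class, gives the low-dimensional view: the same fitted model serves both as a classifier and as a projection.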