Linear Discriminant Analysis

José Jaime Comé

Information Management Associate @ UNHCR | Data Specialist/Statistician (Python||R||SQL||PowerBI||Excel) | YouTube: 15K+ subscribers

Published: June 21, 2024

Linear discriminant analysis (LDA) assigns observations to known categories; it is used for both dimensionality reduction and classification problems. LDA is built on a discriminant function (for more than two groups, a set of discriminant functions). Each function is a linear combination of the independent variables, similar in form to a multiple regression equation, chosen to discriminate between the categories as well as possible. The resulting model can be used for prediction, that is, for assigning new cases to the defined groups.
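For reference, a single discriminant function is commonly written as below. The symbols are notation introduced here for illustration, not taken from the text above: D is the discriminant score, b_0 is a constant, and b_1 to b_p are the coefficients of the p independent variables X_1 to X_p.

```latex
D = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_p X_p
```

With more than two groups there are several such functions, each with its own set of coefficients.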

Consider data plotted in 2D space. The data consist of two classes, shown at first without any distinction between them.

The researcher can then look for a direction (a vector) that best discriminates between these two classes.
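As a minimal sketch of how such a direction can be found, the snippet below uses the classical two-class Fisher criterion: the direction is proportional to the inverse of the within-class scatter matrix times the difference of the class means. The data points and the function name `fisher_direction` are made up for illustration.

```python
import numpy as np

# Hypothetical 2D data: two classes, five points each.
class_a = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [4.0, 5.0], [5.0, 5.0]])
class_b = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 1.0], [3.0, 2.0], [5.0, 3.0]])


def fisher_direction(a, b):
    """Return the unit vector proportional to S_w^{-1} (mean_a - mean_b)."""
    mean_a, mean_b = a.mean(axis=0), b.mean(axis=0)
    # Within-class scatter: each class's scatter around its own mean, summed.
    scatter_a = (a - mean_a).T @ (a - mean_a)
    scatter_b = (b - mean_b).T @ (b - mean_b)
    s_within = scatter_a + scatter_b
    w = np.linalg.solve(s_within, mean_a - mean_b)
    return w / np.linalg.norm(w)


w = fisher_direction(class_a, class_b)

# Project every point onto the direction: each point becomes one number.
proj_a = class_a @ w
proj_b = class_b @ w
print("direction:", w)
print("projected class A:", proj_a)
print("projected class B:", proj_b)
```

On this toy data the two classes overlap along the horizontal axis, yet their projections onto `w` do not overlap, which is the sense in which the direction "discriminates".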

PCA aims to find the most accurate representation of the data in a lower-dimensional space spanned by the directions of maximum variance, but those directions do not necessarily separate the classes. Discriminant analysis, on the other hand, represents the data in a lower dimension while preserving the information that discriminates between the classes of the dataset.

In short, PCA looks for the directions of greatest variation in the data, whereas LDA tries to maximize the separation of known categories.
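The contrast can be illustrated with a small synthetic experiment. This is only a sketch under stated assumptions: the data are generated so that both classes vary widely along x but differ only along y, and the helper `gap` is a hypothetical measure introduced here (distance between projected class means in pooled standard deviations).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Hypothetical data: both classes spread widely along x, but differ only in y.
class_0 = rng.normal(loc=[0.0, 0.0], scale=[5.0, 0.5], size=(n, 2))
class_1 = rng.normal(loc=[0.0, 3.0], scale=[5.0, 0.5], size=(n, 2))
X = np.vstack([class_0, class_1])

# PCA: leading eigenvector of the covariance of all points (labels ignored).
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
pca_dir = eigvecs[:, np.argmax(eigvals)]

# LDA (two-class Fisher direction): uses the labels through the class means.
m0, m1 = class_0.mean(axis=0), class_1.mean(axis=0)
s_within = (class_0 - m0).T @ (class_0 - m0) + (class_1 - m1).T @ (class_1 - m1)
lda_dir = np.linalg.solve(s_within, m1 - m0)
lda_dir /= np.linalg.norm(lda_dir)


def gap(direction):
    """Distance between projected class means, in pooled standard deviations."""
    p0, p1 = class_0 @ direction, class_1 @ direction
    pooled_std = np.sqrt((p0.var(ddof=1) + p1.var(ddof=1)) / 2)
    return abs(p1.mean() - p0.mean()) / pooled_std


print("PCA direction:", pca_dir, "class gap:", gap(pca_dir))
print("LDA direction:", lda_dir, "class gap:", gap(lda_dir))
```

Here PCA picks (roughly) the x-axis because that is where the variance is, while the LDA direction points (roughly) along y, where the classes actually differ.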

To make this concrete, imagine that the researcher needs to classify students based on their achievement. For each enrolled student, information such as 'test score' and other variables is collected. The researcher combines this information into a discriminant function to determine how well the students can be discriminated between groups. At the end, the students are assigned to groups, and the percentage of students correctly classified can be computed. A new student to be enrolled can then be classified using the resulting model.
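A sketch of this workflow with scikit-learn's `LinearDiscriminantAnalysis` might look like the following. The student records are simulated, and the second feature (`weekly_study_hours`), the group labels and the new student's values are all assumptions made for the example.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(42)
# Hypothetical records: columns are [test_score, weekly_study_hours].
low = rng.normal(loc=[55.0, 4.0], scale=[8.0, 1.5], size=(40, 2))
high = rng.normal(loc=[80.0, 9.0], scale=[8.0, 1.5], size=(40, 2))
X = np.vstack([low, high])
y = np.array(["low"] * 40 + ["high"] * 40)

# Estimate the discriminant function from the labelled students.
model = LinearDiscriminantAnalysis().fit(X, y)

# Share of the training cases assigned to their true group.
accuracy = model.score(X, y)
print(f"correctly classified: {accuracy:.1%}")

# Assign a new (hypothetical) student to one of the defined groups.
new_student = np.array([[72.0, 7.5]])
predicted_group = model.predict(new_student)[0]
print("predicted group:", predicted_group)

# Coefficients of the linear combination, one per input variable.
print("coefficients:", model.coef_, "intercept:", model.intercept_)
```

Note that the percentage computed on the same students used for fitting tends to be optimistic; a held-out sample or cross-validation gives a fairer estimate.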

Assumptions

The assumptions are the same as those for MANOVA. LDA is quite sensitive to outliers. The independent variables must be normally distributed within each group. The variance of each variable is assumed to be the same across groups, and LDA further assumes that the covariances are equal across groups; Quadratic Discriminant Analysis may be used when they are not. The sample is randomly selected, and each observation's score on a variable is assumed to be independent of the scores of all other observations. Group membership must be mutually exclusive (a case can't belong to more than one group).

LDA may still be reliable when using dichotomous variables (where multivariate normality is often violated).
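One informal way to look at the equal-covariance assumption is to compute the covariance matrix of each group and compare them. The sketch below does only that; the data are simulated with deliberately unequal spread, the helper name `class_covariances` is hypothetical, and the variance ratio is an eyeball check, not a formal test (a formal option is Box's M test).

```python
import numpy as np


def class_covariances(X, y):
    """Return {label: covariance matrix} for each group in y."""
    return {label: np.cov(X[y == label], rowvar=False) for label in np.unique(y)}


rng = np.random.default_rng(1)
# Hypothetical data: group "b" is deliberately more spread out than group "a".
X = np.vstack([
    rng.normal(0.0, 1.0, size=(100, 2)),
    rng.normal(2.0, 3.0, size=(100, 2)),
])
y = np.array(["a"] * 100 + ["b"] * 100)

covs = class_covariances(X, y)
for label, cov in covs.items():
    print(label)
    print(np.round(cov, 2))

# Informal comparison only: largest / smallest per-group variance, per variable.
variances = np.array([np.diag(c) for c in covs.values()])
ratio = variances.max(axis=0) / variances.min(axis=0)
print("variance ratio per variable:", np.round(ratio, 1))
```

Ratios far from 1, as in this simulated case, suggest that the equal-covariance assumption is doubtful and that Quadratic Discriminant Analysis may be worth considering.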

The steps in LDA

· Formulate the problem before the analysis.

· Estimate the discriminant function coefficients.

· Determine the significance of the discriminant functions.

· Interpret the results obtained.

· Validate the results.

When to use LDA

· When the classes are well separated, LDA is preferable to logistic regression, because the parameter estimates of logistic regression become unstable in that situation.

· When n is small and the distribution of the predictors is approximately normal in each class.

· When there are more than two response classes, because LDA also provides low-dimensional views of the data.
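The last point can be sketched with scikit-learn and its bundled iris dataset (three classes, four measurements). The choice of dataset is an assumption made for this example; with K classes LDA yields at most K - 1 discriminant functions, so three classes give a two-dimensional view.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Three classes, four predictors per observation.
X, y = load_iris(return_X_y=True)

# Keep two discriminant functions (the maximum for three classes).
lda = LinearDiscriminantAnalysis(n_components=2)
Z = lda.fit_transform(X, y)

print("original shape:", X.shape)
print("reduced shape:", Z.shape)
# Share of the between-class variance captured by each discriminant function.
print("explained variance ratio:", lda.explained_variance_ratio_)
```

The two columns of `Z` can be plotted against each other, colored by class, to get the low-dimensional view mentioned above.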

