Derivation of Linear Discriminant Analysis (LDA) using Bayes Theorem
Bilal Hungund
Sr. Data Scientist at Halliburton, Computational Sciences and Engineering for Energy, CSEE
Bayes' probability theorem is given as,
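For reference, a standard way to write Bayes' theorem for two events A and B is:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```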
For multiple events, it is given as,
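For a partition of events A_1, ..., A_k, the standard form is:

```latex
P(A_i \mid B) = \frac{P(B \mid A_i)\, P(A_i)}{\sum_{j=1}^{k} P(B \mid A_j)\, P(A_j)}
```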
The Gaussian probability distribution function is given as,
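In one dimension, with mean \mu and standard deviation \sigma, this density is:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```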
The constraint to perform LDA is simple, i.e. (for 2 labels in the class variable),
Here f_i is the probability of the outcome being 1 and f_j is the probability of the outcome being 0, and likewise for any other labels present. f_i can be further elaborated by calculating the probability of "i" given X (the predictors), i.e.,
Therefore, applying Bayes' theorem to the above expression,
Hence, the final expression would be (eq. no. 1),
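Writing \pi_i for the prior probability of label i and f_i(x) for the class-conditional density of X within label i (notation assumed here, since it is not spelled out above), this expression takes the standard form:

```latex
P(Y = i \mid X = x) = \frac{\pi_i\, f_i(x)}{\sum_{l} \pi_l\, f_l(x)}
```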
In LDA, X represents a matrix because it contains a set of predictors; hence all operations are performed as matrix calculations.
C_i is the covariance matrix of "i". Substituting the above formula into eq. no. 1,
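For p predictors, the multivariate Gaussian density with mean \mu_i and covariance C_i is:

```latex
f_i(x) = \frac{1}{(2\pi)^{p/2}\, |C_i|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x-\mu_i)^{\top} C_i^{-1} (x-\mu_i)\right)
```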
L.H.S.
R.H.S.
Here μ_i and μ_j are the means of the labels.
Applying the logarithm on both sides,
L.H.S.
R.H.S.
Multiplying both sides by -2 (which flips the inequality, so here L.H.S. < R.H.S.),
L.H.S.
R.H.S.
In LDA it is recommended to use the pooled variance rather than calculating a separate variance for each label: LDA assumes the labels share a common covariance, and pooling uses all of the data to give a more stable estimate,
Here the covariance matrices C_i and C_j are given as,
Hence, the pooled variance will be,
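With n_i and n_j denoting the number of observations in the two labels (sample-size notation assumed here), the pooled covariance is typically computed as:

```latex
C = \frac{(n_i - 1)\, C_i + (n_j - 1)\, C_j}{n_i + n_j - 2}
```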
Substituting C for both C_i and C_j on both sides,
L.H.S.
R.H.S.
Hence, the final equation (eq. no. 2) is,
Now, the third part of the equation takes the form (A+B)^2, i.e.,
The first part of the above equation cancels, as it is common to both sides. Finally, multiplying both sides by -(1/2) flips the inequality once more, giving L.H.S. > R.H.S. in eq. no. 2,
Therefore, LDA is given as,
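Collecting the surviving terms, the standard discriminant score this derivation arrives at is (with \pi_i the prior probability of label i):

```latex
\delta_i(x) = x^{\top} C^{-1} \mu_i \;-\; \tfrac{1}{2}\, \mu_i^{\top} C^{-1} \mu_i \;+\; \log \pi_i,
\qquad \text{predict } i \text{ when } \delta_i(x) > \delta_j(x).
```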
LDA can also be used for predicting multiclass labels; the only constraint is that the predicted label's discriminant (cost) function should be greater than those of all the other labels.
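The whole derivation can be sketched in a few lines of NumPy: estimate per-label means, a shared pooled covariance, and log-priors, then predict the label with the largest discriminant score. This is a minimal illustration of the formulas above, not a production implementation, and the synthetic data is made up for the example.

```python
import numpy as np

def fit_lda(X, y):
    """Fit LDA: per-label means, shared pooled covariance, and priors
    estimated from label frequencies (the assumptions used in the derivation)."""
    labels = np.unique(y)
    n, p = X.shape
    means, priors = {}, {}
    pooled = np.zeros((p, p))
    for lab in labels:
        Xl = X[y == lab]
        means[lab] = Xl.mean(axis=0)
        priors[lab] = len(Xl) / n
        pooled += (Xl - means[lab]).T @ (Xl - means[lab])
    pooled /= n - len(labels)            # pooled covariance C
    C_inv = np.linalg.inv(pooled)
    return labels, means, priors, C_inv

def predict(x, labels, means, priors, C_inv):
    """Pick the label maximizing
    delta_i(x) = x^T C^{-1} mu_i - 0.5 mu_i^T C^{-1} mu_i + log pi_i."""
    scores = [x @ C_inv @ means[l] - 0.5 * means[l] @ C_inv @ means[l]
              + np.log(priors[l]) for l in labels]
    return labels[int(np.argmax(scores))]

# Tiny synthetic two-label example (illustrative data only)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1.0, (50, 2)),
               rng.normal([4, 4], 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
model = fit_lda(X, y)
print(predict(np.array([4.0, 4.0]), *model))   # a point at label 1's mean → 1
```

Because the two labels share the single pooled covariance C, the decision boundary between them is linear in x, which is exactly what the cancellation of the quadratic terms in the derivation shows.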
Thank you, have a nice day.