Derivation of Linear Discriminant Analysis (LDA) Using Bayes' Theorem

Bayes' theorem is given as:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

For multiple events $A_1, \dots, A_n$, it is given as:

$$P(A_i \mid B) = \frac{P(B \mid A_i)\,P(A_i)}{\sum_{k=1}^{n} P(B \mid A_k)\,P(A_k)}$$

The Gaussian probability distribution function is:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
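As a quick numeric sanity check (the values of x, mu, and sigma below are chosen arbitrarily), the formula can be evaluated directly and compared against SciPy's implementation:

```python
import numpy as np
from scipy.stats import norm

# Arbitrary illustrative values for x, mu, and sigma.
x, mu, sigma = 1.5, 0.0, 2.0

# The Gaussian PDF written out exactly as in the formula above.
by_formula = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

print(np.isclose(by_formula, norm.pdf(x, loc=mu, scale=sigma)))  # True
```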

The decision rule for LDA is simple: for two labels in the class variable, predict label $i$ whenever

$$f_i > f_j$$

Here $f_i$ is the probability of having 1 as the outcome and $f_j$ is the probability of having 0 as the outcome (and similarly for any remaining labels, if present). $f_i$ can be further elaborated as the probability of label $i$ given the predictors $X$, i.e.,

$$f_i = P(i \mid X)$$

Therefore, applying Bayes' theorem to the above expression,

$$P(i \mid X) = \frac{P(X \mid i)\,P(i)}{\sum_{k} P(X \mid k)\,P(k)}$$

The denominator is the same for every label, so it cancels in the comparison $f_i > f_j$. Hence, the final expression (eq. 1) is:

$$P(X \mid i)\,P(i) > P(X \mid j)\,P(j) \quad \text{(eq. 1)}$$
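A small numeric sketch of why dropping the common denominator does not change the decision (the likelihoods and priors below are made-up values):

```python
import numpy as np

# Made-up likelihoods P(X|i), P(X|j) and priors P(i), P(j) for two labels.
likelihood = np.array([0.12, 0.05])
prior = np.array([0.4, 0.6])

joint = likelihood * prior        # the numerators P(X|k) P(k)
posterior = joint / joint.sum()   # full Bayes' theorem for multiple events

# The denominator is shared, so both comparisons pick the same label.
print(np.argmax(posterior) == np.argmax(joint))  # True
```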

In LDA, $X$ is a vector of $p$ predictors, so $P(X \mid k)$ is modeled with the multivariate Gaussian density and all operations are matrix operations:

$$P(X \mid i) = \frac{1}{(2\pi)^{p/2}\,|C_i|^{1/2}}\,\exp\!\left(-\tfrac{1}{2}(X-\mu_i)^{T} C_i^{-1} (X-\mu_i)\right)$$
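A short sketch of this density, checked against scipy.stats.multivariate_normal (the mean vector, covariance matrix, and predictor values below are made up for illustration):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Made-up mean vector, covariance matrix, and predictor vector (p = 2).
mu_i = np.array([0.0, 1.0])
C_i = np.array([[2.0, 0.3],
                [0.3, 1.0]])
X = np.array([0.5, 0.8])

p = len(X)
diff = X - mu_i
# The density written out exactly as in the formula above.
by_formula = (np.exp(-0.5 * diff @ np.linalg.inv(C_i) @ diff)
              / np.sqrt((2 * np.pi) ** p * np.linalg.det(C_i)))

print(np.isclose(by_formula, multivariate_normal.pdf(X, mean=mu_i, cov=C_i)))  # True
```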

$C_i$ is the covariance matrix of label $i$. Substituting the above density into eq. 1,

L.H.S.

$$P(i)\,\frac{1}{(2\pi)^{p/2}\,|C_i|^{1/2}}\,\exp\!\left(-\tfrac{1}{2}(X-\mu_i)^{T} C_i^{-1}(X-\mu_i)\right)$$

R.H.S.

$$P(j)\,\frac{1}{(2\pi)^{p/2}\,|C_j|^{1/2}}\,\exp\!\left(-\tfrac{1}{2}(X-\mu_j)^{T} C_j^{-1}(X-\mu_j)\right)$$

where $\mu_i$ and $\mu_j$ are the mean vectors of labels $i$ and $j$.

Applying the logarithm on both sides,

$$\ln\!\left[P(i)\,P(X \mid i)\right] > \ln\!\left[P(j)\,P(X \mid j)\right]$$

L.H.S.

$$\ln P(i) - \frac{p}{2}\ln(2\pi) - \frac{1}{2}\ln|C_i| - \frac{1}{2}(X-\mu_i)^{T} C_i^{-1}(X-\mu_i)$$

R.H.S.

$$\ln P(j) - \frac{p}{2}\ln(2\pi) - \frac{1}{2}\ln|C_j| - \frac{1}{2}(X-\mu_j)^{T} C_j^{-1}(X-\mu_j)$$

The term $-\frac{p}{2}\ln(2\pi)$ is common to both sides and cancels. Multiplying both sides by $-2$ flips the inequality, so now L.H.S. < R.H.S.,

L.H.S.

$$-2\ln P(i) + \ln|C_i| + (X-\mu_i)^{T} C_i^{-1}(X-\mu_i)$$

R.H.S.

$$-2\ln P(j) + \ln|C_j| + (X-\mu_j)^{T} C_j^{-1}(X-\mu_j)$$

In LDA it is standard to use the pooled covariance rather than calculating the covariance of each label separately: LDA assumes all labels share a common covariance matrix, and pooling captures the variability of all of the data in a single, more stable estimate.

Here the covariance matrix of each label, i.e., $C_i$ and $C_j$, is given as:

$$C_i = \frac{1}{n_i - 1}\sum_{X \in \text{label } i}(X-\mu_i)(X-\mu_i)^{T}$$

Hence, the pooled covariance will be:

$$C = \frac{(n_i - 1)\,C_i + (n_j - 1)\,C_j}{n_i + n_j - 2}$$
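A minimal NumPy sketch of this pooled estimate (pooled_covariance is a hypothetical helper name, not part of any library):

```python
import numpy as np

def pooled_covariance(X_i, X_j):
    """Pooled covariance of two groups of observations (rows)."""
    n_i, n_j = len(X_i), len(X_j)
    C_i = np.cov(X_i, rowvar=False)  # unbiased estimate, divides by n_i - 1
    C_j = np.cov(X_j, rowvar=False)
    return ((n_i - 1) * C_i + (n_j - 1) * C_j) / (n_i + n_j - 2)

# Example with made-up data: 30 and 50 observations of 2 predictors.
rng = np.random.default_rng(0)
C = pooled_covariance(rng.normal(size=(30, 2)), rng.normal(size=(50, 2)))
print(C.shape)  # (2, 2)
```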

Substituting $C$ for both $C_i$ and $C_j$, the $\ln|C|$ term becomes common to both sides and cancels:

L.H.S.

$$-2\ln P(i) + (X-\mu_i)^{T} C^{-1}(X-\mu_i)$$

R.H.S.

$$-2\ln P(j) + (X-\mu_j)^{T} C^{-1}(X-\mu_j)$$

Hence, the final inequality (eq. 2) is:

$$-2\ln P(i) + (X-\mu_i)^{T} C^{-1}(X-\mu_i) \;<\; -2\ln P(j) + (X-\mu_j)^{T} C^{-1}(X-\mu_j) \quad \text{(eq. 2)}$$

Now, the quadratic term expands like $(A+B)^2$; since $C^{-1}$ is symmetric, $X^{T}C^{-1}\mu = \mu^{T}C^{-1}X$, giving:

$$(X-\mu_i)^{T} C^{-1}(X-\mu_i) = X^{T}C^{-1}X - 2\mu_i^{T}C^{-1}X + \mu_i^{T}C^{-1}\mu_i$$

$$(X-\mu_j)^{T} C^{-1}(X-\mu_j) = X^{T}C^{-1}X - 2\mu_j^{T}C^{-1}X + \mu_j^{T}C^{-1}\mu_j$$
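A quick numeric check of this expansion (the vectors and the symmetric positive-definite matrix below are randomly generated, purely for verification):

```python
import numpy as np

# Random predictor vector, mean vector, and a symmetric positive-definite
# covariance C (so C^-1 is symmetric too), all made up for the check.
rng = np.random.default_rng(0)
X, mu = rng.normal(size=3), rng.normal(size=3)
A = rng.normal(size=(3, 3))
C_inv = np.linalg.inv(A @ A.T + np.eye(3))

lhs = (X - mu) @ C_inv @ (X - mu)
rhs = X @ C_inv @ X - 2 * mu @ C_inv @ X + mu @ C_inv @ mu
print(np.isclose(lhs, rhs))  # True
```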

The first term, $X^{T}C^{-1}X$, gets canceled as it is common to both sides. At last, multiplying both sides by $-\frac{1}{2}$ flips the inequality once more, so L.H.S. > R.H.S. in eq. 2:

$$\ln P(i) + \mu_i^{T}C^{-1}X - \tfrac{1}{2}\mu_i^{T}C^{-1}\mu_i \;>\; \ln P(j) + \mu_j^{T}C^{-1}X - \tfrac{1}{2}\mu_j^{T}C^{-1}\mu_j$$

Therefore, the LDA discriminant score of a label $k$ is given as:

$$\delta_k(X) = \mu_k^{T} C^{-1} X - \tfrac{1}{2}\mu_k^{T} C^{-1} \mu_k + \ln P(k)$$

and $X$ is assigned to label $i$ whenever $\delta_i(X) > \delta_j(X)$.
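A from-scratch sketch of this discriminant function in NumPy (the function and argument names lda_scores, means, and priors are hypothetical, chosen for this illustration):

```python
import numpy as np

def lda_scores(X, means, priors, C):
    """Discriminant score delta_k(X) for every label k.

    X      -- (p,) predictor vector
    means  -- list of (p,) class mean vectors mu_k
    priors -- list of class prior probabilities P(k)
    C      -- (p, p) pooled covariance matrix
    """
    C_inv = np.linalg.inv(C)
    return [m @ C_inv @ X - 0.5 * m @ C_inv @ m + np.log(pk)
            for m, pk in zip(means, priors)]

# Predict the label with the largest score, e.g.:
# label = int(np.argmax(lda_scores(x, means, priors, C)))
```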

LDA can also be used for predicting multiclass labels; the decision rule is the same, i.e., the predicted label's discriminant score must be greater than every other label's score.
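In practice, scikit-learn ships a ready-made implementation of this shared-covariance model; a minimal usage sketch on made-up three-class data:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Made-up 3-class data: 20 points per class around different centers.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, size=(20, 2)) for c in (0.0, 3.0, 6.0)])
y = np.repeat([0, 1, 2], 20)

clf = LinearDiscriminantAnalysis()  # shared covariance, as in the derivation
clf.fit(X, y)
print(clf.predict([[3.1, 2.9]]))  # most likely label 1
```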

Thank you, have a nice day.
