GEOMETRIC INTUITION OF LOGISTIC REGRESSION



From the term "logistic regression" you might think this is a regression problem, but it is not: it is a classification problem.

Logistic regression is a statistical model that uses a logistic function to model a binary dependent variable. In geometric terms, logistic regression tries to find a line or plane that best separates the two classes. It works well when the dataset is almost or perfectly linearly separable.

The equation of the plane π is w^Tx + b = 0, where w is the normal to the plane and b is the intercept. If the plane π passes through the origin, then b = 0 and the equation reduces to

w^Tx = 0.

Overall, we have to find the w and b corresponding to a plane π that separates the positive and negative points.

Suppose we take a data point xi as our query point and we want to find the distance of that point from the plane π. The distance di is written as:

di = w^Txi / ||w||

If w is a unit vector and the plane passes through the origin, then

di = w^Txi.
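As a quick illustration, here is a minimal NumPy sketch of this distance computation (the vectors below are made-up values, not from any particular dataset):

import numpy as np

# Made-up example: a plane through the origin with normal vector w
w = np.array([2.0, 1.0])        # normal to the plane (not a unit vector)
x_i = np.array([1.0, 3.0])      # query point

# Signed distance d_i = w^T x_i / ||w||
d_i = w.dot(x_i) / np.linalg.norm(w)

# If w is first normalised to a unit vector, the distance is simply w^T x_i
w_unit = w / np.linalg.norm(w)
print(d_i, w_unit.dot(x_i))     # both ~2.236, positive: x_i lies on the side of w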

So this is the distance of the point xi from the plane. But how do we decide whether this distance should be treated as positive or negative?


If xi and the normal w lie on the same side of the plane (in the same direction), then the distance di > 0 and yi is positive.

If xj and the normal w lie on opposite sides (in opposite directions), then the distance dj < 0 and yj is negative.


CLASSIFIER

If w^Txi > 0 then yi = +1

If w^Txi < 0 then yi = -1
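A minimal sketch of this sign-based decision rule (the weight vector and the two query points are again made-up values):

import numpy as np

def predict(w, x):
    # Sign-based classifier: +1 if w^T x > 0, -1 if w^T x < 0
    return 1 if w.dot(x) > 0 else -1

w = np.array([2.0, 1.0])                      # hypothetical normal vector
print(predict(w, np.array([1.0, 3.0])))       # w^T x = 5    > 0 -> +1
print(predict(w, np.array([-2.0, 0.5])))      # w^T x = -3.5 < 0 -> -1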

There are 4 cases that we can consider:

Case 1: If the class label is positive, i.e. yi = +1, and xi lies on the same side as w, i.e. w^Txi > 0, then the classifier predicts the class as positive, so its prediction is correct. Here yi * w^Txi > 0, since positive * positive = positive > 0.

Case 2: If the class label is negative, i.e. yi = -1, and xi lies on the opposite side, i.e. w^Txi < 0, then the classifier predicts the class as negative, so its prediction is correct. Here yi * w^Txi > 0, since negative * negative = positive > 0.

Case 3: If the class label is positive, i.e. yi = +1, and xi lies on the opposite side, i.e. w^Txi < 0, then the classifier predicts the class as negative, so its prediction is wrong. Here yi * w^Txi < 0, since positive * negative = negative < 0.

Case 4: If the class label is negative, i.e. yi = -1, and xi lies on the same side as w, i.e. w^Txi > 0, then the classifier predicts the class as positive, so its prediction is wrong. Here yi * w^Txi < 0, since negative * positive = negative < 0.

We want the classifier to be as good as possible, i.e. we want the maximum number of correctly classified points, which means we want as many points as possible to satisfy yi * w^Txi > 0. Given a fixed training dataset (xi, yi), we want to choose the w that maximizes the sum of yi * w^Txi over all points.
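The following sketch illustrates this criterion on a tiny made-up dataset: it counts how many points satisfy yi * w^Txi > 0 and computes the sum we would like to be as large as possible.

import numpy as np

# Toy dataset (illustrative values): rows of X are points x_i, y holds labels +1/-1
X = np.array([[1.0, 3.0], [2.0, 1.0], [-1.0, -2.0], [-3.0, 0.5]])
y = np.array([1, 1, -1, -1])
w = np.array([2.0, 1.0])        # a candidate normal vector

signed = y * (X @ w)            # y_i * w^T x_i for every point
print((signed > 0).sum())       # 4: all points correctly classified
print(signed.sum())             # 19.5: the quantity we would like to be large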

HOW OUTLIERS IMPACT THE MODEL

Suppose we take two planes π1 and π2, each of which is a candidate for separating the positive and negative class data points.

Suppose there is one extreme outlier in the dataset. If we compute the sum of yi * w^Txi for π1 we get a negative value, and for π2 we get a positive value, so by this criterion we would conclude that π2 is the better classifier. But this is not true: plane π1 actually classifies more points correctly than plane π2. A single outlier can therefore have a large impact on the model.
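The sketch below makes this concrete with made-up signed values yi * w^Txi for six points under each plane: π1 classifies five of the six points correctly, but its sum is dragged negative by one extreme outlier, while π2 classifies only four points correctly yet ends up with the larger (positive) sum.

import numpy as np

# Made-up signed values y_i * w^T x_i for six points under the two planes
pi1 = np.array([2.0, 2.0, 2.0, 2.0, 2.0, -100.0])   # 5 of 6 correct, one extreme outlier
pi2 = np.array([-2.0, -2.0, 2.0, 2.0, 2.0, 3.0])    # only 4 of 6 correct

print((pi1 > 0).sum(), (pi2 > 0).sum())   # 5 vs 4: pi1 is the better classifier
print(pi1.sum(), pi2.sum())               # -90.0 vs 5.0: the sum wrongly prefers pi2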

To protect the model from such outliers, we have to modify our objective function.

MODIFYING THE OBJECTIVE FUNCTION USING SQUASHING

To modify our objective function we will use a squashing technique. The idea is:

  1. If the distance of a point from the plane is small, we use it as it is.
  2. If the distance of a point from the plane is large, we squash it into a smaller value.

Applying a squashing function of this kind to each term of our objective protects the model from such outliers.

We will use the sigmoid function as this squashing function.

Sigmoid Function

The sigmoid function is written as:

σ(x) = 1/(1 + e^-x)

The maximum value of the sigmoid function is 1.

The minimum value of the sigmoid function is 0.
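A minimal implementation of the sigmoid, with a quick check of its squashing behaviour on small and large inputs:

import numpy as np

def sigmoid(x):
    # Logistic (sigmoid) function: squashes any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))      # 0.5
print(sigmoid(1.0))      # ~0.731
print(sigmoid(100.0))    # ~1.0: a very large distance no longer dominates
print(sigmoid(-100.0))   # ~0.0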

Our objective function will be:

w* = argmax_w Σi σ(yi * w^Txi)

That is, we choose the w that maximizes the sum of the sigmoid-squashed signed values yi * w^Txi over all training points.
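Applying this squashed objective to the same made-up values from the outlier example above, π1 now scores higher than π2, matching the fact that it classifies more points correctly:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# The same made-up signed values y_i * w^T x_i from the outlier example
pi1 = np.array([2.0, 2.0, 2.0, 2.0, 2.0, -100.0])
pi2 = np.array([-2.0, -2.0, 2.0, 2.0, 2.0, 3.0])

# Squashed objective: sum of sigmoid(y_i * w^T x_i)
print(sigmoid(pi1).sum())   # ~4.40 -> pi1 is now preferred, as it should be
print(sigmoid(pi2).sum())   # ~3.83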