Regression!
A technique for determining the statistical relationship between two or more variables where a change in a dependent variable is associated with, and depends on, a change in one or more independent variables.
Regression is a supervised learning technique for predicting continuous values, and it is one of the most popular algorithms in machine learning.
Continuous and Discrete Data:
Discrete data take distinct, separate values and are often descriptive labels (like “fast” or “slow”), whereas continuous data are numeric values that vary smoothly with the independent variable.
Linear Regression example:
Take the following example.
Let us assume the value of Y depends on X. The graph above is roughly drawn for this data.
Here, the red line that roughly touches all the points is called the regression line, and you can clearly see that the line intersects the Y axis at ‘c’ (the Y intercept).
And if “m” is the slope of the line, then the equation of the line is given by:
Y = mX + c
Based on this formula, you can predict the value of Y for any given X, and this is the basic idea behind using regression in ML.
From the above graph, since the relationship between the independent variable (X) and the dependent variable (Y) is linear, the regression is known as linear regression.
Training Data:
We apply regression to our data to find the slope and intercept; then, using the line formula, we can solve for Y given any X. In other words, our machine predicts Y for any given X.
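The fitting step above can be sketched in plain Python. This is a minimal least-squares implementation with made-up data points, not the article's original example:

```python
# Fit a least-squares regression line y = m*x + c to sample data.
# The data points below are illustrative assumptions.

def fit_line(xs, ys):
    """Return slope m and intercept c of the least-squares line."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # m = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    m = num / den
    c = y_mean - m * x_mean   # the line passes through the mean point
    return m, c

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]         # exactly y = 2x, so m = 2 and c = 0
m, c = fit_line(xs, ys)
print(m, c)                   # 2.0 0.0

# Predict Y for a new X using the line formula
print(m * 6 + c)              # 12.0
```

Once m and c are known, prediction is just a matter of plugging X into the line formula.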
Errors:
Errors refer to the distances between the data points and the regression line. The regression line is drawn so that it passes through the mean point of the data, i.e.:
y’ = mx’ + c, where
y’ = Σy / n
x’ = Σx / n
n = number of data points
Because of this, in real-world data, some points may not fall exactly on the line, and these differences are the errors.
Mean Squared Error:
This tells you how close your regression line is to a set of points. It takes the distances from the points to the regression line and squares them; these distances are the errors. The squaring is done to remove any negative signs. It’s called the MSE because you’re finding the average of a set of squared errors.
Steps to calculate MSE :
- Find the regression line.
- Substitute your values of X to find the predicted values of Y.
- Find the difference between each predicted value and the actual value.
- Square these differences.
- Add all of the squared errors and find their mean.
MSE is used to find the line of best fit: the smaller the value, the better the result.
Equation:
MSE = (1/n) · Σ (yᵢ − ŷᵢ)²
where yᵢ is the actual value, ŷᵢ is the predicted value on the line, and n is the number of data points.
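The steps listed above can be followed directly in code. This is a small sketch with an assumed line and toy data points:

```python
# Walk through the MSE steps for a toy regression line y = m*x + c.
# The line and data points below are illustrative assumptions.

def mse(xs, ys, m, c):
    """Mean squared error of the line y = m*x + c on the data."""
    preds = [m * x + c for x in xs]               # predicted Y values
    errors = [y - p for y, p in zip(ys, preds)]   # predicted vs. actual
    squared = [e ** 2 for e in errors]            # square the differences
    return sum(squared) / len(squared)            # mean of the squared errors

xs = [1, 2, 3]
ys = [2, 4, 7]
print(mse(xs, ys, 2, 0))   # errors are 0, 0, 1 -> MSE = 1/3
```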
R-squared error :
R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression.
The definition of R-squared is fairly straight-forward. It is the percentage of the response variable variation that is explained by a linear model. Or:
R-squared = Explained variation / Total variation
R-squared is always between 0 and 100%:
- 0% indicates that the model explains none of the variability of the response data around its mean.
- 100% indicates that the model explains all the variability of the response data around its mean.
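R-squared can be computed as one minus the ratio of unexplained to total variation, which matches the "explained variation / total variation" definition above. A minimal sketch with assumed data:

```python
# R-squared = 1 - (residual sum of squares / total sum of squares),
# i.e. the fraction of the response's variability explained by the model.

def r_squared(ys, preds):
    y_mean = sum(ys) / len(ys)
    ss_tot = sum((y - y_mean) ** 2 for y in ys)            # total variation
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))  # unexplained variation
    return 1 - ss_res / ss_tot

# A perfect fit explains 100% of the variability:
print(r_squared([1, 2, 3], [1, 2, 3]))   # 1.0
# Always predicting the mean explains 0%:
print(r_squared([1, 2, 3], [2, 2, 2]))   # 0.0
```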
Equation of Hypothesis:
h(x) = θ₀ + θ₁x
(the hypothesis for linear regression with a single variable, in the notation used in Andrew Ng’s course linked below)
Cost Function :
A cost function is something you want to minimize. For example, your cost function might be the sum of squared errors over your training set.
Gradient Descent :
It is a method for finding the minimum of a function of multiple variables, so you can use gradient descent to minimize your cost function. If your cost is a function of N variables, then the gradient is the length-N vector that points in the direction in which the cost increases most rapidly; gradient descent repeatedly steps in the opposite direction.
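As a concrete sketch, here is gradient descent minimizing the MSE cost of a line y = m·x + c. The learning rate, iteration count, and data are illustrative choices, not values from the article:

```python
# Gradient descent on the MSE cost of a line y = m*x + c.

def gradient_descent(xs, ys, lr=0.05, steps=2000):
    m, c = 0.0, 0.0                      # start from an arbitrary point
    n = len(xs)
    for _ in range(steps):
        preds = [m * x + c for x in xs]
        # Partial derivatives of MSE with respect to m and c
        dm = (-2 / n) * sum(x * (y - p) for x, y, p in zip(xs, ys, preds))
        dc = (-2 / n) * sum(y - p for y, p in zip(ys, preds))
        # Step opposite the gradient: the direction of fastest decrease
        m -= lr * dm
        c -= lr * dc
    return m, c

m, c = gradient_descent([1, 2, 3, 4], [3, 5, 7, 9])   # true line: y = 2x + 1
print(round(m, 2), round(c, 2))                       # approximately 2.0 and 1.0
```

Each iteration moves the parameters slightly downhill on the cost surface, which is exactly the picture the Coursera course illustrates.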
For an excellent explanation of gradient descent, please visit week 1 of this course by Andrew Ng on Coursera. It is recommended that you take this course :)
https://www.coursera.org/learn/machine-learning
(Image above: an illustration of gradient descent. Credits: Andrew Ng’s ML course on Coursera.)
Logistic Regression :
In statistics, the logistic model (or logit model) is a statistical model that is usually applied to a binary dependent variable. In regression analysis, logistic regression (or logit regression) means estimating the parameters of a logistic model. More formally, a logistic model is one where the log-odds of the probability of an event is a linear combination of independent (predictor) variables. The two possible dependent variable values are often labelled “0” and “1”, representing outcomes such as pass/fail, win/lose, alive/dead, or healthy/sick.
It uses the sigmoid (logistic) function to predict the result.
Classification with Logistic Regression :
It is a quite simple algorithm and is very useful for binary classification (classifying yes/no, fast/slow, dead/alive, etc.). It can be used to handle multiple classes too; such a classification is called ‘one-vs-all’. One-vs-all is basically a collection of binary classifiers, each producing a probability, and the class with the highest probability is chosen as the final result.
Sigmoid Function:
y(z) = 1 / (1 + e^(−z))
- The value y(0) = 0.5.
- If y(z) >= 0.5, then z >= 0.
- If y(z) < 0.5, then z < 0.
- Hence, 0.5 is the cut-off for binary classification.
- If y(z) >= 0.5, the answer is 1 (positive case); if y(z) < 0.5, the answer is 0 (negative case).
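The cut-off behaviour described above can be verified with a few lines of Python:

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: squashes any real z into (0, 1)."""
    return 1 / (1 + math.exp(-z))

print(sigmoid(0))         # 0.5 -> the cut-off point
print(sigmoid(2) >= 0.5)  # True: z >= 0 maps to the positive class (1)
print(sigmoid(-2) < 0.5)  # True: z < 0 maps to the negative class (0)
```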
Decision Boundary :
It acts as a separation between the two classes. The line separating the region predicted as y = 0 from the region predicted as y = 1 is the decision boundary. If the feature variables xi enter the model non-linearly, then the decision boundary can be non-linear as well.
In the above figure, the red line acts as a decision boundary between 2 classes i.e. blue circles and the green triangles.
The above graph is a circular decision boundary which separates the green triangles and the blue circles.
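A linear decision boundary can be sketched as the set of points where the model's input to the sigmoid is zero. The weights below are made-up values for illustration, not a trained model:

```python
# A linear decision boundary over two features: the line where
# w0 + w1*x1 + w2*x2 = 0. One side is class 1, the other class 0.

import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def predict(w, x1, x2):
    z = w[0] + w[1] * x1 + w[2] * x2
    return 1 if sigmoid(z) >= 0.5 else 0   # equivalently: 1 if z >= 0

w = [-3.0, 1.0, 1.0]       # boundary is the line x1 + x2 = 3
print(predict(w, 4, 2))    # 1: the point lies above the line
print(predict(w, 1, 1))    # 0: the point lies below the line
```

Replacing the linear term z with a non-linear one (e.g. involving x1² and x2²) would give a curved boundary, like the circular example above.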
One vs all classification :
We saw the cases with 2 classes above.
Now let us consider the case with 3 classes.
If we try class 1 vs all, we get a binary classification like the one below :
If we try class 2 vs all, we get a binary classification like the one below :
If we try class 3 vs all, we get a binary classification like the one below :
This way, we find the value of the hypothesis for all 3 cases and choose the highest value among them. That is the final answer we need. Since there are 3 classes, it becomes 3 binary classification problems.
The same can be extended to N classes, giving N binary classification problems.
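The final selection step is just an argmax over the per-class probabilities. The scores below are hypothetical outputs of three binary classifiers, not a trained model:

```python
# One-vs-all: one binary classifier per class, each giving the probability
# that the input belongs to that class; the highest probability wins.

def one_vs_all(probabilities):
    """probabilities maps class label -> P(input belongs to that class)."""
    return max(probabilities, key=probabilities.get)

scores = {"class 1": 0.20, "class 2": 0.75, "class 3": 0.40}
print(one_vs_all(scores))   # class 2 has the highest probability
```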
Other Regression:
Linear and logistic regression are rigid and do not work well when the dataset has a large number of outliers; we might have to preprocess the data with techniques like feature selection or PCA. Many other types of regression are also available, such as Lasso regression, Elastic Net, etc.
- Written by Aditya Shenoy and Samyuktha Prabhu
#IndiaStudents #MachineLearning #ComputerScience #DataScience #DataAnalytics #ArtificialIntelligence #Computers #Engineering #Regression #LinearRegression #LogisticRegression