Machine Learning: Predicting outcomes using Binary Logistic Regression
José Jaime Comé
Information Management Associate @ UNHCR · Data Specialist/Statistician (Python | R | SQL | Power BI | Excel) · YouTube: 15K+ subscribers
Logistic regression is a statistical model used for binary classification: it passes a linear combination of one or more independent variables through an "S"-shaped logistic function. The output of the model is a probability between 0 and 1, which is used to assign each observation to one of two categories, such as yes or no, 0 or 1, or true or false.
Because logistic regression is simple, easy to interpret, and effective at binary classification problems, it is extensively employed. A typical real-world application is classifying an email as spam or not spam. For these reasons, logistic regression falls into the category of machine learning.
It is referred to as "regression" because it is an extension of linear regression, but it is mostly used for classification problems. Logistic regression examines the association of (categorical or continuous) independent variable(s) with one dichotomous dependent variable, predicting a categorical outcome that can be true or false, yes or no, 1 or 0, by fitting the data to an "S"-shaped logistic function. Linear regression, in contrast, analyzes a continuous dependent variable and identifies the relationship between that variable and one or more independent variables.
The parameters of a logistic regression are most often estimated by maximum-likelihood estimation (MLE); the model is not evaluated with the coefficient of determination (R squared) as in linear regression.
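Since R squared does not apply directly, a pseudo-R squared is often reported instead. As a minimal sketch, assuming a fitted glm object named m1 like the one built in the practical example later in this article, McFadden's version can be computed as:

# McFadden's pseudo-R squared: 1 minus the ratio of the model deviance
# to the deviance of an intercept-only (null) model
1 - m1$deviance / m1$null.deviance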
Example
A group of 18 students spent hours studying for an exam. The table below shows the number of hours spent and the test result for each student.

Hours spent: 0.1, 0.2, 0.3, 0.9, 1.3, 1.4, 1.7, 1.8, 2.0, 2.1, 2.3, 2.4, 2.7, 2.8, 3.2, 3.3, 3.5, 3.6
Result: 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1

A result of 1 means the student passed and 0 means the student failed. Looking at the definition of logistic regression, this data is consistent with it, so a model can be built from this data and afterwards new students can be classified based on hours studied.
Logistic Function – Sigmoid Function
The sigmoid function is a mathematical function that maps any real-valued input to a value between 0 and 1, producing the characteristic "S"-shaped curve of the logistic function.
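As a quick illustration, here is a minimal sketch of the sigmoid in R (the function name sigmoid is illustrative, not a built-in):

# Sigmoid: maps any real number into the interval (0, 1)
sigmoid <- function(z) 1 / (1 + exp(-z))
sigmoid(c(-5, 0, 5))  # approx. 0.007, 0.500, 0.993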
Assumptions of Logistic Regression
· The observations must be independent of each other.
· The dependent variable must be binary or dichotomous. When there are more than two categories, the softmax function is used instead.
· There must be a linear relationship between the independent variables and the log odds.
· There should be no extreme outliers.
· The sample size should be large.
· There should be little or no multicollinearity between the predictor variables (a quick check is sketched after this list).
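A common way to screen for multicollinearity is the variance inflation factor (VIF). Here is a minimal sketch using vif() from the car package, with simulated data purely for illustration:

# Two deliberately correlated predictors; VIF values above roughly 5-10
# are usually taken as a sign of problematic multicollinearity
library(car)
set.seed(1)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.5)        # correlated with x1
y <- rbinom(100, 1, plogis(x1 + x2))   # simulated binary outcome
m <- glm(y ~ x1 + x2, family = "binomial")
vif(m)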
Terminologies in Logistic Regression
Here are some of the terminologies used:
· Independent variables: the input data or predictors.
· Dependent variable: the output or target variable.
· Logistic function: the formula that transforms the input variables into a probability value between 0 and 1.
· Odds: the ratio of the chance of an event occurring to the chance of it not occurring.
· Log-odds: the natural log transformation of the odds.
· Coefficients: the logistic regression model's estimated parameters.
· Intercept: the constant term in the logistic regression model; it represents the log-odds of the outcome when all predictors are at 0.
· Maximum likelihood estimation: the method used to estimate the coefficients of the logistic regression model by maximizing the likelihood of observing the data.
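To make odds and log-odds concrete, here is a small numeric example in R:

# For a probability p, the odds are p / (1 - p) and the log-odds are log(odds)
p <- 0.8
odds <- p / (1 - p)     # 4: the event is 4 times as likely to occur as not
log_odds <- log(odds)   # about 1.386
c(odds = odds, log_odds = log_odds)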
Types of logistic regression
Binary logistic regression: also called binary classification; the researcher expects two possible outcomes for the response or dependent variable (e.g. 0 or 1, true or false, pass or fail).
Multinomial logistic regression: in this approach the researcher expects more than two outcomes, with no natural order among them.
Ordinal logistic regression: in this approach the researcher also expects more than two outcomes, but the values have a defined order, for example: 1 - Strongly Disagree, 2 - Disagree, 3 - Neutral, 4 - Agree, 5 - Strongly Agree.
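As a hedged sketch of the latter two types in R (using multinom() from the nnet package and polr() from MASS, with simulated data purely for illustration):

# Multinomial: unordered outcome with more than two categories
library(nnet)
set.seed(1)
df3 <- data.frame(x = rnorm(90),
                  cat = factor(sample(c("a", "b", "c"), 90, replace = TRUE)))
m_multi <- multinom(cat ~ x, data = df3)

# Ordinal: ordered outcome; polr() expects an ordered factor
library(MASS)
df3$rating <- factor(sample(1:5, 90, replace = TRUE), ordered = TRUE)
m_ord <- polr(rating ~ x, data = df3, method = "logistic")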
Logit function
The logit connects the outcome to the Bernoulli distribution: the binary dependent variable is assumed to follow a Bernoulli distribution with parameter p, and the link function that connects p to the linear combination of variables is called the logit. In logistic regression the value of p is not known; the researcher estimates it from the data. The logit maps a probability p in (0, 1) to the whole real line:

logit(p) = ln(p / (1 - p)) = α

Since the probability p must stay between 0 and 1, the inverse of the logit is used to map the linear combination back into this range:

p = e^α / (1 + e^α) = 1 / (1 + e^(-α))

where α is the linear combination of the variables.
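R ships both of these as built-ins: qlogis() is the logit and plogis() is its inverse. A minimal check:

# qlogis(p) = log(p / (1 - p)) is the logit; plogis() is its inverse
p <- 0.75
alpha <- qlogis(p)  # about 1.099
plogis(alpha)       # recovers 0.75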
Binary Logistic Regression
As stated above, logistic regression examines the association of (categorical or continuous) independent variable(s) with one dichotomous dependent variable, making predictions about a categorical variable that can be true or false, yes or no, 1 or 0, by fitting the data to an "S"-shaped logistic function. Let us start by defining the classic linear function:

y = B0 + B1*x1

This is a linear function where B0 and B1 are coefficients and x1 is the independent variable. Now let us set this linear combination equal to the logit function:

ln(p / (1 - p)) = B0 + B1*x1

Our objective here is to estimate p, so we must isolate it. Exponentiating both sides gives:

p / (1 - p) = e^(B0 + B1*x1)

And finally, solving for p, we have:

p = e^(B0 + B1*x1) / (1 + e^(B0 + B1*x1)) = 1 / (1 + e^(-(B0 + B1*x1)))

Now we have the logistic regression function.
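As a small numeric illustration of this function (the coefficient values here are hypothetical, chosen only for the example):

# Hypothetical coefficients, purely for illustration
b0 <- -4
b1 <- 1.5
x1 <- 3                               # e.g. 3 hours of study
p <- 1 / (1 + exp(-(b0 + b1 * x1)))   # the logistic regression function
p                                     # about 0.62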
To fit the model, it is necessary to estimate the coefficients B0 and B1. To achieve this, maximum likelihood estimation is used: it finds the coefficient values under which the observed data are most likely.
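To make maximum likelihood estimation concrete, here is a minimal sketch that estimates B0 and B1 by hand with optim(), using the same 18 students as the practical example below; the estimates should be close to what glm() returns:

# Negative log-likelihood of the logistic model; optim() minimizes it,
# which is equivalent to maximizing the likelihood
hours <- c(0.1, 0.2, 0.3, 0.9, 1.3, 1.4, 1.7, 1.8, 2, 2.1,
           2.3, 2.4, 2.7, 2.8, 3.2, 3.3, 3.5, 3.6)
passed <- c(0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1)
neg_log_lik <- function(beta) {
  p <- 1 / (1 + exp(-(beta[1] + beta[2] * hours)))
  -sum(passed * log(p) + (1 - passed) * log(1 - p))
}
optim(c(0, 0), neg_log_lik)$par  # estimates of B0 and B1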
Practical Example in R
hours_spent <- c(0.1, 0.2, 0.3, 0.9, 1.3, 1.4, 1.7, 1.8, 2, 2.1, 2.3, 2.4, 2.7, 2.8, 3.2, 3.3, 3.5, 3.6)
approved <- c(0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1)
df <- data.frame(hours_spent, approved)
summary(df)
# fit the model and save it as m1
m1 <- glm(approved ~ hours_spent,
data = df,
family = "binomial"
)
# print results
# The Coefficients table contains the most important results:
# B0 and B1 are in the Estimate column
# p-values are in Pr(>|z|)
# H0: Bj = 0 vs H1: Bj != 0 for j = 0, 1, tested with the Wald test
# With alpha = 0.05, a low p-value means there is more evidence that the coefficient differs from 0
# When B1 = 0, X and Y are independent (the probability of passing does not depend on hours_spent)
# When B1 > 0, the probability that Y = 1 increases with X (the probability of passing increases with hours_spent), and
# when B1 < 0, the probability that Y = 1 decreases with X (the probability of passing decreases with hours_spent)
summary(m1)
# Multiplicative change in the odds when X increases by 1 unit
exp(coef(m1)["hours_spent"])
# Each extra hour of study multiplies the odds of passing by a factor of about 6
# Predict
# predict the probability of passing for a student who studied 4 hours
pred <- predict(m1, newdata = data.frame(hours_spent = 4.0), type = "response")
# print prediction
pred
# a student who studies 4.0 hours has a 97.4% chance of passing
# With confidence interval
pred <- predict(m1, newdata = data.frame(hours_spent = 4.0), type = "response", se.fit = TRUE)
# print prediction
pred$fit
# 95% confidence interval for the prediction
lower <- pred$fit - (qnorm(0.975) * pred$se.fit)
upper <- pred$fit + (qnorm(0.975) * pred$se.fit)
c(lower, upper)
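One caveat on this interval (a side note, not from the original example): computed on the response scale, it can fall outside [0, 1] near the boundaries. A common alternative is to build the interval on the link (log-odds) scale and transform back with plogis():

# 95% interval built on the log-odds scale, then mapped back to probabilities
pred_link <- predict(m1, newdata = data.frame(hours_spent = 4.0),
                     type = "link", se.fit = TRUE)
plogis(pred_link$fit + c(-1, 1) * qnorm(0.975) * pred_link$se.fit)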