Machine Learning: Predicting outcomes using Binary Logistic Regression
José Jaime Comé
Information Management Associate @ UNHCR · Data Specialist/Statistician (Python | R | SQL | Power BI | Excel) · YouTube: 15K+ subscribers
Logistic regression is a statistical model used for binary classification: it passes a linear combination of one or more independent variables through an "S"-shaped logistic function. The output of the model is a probability between 0 and 1, which is used to assign each observation to one of two categories, such as yes or no, 0 or 1, or true or false.
Because logistic regression is simple, easy to interpret, and effective at binary classification problems, it is extensively employed. A typical real-world application is classifying an email as spam or not spam. For these reasons, logistic regression falls into the category of machine learning.
It is referred to as "regression" because it is an extension of linear regression, but it is mostly used for classification problems. Logistic regression examines the association of (categorical or continuous) independent variable(s) with one dichotomous dependent variable, predicting a categorical outcome that can be true or false, yes or no, 1 or 0, by fitting the data to an "S"-shaped logistic function. Linear regression, in contrast, analyzes a continuous dependent variable and identifies the relationship between that variable and one or more independent variables.
The parameters of a logistic regression are most often estimated by maximum-likelihood estimation (MLE); the model is not evaluated with the coefficient of determination (R squared) as in linear regression.
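Since R squared does not apply directly, a pseudo-R squared is often reported instead. As a minimal sketch, assuming a fitted glm object named m1 like the one built in the practical example later in this article, McFadden's version can be computed as:

# McFadden's pseudo-R squared: 1 minus the ratio of the model deviance
# to the deviance of an intercept-only (null) model
1 - m1$deviance / m1$null.deviance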
Example
A group of 18 students spent hours studying for an exam. The table below shows the number of hours spent and the test result for each student.

Hours spent: 0.1, 0.2, 0.3, 0.9, 1.3, 1.4, 1.7, 1.8, 2.0, 2.1, 2.3, 2.4, 2.7, 2.8, 3.2, 3.3, 3.5, 3.6
Result: 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1

A result of 1 means the student passed and 0 means the student failed. Looking at the definition of logistic regression, this data is consistent with it, so a model can be built from this data and afterwards new students can be classified based on hours studied.
Logistic Function – Sigmoid Function
The sigmoid function is a mathematical function that maps any real-valued input to a value between 0 and 1, producing the characteristic "S"-shaped curve of the logistic function.
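As a quick illustration, here is a minimal sketch of the sigmoid in R (the function name sigmoid is illustrative, not a built-in):

# Sigmoid: maps any real number into the interval (0, 1)
sigmoid <- function(z) 1 / (1 + exp(-z))
sigmoid(c(-5, 0, 5))  # approx. 0.007, 0.500, 0.993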
Assumptions of Logistic Regression
· The observations must be independent of each other.
· The dependent variable must be binary or dichotomous. When there are more than two categories, the softmax function is used instead.
· There must be a linear relationship between the independent variables and the log odds.
· There should be no extreme outliers.
· The sample size should be large.
· There should be little or no multicollinearity between the predictor variables (a quick check is sketched after this list).
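A common way to screen for multicollinearity is the variance inflation factor (VIF). Here is a minimal sketch using vif() from the car package, with simulated data purely for illustration:

# Two deliberately correlated predictors; VIF values above roughly 5-10
# are usually taken as a sign of problematic multicollinearity
library(car)
set.seed(1)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.5)        # correlated with x1
y <- rbinom(100, 1, plogis(x1 + x2))   # simulated binary outcome
m <- glm(y ~ x1 + x2, family = "binomial")
vif(m)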
Terminologies in Logistic Regression
Here are some of the terminologies used:
· Independent variables: the input data or predictors.
· Dependent variable: the output or target variable.
· Logistic function: the formula that transforms the input variables into a probability value between 0 and 1.
· Odds: the ratio of the chance of an event occurring to the chance of it not occurring.
· Log-odds: the natural log transformation of the odds.
· Coefficients: the logistic regression model's estimated parameters.
· Intercept: the constant term in the logistic regression model; it represents the log-odds of the outcome when all predictors are at 0.
· Maximum likelihood estimation: the method used to estimate the coefficients of the logistic regression model by maximizing the likelihood of observing the data.
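To make odds and log-odds concrete, here is a small numeric example in R:

# For a probability p, the odds are p / (1 - p) and the log-odds are log(odds)
p <- 0.8
odds <- p / (1 - p)     # 4: the event is 4 times as likely to occur as not
log_odds <- log(odds)   # about 1.386
c(odds = odds, log_odds = log_odds)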
Types of logistic regression
Binary logistic regression: also called binary classification; the researcher expects two possible outcomes for the response or dependent variable (e.g. 0 or 1, true or false, pass or fail).
Multinomial logistic regression: in this approach the researcher expects more than two outcomes, with no natural order among them.
Ordinal logistic regression: in this approach the researcher also expects more than two outcomes, but the values have a defined order, for example: 1 - Strongly Disagree, 2 - Disagree, 3 - Neutral, 4 - Agree, 5 - Strongly Agree.
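As a hedged sketch of the latter two types in R (using multinom() from the nnet package and polr() from MASS, with simulated data purely for illustration):

# Multinomial: unordered outcome with more than two categories
library(nnet)
set.seed(1)
df3 <- data.frame(x = rnorm(90),
                  cat = factor(sample(c("a", "b", "c"), 90, replace = TRUE)))
m_multi <- multinom(cat ~ x, data = df3)

# Ordinal: ordered outcome; polr() expects an ordered factor
library(MASS)
df3$rating <- factor(sample(1:5, 90, replace = TRUE), ordered = TRUE)
m_ord <- polr(rating ~ x, data = df3, method = "logistic")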
Logit function
The logit connects the outcome to the Bernoulli distribution: the binary dependent variable is assumed to follow a Bernoulli distribution with parameter p, and the link function that connects p to the linear combination of variables is called the logit. In logistic regression the value of p is not known; the researcher estimates it from the data. The logit maps a probability p in (0, 1) to the whole real line:

logit(p) = ln(p / (1 - p)) = α

Since the probability p must stay between 0 and 1, the inverse of the logit is used to map the linear combination back into this range:

p = e^α / (1 + e^α) = 1 / (1 + e^(-α))

where α is the linear combination of the variables.
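R ships both of these as built-ins: qlogis() is the logit and plogis() is its inverse. A minimal check:

# qlogis(p) = log(p / (1 - p)) is the logit; plogis() is its inverse
p <- 0.75
alpha <- qlogis(p)  # about 1.099
plogis(alpha)       # recovers 0.75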
Binary Logistic Regression
As stated above, logistic regression examines the association of (categorical or continuous) independent variable(s) with one dichotomous dependent variable, making predictions about a categorical variable that can be true or false, yes or no, 1 or 0, by fitting the data to an "S"-shaped logistic function. Let us start by defining the classic linear function:

y = B0 + B1*x1

This is a linear function where B0 and B1 are coefficients and x1 is the independent variable. Now let us set this linear combination equal to the logit function:

ln(p / (1 - p)) = B0 + B1*x1

Our objective here is to estimate p, so we must isolate it. Exponentiating both sides gives:

p / (1 - p) = e^(B0 + B1*x1)

And finally, solving for p, we have:

p = e^(B0 + B1*x1) / (1 + e^(B0 + B1*x1)) = 1 / (1 + e^(-(B0 + B1*x1)))

Now we have the logistic regression function.
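As a small numeric illustration of this function (the coefficient values here are hypothetical, chosen only for the example):

# Hypothetical coefficients, purely for illustration
b0 <- -4
b1 <- 1.5
x1 <- 3                               # e.g. 3 hours of study
p <- 1 / (1 + exp(-(b0 + b1 * x1)))   # the logistic regression function
p                                     # about 0.62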
To fit the model, it is necessary to estimate the coefficients B0 and B1. To achieve this, maximum likelihood estimation is used: it finds the coefficient values under which the observed data are most likely.
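To make maximum likelihood estimation concrete, here is a minimal sketch that estimates B0 and B1 by hand with optim(), using the same 18 students as the practical example below; the estimates should be close to what glm() returns:

# Negative log-likelihood of the logistic model; optim() minimizes it,
# which is equivalent to maximizing the likelihood
hours <- c(0.1, 0.2, 0.3, 0.9, 1.3, 1.4, 1.7, 1.8, 2, 2.1,
           2.3, 2.4, 2.7, 2.8, 3.2, 3.3, 3.5, 3.6)
passed <- c(0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1)
neg_log_lik <- function(beta) {
  p <- 1 / (1 + exp(-(beta[1] + beta[2] * hours)))
  -sum(passed * log(p) + (1 - passed) * log(1 - p))
}
optim(c(0, 0), neg_log_lik)$par  # estimates of B0 and B1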
Practical Example in R
hours_spent <- c(0.1, 0.2, 0.3, 0.9, 1.3, 1.4, 1.7, 1.8, 2, 2.1, 2.3, 2.4, 2.7, 2.8, 3.2, 3.3, 3.5, 3.6)
approved <- c(0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1)
df <- data.frame(hours_spent, approved)
summary(df)
# fit the model and save it as m1
m1 <- glm(approved ~ hours_spent,
data = df,
family = "binomial"
)
# print results
# The Coefficients table contains the most important results:
# B0 and B1 are in the Estimate column
# p-values are in Pr(>|z|)
# H0: Bj = 0 vs H1: Bj != 0 for j = 0, 1, tested with the Wald test
# With alpha = 0.05, a low p-value means there is more evidence that the coefficient differs from 0
# When B1 = 0, X and Y are independent (the probability of passing does not depend on hours_spent)
# When B1 > 0, the probability that Y = 1 increases with X (the probability of passing increases with hours_spent), and
# when B1 < 0, the probability that Y = 1 decreases with X (the probability of passing decreases with hours_spent)
summary(m1)
# Multiplicative change in the odds when X increases by 1 unit
exp(coef(m1)["hours_spent"])
# Each extra hour of study multiplies the odds of passing by a factor of about 6
# Predict
# predict the probability of passing for a student who studied 4 hours
pred <- predict(m1, newdata = data.frame(hours_spent = 4.0), type = "response")
# print prediction
pred
# a student who studies 4.0 hours has a 97.4% chance of passing
# With confidence interval
pred <- predict(m1, newdata = data.frame(hours_spent = 4.0), type = "response", se.fit = TRUE)
# print prediction
pred$fit
# 95% confidence interval for the prediction
lower <- pred$fit - (qnorm(0.975) * pred$se.fit)
upper <- pred$fit + (qnorm(0.975) * pred$se.fit)
c(lower, upper)
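One caveat on this interval (a side note, not from the original example): computed on the response scale, it can fall outside [0, 1] near the boundaries. A common alternative is to build the interval on the link (log-odds) scale and transform back with plogis():

# 95% interval built on the log-odds scale, then mapped back to probabilities
pred_link <- predict(m1, newdata = data.frame(hours_spent = 4.0),
                     type = "link", se.fit = TRUE)
plogis(pred_link$fit + c(-1, 1) * qnorm(0.975) * pred_link$se.fit)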