BxD Primer Series: Logistic Regression Models
Hey there 👋
Welcome to the BxD Primer Series, where we cover topics such as machine learning models, neural nets, GPT, ensemble models, and hyper-automation in a ‘one-post-one-topic’ format. Today’s post is on Logistic Regression Models. Let’s get started:
The What:
In the previous edition, on Linear Regression, we observed that its output can take unbounded values. But in many scenarios you need the probability of something being true, so that you can make a decision when that probability exceeds a certain threshold. Logistic regression bounds the output value by applying an activation function to the output of a linear equation.
The model can also be used to identify which predictor variables are most strongly associated with the outcome variable.
A metaphor to understand logistic regression is to think of it as a "medical test" that predicts the likelihood of a disease based on certain symptoms. Just as the test calculates the probability of the disease based on the presence or absence of certain symptoms, logistic regression calculates the probability of the outcome variable based on the values of the predictor variables.
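To make this concrete, here is a minimal sketch in Python. The weight vector, bias, feature values, and the 0.5 threshold below are made-up illustrative numbers, not parameters fit to any real data:

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters, as if learned from training data
w = np.array([0.8, -1.2, 0.5])   # weight vector, one weight per feature
b = 0.1                          # bias term

x = np.array([1.5, 0.3, 2.0])    # a single feature vector

# The linear output w.x + b is unbounded; the sigmoid bounds it to (0, 1)
probability = sigmoid(np.dot(w, x) + b)

# Turn the probability into a decision using a threshold
prediction = int(probability >= 0.5)
print(f"P(outcome = 1) = {probability:.3f} -> predict class {prediction}")
```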
The Why:
Logistic regression is a popular and widely used statistical model for several reasons:
- It's easy to implement and interpret: The model is straightforward to build, and the results are easy to understand, making it accessible to both statisticians and non-statisticians.
- It's versatile: Logistic regression can be used with a wide range of predictor variables, including categorical and continuous variables. This makes it useful for analysing many different types of data.
- It provides probabilistic predictions: Logistic regression outputs probabilities that can be interpreted as the likelihood of a particular outcome. This is particularly useful when making decisions based on the predicted outcome.
- It can handle complex relationships: Logistic regression can model nonlinear relationships between predictor variables and the outcome variable using techniques like polynomial or interaction terms.
- It's widely used in various fields: Logistic regression is commonly used in many different fields, including medicine, social sciences, business, and engineering. It's often the model of choice for many applications where binary outcomes are of interest.
Choices and Tradeoffs:
There are three main choices to make when developing a logistic regression model:
- Activation Function
- Cost Function
- Optimisation Technique
Choosing an activation function:
The activation function introduces bounds on an otherwise linear function that can take any value. Given an input x, the model computes h(x) = f(w · x + b), where x is the feature vector, w is the weight vector, b is the bias, and f is the activation function.
Any function that has a bounded output and is differentiable for all values of x can be used as an activation function. Two good choices are:
- Sigmoid Function: The most commonly used activation function, σ(z) = 1 / (1 + e^(−z)). Its output lies strictly between 0 and 1, so it can be read directly as a probability.
- Tanh Function: The tanh function can reduce model training time. Its derivative is steeper than the sigmoid’s, so weights are updated by larger magnitudes, which can make the algorithm converge faster. Note that its output lies between −1 and 1, so it must be rescaled if a probability is needed.
For the illustrations that follow, we will use the sigmoid function as the activation function.
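A quick numeric check of the two candidates, sketched in plain NumPy (the test points are arbitrary), confirms the output ranges and the claim that tanh has the steeper derivative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)           # derivative of sigmoid, peaks at 0.25

def d_tanh(z):
    return 1.0 - np.tanh(z) ** 2   # derivative of tanh, peaks at 1.0

z = np.linspace(-3, 3, 7)          # arbitrary test points
print("sigmoid outputs:", sigmoid(z).round(3))   # all inside (0, 1)
print("tanh outputs:   ", np.tanh(z).round(3))   # all inside (-1, 1)
print("max sigmoid slope:", d_sigmoid(z).max())  # 0.25, at z = 0
print("max tanh slope:   ", d_tanh(z).max())     # 1.0 at z = 0, i.e. steeper
```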
Choosing a cost function:
The cost function quantifies the error between predicted and expected values and presents that error as a single real number over all observations. A loss function quantifies error at the observation level, whereas a cost function quantifies error at the training-dataset level.
The cost function used in linear regression, Mean Squared Error, cannot be used for logistic regression: passed through the sigmoid, it becomes non-convex, so optimisation might not converge to the global minimum. The trick is to treat the true label y as binary, i.e. it can take only the values 0 and 1, and define the loss function accordingly.
For any given observation with true label y and predicted probability h(x), we define the loss in prediction as:

loss = y · log(h(x)) + (1 − y) · log(1 − h(x))
We can check that this choice of loss works correctly for all combinations of true and predicted values. When y = 1, the loss reduces to log(h(x)), which approaches 0 as h(x) → 1 and grows very negative as h(x) → 0; when y = 0, it reduces to log(1 − h(x)) and behaves symmetrically. After the −1 in the cost below, a confident correct prediction contributes almost nothing (y = 1, h(x) = 0.9 gives −log(0.9) ≈ 0.105) while a confident wrong prediction is penalised heavily (h(x) = 0.1 gives −log(0.1) ≈ 2.303).
Using this definition of loss, we can build a cost function that can be optimised to the global minimum.
Cost = -1 * Average of loss of all observations
This cost function is commonly called ‘Log-Loss’ or ‘Binary Cross-Entropy’; minimising it is equivalent to Maximum Likelihood Estimation (maximising the log-likelihood of the data). You will see it used in many different types of ML models.
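As a sketch of how this cost is computed in practice (the labels and predicted probabilities below are made-up numbers; the small epsilon guards against log(0)):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cost = -1 * average of the per-observation loss defined above."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    loss = y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)
    return -np.mean(loss)

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.2, 0.7, 0.4])      # from confident-correct to wrong
print(binary_cross_entropy(y_true, y_pred))  # ~0.40
```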
Choosing an optimisation technique:
Optimisation is about adjusting the parameters of a model to fit the training data in a way that produces a generalised model, i.e. one that does well on both training/seen and new/unseen data. In a logistic model, those parameters are the weights (w) and bias (b).
As the Log-Loss cost is a convex function with a single global minimum, we can use many different optimisation methods, depending on:
- Size of our training data (# Observations, # Features)
- Computation time and resources available
- Simplicity of implementation
Commonly used methods are:
- Gradient Descent: The most popular method, as it is easy to implement and efficient for large datasets (a minimal sketch follows this list).
- Newton-Raphson Method: Can converge in very few iterations but is computationally intensive, since each step requires computing and inverting the Hessian.
- BFGS: Broyden-Fletcher-Goldfarb-Shanno approximates the Hessian rather than computing it exactly, which speeds convergence, but storing that approximation makes it memory intensive for problems with many features.
- Limited-Memory BFGS (L-BFGS): A less memory-intensive version of BFGS that keeps only a few recent updates; it can converge to slightly suboptimal solutions but is preferred for large-scale scenarios.
- Conjugate Gradient: More commonly used for linear regression but can work well for small-scale logistic regression scenarios.
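Here is the gradient descent sketch promised above: a bare-bones batch gradient descent fit of a logistic model on a tiny synthetic dataset. The learning rate, iteration count, and data are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, lr=0.1, n_iters=5000):
    """Fit weights and bias by batch gradient descent on log-loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        h = sigmoid(X @ w + b)   # predicted probabilities
        error = h - y            # gradient of log-loss w.r.t. the linear output
        w -= lr * (X.T @ error) / n_samples
        b -= lr * error.mean()
    return w, b

# Tiny synthetic example: label 1 when the single feature is large
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])
w, b = fit_logistic_gd(X, y)
print(sigmoid(X @ w + b).round(2))  # probabilities rise with the feature
```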
The Why Not:
While logistic regression is a powerful and versatile statistical tool, it may not be the best model to use in certain situations. Here are some reasons why someone might not want to use a logistic regression model:
- Highly nonlinear relationships: Logistic regression fits a linear decision boundary in the feature space, so it may not capture a highly nonlinear relationship between predictors and outcome accurately. In that case, a nonlinear model like a decision tree or neural network may be more appropriate.
- Multicollinearity: When predictor variables are highly correlated with each other, it can be difficult to interpret the coefficients of a logistic regression model accurately, and parameter estimates can become unstable. In such cases, consider models that are less sensitive to multicollinearity, such as ridge or lasso regularised regression (a sketch of a regularised fit follows this list).
- Outliers: The maximum-likelihood estimates of logistic regression can be pulled strongly by outliers in the predictor variables, distorting the fitted coefficients. In such cases, robust variants of logistic regression could be used.
- Small sample sizes: When the sample size is small, it can be challenging to fit a reliable logistic regression model, and the model may not generalize well to new data. In such cases, one could consider using a Bayesian approach or a machine learning model that is less sensitive to small sample sizes.
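As an illustration of the multicollinearity point above, here is a sketch of a regularised fit using scikit-learn (this assumes scikit-learn is installed; the synthetic data and the value of C are arbitrary choices for demonstration, with smaller C meaning stronger L2 shrinkage):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)  # nearly collinear with x1
X = np.column_stack([x1, x2])
y = (x1 + rng.normal(scale=0.5, size=200) > 0).astype(int)

# The L2 (ridge) penalty shrinks the correlated coefficients,
# stabilising estimates that would otherwise be unstable
model = LogisticRegression(penalty="l2", C=0.5).fit(X, y)
print("coefficients:", model.coef_, "intercept:", model.intercept_)
```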
Time for you to help in return:
- Reply to this article with your question
- Forward/Share to a friend who can benefit from this
- Join BxD on Substack (here)
- Engage with BxD on LinkedIn (here)
In coming posts, we will cover other types of regression models, such as polynomial, ridge, and lasso, in a similar format.
Let us know your feedback!
Until then,
Enjoy life in full!
#businessxdata #bxd #logisticregression #primer