BxD Primer Series: Logistic Regression Models
Hey there 👋
Welcome to the BxD Primer Series, where we cover topics such as machine learning models, neural nets, GPT, ensemble models, and hyper-automation in a ‘one-post-one-topic’ format. Today’s post is on Logistic Regression Models. Let’s get started:
The What:
In the previous edition, on Linear Regression, we observed that its output can take unbounded values. But in many scenarios you need the probability of something being true, so that you can make a decision when that probability exceeds a certain threshold. Logistic regression bounds the output value by applying an activation function to the output of a linear equation.
The model can also be used to identify which predictor variables are most strongly associated with the outcome variable.
A metaphor to understand logistic regression is to think of it as a "medical test" that predicts the likelihood of a disease based on certain symptoms. Just as the test calculates the probability of the disease based on the presence or absence of certain symptoms, logistic regression calculates the probability of the outcome variable based on the values of the predictor variables.
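To make this concrete, here is a minimal sketch in Python. The weight vector, bias, feature values, and the 0.5 threshold below are made-up illustrative numbers, not parameters fit to any real data:

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters, as if learned from training data
w = np.array([0.8, -1.2, 0.5])   # weight vector, one weight per feature
b = 0.1                          # bias term

x = np.array([1.5, 0.3, 2.0])    # a single feature vector

# The linear output w.x + b is unbounded; the sigmoid bounds it to (0, 1)
probability = sigmoid(np.dot(w, x) + b)

# Turn the probability into a decision using a threshold
prediction = int(probability >= 0.5)
print(f"P(outcome = 1) = {probability:.3f} -> predict class {prediction}")
```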
The Why:
Logistic regression is a popular and widely used statistical model for several reasons:
- It's easy to implement and interpret: The model is straightforward to build, and the results are easy to understand, making it accessible to both statisticians and non-statisticians.
- It's versatile: Logistic regression can be used with a wide range of predictor variables, including categorical and continuous variables. This makes it useful for analysing many different types of data.
- It provides probabilistic predictions: Logistic regression outputs probabilities that can be interpreted as the likelihood of a particular outcome. This is particularly useful when making decisions based on the predicted outcome.
- It can handle complex relationships: Logistic regression can model nonlinear relationships between predictor variables and the outcome variable using techniques like polynomial or interaction terms.
- It's widely used in various fields: Logistic regression is commonly used in many different fields, including medicine, social sciences, business, and engineering. It's often the model of choice for many applications where binary outcomes are of interest.
Choices and Tradeoffs:
There are three main choices to make when developing a logistic regression model:
- Activation Function
- Cost Function
- Optimisation Technique
Choosing an activation function:
The activation function introduces bounds on an otherwise linear function that can take any value. Given an input x, the model computes h(x) = f(w · x + b), where x is the feature vector, w is the weight vector, b is the bias, and f is the activation function.
Any function that has a bounded output and is differentiable for all values of x can be used as an activation function. Two good choices are:
- Sigmoid Function: The most commonly used activation function, σ(z) = 1 / (1 + e^(−z)). Its output lies strictly between 0 and 1, so it can be read directly as a probability.
- Tanh Function: The tanh function can reduce model training time. Its derivative is steeper than the sigmoid’s, so weights are updated by larger magnitudes, which can make the algorithm converge faster. Note that its output lies between −1 and 1, so it must be rescaled if a probability is needed.
For the illustrations that follow, we will use the sigmoid function as the activation function.
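A quick numeric check of the two candidates, sketched in plain NumPy (the test points are arbitrary), confirms the output ranges and the claim that tanh has the steeper derivative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)           # derivative of sigmoid, peaks at 0.25

def d_tanh(z):
    return 1.0 - np.tanh(z) ** 2   # derivative of tanh, peaks at 1.0

z = np.linspace(-3, 3, 7)          # arbitrary test points
print("sigmoid outputs:", sigmoid(z).round(3))   # all inside (0, 1)
print("tanh outputs:   ", np.tanh(z).round(3))   # all inside (-1, 1)
print("max sigmoid slope:", d_sigmoid(z).max())  # 0.25, at z = 0
print("max tanh slope:   ", d_tanh(z).max())     # 1.0 at z = 0, i.e. steeper
```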
Choosing a cost function:
The cost function quantifies the error between predicted and expected values and presents that error as a single real number over all observations. A loss function quantifies error at the observation level, whereas a cost function quantifies error at the training-dataset level.
The cost function used in linear regression, Mean Squared Error, cannot be used for logistic regression: passed through the sigmoid, it becomes non-convex, so optimisation might not converge to the global minimum. The trick is to treat the true label y as binary, i.e. it can take only the values 0 and 1, and define the loss function accordingly.
For any given observation with true label y and predicted probability h(x), we define the loss in prediction as:

loss = y · log(h(x)) + (1 − y) · log(1 − h(x))
We can check that this choice of loss works correctly for all combinations of true and predicted values. When y = 1, the loss reduces to log(h(x)), which approaches 0 as h(x) → 1 and grows very negative as h(x) → 0; when y = 0, it reduces to log(1 − h(x)) and behaves symmetrically. After the −1 in the cost below, a confident correct prediction contributes almost nothing (y = 1, h(x) = 0.9 gives −log(0.9) ≈ 0.105) while a confident wrong prediction is penalised heavily (h(x) = 0.1 gives −log(0.1) ≈ 2.303).
Using this definition of loss, we can build a cost function that can be optimised to the global minimum.
Cost = -1 * Average of loss of all observations
This cost function is commonly called ‘Log-Loss’ or ‘Binary Cross-Entropy’; minimising it is equivalent to Maximum Likelihood Estimation (maximising the log-likelihood of the data). You will see it used in many different types of ML models.
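As a sketch of how this cost is computed in practice (the labels and predicted probabilities below are made-up numbers; the small epsilon guards against log(0)):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cost = -1 * average of the per-observation loss defined above."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    loss = y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)
    return -np.mean(loss)

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.2, 0.7, 0.4])      # from confident-correct to wrong
print(binary_cross_entropy(y_true, y_pred))  # ~0.40
```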
Choosing an optimisation technique:
Optimisation is about adjusting the parameters of a model to fit the training data in a way that produces a generalised model, i.e. one that does well on both training/seen and new/unseen data. In a logistic model, those parameters are the weights (w) and bias (b).
As the Log-Loss cost is a convex function with a single global minimum, we can use many different optimisation methods, depending on:
- Size of our training data (# Observations, # Features)
- Computation time and resources available
- Simplicity of implementation
Commonly used methods are:
- Gradient Descent: The most popular method, as it is easy to implement and efficient for large datasets (a minimal sketch follows this list).
- Newton-Raphson Method: Can converge in very few iterations but is computationally intensive, since each step requires computing and inverting the Hessian.
- BFGS: Broyden-Fletcher-Goldfarb-Shanno approximates the Hessian rather than computing it exactly, which speeds convergence, but storing that approximation makes it memory intensive for problems with many features.
- Limited-Memory BFGS (L-BFGS): A less memory-intensive version of BFGS that keeps only a few recent updates; it can converge to slightly suboptimal solutions but is preferred for large-scale scenarios.
- Conjugate Gradient: More commonly used for linear regression but can work well for small-scale logistic regression scenarios.
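Here is the gradient descent sketch promised above: a bare-bones batch gradient descent fit of a logistic model on a tiny synthetic dataset. The learning rate, iteration count, and data are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, lr=0.1, n_iters=5000):
    """Fit weights and bias by batch gradient descent on log-loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        h = sigmoid(X @ w + b)   # predicted probabilities
        error = h - y            # gradient of log-loss w.r.t. the linear output
        w -= lr * (X.T @ error) / n_samples
        b -= lr * error.mean()
    return w, b

# Tiny synthetic example: label 1 when the single feature is large
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])
w, b = fit_logistic_gd(X, y)
print(sigmoid(X @ w + b).round(2))  # probabilities rise with the feature
```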
The Why Not:
While logistic regression is a powerful and versatile statistical tool, it may not be the best model to use in certain situations. Here are some reasons why someone might not want to use a logistic regression model:
- Highly nonlinear relationships: Logistic regression fits a linear decision boundary in the feature space, so it may not capture a highly nonlinear relationship between predictors and outcome accurately. In that case, a nonlinear model like a decision tree or neural network may be more appropriate.
- Multicollinearity: When predictor variables are highly correlated with each other, it can be difficult to interpret the coefficients of a logistic regression model accurately, and parameter estimates can become unstable. In such cases, consider models that are less sensitive to multicollinearity, such as ridge or lasso regularised regression (a sketch of a regularised fit follows this list).
- Outliers: The maximum-likelihood estimates of logistic regression can be pulled strongly by outliers in the predictor variables, distorting the fitted coefficients. In such cases, robust variants of logistic regression could be used.
- Small sample sizes: When the sample size is small, it can be challenging to fit a reliable logistic regression model, and the model may not generalize well to new data. In such cases, one could consider using a Bayesian approach or a machine learning model that is less sensitive to small sample sizes.
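As an illustration of the multicollinearity point above, here is a sketch of a regularised fit using scikit-learn (this assumes scikit-learn is installed; the synthetic data and the value of C are arbitrary choices for demonstration, with smaller C meaning stronger L2 shrinkage):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)  # nearly collinear with x1
X = np.column_stack([x1, x2])
y = (x1 + rng.normal(scale=0.5, size=200) > 0).astype(int)

# The L2 (ridge) penalty shrinks the correlated coefficients,
# stabilising estimates that would otherwise be unstable
model = LogisticRegression(penalty="l2", C=0.5).fit(X, y)
print("coefficients:", model.coef_, "intercept:", model.intercept_)
```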
Time for you to help in return:
- Reply to this article with your question
- Forward/Share to a friend who can benefit from this
- Join BxD on Substack (here)
- Engage with BxD on LinkedIn (here)
In coming posts, we will cover other types of regression models, such as polynomial, ridge, and lasso, in a similar format.
Let us know your feedback!
Until then,
Enjoy life in full!
#businessxdata #bxd #logisticregression #primer