Mastering the Sigmoid Function: From Predictive Models to Probability Mapping
Anubhav Shukla
Full-stack Web Developer || GraphQl, MongoDB, Express, React, Node ( G-MERN ) || Python Fanatic || UI/UX Designer
Sigmoid Function
The sigmoid function, also known as the logistic function, is a mathematical function that maps any real-valued number to a value between 0 and 1.
To understand it better, let's take an example:
Suppose you are a data scientist tasked with building a model to predict whether a student will pass or fail an examination based on the number of hours they studied. After training, your model outputs a value in (-∞, ∞), where higher values indicate a higher likelihood of passing and lower values indicate a higher likelihood of failing.
So your model prediction can be defined as
model → S, where S ∈ (-∞, ∞)
Fantastic, we have a model that will tell us whether a student will pass or fail the examination. But wait: since your model's output lies in (-∞, ∞), what are you going to predict if it gives you a value of 100? Will the student pass, and if yes, how high is our confidence?
To make actionable decisions, we need to convert these predictions into probabilities, specifically the probability of passing or failing. So if the converted value is 0.7, we can say there is a 70% chance that the student will pass the examination.
Our goal
Now, to convert our predictions into probabilities, we need to squeeze the model's output into the range 0 to 1.
A simple approach would be to draw a straight line between 0 and 1, like this.
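As a quick sketch of this straight-line idea (assuming, as the article does later, that scores run from -10 to 10; `linear_prob` is a hypothetical helper name):

```python
def linear_prob(score):
    """Naive straight-line mapping: rescale a score in [-10, 10] to [0, 1]."""
    return (score + 10) / 20

print(linear_prob(-10))  # 0.0
print(linear_prob(0))    # 0.5 -> the 50-50 threshold
print(linear_prob(10))   # 1.0
```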
Two things we can observe from this approach
Now, let's closely observe this approach. If our model predicts a value of 0, its probability will be 0.5. This probability represents the threshold point where the model is equally uncertain about whether the student will pass or fail the examination; it's the point where the model considers the likelihood of passing and failing to be 50-50.
But what if I increase the score from 0 to 1? Now that is a huge change because it crosses the threshold of 0.5. This means that the model becomes more confident in predicting the positive class (e.g., pass) rather than the negative class (e.g., fail). Even though the change in the score is only 1 unit, it can have a substantial impact on the predicted probability and the model's decision. So the probability should show a sudden rise in its trend like this:
Similarly, if we decrease the score from 0 to -1, that is also a huge change, as the model becomes more confident in predicting the negative class (e.g., fail) rather than the positive class (e.g., pass). So the probability should show a sudden fall in its trend like this:
Here, we are making huge changes by moving the score from 0 up or down by just 1. But what happens when we increase the score from 9 to 10, or decrease it from -9 to -10? When the model is already very confident (e.g., a score of 9 indicating a high likelihood of passing, or -9 a high likelihood of failing), small changes in the score should have a smaller effect on the predicted probability. So increasing the score from 9 to 10 still increases the probability of passing, and decreasing the score from -9 to -10 still increases the probability of failing, but the magnitude of these changes is much smaller compared to changes around the threshold of 0.5.
One more thing to note: with our current approach, the model's score has to stay between -10 and 10. If it ever produces a score of 11, the resulting probability will not lie between 0 and 1.
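A quick check of this flaw (again assuming the straight-line rescale from [-10, 10]; `linear_prob` is a hypothetical helper):

```python
def linear_prob(score):
    # Straight-line mapping assumed in the text: [-10, 10] -> [0, 1]
    return (score + 10) / 20

print(linear_prob(11))   # 1.05  -> greater than 1, not a valid probability
print(linear_prob(-12))  # -0.1  -> negative, also invalid
```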
Hmm, so the approach we thought of has two major flaws: it changes at the same rate everywhere instead of rising sharply near the threshold and flattening at the extremes, and any score outside the fixed range maps to a value outside 0 and 1.
So what can we do? The sigmoid function comes to our aid. As its definition tells us:
The sigmoid function, also known as the logistic function, is a mathematical function that maps any real-valued number to a value between 0 and 1.
and that's exactly what we need. Also, if we look at the graph of the sigmoid function, it is similar to the curve we drew while correcting our first approach (we will see why later).
The formula of the sigmoid function: σ(x) = 1 / (1 + e^(-x))
The graph of the sigmoid function
From the graph, we can clearly see that no matter how extreme the score is, our prediction always lies between 0 and 1: as the score goes to -∞ the probability approaches 0, and as it goes to ∞ the probability approaches 1. So this takes care of our problem number 2.
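A minimal sketch of the sigmoid in Python (the branching is a common numerical-stability trick so that `math.exp` never receives a huge positive argument and overflows):

```python
import math

def sigmoid(x):
    """Map any real number to (0, 1): sigma(x) = 1 / (1 + e^(-x))."""
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    z = math.exp(x)  # safe: x < 0, so exp(x) cannot overflow
    return z / (1 + z)

print(sigmoid(0))     # 0.5
print(sigmoid(100))   # ~1.0 -> huge scores still stay within bounds
print(sigmoid(-100))  # ~0.0
print(sigmoid(11))    # ~0.99998 -> a valid probability, unlike the straight line
```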
But what about problem 1? I don't need to say much, as the graph of the sigmoid speaks for itself: near the threshold of 0.5 it shows sudden rises and falls, while near the extremes it is nearly constant. But I will prove it.
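We can also see this numerically: moving the score from 0 to 1 shifts the probability far more than moving it from 9 to 10 (a quick sketch using the standard sigmoid):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

near_threshold = sigmoid(1) - sigmoid(0)   # ~0.231    -> a big jump
near_extreme = sigmoid(10) - sigmoid(9)    # ~0.000078 -> barely moves
print(near_threshold, near_extreme)
```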
To prove this we need to find the derivative of the sigmoid, as it tells us the rate of change of the probability.
Derivative of the sigmoid function: σ'(x) = σ(x) · (1 − σ(x))
Writing the probability as p = σ(x), the derivative is simply p · (1 − p). If the probability is 1, the change in probability is 1 · 0 = 0.
Similarly, if the probability is 0, the change in probability is again 0 · 1 = 0.
This shows that at the extremes the change in probability is zero or nearly zero. But what about the threshold? When the probability is 0.5, we get the maximum change: 0.5 · 0.5 = 0.25.
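The identity σ'(x) = σ(x) · (1 − σ(x)) can be checked directly in code:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_derivative(x):
    # sigma'(x) = sigma(x) * (1 - sigma(x)) = p * (1 - p)
    p = sigmoid(x)
    return p * (1 - p)

print(sigmoid_derivative(0))    # 0.25 -> maximum change, exactly at p = 0.5
print(sigmoid_derivative(10))   # ~0.0000454 -> almost no change at the extremes
print(sigmoid_derivative(-10))  # ~0.0000454
```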
So that's how the sigmoid function squeezes any real number into the range 0 to 1, giving us well-behaved probabilities.