Mastering the Sigmoid Function: From Predictive Models to Probability Mapping
Anubhav Shukla
Full-stack Web Developer || GraphQl, MongoDB, Express, React, Node ( G-MERN ) || Python Fanatic || UI/UX Designer
Sigmoid Function
The sigmoid function, also known as the logistic function, is a mathematical function that maps any real-valued number to a value between 0 and 1.
To understand it better, let's take an example:
Suppose you are a data scientist tasked with building a model to predict whether a student will pass or fail an examination based on the number of hours they studied. After training, your model outputs a value in (-∞, ∞), where higher values indicate a higher likelihood of passing and lower values indicate a higher likelihood of failing.
So your model prediction can be defined as
model → S, where S ∈ (-∞, ∞)
Fantastic, we have a model that will tell us whether a student will pass or fail the examination. But wait: since your model's output lies in (-∞, ∞), what are you going to predict if it gives you a value of 100? Will the student pass, and if yes, how high is our confidence?
To make actionable decisions, we need to convert these predictions into probabilities, specifically the probability of passing or failing. So if the converted value is 0.7, we can say there is a 70% chance that the student will pass the examination.
Our goal
Now, to convert our predictions into probabilities, we need to squeeze the model's output into the range 0 to 1.
A simple approach would be to draw a straight line between 0 and 1, like this.
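As a quick sketch of this straight-line idea (assuming, as the article does later, that scores run from -10 to 10; `linear_prob` is a hypothetical helper name):

```python
def linear_prob(score):
    """Naive straight-line mapping: rescale a score in [-10, 10] to [0, 1]."""
    return (score + 10) / 20

print(linear_prob(-10))  # 0.0
print(linear_prob(0))    # 0.5 -> the 50-50 threshold
print(linear_prob(10))   # 1.0
```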
Two things we can observe from this approach
Now, let's closely observe this approach. If our model predicts a value of 0, its probability will be 0.5. This probability represents the threshold point where the model is equally uncertain about whether the student will pass or fail the examination; it's the point where the model considers the likelihood of passing and failing to be 50-50.
But what if I increase the score from 0 to 1? Now that is a huge change because it crosses the threshold of 0.5. This means that the model becomes more confident in predicting the positive class (e.g., pass) rather than the negative class (e.g., fail). Even though the change in the score is only 1 unit, it can have a substantial impact on the predicted probability and the model's decision. So the probability should show a sudden rise in its trend like this:
Similarly, if we decrease the score from 0 to -1, that is also a huge change, as the model becomes more confident in predicting the negative class (e.g., fail) rather than the positive class (e.g., pass). So the probability should show a sudden fall in its trend like this:
Here, we are making huge changes by moving the score from 0 up or down by just 1. But what happens when we increase the score from 9 to 10, or decrease it from -9 to -10? When the model is already very confident (e.g., a score of 9 indicating a high likelihood of passing, or -9 a high likelihood of failing), small changes in the score should have a smaller effect on the predicted probability. So increasing the score from 9 to 10 still increases the probability of passing, and decreasing the score from -9 to -10 still increases the probability of failing, but the magnitude of these changes is much smaller compared to changes around the threshold of 0.5.
One more thing to note: with our current approach, the model's score has to stay between -10 and 10. If it ever produces a score of 11, the resulting probability will not lie between 0 and 1.
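A quick check of this flaw (again assuming the straight-line rescale from [-10, 10]; `linear_prob` is a hypothetical helper):

```python
def linear_prob(score):
    # Straight-line mapping assumed in the text: [-10, 10] -> [0, 1]
    return (score + 10) / 20

print(linear_prob(11))   # 1.05  -> greater than 1, not a valid probability
print(linear_prob(-12))  # -0.1  -> negative, also invalid
```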
Hmm, so the approach we thought of has two major flaws: it changes at the same rate everywhere instead of rising sharply near the threshold and flattening at the extremes, and any score outside the fixed range maps to a value outside 0 and 1.
So what can we do? The sigmoid function comes to our aid. As its definition tells us:
The sigmoid function, also known as the logistic function, is a mathematical function that maps any real-valued number to a value between 0 and 1.
and that's exactly what we need. Also, if we look at the graph of the sigmoid function, it is similar to the curve we drew while correcting our first approach (we will see why later).
The formula of the sigmoid function: σ(x) = 1 / (1 + e^(-x))
The graph of the sigmoid function
From the graph, we can clearly see that no matter how extreme the score is, our prediction always lies between 0 and 1: as the score goes to -∞ the probability approaches 0, and as it goes to ∞ the probability approaches 1. So this takes care of our problem number 2.
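A minimal sketch of the sigmoid in Python (the branching is a common numerical-stability trick so that `math.exp` never receives a huge positive argument and overflows):

```python
import math

def sigmoid(x):
    """Map any real number to (0, 1): sigma(x) = 1 / (1 + e^(-x))."""
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    z = math.exp(x)  # safe: x < 0, so exp(x) cannot overflow
    return z / (1 + z)

print(sigmoid(0))     # 0.5
print(sigmoid(100))   # ~1.0 -> huge scores still stay within bounds
print(sigmoid(-100))  # ~0.0
print(sigmoid(11))    # ~0.99998 -> a valid probability, unlike the straight line
```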
But what about problem 1? I don't need to say much, as the graph of the sigmoid speaks for itself: near the threshold of 0.5 it shows sudden rises and falls, while near the extremes it is nearly constant. But I will prove it.
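We can also see this numerically: moving the score from 0 to 1 shifts the probability far more than moving it from 9 to 10 (a quick sketch using the standard sigmoid):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

near_threshold = sigmoid(1) - sigmoid(0)   # ~0.231    -> a big jump
near_extreme = sigmoid(10) - sigmoid(9)    # ~0.000078 -> barely moves
print(near_threshold, near_extreme)
```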
To prove this we need to find the derivative of the sigmoid, as it tells us the rate of change of the probability.
Derivative of the sigmoid function: σ'(x) = σ(x) · (1 − σ(x))
Writing the probability as p = σ(x), the derivative is simply p · (1 − p). If the probability is 1, the change in probability is 1 · 0 = 0.
Similarly, if the probability is 0, the change in probability is again 0 · 1 = 0.
This shows that at the extremes the change in probability is zero or nearly zero. But what about the threshold? When the probability is 0.5, we get the maximum change: 0.5 · 0.5 = 0.25.
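The identity σ'(x) = σ(x) · (1 − σ(x)) can be checked directly in code:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_derivative(x):
    # sigma'(x) = sigma(x) * (1 - sigma(x)) = p * (1 - p)
    p = sigmoid(x)
    return p * (1 - p)

print(sigmoid_derivative(0))    # 0.25 -> maximum change, exactly at p = 0.5
print(sigmoid_derivative(10))   # ~0.0000454 -> almost no change at the extremes
print(sigmoid_derivative(-10))  # ~0.0000454
```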
So that's how the sigmoid function squeezes any real number into the range 0 to 1, giving us well-behaved probabilities.