ROC and AUC. Image credit: Revolution Analytics

I can barely 'RECALL' with enough 'PRECISION' and little 'SPECIFICITY' what is 'SENSITIVITY'!

I find it difficult (and a little unfair) to be asked to remember jargon; anything that forces me to memorize generally fails me. To avoid ambiguity and overcome the challenge, I try, as much as possible, to arrive at concepts logically. Most of the time these terms carry very simple ideas and are merely christened for easy reference. One such case, I feel, is the set of terms Recall, Precision, Specificity and Sensitivity.

My goal in this article is to explain, from my own way of looking at the problem, how to reason about:

  • Binary classification model performance
  • Arriving at an optimal threshold value for classification
  • Thereby understanding the ROC (Receiver Operating Characteristic) curve and arriving at the AUC (Area Under the Curve)
  • The article will not cover imbalanced classes and similar concerns, but once those are corrected for, the steps below apply equally well

At the end of the article, I will share the link to a notebook where you can play around with different parameters and see for yourself how the whole thing falls into place.

First Gear: Run the model and get predicted probabilities

A sample of how the actual vs predicted data frame might look

Our starting point for this exercise is the stage where you have already run your first classification model on your training set and obtained predicted probabilities for your test data. Consider the adjacent image.
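
Since I cannot reproduce the notebook here, the sketch below is a hypothetical stand-in: it fits a plain logistic regression on a synthetic scikit-learn dataset purely to produce an "actual vs predicted probability" data frame like the one in the image. The dataset, model and variable names are my own assumptions, not the original notebook.

```python
# A minimal, hypothetical sketch (not the original notebook): fit a simple
# classifier on a synthetic dataset and collect predicted probabilities
# for the held-out test set.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Probability of the positive class for every test example
probs = model.predict_proba(X_test)[:, 1]

# The "actual vs predicted" data frame referred to in the adjacent image
df = pd.DataFrame({"actual": y_test, "predicted_probability": probs})
print(df.head())
```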

Now that we have the probabilities, let us get a first idea of how well the model classifies the classes. One popular tool for this is the confusion matrix. It works 'given a chosen threshold': for example, if the predicted probability is > 0.5 the example is classified as one, else as zero. A very clean concept to understand. The outcome of a confusion matrix, given a chosen threshold, is four counts: the ones actually predicted as ones, the zeros actually predicted as zeros (the larger the proportion of observations in these two buckets, the better the model is at classifying), the ones predicted as zeros, and the zeros predicted as ones (together, the misclassified cases). But the question is: how do we arrive at the optimal threshold?
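
To make the 'given a chosen threshold' idea concrete, here is a small sketch, continuing from the hypothetical data frame above, that builds the four confusion-matrix counts by hand; the 0.5 threshold is just the example value from the text.

```python
import numpy as np

threshold = 0.5  # the chosen threshold from the example above

# Classify as 1 when the predicted probability exceeds the threshold, else 0
actual = df["actual"].to_numpy()
pred = (df["predicted_probability"].to_numpy() > threshold).astype(int)

tp = np.sum((actual == 1) & (pred == 1))  # ones predicted as ones
tn = np.sum((actual == 0) & (pred == 0))  # zeros predicted as zeros
fn = np.sum((actual == 1) & (pred == 0))  # ones predicted as zeros
fp = np.sum((actual == 0) & (pred == 1))  # zeros predicted as ones

print(f"TP={tp}  FP={fp}")
print(f"FN={fn}  TN={tn}")
```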

Second Gear: Plot some histograms

[Figure: overlapping histograms of predicted probabilities for the two actual classes]

Let's plot a simple histogram as shown in the adjoining image: the blue curve is the histogram of predicted probabilities for the actual negative class (tagged as 0), while the red curve is the density for the actual positive class (tagged as 1). The first inference: the curves look well separated with only marginal overlap, i.e. the predicted probabilities of the two classes do not majorly overlap, so choosing the right threshold can yield a good confusion matrix. (We will see shortly how the curves look for poorer models.)
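
If you want to reproduce such a plot, a minimal matplotlib sketch along these lines should do it, again assuming the hypothetical `df` from the earlier sketch; bin counts and colours are arbitrary choices.

```python
import matplotlib.pyplot as plt

# Distributions of predicted probabilities, split by the actual class
neg = df.loc[df["actual"] == 0, "predicted_probability"]  # actual negatives
pos = df.loc[df["actual"] == 1, "predicted_probability"]  # actual positives

plt.hist(neg, bins=30, alpha=0.5, color="blue", label="actual class 0")
plt.hist(pos, bins=30, alpha=0.5, color="red", label="actual class 1")
plt.xlabel("predicted probability")
plt.ylabel("count")
plt.legend()
plt.show()
```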

Third Gear: Moving around to choose an optimal threshold

[Figure: confusion matrix showing True Positive, False Positive, False Negative and True Negative cells]

In the last section we talked about choosing the right threshold so that we get a good confusion matrix. One meaning of 'good' could be the threshold with the lowest misclassification rate, though depending on the domain you may choose to trade off false positives against false negatives. Note: 'positive' and 'negative' are just nomenclature; you can pick either of your classes as the positive class and the other as negative. For example, in a classification between two iris species, say setosa and versicolor, you may assign setosa as the positive class and versicolor as negative, or vice versa.

The confusion matrix shown above helps us understand the concepts of True Positive, False Positive, False Negative and True Negative. Let's add a few ratios to the mix: the True Positive Rate, i.e. how many actual positive examples were correctly predicted as positive, or True Positives / (Actual Positives = True Positives + False Negatives). Similarly, the False Positive Rate, i.e. how many actual negative examples were classified as positive, or False Positives / (Actual Negatives = False Positives + True Negatives). Finally, the Misclassification Rate is the share of all observations that were incorrectly classified, i.e. (False Positives + False Negatives) / (Number of examples). I mention only these specific metrics because they will help us chart two more graphs, which we will come to shortly.
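
Expressed as code, the three ratios are simply the definitions above written out, using the hypothetical counts from the earlier confusion-matrix sketch.

```python
def rates(tp, fp, fn, tn):
    """Derive the three ratios defined above from confusion-matrix counts."""
    tpr = tp / (tp + fn)                       # True Positive Rate
    fpr = fp / (fp + tn)                       # False Positive Rate
    misclassification = (fp + fn) / (tp + fp + fn + tn)
    return tpr, fpr, misclassification

tpr, fpr, mis = rates(tp, fp, fn, tn)
print(f"TPR={tpr:.3f}  FPR={fpr:.3f}  misclassification rate={mis:.3f}")
```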

[Figure: the histogram overlap plot with three candidate threshold lines drawn in red]

Let us revisit the histogram overlap graph. We were talking about obtaining the optimal threshold value, and this is our starting point for arriving at one. Each red line is the threshold selected at that instance; three such instances are shown above. Any example to the right of the threshold is classified as the positive class, otherwise as negative.

We start with a threshold of 0 and evaluate the confusion matrix. The red line is at the extreme left, so the entire blue curve (negative class) lies to the right of the threshold and all actual negative examples are classified as positive, i.e. a 100% False Positive Rate. Similarly, the whole red curve (positive class) lies to the right, so all actual positives are classified as positive, a 100% True Positive Rate. This continues until the threshold line enters the blue curve. For simplicity, let's break the problem into the different zones the threshold line can fall in.

Zone 1: Until the threshold touches the blue curve - both the False Positive Rate and the True Positive Rate are 100%, as explained above.

Zone 2: While the threshold is inside the blue curve but before the red curve - some actual negative examples now fall to the left of the threshold, so misclassification of the negative class is reducing and the False Positive Rate starts to drop from 100%; but because the red curve (positive class) is still entirely to the right, the True Positive Rate remains at 100%.

Zone 3: When the threshold is in the overlapping region - some of the positive class now lies on the left-hand side, causing misclassification of the positive class. From the blue curve's perspective we are classifying more and more negatives as negative, so the False Positive Rate drops steeply, while more of the red curve falls to the left, pulling the True Positive Rate down from 100%.

Zone 4: While the threshold is inside the red curve - at this stage all negative class examples have been classified as negative, so the False Positive Rate has touched 0%, while as the threshold moves towards the right tail of the red curve it misclassifies more and more of the positive class as negative, causing a steep drop of the True Positive Rate towards 0%.

Zone 5: When the threshold is beyond the red curve - all negative class (blue curve) examples are classified as negative, so the False Positive Rate is at 0%, while all positive class (red curve) examples are misclassified as negative, i.e. the True Positive Rate is also at 0%.

The above walkthrough is explained with respect to the example displayed in the image; how the results behave in other scenarios is shown further below, which should give you complete clarity on how this plays out. When all the zones above are put together in a graph, they trace out the ROC, or Receiver Operating Characteristic curve. By looking at the ROC curve and calculating the area below it, you can establish how well your model separates the classes. We will explain how to arrive at that area soon, but first let's look at some example ROC and misclassification curves for various thresholds.
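
Here is a sketch of that threshold sweep, again on the hypothetical data frame from earlier: for each candidate threshold we compute the TPR, FPR and misclassification rate, then plot TPR against FPR to trace the ROC curve.

```python
import numpy as np
import matplotlib.pyplot as plt

actual = df["actual"].to_numpy()
probs = df["predicted_probability"].to_numpy()

thresholds = np.linspace(0.0, 1.0, 101)
tprs, fprs, mis = [], [], []

for t in thresholds:
    pred = (probs > t).astype(int)
    tp = np.sum((actual == 1) & (pred == 1))
    fp = np.sum((actual == 0) & (pred == 1))
    fn = np.sum((actual == 1) & (pred == 0))
    tn = np.sum((actual == 0) & (pred == 0))
    tprs.append(tp / (tp + fn))
    fprs.append(fp / (fp + tn))
    mis.append((fp + fn) / len(actual))

# ROC: True Positive Rate vs False Positive Rate as the threshold sweeps 0 -> 1
plt.plot(fprs, tprs, marker=".")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC curve")
plt.show()

# The threshold with the lowest misclassification rate
print("optimal threshold ~", thresholds[int(np.argmin(mis))])
```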

[Figure: ROC curve and misclassification rate plotted against the threshold]

For the example above, the True Positive Rate and False Positive Rate have been plotted for various thresholds. Because of the small overlap between the blue and red curves, the ROC does not quite touch the coordinate (0, 1) but just brushes past it. Similarly, the misclassification rate is at its minimum when the threshold is at ~0.57. These curves behave very differently as the area of overlap varies, which we will see very soon below.

Fourth Gear: Measuring AUC

Area under the ROC curve. Image credit: Revolution Analytics

As a final exercise before we link everything together and demonstrate the different scenarios, let me briefly touch on finding the area under the ROC curve and how it reflects model performance.

To arrive at the area under the ROC curve, break the graph into a series of rectangles, as shown by the green shaded zone. The first rectangle spans x from 0.0 to 0.1 with a height of 0.4, i.e. width 0.1 and height 0.4, so its area is 0.04. Similarly, for the next rectangle 0.1 times 0.6 is 0.06, and you keep adding areas until the last slice, 0.1 * 1.0 = 0.1. For the blue zone, simply take half the area of the blue rectangle, i.e. 0.5 * 0.1 * 0.1 = 0.005. For the adjoining graph the AUC, or area under the curve, is therefore:

0.1*0.4 + 0.1*0.6 + 0.1*0.6 + 0.5*0.1*0.1 + 0.1*0.8 + 0.1*0.9 + 0.1*0.9 + 0.1*1.0 + 0.1*1.0 + 0.1*1.0 + 0.1*1.0 = 0.825
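
The rectangle-plus-triangle idea is exactly the trapezoidal rule, so on the swept TPR/FPR points from the earlier sketch the area can be approximated as below; using numpy's `trapz` here is my own choice, not something from the original notebook.

```python
import numpy as np

# Sort the swept points by FPR and sum trapezoids, the same
# rectangles-plus-triangles idea as the worked example above
order = np.argsort(fprs)
auc = np.trapz(np.array(tprs)[order], np.array(fprs)[order])
print("AUC ~", round(float(auc), 3))
```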

The more the red curve shrinks towards the diagonal, the lower the classification power of the model, because at every threshold it is then gaining true positives only as fast as it is admitting false positives; hence a model with purely random classification power will have an AUC of ~0.5.
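
As a quick sanity check of that ~0.5 figure, scoring the same labels with purely random numbers should give an AUC close to 0.5; this sketch uses scikit-learn's `roc_auc_score`, and the random seed is arbitrary.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Scores drawn at random carry no information about the class,
# so the resulting AUC should hover around 0.5
rng = np.random.default_rng(0)
random_scores = rng.uniform(size=len(actual))
print("AUC with random scores:", round(roc_auc_score(actual, random_scores), 3))
```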

With this I conclude my blabbering and leave you to see, as you alter the threshold values across the various scenarios, how all of the above plays out. Thank you for bearing with me this long; enjoy the simulations. If you found this useful please do like and share, and for any constructive suggestions please feel free to comment and help me improve.

As promised: https://github.com/AnuragHalder/Classification_Performance_Threshold

Top Gear: Scenarios

Scenario 1: Marginal Overlap; AUC ~ 0.9998; Threshold - 0.58

Scenario 2: No Overlap; AUC ~ 1; Threshold between 0.48 and 0.55

Scenario 3: High Overlap; AUC ~ 0.75; Threshold - 0.52

Scenario 4: Complete Overlap; AUC ~ 0.5; Threshold - 0.56



