Understanding the ROC & AUC
Introduction
In any machine learning project, we need to evaluate how well the model performs. The ROC (Receiver Operating Characteristics) curve, together with the AUC (Area Under Curve), is a widely used way to visualize and summarize the performance of a classification model.
AUC/ROC is usually used for two-class problems; however, it can also be extended to multi-class problems. When making a prediction for a two-class classification problem, a classifier can produce the following outcomes:
- False Positive (FP): predict an event when there was no event.
- False Negative (FN): predict no event when in fact there was an event.
- True Positive (TP): predict an event when there was an event.
- True Negative (TN): predict no event when in fact there was no event.
The above-mentioned outcomes are usually summarized in a table called the Confusion Matrix.
For a two-class problem, the confusion matrix has two rows and two columns, one cell for each combination of predicted and actual values for the model.
True Positive (TP):
Interpretation: You predicted positive and it’s true.
You predicted that it will rain and it actually rained.
True Negative (TN):
Interpretation: You predicted negative and it’s true.
You predicted that it won’t rain and it didn’t.
False Positive (FP): (Type 1 Error)
Interpretation: You predicted positive and it’s false.
You predicted that it will rain; however, it didn’t.
False Negative (FN): (Type 2 Error)
Interpretation: You predicted negative and it’s false.
You predicted that it won’t rain, but it actually rained.
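As a quick illustration, here is a minimal sketch of building a confusion matrix with scikit-learn, using made-up rain labels (1 = rain, 0 = no rain):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical example: 1 = "rain", 0 = "no rain"
y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # what actually happened
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # what the model predicted

# For binary labels, ravel() unpacks the 2x2 matrix as TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")
```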
Accuracy of the model:
Accuracy is simply the number of correct predictions divided by the total number of predictions made, i.e. (TP + TN) / (TP + TN + FP + FN). For instance, if a classifier is 82% accurate, it means that out of 100 predictions it predicts 82 values correctly.
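Continuing with the same made-up labels, accuracy can be computed directly with scikit-learn's accuracy_score:

```python
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Accuracy = (TP + TN) / (TP + TN + FP + FN)
print(accuracy_score(y_true, y_pred))  # 0.75 -> 6 of 8 predictions correct
```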
Precision & Recall:
Precision and Recall are two metrics calculated for each of the classes that we are dealing with. Precision is the fraction of predicted positives that are actually positive, TP / (TP + FP), while Recall is the fraction of actual positives that the model correctly identifies, TP / (TP + FN).
Precision and Recall are often combined into a single metric called the F1-score, their harmonic mean: F1 = 2 × (Precision × Recall) / (Precision + Recall).
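A short sketch of computing Precision, Recall and the F1-score with scikit-learn on the same made-up labels:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Precision = TP / (TP + FP), Recall = TP / (TP + FN)
# F1 = 2 * (Precision * Recall) / (Precision + Recall)
print(precision_score(y_true, y_pred))  # 3 / (3 + 1) = 0.75
print(recall_score(y_true, y_pred))     # 3 / (3 + 1) = 0.75
print(f1_score(y_true, y_pred))         # 0.75
```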
Sensitivity & Specificity:
Sensitivity & Specificity are similar to Precision and Recall, with a subtle difference. Sensitivity is the percentage of actual positives that are correctly predicted as positive (the same as Recall), while Specificity is the percentage of actual negatives that are correctly predicted as negative.
Sensitivity and Specificity typically trade off against each other: tuning the decision threshold to increase one usually decreases the other.
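Scikit-learn does not expose specificity directly, so one way is to compute both from the confusion matrix counts (a sketch, assuming the same labels as above):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)  # true positive rate, same as Recall
specificity = tn / (tn + fp)  # true negative rate
print(sensitivity, specificity)  # 0.75 0.75
```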
Receiver Operating Characteristics (ROC) curve:
The confusion matrix discussed above gives us all of the metrics mentioned so far, namely Precision-Recall and Sensitivity-Specificity, provided the classifier outputs hard predictions as True or False.
However, in many scenarios the classifier does not output True or False directly; it outputs a probability of occurrence. In such cases, we need to choose a cut-off threshold to convert probabilities into class labels. The ROC curve is built by sweeping this threshold and plotting the false positive rate on the x-axis against the true positive rate on the y-axis at each threshold.
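A minimal sketch of drawing an ROC curve with scikit-learn and matplotlib, assuming hypothetical predicted probabilities for the positive class:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

# Hypothetical ground truth and predicted probabilities of the positive class
y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.90, 0.50]

# roc_curve sweeps the cut-off threshold and returns one (FPR, TPR) pair per threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)

plt.plot(fpr, tpr, label="classifier")
plt.plot([0, 1], [0, 1], linestyle="--", label="random guess")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
```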
Area Under Curve (AUC):
The ROC curve is a two-dimensional representation of a classifier's performance, so to compare different classifiers it is convenient to reduce it to a single number. AUC is one way of doing that: it is the area that the ROC curve captures under it. Since the ROC curve lies within the unit square, the area under it cannot be greater than 1.
So the AUC value ranges between 0 and 1. A random-guess classifier produces a diagonal ROC curve and has an AUC of 0.5, so any realistic classifier model should not have an AUC less than 0.5.
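The area itself can be computed from the (FPR, TPR) pairs returned by roc_curve, for example with sklearn.metrics.auc (a sketch using the same hypothetical scores as above):

```python
from sklearn.metrics import roc_curve, auc

y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.90, 0.50]

fpr, tpr, _ = roc_curve(y_true, y_score)
print(auc(fpr, tpr))  # area under the ROC curve, between 0 and 1
```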
ROC-AUC score:
The ROC-AUC score of any classifier model can be calculated using the sklearn.metrics library in Python. Below is a simple example:
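A sketch with roc_auc_score, assuming the classifier outputs probabilities for the positive class:

```python
from sklearn.metrics import roc_auc_score

# Hypothetical true labels and predicted probabilities of the positive class
y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.90, 0.50]

print(roc_auc_score(y_true, y_score))  # single number summarizing the ROC curve
```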
Using the AUC-ROC curve for multi-class problems:
AUC-ROC curves are usually plotted for two-class problems. However, for a multi-class problem with N classes, we can plot N AUC-ROC curves using the One-vs-Rest (One-vs-All) methodology.
Let’s consider a scenario with three different classes: A, B & C.
We will have three ROCs in this case:
- ROC for A classified against B & C
- ROC for B classified against A & C
- ROC for C classified against A & B
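A sketch of the One-vs-Rest approach for these three hypothetical classes, using label_binarize so each class can be scored against the other two:

```python
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_curve, auc

classes = ["A", "B", "C"]

# Hypothetical true labels and per-class predicted probabilities
y_true = ["A", "B", "C", "A", "C", "B"]
y_prob = [
    [0.8, 0.1, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.7],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
    [0.3, 0.5, 0.2],
]

# Binarize labels so each class is treated as "positive" against the rest
y_bin = label_binarize(y_true, classes=classes)

for i, cls in enumerate(classes):
    fpr, tpr, _ = roc_curve(y_bin[:, i], [row[i] for row in y_prob])
    print(cls, auc(fpr, tpr))  # one ROC/AUC per class (One-vs-Rest)
```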