Understanding Confusion Matrix

Hello everyone!

Today I am here with an interesting article about the confusion matrix. In this article we will look at what a confusion matrix is, why we need it, the metrics derived from it, and lots of other interesting things.

So let's jump into today's interesting topic!

What is Confusion Matrix?

Confusion matrix is an N x N matrix used for evaluating the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with those predicted by the machine learning model.


It is useful for visualizing and computing important performance metrics like recall, specificity, accuracy, and precision.

Let's start with an example confusion matrix for a binary classifier (though it can easily be extended to the case of more than two classes):

n = 165        Predicted: NO    Predicted: YES
Actual: NO          50               10            60
Actual: YES          5              100           105
                    55              110           165

What can we learn from this matrix?

  • There are two possible predicted classes: "yes" and "no". If we were predicting the presence of a disease, for example, "yes" would mean they have the disease, and "no" would mean they don't have the disease.
  • The classifier made a total of 165 predictions (e.g., 165 patients were being tested for the presence of that disease).
  • Out of those 165 cases, the classifier predicted "yes" 110 times, and "no" 55 times.
  • In reality, 105 patients in the sample have the disease, and 60 patients do not.

Let's now define the most basic terms, which are whole numbers (not rates):

  • true positives (TP): These are cases in which we predicted yes (they have the disease), and they do have the disease.
  • true negatives (TN): We predicted no, and they don't have the disease.
  • false positives (FP): We predicted yes, but they don't actually have the disease. (Also known as a "Type I error.")
  • false negatives (FN): We predicted no, but they actually do have the disease. (Also known as a "Type II error.")
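The four counts above can be tallied directly from a list of actual and predicted labels. Here is a minimal sketch in Python; the tiny label lists are made up for illustration and are not the article's 165-patient dataset:

```python
# Tally the four confusion-matrix cells ("yes" = positive class).
def tally(actual, predicted):
    tp = sum(a == "yes" and p == "yes" for a, p in zip(actual, predicted))
    tn = sum(a == "no" and p == "no" for a, p in zip(actual, predicted))
    fp = sum(a == "no" and p == "yes" for a, p in zip(actual, predicted))
    fn = sum(a == "yes" and p == "no" for a, p in zip(actual, predicted))
    return tp, tn, fp, fn

# Toy example: six patients (illustrative data only).
actual    = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no",  "no", "yes", "yes", "no"]

print(tally(actual, predicted))  # → (2, 2, 1, 1)
```

Note that TP + TN + FP + FN always equals the total number of predictions, since every prediction falls into exactly one cell.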

Confusion matrices involve two types of errors: Type I and Type II. There are two handy ways to remember which is which.

The first way is to rewrite False Positive and False Negative. False Positive is a Type I error because False Positive = "False True", which contains only one F. False Negative is a Type II error because False Negative = "False False", which contains two F's, making it Type II. (Kudos to Riley Dallas for this method!)

The second way is to consider the meanings of the words. False Positive contains one negative word (False), so it's a Type I error. False Negative contains two negative words (False + Negative), so it's a Type II error.

Confusion Metrics:

From our confusion matrix, we can calculate five different metrics measuring the validity of our model.

  1. Accuracy (all correct / all) = (TP + TN) / (TP + TN + FP + FN)
  2. Misclassification or Error (all incorrect / all) = (FP + FN) / (TP + TN + FP + FN)
  3. Precision (true positives / predicted positives) = TP / (TP + FP)
  4. Sensitivity or Recall (true positives / all actual positives) = TP / (TP + FN)
  5. Specificity (true negatives / all actual negatives) = TN / (TN + FP)
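The five formulas above can be checked in a few lines of Python, plugging in the counts from the worked example used throughout this article (TP = 100, TN = 50, FP = 10, FN = 5):

```python
tp, tn, fp, fn = 100, 50, 10, 5  # counts from the example matrix
total = tp + tn + fp + fn        # 165 predictions

accuracy    = (tp + tn) / total  # all correct / all
error       = (fp + fn) / total  # all incorrect / all
precision   = tp / (tp + fp)     # true positives / predicted positives
recall      = tp / (tp + fn)     # true positives / all actual positives
specificity = tn / (tn + fp)     # true negatives / all actual negatives

print(round(accuracy, 2), round(error, 2), round(precision, 2),
      round(recall, 2), round(specificity, 2))  # → 0.91 0.09 0.91 0.95 0.83
```

Notice that Accuracy and Error always sum to 1, since every prediction is either correct or incorrect.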

Classification Measures

Basically, these are an extended view of the confusion matrix: measures derived from it that help us achieve a better understanding and analysis of our model and its performance.

1. Accuracy

2. Precision

3. Recall (TPR, Sensitivity)

4. F1-Score

5. FPR (Type I Error)

6. FNR (Type II Error)

1. Accuracy:

Accuracy simply measures how often the classifier makes the correct prediction. It's the ratio between the number of correct predictions and the total number of predictions. The accuracy metric is not suited for imbalanced classes.

Accuracy = (TP + TN) / (TP + TN + FP + FN)
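To see why accuracy misleads on imbalanced classes, consider a toy dataset (made up for illustration) where only 5 of 100 patients have the disease. A useless model that always predicts "no" still scores 95% accuracy while catching zero actual cases:

```python
actual    = ["yes"] * 5 + ["no"] * 95   # 5 positives, 95 negatives (illustrative)
predicted = ["no"] * 100                # a "model" that always predicts "no"

correct  = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

tp     = sum(a == "yes" and p == "yes" for a, p in zip(actual, predicted))
recall = tp / 5  # out of 5 actual positives, how many did we catch?

print(accuracy, recall)  # → 0.95 0.0
```

This is exactly the situation where recall (next sections) exposes what accuracy hides.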

2. Precision:

Precision is defined as the ratio of the total number of correctly classified positive classes divided by the total number of predicted positive classes.

“Precision is a useful metric in cases where False Positives are a higher concern than False Negatives.”


Precision = TP / (TP + FP)

3. Recall:

Recall is defined as the ratio of the total number of correctly classified positive classes divided by the total number of actual positive classes. In other words: out of all the actual positive classes, how many did we predict correctly? Recall should be high (ideally 1).

“Recall is a useful metric in cases where False Negatives are a higher concern than False Positives.”

Recall = TP / (TP + FN)

4. F-measure / F1-Score

There will be cases where there is no clear distinction between whether Precision or Recall is more important, so we combine them. In practice, when we try to increase the precision of our model, the recall goes down, and vice versa. The F1-score captures both trends in a single value.

F1-Score = 2 × (Precision × Recall) / (Precision + Recall)
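A quick check in Python, using the precision (100/110) and recall (100/105) from the running example; the F1-score is the harmonic mean of the two:

```python
precision = 100 / 110   # TP / (TP + FP) from the example matrix
recall    = 100 / 105   # TP / (TP + FN)

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(round(f1, 2))  # → 0.93
```

Because it is a harmonic mean, F1 can only be high when both precision and recall are high; a single low value drags it down.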

5. Sensitivity & Specificity

Sensitivity = TP / (TP + FN)        Specificity = TN / (TN + FP)

CYBER CRIME CASES AND CONFUSION MATRIX:


Cybercrime is criminal activity that either targets or uses a computer, a computer network or a networked device.

Most, but not all, cybercrime is committed by cybercriminals or hackers who want to make money. Cybercrime is carried out by individuals or organizations.

Some cybercriminals are organized, use advanced techniques and are highly technically skilled. Others are novice hackers.

In the present world, cybercrime offenses are happening at an alarming rate. As use of the Internet increases, many offenders use it as a means of communication in order to commit crimes. According to a 2020 Cybersecurity Ventures report, cybercrime will cost nearly $6 trillion per annum by 2021. For illegal activities, cybercriminals use network computing devices as the primary means of communication with victims' devices, profiting financially, in publicity, and in other ways by exploiting vulnerabilities in systems. Cybercrimes are steadily increasing daily.

Security analytics, combined with data analytics approaches, helps us analyze and classify offenses from India-based integrated data that may be either structured or unstructured. The main strength of this work is the test analysis reports, which classify the offenses with 99 percent accuracy.

This is a list of rates that are often computed from a confusion matrix for a binary classifier:

  • Accuracy: Overall, how often is the classifier correct?
      (TP + TN) / total = (100 + 50) / 165 = 0.91
  • Misclassification Rate: Overall, how often is it wrong?
      (FP + FN) / total = (10 + 5) / 165 = 0.09
      equivalent to 1 minus Accuracy; also known as "Error Rate"
  • True Positive Rate: When it's actually yes, how often does it predict yes?
      TP / actual yes = 100 / 105 = 0.95
      also known as "Sensitivity" or "Recall"
  • False Positive Rate: When it's actually no, how often does it predict yes?
      FP / actual no = 10 / 60 = 0.17
  • True Negative Rate: When it's actually no, how often does it predict no?
      TN / actual no = 50 / 60 = 0.83
      equivalent to 1 minus False Positive Rate; also known as "Specificity"
  • Precision: When it predicts yes, how often is it correct?
      TP / predicted yes = 100 / 110 = 0.91
  • Prevalence: How often does the yes condition actually occur in our sample?
      actual yes / total = 105 / 165 = 0.64
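All of the worked numbers above can be reproduced from just the four cell counts; here is a minimal sanity-check script:

```python
tp, tn, fp, fn = 100, 50, 10, 5
total      = tp + tn + fp + fn          # 165 predictions
actual_yes = tp + fn                    # 105 patients with the disease
actual_no  = tn + fp                    # 60 patients without it

rates = {
    "accuracy":          (tp + tn) / total,
    "misclassification": (fp + fn) / total,
    "tpr (recall)":      tp / actual_yes,
    "fpr":               fp / actual_no,
    "tnr (specificity)": tn / actual_no,
    "precision":         tp / (tp + fp),
    "prevalence":        actual_yes / total,
}
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

Running this prints each rate rounded to two decimals, matching the list above.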

Thanks for reading. See you soon with a new article...

THANK YOU!!










