Understanding Confusion Matrix

Hello everyone!

Today I am here with an interesting article about the confusion matrix. In this article we will look at what a confusion matrix is, why we need it, the metrics derived from it, and lots of other interesting things.

So let's jump into today's interesting topic!

What is Confusion Matrix?

Confusion matrix is an N x N matrix used for evaluating the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with those predicted by the machine learning model.


It is useful for visualizing and computing important performance metrics like recall, specificity, accuracy, and precision.

Let's start with an example confusion matrix for a binary classifier (though it can easily be extended to the case of more than two classes):

n = 165        Predicted: NO    Predicted: YES
Actual: NO          50               10            60
Actual: YES          5              100           105
                    55              110           165

What can we learn from this matrix?

  • There are two possible predicted classes: "yes" and "no". If we were predicting the presence of a disease, for example, "yes" would mean they have the disease, and "no" would mean they don't have the disease.
  • The classifier made a total of 165 predictions (e.g., 165 patients were being tested for the presence of that disease).
  • Out of those 165 cases, the classifier predicted "yes" 110 times, and "no" 55 times.
  • In reality, 105 patients in the sample have the disease, and 60 patients do not.

Let's now define the most basic terms, which are whole numbers (not rates):

  • true positives (TP): These are cases in which we predicted yes (they have the disease), and they do have the disease.
  • true negatives (TN): We predicted no, and they don't have the disease.
  • false positives (FP): We predicted yes, but they don't actually have the disease. (Also known as a "Type I error.")
  • false negatives (FN): We predicted no, but they actually do have the disease. (Also known as a "Type II error.")
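The four counts above can be tallied directly from a list of actual and predicted labels. Here is a minimal sketch in Python; the tiny label lists are made up for illustration and are not the article's 165-patient dataset:

```python
# Tally the four confusion-matrix cells ("yes" = positive class).
def tally(actual, predicted):
    tp = sum(a == "yes" and p == "yes" for a, p in zip(actual, predicted))
    tn = sum(a == "no" and p == "no" for a, p in zip(actual, predicted))
    fp = sum(a == "no" and p == "yes" for a, p in zip(actual, predicted))
    fn = sum(a == "yes" and p == "no" for a, p in zip(actual, predicted))
    return tp, tn, fp, fn

# Toy example: six patients (illustrative data only).
actual    = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no",  "no", "yes", "yes", "no"]

print(tally(actual, predicted))  # → (2, 2, 1, 1)
```

Note that TP + TN + FP + FN always equals the total number of predictions, since every prediction falls into exactly one cell.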

Confusion matrices involve two types of errors: Type I and Type II. There are two handy ways to remember which is which.

The first way is to rewrite False Positive and False Negative. False Positive is a Type I error because False Positive = "False True", which contains only one F. False Negative is a Type II error because False Negative = "False False", which contains two F's, making it Type II. (Kudos to Riley Dallas for this method!)

The second way is to consider the meanings of the words. False Positive contains one negative word (False), so it's a Type I error. False Negative contains two negative words (False + Negative), so it's a Type II error.

Confusion Metrics:

From our confusion matrix, we can calculate five different metrics measuring the validity of our model.

  1. Accuracy (all correct / all) = (TP + TN) / (TP + TN + FP + FN)
  2. Misclassification or Error (all incorrect / all) = (FP + FN) / (TP + TN + FP + FN)
  3. Precision (true positives / predicted positives) = TP / (TP + FP)
  4. Sensitivity or Recall (true positives / all actual positives) = TP / (TP + FN)
  5. Specificity (true negatives / all actual negatives) = TN / (TN + FP)
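The five formulas above can be checked in a few lines of Python, plugging in the counts from the worked example used throughout this article (TP = 100, TN = 50, FP = 10, FN = 5):

```python
tp, tn, fp, fn = 100, 50, 10, 5  # counts from the example matrix
total = tp + tn + fp + fn        # 165 predictions

accuracy    = (tp + tn) / total  # all correct / all
error       = (fp + fn) / total  # all incorrect / all
precision   = tp / (tp + fp)     # true positives / predicted positives
recall      = tp / (tp + fn)     # true positives / all actual positives
specificity = tn / (tn + fp)     # true negatives / all actual negatives

print(round(accuracy, 2), round(error, 2), round(precision, 2),
      round(recall, 2), round(specificity, 2))  # → 0.91 0.09 0.91 0.95 0.83
```

Notice that Accuracy and Error always sum to 1, since every prediction is either correct or incorrect.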

Classification Measures

Basically, these are an extended view of the confusion matrix: measures derived from it that help us achieve a better understanding and analysis of our model and its performance.

1. Accuracy

2. Precision

3. Recall (TPR, Sensitivity)

4. F1-Score

5. FPR (Type I Error)

6. FNR (Type II Error)

1. Accuracy:

Accuracy simply measures how often the classifier makes the correct prediction. It's the ratio between the number of correct predictions and the total number of predictions. The accuracy metric is not suited for imbalanced classes.

Accuracy = (TP + TN) / (TP + TN + FP + FN)
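To see why accuracy misleads on imbalanced classes, consider a toy dataset (made up for illustration) where only 5 of 100 patients have the disease. A useless model that always predicts "no" still scores 95% accuracy while catching zero actual cases:

```python
actual    = ["yes"] * 5 + ["no"] * 95   # 5 positives, 95 negatives (illustrative)
predicted = ["no"] * 100                # a "model" that always predicts "no"

correct  = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

tp     = sum(a == "yes" and p == "yes" for a, p in zip(actual, predicted))
recall = tp / 5  # out of 5 actual positives, how many did we catch?

print(accuracy, recall)  # → 0.95 0.0
```

This is exactly the situation where recall (next sections) exposes what accuracy hides.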

2. Precision:

Precision is defined as the ratio of the total number of correctly classified positive classes divided by the total number of predicted positive classes.

“Precision is a useful metric in cases where False Positives are a higher concern than False Negatives.”


Precision = TP / (TP + FP)

3. Recall:

Recall is defined as the ratio of the total number of correctly classified positive classes divided by the total number of actual positive classes. In other words: out of all the actual positive classes, how many did we predict correctly? Recall should be high (ideally 1).

“Recall is a useful metric in cases where False Negatives are a higher concern than False Positives.”

Recall = TP / (TP + FN)

4. F-measure / F1-Score

There will be cases where there is no clear distinction between whether Precision or Recall is more important, so we combine them. In practice, when we try to increase the precision of our model, the recall goes down, and vice versa. The F1-score captures both trends in a single value.

F1-Score = 2 × (Precision × Recall) / (Precision + Recall)
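A quick check in Python, using the precision (100/110) and recall (100/105) from the running example; the F1-score is the harmonic mean of the two:

```python
precision = 100 / 110   # TP / (TP + FP) from the example matrix
recall    = 100 / 105   # TP / (TP + FN)

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(round(f1, 2))  # → 0.93
```

Because it is a harmonic mean, F1 can only be high when both precision and recall are high; a single low value drags it down.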

5. Sensitivity & Specificity

Sensitivity = TP / (TP + FN)        Specificity = TN / (TN + FP)

CYBER CRIME CASES AND CONFUSION MATRIX:


Cybercrime is criminal activity that either targets or uses a computer, a computer network or a networked device.

Most, but not all, cybercrime is committed by cybercriminals or hackers who want to make money. Cybercrime is carried out by individuals or organizations.

Some cybercriminals are organized, use advanced techniques and are highly technically skilled. Others are novice hackers.

In the present world, cybercrime offenses are happening at an alarming rate. As use of the Internet increases, many offenders use it as a means of communication in order to commit crimes. According to a 2020 Cybersecurity Ventures report, cybercrime will cost nearly $6 trillion per annum by 2021. For illegal activities, cybercriminals use network computing devices as the primary means of communication with victims' devices, profiting financially, in publicity, and in other ways by exploiting vulnerabilities in systems. Cybercrimes are steadily increasing daily.

Security analytics, combined with data analytics approaches, helps us analyze and classify offenses from India-based integrated data that may be either structured or unstructured. The main strength of this work is the test analysis reports, which classify the offenses with 99 percent accuracy.

This is a list of rates that are often computed from a confusion matrix for a binary classifier:

  • Accuracy: Overall, how often is the classifier correct?
      (TP + TN) / total = (100 + 50) / 165 = 0.91
  • Misclassification Rate: Overall, how often is it wrong?
      (FP + FN) / total = (10 + 5) / 165 = 0.09
      equivalent to 1 minus Accuracy; also known as "Error Rate"
  • True Positive Rate: When it's actually yes, how often does it predict yes?
      TP / actual yes = 100 / 105 = 0.95
      also known as "Sensitivity" or "Recall"
  • False Positive Rate: When it's actually no, how often does it predict yes?
      FP / actual no = 10 / 60 = 0.17
  • True Negative Rate: When it's actually no, how often does it predict no?
      TN / actual no = 50 / 60 = 0.83
      equivalent to 1 minus False Positive Rate; also known as "Specificity"
  • Precision: When it predicts yes, how often is it correct?
      TP / predicted yes = 100 / 110 = 0.91
  • Prevalence: How often does the yes condition actually occur in our sample?
      actual yes / total = 105 / 165 = 0.64
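All of the worked numbers above can be reproduced from just the four cell counts; here is a minimal sanity-check script:

```python
tp, tn, fp, fn = 100, 50, 10, 5
total      = tp + tn + fp + fn          # 165 predictions
actual_yes = tp + fn                    # 105 patients with the disease
actual_no  = tn + fp                    # 60 patients without it

rates = {
    "accuracy":          (tp + tn) / total,
    "misclassification": (fp + fn) / total,
    "tpr (recall)":      tp / actual_yes,
    "fpr":               fp / actual_no,
    "tnr (specificity)": tn / actual_no,
    "precision":         tp / (tp + fp),
    "prevalence":        actual_yes / total,
}
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

Running this prints each rate rounded to two decimals, matching the list above.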

Thanks for reading. See you soon with a new article...

THANK YOU!!










