Understanding Confusion Matrix
KARTIK LOKARE
Today I am here with an interesting article on the Confusion Matrix. In this article we will look at what a confusion matrix is, why we need it, the metrics built on top of it, and lots of other interesting things…
So let's jump into today's interesting topic, guys…
What is Confusion Matrix?
A Confusion matrix is an N x N matrix used for evaluating the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with those predicted by the machine learning model.
It is useful to visualize important predictive analytics like recall, specificity, accuracy, and precision.
Let's start with an example confusion matrix for a binary classifier (though it can easily be extended to the case of more than two classes):
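Here is the example matrix, reconstructed from the counts used throughout this article (rows are the actual classes, columns are the predicted classes):

```
                Predicted: NO    Predicted: YES    Total
Actual: NO      TN = 50          FP = 10             60
Actual: YES     FN = 5           TP = 100           105
Total                55              110            165
```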
What can we learn from this matrix?
- There are two possible predicted classes: "yes" and "no". If we were predicting the presence of a disease, for example, "yes" would mean they have the disease, and "no" would mean they don't have the disease.
- The classifier made a total of 165 predictions (e.g., 165 patients were being tested for the presence of that disease).
- Out of those 165 cases, the classifier predicted "yes" 110 times, and "no" 55 times.
- In reality, 105 patients in the sample have the disease, and 60 patients do not.
Let's now define the most basic terms, which are whole numbers (not rates):
- true positives (TP): These are cases in which we predicted yes (they have the disease), and they do have the disease.
- true negatives (TN): We predicted no, and they don't have the disease.
- false positives (FP): We predicted yes, but they don't actually have the disease. (Also known as a "Type I error.")
- false negatives (FN): We predicted no, but they actually do have the disease. (Also known as a "Type II error.")
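As a minimal sketch of how these four counts are obtained in practice (assuming scikit-learn is available; the label lists below are toy values made up for illustration, with 1 = "yes" and 0 = "no"):

```python
# Counting TP, TN, FP, FN for a binary classifier.
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]   # actual classes
y_pred = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]   # classes the model predicted

# For binary 0/1 labels, confusion_matrix returns [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")   # TP=4, TN=4, FP=1, FN=1
```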
Confusion matrices involve two types of errors: Type I and Type II. There are two handy ways to remember which is which.
The first way is to re-write False Positive and False Negative. A False Positive is really a "False True", which has only one F, so it is a Type I error. A False Negative is a "False False", which has two F's, so it is a Type II error. (Kudos to Riley Dallas for this method!)
The second way is to count the negative words. "False Positive" contains one negative word (False), so it's a Type I error. "False Negative" contains two negative words (False + Negative), so it's a Type II error.
Confusion Metrics:
From our confusion matrix, we can calculate five different metrics measuring the validity of our model.
- Accuracy (all correct / all) = (TP + TN) / (TP + TN + FP + FN)
- Misclassification or Error (all incorrect / all) = (FP + FN) / (TP + TN + FP + FN)
- Precision (true positives / predicted positives) = TP / (TP + FP)
- Sensitivity or Recall (true positives / all actual positives) = TP / (TP + FN)
- Specificity (true negatives / all actual negatives) = TN / (TN + FP)
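As a rough sketch of these five formulas in plain Python, using the counts from the example matrix in this article (TP = 100, TN = 50, FP = 10, FN = 5):

```python
# Plain-Python versions of the five formulas above.
# The counts are the ones from this article's example matrix.
TP, TN, FP, FN = 100, 50, 10, 5
total = TP + TN + FP + FN

accuracy          = (TP + TN) / total   # ~0.91
misclassification = (FP + FN) / total   # ~0.09
precision         = TP / (TP + FP)      # ~0.91
recall            = TP / (TP + FN)      # ~0.95  (sensitivity)
specificity       = TN / (TN + FP)      # ~0.83

print(accuracy, misclassification, precision, recall, specificity)
```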
Classification Measure
Basically, these measures are an extended version of the confusion matrix: beyond the raw counts in the matrix itself, they help us achieve a better understanding and analysis of our model and its performance.
1. Accuracy
2. Precision
3. Recall (TPR, Sensitivity)
4. F1-Score
5. FPR (Type I Error)
6. FNR (Type II Error)
1. Accuracy:
Accuracy simply measures how often the classifier makes the correct prediction. It’s the ratio between the number of correct predictions and the total number of predictions. The accuracy metric is not suited for unbalanced classes.
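To see why accuracy alone can mislead on unbalanced classes, here is a small illustration (the numbers are made up): a "model" that always predicts "no" on a sample where only 5 out of 100 cases are "yes" still scores 95% accuracy, even though its recall is 0.

```python
# Why accuracy misleads on unbalanced classes (illustrative numbers).
# 95 negatives, 5 positives; the "model" simply predicts "no" for everyone.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / sum(y_true)
print(accuracy, recall)   # 0.95, 0.0 -- high accuracy, yet every positive is missed
```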
2. Precision:
Precision is defined as the ratio of the total number of correctly classified positive classes divided by the total number of predicted positive classes.
“Precision is a useful metric in cases where False Positives are a higher concern than False Negatives.”
3. Recall:
Recall is defined as the ratio of the total number of correctly classified positive classes divided by the total number of actual positive classes. In other words, out of all the positive classes, how many did we predict correctly? Recall should be high (ideally 1).
“Recall is a useful metric in cases where a False Negative trumps a False Positive.”
4. F-measure / F1-Score
There will be cases where there is no clear winner between Precision and Recall, so we combine them. In practice, when we try to increase the precision of our model, the recall goes down, and vice versa. The F1-score captures both trends in a single value.
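As a quick sketch, the F1-score is the harmonic mean of precision and recall; plugging in the precision and recall from this article's example gives roughly 0.93:

```python
# F1 is the harmonic mean of precision and recall.
# Values taken from the article's example: precision = 100/110, recall = 100/105.
precision = 100 / (100 + 10)
recall    = 100 / (100 + 5)

f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))   # ~0.93
```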
5. Sensitivity & Specificity
Sensitivity is just another name for Recall (the True Positive Rate): TP / (TP + FN) tells us how well the model catches the actual positives. Specificity is its counterpart for the negative class (the True Negative Rate): TN / (TN + FP) tells us how well the model rules out the actual negatives.
CYBER CRIME CASES AND CONFUSION MATRIX:
Cybercrime is criminal activity that either targets or uses a computer, a computer network or a networked device.
Most, but not all, cybercrime is committed by cybercriminals or hackers who want to make money. Cybercrime is carried out by individuals or organizations.
Some cybercriminals are organized, use advanced techniques and are highly technically skilled. Others are novice hackers.
In the present world, cybercrime offenses are happening at an alarming rate. As use of the Internet increases, many offenders use it as a means of communication in order to commit crimes. According to a 2020 Cybersecurity Ventures report, cybercrime was projected to cost nearly $6 trillion per annum by 2021. For illegal activities, cybercriminals use networked computing devices as their primary means of reaching a victim's devices, and by exploiting vulnerabilities in those systems attackers profit in terms of finance, publicity and more. Cybercrimes are steadily increasing daily.
Security analytics, combined with data-analytic approaches, can help analyze and classify offenses from India-based integrated data, whether structured or unstructured. The main strength of this work is its test analysis reports, which classify the offenses with 99 percent accuracy.
This is a list of rates that are often computed from a confusion matrix for a binary classifier:
- Accuracy: Overall, how often is the classifier correct?
  - (TP + TN) / total = (100 + 50) / 165 = 0.91
- Misclassification Rate: Overall, how often is it wrong?
  - (FP + FN) / total = (10 + 5) / 165 = 0.09
  - equivalent to 1 minus Accuracy
  - also known as "Error Rate"
- True Positive Rate: When it's actually yes, how often does it predict yes?
  - TP / actual yes = 100 / 105 = 0.95
  - also known as "Sensitivity" or "Recall"
- False Positive Rate: When it's actually no, how often does it predict yes?
  - FP / actual no = 10 / 60 = 0.17
- True Negative Rate: When it's actually no, how often does it predict no?
  - TN / actual no = 50 / 60 = 0.83
  - equivalent to 1 minus False Positive Rate
  - also known as "Specificity"
- Precision: When it predicts yes, how often is it correct?
  - TP / predicted yes = 100 / 110 = 0.91
- Prevalence: How often does the yes condition actually occur in our sample?
  - actual yes / total = 105 / 165 = 0.64
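As a sanity check, here is a sketch that rebuilds the 165-case example as label lists and lets scikit-learn report the same rates (assuming scikit-learn is available; the ordering of the labels is arbitrary, only the counts matter):

```python
# Rebuild the 165-case example: 100 TP, 5 FN, 10 FP, 50 TN.
from sklearn.metrics import classification_report, confusion_matrix

y_true = [1] * 105 + [0] * 60                     # 105 actual "yes", 60 actual "no"
y_pred = [1] * 100 + [0] * 5 + [1] * 10 + [0] * 50

print(confusion_matrix(y_true, y_pred))           # [[ 50  10]
                                                  #  [  5 100]]
print(classification_report(y_true, y_pred, digits=2))
# For class 1: precision ~0.91, recall ~0.95; overall accuracy ~0.91,
# matching the rates listed above.
```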
Thanks for reading. See you soon with a new article…
THANK YOU!!