CYBER SECURITY AND CONFUSION FUNCTION.
“You are an essential ingredient in our ongoing effort to reduce Security Risk.” ― Kirsten Manthorne

confusion matrix

Once the data has been cleaned, pre-processed and wrangled, the next step is to feed it to a model and get its output as probabilities. But how can we measure the effectiveness of that model? The better the effectiveness, the better the performance, and that is what we want. This is where the confusion matrix comes into the limelight: it is a performance measurement for machine learning classification.

A confusion matrix is a table used to determine the performance of a classification model. We compare the predicted values for the test data with the true values known to us. This tells us how many cases are classified correctly and how many are classified incorrectly. The table below shows the structure of a confusion matrix.


[Figure: confusion matrix]

Let’s understand the terms used here:

  • In a two-class problem such as attack detection, we label the normal event as “positive” and the anomaly as “negative”.
  • “True Positive” for correctly predicted event values.
  • “False Positive” for incorrectly predicted event values.
  • “True Negative” for correctly predicted no-event values.
  • “False Negative” for incorrectly predicted no-event values.
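
As a minimal sketch of how the four terms above are counted, the snippet below compares true labels with predicted labels. The function name `count_cells` and the six sample labels are hypothetical, chosen only for illustration; "normal" is treated as the positive class, following the convention above.

```python
def count_cells(actual, predicted, positive="normal"):
    """Count TP, FP, TN and FN for a two-class problem.

    `positive` names the class treated as the event ("normal" here,
    matching the convention used in this article).
    """
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if a != positive:
                tn += 1  # predicted negative, actually negative
            else:
                fn += 1  # predicted negative, actually positive
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


# Hypothetical labels, for illustration only.
actual = ["normal", "normal", "anomaly", "anomaly", "normal", "anomaly"]
predicted = ["normal", "anomaly", "anomaly", "normal", "normal", "anomaly"]

cells = count_cells(actual, predicted)
print(cells)
```

Every sample lands in exactly one of the four cells, so the counts always sum to the number of samples.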

A confusion matrix captures two types of error: Type I and Type II.

Now let's look at these terms and their significance in the context of cyber attack prediction.

An IDS, or Intrusion Detection System, checks for malicious activity on a system. It monitors the packets arriving over the internet and uses an ML model to predict whether each one is normal or an anomaly.

Let's say our model produced the following confusion matrix for a total of 165 packets it examined.

[Figure: confusion matrix for the 165 packets, with TP = 100, FP = 10, FN = 5, TN = 50]

The model in our IDS analyzed a total of 165 packets, which are classified in the confusion matrix above.

  • “Positive” -> The model predicted no attack.
  • “Negative” -> The model predicted an attack.
  • True Negative: Of the 55 times the model predicted an attack, 50 predictions were correct, meaning an attack actually took place 50 times. Thanks to the prediction, the Security Operations Centre (SOC) receives a notification and can prevent the attack.
  • False Negative: Of the 55 times the model predicted an attack, 5 times no attack happened. These are “False Alarms”, also called Type II errors.
  • True Positive: The model predicted 110 times that no attack would take place, and 100 of those times no attack actually happened. These are correct predictions.
  • False Positive: 10 times an attack actually took place when the model had predicted no attack. This is also called a Type I error.
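
The counts in the list above can be laid out and cross-checked with a short sketch like the one below. The variable names and the table layout are assumptions made for illustration; the four counts come from the example, and "positive" keeps this article's meaning of "no attack predicted".

```python
# Counts from the worked example ("positive" = model predicted no attack).
tp, fp, fn, tn = 100, 10, 5, 50

# Marginal totals implied by the four cells.
predicted_no_attack = tp + fp  # all "positive" predictions
predicted_attack = tn + fn     # all "negative" predictions
actual_no_attack = tp + fn     # packets that really were normal
actual_attack = tn + fp        # packets that really were attacks
total = tp + fp + fn + tn

rows = [
    ("", "Predicted: no attack", "Predicted: attack", "Total"),
    ("Actual: no attack", f"TP = {tp}", f"FN = {fn}", actual_no_attack),
    ("Actual: attack", f"FP = {fp}", f"TN = {tn}", actual_attack),
    ("Total", predicted_no_attack, predicted_attack, total),
]

# Print the matrix as fixed-width columns.
for row in rows:
    print("".join(str(cell).ljust(22) for cell in row))
```

The column totals reproduce the 110 "no attack" and 55 "attack" predictions quoted in the list, and the grand total is the 165 packets examined.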

Type I error (False Positive)

This type of error can prove very dangerous. The system predicted no attack, but an attack actually took place. In that case no notification reaches the security team and nothing can be done to prevent the attack. The False Positive cases above fall into this category, so one of the aims of the model is to minimize this value.

Type II error (False Negative, False Alarm)

This type of error is not very dangerous: the system is safe in reality, but the model predicted an attack. The team gets notified and checks for malicious activity, which causes no harm. These cases can be termed False Alarms.
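
One way to make this asymmetry between the two error types concrete is to attach a cost to each. The sketch below is only an illustration: the cost values, the function name `total_error_cost` and the second model's counts are all hypothetical, and only the counts of 10 missed attacks and 5 false alarms come from the example above.

```python
# Hypothetical relative costs: a missed attack (Type I error in this
# article's convention) is assumed to be far more expensive than a
# false alarm (Type II error).
COST_MISSED_ATTACK = 100.0
COST_FALSE_ALARM = 1.0


def total_error_cost(missed_attacks, false_alarms,
                     cost_missed=COST_MISSED_ATTACK,
                     cost_false_alarm=COST_FALSE_ALARM):
    """Weighted sum of the two error counts."""
    return missed_attacks * cost_missed + false_alarms * cost_false_alarm


# Model from the example: 10 missed attacks, 5 false alarms.
cost_example = total_error_cost(missed_attacks=10, false_alarms=5)

# Hypothetical alternative model: fewer missed attacks, more false alarms.
cost_alternative = total_error_cost(missed_attacks=2, false_alarms=30)

print(f"example model cost:     {cost_example}")
print(f"alternative model cost: {cost_alternative}")
```

Under these assumed costs the alternative model is preferable even though it makes more errors in total, which is the sense in which Type I errors matter more here.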


Precision or recall: which one to use and where?

This is the most common question that arises while modeling data, and the answer depends on the problem domain. Consider these two cases:

1. Suppose you are predicting whether a person will suffer a cardiac arrest. In this scenario a misclassification can be very costly. The cost of a False Negative is especially high: a person who was at risk of an attack is predicted to be safe. These cases must be avoided, so we need a model with high recall.

2. Suppose a search engine returned random results that the model had all predicted as positive. There is then very little chance that users would rely on it. In this scenario we need a model with high precision, so that the user experience improves and the website grows in the right direction.
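
Because the model outputs probabilities, the balance between precision and recall can be shifted by moving the decision threshold. The sketch below illustrates this under stated assumptions: the scores and labels are hypothetical, the helper `precision_recall` is not from any library, and label 1 stands for the positive class.

```python
def precision_recall(scores, labels, threshold):
    """Return (precision, recall) when scores >= threshold count as positive."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical model scores and true labels, for illustration only.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

for threshold in (0.85, 0.5, 0.25):
    p, r = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data a high threshold favours precision (the search engine case), while a low threshold favours recall (the cardiac arrest case).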


CYBER CRIME

Cybercrime is criminal activity that either targets or uses a computer, a computer network or a networked device.

Most, but not all, cybercrime is committed by cybercriminals or hackers who want to make money. Cybercrime is carried out by individuals or organizations.

Some cybercriminals are organized, use advanced techniques and are highly technically skilled. Others are novice hackers.

In the present world, cybercrime offenses are happening at an alarming rate. As the use of the Internet increases, many offenders use it as a means of communication in order to commit crimes. According to a 2020 Cybersecurity Ventures report, cybercrime will cost nearly $6 trillion per annum by 2021. Cybercriminals use networked computing devices as their primary means of reaching victims' devices, and they profit in terms of money, publicity and more by exploiting vulnerabilities in those systems. Cybercrime continues to increase steadily.

Security analytics, combined with data analytics approaches, helps us analyze and classify offenses from India-based integrated data, which may be structured or unstructured. The main strength of this work is its test analysis reports, which classify the offenses with 99 percent accuracy.

This is a list of rates that are often computed from a confusion matrix for a binary classifier:

  • Accuracy: Overall, how often is the classifier correct?
    (TP+TN)/total = (100+50)/165 = 0.91
  • Misclassification Rate: Overall, how often is it wrong?
    (FP+FN)/total = (10+5)/165 = 0.09
    equivalent to 1 minus Accuracy
    also known as "Error Rate"
  • True Positive Rate: When it's actually yes, how often does it predict yes?
    TP/actual yes = 100/105 = 0.95
    also known as "Sensitivity" or "Recall"
  • False Positive Rate: When it's actually no, how often does it predict yes?
    FP/actual no = 10/60 = 0.17
  • True Negative Rate: When it's actually no, how often does it predict no?
    TN/actual no = 50/60 = 0.83
    equivalent to 1 minus False Positive Rate
    also known as "Specificity"
  • Precision: When it predicts yes, how often is it correct?
    TP/predicted yes = 100/110 = 0.91
  • Prevalence: How often does the yes condition actually occur in our sample?
    actual yes/total = 105/165 = 0.64
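
The rates listed above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `rates` and the dictionary keys are hypothetical, while the formulas and the four counts are the ones given in the list.

```python
def rates(tp, fp, fn, tn):
    """Compute the common rates of a binary confusion matrix."""
    total = tp + fp + fn + tn
    actual_yes = tp + fn      # cases that are really "yes"
    actual_no = fp + tn       # cases that are really "no"
    predicted_yes = tp + fp   # cases the model labelled "yes"
    return {
        "accuracy": (tp + tn) / total,
        "misclassification_rate": (fp + fn) / total,
        "true_positive_rate": tp / actual_yes,   # sensitivity / recall
        "false_positive_rate": fp / actual_no,
        "true_negative_rate": tn / actual_no,    # specificity
        "precision": tp / predicted_yes,
        "prevalence": actual_yes / total,
    }


# Counts from the 165-packet example.
m = rates(tp=100, fp=10, fn=5, tn=50)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, the values match the figures in the list.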


Thanks for sticking around and reading the article.

Hope to see you all in my next article.

