CONFUSION MATRIX AND CYBER CRIME

A confusion matrix is a table that compares actual values with predicted values. It is used to measure the performance of a machine learning classification model.

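This is what the confusion matrix of a binary classification problem looks like (rows are actual values, columns are predicted values):

                    Predicted Positive    Predicted Negative
Actual Positive     TP                    FN
Actual Negative     FP                    TN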


Elements of Confusion Matrix

The matrix holds the four possible combinations of actual and predicted values. Let’s define them one by one.

TP: True Positive: The values which were actually positive and were predicted positive.

FP: False Positive: The values which were actually negative but falsely predicted as positive. Also known as Type I Error.

FN: False Negative: The values which were actually positive but falsely predicted as negative. Also known as Type II Error.

TN: True Negative: The values which were actually negative and were predicted negative.
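To make the four definitions concrete, here is a minimal Python sketch that counts them from two lists of labels. The function name confusion_counts and the sample labels are made up for illustration; they are not taken from any library or dataset.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, FN and TN for a binary classification problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # actually positive, predicted positive
        elif p == positive and a != positive:
            fp += 1  # actually negative, predicted positive (Type I error)
        elif p != positive and a == positive:
            fn += 1  # actually positive, predicted negative (Type II error)
        else:
            tn += 1  # actually negative, predicted negative
    return tp, fp, fn, tn


# Made-up labels purely for illustration (1 = positive, 0 = negative).
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)
print("TP:", tp, "FP:", fp, "FN:", fn, "TN:", tn)  # TP: 3 FP: 1 FN: 1 TN: 3
```

The four counts always add up to the number of records, which is a quick sanity check on any confusion matrix.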

Understanding it with the help of an example

Take the example of a Stock Market Crash prediction project. This is a binary classification problem where 1 means the stock market will crash and 0 means it will not. Suppose we have 1000 records in our dataset.

Let’s see the confusion matrix for this example:

                    Predicted Crash (1)    Predicted No Crash (0)
Actual Crash (1)    540 (TP)               110 (FN)
Actual No Crash (0) 150 (FP)               200 (TN)



From the above matrix, we can read the model's results as follows:

True Positive: 540 records that were actual market crashes were correctly predicted as crashes.

False Positive: 150 records that were not market crashes were wrongly predicted as crashes.

False Negative: 110 records that were actual market crashes were wrongly predicted as not a crash.

True Negative: 200 records that were not market crashes were correctly predicted as not a crash.

Other Evaluation Metrics associated with it

Accuracy:

It is calculated by dividing the number of correct predictions by the total number of predictions.

Accuracy = (TP + TN) / (TP + FP + FN + TN)

Recall / Sensitivity:

Recall measures how many of the actually positive outcomes were correctly predicted as positive.

Recall = TP / (TP + FN)


Precision:

Precision measures how many of the outcomes predicted as positive are actually positive.

Precision = TP / (TP + FP)


F beta score:

The F beta score is the weighted harmonic mean of Precision and Recall, so it captures the contribution of both. How much each one contributes depends on the beta value in the formula below.

F_beta = (1 + beta^2) * Precision * Recall / (beta^2 * Precision + Recall)


The default beta value is 1, which gives the F1 score, where Precision and Recall contribute equally. The higher the F1 score, the better the model.

A beta value < 1 gives more weight to Precision than to Recall, and a beta value > 1 gives more weight to Recall.

You can calculate the values of all the above-mentioned metrics using the Stock market crash example provided above.
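As a rough sketch of that calculation, the Python snippet below plugs the four counts from the stock market crash example (TP = 540, FP = 150, FN = 110, TN = 200) into the standard formulas. The helper name f_beta and the variable names are made up for illustration.

```python
# Counts from the stock market crash example.
tp, fp, fn, tn = 540, 150, 110, 200


def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


accuracy = (tp + tn) / (tp + fp + fn + tn)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
f1 = f_beta(precision, recall)                # beta = 1: equal weight
f_half = f_beta(precision, recall, beta=0.5)  # beta < 1: favors precision
f2 = f_beta(precision, recall, beta=2)        # beta > 1: favors recall

print(f"Accuracy : {accuracy:.3f}")   # 0.740
print(f"Recall   : {recall:.3f}")     # 0.831
print(f"Precision: {precision:.3f}")  # 0.783
print(f"F1       : {f1:.3f}")         # 0.806
print(f"F0.5     : {f_half:.3f}")     # 0.792
print(f"F2       : {f2:.3f}")         # 0.821
```

Because Recall is higher than Precision in this example, the score rises as beta grows: F0.5 < F1 < F2.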

When to use which metrics for evaluation

Here comes the most important part of the whole discussion: when to use which metric.

In other words, which measure should we use to evaluate our model: Accuracy, Recall, Precision, or a combination of them?

Confusing? :p

It’s not. I’ll explain it with some examples, which will make the concepts clearer. So let’s start.

Accuracy is the standard metric for evaluating a classification model.

But we cannot rely on Accuracy all the time. In some cases it gives a misleading picture of the model's quality, for example when the dataset is imbalanced.
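A small Python sketch shows why accuracy can mislead on imbalanced data. The dataset below is hypothetical (990 negatives and 10 positives), and the "model" is deliberately trivial: it always predicts the negative class.

```python
# Hypothetical imbalanced dataset: 990 negatives (0) and 10 positives (1).
actual = [0] * 990 + [1] * 10

# A trivial "model" that ignores its input and always predicts negative.
predicted = [0] * len(actual)

correct = sum(1 for a, p in zip(actual, predicted) if a == p)
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

accuracy = correct / len(actual)
recall = tp / (tp + fn)

print(f"Accuracy: {accuracy:.2f}")  # 0.99
print(f"Recall  : {recall:.2f}")    # 0.00
```

The accuracy looks excellent, yet the model never finds a single positive case, which is exactly what Recall exposes.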

Another case where Accuracy is not the right choice is a domain-specific project, or one where the company wants a particular kind of result from the model. Let’s get into more detail with some examples.

Example 1: Domain-Specific case

Taking our previous example of Stock Market Crash Prediction, our main aim should be to reduce the cases where the model predicted no market crash but the market actually crashed.

Imagine a situation where our model wrongly predicted that the market would not crash and it crashed anyway. People would suffer heavy losses in this case.

The count that captures this problem is FN, and the metric built on it is Recall. So we need to focus on reducing FN and thereby increasing Recall.

In most medical cases, such as cancer prediction or any other disease prediction, we likewise try to reduce FN.

Example 2: Spam Detection 

In the case of email spam detection, if an email is predicted as spam but is not actually spam, it can cause problems for the user, who may miss an important message.

In this case, we need to focus on reducing FP (i.e. mails falsely predicted as spam) and, as a result, increasing Precision.

In some imbalanced-data problems, both Precision and Recall are important, so we use the F1 score as the evaluation metric.

Another tool for evaluating a classification model is the ROC curve and the area under it (AUC), which is one of the most important metrics to learn. I will discuss it in another blog post.

Which one to use and where?

This is the most common question that arises while modeling data, and the answer lies in the domain of the problem statement. Consider these two cases:

1. Suppose you are predicting whether a person will have a cardiac arrest. Here the cost of a False Negative is high: a person who was at risk of an attack is predicted as safe. Such cases must be avoided, so we need a model with high recall.

2. Suppose a search engine returns many irrelevant results that the model has predicted as positive (relevant). There is then very little chance that users will rely on it. In this scenario we need a model with high precision, so that the user experience improves and the website grows in the right direction.

CYBER CRIME

What is cybercrime?

Cybercrime is criminal activity that either targets or uses a computer, a computer network or a networked device.

Most, but not all, cybercrime is committed by cybercriminals or hackers who want to make money. Cybercrime is carried out by individuals or organizations.

Some cybercriminals are organized, use advanced techniques and are highly technically skilled. Others are novice hackers.

Rarely, cybercrime aims to damage computers for reasons other than profit. These could be political or personal.

Example:

In a recent sensational cybercrime, a 16-year-old student of the Air Force Bal Bharti School in New Delhi was arrested for having created a pornographic website. The case, which otherwise would have gathered dust in court, was quickly capped by the juvenile welfare board, which granted bail. The student was also rusticated from school. Yet another fallout of the "e-volution" happened at Indore, when a group of school students spliced photographs of girls from their school with nude pictures downloaded from the net. No arrests were made, because enforcement agencies had no clear-cut idea about the definition of cybercrime and the laws under which it should be tried.

Though the Information Technology Bill 2000, dealing with cyberlaws, has been passed by the Lok Sabha, the Rajya Sabha and the President of India, awareness of it is still to emerge. Prof. M. S. Raste, principal of Symbiosis Law College, admits that "a major portion of the judicial fraternity has no idea about cyberlaws and their applications". Even if some of the younger lawyers, like advocate Jitendra Patil, are net-savvy and understand the implications of cybercrime, there remains a cloud of confusion when it comes to the matter of jurisdiction. "What if the crime has been committed by a website developer who is not a resident of India? How does one register a crime against an unknown person who has hacked into a confidential site? Such questions need immediate clarification before legal experts can give thought to the enactment of the cyberlaws," opines Prof. Raste. Incidentally, Symbiosis Law College will be starting a one-year diploma course in cyberlaws.

Cybercrime, as Deepak Shikarpur, IT chairman of the Mahratta Chamber of Commerce, Industries and Agriculture (MCCIA), explains, is not restricted to pornography. "Some of the major felonies on the rise are those related to e-commerce. The digital economy is moving at lightning speed and has changed everything, including relationships inside and outside a company's four walls," he says. Ujwal Marathe, a chartered accountant specialising in IT audit who has co-authored a book on cyberlaws with Shikarpur and Sarita Bhave, agrees. "The kinds of crimes which have surfaced in the financial sphere of the internet relate to hacking into banking transactions, theft of intellectual property rights, misuse of credit card numbers and misappropriations conducted by employees," informs Marathe. Here again, Indian cyberlaws are being seen as too "open-ended" to curb the adventurous spirit of cyber-thieves. "There are specific issues of jurisdiction which need to be settled," advises Marathe.

However, cyberlaws do give special powers to the police force. A police officer not below the rank of Deputy Superintendent of Police (DSP) can investigate a cybercrime and seize computer equipment from a company or cybercafe without the need for a search or arrest warrant. Deputy Commissioner of Police (DCP) Sanjay Varma states that some senior officers in his team are quite well aware of cybercrimes and cyberlaws, having attended workshops. "But we are yet to investigate one and get a feel of the procedures because there has been no complaint registered so far. Complaints may not be forthcoming because the public does not know that the police has been equipped to deal with cybercrimes," he says.

There was a complaint in the past, but that was by Pune-based Rohas Nagpal, who filed a case against the website rediff.com for providing access to pornographic sites. "But that was before the cyberlaws came into existence and hence, it was filed under a section of the Indian Penal Code," informs Nagpal. The case is stuck in the high court. Prior to that, two Pune junior college students had successfully hacked into the VSNL site and opened accounts of its subscribers, but no action was taken against them because they had confessed to doing it as a test of their skills.

The biggest problem as of now is for enforcement agencies like the police to understand how to trace the culprits. "Just about anyone can commit a crime from a cybercafe," says Marathe. In such a case, it would be almost impossible to arrive at an identification. DCP Varma says that should such a situation arise, "the help of experts would be taken." The only solution, as suggested by Prof. Marathe, is that the government should embark on a promotion drive for these new laws. Till then, the damage will continue, laws or no laws.
