Use of confusion matrix in detecting cyber crime

Summer 2021 Task 05

Internet usage has grown rapidly, particularly in the last decade. However, as the Internet becomes part of day-to-day activities, cybercrime is also on the rise.

Here is one way to predict cybercrime using machine learning techniques.

At present, no generalized framework is available for categorizing cybercrime offenses through feature extraction from case data. In the present work, data analysis and machine learning are combined to build a cybercrime detection and analytics system.

TF-IDF (term frequency-inverse document frequency) vectorization is used for feature extraction. The methodology consists of four phases applied to the data: reconnaissance, preprocessing, data clustering and classification, and prediction analysis.
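To make the TF-IDF step concrete, here is a minimal plain-Python sketch. It assumes each case is a short text description, uses whitespace tokenization, and applies a common smoothed IDF variant, idf(t) = ln((1 + N) / (1 + df(t))) + 1. The article does not say which variant it uses, and the function name `tfidf_vectors` and the sample case texts are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) where each vector holds TF-IDF weights.

    Assumption: term frequency is the raw count divided by document length,
    and IDF uses the smoothed form ln((1 + N) / (1 + df)) + 1.
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocabulary}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens) or 1  # avoid division by zero for empty docs
        vectors.append([counts[t] / length * idf[t] for t in vocabulary])
    return vocabulary, vectors


# Hypothetical case descriptions, for illustration only.
cases = [
    "phishing email stole bank credentials",
    "ransomware encrypted hospital files",
    "phishing website stole card details",
]
vocab, vecs = tfidf_vectors(cases)
for term, weight in zip(vocab, vecs[0]):
    if weight:
        print(f"{term}: {weight:.3f}")
```

A term such as "phishing" that appears in several cases gets a lower IDF than a term like "bank" that appears in only one. This lets the rarer, more distinctive words dominate a case's vector.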

In the preprocessing phase, only feature extraction takes place. It converts high-dimensional data into low-dimensional data. The preprocessed data also helps with visualization, because complex data is easier to organize once it has been reduced to fewer dimensions.

In the data clustering and classification phase, naïve Bayes is used for classification and k-means for clustering. The cybercrime offenses are clustered based on the TF-IDF weighted vectors obtained from the features. The data is split using a 70:30 rule of thumb: 70% of the data is used for training and 30% for testing.
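The classification step and the 70:30 split can be sketched as a multinomial naïve Bayes classifier with Laplace smoothing. This is a minimal sketch under several assumptions. It uses raw word counts rather than TF-IDF weights. The toy labeled cases are invented for illustration. The names `split_70_30`, `train_nb`, and `predict_nb` are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def split_70_30(samples, seed=42):
    """Shuffle and split samples: 70% for training, 30% for testing."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.7)
    return shuffled[:cut], shuffled[cut:]


def train_nb(samples, alpha=1.0):
    """Fit a multinomial naive Bayes model on (text, label) pairs."""
    class_counts = Counter(label for _, label in samples)
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocabulary = set()
    for text, label in samples:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary, alpha


def predict_nb(model, text):
    """Return the label with the highest log-posterior for the text."""
    class_counts, word_counts, vocabulary, alpha = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + alpha * len(vocabulary)
        for token in text.lower().split():
            if token in vocabulary:  # ignore words never seen in training
                score += math.log((word_counts[label][token] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled cases, for illustration only.
data = [
    ("phishing email asked for bank password", "phishing"),
    ("fake login page stole email password", "phishing"),
    ("phishing link sent by sms", "phishing"),
    ("spoofed bank email requested otp", "phishing"),
    ("phishing message impersonated courier", "phishing"),
    ("ransomware encrypted office files", "ransomware"),
    ("files locked and ransom demanded in bitcoin", "ransomware"),
    ("ransomware attack on hospital servers", "ransomware"),
    ("encrypted backups and ransom note found", "ransomware"),
    ("ransomware spread through network share", "ransomware"),
]
train, test = split_70_30(data)
model = train_nb(train)
for text, label in test:
    print(f"{label:>10} -> {predict_nb(model, text)}: {text}")
```

Summing log-probabilities avoids numeric underflow when many small probabilities are multiplied. Laplace smoothing (`alpha=1.0`) keeps a word unseen in one class from driving that class's score to zero.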

In the prediction analysis step, the cybercrime data is analyzed to predict which type of crime occurs most often in a particular year at a particular location.
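Once each case carries an offense type, the question of which crime occurs most often can be answered by counting per (year, location). Here is a minimal sketch. It assumes records are `(year, location, offense)` tuples. The field layout, the function name `most_common_offense`, and the sample records are hypothetical.

```python
from collections import Counter, defaultdict


def most_common_offense(records):
    """Map each (year, location) to its most frequent (offense, count).

    records: iterable of (year, location, offense) tuples.
    Ties go to the offense encountered first, as Counter.most_common does.
    """
    counts = defaultdict(Counter)
    for year, location, offense in records:
        counts[(year, location)][offense] += 1
    return {key: c.most_common(1)[0] for key, c in counts.items()}


# Hypothetical records, for illustration only.
records = [
    (2020, "Location A", "phishing"),
    (2020, "Location A", "phishing"),
    (2020, "Location A", "ransomware"),
    (2021, "Location A", "ransomware"),
    (2021, "Location B", "identity theft"),
]
for (year, location), (offense, n) in sorted(most_common_offense(records).items()):
    print(f"{year} {location}: {offense} ({n} cases)")
```

This tallies observed frequencies per year and location. A forecast for a future year would be fitted on top of these per-year counts.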

Precision: the ratio of correctly predicted positive samples (true positives) to the total number of samples predicted as positive. A higher precision means fewer of the model's positive predictions are false alarms.
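In terms of confusion-matrix counts, let TP be the number of true positives and FP the number of false positives. Precision can then be written as:

```latex
\text{Precision} = \frac{TP}{TP + FP}
```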


Recall: the ratio of correctly predicted positive samples (true positives) to all samples whose actual class is positive. It is also called the sensitivity of the model.
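In terms of confusion-matrix counts, let TP be the number of true positives. Let FN be the number of false negatives, meaning actual positives that the model labeled negative. Recall can then be written as:

```latex
\text{Recall} = \frac{TP}{TP + FN}
```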


CONFUSION MATRIX

[Figure: confusion matrix of the model]

The figure above depicts the confusion matrix for our model with a training size of 0.8 and a test size of 0.2. From it, we can see how many cases are classified correctly and how many incorrectly, that is, the number of true positives, true negatives, false positives, and false negatives produced by the model.
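For a binary task such as "crime" vs. "not crime", the four counts can be computed directly from paired actual and predicted labels. Precision and recall then follow from those counts. Here is a minimal sketch, assuming labels are booleans with `True` as the positive class. The function names and sample labels are hypothetical, and the article's own model may be multi-class.

```python
def confusion_matrix(actual, predicted):
    """Count (tn, fp, fn, tp) for boolean labels, where True means positive."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tn = fp = fn = tp = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1  # positive case correctly flagged
        elif a and not p:
            fn += 1  # positive case missed
        elif not a and p:
            fp += 1  # negative case wrongly flagged
        else:
            tn += 1  # negative case correctly left alone
    return tn, fp, fn, tp


def precision_recall(tn, fp, fn, tp):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN); 0.0 if undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels, for illustration only.
actual = [True, True, True, False, False, False, True, False]
predicted = [True, False, True, False, True, False, True, False]
tn, fp, fn, tp = confusion_matrix(actual, predicted)
print("          pred -  pred +")
print(f"actual -  {tn:6d}  {fp:6d}")
print(f"actual +  {fn:6d}  {tp:6d}")
print("precision=%.2f recall=%.2f" % precision_recall(tn, fp, fn, tp))
```

The `(tn, fp, fn, tp)` order matches what scikit-learn's `confusion_matrix(...).ravel()` returns for binary 0/1 labels. This makes it easy to cross-check the sketch against a library result.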

By Yash Indane

  • Use of K-mean clustering in security domain

    Use of K-mean clustering in security domain

    Summer Task-10 & ARTH Task 42 Github -> What is K-means Clustering? K-means clustering is one of the simplest and…

  • OSPF Routing Protocol using Dijkastra Algorithm

    OSPF Routing Protocol using Dijkastra Algorithm

    What is OSPF? The OSPF (Open Shortest Path First) protocol is one of a family of IP Routing protocols, and is an…

  • Using Face Recognition for automation

    Using Face Recognition for automation

    SUMMER-TASK-6 (Team Task) and ARTH TASK 38 GitHub -> In this article I will explain how we can use Face Recognition and…

    3 条评论
  • JavaScript use cases in Industry

    JavaScript use cases in Industry

    Summer Task 7.2 What is JavaScript? JavaScript, often abbreviated as JS, is a programming language that conforms to the…

    1 条评论
  • Running Chrome in Docker container

    Running Chrome in Docker container

    Summer - Task 02 ??????? By default containers don't support GUI, but by some way we can achieve that, let's discuss…

    3 条评论
  • Training a ML model inside a container

    Training a ML model inside a container

    Task 01 ??????? Task Description ?? ?? Pull the Docker container image of CentOS image from DockerHub and create a new…

    4 条评论
  • Deploying WordPress in Amazon EKS with RDS in Backend

    Deploying WordPress in Amazon EKS with RDS in Backend

    ARTH-TASK-23 WordPress is a free and open-source content management system written in PHP and paired with a MySQL or…

    3 条评论
  • How industry uses MongoDB

    How industry uses MongoDB

    ARTH-TASK-32 What is MongoDB? MongoDB is a source-available cross-platform document-oriented database program…

  • Creating a Multicloud Setup of Kubernetes using Ansible Roles

    Creating a Multicloud Setup of Kubernetes using Ansible Roles

    TASK 28 Task Description ?? ?? CREATE A MULTI-CLOUD SETUP of K8S cluster: ?? Lunch node in AWS ?? Lunch node in Azure…

    2 条评论
  • Helm and Charts in Kubernetes

    Helm and Charts in Kubernetes

    ARTH TASK 24 What are Charts ? A chart is a collection of files that describe a related set of Kubernetes resources. A…

社区洞察

其他会员也浏览了