Machine Learning for CyberSecurity
Nehal Ingole
Jr React Developer at Kshan Tech Soft Pvt Ltd || DevOps | Full Stack Development | CICD | React Native
Security has been a major issue in today's digital world, where organizations do their business through websites and web apps. These applications face many kinds of attacks and threats, the most common being the DoS attack. A DoS attack not only consumes an organization's resources and energy but also leads to loss of business. For a security and monitoring team relying on manual methods, it is very challenging to watch over and protect websites 24x7 from threats and hackers. This is where automation comes into the picture: we can use automation tools and integrate them with machine learning to meet these cyber-security requirements.
Scope:
Design Methodology
Architecture of the DoS attack prevention system
Using Jenkins to establish the same setup
Setting up the required Environment:
Step 1: Launch two EC2 instances on AWS with the RHEL-8 AMI, one of instance type t2.micro and the other of type t2.large. We will use the t2.micro instance for the web services and the t2.large instance for Docker and the NFS server. We have also tagged them with suitable names.
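For reference, the same two instances could also be launched from the AWS CLI. This is only a sketch; the AMI ID, key name, security group and tag values below are placeholders, not the actual ones used in this setup:
# aws ec2 run-instances --image-id <rhel8-ami-id> --instance-type t2.micro --key-name <key-name> --security-group-ids <sg-id> --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=web-server}]'
# aws ec2 run-instances --image-id <rhel8-ami-id> --instance-type t2.large --key-name <key-name> --security-group-ids <sg-id> --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=docker-nfs}]'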
Step 2: Log in to the instances with ec2-user as the username and the authentication key. For easier access we will first allow root login via password on both instances; for this we need to edit the /etc/ssh/sshd_config file and enable the directives shown below.
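The two standard OpenSSH directives that control this are PermitRootLogin and PasswordAuthentication; both have to be set to yes:
# vi /etc/ssh/sshd_config
PermitRootLogin yes
PasswordAuthentication yes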
Step 3: We also need to set a new root password; for this we can use the passwd command as shown below. Then restart the sshd service.
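On RHEL-8 these two steps come down to:
# passwd root
# systemctl restart sshd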
Note: We need to perform steps 2 and 3 on both of the EC2 instances.
Step 4: As these instances are going to be Jenkins worker nodes, we need Java installed on both of them.
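Any recent OpenJDK build works for a Jenkins agent; installing OpenJDK 11 from the RHEL-8 AppStream repository is one option (the package choice here is an assumption, not necessarily the exact version used in the original setup):
# yum install java-11-openjdk -y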
Step 6: Now we want our demo website to be configured. For this we used the Git repo which we created earlier.
Here we have cloned this repo into /var/www/html, i.e. the document root of httpd.
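A sketch of that step is shown below; the repository URL is a placeholder for the actual repo, and the clone assumes /var/www/html is still empty:
# yum install httpd git -y
# git clone <demo-website-repo-url> /var/www/html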
Step 7: Further, we have started and enabled the httpd service. httpd collects its logs under /etc/httpd/logs, and the root user has permission to view them.
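On RHEL-8 this is a single systemctl call (and on RHEL, /etc/httpd/logs is normally a symlink to /var/log/httpd, so both paths show the same files):
# systemctl enable --now httpd
# ls -l /etc/httpd/logs/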
WebPage
Logs:
Step 8: We have launched the web instance on AWS, and AWS uses its security group as the external firewall. Inside the OS we need to separately install and start firewalld.
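On RHEL-8 that comes down to:
# yum install firewalld -y
# systemctl enable --now firewalld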
Further, we also need to allow hits over port 80; for this we run:
# firewall-cmd --zone=public --add-port=80/tcp --permanent
# firewall-cmd --reload
Now let's move toward configuring Docker and the NFS server on the other EC2 instance (t2.large).
Step 9: For the NFS setup we need to install the nfs-utils package. We want one folder, /share1, where the NFS clients can put their data and which we can also access directly, just like a normal directory. For this we first create the folder /share1 and then mention in the NFS configuration file (/etc/exports) that this is the shared folder. The format for this entry is:
# <folder name> <client_ip>(permissions)
Further, we also need to start the NFS service. We can view our shared folders using the command:
# exportfs -v
Restart the NFS server.
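Putting Step 9 together, a minimal sketch of the NFS-server setup (the export options rw and no_root_squash, and the wildcard client, are assumptions; in practice the client can be restricted to the web server's IP):
# yum install nfs-utils -y
# mkdir /share1
# echo "/share1 *(rw,no_root_squash)" >> /etc/exports
# systemctl enable --now nfs-server
# exportfs -rv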
Step 10: Now let's set up Docker on this system. For this we first need to add the Docker repository (the yum-utils package is needed for this), and then we can install the docker-ce package.
Install Docker
Exposing the Docker port (2376) so that we can access it from the outside world too.
Starting the docker services
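A sketch of those three sub-steps on RHEL-8 (the CentOS repo also works for RHEL-8; the --nobest flag is often needed to resolve the containerd.io dependency, and port 2376 must additionally be allowed in the AWS security group):
# yum install yum-utils -y
# yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
# yum install docker-ce --nobest -y
# firewall-cmd --add-port=2376/tcp --permanent
# firewall-cmd --reload
# systemctl enable --now docker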
Step 11: Now let's configure Jenkins; in our case it is running locally. We need Java to run Jenkins, and for the installation we also need the Jenkins RPM, which we can download from the internet. We can then simply install it with the command:
# rpm -ivh <package_name>
Further, we need to start and enable the Jenkins service.
Jenkins by default runs on port number 8080.
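For reference, the installation and start-up boil down to the following (the RPM file name is a placeholder; the package can be downloaded from pkg.jenkins.io/redhat-stable):
# wget https://pkg.jenkins.io/redhat-stable/jenkins-<version>.noarch.rpm
# rpm -ivh jenkins-<version>.noarch.rpm
# systemctl enable --now jenkins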
Step 12: Now let's configure the nodes (worker nodes) for Jenkins.
Docker-nfs node :
Web-Server Node:
Status of the nodes:
Implementation:
Step 1: First of all we configure the Jenkins job "Move logs to nfs folder" (the job is restricted to run on the web-server node only). Via this job we move the log file from /etc/httpd/logs to /share1/current_logs and then reload the httpd service. Once this is done we copy the logs from /share1/current_logs to /share1/all_logs with an appropriate timestamp, so that if we need to see past logs in the future we can refer to them.
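The build step of this job is an ordinary shell script; a minimal sketch of what it could look like, assuming /share1 from the NFS server is mounted at /share1 on the web server and that the current_logs and all_logs folders already exist (the exact file names are assumptions):
sudo mv /etc/httpd/logs/access_log /share1/current_logs/access_log
sudo systemctl reload httpd
cp /share1/current_logs/access_log /share1/all_logs/access_log_$(date +%F_%H-%M-%S)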
Step 2: Now we configure our second job, "launch logstash via docker" (this job is restricted to run on the Docker-nfs node only). The job is triggered only if the previous job (Move logs to nfs folder) was built successfully. Via this job we first check whether already-formatted old logs are present; if yes, we remove them and then proceed. The job then launches a Logstash container. Logstash takes a minute or two to generate its output file (web-server-logs.csv), so we wait until the file is there; once it is generated we no longer need the container, so we remove the Logstash container.
(Note: make sure that the mounted volume /mini_project/logstash has read, write and execute permissions for everyone, or at least for the docker user; if needed we can grant those via the command below.)
# chmod ugo+rwx /mini_project/logstash
Here we have custom requirements for Logstash: we want it to take input from a particular file and to write its output to a particular destination. For this we need a custom configuration file (hence we used the -f option to pass that custom conf file).
In Logstash, almost any kind of requirement is handled via plugins. Here we need to filter Apache logs, so we used the grok plugin; grok internally uses regular expressions to find and extract the required data from the raw data. Grok has many ready-made patterns too, e.g. if we need to find an IP in the file we can use %{IP:client}.
Similarly, to retrieve the status code and timestamp we used the mutate and date plugins. And via the input and output sections we mention the input source and the output destination with the required format.
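As an illustration only, a my.conf along these lines would implement the pipeline described above; the grok pattern, the csv output settings and the field/column names here are assumptions based on the standard (non-ECS) COMBINEDAPACHELOG fields, not the exact configuration used in the project:
# cat > /mini_project/logstash/my.conf <<'EOF'
input {
  file {
    path => "/etc/httpd/logs/access_log"              # log path as seen inside the container
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }  # extracts clientip, response, bytes, timestamp, ...
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  mutate {
    convert => { "response" => "integer" "bytes" => "integer" }
  }
}
output {
  csv {
    path => "/myconf/web-server-logs.csv"
    fields => [ "clientip", "response", "bytes", "timestamp" ]
  }
}
EOF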
Launching logstash via docker:
# docker run --rm -dit --name lsl -v /share1/current_logs:/etc/httpd/logs -v /mini_project/logstash/:/myconf/ logstash:7.7.1 -f /myconf/my.conf
With the -f option we tell Logstash the location of the custom conf file. The output file is generated in /myconf/, but since that path is mounted from /mini_project/logstash we can also see it in that folder on the host.
Step 3: Now we configure the next job, "Find BlockedIP". Via this job we get the suspicious IPs (this job is restricted to run on the Docker-nfs node only). It is triggered once the previous job, "launch logstash", has been built successfully. In this job, once the blockedIp.txt file is created it is moved to the /share1/BlockedIP folder so that the web server can use it to block those IPs.
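A hedged sketch of the build step (the location of model.py and the python3 invocation are assumptions; only the folder names come from this article):
cd /mini_project/logstash
python3 model.py
mv blockedIp.txt /share1/BlockedIP/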
Model.py
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load the Logstash-generated CSV and name its columns
ds = pd.read_csv('/mini_project/logstash/web-server-logs.csv', header=None)
ds.columns = ['IP', 'Status_code', 'Bytes_trf', 'TimeStamp']
ds = ds[['IP', 'Status_code', 'TimeStamp']]
ds.info()

# Count how many times each (IP, status code) pair appears
ds = ds[['IP', 'Status_code']]
ds = ds.groupby(['IP', 'Status_code']).Status_code.agg('count').to_frame('Count').reset_index()
ds = ds.sort_values(['Count'], ascending=False)
ds.head(20)

# Train K-means on the numeric features only (drop the IP column)
training_data = ds.drop(['IP'], axis=1)
training_data.head(20)
sc = StandardScaler()
data_scaled = sc.fit_transform(training_data)
model = KMeans(n_clusters=2)
pred = model.fit_predict(data_scaled)

# Attach the cluster label to every (IP, status code) row
final_dataset = ds
final_dataset["Cluster"] = pred
final_dataset.head()

# The cluster containing the high-hit-count rows is treated as the attacker cluster
a = []
for index, row in ds.iterrows():
    if row['Count'] > 500:
        a.append(row['Cluster'])
blockedCluster = max(set(a), key=a.count)

# Collect every IP that falls into the attacker cluster
blockedIP = []
for index, row in ds.iterrows():
    if row['Cluster'] == blockedCluster:
        blockedIP.append(row['IP'])
for b_ip in blockedIP:
    print("Blocked IP: {}".format(b_ip))

# Write the unique suspicious IPs to blockedIp.txt
blockedIP = set(blockedIP)
with open('blockedIp.txt', 'w') as filehandle:
    for listitem in blockedIP:
        filehandle.write('%s\n' % listitem)
In this code we have used the K-means clustering algorithm. After multiple tests we found that, for the number of hits we observed, it is good enough for now to keep the number of clusters fixed at 2.
Once the model is trained we use it to create the blockedIp.txt file containing the blocked IPs.
Step 4: Now we come to our last job (this job is restricted to run on the web-server node only). Via this job we simply ask the internal firewall running on the web server to block the suspicious IPs.
IPBlock.py
from os import system

# Read every suspicious IP written by the previous job and add a DROP rule for it
with open("/share1/blockedIP/blockedIp.txt", "r") as file1:
    for i in file1:
        system('iptables -A INPUT -s {} -j DROP'.format(i.rstrip('\n')))
        system('service iptables save')
After this, packets received from these suspicious IPs will be dropped and will never reach the server.
Step 5: Now let's create the view in Jenkins for the build pipeline.
This works because for each job we have properly configured which job is its upstream job and which is its downstream job.
Step 6: Now, from my Windows machine, with the help of a bash script, we will hit the web server with multiple fake/dummy requests and try to increase the load on the server. Before or during this process many genuine clients might also have hit the web server.
#!/bin/bash
while true
do
    curl http://13.232.174.175/gaurav.html
    curl http://13.232.174.175/gaurav2001.html
    curl http://13.232.174.175/jack.html
    curl http://13.232.174.175/nehal.html
    curl http://13.232.174.175/asdasfadfsf.html
    curl http://13.232.174.175/asdadw.html
done
IP address of attacker:
Now let's trigger the pipeline and see the console outputs of the jobs.
Move logs to nfs folder:
launch logstash:
Find BlockedIP:
From here we can see it has detected the Windows machine's IP as a suspicious IP.
block IP's:
Now, as the firewall rules are saved, let's check the connectivity from the Windows machine.
Now the Windows machine can't connect.
We can also check the rules by going inside the web server.
At the same time we also tried to connect to the web server from a different device, and it works absolutely fine.
Result:
Conclusion
In this project we successfully implemented the K-means clustering algorithm for suspicious-IP detection. We did the data processing using Logstash, for which we set up Docker, launched Logstash on it and created our own custom configuration file. We used Jenkins to automate everything: with a single click all the jobs run in the order in which the pipeline was designed. Finally, the implementation also gave a good response.