Machine Learning for CyberSecurity


Security is a major issue in today's digital world, where organizations do business through their websites and web apps. These face many kinds of attacks and threats, the most common being the DoS attack. A DoS attack not only consumes the organization's resources and energy but also leads to loss of business. For a security and monitoring team working with manual methods, it is very challenging to observe and protect websites 24x7 from threats and attackers. This is where automation comes into the picture: we can integrate automation tools with machine learning to meet these cyber-security requirements.


Scope:

  • The approach used here to build a machine learning model for the DoS attack can also be used to develop models for other pattern-based attacks.
  • Internally, the model is built with the K-means clustering algorithm (discussed further in detail). This algorithm needs one hyper-parameter, the number of clusters to form, which is decided by analysis over the data (the elbow method). We can provide this analysis report to our data-science team regularly, so that any code changes needed in future can be made easily. This reduces the data scientists' workload, and there will be continuous improvement in the project.
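The elbow analysis mentioned above can be sketched as follows. This is a minimal illustration, not the project's actual notebook: the per-IP hit counts and the range of k values tried are made-up assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Illustrative per-IP hit counts: a few genuine clients and a flood source.
counts = np.array([[12.0], [15.0], [9.0], [14.0], [3100.0], [2950.0], [3050.0]])
scaled = StandardScaler().fit_transform(counts)

# Elbow method: fit K-means for several values of k and record the inertia
# (within-cluster sum of squares); the k where the curve bends sharply is
# the hyper-parameter handed to the data-science team.
inertias = {}
for k in range(1, 5):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(scaled)
    inertias[k] = km.inertia_

for k in sorted(inertias):
    print(k, round(inertias[k], 3))
```

With two well-separated groups of hit counts, the inertia drops steeply from k=1 to k=2 and barely improves afterwards, which is why the report would recommend 2 clusters.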

Design Methodology


Architecture of Dos attack prevention system

  • First, we host our demo website using the Apache httpd web server.
  • To replicate a real-world scenario, we hit the web server as a genuine client, and from one virtual machine we try to perform a DoS attack on the web server.
  • The Apache web server generates logs, which we store on the NFS server so that other teams in the organization can access them.
  • The logs collected on the NFS server cannot be fed directly to the machine learning model, so we use Logstash for this transformation step.
  • We then give the formatted logs (a CSV file) to the machine learning code, which builds the model.
  • Further, depending on the number of hits made by a client in a particular time span (approximately more than 3000 requests within an hour), such clients are identified and considered suspicious.
  • The list of suspicious IPs is then used to create firewalld rules that block those IPs from using the website.
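The hit-count rule in the last two bullets can be sketched in plain Python. The log records, IP addresses, and the exact cutoff below are illustrative assumptions:

```python
from collections import Counter

# Illustrative (ip, hour-bucket) pairs, as parsed from the web server logs.
records = [("10.0.0.5", "2021-06-01T10"), ("10.0.0.5", "2021-06-01T10"),
           ("192.168.1.9", "2021-06-01T10")]
records += [("172.16.0.7", "2021-06-01T10")] * 3500  # simulated flood source

THRESHOLD = 3000  # approx. requests per hour considered suspicious

hits = Counter(records)  # hits per (ip, hour) pair
suspicious = sorted({ip for (ip, hour), n in hits.items() if n > THRESHOLD})
print(suspicious)
```

Only the flooding IP crosses the threshold, so it is the one carried forward to the firewall step.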


Using Jenkins to Establish the Same Setup

  1. Now that we know the final architecture we need to achieve, we can automate the same setup using Jenkins.
  2. In Jenkins we have one master node and two worker nodes: one is the web server and the other is the one running Docker and NFS.
  3. In Jenkins we configure multiple jobs: one job gets the raw data from the web server to the NFS server, and once it is done, the next job is triggered automatically to launch a Logstash Docker container.
  4. Another job, triggered after the Logstash job is done, trains our model. The output generated is the list of suspicious IPs, stored in the BlockedIP.txt file (discussed further below).
  5. Finally, using the suspicious IPs, we trigger the firewall rule.

Setting up the required Environment:

Step 1: Launch two EC2 instances on AWS with the RHEL-8 AMI: one with the t2.micro machine type and the other with t2.large. We will use the t2.micro instance for the web services and the t2.large instance for the Docker and NFS server. We have also tagged them with matching names.


Step 2: Log in to the instances with ec2-user as the username and the authentication key. For easy access we will first allow root login via password on both instances; for this we need to edit the /etc/ssh/sshd_config file and set PermitRootLogin yes and PasswordAuthentication yes.


Step 3: We also need to set a new root password; for this we can use the passwd command. Then restart the sshd service.


Note: We need to perform steps 2 and 3 on both EC2 instances.

Step 4: As these instances are going to be Jenkins worker nodes, we need Java installed on both of them.


Step 6: Now we want our demo website to be configured; for this we use the Git repo we created (https://github.com/Ingole712521/Industrial-Training). We have cloned this repo into /var/www/html, i.e. the document root of httpd.


Step 7: Further, we have started and enabled the httpd service. httpd collects its logs under /etc/httpd/logs; the root user has permission to view them.


WebPage


Logs:


Step 8: We have launched the web instance on AWS, and AWS uses its security group as the firewall. So in the OS we need to separately install and start firewalld.


Further, we also need to allow hits over port 80; for this we run:

# firewall-cmd --zone=public --add-port=80/tcp --permanent

# firewall-cmd --reload        

Now let's move toward configuring the Docker and NFS servers on the other EC2 instance, of t2.large machine type.

Step 9: For the NFS setup we need to install the nfs-utils package. We want one folder, /share1, where NFS clients can put their data and which we can also access directly, like a normal directory. For this we first create the /share1 folder, then mention in the NFS configuration file (/etc/exports) which folder is the shared one. The format for this entry is:


# <folder_name> <client_ip>(permissions)        


Further, we also need to start the NFS services. We can view our shared folders using the command:


# exportfs -v        



Restart the NFS server


Step 10: Now let's set up Docker on the system. For this we first need the Docker repository added to the system (for this we also need the yum-utils package), and then we can install the docker-ce package.


Install Docker


Exposing the Docker port (2376) so that we can access it from the outside world too.


Starting the docker services


Step 11: Now let's configure Jenkins; in our case it is running locally. Jenkins needs Java to run, and for the installation we also need the Jenkins RPM, which we can get from the internet. We can then install it simply with the command:


# rpm -ivh <package_name>        


Further, we need to start and enable the Jenkins service.


Jenkins by default runs on port number 8080.


Step 12: Now let's configure the nodes (worker nodes for Jenkins).

Docker-nfs node :


Web-Server Node:


Status of the nodes:


Implementation:

Step 1: First we configure the Jenkins job "Move logs to nfs folder" (the job is restricted to run on the web-server node only). This job moves the log file from /etc/httpd/logs to /share1/current_logs and also reloads the httpd service. Once this is done, it copies the logs from /share1/current_logs to /share1/all_logs with an appropriate timestamp, so that if we need to see past logs in future we can refer to them.


Step 2: Now we configure our second job, "launch logstash via docker" (restricted to run on the Docker-nfs node only). The job is triggered only if the previous job ("Move logs to nfs folder") built successfully. This job first checks whether formatted old logs are already present; if yes, it removes them and then proceeds. The job then launches a Logstash container. Logstash takes a minute or two to generate its output file (web-server-logs.csv), so we wait until it is there; once the file is generated we no longer need the container, so we remove the Logstash container.

(Note: make sure the mounted volume /mini_project/logstash has read, write, and execute permissions for all users, or for the docker user; if needed we can grant them via

# chmod ugo+rwx /mini_project/logstash        

Here we have custom requirements for Logstash: we want it to take input from a particular file and write output to a particular destination. For this we need a custom configuration file (hence we use the -f option to pass it).


In Logstash, almost any kind of requirement is handled with plugins. Here we needed to filter Apache logs, so we used the grok plugin. Internally, grok uses regular expressions to find and filter the required data from the raw data. Grok has many other patterns too; for example, to find an IP in the file we can use %{IP:client}.

Similarly, to retrieve the status code and timestamp we used the mutate and date plugins. Via the input and output sections we specify the input source and the output destination with the required format.
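Grok patterns compile down to regular expressions, so the same extraction can be sketched in plain Python. The pattern and the sample log line below are illustrative, not the project's actual Logstash configuration:

```python
import re

# Matches an Apache common-log line: client IP, timestamp, request,
# status code, and bytes transferred (the fields the model uses).
LOG_RE = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+)'
)

line = '172.16.0.7 - - [01/Jun/2021:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 4523'
fields = LOG_RE.match(line).groupdict()
print(fields["client"], fields["status"], fields["timestamp"])
```

Grok's %{IP:client} plays the same role as the named group (?P<client>...) here; the plugin just ships a library of such pre-built patterns.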

Launching logstash via docker:

# docker run --rm -dit --name lsl -v /share1/current_logs:/etc/httpd/logs -v /mini_project/logstash/:/myconf/ logstash:7.7.1 -f /myconf/my.conf        

With the -f option we tell Logstash the location of the custom conf file. The output file is generated in /myconf/, but since we mounted it to /mini_project/logstash we can also see it in that folder.

Step 3: Next we configure the job "Find BlockedIP", via which we get the suspicious IPs (restricted to run on the Docker-nfs node only). This job is triggered once the previous job ("launch logstash") has built successfully. In this job, once the blockedIp.txt file is created, it is moved to the /share1/BlockedIP folder so that the web server can use it to block those IPs.


Model.py


import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load the Logstash-formatted logs
ds = pd.read_csv('/mini_project/logstash/web-server-logs.csv', header=None)
ds.columns = ['IP', 'Status_code', 'Bytes_trf', 'TimeStamp']
ds = ds[['IP', 'Status_code', 'TimeStamp']]
ds.info()

# Count hits per (IP, status code) pair
ds = ds[['IP', 'Status_code']]
ds = ds.groupby(['IP', 'Status_code']).Status_code.agg('count').to_frame('Count').reset_index()
ds = ds.sort_values(['Count'], ascending=False)
ds.head(20)

# The IP column is not a numeric feature, so drop it for training
training_data = ds.drop(['IP'], axis=1)
training_data.head(20)

# Scale the features before clustering
sc = StandardScaler()
data_scaled = sc.fit_transform(training_data)

# K-means with 2 clusters (decided via the elbow analysis)
model = KMeans(n_clusters=2)
pred = model.fit_predict(data_scaled)

final_dataset = ds
final_dataset["Cluster"] = pred
final_dataset.head()

# The cluster that most high-count rows fall into is the blocked cluster
a = []
for index, row in ds.iterrows():
    if row['Count'] > 500:
        a.append(row['Cluster'])
blockedCluster = max(set(a), key=a.count)

# Collect every IP assigned to the blocked cluster
blockedIP = []
for index, row in ds.iterrows():
    if row['Cluster'] == blockedCluster:
        blockedIP.append(row['IP'])

for b_ip in blockedIP:
    print("Blocked IP: {}".format(b_ip))

blockedIP = set(blockedIP)

# Write the unique suspicious IPs for the firewall job to consume
with open('blockedIp.txt', 'w') as filehandle:
    for listitem in blockedIP:
        filehandle.write('%s\n' % listitem)        

In this code we have used the K-means clustering algorithm. After multiple tests we found that, given the number of hits, it is good to keep 2 clusters as a static choice for now.

Once the model is trained, we use it to create the blockedIp.txt file comprising the blocked IPs.

Step 4: Now we come to our last job (restricted to run on the web-server node only). This job simply asks the internal firewall running on the web server to block the suspicious IPs.


IPBlock.py

from os import system

# Add an iptables DROP rule for each blocked IP and save the rules
with open("/share1/blockedIP/blockedIp.txt", "r") as file1:
    for i in file1:
        system('iptables -A INPUT -s {} -j DROP'.format(i.rstrip('\n')))
        system('service iptables save')        

After this, packets arriving from these suspicious IPs will be dropped and will not hit the server.

Step 5: Now let's create the view in Jenkins for the build pipeline.


This view is possible because we have properly set, for each job, which job is its upstream and which is its downstream.

Step 6: Now, from my Windows machine, with the help of a bash script we hit the web server with many fake/dummy requests to try to increase the load on the server. Before or during this process, many genuine clients might also have hit the web server.


#!/bin/bash
while true
do
    curl https://13.232.174.175/gaurav.html
    curl https://13.232.174.175/gaurav2001.html
    curl https://13.232.174.175/jack.html
    curl https://13.232.174.175/nehal.html
    curl https://13.232.174.175/asdasfadfsf.html
    curl https://13.232.174.175/asdadw.html
done        

IP address of attacker:


Now let's trigger the pipeline and see the console outputs of the jobs.

Move logs to nfs folder:

launch logstash:


Find BlockedIP:


From here we can see it has detected the Windows machine's IP as a suspicious IP.

block IP's:

Now, as the firewall rules are saved, let's check the connectivity from the Windows machine.


Now Windows can't connect.

We can also check the rules by going inside the web server.


At this same time we also tried to connect to the web server from a different device, and it works absolutely fine.


Result:

  • Our environment was configured successfully (Jenkins master node, web server, Docker-NFS node). The machine learning model working internally gave the correct output, i.e. the suspicious IP.
  • The whole pipeline worked successfully, and at the end the suspicious IP was not able to connect to the web server while other genuine clients could connect.

Conclusion

In this project we successfully implemented the K-means clustering algorithm for suspicious-IP detection. We did the data processing using Logstash, for which we set up Docker, launched Logstash on it, and created our own custom configuration file. We used Jenkins to automate everything: with a single click, all the jobs run in the order the pipeline was designed. Finally, the implementation also gave a good result.


