Machine Learning for CyberSecurity
Nehal Ingole
Jr React Developer at Kshan Tech Soft Pvt Ltd || DevOps | Full Stack Development | CICD | React Native
Security has been a major issue in today's digital world, where organizations do their business through websites and web apps. These applications face many kinds of attacks and threats, the most common being the DoS attack. A DoS attack not only consumes an organization's resources and energy but also leads to loss of business. For a security and monitoring team relying on manual methods, it is very challenging to watch over and protect websites 24x7 from threats and hackers. This is where automation comes into the picture: we can use automation tools and integrate them with machine learning to meet these cyber-security requirements.
Scope:
Design Methodology
Architecture of the DoS attack prevention system
Using Jenkins to establish the same setup
Setting up the required Environment:
Step 1: Launch two EC2 instances on AWS with the RHEL-8 AMI, one of instance type t2.micro and the other of type t2.large. We will use the t2.micro instance for the web services and the t2.large instance for Docker and the NFS server. We have also tagged them with suitable names.
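For reference, the same two instances could also be launched from the AWS CLI. This is only a sketch; the AMI ID, key name, security group and tag values below are placeholders, not the actual ones used in this setup:
# aws ec2 run-instances --image-id <rhel8-ami-id> --instance-type t2.micro --key-name <key-name> --security-group-ids <sg-id> --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=web-server}]'
# aws ec2 run-instances --image-id <rhel8-ami-id> --instance-type t2.large --key-name <key-name> --security-group-ids <sg-id> --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=docker-nfs}]'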
Step 2: Log in to the instances with ec2-user as the username and the authentication key. For easier access we will first allow root login via password on both instances; for this we need to edit the /etc/ssh/sshd_config file and enable the directives shown below.
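The two standard OpenSSH directives that control this are PermitRootLogin and PasswordAuthentication; both have to be set to yes:
# vi /etc/ssh/sshd_config
PermitRootLogin yes
PasswordAuthentication yes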
Step 3: We also need to set a new root password; for this we can use the passwd command as shown below. Then restart the sshd service.
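On RHEL-8 these two steps come down to:
# passwd root
# systemctl restart sshd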
Note: We need to perform steps 2 and 3 on both of the EC2 instances.
Step 4: As these instances are going to be Jenkins worker nodes, we need Java installed on both of them.
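Any recent OpenJDK build works for a Jenkins agent; installing OpenJDK 11 from the RHEL-8 AppStream repository is one option (the package choice here is an assumption, not necessarily the exact version used in the original setup):
# yum install java-11-openjdk -y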
Step 6: Now we want our demo website to be configured. For this we used the Git repo which we created earlier.
Here we have cloned this repo into /var/www/html, i.e. the document root of httpd.
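A sketch of that step is shown below; the repository URL is a placeholder for the actual repo, and the clone assumes /var/www/html is still empty:
# yum install httpd git -y
# git clone <demo-website-repo-url> /var/www/html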
Step 7: Further, we have started and enabled the httpd service. httpd collects its logs under /etc/httpd/logs, and the root user has permission to view them.
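On RHEL-8 this is a single systemctl call (and on RHEL, /etc/httpd/logs is normally a symlink to /var/log/httpd, so both paths show the same files):
# systemctl enable --now httpd
# ls -l /etc/httpd/logs/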
WebPage
Logs:
Step 8: We have launched the web instance on AWS, and AWS uses its security group as the external firewall. Inside the OS we need to separately install and start firewalld.
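On RHEL-8 that comes down to:
# yum install firewalld -y
# systemctl enable --now firewalld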
Further, we also need to allow hits over port 80; for this we run:
# firewall-cmd --zone=public --add-port=80/tcp --permanent
# firewall-cmd --reload
Now let's move toward configuring Docker and the NFS server on the other EC2 instance (t2.large).
Step 9: For the NFS setup we need to install the nfs-utils package. We want one folder, /share1, where the NFS clients can put their data and which we can also access directly, just like a normal directory. For this we first create the folder /share1 and then mention in the NFS configuration file (/etc/exports) that this is the shared folder. The format for this entry is:
# <folder name> <client_ip>(permissions)
Further, we also need to start the NFS service. We can view our shared folders using the command:
# exportfs -v
Restart the NFS server.
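Putting Step 9 together, a minimal sketch of the NFS-server setup (the export options rw and no_root_squash, and the wildcard client, are assumptions; in practice the client can be restricted to the web server's IP):
# yum install nfs-utils -y
# mkdir /share1
# echo "/share1 *(rw,no_root_squash)" >> /etc/exports
# systemctl enable --now nfs-server
# exportfs -rv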
Step 10: Now let's set up Docker on this system. For this we first need to add the Docker repository (the yum-utils package is needed for this), and then we can install the docker-ce package.
Install Docker
Exposing the Docker port (2376) so that we can access it from the outside world too.
Starting the docker services
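A sketch of those three sub-steps on RHEL-8 (the CentOS repo also works for RHEL-8; the --nobest flag is often needed to resolve the containerd.io dependency, and port 2376 must additionally be allowed in the AWS security group):
# yum install yum-utils -y
# yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
# yum install docker-ce --nobest -y
# firewall-cmd --add-port=2376/tcp --permanent
# firewall-cmd --reload
# systemctl enable --now docker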
Step 11: Now let's configure Jenkins; in our case it is running locally. We need Java to run Jenkins, and for the installation we also need the Jenkins RPM, which we can download from the internet. We can then simply install it with the command:
# rpm -ivh <package_name>
Further, we need to start and enable the Jenkins service.
Jenkins by default runs on port number 8080.
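For reference, the installation and start-up boil down to the following (the RPM file name is a placeholder; the package can be downloaded from pkg.jenkins.io/redhat-stable):
# wget https://pkg.jenkins.io/redhat-stable/jenkins-<version>.noarch.rpm
# rpm -ivh jenkins-<version>.noarch.rpm
# systemctl enable --now jenkins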
Step 12: Now let's configure the nodes (worker nodes) for Jenkins.
Docker-nfs node :
Web-Server Node:
Status of the nodes:
Implementation:
Step 1: First of all we configure the Jenkins job "Move logs to nfs folder" (the job is restricted to run on the web-server node only). Via this job we move the log file from /etc/httpd/logs to /share1/current_logs and then reload the httpd service. Once this is done we copy the logs from /share1/current_logs to /share1/all_logs with an appropriate timestamp, so that if we need to see past logs in the future we can refer to them.
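The build step of this job is an ordinary shell script; a minimal sketch of what it could look like, assuming /share1 from the NFS server is mounted at /share1 on the web server and that the current_logs and all_logs folders already exist (the exact file names are assumptions):
sudo mv /etc/httpd/logs/access_log /share1/current_logs/access_log
sudo systemctl reload httpd
cp /share1/current_logs/access_log /share1/all_logs/access_log_$(date +%F_%H-%M-%S)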
Step 2: Now we configure our second job, "launch logstash via docker" (this job is restricted to run on the Docker-nfs node only). The job is triggered only if the previous job (Move logs to nfs folder) was built successfully. Via this job we first check whether already-formatted old logs are present; if yes, we remove them and then proceed. The job then launches a Logstash container. Logstash takes a minute or two to generate its output file (web-server-logs.csv), so we wait until the file is there; once it is generated we no longer need the container, so we remove the Logstash container.
(Note: make sure that the mounted volume /mini_project/logstash has read, write and execute permissions for everyone, or at least for the docker user; if needed we can grant those via the command below.)
# chmod ugo+rwx /mini_project/logstash
Here we have custom requirements for Logstash: we want it to take input from a particular file and to write its output to a particular destination. For this we need a custom configuration file (hence we used the -f option to pass that custom conf file).
In Logstash, almost any kind of requirement is handled via plugins. Here we need to filter Apache logs, so we used the grok plugin; grok internally uses regular expressions to find and extract the required data from the raw data. Grok has many ready-made patterns too, e.g. if we need to find an IP in the file we can use %{IP:client}.
Similarly, to retrieve the status code and timestamp we used the mutate and date plugins. And via the input and output sections we mention the input source and the output destination with the required format.
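As an illustration only, a my.conf along these lines would implement the pipeline described above; the grok pattern, the csv output settings and the field/column names here are assumptions based on the standard (non-ECS) COMBINEDAPACHELOG fields, not the exact configuration used in the project:
# cat > /mini_project/logstash/my.conf <<'EOF'
input {
  file {
    path => "/etc/httpd/logs/access_log"              # log path as seen inside the container
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }  # extracts clientip, response, bytes, timestamp, ...
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  mutate {
    convert => { "response" => "integer" "bytes" => "integer" }
  }
}
output {
  csv {
    path => "/myconf/web-server-logs.csv"
    fields => [ "clientip", "response", "bytes", "timestamp" ]
  }
}
EOF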
Launching logstash via docker:
# docker run --rm -dit --name lsl -v /share1/current_logs:/etc/httpd/logs -v /mini_project/logstash/:/myconf/ logstash:7.7.1 -f /myconf/my.conf
With the -f option we tell Logstash the location of the custom conf file. The output file is generated in /myconf/, but since that path is mounted from /mini_project/logstash we can also see it in that folder on the host.
Step 3: Now we configure the next job, "Find BlockedIP". Via this job we get the suspicious IPs (this job is restricted to run on the Docker-nfs node only). It is triggered once the previous job, "launch logstash", has been built successfully. In this job, once the blockedIp.txt file is created it is moved to the /share1/BlockedIP folder so that the web server can use it to block those IPs.
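A hedged sketch of the build step (the location of model.py and the python3 invocation are assumptions; only the folder names come from this article):
cd /mini_project/logstash
python3 model.py
mv blockedIp.txt /share1/BlockedIP/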
Model.py
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load the Logstash-generated CSV and name its columns
ds = pd.read_csv('/mini_project/logstash/web-server-logs.csv', header=None)
ds.columns = ['IP', 'Status_code', 'Bytes_trf', 'TimeStamp']
ds = ds[['IP', 'Status_code', 'TimeStamp']]
ds.info()

# Count how many times each (IP, status code) pair appears
ds = ds[['IP', 'Status_code']]
ds = ds.groupby(['IP', 'Status_code']).Status_code.agg('count').to_frame('Count').reset_index()
ds = ds.sort_values(['Count'], ascending=False)
ds.head(20)

# Train K-means on the numeric features only (drop the IP column)
training_data = ds.drop(['IP'], axis=1)
training_data.head(20)
sc = StandardScaler()
data_scaled = sc.fit_transform(training_data)
model = KMeans(n_clusters=2)
pred = model.fit_predict(data_scaled)

# Attach the cluster label to every (IP, status code) row
final_dataset = ds
final_dataset["Cluster"] = pred
final_dataset.head()

# The cluster containing the high-hit-count rows is treated as the attacker cluster
a = []
for index, row in ds.iterrows():
    if row['Count'] > 500:
        a.append(row['Cluster'])
blockedCluster = max(set(a), key=a.count)

# Collect every IP that falls into the attacker cluster
blockedIP = []
for index, row in ds.iterrows():
    if row['Cluster'] == blockedCluster:
        blockedIP.append(row['IP'])
for b_ip in blockedIP:
    print("Blocked IP: {}".format(b_ip))

# Write the unique suspicious IPs to blockedIp.txt
blockedIP = set(blockedIP)
with open('blockedIp.txt', 'w') as filehandle:
    for listitem in blockedIP:
        filehandle.write('%s\n' % listitem)
In this code we have used the K-means clustering algorithm. After multiple tests we found that, for the number of hits we observed, it is good enough for now to keep the number of clusters fixed at 2.
Once the model is trained we use it to create the blockedIp.txt file containing the blocked IPs.
Step 4: Now we come to our last job (this job is restricted to run on the web-server node only). Via this job we simply ask the internal firewall running on the web server to block the suspicious IPs.
IPBlock.py
from os import system

# Read every suspicious IP written by the previous job and add a DROP rule for it
with open("/share1/blockedIP/blockedIp.txt", "r") as file1:
    for i in file1:
        system('iptables -A INPUT -s {} -j DROP'.format(i.rstrip('\n')))
        system('service iptables save')
After this, packets received from these suspicious IPs will be dropped and will never reach the server.
Step 5: Now let's create the view in Jenkins for the build pipeline.
This works because for each job we have properly configured which job is its upstream job and which is its downstream job.
Step 6: Now, from my Windows machine, with the help of a bash script, we will hit the web server with multiple fake/dummy requests and try to increase the load on the server. Before or during this process many genuine clients might also have hit the web server.
#!/bin/bash
while true
do
    curl http://13.232.174.175/gaurav.html
    curl http://13.232.174.175/gaurav2001.html
    curl http://13.232.174.175/jack.html
    curl http://13.232.174.175/nehal.html
    curl http://13.232.174.175/asdasfadfsf.html
    curl http://13.232.174.175/asdadw.html
done
IP address of attacker:
Now let's trigger the pipeline and see the console outputs of the jobs.
Move logs to nfs folder:
launch logstash:
Find BlockedIP:
From here we can see it has detected the Windows machine's IP as a suspicious IP.
block IP's:
Now, as the firewall rules are saved, let's check the connectivity from the Windows machine.
Now the Windows machine can't connect.
We can also check the rules by going inside the web server.
At the same time we also tried to connect to the web server from a different device, and it works absolutely fine.
Result:
Conclusion
In this project we successfully implemented the K-means clustering algorithm for suspicious-IP detection. We did the data processing using Logstash, for which we set up Docker, launched Logstash on it and created our own custom configuration file. We used Jenkins to automate everything: with a single click all the jobs run in the order in which the pipeline was designed. Finally, the implementation also gave a good response.