Integration of ML + Sec + Ops
ABHISHEK KUMAR SINGH
Software Developer at VectoScalar Technologies Private Limited
Machine Learning helps companies make sense of the security threats their organizations encounter and helps staff focus on strategic, valuable tasks. More enterprises are incorporating Machine Learning to make existing and future products and solutions more secure.
Use of Machine Learning for IT Security:-
- The demand for using Machine Learning to interpret events and actions from different sources to determine what is safe and what is not is on the rise.
- The idea of using Machine Learning to detect malicious activities and prevent online attacks is fascinating.
- We use algorithms to spot attacks in the network within seconds and mitigate the threat before it causes any harm to the company. This will be the first line of defence for the organization and its customers.
- We can also use Machine Learning to automate the company’s repetitive security activities. This allows employees to increase productivity by focusing on core activities.
- The higher-ups will rely on you to eliminate the need for new hires that take care of low-value and repetitive decision-making tasks. Thus, you will be indispensable to the company.
- Machine Learning can also help close zero-day vulnerabilities. We can monitor online traffic to identify zero-day exploits and provide insight to the organization, so that it can patch vulnerabilities and prevent exploits before they cause a data breach.
About the task:-
Create an automated system which will be useful for a server in terms of the following features:-
1. The system keeps a log of information about each client hit or request to the server. For example, the log files of a web server are available at /var/log/httpd/.
2. This log data is used to find unusual patterns in client requests, for example a client sending requests repeatedly. For this purpose we can use clustering to group the different patterns of client requests and identify which cluster can cause security and performance issues on the server.
3. If an unusual pattern is found, we can use Jenkins to perform a certain task; for example, it can run a command to block the IP that is causing the trouble.
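As a rough sketch of steps 1 and 2, the snippet below pulls the client address out of each access-log line and counts requests per address. It assumes the log follows Apache's Common Log Format; the regular expression, the `count_requests` helper and the sample lines are illustrative and not taken from the original project.

```python
import re
from collections import Counter

# Matches the start of a Common Log Format line: client address, identity,
# user, then the bracketed timestamp. Assumes the default Apache layout.
LOG_LINE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\]')


def count_requests(lines):
    """Return a Counter mapping each client address to its number of requests."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # skip lines that do not follow the expected layout
            counts[match.group('ip')] += 1
    return counts


# Hypothetical sample lines, made up for illustration only.
sample = [
    '10.0.0.5 - - [10/Oct/2020:13:55:36 +0000] "GET / HTTP/1.1" 200 2326',
    '10.0.0.5 - - [10/Oct/2020:13:55:37 +0000] "GET / HTTP/1.1" 200 2326',
    '10.0.0.9 - - [10/Oct/2020:13:55:40 +0000] "GET /about HTTP/1.1" 200 512',
    'not a log line',
]
counts = count_requests(sample)
print(counts.most_common())
```

In a real run, the `lines` argument could simply be an open file handle on the access log, since the function only iterates over it.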
Now let's begin with the task...
I am going to use a server log file which I got from the internet, as we need a real and large dataset to perform our task.
This log file is stored at /var/log/httpd in my RHEL8 VM. It contains complete information about each IP, along with the time and content of each request. Below is a sample image of my log dataset:
If you look at this file closely, you will see that not all the fields are required for clustering. We only want to block the IPs that are attempting a DoS (Denial of Service) attack on our httpd server, so the IPs and their frequencies are enough for us to find an unusual pattern.
So, as is clear from the image, we keep the IP addresses, from which we derive their frequencies, and remove the rest of the fields from the log file.
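```python
import matplotlib.pyplot as plt
import pandas as pd

# Load the access log (exported to CSV) and keep only the columns we need.
df = pd.read_csv("C:\\Users\\AKS\\Desktop\\MLOps-workspace\\access_log.csv")
df = df[['IP', 'Time']]

# Strip the stray brackets, spaces and dashes around some IP values,
# then drop rows whose IP field is not a dotted IPv4 address.
df['IP'] = df['IP'].astype(str).str.strip("[ -]")
df = df[df['IP'].str.match(r'^\d{1,3}(\.\d{1,3}){3}$')]

# Count how many requests each IP address made.
freq = df['IP'].value_counts()


def str_ip2_int(s_ip):
    """Convert a dotted IPv4 string to its 32-bit integer value."""
    lst = [int(item) for item in s_ip.split('.')]
    return lst[3] | lst[2] << 8 | lst[1] << 16 | lst[0] << 24


# One row per IP: the numeric form of the address and its request count.
df = pd.DataFrame({
    'IP': [str_ip2_int(ip) for ip in freq.index],
    'freq': freq.values,
})

plt.scatter(df['IP'], df['freq'])
plt.show()
```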
The above code performs feature selection, converts the features from strings to numeric values (as our ML model needs numeric input), and finally plots a scatter graph showing the frequency of each IP address.
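To make the string-to-number conversion concrete, here is a small worked example. It assumes IPv4 addresses only and checks the bit-shift helper against Python's standard `ipaddress` module, which can also turn the integer back into dotted form; the sample address is made up.

```python
import ipaddress


def str_ip2_int(s_ip):
    """Pack the four octets of a dotted IPv4 string into one 32-bit integer."""
    parts = [int(p) for p in s_ip.split('.')]
    return parts[3] | parts[2] << 8 | parts[1] << 16 | parts[0] << 24


example = "192.168.1.10"  # illustrative address, not from the dataset

as_int = str_ip2_int(example)
# 192 * 2**24 + 168 * 2**16 + 1 * 2**8 + 10
print(as_int)  # 3232235786

# The standard library gives the same number...
via_stdlib = int(ipaddress.IPv4Address(example))
# ...and converts the integer back to the dotted string.
round_trip = str(ipaddress.ip_address(as_int))
print(via_stdlib == as_int, round_trip)
```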
Now I have used KMeans clustering to identify the IPs carrying out a DoS attack on our HTTP server.
kmeans.cluster_centers_
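The clustering code itself is not reproduced here, so the following is only a minimal stand-in: a pure-Python one-dimensional k-means (Lloyd's algorithm) over per-IP request counts, not the library KMeans model the article uses. The `kmeans_1d` helper, the choice of two clusters and the sample counts are all assumptions made for illustration.

```python
def kmeans_1d(values, k=2, iterations=100):
    """Cluster 1-D values with Lloyd's algorithm (k >= 2).

    Returns (centers, labels), where labels[i] is the cluster of values[i].
    """
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [float(ordered[(len(ordered) - 1) * i // (k - 1)]) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, labels


# Hypothetical request counts per IP, made up for illustration.
counts = {
    '10.0.0.1': 12,
    '10.0.0.2': 8,
    '10.0.0.3': 15,
    '10.0.0.4': 420,
    '10.0.0.5': 390,
}
centers, labels = kmeans_1d(list(counts.values()), k=2)

# Treat the cluster with the highest center as the suspicious one.
heavy = max(range(len(centers)), key=lambda c: centers[c])
suspects = [ip for ip, lab in zip(counts, labels) if lab == heavy]
print(centers, suspects)
```

Picking the cluster with the largest center is a design choice for this sketch; with real traffic the number of clusters and the rule for flagging one would need tuning.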
Now I have stored the IPs carrying out the DoS attack in a text file, bd_ip.txt, so that Jenkins can use it later to block them.
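```python
import ipaddress

import matplotlib.pyplot as plt

# Request-count cutoff at or above which an IP is treated as a DoS source.
THRESHOLD = 300

# Open the block list once; 'a+' appends, so entries accumulate across runs.
with open('bd_ip.txt', 'a+') as file:
    for ip_int, count in zip(df['IP'], df['freq']):
        if int(count) >= THRESHOLD:
            # Convert the integer back to dotted form for the block list.
            ip = str(ipaddress.ip_address(int(ip_int)))
            file.write(ip + "\n")
            plt.scatter(ip_int, count, c='red')
        else:
            plt.scatter(ip_int, count, c='green')
plt.show()
```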
In the above image, the red dots in the scatter graph clearly show the IPs attempting a DoS attack.
Now moving on to the Jenkins part:-
The first job (KMeans) runs the Python program that creates the KMeans model, which filters out all the IPs attempting a DoS attack.
The second job (Block_ip) reads the bd_ip.txt file and blocks all the IPs listed in it.
The IPs listed in bd_ip.txt are thus blocked by Jenkins, which is the goal of our task.
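The article does not show the command that the Block_ip job runs, so the sketch below is one possible approach rather than the actual job. It assumes one address per line in bd_ip.txt, skips duplicates and invalid entries, and builds `iptables` DROP rules as argument lists without executing them; `build_block_commands` is a hypothetical helper.

```python
import ipaddress
import os
import tempfile


def build_block_commands(path):
    """Read one IP per line from `path` and return iptables commands as argument lists."""
    commands = []
    seen = set()
    with open(path) as handle:
        for line in handle:
            text = line.strip()
            if not text or text in seen:
                continue  # ignore blank lines and repeated addresses
            try:
                ipaddress.ip_address(text)  # raises ValueError on bad input
            except ValueError:
                continue
            seen.add(text)
            commands.append(['iptables', '-A', 'INPUT', '-s', text, '-j', 'DROP'])
    return commands


# Demo with a throwaway file holding made-up documentation addresses.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'bd_ip.txt')
    with open(demo_path, 'w') as demo:
        demo.write('203.0.113.7\n203.0.113.7\nnot-an-ip\n198.51.100.23\n')
    commands = build_block_commands(demo_path)

for cmd in commands:
    print(' '.join(cmd))
```

To apply the rules, each list could be passed to `subprocess.run(cmd, check=True)` from a job running with root privileges; on RHEL 8 a `firewall-cmd` rule would be an alternative.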
GitHub link for the reference:-
Thanks for reading.....
I will be happy to receive any suggestions or queries regarding this article...