The Growing Role of Machine Learning in Cybersecurity

Today, it is impossible to deploy effective cybersecurity technology without relying heavily on machine learning. At the same time, it’s impossible to effectively deploy machine learning without a comprehensive, rich, and complete approach to the underlying data.

Why has machine learning become so critical to cybersecurity?

With machine learning, cybersecurity systems can analyze patterns and learn from them to help prevent similar attacks and respond to changing behavior. Machine learning can help cybersecurity teams be more proactive in preventing threats and in responding to active attacks in real time. It can reduce the time spent on routine tasks and let organizations use their resources more strategically. In short, machine learning can make cybersecurity simpler, more proactive, less expensive, and far more effective. But it can only do those things if the underlying data that supports it provides a complete picture of the environment. As they say, garbage in, garbage out.

Why is focusing on data critical to the success of machine learning in cybersecurity?

Machine learning is about developing patterns and manipulating those patterns with algorithms. To develop patterns, you need a lot of rich data from everywhere, because the data needs to represent as many potential outcomes from as many potential scenarios as possible. It is not just about the quantity of data; it is also about the quality. The data must have complete, relevant, and rich context collected from every potential source, whether that is at the endpoint, on the network, or in the cloud. You also have to clean the data so that you can make sense of what you capture and define outcomes.

Rapidly synthesize large volumes of data: One of the biggest challenges faced by analysts is the need to rapidly synthesize intelligence generated across their attack surface, which is typically generated much faster than their teams can manually process. Machine learning can quickly analyze large volumes of historical and dynamic intelligence, enabling teams to operationalize data from various sources in near real-time.

Activate expert intelligence at scale: Regular training cycles enable models to continuously learn from their evolving sample population, which includes analyst-labeled detections or analyst-reviewed alerts. This prevents recurring false positives and enables models to learn and enforce expert-generated ground truth.

Automate repetitive, manual tasks: Applying machine learning to specific tasks can relieve security teams of mundane, repetitive work, acting as a force multiplier that enables them to scale their response to incoming alerts and redirect time and resources toward complex, strategic projects.

Augment analyst efficiency: Machine learning can augment analyst insight with real-time, up-to-date intelligence, enabling analysts across threat hunting and security operations to effectively prioritize resources to address their organization’s critical vulnerabilities and investigate time-sensitive ML-alerted detections.

Autonomous threat detection and response

In this category, machine learning enables organizations to automate manual work, especially in processes where it is critical to maintain high accuracy and to respond at machine speed, such as automatic threat detection and response or classifying new adversary patterns.

Applying machine learning in these scenarios augments signature-based methods of threat detection with a generalized approach that learns the differences between benign and malicious samples and can rapidly detect new in-the-wild threats.
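To make the contrast concrete, here is a toy sketch, not any vendor's actual method. It compares exact-hash signature matching with a very small learned model (a nearest-centroid classifier). The two features, the training samples, and every function name are assumptions chosen for illustration; production malware classifiers use far richer features and models.

```python
import hashlib
import math
from collections import Counter

def features(sample: bytes) -> tuple[float, float]:
    """Two illustrative features: normalized byte entropy and printable-byte ratio."""
    counts = Counter(sample)
    total = len(sample)  # assumes a non-empty sample
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    printable = sum(1 for b in sample if 32 <= b <= 126) / total
    return entropy / 8.0, printable

def centroid(samples: list[bytes]) -> tuple[float, float]:
    """Mean feature vector of a labeled group of samples."""
    vectors = [features(s) for s in samples]
    return (sum(v[0] for v in vectors) / len(vectors),
            sum(v[1] for v in vectors) / len(vectors))

# Hypothetical training data: text-like benign samples, packed-looking malicious ones.
benign_train = [b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 8,
                b"user=alice action=login status=ok\n" * 16]
malicious_train = [bytes((i * 37 + 11) % 256 for i in range(512)),
                   bytes((i * 73 + 5) % 256 for i in range(512))]

# Signature approach: exact hashes of samples that have already been seen.
signatures = {hashlib.sha256(s).hexdigest() for s in malicious_train}

def signature_match(sample: bytes) -> bool:
    return hashlib.sha256(sample).hexdigest() in signatures

# Learned approach: one centroid per class, fitted from the labeled samples.
centroids = {"benign": centroid(benign_train),
             "malicious": centroid(malicious_train)}

def classify(sample: bytes) -> str:
    """Assign the class whose centroid is nearest in feature space."""
    f = features(sample)
    return min(centroids, key=lambda label: math.dist(f, centroids[label]))

# A one-byte change to a known malicious sample: new hash, near-identical features.
variant = malicious_train[0] + b"\x00"
print(signature_match(malicious_train[0]), signature_match(variant))
print(classify(variant))
```

The appended byte changes the hash, so the signature lookup misses the variant, while its features barely move and the learned model still places it in the malicious class. That is the sense in which a generalized approach augments signatures rather than replacing them.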

Driving analyst efficiency with machine learning

Machine learning models can also assist in analyst-led investigations by alerting teams to investigate detections or by providing prioritized vulnerabilities for patching. Analyst review can be especially valuable in scenarios where there is insufficient data for models to predict outcomes with high degrees of confidence or to investigate benign-appearing behavior that may go unalerted by malware classifiers.

Model efficiency for malware classifiers:

One of the most common applications of ML in cybersecurity is malware classification. Malware classifiers output a scored prediction of whether a given sample is malicious, where “scored” refers to the confidence level associated with the resulting classification. One way we assess the performance of these models is by representing predictions along two axes: accuracy (whether an outcome was correctly classified; “true” or “false”) and output (the class a model assigns to a sample; “positive” or “negative”). A “positive” detection means the model predicts that a given sample is malicious, based on features it has learned to associate with known malicious samples.
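The two axes described above combine into four outcomes: true positive, false positive, true negative, and false negative. The sketch below counts them for a handful of scored predictions. The scores, labels, and the 0.5 threshold are made-up values for illustration only.

```python
def confusion_counts(scores, labels, threshold=0.5):
    """Count the four outcomes for scored predictions.

    scores: model confidence that each sample is malicious (0.0 to 1.0)
    labels: ground truth, True if the sample really is malicious
    """
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for score, is_malicious in zip(scores, labels):
        predicted_positive = score >= threshold   # output axis: positive or negative
        correct = predicted_positive == is_malicious  # accuracy axis: true or false
        key = ("T" if correct else "F") + ("P" if predicted_positive else "N")
        counts[key] += 1
    return counts

# Hypothetical scored predictions and their ground-truth labels.
scores = [0.97, 0.88, 0.62, 0.40, 0.15, 0.05]
labels = [True, True, False, True, False, False]

counts = confusion_counts(scores, labels)
precision = counts["TP"] / (counts["TP"] + counts["FP"])
recall = counts["TP"] / (counts["TP"] + counts["FN"])
print(counts)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Raising the threshold generally trades false positives for false negatives, which is why the confidence score matters as much as the class itself.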

Unparalleled intelligence of the CrowdStrike Security Cloud:

CrowdStrike’s models are trained on the rich telemetry of the CrowdStrike Security Cloud, which correlates trillions of data points across CrowdStrike’s asset graph, intel graph, and patented Threat Graph to deliver unparalleled visibility and perpetually refine threat intelligence across an organization’s attack surface.

Augmenting human expertise:

CrowdStrike’s models fuel autonomous threat detection and response while also augmenting human expertise in expert-led domains, such as threat hunting and IT and security operations. Machine learning models across the Falcon platform operate to deliver a next-generation analyst workbench that automates detection and response, maximizes analyst efficiency with high-fidelity machine learning-alerted detections, and provides intelligent vulnerability management recommendations for proactive defense.

Multiple layers of defense:

CrowdStrike applies machine learning throughout the Falcon platform to offer a robust, multi-layered defense across the process lifecycle (pre-execution, runtime, and post-execution). Pre-execution, on-sensor, and cloud-based machine learning models operate synchronously to automatically detect and respond to threats, equipping the lightweight Falcon agent with a robust first line of defense. The constant synchronicity between cloud and on-sensor machine learning models enables detections made on-sensor to be globally enforced across an attack surface and, similarly, enables detections made by cloud-based models to be instantly enforced across all protected endpoints. To augment this approach, CrowdStrike also applies advanced behavioral analysis at runtime, using cloud-based models to analyze endpoint events to classify indicators of attack (IOAs). AI-powered IOAs proactively detect emerging threats regardless of malware or tools used and operate asynchronously to on-sensor models to trigger local analysis of suspicious behavior based on real-time threat intelligence.

Challenge 1: Much higher accuracy requirements. For example, if you are just doing image processing and the system mistakes a dog for a cat, that might be annoying but is unlikely to have a life-or-death impact. If a machine learning system mistakes a malicious data packet for a legitimate one, and that leads to an attack against a hospital and its devices, the impact of the miscategorization can be severe.

Every day, organizations see large volumes of data packets traverse firewalls. Even if machine learning miscategorizes only 0.1% of that data, it can wrongly block huge amounts of normal traffic, which would severely impact the business. It is understandable that in the early days of machine learning, some organizations were concerned that the models would not be as accurate as human security researchers. It takes time, and huge amounts of data, to train a machine learning model to the same level of accuracy as a skilled human. Humans, however, do not scale and are among the scarcest resources in IT today. ML can also help detect unknown attacks that are hard for humans to spot, because it can build up baseline behaviors and flag activity that deviates from them.
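A quick back-of-the-envelope calculation shows why a seemingly small error rate matters at firewall scale. The daily packet volume below is a hypothetical figure chosen for illustration, not a measurement from any real network.

```python
from fractions import Fraction

def misclassified_per_day(daily_packets: int, error_rate: Fraction) -> int:
    """Expected number of miscategorized packets per day at a given error rate."""
    return int(daily_packets * error_rate)

# Assumed volume: one billion packets per day (hypothetical).
daily_packets = 1_000_000_000

# Compare the 0.1% figure from the text with two stricter error rates.
for rate in (Fraction(1, 1_000), Fraction(1, 100_000), Fraction(1, 10_000_000)):
    errors = misclassified_per_day(daily_packets, rate)
    print(f"error rate {rate}: {errors:,} packets per day")
```

At the assumed volume, a 0.1% error rate works out to one million miscategorized packets every day, which is why security models are held to a much tighter tolerance than most other ML applications.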

Challenge 2: Access to large amounts of training data, especially labeled data. Machine learning requires a large amount of data to make models and predictions more accurate. Obtaining malware samples is much harder than acquiring data for image processing or NLP. There is not enough attack data, and much security risk data is sensitive and unavailable because of privacy concerns.

Challenge 3: The ground truth. Unlike with images, the ground truth in cybersecurity might not always be available or fixed. The cybersecurity landscape is dynamic and changing all the time. No single malware database can claim to cover all the malware in the world, and more malware is being generated at any moment. What ground truth should we compare against to measure accuracy?

Identification and profiling: With new devices getting connected to enterprise networks all the time, it’s not easy for an IT organization to be aware of them all. Machine learning can be used to identify and profile devices on a network. That profile can determine the different features and behaviors of a given device.

Automated anomaly detection: Using machine learning to rapidly identify known bad behaviors is a great use case for security. After first profiling devices and understanding regular activities, machine learning knows what is normal and what is not.

Zero-day detection: With traditional security, a bad action has to be seen at least once for it to be identified as a bad action. That’s the way that legacy signature-based malware detection works. Machine learning can intelligently identify previously unknown forms of malware and attacks to help protect organizations from potential zero-day attacks.

Insights at scale: With data and applications in many different locations, being able to identify trends across large volumes of devices is just not humanly possible. Machine learning can do what humans cannot, enabling automation for insights at scale.

Policy recommendations: The process of building security policies is often a very manual effort that has no shortage of challenges. With an understanding of what devices are present and what is normal behavior, machine learning can help to provide policy recommendations for security devices, including firewalls. Instead of having to manually navigate around different conflicting access control lists for different devices and network segments, machine learning can make specific recommendations that work in an automated approach.
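The profiling and anomaly detection use cases above can be sketched in a few lines. This is a minimal illustration using a simple statistical baseline (mean and standard deviation with a z-score cutoff) as a stand-in for a trained model. The device names, hourly connection counts, and the threshold of 3.0 are all hypothetical.

```python
from statistics import mean, pstdev

def build_profiles(history):
    """Profile each device as (mean, standard deviation) of its past activity.

    history: {device_name: [connections per hour, ...]}
    """
    return {device: (mean(counts), pstdev(counts))
            for device, counts in history.items()}

def is_anomalous(profile, observed, z_threshold=3.0):
    """Flag an observation that deviates too far from the device's baseline."""
    mu, sigma = profile
    if sigma == 0:
        return observed != mu  # a perfectly constant baseline: any change is unusual
    return abs(observed - mu) / sigma > z_threshold

# Hypothetical history: a camera with steady traffic, a laptop with bursty traffic.
history = {
    "ip-camera-01": [40, 42, 38, 41, 39, 40, 43, 37],
    "laptop-17": [120, 300, 80, 250, 190, 60, 220, 140],
}
profiles = build_profiles(history)

print(is_anomalous(profiles["ip-camera-01"], 41))   # close to the camera's baseline
print(is_anomalous(profiles["ip-camera-01"], 350))  # far outside the camera's baseline
print(is_anomalous(profiles["laptop-17"], 350))     # high, but within the laptop's usual spread
```

The same observation (350 connections in an hour) is anomalous for the camera but not for the laptop, which is the point of profiling first: what counts as "normal" depends on the device.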


