Threat Detection: A Visual Model
Disclaimer: This post represents my own opinions, not those of my employer.
Introduction
One of the major tasks in InfoSec is the process of trying to identify and fill gaps in our security defenses before attackers can find and exploit them. This task is the primary focus of Detection Engineering. Much has already been written to define Detection Engineering as a discipline, but there are many open questions on the best ways to do it. That's where I'm hoping to offer some insights.
In this post, I'm going to set up a model for thinking about threat detection and then use it to answer two fundamental questions: Where is the best place to focus detection engineering efforts to maximize impact? And how do we evaluate the quality of a detection, that is, what makes one detection better or worse than another?
The Model
First, we'll build a visual model, starting with MITRE's ATT&CK framework for Windows.
We're going to represent each technique as a single dot.
Then, if we've done a good job defining our attack techniques, an attacker's path through the network from initial access to objective could be represented as a path between dots. Obviously, an attacker doesn't have to use a technique from EVERY tactic, but they do have to use SOME.
Using this model, the goal of threat detection is to build mechanisms to prevent and/or detect as many techniques as possible, so an attacker can't get from initial access to objective without triggering alarms. At this point, it becomes a game of probability: how probable is it that an attacker will take a path through the network that doesn't alert you to their presence?
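To make the probability game concrete, here is a minimal sketch of the idea, with entirely made-up per-tactic coverage numbers and the simplifying assumptions that the attacker uses exactly one technique per tactic and that detections fire independently:

```python
# Probability that an attacker traverses every tactic without triggering
# a detection. Coverage fractions below are hypothetical illustrations,
# not measurements of any real environment.
coverage = {
    "initial-access":   0.6,   # fraction of techniques we detect in this tactic
    "execution":        0.5,
    "persistence":      0.4,
    "lateral-movement": 0.3,
    "exfiltration":     0.5,
}

p_evade = 1.0
for tactic, c in coverage.items():
    # At each stage the attacker must pick an undetected technique.
    p_evade *= (1.0 - c)

print(f"P(attacker evades all detections) = {p_evade:.3f}")
```

Even with modest per-tactic coverage, the attacker's odds of a fully silent path shrink quickly, because the complements multiply across every stage they must traverse.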
Let's make this simple model just a little more realistic. Many techniques have numerous sub-techniques. There are 193 techniques and 401 sub-techniques in ATT&CK v12. We're going to approximate that by turning some dots into clusters of dots, so that each dot now represents a sub-technique.
Then, let's consider that each sub-technique may have multiple distinct procedures for how it can be executed. Jared Atkinson provides a very compelling demonstration of this idea in an excellent blog series where he refines the definition of a "procedure." He demonstrates with an example where he identifies 4 distinct procedures for the sub-technique "OS Credential Dumping: LSASS Memory" and graphs them into a single chart, which I'm going to call a "procedure map."
(Adding my own note to Jared's work here: a procedure map should only include the ESSENTIAL operations that MUST BE EXECUTED in order to implement the procedure.)
Adding in procedures, our model becomes a mass of dots and clusters of dots, with each dot representing a procedure and clusters representing procedures that implement the same sub-technique, or Jared's "sub-technical synonyms."
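As a toy rendering of the finished model, each dot can be represented as a leaf in a nested mapping from technique to sub-technique to procedures. The ATT&CK identifier below is real; the procedure names are my own abbreviated illustration, not Jared's exact list:

```python
# Toy rendering of the model: each leaf string is one "dot" (a procedure).
# T1003.001 is a real ATT&CK ID; the procedure descriptions are illustrative.
model = {
    "T1003 OS Credential Dumping": {
        "T1003.001 LSASS Memory": [
            "open LSASS and read its memory directly",
            "create a memory dump of the LSASS process",
            "duplicate an existing handle to LSASS",
            "read LSASS memory from kernel space via a driver",
        ],
    },
}

# Flatten the mapping to count the dots in this slice of the model.
dots = [proc for technique in model.values()
             for procs in technique.values()
             for proc in procs]
print(len(dots), "procedures (dots) in this slice of the model")
```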
Answering the Questions
Using this model, let's discuss what makes the best detection. The natural answer is "the detection that comprehensively covers the most dots": the more dots a detection covers, the more likely an attacker's path through the network will traverse one of them. From a logistical perspective, though, it's impractical to detect unrelated procedures in a single detection. (It's hard enough to maintain simple detections; who needs complex ones?) So if we can find closely related procedures that can be covered by a single detection, or by a small set of related detections, that's where we get the most impact. Luckily, this grouping of related procedures is already done for us: a sub-technique is often exactly such a cluster of related procedures. So the theoretically ideal detection is the one that comprehensively covers all the procedures of a single sub-technique.
The detection engineering task is to find the best detection possible given the sub-technique's procedure map, available telemetry, and environmental noise. We're going to borrow Jared's procedure map again to dive in a little deeper.
These are the best case scenarios. In real life, we are often only able to detect some of the procedures, or even some portion of some procedures. The rest are a known gap.
Now let's discuss the worst case scenarios.
The first is one where all we can do is detect the tangential elements (brown on the graph) of a procedure, like the command line parameters used by a specific tool that implements one of the procedures. When we focus on tangential elements, there are almost infinite paths through the procedure map. (An attacker can create a new path by writing a new tool, scripting the procedure, using command-line obfuscation, or loading and executing the tool in memory, to name just a few options.)
Detecting tangentials shifts the probability game to strongly favor the attacker. This shouldn't be done until all better options have been exhausted. (And yet, much of the publicly available threat detection content is of this nature!)
The next common detection pattern that falls in the worst case category is the one that looks for tuples of procedures. For example, the detection might look for an EXE, ISO, or ZIP file written by Outlook (T1566.001 Spearphishing Attachment) that executes a PowerShell script (T1059.001 PowerShell). The problem is that this detection covers a specific 2-tuple of dots.* Any other combination, even one using some of the same procedures, won't be caught. This turns our hundreds of dots into over a hundred thousand 2-tuples! (If we had 500 procedures, there would be 124,750 possible 2-tuple combinations.) Covering that many combinations requires a lot of detections, so this clearly shifts the probability game in the attacker's favor.
* We give this hypothetical detection more credit than it deserves. This example is worse than just a procedure chain, because it doesn't comprehensively cover all the individual procedures in the chain. An attacker could traverse the exact path and still evade detection by using a different implementation of the procedure, like phishing with a different file type.
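The arithmetic behind that parenthetical is just "n choose 2":

```python
import math

n_procedures = 500                  # hypothetical total from the text
pairs = math.comb(n_procedures, 2)  # unordered 2-tuples of distinct procedures
print(pairs)  # 124750
```

If the order of the two procedures matters (phish first, then PowerShell), the count doubles to 249,500, which only strengthens the point.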
There's one more observation to extract here. Let's explore our LSASS Memory example a little further. Let's suppose we have telemetry from the Process Access operation showing the target process and requested access rights (maybe from an EDR hook on NtOpenProcess, for example). We'll add the rights requested by each tool to our graphic. (Note that reading credentials from LSASS memory only actually needs the "PROCESS_VM_READ" permission, but at least two tools overshoot and request all possible permissions.)
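As a sketch of how that telemetry could be used, a detection on the essential operation might key on the requested access mask rather than on any tool's command line. The event shape below is hypothetical; the access-right constant values are the real ones from the Windows headers (winnt.h):

```python
# Windows process access rights (winnt.h); these constant values are real.
PROCESS_VM_READ    = 0x0010    # the one right memory-reading credential theft needs
PROCESS_ALL_ACCESS = 0x1FFFFF  # what "greedy" tools request

def suspicious_lsass_access(event: dict) -> bool:
    """Hypothetical Process Access event: {'target': str, 'granted_access': int}."""
    return (event["target"].lower() == "lsass.exe"
            and event["granted_access"] & PROCESS_VM_READ != 0)

# A tool that overshoots and requests PROCESS_ALL_ACCESS still sets the
# VM_READ bit, so the same essential-operation check catches it.
print(suspicious_lsass_access({"target": "lsass.exe",
                               "granted_access": PROCESS_ALL_ACCESS}))
```

Because the check targets the essential operation (reading LSASS memory) rather than a tangential element, rewriting or obfuscating the tool doesn't create a new path around it.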
Doing the Math
Using this model, we can make a rough mathematical representation of the incremental coverage value of a given detection. Let's pretend that there are 500 total procedures in our cluster of dots (that's probably way too low, but it suffices for our purposes).
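A minimal sketch of that rough math, using the hypothetical 500-procedure total, compares the incremental coverage of a comprehensive sub-technique detection against a 2-tuple detection:

```python
import math

TOTAL = 500  # hypothetical total procedures in the model

def coverage(procedures_covered: int) -> float:
    """Fraction of all procedure dots a detection covers."""
    return procedures_covered / TOTAL

# A detection that comprehensively covers a sub-technique of, say,
# 4 procedures covers 4/500 of the dots...
subtech = coverage(4)

# ...while a 2-tuple detection covers exactly 1 of C(500, 2) possible pairs.
pair = 1 / math.comb(TOTAL, 2)

print(f"sub-technique detection: {subtech:.4%} of dots")
print(f"2-tuple detection:       {pair:.6%} of pairs")
```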
Obviously, our math is rough, but it serves to illustrate an important point: a detection that comprehensively covers a sub-technique claims a meaningful fraction of the dots, while a tuple or tangential detection covers a vanishingly small slice of the possibilities.
One Last Pattern
There is one last type of detection pattern that can be effective, which I'm going to call a "grouple" detection. This is similar to the tuple pattern, but differs in a critical way. Instead of looking at just tuples of single procedures, it's looking for tuples of entire groups or categories of procedures. For example, we might look for ANY alert under the "Initial Access" tactic, followed within a certain timeframe by ANY alert in the "Persistence" tactic.
This detection pattern has the potential to be effective, so long as our coverage of the individual procedures in each group is good. But where it particularly excels is in situations where telemetry for a given procedure is too noisy to alert on directly. We can't cover the procedure itself, but we can alert when it co-occurs with other (possibly also noisy) procedures, allowing us to build in some coverage where none is otherwise possible. That makes "grouple" detections an excellent option to cope with those inevitable cases where the environment noise is just too loud to permit a high-fidelity detection (scheduled tasks, I'm looking at you!).
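A minimal sketch of a "grouple" correlation follows. The event shape, tactic labels, and 24-hour window are all assumptions of mine for illustration:

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)  # assumed correlation window

def grouple_hits(events):
    """events: list of (timestamp, tactic) low-fidelity signals.
    Returns pairs where any Initial Access signal is followed by any
    Persistence signal within the window."""
    hits = []
    for t1, tactic1 in events:
        if tactic1 != "initial-access":
            continue
        for t2, tactic2 in events:
            if tactic2 == "persistence" and t1 <= t2 <= t1 + WINDOW:
                hits.append((t1, t2))
    return hits

signals = [
    (datetime(2023, 1, 10, 9, 0),   "initial-access"),  # e.g., noisy ISO-mount signal
    (datetime(2023, 1, 10, 11, 30), "persistence"),     # e.g., noisy scheduled-task signal
]
print(grouple_hits(signals))  # one correlated pair -> worth an alert
```

Neither signal alone is alertable, but their co-occurrence inside the window is rare enough to surface.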
The Winning Strategy
Using this visual model, I hope I've offered some compelling answers to my two fundamental questions:
Q: Where is the best place to focus detection engineering efforts to maximize impact?
A: At the sub-technique level, covering each procedure as comprehensively as possible.
Q: How do we evaluate the quality of a detection? What makes one detection better or worse than another?
A: The best detection is the one that covers the procedures of a sub-technique as comprehensively as possible. Be careful to focus on essential elements, not tangential ones!
With these answers, we can see that a great strategy for winning this game of probability is to review each sub-technique one by one and implement rules to detect (or to prevent) each procedure. Where telemetry shortcomings or environmental noise prevent a detection, we can generate an event that can be bundled into "grouple" detections.
If you can cover enough of the threat space, and maybe with a little help from Lady Luck, you can keep attackers out of your network!
Thoughts?
Hopefully this exploration of my visual model has helped clarify the challenges we detection engineers are constantly trying to tackle! If you have any thoughts to add, post a comment or let me know on Twitter!