Active Directory Security Assessment: Exploring Unchartered Territories in Bloodhound With Graph Data Science (GDS)
Jean-Michel BESNARD
Partner - Cybersecurity Audit & Advisory at Grant Thornton France Creator of AD Miner (Active Directory Attack Paths Management)
TL;DR
Calculating complex attack paths on large datasets can sometimes be impractical with Neo4j due to the significant CPU power and time required. This article introduces a journey into optimizations and a final technique that relies on the GDS plugin, reducing time by a ratio of 1500!
Background and context
Applying graph theory to assess the security weaknesses of Microsoft Active Directory is highly effective. The concept has been implemented in a powerful tool called Bloodhound.
However, with large datasets, verifying specific control paths can become computationally impractical. As a result, certain areas may remain unexplored, unless you're able to wait hours or even days for a cypher query to eventually finish executing.
Indeed, you may have found yourself in a situation where you wanted to run complex paths (e.g., check for Shadow Credentials paths from/to all eligible objects of and AD forest) but ultimately had to abandon the effort because the computation was taking too long.
A journey into improving time execution of cypher queries
Motivation: reducing execution time AD Miner
AD Miner is a wonderful tool that crunches data from a Bloodhound Graph Database to give you a global overview of existing weaknesses through a web-based static report, including detailed listing, dynamic graphs, key indicators history, along with risk ratings.
Per design, AD-Miner runs a lot of cypher queries (150+ as of current version) before it starts processing retrieved data to finally produce the HTML-based report. Not all 150+ queries take a lot?of time but some may require hours before completing. All in all, generating a report for a large Active Directory could take very long hours or even days, which was obviously not practical.
Root cause of the problem: how Neo4j is employing CPU resources
When executing a Cypher query, Neo4j uses only a single CPU core, even when multiple cores are available, thus underutilizing the system's full potential.
Introducing poor man's multiprocessing
To overcome this design, the AD Miner team came up with a quick and simple (yet effective) solution: breaking down complex queries into a series of smaller, more manageable batches that can be executed concurrently. This approach maximizes CPU core utilization and improves overall execution time.
To illustrate this concept, let's consider the following scenario: you wish to compute the paths from all users (e.g., 30 000 users) to your Domain Admin group.
So, in graph theory, this means 30 000 source nodes towards 1 node, across millions of relationship.
Instead of computing from all 30 000 source nodes at once, we could?:
This implementation offers a way of utilizing all cores of the CPU, thus dividing the execution by the number of cores (minus some obvious overhead).
While busy doing the implementation, we decided to push the concept even further by adding the capability to use of a Graph Database cluster.
领英推荐
As a result, the time needed to execute heavy queries has reduced dramatically. For instance, a query that used to run for 10 hours would complete in just 20 minutes on a 32-core CPU (and even less if executed on a cluster of computers).
The endless need for more computation capabilities
Hitting the wall again
As we could then handle very heavy computations in a timely manner, multiprocessing opened the way to featuring AD Miner with even more complex/CPU-intensive queries that had earlier been abandoned due to their impracticality.
All this proved to be really nice until we got too ambitious and hit the wall because, even with multiprocessing in place, some new scenarios were, once again, pretty much impossible to handle.
Here comes GDS and its unexplainable performance boost
In February 2024, I published an article showcasing the implementation of Smartest Path using Neo4j's Graph Data Science (GDS) plugin. The general idea was to apply weights to relationships in order to influence the computation of paths based on operational cost of exploitation (and not on the number of hops, which is a rather absurd technique).
Switching to GDS proved to be a bit tricky, notably because :
As such, a significant work has been done to implement this in AD Miner and to overcome the above-mentioned specificities.
The need to create graph projections for each query meant that, with multiprocessing, we would need to create individual projections for each thread which we did not want to add to the burden while doing the experimentation. Being unable to set a limit to path depth, we thought that it would add to the complexity (and computation time).
And we were wrong ! To our great surprise, path computation with GDS - even without multiprocessing - turned out to be even more efficient and by a long shot.
As an example, The following cypher query seeks enabled user or group objects that can perform Shadow Credentials towards any other user (note that, in this example, the MemberOf edge is missing to lower the cost of computation and the depth is limited to 3 hops for the same reason).
CALL {
MATCH (s:User{enabled:true})
RETURN s
UNION ALL
MATCH (s:Group)
RETURN s
}
WITH s
MATCH p=shortestPath((s)-[r:AddKeyCredentialLink|WriteProperty|GenericAll|GenericWrite|Owns|WriteDacl*1..3]->(t:User{enabled:true}))
WHERE s <> t
RETURN p
With the built-in shortestPath of Neo4j on a 30K users forest, this cypher takes 50 hours (i.e., over 2 days !) to complete.
With multiprocessing in place on a 32-cores CPU, the computation comes down to 2 hours, which is already a big step up.
Using GDS, to perform the same query, a graph projection must first be created.
CALL gds.graph.project.cypher(
'graph_users_shadow_credentials_to_non_admins',
'MATCH (n) RETURN id(n) AS id',
'MATCH (n)-[r:AddKeyCredentialLink|Write│Property|GenericAll|GenericWrite|Owns|WriteDacl]->(m)
RETURN id(m) as source, id(n) AS target, r.cost as cost', {validateRelationships: false})
Once that is done, the initial query (i.e., for Shadow Credentials) would translate to this:
MATCH (target:User{enabled:true})
CALL gds.allShortestPaths.dijkstra.stream('graph_users_shadow_credentials_to_non_admins', {sourceNode: target, relationshipWeightProperty: 'cost', logProgress: false})
YIELD path
WITH nodes(path)[-1] AS starting_node, path
WHERE starting_node <> target
AND (
(starting_node:User AND starting_node.enabled)
OR
starting_node:Group
)
RETURN path as p
While the previous query (pure Neo4j) was limited to 3 hops, GDS does not offer limitation capabilities so it will continue until all possibilities have been exhausted, meaning that we are computing something even more complicated. However, to our surprise, the query returned after only 2 minutes ! (returning the same results as the pure neo4j queries that look and endless time).
Conclusion
While the multi-processing workaround first came as a huge improvement, it actually turns out that using GDS brings a most incredible speed boost.
Using GDS straight from Bloodhound CE web interface is fairly complicated and almost unusable notably due to the loss of relationships types information. However, AD Miner has your back as we are in the process of transitioning existing cypher queries (and new ones) from built-in Neo4j to GDS.
So, if you are familiar with AD Miner, you should see spectacular speed improvements in the next version to be released in October 2024.
Co-Founder & CEO @ Saporo
4 个月Or you can just avoid neo4j ?? thanks for sharing
Offsec Team Leader - Login Sécurité associate
5 个月Looks amazing Well done, and thanks for AD_miner ??
Ingénieur en test d'intrusion
5 个月Impressive engineering ????
Offensive Security @Vorwerk / @Cerberus-Security
5 个月Now that is sexy and sounds promissing.