登录查看更多内容

Active Directory Security Assessment: Exploring Unchartered Territories in Bloodhound With Graph Data Science (GDS)

Jean-Michel BESNARD

Partner - Cybersecurity Audit & Advisory at Grant Thornton France Creator of AD Miner (Active Directory Attack Paths Management)

发布日期: 2024年10月21日

TL;DR

Calculating complex attack paths on large datasets can sometimes be impractical with Neo4j due to the significant CPU power and time required. This article introduces a journey into optimizations and a final technique that relies on the GDS plugin, reducing time by a ratio of 1500!

Background and context

Applying graph theory to assess the security weaknesses of Microsoft Active Directory is highly effective. The concept has been implemented in a powerful tool called Bloodhound.

However, with large datasets, verifying specific control paths can become computationally impractical. As a result, certain areas may remain unexplored, unless you're able to wait hours or even days for a cypher query to eventually finish executing.

Desperately waiting for that query to complete

Indeed, you may have found yourself in a situation where you wanted to run complex paths (e.g., check for Shadow Credentials paths from/to all eligible objects of and AD forest) but ultimately had to abandon the effort because the computation was taking too long.

A journey into improving time execution of cypher queries

Motivation: reducing execution time AD Miner

AD Miner is a wonderful tool that crunches data from a Bloodhound Graph Database to give you a global overview of existing weaknesses through a web-based static report, including detailed listing, dynamic graphs, key indicators history, along with risk ratings.

Per design, AD-Miner runs a lot of cypher queries (150+ as of current version) before it starts processing retrieved data to finally produce the HTML-based report. Not all 150+ queries take a lot?of time but some may require hours before completing. All in all, generating a report for a large Active Directory could take very long hours or even days, which was obviously not practical.

Root cause of the problem: how Neo4j is employing CPU resources

When executing a Cypher query, Neo4j uses only a single CPU core, even when multiple cores are available, thus underutilizing the system's full potential.

Introducing poor man's multiprocessing

To overcome this design, the AD Miner team came up with a quick and simple (yet effective) solution: breaking down complex queries into a series of smaller, more manageable batches that can be executed concurrently. This approach maximizes CPU core utilization and improves overall execution time.

To illustrate this concept, let's consider the following scenario: you wish to compute the paths from all users (e.g., 30 000 users) to your Domain Admin group.

So, in graph theory, this means 30 000 source nodes towards 1 node, across millions of relationship.

Instead of computing from all 30 000 source nodes at once, we could?:

Break that cypher query into smaller ones (e.g., 1 000 queries, each handling 30 source nodes)
Run a series (one client for each CPU core) of smaller cypher query concurrently? until all have been exhausted

This implementation offers a way of utilizing all cores of the CPU, thus dividing the execution by the number of cores (minus some obvious overhead).

While busy doing the implementation, we decided to push the concept even further by adding the capability to use of a Graph Database cluster.

领英推荐

October Data Digest: Spooky Stats

Lori MacVittie 4 个月前

Elegantly designed Agentic AI seals the end of the…

Graham Mabon 1 个月前

Security, Call Centers, Realtime Analytics, Humor, Big…

John J. McLaughlin 1 年前

As a result, the time needed to execute heavy queries has reduced dramatically. For instance, a query that used to run for 10 hours would complete in just 20 minutes on a 32-core CPU (and even less if executed on a cluster of computers).

The endless need for more computation capabilities

Hitting the wall again

As we could then handle very heavy computations in a timely manner, multiprocessing opened the way to featuring AD Miner with even more complex/CPU-intensive queries that had earlier been abandoned due to their impracticality.

All this proved to be really nice until we got too ambitious and hit the wall because, even with multiprocessing in place, some new scenarios were, once again, pretty much impossible to handle.

Here comes GDS and its unexplainable performance boost

In February 2024, I published an article showcasing the implementation of Smartest Path using Neo4j's Graph Data Science (GDS) plugin. The general idea was to apply weights to relationships in order to influence the computation of paths based on operational cost of exploitation (and not on the number of hops, which is a rather absurd technique).

Switching to GDS proved to be a bit tricky, notably because :

You need to create a graph projection before you can run a query
The syntax is slightly different and not very forthcoming
You can not define a path depth limit (at least with Dijkstra)
Type of relationships are not preserved when paths are returned (i.e., you know that there is a relationship but you can't tell of which type)
Paths are computed from target to source and later need to be reversed

As such, a significant work has been done to implement this in AD Miner and to overcome the above-mentioned specificities.

The need to create graph projections for each query meant that, with multiprocessing, we would need to create individual projections for each thread which we did not want to add to the burden while doing the experimentation. Being unable to set a limit to path depth, we thought that it would add to the complexity (and computation time).

And we were wrong ! To our great surprise, path computation with GDS - even without multiprocessing - turned out to be even more efficient and by a long shot.

As an example, The following cypher query seeks enabled user or group objects that can perform Shadow Credentials towards any other user (note that, in this example, the MemberOf edge is missing to lower the cost of computation and the depth is limited to 3 hops for the same reason).

CALL {
          MATCH (s:User{enabled:true}) 
          RETURN s 
          UNION ALL 
          MATCH (s:Group)
          RETURN s
} 
WITH s 
MATCH p=shortestPath((s)-[r:AddKeyCredentialLink|WriteProperty|GenericAll|GenericWrite|Owns|WriteDacl*1..3]->(t:User{enabled:true})) 
WHERE s <> t 
RETURN p

With the built-in shortestPath of Neo4j on a 30K users forest, this cypher takes 50 hours (i.e., over 2 days !) to complete.

With multiprocessing in place on a 32-cores CPU, the computation comes down to 2 hours, which is already a big step up.

Using GDS, to perform the same query, a graph projection must first be created.

CALL gds.graph.project.cypher(
'graph_users_shadow_credentials_to_non_admins',
'MATCH (n) RETURN id(n) AS id', 
'MATCH (n)-[r:AddKeyCredentialLink|Write│Property|GenericAll|GenericWrite|Owns|WriteDacl]->(m) 
RETURN id(m) as source, id(n) AS target, r.cost as cost', {validateRelationships: false})

Once that is done, the initial query (i.e., for Shadow Credentials) would translate to this:

MATCH (target:User{enabled:true}) 
CALL gds.allShortestPaths.dijkstra.stream('graph_users_shadow_credentials_to_non_admins', {sourceNode: target, relationshipWeightProperty: 'cost', logProgress: false}) 
YIELD path 
WITH nodes(path)[-1] AS starting_node, path 
WHERE starting_node <> target 
AND (
  (starting_node:User AND starting_node.enabled)
  OR 
  starting_node:Group
) 
RETURN path as p

While the previous query (pure Neo4j) was limited to 3 hops, GDS does not offer limitation capabilities so it will continue until all possibilities have been exhausted, meaning that we are computing something even more complicated. However, to our surprise, the query returned after only 2 minutes ! (returning the same results as the pure neo4j queries that look and endless time).

Conclusion

While the multi-processing workaround first came as a huge improvement, it actually turns out that using GDS brings a most incredible speed boost.

Using GDS straight from Bloodhound CE web interface is fairly complicated and almost unusable notably due to the loss of relationships types information. However, AD Miner has your back as we are in the process of transitioning existing cypher queries (and new ones) from built-in Neo4j to GDS.

So, if you are familiar with AD Miner, you should see spectacular speed improvements in the next version to be released in October 2024.

Olivier Eyries

Co-Founder & CEO @ Saporo

4 个月

Or you can just avoid neo4j ?? thanks for sharing

Romain Bentz (pixis)

Offsec Team Leader - Login Sécurité associate

5 个月

Looks amazing Well done, and thanks for AD_miner ??

1 次回应

Alexandre Z.

Ingénieur en test d'intrusion

5 个月

Impressive engineering ????

1 次回应

Daniel Scheidt

Offensive Security @Vorwerk / @Cerberus-Security

5 个月

Now that is sexy and sounds promissing.

1 次回应

查看更多评论

要查看或添加评论，请登录

Jean-Michel BESNARD的更多文章

Graph theory to assess Active Directory : Smartest vs. Shortest Control Paths

2024年2月15日

Graph theory to assess Active Directory : Smartest vs. Shortest Control Paths

This article introduces a more efficient way of leveraging graph theory within Bloodhound, to reveal control paths that…

4 条评论

Active Directory Security Assessment: Exploring Unchartered Territories in Bloodhound With Graph Data Science (GDS)

Jean-Michel BESNARD

Partner - Cybersecurity Audit & Advisory at Grant Thornton France Creator of AD Miner (Active Directory Attack Paths Management)

Background and context

A journey into improving time execution of cypher queries

Motivation: reducing execution time AD Miner

Root cause of the problem: how Neo4j is employing CPU resources

Introducing poor man's multiprocessing

领英推荐

The endless need for more computation capabilities

Hitting the wall again

Here comes GDS and its unexplainable performance boost

Conclusion

Jean-Michel BESNARD的更多文章

社区洞察

其他会员也浏览了

Most Popular Articles in Vol 309 Issue 2, Posted Week of Dec. 18th

Most Popular Articles Posted Week of August 15th (Vol 293 Issue 2)

Cloud-native Security for Generative AI Solutions : AWS, Azure, GCP

Rethinking Data Storage with IBM Flash System Family. FlashSystem 5000

Hashing, Salts and Pepper

Dedrive from EDNS Domains is set to disrupt data storage

How to Fix a RAID Puncture? The Ultimate Guide for 2024

What happens when you click the like button on YouTube?

Computing on encrypted data for better security - Part 2

Big Data Security Market to Witness Stunning Growth | Imperva, Delinea, Cloudera

Background and context

A journey into improving time execution of cypher queries

Motivation: reducing execution time AD Miner

Root cause of the problem: how Neo4j is employing CPU resources

Introducing poor man's multiprocessing

领英推荐

The endless need for more computation capabilities

Hitting the wall again

Here comes GDS and its unexplainable performance boost

Conclusion

Jean-Michel BESNARD的更多文章

Graph theory to assess Active Directory : Smartest vs. Shortest Control Paths

社区洞察

其他会员也浏览了

Most Popular Articles in Vol 309 Issue 2, Posted Week of Dec. 18th

Most Popular Articles Posted Week of August 15th (Vol 293 Issue 2)

Cloud-native Security for Generative AI Solutions : AWS, Azure, GCP

Rethinking Data Storage with IBM Flash System Family. FlashSystem 5000

Hashing, Salts and Pepper

Dedrive from EDNS Domains is set to disrupt data storage

How to Fix a RAID Puncture? The Ultimate Guide for 2024

What happens when you click the like button on YouTube?

Computing on encrypted data for better security - Part 2

Big Data Security Market to Witness Stunning Growth | Imperva, Delinea, Cloudera