Anomalous Azure SPNs activities triage with ML clustering (Azure Sentinel Part 4)

Finally... After one year, here is the long-awaited sequel to my previous articles on Azure Sentinel.

  • Part 1 demonstrates the current limitations of Azure Kusto and Azure Sentinel time series decomposition for detecting statistical anomalies in SPN Azure Activity logs;
  • Part 2 proposes a new detection model based on a Hidden Markov Model (HMM) operating on sequences, i.e. transitions between SPN activities;
  • Part 3 introduces CRISPR-Cas9, a threat injector able to replay and/or forge malicious sequences that one may try against the model.

Today's article will provide a case study that sheds light on how to track anomalies at scale and in an automated way, once your HMM has been trained and is operational.

Farewell, periodicity!

Even though period analysis on raw activity logs does not provide useful information, one might be tempted to wonder if it fares any better on Markov sequences. Kusto only provides series decomposition on timestamps, not on arbitrary sequences, so we need to plug our own custom Discrete Fourier Transforms (DFTs) into the output of the HMM residual anomalies.
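For the record, here is a minimal sketch of the kind of DFT probe I mean; the `dominant_period` helper and the synthetic 12-hour signal are mine for illustration, not part of the actual pipeline:

```python
import numpy as np

def dominant_period(residuals, bin_hours=1):
    """Return the dominant period (in hours) of an HMM residual
    sequence, using a plain discrete Fourier transform."""
    x = np.asarray(residuals, dtype=float)
    x = x - x.mean()                      # drop the DC component
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=bin_hours)
    k = spectrum[1:].argmax() + 1         # skip the zero frequency
    return 1.0 / freqs[k]

# On a clean synthetic signal, the 12-hour cycle is recovered easily
hours = np.arange(48)
print(dominant_period(np.sin(2 * np.pi * hours / 12)))
```

On a clean synthetic signal the spectral peak is unambiguous; real residual sequences are far noisier.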

Unfortunately, the few real-life samples I have fed to a DFT do not seem to provide a spectral measurement which is accurate enough to be exploited without human intervention... Maybe it's me, but I couldn't get it right.

I had to give up exploring this lane and find another route. Fortunately, my latest foray into Quantum Computing territory (see my recent articles on the matter) prompted me to study random walks and finite fields. That is what eventually led me to the very promising solution I'm about to explain :-)

Correlating SPN sequences

Unlike Azure Active Directory users, Applications and Service Principals are very well-groomed animals. So there has to be a way to nail down their deterministic behavior, don't you agree?

And you're right. To that end, we are going to get help from an old dependable friend: the "mark 1 eyeball"...

Armed with this precious device, take a look at the residual anomalous sequences in this Application Insights workbook (look at the bottom part of the graph):

[Figure: Application Insights workbook showing residual anomalous SPN sequences]

Each particular SPN sequence is depicted as a broken line with a distinct color. You can see that several sequences behave in a similar fashion: their vertical scaling differs (the Y-axis is, in fact, logarithmic), but their envelopes are the same. It means that this bunch of SPNs is correlated: the sequences fire in harmony to perform a composite, multi-staged Azure operation.

Envelopes and multi-dimensions

So here we are with this new object: the sequence envelope. How do we compare two envelopes to check if they correlate?

We can imagine plenty of ways; here is my line of thought: to me and my Mark 1 eyeball, an envelope looks like a one-dimensional random walk: up, down, or steady. I can describe an envelope as a list of just three tokens: '1' for upward, '-1' for downward, and '0' for steady.

Under this convention, [-1,1,1,-1,-1,1] is a zigzag, whereas [0,0,1,-1,0,0,0,0,0,0] is a single bump and [0,1,0,0,0,-1,0] is a plateau with ridges at both ends.
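This tokenization is easy to script. Here is one possible sketch (the `envelope` helper is my naming; note that with this convention, n hourly bins yield n-1 tokens):

```python
import numpy as np

def envelope(counts):
    """Encode a sequence of hourly activity counts as an envelope:
    1 for an upward step, -1 for downward, 0 for steady."""
    return np.sign(np.diff(np.asarray(counts, dtype=float))).astype(int)

# A sequence that rises and falls produces the zigzag from above
print(envelope([5, 2, 8, 9, 4, 1, 7]))   # tokens: -1, 1, 1, -1, -1, 1
```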

Now close your eyeball and remember your math at school, when you learnt about vectors... What is our zigzag, if not a 6-dimension vector? Or our bump, if not a 10-dimension vector?

More specifically, envelopes of length n are vectors in GF(3)^n, the n-dimensional vector space over the finite field GF(3).

Clustering and machine learning

The more dimensions, the merrier! Why? Because high-dimensional spaces are extremely empty.

To give you an idea, consider the volume of the unit sphere in dimensions 1 to 25:

[Figure: volume of the unit n-ball as a function of dimension, n = 1 to 25]

After an initial sharp increase, it quickly vanishes into nothingness... The volume of the unit sphere in dimension 3 is a little more than 4, but it's back down to about 1 in dimension 13.
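The curve above comes from the classic formula V_n = π^(n/2) / Γ(n/2 + 1) for the volume of the unit ball in dimension n; a few lines of Python are enough to reproduce the figures quoted above:

```python
from math import pi, gamma

def unit_ball_volume(n):
    """Volume of the unit ball in dimension n: pi^(n/2) / Gamma(n/2 + 1)."""
    return pi ** (n / 2) / gamma(n / 2 + 1)

# A little more than 4 in dimension 3, about 1 in dimension 13,
# and next to nothing by dimension 25
for n in (1, 3, 5, 13, 25):
    print(n, unit_ball_volume(n))
```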

I don't know for you, but

I would rather eat an apple in dimension 3 than two in dimension 13!

So if something unusual pops up in such vast expanses of void, it's very easy to spot. Not easy for us, of course (we struggle to visualize things in a mere 3D space...), I mean for the machine.

What's more, in high-dimensional spaces, if two or more things pop up close to one another, there's next to no chance it's an accident.

Then all we have to do is pour our n-dimension vectors into a machine learning clustering algorithm. The one I'll pick for the demonstration is called K-means.

Demonstration time!

Let's start with this anomalous sequences sample:

[Figure: sample of anomalous SPN sequences over a two-day window]

We see that the dark blue and red lines roughly follow the same winding pattern.

The time span we consider here is two days, or 48 hours. We used data grouped in bins of 1 hour to construct the sequences, which means that our envelopes each contain 48 dimensions.

Here is how they get converted into two GF(3)^48 vectors by a simple normalization script:

[Figure: the two sequences normalized into GF(3)^48 vectors; a differing dimension is highlighted in green]

Now that the vertical scaling effect is gone, we can easily tell the similarity. There are indeed only a couple of differences, located in dimensions 19 and 20; the one in dimension 20 is highlighted in green above: it reads '-1' in the top vector and '0' in the bottom one.
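Counting those differing dimensions is just a Hamming distance. A tiny sketch, with toy vectors rather than the real 48-dimensional ones:

```python
import numpy as np

def hamming(u, v):
    """Number of dimensions in which two envelope vectors disagree."""
    return int(np.count_nonzero(np.asarray(u) != np.asarray(v)))

# Toy vectors (not the article's real data): they disagree in two dimensions
a = [0, 1, 1, 0, -1, 0]
b = [0, 1, 0, 0, -1, 1]
print(hamming(a, b))   # → 2
```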

Fine, but it's not us we want to wake up at night for investigations... it's the machine! And not just for this particular example, but for all anomalies.

So we feed not only those two vectors, but all the concurrent anomalous vectors (there are 23 of them sharing this same time range) into K-means, a multi-dimensional clustering algorithm available in the mighty Python scikit-learn module or as part of Azure Machine Learning.

K-means needs an idea of the number of clusters it should work with. Here we have the choice; for the demo I have chosen 10 clusters, labelled C0 to C9.

After only a couple of seconds, here is the result of KMeans(n_clusters=10, init='k-means++', max_iter=300, n_init=10, random_state=0):

[Figure: K-means clustering result, anomalous envelopes assigned to clusters C0 to C9]
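For readers who want to try this at home, here is a minimal scikit-learn sketch of the idea. The 23 real SPN envelope vectors aren't reproduced here, so the script fabricates toy GF(3)^48 vectors around four synthetic "behaviors"; everything about the data generation is illustrative, only the KMeans parameters match the run above.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy stand-in for the 23 anomalous envelope vectors (the real ones
# come out of the normalization step; seed and sizes are arbitrary).
rng = np.random.default_rng(0)
base = rng.integers(-1, 2, size=(4, 48))          # 4 underlying "behaviors"
samples = []
for _ in range(23):
    v = base[rng.integers(0, 4)].copy()
    v[rng.integers(0, 48)] = rng.integers(-1, 2)  # tiny perturbation, stays in {-1,0,1}
    samples.append(v)
X = np.array(samples)

km = KMeans(n_clusters=10, init='k-means++', max_iter=300,
            n_init=10, random_state=0).fit(X)
print(km.labels_)   # one cluster index (C0..C9) per anomalous envelope
```

Envelopes forged from the same underlying behavior land close together in the 48-dimensional space, so K-means has no trouble grouping them.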

Conclusion

I wonder if you will be as impressed as I am with the outcome of this single, unoptimized run of Kmeans:

  • our two correlated sequences were put into cluster C1, and this cluster contains only the two of them!
  • many other SPN anomalies are also efficiently correlated (look at clusters C0, C2 or C3);
  • in the end, only 8 of the 10 clusters were used: C0 to C7.

Maybe there will be no need to wake up at night with sore Mk1 eyeballs... Hopefully K-means can do the job? :)

What remains to be done is to trigger a low-severity alert in Sentinel for out-of-band investigation. (Had K-means found some more hectic patterns, we would have fired a higher-severity alert.)

Note: the results I'm sharing in this article are in an early phase of exploration. The solution needs much more analysis, refining and real-life testing to be considered a good candidate for production environments.
