A new detection model for Azure Sentinel

Picking up where we left off in part 1, we know that time series decomposition is not entirely suited to detecting cyberattacks from the Azure Activity logs produced by the many SPNs operating in our subscriptions. Let's figure out where its limits lie and how we could get around them in Azure Sentinel.

Current limitations

In the context of detecting suspicious operations, I think the three main grievances one might have against anomaly decomposition are:

  1. Non-distributivity. As we discovered previously, anomalies(op1+op2) != anomalies(op1) + anomalies(op2). Likewise, anomalies(spn1+spn2) != anomalies(spn1) + anomalies(spn2). To perform detection at scale, with so many ops and SPNs to manage, it would be highly desirable for anomaly detection to be at least roughly distributive.
  2. No learning capability. An anomaly that triggers once will always trigger, even if it is a false positive (or a benign true positive). This approach is not sustainable in a context of automated devSecOps.
  3. No time-orientation. While analyzing things in the right order might not be crucial for failure prediction and health monitoring, it is of key importance for cybersecurity: patching an image before publishing it in a registry is better than publishing first and patching afterwards. Time-orientation eliminates many false positives (but it could also ignore some true positives). We could take time-orientation for granted, because one can't imagine anything more chronological than time series. But in fact, the process of decomposition destroys chronology: the only component that retains a flavor of time-orientation is seasonality. Unfortunately, as we have seen previously, even automated tasks, when complex, can be unseasonal.

In our search for a successful replacement for time-series decomposition, we must strive to obtain those three properties: distributivity, memorization and chronology.

But above all, we must find the right balance between perfect and functional anomaly detection. This is really important if we want to get anywhere. In support of this argument, let me quote Mahmoud ElAssir, VP of Customer Experience at Google Cloud:

Complexity needs to be managed because it’s too complex to solve. What you want to do is manage complexity with better measurements, better prediction, and better accountability

Achieving better detection with Markov models

I propose to follow a classical approach in anomaly detection: evaluate the ebb and flow of SPN activity against a first-order hidden Markov model.

Such models are made of two parts: a "hidden state", and "observable outcomes". Here, the hidden state (also named the 'emission matrix') holds all acceptable transitions between two consecutive operations of the {OperationNameValue} set. It is a square matrix of size c × c, where c is the cardinality of {OperationNameValue}.

Observable outcomes are long sequences of legitimate operations taken from Azure Activity logs.

https://www.researchgate.net/figure/State-transition-diagram-of-a-hidden-Markov-model_fig1_245563174

The construction of the emission matrix is straightforward: each time operation A is followed by operation B in a given time series, we increase a counter at coordinates (A,B). This counter simply tracks the number of A->B transitions in the series. When we have ingested the whole data set, we normalize each row so that every cell represents a probability and the row sums to 1.0.
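The counting and normalization steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the operation names are shortened, hypothetical stand-ins for OperationNameValue, and the matrix is stored as nested dictionaries rather than as a dense array.

```python
from collections import defaultdict


def build_transition_matrix(operations):
    """Count A->B transitions in an ordered sequence, then normalize each row."""
    counts = defaultdict(lambda: defaultdict(int))
    for current_op, next_op in zip(operations, operations[1:]):
        counts[current_op][next_op] += 1

    matrix = {}
    for current_op, row in counts.items():
        total = sum(row.values())
        # Each cell becomes a probability; the row sums to 1.0.
        matrix[current_op] = {nxt: n / total for nxt, n in row.items()}
    return matrix


# Hypothetical, shortened operation names (assumption, not real log data).
sample_log = ["read", "write", "read", "delete", "read", "write"]
matrix = build_transition_matrix(sample_log)
print(matrix["read"])
```

In this sample, three transitions leave "read": two go to "write" and one goes to "delete", so the "read" row holds 2/3 and 1/3.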

Optimizations

To keep the matrix small, we may hash operation names with a modulus (at the expense of precision, since distinct operations can collide in the same bucket). Kusto's built-in hash(object,modulus) is good for that, but beware: the algorithm is subject to change by Microsoft without notice.
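To show what hashing with a modulus does to the matrix size, here is a small Python sketch. It does not reproduce Kusto's hash(): SHA-256 is used purely as a stable stand-in, so the bucket numbers will differ from those Kusto returns. The function name bucket, the modulus of 64 and the sample operation names are assumptions made for the example.

```python
import hashlib


def bucket(operation_name, modulus):
    """Map an operation name to a bucket in [0, modulus)."""
    digest = hashlib.sha256(operation_name.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce it.
    return int.from_bytes(digest[:8], "big") % modulus


# Illustrative operation names; 64 buckets cap the matrix at 64 x 64 cells.
MODULUS = 64
names = [
    "Microsoft.Compute/virtualMachines/write",
    "Microsoft.Storage/storageAccounts/delete",
]
buckets = [bucket(name, MODULUS) for name in names]
print(buckets)
```

The smaller the modulus, the smaller the matrix, but the more likely it is that two unrelated operations share a bucket and become indistinguishable.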

To make the process less CPU-intensive, we may replace the emission matrix with a simpler object, a logical matrix, without losing anything we need. We do not want to know the likelihood of a given transition between two ops; we just want to know whether the transition is legitimate (probability > 0.0) or not (probability == 0.0).

In the logical matrix, the "ones" represent legitimate transitions, and the "zeroes" represent unexpected transitions. Hitting a zero during a routine evaluation is like setting off a canary or detonating a URL: we have found an anomaly which needs to be investigated.
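A logical matrix can be sketched in Python as a set of (A, B) pairs: a pair in the set is a "one", and any other pair is a "zero". This is a sparse representation chosen for brevity, not necessarily how the matrix would be stored in Sentinel. The function names and the shortened operation names are hypothetical.

```python
def learn_transitions(operations):
    """Return the set of (A, B) pairs seen in a legitimate sequence (the 'ones')."""
    return set(zip(operations, operations[1:]))


def find_anomalies(allowed, operations):
    """Return (position, transition) for every transition that hits a 'zero'."""
    pairs = zip(operations, operations[1:])
    return [(i + 1, pair) for i, pair in enumerate(pairs) if pair not in allowed]


# Hypothetical training and evaluation sequences.
allowed = learn_transitions(["read", "write", "read", "delete"])
anomalies = find_anomalies(allowed, ["read", "write", "delete"])
print(anomalies)
```

Here "write" followed by "delete" was never seen during training, so the evaluation reports it, along with the position of the offending operation in the evaluated sequence.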

Model assessment

Distributivity

Distributivity should be "good enough" if we take care to group SPNs into families with similar semantics so as to reduce:

a) false positives caused by artefacts[*]

b) false positives in the symmetric difference[**]

This grouping is very business-dependent. It is not guaranteed to scale well with the number of SPNs, but when it does, the groups are not difficult to identify and set up.

Without grouping we have:

markov(spn1 OR spn2) = (markov(spn1) AND markov(spn2)) OR artefacts(spn1,spn2) OR delta(spn1,spn2)

With proper grouping, we hope to have spn1 ~= spn2, hence: markov(spn1 OR spn2) ~= markov(spn1) OR markov(spn2).
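The relation above can be checked on toy data. The Python sketch below assumes two SPNs that run one after the other (no interleaving), with hypothetical events of the form (timestamp, spn, operation); logical matrices are represented as sets of transitions, so OR is set union, AND is intersection and delta is the symmetric difference.

```python
def transitions(operations):
    """Logical matrix of a sequence, as a set of (A, B) transitions."""
    return set(zip(operations, operations[1:]))


# Hypothetical events: (timestamp, spn, operation).
events = [
    (1, "spn1", "read"),
    (2, "spn1", "write"),
    (3, "spn2", "list"),
    (4, "spn2", "delete"),
]


def operations_of(spn=None):
    """Operations in chronological order, for one SPN or for all of them."""
    return [op for _, owner, op in sorted(events) if spn is None or owner == spn]


m1 = transitions(operations_of("spn1"))
m2 = transitions(operations_of("spn2"))
merged = transitions(operations_of())  # matrix learned on the combined log

artefacts = merged - (m1 | m2)  # transitions that cross from one SPN to the other
delta = m1 ^ m2                 # symmetric difference of the two matrices
print(artefacts, delta)
```

In this sample the only artefact is the "write" to "list" transition, created where spn1's activity ends and spn2's begins. Because the two SPNs share no transition, the symmetric difference contains everything each of them does, which is the situation grouping similar SPNs is meant to avoid.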

Memorization and chronology

Learning is straightforward: once a false positive has been reviewed, suppressing it in future evaluations just means OR-ing the corresponding transition into the existing matrix.
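With the logical matrix held as a set of transitions, the OR described above becomes a set union. The following Python sketch uses hypothetical function and operation names and is only meant to show the mechanics.

```python
def is_anomalous(allowed, transition):
    """A transition is anomalous when its cell in the logical matrix is zero."""
    return transition not in allowed


def acknowledge(allowed, false_positive):
    """OR a reviewed transition into the logical matrix (set union)."""
    return allowed | {false_positive}


# Hypothetical matrix and alert.
allowed = {("read", "write"), ("write", "read")}
alert = ("write", "delete")

flagged_before = is_anomalous(allowed, alert)
allowed = acknowledge(allowed, alert)
flagged_after = is_anomalous(allowed, alert)
print(flagged_before, flagged_after)
```

Once the analyst has acknowledged the transition, the same sequence no longer raises an alert, while every other unknown transition still does.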

Time-orientation is ensured by design: the higher the order of the model, the more time-oriented it will be. In practice however, memory constraints limit us to orders 1 and 2.
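To illustrate how the order of the model affects time-orientation, the Python sketch below learns allowed pairs (order 1) or triples (order 2) from the same legitimate sequences. The helper names and the operation names are hypothetical, and the logical matrix is again represented as a set.

```python
def ngrams(operations, n):
    """All runs of n consecutive operations, in chronological order."""
    return list(zip(*(operations[i:] for i in range(n))))


def learn(sequences, order):
    """An order-k model remembers every run of k + 1 consecutive operations."""
    allowed = set()
    for operations in sequences:
        allowed.update(ngrams(operations, order + 1))
    return allowed


def unexpected(allowed, operations, order):
    """Runs of the evaluated sequence that were never seen during learning."""
    return [g for g in ngrams(operations, order + 1) if g not in allowed]


# Hypothetical legitimate sequences and one sequence to evaluate.
training = [["build", "patch", "publish"], ["build", "publish", "archive"]]
observed = ["build", "patch", "publish", "archive"]

order1_alerts = unexpected(learn(training, 1), observed, 1)
order2_alerts = unexpected(learn(training, 2), observed, 2)
print(order1_alerts, order2_alerts)
```

The order-1 model accepts the observed sequence because each pair was seen somewhere in training, whereas the order-2 model flags the run "patch", "publish", "archive", which never occurred. The price is memory: with c distinct operations, a dense order-2 model needs on the order of c³ cells instead of c².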

Conclusion

A simplified Markov model looks like a good substitute for anomaly decomposition when tackling the seemingly intractable problem of spotting outliers among Azure activities for a given SPN:

  • on the one hand, the three properties work together to limit false positives drastically: this is an important criterion for performing sustainable devSecOps.
  • on the other hand, keeping a record of transitions offers assurance that most true positives won't be missed. This is an equally important criterion, this time for cyberdefense.

The main current grey area is whether the model scales as the number of SPNs grows. If not, its use could be limited to business-critical SPNs.

In part 3, I will describe a case study to support the conclusions we've reached so far, and show how we can stitch this together with the native and superb Azure Sentinel incident management workflow.

In part 4, I will describe a pen-testing tool (yes, you read that right) I use to probe this model against fraud.

Finally, let me quote the second part of Mahmoud ElAssir's point on complexity:

What you want to do is manage complexity with better measurements, better prediction, and better accountability. In other words, better data management and analytics.

Notes

[*]: artefacts are caused by artificial transitions across two SPNs: an operation triggered by SPN1 is followed incidentally by an operation triggered by SPN2.

[**]: the more similar two SPNs are, the smaller the symmetric difference of their logical matrices.

Well written. We are adding sequence mining capabilities to Kusto, I think it might be relevant for detecting these types of security anomalies, as you could identify non-legitimate sequences.

Guillaume EHINGER

Empowering companies to thrive in hostile environments

4 years ago

Very clear paper that can also easily be transported to other technologies! Thanks!

Sylvain Cortes

VP of Strategy @ Hackuity, Speaker

4 years ago

One of the best blog posts on Sentinel I have had the chance to read. Good work, good thinking, I look forward to reading the rest.

Fotis M.

CyberSec practitioner

4 years ago

Thanks, Christophe! Sentinel POC is already planned for us ...

Christophe Parisel

Senior Cloud security architect at Société Générale

4 years ago

Thanks to David Knott for reporting the wonderful quote I use in this article, as well as for him being an incredible source of inspiration as a thinker and enterprise architect. Do follow him on linkedin!
