Cyber AI: Focus on the wolf, not the boy who cried it!
Dirk Hodgson
Cybersecurity | Data & AI | Business & Technology Transformation | Leadership
Children’s stories of old were often designed to share important lessons with future generations. Every now and again, one wonders whether their writers could possibly have predicted just how profound their fable might become many hundreds of years into the future! This is the case with ‘the boy who cried wolf’ and how it relates to the nascent industry forming around cyber-AI.
The story goes like this: there once was a young shepherd boy who was bored on the hillside watching the village sheep. To amuse himself he cried, "Wolf! Wolf! The wolf is chasing the sheep!" The villagers ran up the hill. But when they arrived, they found no wolf. The boy laughed at their angry faces. "Don't cry 'wolf’ when there's no wolf!" said the villagers. They went grumbling back down the hill… Later, the boy saw a REAL wolf, leapt to his feet, and sang out as loudly as he could, "Wolf! Wolf!" But the villagers thought he was trying to fool them again and didn't come to help.
Artificial Intelligence (AI), and specifically the discipline within it termed Machine Learning (ML), is changing the world… and not just in cybersecurity. For example, the New York Times described the capabilities of one natural language processing model (GPT-3, a predecessor of the model underpinning ChatGPT) as “being able to write original prose with fluency equivalent to that of a human”… that’s high praise from an organisation that prides itself on having the best humans writing the best prose each and every day!
There is even talk of AI solutions replacing referees in sport – imagine if, as a spectator at your favourite game of football, you could no longer ‘blame the ref’ for your team not winning because the refereeing algorithm made such errors impossible!
Sporting jokes aside, AI burst into cyber-action a few years ago, accompanied by a loud and sustained 'boom' from marketing teams everywhere. There wasn't a problem it couldn't solve, from detecting advanced threats through to solving the cyber skills crisis. Some folk were worried about Skynet-esque scenarios (a sentiment which has absolutely been stoked by the November 2022 release of ChatGPT!), but most were excited by the possibilities.
The rationale for why cyber-AI is so exciting is as follows: attackers can launch millions of attacks with minimal consequences even if 99.9% of them fail… indeed, if just 0.1% of 1,000,000 attacks succeed, that (1,000 successful attacks) would be seen as a great day for any ransomware syndicate! Defenders, by contrast, need to repel 100% of attacks successfully or see their organisation suffer potentially serious consequences. As a result, the few human analysts we have simply won’t be able to do their job properly if they have to look at every single one of those 1,000,000 attacks: they’ll likely miss the needle in the haystack and let the 0.1% (i.e. 1,000 attacks) through. In this context, cyber-AI has become synonymous with a way to reduce the ‘noise’ security analysts must deal with each day and help them focus on the attacks that really matter to their organisation’s cyber risk posture.
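As a back-of-the-envelope illustration of this asymmetry, consider the small Python sketch below; the figures, including the triage rate of 100 alerts per analyst per day, are illustrative assumptions rather than real-world benchmarks.

```python
# Toy illustration of the attacker/defender asymmetry described above.
# All numbers are illustrative assumptions, not real-world statistics.
attacks = 1_000_000          # attempted attacks against an organisation
success_rate = 0.001         # 0.1% get through (99.9% fail)

successful = attacks * success_rate
print(f"Successful attacks: {successful:.0f}")   # 1,000

# If a human analyst can triage roughly 100 alerts per day, reviewing
# every attempt without machine assistance is hopeless:
alerts_per_analyst_per_day = 100
analyst_days = attacks / alerts_per_analyst_per_day
print(f"Analyst-days to review every attempt: {analyst_days:.0f}")  # 10,000
```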
However, even with the recent ChatGPT-led buzz, depending on who you ask, cyber-AI continues to be seen as either all of the above... or, arguably more commonly, the cause of a great deal of unnecessary noise for SOC analysts. Why is this? Well, oftentimes the issue lies more in how the technology is implemented than in the solution itself. Indeed, just as the mere existence of a firewall doesn't stop an attacker at the perimeter of an ICT system (well-designed rulesets do that!), merely adding AI to a cyber solution without considering the threat environment and tuning the algorithm accordingly is a path to failure by way of excessive false positives... or worse, far too many false negatives!
Classifying cyber data to train AI algorithms?
For cyber-AI to be effective, it must be trained on a lot of data. But first, a definition: AI and ML are two different, but closely related, disciplines. According to MIT (2021), “when companies today deploy artificial intelligence programs, they are most likely using machine learning — so much so that the terms are often used interchangeably, and sometimes ambiguously. Machine learning is a subfield of artificial intelligence that gives computers the ability to learn without explicitly being programmed”. Put simply, ML is but one way of achieving AI, yet it is the dominant one in use in cyber today.
Whilst there are subdisciplines of ML that all operate slightly differently, in a broad sense the way ML works is as follows (broad being the operative word – textbooks have been written on this very topic, hence the below is perhaps over-simplified for the sake of brevity):
· Gather a large set of historical data relevant to the question you want answered
· Label that data wherever possible (e.g. ‘attack’ versus ‘benign activity’)
· Train an algorithm on part of that data so it learns the patterns linking inputs to outcomes
· Test the resulting model on data it has never seen, to check how well it generalises
· Deploy the model, then monitor, tune and retrain it as the environment changes
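To make that loop concrete, here is a minimal sketch in Python using scikit-learn and purely synthetic data; the features, labelling rule and model choice are illustrative assumptions, not a real detection pipeline.

```python
# Minimal sketch of the supervised ML loop described above, using
# scikit-learn and synthetic data. Everything here is illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Steps 1-2: gather and label historical event data. Two toy features
# per event (e.g. failed logins, bytes exfiltrated); label 1 = attack.
X = rng.random((1000, 2))
y = (X[:, 0] + X[:, 1] > 1.2).astype(int)   # synthetic labelling rule

# Step 3: train on past data, holding some back to test generalisation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
model = RandomForestClassifier().fit(X_train, y_train)

# Steps 4-5: evaluate fit on data the model has never seen, then
# deploy, monitor and retrain as the environment changes.
print(f"Accuracy on unseen events: {model.score(X_test, y_test):.2f}")
```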
The key to cyber-AI success is good fit: an algorithm must accurately predict an outcome, and the only way this happens is if it sees enough definitive past data to reflect the way the future is likely to look. In practice, the story of the boy who cried wolf gives us a good way to analyse this concept. If the question our algorithm asks is “is there a wolf amongst the flock?”, the potential answers (courtesy of Google’s manual on model training) an ML algorithm might return are as follows:
· True positive: there is a wolf, and the shepherd cries ‘wolf’ (a correctly detected attack)
· False positive: there is no wolf, but the shepherd cries ‘wolf’ anyway (a false alarm)
· False negative: there is a wolf, but the shepherd stays silent (a missed attack)
· True negative: there is no wolf, and the shepherd stays silent (correctly ignored noise)
Self-evidently, we want our algorithm to return only the first and last of these outcomes – true positives and true negatives – and, where we can, to avoid false positives and false negatives. The term 'false positive' describes the situation where the solution incorrectly classifies event data as an attack when it isn't one. But, to be effective, cyber-AI must balance false positives against false negatives: if we tune a model too aggressively to suppress false positives (i.e. we ignore every cry of 'wolf'), we will inevitably increase its false negatives (i.e. we'll miss the actual wolf because the algorithm never surfaces it to the analyst).
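For the practically minded, these four outcomes can be derived directly from a model's predictions; the sketch below uses scikit-learn's confusion matrix with illustrative stand-in labels, not real telemetry.

```python
# Sketch: deriving the four outcomes above from a model's predictions.
# The labels are illustrative stand-ins, not real event data.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = wolf present, 0 = no wolf
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]   # the shepherd's (model's) calls

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"True positives:  {tp}")   # wolf there, cried wolf
print(f"False positives: {fp}")   # no wolf, cried wolf anyway
print(f"False negatives: {fn}")   # wolf there, stayed silent
print(f"True negatives:  {tn}")   # no wolf, stayed silent
```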
The reason for this is that data in cybersecurity is rarely definitive. Unlike some other disciplines where AI has taken off, such as marketing, the point at which something can categorically be classified as a cyber incident (i.e. a true positive) isn't always conclusive. Oftentimes, attackers mask their behaviour in noise and/or by pretending to conduct legitimate business. Indeed, there is an adage that as soon as you're certain something is an attack, it's too late to stop it. The flipside is that, typically, the earlier we surface an alert of a possible incident to an analyst, the more ambiguous the data underpinning that alert will be.
Hence, cyber-AI algorithms must be able to handle the reality that the answer to the question 'is this a breach?' is often 'maybe' rather than 'yes' or 'no' – and this 'maybe' really matters. If we ignore all of the maybes, essentially treating them as 'no', then we may well let a preventable attack through our cyber defences.
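One common way to preserve the 'maybe' is to triage on the model's probability score rather than a hard yes/no verdict. The sketch below illustrates the idea; the thresholds are illustrative assumptions to be tuned against your own threat model, not a prescription.

```python
# Sketch of treating 'maybe' as a first-class answer: route alerts on
# the model's probability score. Thresholds are illustrative assumptions.
def triage(attack_probability: float) -> str:
    """Route an alert based on how confident the model is."""
    if attack_probability >= 0.9:
        return "incident: respond immediately"
    if attack_probability >= 0.3:
        return "maybe: surface to an analyst for investigation"
    return "benign: log and move on"

# With a scikit-learn classifier, the score would come from
# model.predict_proba(event)[0, 1] rather than model.predict(event),
# which would silently collapse every 'maybe' into a yes or a no.
for p in (0.95, 0.55, 0.05):
    print(f"score={p:.2f} -> {triage(p)}")
```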
Operationalising Cyber-AI to hunt for the wolves in the flock
There is no doubt that the future of cybersecurity includes a very healthy dose of AI - it's awesome and can make a real difference to our war against cybercrime. But, to make this happen, we need to treat AI like every other cyber technology… that is, not see it as a silver bullet, and take a balanced, threat-led approach to implementing it in our environments. In practice, this means we must focus on (at least) the following five tenets:
· Infuse data and AI skills into your SOC (not just the IT or project team) so that you can keep your algorithms close to your operational cyber knowledge
· Use AI where it makes sense, not as a silver bullet. Machine Learning won't fit every use case, so be selective rather than trying to apply it universally. Just because you can, doesn’t mean you should…
· Treat AI models like your staff and business processes. Focus on using open models, not 'black box' algorithms that can't be interpreted, analysed, tuned and retrained before, during and after cyber incidents. AI needs to learn and adapt based on both our mistakes and our successes!
· Focus on the threat, not the noise: when training algorithms, the focus should be on getting the basics right first – do your threat modelling, understand what you're trying to defend your organisation against, and train your models to maximise precision and recall against those threats (see the sketch after this list), not to minimise noise/false positives. In other words, focus on the wolf, not the boy who cried it! Indeed, you should be concerned if your algorithm is returning no noise at all, as this means it's probably missing legitimate attempted cyber-attacks.
· Regularly exercise and update your threat models and AI algorithms as a standard part of SOC operations. Just as your staff require regular training, so too do your algorithms – focusing on either to the exclusion of the other is suboptimal.
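On the fourth tenet, precision and recall are straightforward to compute; the sketch below uses scikit-learn with illustrative labels to show why optimising for 'no noise' alone is dangerous.

```python
# Sketch of scoring a detector on precision and recall (per the fourth
# tenet) instead of raw alert volume. Labels are illustrative.
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # 1 = real attack
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]   # detector's verdicts

# Precision: of everything flagged, how much was a real attack?
# Recall: of the real attacks, how many did we actually flag?
print(f"Precision: {precision_score(y_true, y_pred):.2f}")  # 3/4 = 0.75
print(f"Recall:    {recall_score(y_true, y_pred):.2f}")     # 3/4 = 0.75

# A model tuned only to minimise noise (zero false positives) can score
# perfect precision while its recall (the wolves it catches) collapses.
```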
AI has already changed the cyber industry (for the better), and its potential to do so even more in the future is almost limitless. By focusing on the right areas (the threat; the proverbial wolf on the hillside), we can build cyber-defender trust and accelerate our cyber-AI fuelled transformation to machine speed! The poor old wolf doesn’t stand a chance…