Rogue AI: A Five Part Series
Welcome to Trend Micro’s monthly newsletter, The Strategic CISO. Discover the latest and most popular blogs from Research, News, and Perspectives, a dedicated space for strategic insights, best practices, and research reports that help security leaders better understand, communicate, and minimize cyber risk across the enterprise.
Our goal is to inform security leaders about best practices, the latest industry insights, and more. Let us know what you would like to see from The Strategic CISO newsletter.
Rogue AI is the Future of Cyber Threats
Understanding Rogue AI
While most of the AI-related cyber threats grabbing headlines today are carried out by fraudsters and organized criminals, Rogue AI is where security experts are focusing their long-term attention.
The term “Rogue AI” refers to artificial intelligence systems that act against the interests of their creators, users, or humanity in general. While present-day attacks like fraud and deepfakes are concerning, they are not the only type of AI threat we should prepare for; those attacks will remain locked in a cat-and-mouse game of detection and evasion. Rogue AI is a different kind of risk: an AI system using resources in ways that are misaligned with its intended goals.
Rogue AI falls into three categories: malicious, accidental, or subverted. Each has different causes and potential outcomes; understanding the distinctions helps mitigate threats from Rogue AI.
Find out more in our first Rogue AI blog, "Rogue AI is the Future of Cyber Threats"
How AI Goes Rogue
Alignment and Misalignment
As AI systems become more capable and are tasked with more critical functions, inspecting their internal mechanisms to understand why they took certain actions becomes impossible, given the volume of data and the complexity of operations involved. The best way to measure alignment, then, is simply to observe the behavior of the AI and ask whether it serves the intended goal.
Maintaining proper alignment will be a key feature for AI services moving forward. But doing this reliably requires an understanding of how AI becomes misaligned in order to mitigate the risk.
How Misalignment Happens
One of the great challenges of the AI era will be the fact that there is no simple answer to this question. Techniques for understanding how an AI system becomes misaligned will change along with our AI architectures. Right now, prompt injection is a popular exploit, though this form of command injection is particular to GPT-style models. Model poisoning is another widespread concern, but as we implement new mitigations for this (for example, verifiably tying training data to model weights), risks will arise in other areas. Agentic AI is not fully mature yet, and no best practices have been established in this regard.
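As a concrete illustration of the prompt-injection exploit mentioned above, a deliberately naive input screen might flag common injection phrasings before they reach the model. This is a sketch only; the patterns below are illustrative, not a real defense, and production systems layer dedicated classifiers, privilege separation, and output checks on top of anything like this.

```python
import re

# Illustrative patterns only -- real prompt-injection defenses need far more
# than keyword matching (e.g., a separate classifier model).
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"disregard your system prompt",
    r"you are now (in )?developer mode",
]

def looks_like_injection(user_input: str) -> bool:
    """Flag inputs that resemble common prompt-injection phrasings."""
    lowered = user_input.lower()
    return any(re.search(pattern, lowered) for pattern in INJECTION_PATTERNS)
```

A flagged input would then be rejected or routed to a stricter handling path rather than passed to the model verbatim.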
What won’t change are the two overarching types of misalignments:
Learn more in our second blog, "How AI Goes Rogue"
Identifying Rogue AI
What’s the problem with agentic AI?
Agentic AI is in many ways a vision of the technology that has guided development and popular imagination over the past few decades. It’s about AI systems that think and do rather than just analyze, summarize and generate. Autonomous agents follow the goals and solve the problems set for them by humans, in natural language or speech. But they’ll work out their own way to get there, and will be capable of adapting unaided to changing circumstances along the way.
Additionally, rather than being based on single LLMs, agentic AI will engage and coordinate multiple agents to do different things in pursuit of a single goal. In fact, the value of agentic AI comes from being part of a larger ecosystem—accessing data from diverse sources such as web searches and SQL queries, and interacting with third-party applications. These will be incredibly complex ecosystems. Even a single agentic AI may rely on multiple models, or agents, various data stores and API-connected services, hardware and software.
As discussed, there are various causes of Rogue AI. But they all stem from the idea that risk increases when an AI uses resources and takes actions misaligned to specific goals, policies and requirements. Agentic AI dials up the risk because of the number of moving parts which may be exposed to Rogue AI weaknesses.
Necessary mitigations: Protect the agentic ecosystem.
To mitigate this risk, the data and tools agentic AI uses must be safe. Take data: subverted Rogue AI risk may stem from poisoned training data. It may also come from malicious prompt injections—data inputs which effectively jailbreak the system. Meanwhile, Accidental Rogue AI might feature the disclosure of non-compliant, erroneous, illegal or offensive information.
When it comes to safe use of tools, even read-only system interaction must be guarded, as the above examples highlight. We must also beware the risk of unrestricted resource consumption—e.g., agentic AI creating problem-solving loops that effectively DoS the entire system, or worse still, acquiring additional compute resources which were neither anticipated nor desired to be used.
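The unrestricted-consumption risk above can be mitigated with hard budgets on agent activity. The sketch below caps the number of steps a hypothetical agent loop may take; the `step_fn` interface is an assumption for illustration, not a real framework API.

```python
class ResourceBudgetExceeded(Exception):
    """Raised when an agent loop runs past its allotted step budget."""

def run_agent(step_fn, initial_state, max_steps=10):
    """Drive a hypothetical agent step function, capping iterations so a
    runaway problem-solving loop cannot consume resources indefinitely.

    step_fn(state) -> (done, new_state) is an illustrative interface.
    """
    state = initial_state
    for _ in range(max_steps):
        done, state = step_fn(state)
        if done:
            return state
    raise ResourceBudgetExceeded(f"agent exceeded {max_steps} steps")
```

The same pattern extends to token budgets, wall-clock timeouts, and caps on compute acquisition, so the agent fails safely instead of effectively DoS-ing its own environment.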
Find out more in the third Rogue AI blog, "Identifying Rogue AI"
What the Security Community is Missing
Let's explore community efforts currently underway to assess AI risk. While there’s some great work being done, what these efforts have missed to date is the idea of linking causality with attack context.
Who’s doing what?
Different parts of the security community have different perspectives on Rogue AI:
OWASP
Rogue AI is related to all of the Top 10 large language model (LLM) risks highlighted by OWASP, except perhaps for LLM10: Model Theft, which signifies “unauthorized access, copying, or exfiltration of proprietary LLM models.” There is also no vulnerability associated with “misalignment”—i.e., when an AI has been compromised or is behaving in an unintended manner.
Misalignment:
Excessive Agency is particularly dangerous. It refers to situations when LLMs “undertake actions leading to unintended consequences,” and stems from excessive functionality, permissions or autonomy. It could be mitigated by ensuring appropriate access to systems, capabilities and use of human-in-the-loop.
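The Excessive Agency mitigations named above (appropriate access and human-in-the-loop) can be sketched as a default-deny tool gate. The tool names, risk tiers, and `approver` callback below are hypothetical, chosen only to illustrate the pattern.

```python
# Hypothetical tool policy: tool names and risk tiers are illustrative only.
TOOL_POLICY = {
    "search_docs": "auto",           # read-only, low risk: runs autonomously
    "send_email": "needs_approval",  # acts on the outside world
    "delete_records": "forbidden",   # never allowed to the agent
}

def invoke_tool(name, approver=None):
    """Gate agent tool calls by policy.

    Unknown tools are denied by default (least privilege), and high-impact
    actions require sign-off from a human approver callback.
    """
    policy = TOOL_POLICY.get(name, "forbidden")  # default-deny unknown tools
    if policy == "forbidden":
        return "denied"
    if policy == "needs_approval" and (approver is None or not approver(name)):
        return "denied"
    return "executed"  # placeholder for the real tool dispatch
```

Keeping the policy table outside the model's control is the point: the LLM can request a tool, but autonomy is bounded by configuration it cannot rewrite.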
MITRE ATLAS
MITRE’s tactics, techniques, and procedures (TTPs) are a go-to resource for anyone involved in cyber-threat intelligence—helping to standardize analysis of the many steps in the kill chain and enabling researchers to identify specific campaigns. Although ATLAS extends the ATT&CK framework to AI systems, it doesn’t address Rogue AI directly. However, Prompt Injection, Jailbreak and Model Poisoning, which are all ATLAS TTPs, can be used to subvert AI systems and thereby create Rogue AI.
The truth is that these subverted Rogue AI systems are themselves TTPs: agentic systems can carry out any of the ATT&CK tactics and techniques (e.g., Reconnaissance, Resource Development, Initial Access, ML Model Access, Execution) for any Impact. Fortunately, only sophisticated actors can currently subvert AI systems for their specific goals, but the fact that they’re already checking for access to such systems should be concerning.
MIT AI Risk Repository
Finally, there’s MIT’s risk repository, which includes an online database of hundreds of AI risks, as well as a topic map detailing the latest literature on the subject. As an extensible store of community perspective on AI risk, it is a valuable artifact, and the collected risks allow more comprehensive analysis. Importantly, it introduces the topic of causality, referring to three main dimensions: the entity responsible (human or AI), the intent (intentional or unintentional), and the timing of the risk (pre- or post-deployment).
Intent is particularly useful in understanding Rogue AI, though elsewhere it is covered only in the OWASP Security and Governance Checklist. Accidental risk often stems from a weakness rather than a MITRE ATLAS attack technique or an OWASP vulnerability.
The bottom line is that adopting AI systems increases the corporate attack surface, potentially significantly. Risk models should be updated to account for the threat of Rogue AI. Intent is key here: there are plenty of ways for Accidental Rogue AI to cause harm with no attacker present. And when harm is intentional, who is attacking whom, and with what resources, is critical context to understand. Are threat actors, or Malicious Rogue AI, targeting your AI systems to create Subverted Rogue AI? Are they targeting your enterprise in general? And are they using your resources, their own, or a proxy whose AI has been subverted?
These are all enterprise risks, both pre- and post-deployment. And while there’s some good work going on in the security community to better profile these threats, what’s missing in Rogue AI is an approach which includes both causality and attack context. By addressing this gap, we can start to plan for and mitigate Rogue AI risk comprehensively.
Learn more in our fourth Rogue AI blog, "Rogue AI: What the Security Community is Missing"
How to Mitigate the Impact of Rogue AI Risks
The first step is to properly configure the relevant AI services, which provides a foundation of safety against all types of Rogue AI by specifying allowed behaviors. Protecting and sanitizing the points where known AI services touch data or use tools primarily prevents Subverted Rogues, but can also address other ways accidents happen. Restricting AI systems to allowed data and tool use, and verifying the content of inputs to and outputs from AI systems forms the core of safe use.
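Restricting allowed data and tool use, and verifying inputs and outputs, can be made concrete with default-deny checks at the AI service boundary. The domain allowlist and output markers below are assumptions for illustration; a real deployment would draw these from its own configuration and use much richer content inspection.

```python
from urllib.parse import urlparse

# Illustrative allowlist: the agent may only read from pre-approved domains.
ALLOWED_DOMAINS = {"docs.example.com", "internal.example.com"}

def data_source_allowed(url: str) -> bool:
    """Default-deny check on where the AI system is allowed to pull data from."""
    host = urlparse(url).hostname or ""
    return host in ALLOWED_DOMAINS

# Illustrative markers of non-compliant output (secrets, sensitive fields).
BLOCKED_OUTPUT_TERMS = {"ssn:", "api_key="}

def output_permitted(text: str) -> bool:
    """Screen model output before it leaves the system boundary."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_OUTPUT_TERMS)
```

Both checks embody the core idea in the paragraph above: allowed behaviors are specified up front, and everything crossing the boundary is verified against that specification.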
Malicious Rogues can attack your organization from the outside or act as AI malware within your environment. Many patterns used to detect malicious activities by cyber attackers can also be used to detect the activities of Malicious Rogues. But as new capabilities enhance the evasiveness of Rogues, learning patterns for detection will not cover the unknown unknowns. In this case, machine behaviors need to be identified on devices, in workloads and in network activity. In some cases, this is the only way to catch Malicious Rogues.
Behavioral analysis can also detect other instances of excessive functionality, permissions or autonomy. Anomalous activity across devices, workloads, and network can be a leading indicator for Rogue AI activity, no matter how it was caused.
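The behavioral baselining described above can be sketched with a simple statistical check. Real detection products use far richer models, but the core idea is flagging activity that deviates sharply from a learned baseline; the metric and threshold here are illustrative.

```python
from statistics import mean, stdev

def is_anomalous(baseline, observation, threshold=3.0):
    """Flag an observation (e.g., outbound requests per minute from a
    workload) that sits more than `threshold` standard deviations from
    the historical baseline -- a crude stand-in for behavioral analytics."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return observation != mu  # flat baseline: any change is anomalous
    return abs(observation - mu) / sigma > threshold
```

Applied per device, workload, and network flow, even a check this simple surfaces the kind of out-of-character machine behavior that can indicate Rogue AI activity, regardless of its cause.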
Comprehensive defense across the OSI communications stack
However, for a more comprehensive approach, we must consider defense in depth at every layer of the OSI model, as follows:
Find out more in our final Rogue AI blog, "How to Mitigate the Impact of Rogue AI Risks"
Before you go:
Are you heading to AWS re:Invent? Make sure to come check out all of our activities during the week! #reInvent