Rogue AI: A Five Part Series

Welcome to Trend Micro’s monthly newsletter, The Strategic CISO. Discover the latest and most popular blogs from Research, News, and Perspectives, a dedicated space for strategic insights, best practices, and research reports that help security leaders better understand, communicate, and minimize cyber risk across the enterprise.

Research, News, and Perspectives

Our goal is to inform security leaders about best practices, the latest industry insights, and more. Let us know what you would like to see from The Strategic CISO newsletter.


Rogue AI is the Future of Cyber Threats

Understanding Rogue AI

While most of the AI-related cyber threats grabbing headlines today are carried out by fraudsters and organized criminals, Rogue AI is where security experts are focusing their long-term attention.

The term “Rogue AI” refers to artificial intelligence systems that act against the interests of their creators, their users, or humanity in general. While present-day attacks such as fraud and deepfakes are concerning, they are not the only type of AI threat we should prepare for; those attacks will remain locked in a cat-and-mouse game of detection and evasion. Rogue AI is a different kind of risk: an AI system using resources in ways misaligned with its intended goals.

Rogue AI falls into three categories: malicious, accidental, or subverted. Each has different causes and potential outcomes; understanding the distinctions helps mitigate threats from Rogue AI.

  • Malicious Rogues are deployed by attackers to use others’ computing resources. An attacker installs the AI in another system to accomplish their own goals. The AI does exactly what it was designed to do; it was simply designed for malicious purposes.
  • Accidental Rogues are created by human error or inherent technology limitations. Misconfigurations, failure to test models properly, and poor permission control can result in an AI program returning bad responses (like hallucinations), having greater system privileges than intended, and mishandling sensitive data.
  • Subverted Rogues make use of existing AI deployments and resources. An attacker subverts an existing AI system to misuse it and accomplish their own goals. Prompt injections and jailbreaks are nascent techniques subverting LLMs. The AI system is made to operate differently than it was designed to.

Find out more in our first Rogue AI blog, "Rogue AI is the Future of Cyber Threats".

How AI Goes Rogue

Alignment and Misalignment

As AI systems become increasingly intelligent and tasked with more critical functions, inspecting the mechanism to understand why an AI took certain actions becomes impossible due to the volume of data and complexity of operations. The best way to measure alignment, then, is simply to observe the behavior of the AI. Questions to ask when observing include:

  • Is the AI taking actions contrary to express goals, policies, and requirements?
  • Is the AI acting dangerously—whether in terms of resource consumption, data disclosure, deceptive outputs, corrupting systems, or harming people?
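
To make the observation concrete, a minimal behavioral check might compare an agent’s action log against an explicit policy. The policy fields, thresholds, and action format below are hypothetical, intended only as a sketch of the idea:

    # Minimal sketch of behavior-based alignment monitoring (hypothetical policy and log format).
    from dataclasses import dataclass

    @dataclass
    class Action:
        kind: str           # e.g. "tool_call", "data_read", "outbound_request"
        target: str         # resource or endpoint touched
        cpu_seconds: float  # compute consumed by the step

    POLICY = {
        "allowed_targets": {"crm_db", "ticketing_api"},  # everything else is out of scope
        "max_cpu_seconds_per_step": 30.0,                # crude resource-consumption guard
    }

    def check_alignment(actions: list[Action]) -> list[str]:
        """Return human-readable policy violations observed in the action log."""
        violations = []
        for a in actions:
            if a.target not in POLICY["allowed_targets"]:
                violations.append(f"{a.kind} touched unapproved target: {a.target}")
            if a.cpu_seconds > POLICY["max_cpu_seconds_per_step"]:
                violations.append(f"{a.kind} exceeded compute budget ({a.cpu_seconds:.1f}s)")
        return violations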

Maintaining proper alignment will be a key feature for AI services moving forward. But doing this reliably requires an understanding of how AI becomes misaligned in order to mitigate the risk.

How Misalignment Happens

One of the great challenges of the AI era is that there is no simple answer to this question. Techniques for understanding how an AI system becomes misaligned will change along with our AI architectures. Right now, prompt injection, a sort of command injection particular to GPT-style models, is a popular exploit. Model poisoning is another widespread concern, but as we implement new mitigations for it (for example, verifiably tying training data to model weights), risks will arise in other areas. Agentic AI is not fully mature yet, and no best practices have been established for it.
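
As one illustration of tying training data to model weights verifiably, a deployment pipeline can record cryptographic hashes of the training-data manifest and the resulting weights, then refuse to load anything that no longer matches. The file names and registry format below are placeholders:

    # Sketch: verify model weights and training-data manifest against recorded hashes
    # before serving. File names and the registry format are illustrative only.
    import hashlib, json

    def sha256_of(path: str) -> str:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                h.update(chunk)
        return h.hexdigest()

    def verify_model(registry_path: str = "model_registry.json") -> bool:
        """Compare current artifact hashes with those recorded at training time."""
        with open(registry_path) as f:
            expected = json.load(f)  # e.g. {"weights.bin": "...", "train_manifest.csv": "..."}
        for artifact, recorded_hash in expected.items():
            if sha256_of(artifact) != recorded_hash:
                print(f"Refusing to load: {artifact} does not match its recorded hash")
                return False
        return True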

What won’t change are the two overarching types of misalignments:

  • Intentional, where someone is trying to use AI services (yours or theirs) to attack a system (yours or another).
  • Unintentional, where your own AI service does not have the appropriate safeguards in place and becomes misaligned due to an error.

Learn more in our second blog, "How AI Goes Rogue".

Identifying Rogue AI

What’s the problem with agentic AI?

Agentic AI is in many ways a vision of the technology that has guided development and popular imagination over the past few decades. It’s about AI systems that think and do rather than just analyze, summarize and generate. Autonomous agents follow the goals and solve the problems set for them by humans, in natural language or speech. But they’ll work out their own way to get there, and will be capable of adapting unaided to changing circumstances along the way.

Additionally, rather than being based on single LLMs, agentic AI will engage and coordinate multiple agents to do different things in pursuit of a single goal. In fact, the value of agentic AI comes from being part of a larger ecosystem—accessing data from diverse sources such as web searches and SQL queries, and interacting with third-party applications. These will be incredibly complex ecosystems. Even a single agentic AI may rely on multiple models, or agents, various data stores and API-connected services, hardware and software.

As discussed, there are various causes of Rogue AI. But they all stem from the idea that risk increases when an AI uses resources and takes actions misaligned to specific goals, policies and requirements. Agentic AI dials up the risk because of the number of moving parts which may be exposed to Rogue AI weaknesses.

Necessary mitigations: Protect the agentic ecosystem.

To mitigate this risk, the data and tools agentic AI uses must be safe. Take data: subverted Rogue AI risk may stem from poisoned training data. It may also come from malicious prompt injections—data inputs which effectively jailbreak the system. Meanwhile, Accidental Rogue AI might feature the disclosure of non-compliant, erroneous, illegal or offensive information.

When it comes to safe use of tools, even read-only system interaction must be guarded, as the above examples highlight. We must also beware the risk of unrestricted resource consumption—e.g., agentic AI creating problem-solving loops that effectively DoS the entire system, or worse still, acquiring additional compute resources which were neither anticipated nor desired to be used.
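
A minimal sketch of guarding against such runaway loops, assuming a generic agent interface and hypothetical step and time budgets:

    # Sketch: cap the number of reasoning/tool steps an agent may take per task,
    # so a runaway loop fails closed instead of exhausting shared resources.
    import time

    MAX_STEPS = 25          # hypothetical per-task step budget
    MAX_WALL_SECONDS = 120  # hypothetical per-task time budget

    def run_agent(agent, task):
        start = time.monotonic()
        for step in range(MAX_STEPS):
            if time.monotonic() - start > MAX_WALL_SECONDS:
                raise RuntimeError("Agent exceeded its time budget; aborting task")
            result = agent.step(task)  # assumed agent interface
            if result.done:
                return result.output
        raise RuntimeError("Agent exceeded its step budget; aborting task")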

Find out more in the third Rogue AI blog, "Identifying Rogue AI".

What the Security Community is Missing

Let's explore community efforts currently underway to assess AI risk. While there’s some great work being done, what these efforts are missing to date is the idea of linking causality with attack context.

Who’s doing what?

Different parts of the security community have different perspectives on Rogue AI:

  • OWASP focuses on vulnerabilities and mitigations, with its Top 10 for LLM Applications report, and high-level guidance in its LLM AI Cybersecurity and Governance Checklist
  • MITRE is concerned with attack tactics and techniques, via an ATLAS matrix that extends MITRE ATT&CK to AI systems
  • A new AI Risk Repository from MIT provides an online database of over 700 AI risks categorized by cause and risk domain

OWASP

Rogue AI is related to all of the Top 10 large language model (LLM) risks highlighted by OWASP, except perhaps for LLM10: Model Theft, which signifies “unauthorized access, copying, or exfiltration of proprietary LLM models.” There is also no vulnerability associated with “misalignment”—i.e., when an AI has been compromised or is behaving in an unintended manner.

Misalignment:

  • Could be caused by prompt injection, model poisoning, supply chain, insecure output or insecure plugins
  • Can have greater Impact in a scenario of LLM09: Overreliance (in the Top 10)
  • Could result in Denial of Service (LLM04), Sensitive Information Disclosure (LLM06) and/or Excessive Agency (LLM08)

Excessive Agency is particularly dangerous. It refers to situations when LLMs “undertake actions leading to unintended consequences,” and stems from excessive functionality, permissions or autonomy. It can be mitigated by ensuring appropriate access to systems, limiting capabilities, and keeping a human in the loop.
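
A minimal sketch of that human-in-the-loop control, with hypothetical risk tiers and a console prompt standing in for a real approval workflow: any tool call outside a low-risk allow list is routed to a person before it executes.

    # Sketch: gate high-impact tool calls behind explicit human approval.
    LOW_RISK_TOOLS = {"search_docs", "summarize_text"}  # hypothetical read-only tools

    def execute_tool(name: str, args: dict, tools: dict):
        """Run a tool only if it is low risk or a human has approved the call."""
        if name not in tools:
            raise ValueError(f"Tool {name!r} is not on the allow list")
        if name not in LOW_RISK_TOOLS:
            answer = input(f"Agent wants to run {name}({args}). Approve? [y/N] ")
            if answer.strip().lower() != "y":
                raise PermissionError(f"Human reviewer rejected call to {name}")
        return tools[name](**args)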

MITRE ATLAS

MITRE’s tactics, techniques and procedures (TTPs) are a go-to resource for anyone involved in cyber-threat intelligence—helping to standardize analysis of the many steps in the kill chain and enabling researchers to identify specific campaigns. Although ATLAS extends the ATT&CK framework to AI systems, it doesn’t address Rogue AI directly. However, Prompt Injection, Jailbreak and Model Poisoning, which are all ATLAS TTPs, can be used to subvert AI systems and thereby create Rogue AI.

The truth is that these subverted Rogue AI systems are themselves TTPs: agentic systems can carry out any of the ATT&CK tactics and techniques (e.g., Reconnaissance, Resource Development, Initial Access, ML Model Access, Execution) for any Impact. Fortunately, only sophisticated actors can currently subvert AI systems for their specific goals, but the fact that they’re already checking for access to such systems should be concerning.

MIT AI Risk Repository

Finally, there’s MIT’s risk repository, which includes an online database of hundreds of AI risks, as well as a topic map detailing the latest literature on the subject. As an extensible store of community perspective on AI risk, it is a valuable artifact. The collected risks allow more comprehensive analysis. Importantly, it introduces the topic of causality, referring to three main dimensions:

  • Who caused it (human/AI/unknown)
  • How it was caused in AI system deployment (accidentally or intentionally)
  • When it was caused (before deployment, after deployment, or unknown)

Intent is particularly useful in understanding Rogue AI, although elsewhere it is covered only in the OWASP Security and Governance Checklist. Accidental risk often stems from a weakness rather than a MITRE ATLAS attack technique or an OWASP vulnerability.

The bottom line is that adopting AI systems increases the corporate attack surface, potentially significantly. Risk models should be updated to take account of the threat from Rogue AI. Intent is key here: there are plenty of ways for accidental Rogue AI to cause harm with no attacker present. And when harm is intentional, who is attacking whom with what resources is critical context to understand. Are threat actors, or Malicious Rogue AI, targeting your AI systems to create subverted Rogue AI? Are they targeting your enterprise in general? And are they using your resources, their own, or a proxy whose AI has been subverted?

These are all enterprise risks, both pre- and post-deployment. And while there’s some good work going on in the security community to better profile these threats, what’s missing in Rogue AI is an approach which includes both causality and attack context. By addressing this gap, we can start to plan for and mitigate Rogue AI risk comprehensively.

Learn more in our fourth Rogue AI blog, "Rogue AI: What the Security Community is Missing".

How to Mitigate the Impact of Rogue AI Risks

The first step is to properly configure the relevant AI services, which provides a foundation of safety against all types of Rogue AI by specifying allowed behaviors. Protecting and sanitizing the points where known AI services touch data or use tools primarily prevents Subverted Rogues, but can also address other ways accidents happen. Restricting AI systems to allowed data and tool use, and verifying the content of inputs to and outputs from AI systems forms the core of safe use.
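
One way to express such a configuration is a declarative allow list that the serving layer checks on every request. The structure and names below are illustrative only, not a standard:

    # Sketch: a declarative allow list the serving layer checks before an AI service
    # touches data or tools. Names and structure are illustrative, not a standard.
    SERVICE_POLICY = {
        "support_assistant": {
            "data_sources": {"kb_articles", "ticket_history"},  # readable stores
            "tools": {"create_ticket"},                          # callable actions
            "output_checks": ["pii_scrub", "toxicity_filter"],   # applied to every response
        }
    }

    def authorize(service: str, resource: str, kind: str) -> bool:
        """Return True only if the named service may use this data source or tool."""
        policy = SERVICE_POLICY.get(service, {})
        key = "data_sources" if kind == "data" else "tools"
        return resource in policy.get(key, set())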

Malicious Rogues can attack your organization from the outside or act as AI malware within your environment. Many patterns used to detect malicious activities by cyber attackers can also be used to detect the activities of Malicious Rogues. But as new capabilities enhance the evasiveness of Rogues, learning patterns for detection will not cover the unknown unknowns. In this case, machine behaviors need to be identified on devices, in workloads and in network activity. In some cases, this is the only way to catch Malicious Rogues.

Behavioral analysis can also detect other instances of excessive functionality, permissions or autonomy. Anomalous activity across devices, workloads, and network can be a leading indicator for Rogue AI activity, no matter how it was caused.
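
As an illustration, even a crude statistical baseline over per-host behavior (here, outbound connections per hour) can surface the kind of anomalous machine activity described. Real deployments would use richer features and models; this is only a sketch:

    # Sketch: flag hosts whose outbound-connection rate deviates sharply from a
    # learned baseline. A stand-in for richer behavioral analytics, not a product.
    import statistics

    def anomalous_hosts(history: dict[str, list[int]], latest: dict[str, int], z_threshold: float = 3.0):
        """history maps host -> hourly connection counts; latest maps host -> current hour's count."""
        flagged = []
        for host, counts in history.items():
            if len(counts) < 2:
                continue
            mean = statistics.mean(counts)
            stdev = statistics.stdev(counts) or 1.0
            z = (latest.get(host, 0) - mean) / stdev
            if z > z_threshold:
                flagged.append((host, round(z, 1)))
        return flagged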

Comprehensive defense across the OSI communications stack

However, for a more comprehensive approach, we must consider defense in depth at every layer of the OSI model, as follows:

  • Physical: Monitor processor use (CPU, GPU, TPU, NPU, DPU) in cloud, endpoint and edge devices. This applies to AI-specific workload patterns, querying AI models (inference), and loading model parameters into memory close to AI-specific processing.
  • Data layer: Use MLOps/LLMOps versioning and verification to ensure models are not poisoned or replaced, recording hashes to identify models. Use software and AI model bills of materials (SBoMs/MBoMs) to ensure the AI service software and model can be trusted.
  • Network: Limit AI services that can be reached externally as well as the tools and APIs that AI services can reach. Detect anomalous communicators such as human-to-machine transitions and novel machine activity.
  • Transport: Consider rate limiting for external AI services and scanning for anomalous packets.
  • Session: Insert verification processes such as human-in-the-loop checks, especially when instantiating AI services. Use timeouts to mitigate session hijacking. Analyze user-context authentications and detect anomalous sessions.
  • Application and Presentation layers: Identify misconfiguration of functionality, permissions and autonomy (as described above). Use guardrails on AI inputs and outputs, such as scrubbing of personal (PII) and other sensitive information, offensive content, and prompt injections or system jailbreaks (a minimal sketch follows this list). Restrict LLM agent tools according to an allow list which limits APIs and plugins and only allows well-defined use of well-known websites.
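
A minimal sketch of those input/output guardrails, using pattern matching on obvious PII and common prompt-injection phrasing. The regexes are illustrative; production guardrails rely on dedicated classifiers rather than simple patterns:

    # Sketch: lightweight input/output guardrails. Regex patterns are illustrative;
    # production systems rely on dedicated classifiers rather than simple patterns.
    import re

    PII_PATTERNS = [
        re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US SSN-like strings
        re.compile(r"\b\d{16}\b"),              # bare 16-digit card-like numbers
    ]
    INJECTION_PATTERNS = [
        re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
        re.compile(r"you are now in developer mode", re.IGNORECASE),
    ]

    def guard_input(text: str) -> str:
        for pat in INJECTION_PATTERNS:
            if pat.search(text):
                raise ValueError("Possible prompt injection detected; request blocked")
        return text

    def guard_output(text: str) -> str:
        for pat in PII_PATTERNS:
            text = pat.sub("[REDACTED]", text)
        return text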

Find out more in our final Rogue AI blog, "How to Mitigate the Impact of Rogue AI Risks".


Before you go:

Are you heading to AWS re:Invent? Make sure to come check out all of our activities during the week! #reInvent


Mauricio Ortiz, CISA

Great dad | Inspired Risk Management and Security Professional | Cybersecurity | Leveraging Data Science & Analytics. My posts and comments are my personal views and perspectives, not those of my employer.

1 day ago

Trend Micro, great perspectives and resources shared this week. Indeed, companies have to be cautious and intentional in evaluating their AI solutions and strategy, verifying that the AI generates the expected results and outcomes with no impact from Rogue AI.
