Secure corporate LLMs using only 3 patterns
Three design patterns are all it takes to secure a Corporate LLM in 2024. CISOs will perceive this as a welcome relief.
Corporate LLM IT risks are much simpler than they look
There is just one IT risk which is actually novel, and that is prompt injection. All the rest is run-of-the-mill and should be readily handled by your existing application, IAM and infrastructure security controls.
Staying focused
Cybersecurity analyst who need to ingest the flurry of risks & recommendations dealing with AI risks can easily get lost and find it hard to quantify residual risks with accuracy. Even NIST itself published its own taxonomy of AI related risks in January |6].
For corporate LLMs, my one recommendation is simple:
DON'T focus on the many scary scenarios. Focus on prompt injection!
However, the bad news is that we can't deal with natural language malevolence upfront and efficiently. Don't play casino: the current so-called mitigations are just too early and too experimental to be trusted. Human language is way more subtle and complex than computer protocols used in machine-to-machine communication. Putative "LLM message parsers" have no way to filter natural language messages efficiently. We cannot just settle for poor filtering.
Always keep in mind: a single natural language instruction can be all it takes to cripple your organization.
We need checkpoints to "prevent always", not just guardrails that "detect sometimes".
So how to prevent prompt injection?
Reason like a security architect
Corporate LLMs feed on corporate data. There are basically two non-exclusive ways: fine-tuning and making calls (APIs / events / queries, including RAG).
For fine-tuning, corporate data have been pre-injected into LLM "memory".
When making calls, corporate data are read, modified or acted upon with the agency of a plug-in.
The crucial point to acknowledge is that
Fine-tuning and remote calls (RAG,...) are completely independent. They deserve completely independent mitigations.
Keeping this in mind, the security architect in our heads is going to introduce one pattern tailored for fine-tuning, and two patterns tailored for remote calls.
Pattern #1: In fine-tuned LLMs, prevent prompt injection with Knowledge Reflection
By default, fine-tuned LLMs gather too much corporate information and are too talkative.
They must be broken down (like in anti-trust laws) following a very special kind of least privilege pattern, especially coined for AI:
Fine-tuned LLMs must only contain the corporate data that their consumers are allowed to know.
So, if several consuming populations have different rights, LLM instances must be isolated accordingly. LLMs will be bounded to provide a reflection of their consumer's own knowledge.
With Knowledge Reflection, LLMs act as (very) sophisticated mirrors. No more jailbreaking!
In a nutshell: horizontal trust boundaries must be applied to fine-tuned LLMs, and Knowledge Reflection serves as a guidance for designing these boundaries.
When RAG or other remote calls are required, prevent prompt-injection with Conversational Integrity (pattern #2) and Linguistic Isolation (pattern #3)
In their inevitable interaction with their environment, LLMs are not autonomous, they need plug-ins (API gateways with workflows). Plug-ins, by definition, have the ability to do gating, hence to prevent.
领英推荐
(Image credits: Ashish Bhatia , click on image to get to his post)
For gating, plug-ins have a mighty weapon at their disposal, it's statefulness and we're gonna need it to prevent prompt injection, because, in the end of the day, the stateful nature of plug-ins is what fosters Conversational Integrity.
But what is Conversational Integrity? It's 3 things into one!
We have a necessary condition, but it is not sufficient. We need one extra mitigation, tailored-made for LLMs: linguistic isolation.
Linguistic isolation
One can see an LLM plug-in as the modern version of Janus:
Information passing to and fro the Janus brain of the plug-in must stick to 2 simple principles:
Put together, these 5 rules (the 3 ones from Conversational Integrity, and the remaining 2 from Linguistic Isolation) enforce an efficient design which prevents semantic confusion (leading to data corruption and arbitrary decision making) and noxious payload reentrance (leading to ??self poisoning?? and ??cross LLM contamination??) in the case of RAG and remote calls.
In a nutshell: vertical trust boundaries must be applied to remote calling LLMs. Plug-ins serve as custodians of vertical isolation, provided they enforce Linguistic Isolation and Conversational Integrity.
Conclusion & takeaways
An LLM subjected to Knowledge Reflection, Conversational Integrity and Linguistic Isolation prevent is a tutored AI.
A tutored AI prevents prompt injection, this is a highly desirable feature in a Corporate environment [5].
As of early 2024, AI tutoring is ONE possible state of the art reference architecture for securing enterprise LLMs enriched with corporate data with fine-tuning, RAG and APIs integration.
Recall that:
These boundaries & mitigations are VERY different from the usual IT security WAFs and filters.
The rest is not disruptive: risks must be governed, assessed, mitigated, refused or accepted. Strings must be sanitized. Models must be versioned. Logs must be monitored. API calls must be authenticated, scoped and "least-privileged". AI actions to back-end systems [4] must have human-in-the-loop (for now), etc [2].
A final recommendation: don't treat Corporate LLMs like LLMs for individuals, or LLMs on-device. The focus of today's article is businesses exclusively. If you are seeking guidance for securing digital assistants, which is the main anticipated need for individuals, I warmly recommend you read the thorough blog [1] that Daniel Miessler recently published in Unsupervised Learning, his newsletter.
State-of-the art, the references
[1] Daniel Miessler, https://danielmiessler.com/p/ai-predictable-path-7-components-2024
[2] Rob van der Veer et alter, OWASP AI Exchange, https://owasp.org/www-project-ai-security-and-privacy-guide/owaspaiexchange.html
[3] Christophe Parisel, https://www.dhirubhai.net/pulse/ai-security-2024-christophe-parisel-km2ie/
[4] Q&A with AI Safety Alliance Chair Caleb Sima , https://pureai.com/Articles/2023/12/27/Caleb-Sima-QA.aspx
[6] NIST Publication, Adversarial Machine Learning
[8] LangChain AI, the langGraph library
[9] Microsoft Learn, Understand Azure Policy Effects
Platform Engineer | AWS and Azure Systems Analyst | AI Governance and Compliance
1 个月Now my interest is piqued in how to tutor AI, this seems to me a catch-all way to prevent many attacks, not just prompt injection
Co-founder @ Raito | Data Security for GenAI and Analytics
1 个月Super informative!
Sustainable Architecture & Responsible Innovation | #ArchitectTomorrow & Consultants Saying Things Podcasts | R&D / Technology Director | Speaker & Facilitator | MBCS CITP | ex Chief Architect, ex Big 4
9 个月Thanks for the reference and great article - you'll probably like this from Andrew Martin - although I think it outlines more issues (although this is possibly broader than just corporate LLMs?) https://www.youtube.com/watch?v=PCTYC7PIj-c
CISO @ Starburst
9 个月This is fantastic, Christophe Parisel. ?? Concise, and focused directly on the delta rather than trying to start from scratch. These are exactly the type of security models that will actually get traction in business, where most of us are just trying to iterate a functioning Security practice, rather than re-invent the wheel.
Aldrick ZAPPELLINI