The Hidden Complexity of Securing AI Embeddings in Enterprise Chatbots
I've been researching how to secure general-purpose chatbots that leverage embedding models, and I see a lot of confusion and misconceptions about what effective security actually requires.
This article will explore why securing embeddings in AI systems is such a complex task, focusing on practical constraints, real-world use cases, and the current tools and techniques available. It is intended for AI practitioners, security professionals, and anyone looking to understand the intricacies of embedding security within enterprise chatbot applications.
If you're not familiar, embeddings are the product of inference: numerical representations of data that capture the conclusions a model has drawn from it, in a form that's ready for storage and later use. Embeddings preserve semantic relationships, enabling AI models to interpret and retrieve nuanced meanings from data.
They’re sometimes called the “memory” of AI, but they are not like hashes: they can often be decoded back to reveal information about the original input. Think of an embedding as a vector like this:
# Four toy embedding vectors, nine dimensions each (real models
# typically use hundreds to thousands of dimensions)
vector_array_high_dim = [
    [0.12, 0.33, 0.56, 0.78, 0.19, 0.24, 0.51, 0.87, 0.98],
    [0.67, 0.45, 0.24, 0.89, 0.32, 0.71, 0.54, 0.47, 0.49],
    [0.25, 0.78, 0.11, 0.49, 0.56, 0.32, 0.99, 0.51, 0.74],
    [0.54, 0.92, 0.17, 0.38, 0.25, 0.76, 0.80, 0.58, 0.37]
]
These numbers come from processing input data through an AI model, which converts the data into a mathematical form that captures its relationships and meanings. Even though it looks like a random set of numbers, this might represent the relationship between words in a sentence, the features of an image, or even user behavior patterns.
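To make that concrete, here's a rough sketch of where numbers like these come from, using the open-source sentence-transformers library (the model name is one common choice, not a recommendation):

# A rough sketch of turning text into embeddings with the
# sentence-transformers library.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Two semantically similar sentences become nearby 384-dimensional vectors
emb = model.encode([
    "Quarterly revenue grew 12% year over year.",
    "Q3 sales increased roughly twelve percent.",
])
print(emb.shape)  # (2, 384)

cos = float(emb[0] @ emb[1] /
            (np.linalg.norm(emb[0]) * np.linalg.norm(emb[1])))
print(f"cosine similarity: {cos:.2f}")  # high, because the meanings overlap

Two sentences with overlapping meaning land close together in the vector space, which is exactly the property retrieval systems exploit.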
Embeddings can store huge amounts of data
Embeddings help ground large language models (LLMs), reducing hallucinations by anchoring responses to real data. But this utility brings a challenge: managing access to the sensitive information embedded within these structures.
Consider a chatbot handling tens of thousands of users, each querying documents with unique, evolving permissions. It’s like a library where users can access only specific shelves; managing this access is intricate.
For every query, you would need to:
- Verify the user's identity
- Resolve their current permissions, including group memberships
- Filter the vector search to only the documents that user is authorized to see
- Retrieve, rank, and return results from that filtered set
These steps introduce latency, risking usability. Real-world organizations add even more complexity with nested groups, inherited permissions, and constant updates, making real-time checks both computationally heavy and technically demanding.
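As a minimal sketch of what such a per-query check might look like, here's an in-memory version with a hypothetical ACL structure and a stand-in for the identity-provider lookup; a production system would hit a real directory service and vector database:

# Hypothetical sketch of a per-query permission check over an
# in-memory vector store. A real system would resolve identity and
# group membership against a directory service and query a vector DB.
import numpy as np

# doc_id -> (embedding, set of principals allowed to read it)
store = {
    "hr_policy":   (np.random.rand(384), {"group:hr", "user:alice"}),
    "q3_finance":  (np.random.rand(384), {"group:finance"}),
    "eng_roadmap": (np.random.rand(384), {"group:eng"}),
}

def resolve_principals(user_id: str) -> set[str]:
    # Stand-in for an identity-provider lookup (groups, nested groups, etc.)
    memberships = {"alice": {"group:hr", "group:eng"}}
    return {f"user:{user_id}"} | memberships.get(user_id, set())

def permitted_search(user_id: str, query_vec: np.ndarray, top_k: int = 2):
    principals = resolve_principals(user_id)
    scored = []
    for doc_id, (vec, acl) in store.items():
        if acl & principals:  # skip documents this user cannot read
            sim = float(vec @ query_vec /
                        (np.linalg.norm(vec) * np.linalg.norm(query_vec)))
            scored.append((sim, doc_id))
    return sorted(scored, reverse=True)[:top_k]

print(permitted_search("alice", np.random.rand(384)))

Even in this toy version, the permission resolution sits on the hot path of every single query; at enterprise scale, that lookup is the expensive part.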
Adding to this challenge are security risks. Embedding inversion attacks can reconstruct original inputs, exposing sensitive information like names and phrases. Researchers have shown that these attacks are increasingly feasible, especially as open-source tools make the methods more accessible.
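As a toy illustration of the weakest form of this, consider an attacker who obtains a leaked embedding and simply scores candidate guesses against it; published attacks go further and reconstruct unseen text directly from the vector. The texts and model below are purely illustrative:

# Toy illustration: even without a trained inversion model, a leaked
# embedding can be matched against candidate guesses to recover what
# it encodes.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Suppose this vector leaked from a vector database
leaked = model.encode("Jane Doe's salary is $180,000")

# The attacker scores guesses by cosine similarity to the leaked vector
guesses = [
    "Jane Doe's salary is $180,000",
    "The cafeteria menu changes on Mondays",
    "Q3 revenue grew 12% year over year",
]
guess_vecs = model.encode(guesses)
scores = guess_vecs @ leaked / (
    np.linalg.norm(guess_vecs, axis=1) * np.linalg.norm(leaked)
)
print(guesses[int(np.argmax(scores))])  # the sensitive guess wins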
Microsoft has a huge advantage (kind of) in this space
Microsoft has a unique edge here. Through Microsoft Graph, they’ve created a unified permissions model across services, backed by robust access control lists and the infrastructure to manage permissions at scale. Few, if any, companies are as well-positioned to ensure security and control across diverse user scenarios.
Since 2023, Microsoft has come a long way, preparing both their organization and clients for the AI wave with foundational security, privacy, and governance measures others are only now beginning to consider. This infrastructure gives them a significant advantage.
However, there's a catch. Despite their strong foundation, Microsoft's Copilot lineup currently lags behind the performance users have come to expect from frontier chatbots like ChatGPT, Claude, Meta AI, Gemini, and Le Chat. Microsoft needs to tackle the latency created by content filters, compliance checks, grounding, and other backend processes. Without these improvements, users may turn to faster, external AI tools, bypassing organizational systems and policies.
This phenomenon, often called Shadow AI, happens when employees turn to unapproved AI tools for work tasks, leading to data flowing into external systems. This creates significant security risks and the potential for data leaks.
Security that blocks productivity is a classic anti-pattern. When security slows down work, it inevitably loses out.
I believe Microsoft can overcome this hurdle in time. Addressing the latency caused by content filters, compliance checks, grounding, and other backend processes will allow them to create a high-performing AI ecosystem. And unlike many other companies, Microsoft has the infrastructure and expertise to solve this problem effectively. Latency is a far simpler issue to tackle than building something as robust as Entra and Microsoft Graph from the ground up.
They just need to speed things up.
Without Microsoft's infrastructure, what's a realistic approach to embedding security?
There aren't many good options. I mean that.
Securing the vector database and the infrastructure supporting the embedding model is the first priority. This involves setting up strong access controls, encryption, and monitoring to block unauthorized access. Since the vector database contains embeddings that could expose sensitive information if leaked, securing it directly is crucial for preventing data leaks. It’s also essential to protect the serving environment so that no one can exploit the system to extract embeddings or carry out inversion attacks.
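One concrete layer here is encrypting embeddings at rest, so a leaked database dump alone isn't enough to attempt inversion. Here's a minimal sketch using the cryptography library's Fernet recipe; a real deployment would pull keys from a KMS rather than generating them inline:

# Sketch: encrypting embeddings at rest, so a leaked database dump
# alone isn't enough to attempt inversion. Key management is the hard
# part in practice and belongs in a KMS, not inline like this.
import numpy as np
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # in production: fetched from a KMS
fernet = Fernet(key)

embedding = np.random.rand(384).astype(np.float32)

# Serialize and encrypt before writing to storage
ciphertext = fernet.encrypt(embedding.tobytes())

# Decrypt only inside the trusted serving environment at query time
restored = np.frombuffer(fernet.decrypt(ciphertext), dtype=np.float32)
assert np.array_equal(embedding, restored)

Note the trade-off: similarity search needs plaintext vectors, so decryption has to happen inside the trusted serving environment, which is why protecting that environment matters just as much as protecting the database.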
Beyond that, leveraging embedding models at scale is challenging, and there aren't many comprehensive solutions. Here are some practical approaches (a sketch of the partitioning pattern follows this list):
- Scope the chatbot to a single department or team whose permission model is simple and uniform
- Partition embeddings into separate indexes per permission scope, and route queries only to the indexes a user is entitled to read
- Attach permission metadata to each embedding and filter search results against it at query time
- Re-index promptly when permissions or documents change, so stale access doesn't linger
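Of these, partitioning is often the most tractable, because it removes per-document checks from the query path entirely. Here's a minimal sketch, assuming coarse, department-level boundaries (all names are illustrative):

# Sketch: one vector index per permission scope. Coarse-grained, but
# per-document ACL checks disappear from the query path entirely.
import numpy as np

indexes = {
    "hr":      {"doc_ids": [], "vectors": []},
    "finance": {"doc_ids": [], "vectors": []},
}

def add_document(scope: str, doc_id: str, vec: np.ndarray):
    indexes[scope]["doc_ids"].append(doc_id)
    indexes[scope]["vectors"].append(vec)

def search(user_scopes: list[str], query_vec: np.ndarray, top_k: int = 3):
    hits = []
    for scope in user_scopes:  # only indexes the user is entitled to read
        idx = indexes[scope]
        for doc_id, vec in zip(idx["doc_ids"], idx["vectors"]):
            sim = float(vec @ query_vec /
                        (np.linalg.norm(vec) * np.linalg.norm(query_vec)))
            hits.append((sim, scope, doc_id))
    return sorted(hits, reverse=True)[:top_k]

add_document("hr", "hr_policy", np.random.rand(384))
add_document("finance", "q3_report", np.random.rand(384))
print(search(["hr"], np.random.rand(384)))  # the finance index is never touched

The cost is granularity: anything finer than the partition boundary still needs one of the other techniques.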
Here’s where open-source tools hit a wall with embedding security. They just aren’t built for the scale or real-time permissions that complex enterprise setups demand. Unlike Microsoft’s infrastructure, which has robust, large-scale permission models baked in, open-source solutions struggle with dynamic checks across thousands of users. This leads to latency, making them tough to use in real-world scenarios where speed and precision matter. Sure, open-source tools are flexible and customizable, but they’re not cut out for the high-stakes, real-time permissions that many organizations rely on.
The best approach for building a chatbot with embeddings
I'm going to oversimplify here for the purposes of this article, focusing only on the elements relevant to embeddings.
The two most common approaches I see are either using open-source platforms like LibreChat or leveraging wrappers like LangChain or Semantic Kernel. But what’s the difference, and when should you choose one over the other?
Open-source Platforms (e.g., LibreChat)
These give you a complete, self-hosted chat application out of the box: a UI, user accounts, conversation history, and connectors to multiple model providers. You get speed to deployment, but you inherit the platform's assumptions about how retrieval and access control work.
AI Framework Wrappers (e.g., LangChain, Semantic Kernel)
These are libraries, not products. They orchestrate calls to models, vector stores, and tools, giving you full control over the retrieval pipeline, but you build and maintain the application, including every security decision, yourself.
Neither option adequately addresses the embedding security challenge, and that's the problem. These tools are built for functionality, not for managing the intricate permission models that large enterprises require.
Embedding security is strongest within realistic boundaries
The reality is that truly securing embeddings remains challenging to implement at scale. Most successful implementations today are limited in scope, focused on specific departments or use cases where permission models are simpler and existing infrastructure can be leveraged.
Here’s a recap of practical steps for mitigating embedding security risks:
- Secure the vector database and serving infrastructure first: access controls, encryption, and monitoring
- Protect the serving environment against embedding extraction and inversion attacks
- Keep scope narrow: target departments or use cases with simple, stable permission models
- Partition or metadata-filter embeddings so queries only ever touch authorized content
- Leverage existing permission infrastructure, such as Microsoft Graph, where it's available
Until we see significant advances in either:
- vector databases and retrieval systems with native, permission-aware access control at enterprise scale, or
- dramatically faster real-time permission resolution across large, nested directory structures,
most organizations should focus on targeted implementations with clear security boundaries rather than attempting to roll out a general-purpose chatbot with embeddings. As the field evolves, we'll likely see better solutions emerge, but for now, understanding these constraints is crucial for successful enterprise AI implementation.
Part of effective security architecture is understanding the boundaries of what is feasible or practical. Leveraging embeddings in a general-purpose chatbot can be secure, performant, scalable, and adaptable, but only within a carefully defined and realistic scope that accounts for current infrastructure and technology limitations.
Disclaimer: The views and opinions expressed in this article are my own and do not reflect those of my employer. This content is based on my personal insights and research, undertaken independently and without association to my firm.