AI, Databases, and a New Wave of Software



1. Introduction

Generative AI has introduced a fundamental shift in how applications are built, deployed, and experienced. With large language models (LLMs) able to perform tasks that once required human creativity or reasoning, software is no longer purely deterministic. Instead, it is becoming probabilistic: the “correct” result may not be the same on each run, and ensuring high-quality outcomes (99.99%+ reliability) is now a crucial challenge.

As Sahir Azam, who leads product and growth at MongoDB, explains, achieving that “last mile” of quality in a probabilistic world involves an ongoing interplay among large language models, real-time data, robust database architectures, and evaluation mechanisms. MongoDB—traditionally known as a document-oriented (NoSQL) database—has evolved to handle vector data, full-text search, and classic operational workloads, all under one umbrella. This speaks to a larger market trend: AI software is weaving together multiple data structures, advanced indexes, and real-time analytics.


2. The Shift Toward Probabilistic Software

2.1 Deterministic vs. Probabilistic

In classic software, a deterministic approach is the norm. When you query a relational database, you expect an exact, predictable result—like a bank withdrawal or a hotel booking. In probabilistic software, especially applications powered by LLMs, you might ask the system a question and get slightly different answers each time, depending on the “probabilistic” nature of the model’s reasoning.

  • Quality Challenge: Large enterprises demand near-perfect accuracy (99.9%+). But when the software can produce slightly variable outcomes, the question becomes: How do you guarantee the reliability of results in a mission-critical setting?
  • Integration with Data: Achieving this reliability often means grounding the LLM in real-time information—stored and served by a database—so that the application’s responses are both up-to-date and contextually accurate.
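To make the contrast concrete, here is a minimal sketch in Python: the database lookup is deterministic, while the LLM call can return a different answer on each run. The model name, connection string, and "accounts" collection are illustrative assumptions, not part of any specific stack discussed here.

```python
# A minimal sketch, assuming the OpenAI Python SDK, a local MongoDB instance, and an
# illustrative "accounts" collection. Model name and field names are assumptions.
from pymongo import MongoClient
from openai import OpenAI

db = MongoClient("mongodb://localhost:27017")["bank"]
llm = OpenAI()

# Deterministic: the same query against the same data always returns the same balance.
balance = db.accounts.find_one({"_id": "acct-42"})["balance"]

# Probabilistic: the same prompt may produce a different answer on each call,
# particularly at higher sampling temperatures.
answer = llm.chat.completions.create(
    model="gpt-4o-mini",
    temperature=0.7,
    messages=[{
        "role": "user",
        "content": f"My balance is {balance}. Can I afford a $500 laptop? Answer briefly.",
    }],
).choices[0].message.content
print(answer)
```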


2.2 The Role of Quality Engineering

Ben Thompson’s concept of “quality engineering” for software, akin to manufacturing standards, resonates strongly in AI. A database can inject structured rigor into a system that by default produces fluid, non-deterministic outputs. For instance, robust indexing, filtering, or vector-similarity checks can ensure the LLM only returns answers with verified or “embedded” knowledge from the corporate dataset.
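As a rough illustration of that rigor, the sketch below only treats retrieved chunks as trustworthy context when their vector similarity to the query clears a threshold; otherwise the application should fall back to "I don't know." The threshold value (0.75) and the chunk format are assumptions for illustration.

```python
# A minimal sketch of similarity-threshold gating: only chunks whose embeddings are
# close enough to the query embedding are treated as "verified" context.
import numpy as np

def cosine(a: list[float], b: list[float]) -> float:
    a_arr, b_arr = np.asarray(a), np.asarray(b)
    return float(a_arr @ b_arr / (np.linalg.norm(a_arr) * np.linalg.norm(b_arr)))

def verified_context(query_vec: list[float], candidate_chunks: list[dict],
                     threshold: float = 0.75) -> list[str]:
    """Return only the chunks that clear the similarity threshold."""
    scored = [(cosine(query_vec, c["embedding"]), c["text"]) for c in candidate_chunks]
    passing = [text for score, text in scored if score >= threshold]
    return passing  # an empty list means the application should answer "I don't know"
```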

“You’re not going to necessarily get a deterministic result like you would with a traditional application talking to a traditional database. The quality of your embedding models, how you construct RAG architectures, and how you merge them with the real-time view of what’s happening in the business is the key to high-quality retrieval.” – Sahir


3. Vector Databases, RAG, and the Future of Search

3.1 From Semantic Search to Enterprise Retrieval

MongoDB’s foray into vector databases predates the ChatGPT boom. Its initial usage: e-commerce clients needed semantic search to go beyond simple keyword matches. By storing vector embeddings—mathematical representations of text meaning—they could surface more relevant search results (e.g., understanding that “running shoes” and “trainers” may refer to the same concept).

With generative AI’s rise, vectors now power retrieval-augmented generation (RAG):

User Query → Database finds relevant data chunks via vector similarity → LLM uses those data chunks to craft a grounded, accurate response → Evaluation of LLM output
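A minimal sketch of that loop, assuming MongoDB Atlas Vector Search with an index named "chunk_index", chunks stored as documents with "text" and "embedding" fields, and OpenAI models for embedding and generation (all of those names are illustrative):

```python
# A minimal RAG sketch: retrieve the closest chunks by vector similarity, then ground
# the LLM's response in that retrieved context.
from pymongo import MongoClient
from openai import OpenAI

db = MongoClient("mongodb+srv://<cluster-uri>")["kb"]
llm = OpenAI()

def embed(text: str) -> list[float]:
    return llm.embeddings.create(model="text-embedding-3-small", input=text).data[0].embedding

def answer(query: str) -> str:
    # 1. Retrieve: find the chunks whose embeddings are closest to the query embedding.
    chunks = db.docs.aggregate([
        {"$vectorSearch": {
            "index": "chunk_index",
            "path": "embedding",
            "queryVector": embed(query),
            "numCandidates": 100,
            "limit": 5,
        }},
        {"$project": {"text": 1, "_id": 0}},
    ])
    context = "\n\n".join(c["text"] for c in chunks)

    # 2. Generate: ground the LLM's response in the retrieved context.
    return llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    ).choices[0].message.content
```

The final step in the flow, evaluating the LLM's output, is the subject of the next section.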


3.2 Evaluating RAG and LLM Outputs

To ensure the quality and accuracy of RAG-powered systems, it’s crucial to implement robust evaluation mechanisms. Here are key components for evaluating RAG and LLM outputs:

Metrics for Evaluation

  1. Answer Relevancy: Assesses whether the LLM output addresses the input query informatively and concisely.
  2. Contextual Relevancy: Determines if the retriever in a RAG system extracts the most pertinent information as context for the LLM.
  3. Correctness: Measures the factual accuracy of the LLM output against a ground truth.
  4. Hallucination Detection: Identifies instances where the LLM generates fake or made-up information.
  5. TRIAD Framework: Evaluates Context Relevance, Faithfulness, and Answer Relevance.
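As one concrete (and deliberately simple) example of a correctness-style metric, the sketch below scores token-level F1 overlap between an LLM output and a ground-truth reference. Production evaluation suites typically lean on LLM-as-judge or embedding-based scoring; this lexical proxy only shows the shape of the computation.

```python
# Token-level F1 overlap between an LLM output and a ground-truth reference.
from collections import Counter

def token_f1(prediction: str, ground_truth: str) -> float:
    pred_tokens = prediction.lower().split()
    truth_tokens = ground_truth.lower().split()
    common = Counter(pred_tokens) & Counter(truth_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)

# A score near 1.0 means the output closely matches the reference answer.
print(token_f1("MongoDB stores documents as BSON", "MongoDB stores data as BSON documents"))
```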


Evaluation Implementation Strategies

  1. Prompt Engineering: Design effective prompts for evaluation LLMs to accurately judge outputs.
  2. Benchmark Datasets: Utilise datasets like the Answer Equivalence Dataset for consistent evaluation across different models.
  3. Continuous Monitoring: Implement regular evaluation cycles to track performance over time and across model iterations.
  4. Multi-faceted Approach: Combine multiple evaluation methods for a comprehensive assessment of RAG system performance.

Note: These evaluation methods are still evolving, and there's ongoing research in this area!
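One way to put the continuous-monitoring strategy into practice is to persist per-query scores, tagged with a model or prompt version, and aggregate them over time. The sketch below assumes an "eval_runs" collection and illustrative field names.

```python
# A minimal sketch of continuous monitoring: per-query evaluation scores are persisted
# with a model/prompt version tag so regressions show up across iterations.
from datetime import datetime, timezone
from pymongo import MongoClient

evals = MongoClient("mongodb://localhost:27017")["rag_app"]["eval_runs"]

def log_eval(query: str, scores: dict, model_version: str) -> None:
    evals.insert_one({
        "query": query,
        "scores": scores,                  # e.g. {"relevancy": 0.9, "faithfulness": 0.8}
        "model_version": model_version,
        "timestamp": datetime.now(timezone.utc),
    })

# Later: compare average scores per model version to spot drift or regressions.
report = list(evals.aggregate([
    {"$group": {"_id": "$model_version", "avg_relevancy": {"$avg": "$scores.relevancy"}}},
]))
```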


3.3 Unified Data Models

Most AI-powered applications also need to search by keywords (classic indexing) or filter by metadata (e.g., “documents from Region X only”). Many solutions require gluing together multiple databases: a standard relational system, a full-text search engine, a vector engine, and so on. MongoDB’s approach merges all these queries—metadata, textual, vector—into a single system.

“This blending of multiple data modalities—keyword, text, vector—improves retrieval quality. Instead of RAG ‘gymnastics’, it’s a single API.” – Sahir

That’s particularly important for enterprises wanting 99.99%+ reliability. When all relevant filters and embeddings exist in a single store, it’s easier to ensure consistency, versioning, and fine-grained security.
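A sketch of such a combined query, assuming an Atlas Vector Search index named "chunk_index" in which "region" is declared as a filter field (the index name, field names, and region value are all illustrative):

```python
# One aggregation pipeline that applies a metadata filter and vector similarity together.
from pymongo.collection import Collection

def filtered_vector_search(docs: Collection, query_embedding: list[float]) -> list[dict]:
    return list(docs.aggregate([
        {"$vectorSearch": {
            "index": "chunk_index",
            "path": "embedding",
            "queryVector": query_embedding,
            "numCandidates": 200,
            "limit": 5,
            "filter": {"region": "EMEA"},   # metadata filter evaluated inside the same search
        }},
        {"$project": {"text": 1, "region": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]))
```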


4. Agents, Memory, and State in AI-Driven Applications

4.1 The Database as Memory + World State

One key concept in emerging AI apps—especially those using “agentic” frameworks—is the idea that an LLM (the “brain”) needs memory (the “database”). But it’s more than memory: the database also reflects the real-world state of the business. For instance:

  • E-commerce: Real-time product inventory, pricing, user data.
  • Enterprise: Updated compliance policies, newly approved documents, or knowledge bases.
  • IoT or Robotics: Sensor data, historical logs, operational constraints.

LLMs, by themselves, generally have knowledge from a fixed training cut-off, or they rely on external “plug-ins” to fetch updated info. If that information is inaccurate or out of date, the application’s final output will falter. Hence, the synergy of “LLM + Real-Time Database” underpins next-level, production-grade AI.
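A minimal sketch of that synergy: the current price and stock level are read from the operational database at question time and injected into the prompt, rather than relying on the model's training data. The "inventory" collection, its fields, and the model name are assumptions.

```python
# Grounding an answer in live world state read from the operational database.
def answer_with_live_state(llm, db, user_question: str, sku: str) -> str:
    # llm: an OpenAI client; db: a pymongo database handle.
    product = db.inventory.find_one(
        {"sku": sku}, {"_id": 0, "name": 1, "price": 1, "in_stock": 1}
    )
    return llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Use only the supplied product data; it reflects the current state."},
            {"role": "user", "content": f"Product data: {product}\n\nQuestion: {user_question}"},
        ],
    ).choices[0].message.content
```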


4.2 Handling Longer-Running Logic

When AI workflows become more complex—chain-of-thought prompting, multi-step reasoning, or multiple agents collaborating—the system must track intermediate steps. This introduces the need for a stateful conversation log, state transitions, and record-keeping of each agent’s partial outputs.

Databases shine here because you can store:

  • Agent Queries: Steps or sub-goals in the workflow.
  • User Actions: Confirming or rejecting suggestions.
  • Context: Environment data that changes, e.g. new shipping orders.
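A sketch of what such a log might look like, using one document per agent run with an appended list of steps (the collection name and document shape are illustrative assumptions):

```python
# A stateful agent log: every step of a multi-step workflow is appended to a single run
# document so the agent (or a human) can replay or resume it.
from datetime import datetime, timezone
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["agent_app"]

def record_step(run_id: str, step_type: str, payload: dict) -> None:
    db.agent_runs.update_one(
        {"_id": run_id},
        {"$push": {"steps": {
            "type": step_type,             # "agent_query", "user_action", or "context"
            "payload": payload,
            "at": datetime.now(timezone.utc),
        }}},
        upsert=True,
    )

# Usage: record a sub-goal, then the user's decision on it.
record_step("run-123", "agent_query", {"sub_goal": "find the cheapest shipping option"})
record_step("run-123", "user_action", {"decision": "approved"})
```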


5. Transforming a Company to Meet the AI Era

Beyond the purely technical discussion, one vital aspect is how an organisation transitions into an AI-augmented future. Sahir draws parallels with MongoDB’s own transformation from on-prem software to a cloud-based consumption model. The lessons apply equally to companies aiming to “infuse AI” across their products:

1. Top-Down Support

Leadership must align on the strategic priority of AI. Half-measures or side-projects rarely gain traction.

2. Holistic Change, Not Just a “New SKU”

AI shouldn’t be an optional add-on. Every function—product, sales, customer success—should realign around the new approach to data, quality, and user experience.

3. Developer (or Agent) Experience

Much of MongoDB’s success hinged on delivering a frictionless developer experience. By analogy, if future software is partly “agent-generated,” ensuring that an agent can easily and effectively query your database is the next big challenge.

4. Continuous Feedback Loops

Building trust inside the organisation is crucial. Early successes—like AI prototypes that meaningfully reduce costs or improve workflows—help convert skeptics and secure further buy-in.


6. The Road Ahead: What Stays the Same, What Changes

6.1 More Software, Not Less

Contrary to fears that AI might reduce the need for developers, most signs point to an explosion of software. AI tools dramatically lower the barrier to creation, allowing more individuals (and potentially AI agents themselves) to spin up new applications at scale. That means more data—and more reliance on robust data architectures.


6.2 Composability Across Modalities

AI applications will combine various indexing strategies—vector, graph, relational, textual. Each helps a model interpret data differently:

  • Vector: Capturing semantic relationships.
  • Graph: Representing intricate entity-to-entity relationships.
  • Text: Traditional string and lexical queries.
  • Structured: Deterministic transactions, updates, and filters.

As Sahir notes, “pulling them together elegantly” is no trivial feat, but it’s fast becoming table stakes for advanced AI.
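As a rough sketch of that layering, the example below defines a standard index for structured filters, a text index for lexical queries, and a vector search index with a filterable metadata field, all on one collection; graph-style traversals can then be expressed at query time with stages such as $graphLookup. The SearchIndexModel call and definition shape depend on your PyMongo and Atlas versions, so treat the exact syntax as an assumption.

```python
# Layering index types on a single collection (names and dimensions are illustrative).
from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

docs = MongoClient("mongodb+srv://<cluster-uri>")["kb"]["docs"]

docs.create_index("region")                  # structured filters and exact matches
docs.create_index([("text", "text")])        # classic lexical (keyword) queries

docs.create_search_index(SearchIndexModel(   # vector similarity + a filterable field
    name="chunk_index",
    type="vectorSearch",
    definition={"fields": [
        {"type": "vector", "path": "embedding", "numDimensions": 1536, "similarity": "cosine"},
        {"type": "filter", "path": "region"},
    ]},
))
```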


6.3 Evolution of Database Primitives

Databases will continue innovating on storage, indexing, and performance. Vector indexing is here to stay—“a new primitive,” as described—but expect constant improvements in how embeddings are stored, retrieved, and updated. Index tuning, cost optimisation, and caching may all look different in an AI-driven future.


6.4 AI Transformation in the Enterprise

Finally, large organisations—often the most risk-averse—are gradually adopting AI. Sahir cites examples in the automotive and pharmaceutical sectors:

  • Automotive: Recording engine noises, converting them into embeddings, and matching them to known issues drastically shortens diagnostic times.
  • Pharma: Large language models auto-generate initial drafts of clinical study reports, trimming a labor-intensive process from hours/days to minutes.


Both illustrate that if AI can tangibly reduce cost and time, enterprises will invest. The gating factor is often ensuring output correctness (and regulatory compliance), which loops back to the vital role of data retrieval, “last mile” reliability, and well-designed architecture.


7. Conclusion: Quality, State, and the Next Software Revolution

Generative AI offers enormous promise, but realising it at scale demands an architecture that elegantly interweaves probabilistic (LLMs) and deterministic (databases) components. With more complex agent workflows, real-time external data, and near-zero tolerance for errors, the importance of robust data-layer solutions has never been greater.

Whether you’re a lean startup or a global enterprise, the emerging “LLM + Vector + Graph + Structured” architecture is quickly becoming the new normal. As Sahir Azam’s experience shows, success requires both technical excellence—choosing or building the right database capabilities—and organisational transformation. AI isn’t just an additional feature; it’s a fundamental shift in how software is built, delivered, and trusted.

For AI-driven applications—and the developers (or autonomous agents) behind them—the future is probabilistic, but the foundation must be solid. Ensuring that LLMs can reliably tap into the world’s latest data, ground their answers in verified knowledge, and coordinate complex tasks demands a powerful, flexible database backbone. By aiming for that last mile of quality, companies can unlock mission-critical possibilities once unimaginable in the world of purely deterministic apps.


Key Takeaways

1. AI Is Probabilistic

Traditional software relies on deterministic outputs. In contrast, AI often yields slightly different outcomes each run, which heightens the importance of robust data validation and retrieval.

2. Vector Databases as a New Primitive

Vectors enable semantic understanding of unstructured data. This is foundational for advanced e-commerce search and retrieval-augmented generation (RAG) with LLMs.

3. Unified Data Approaches

Combining full-text, metadata filtering, graph relationships, and vector similarity in one system can drastically improve reliability, reduce engineering overhead, and help achieve enterprise-level “99.99% quality.”

4. Stateful AI, Agentic Workflows

As AI shifts from single-question prompts to extended agentic logic, a database must act as memory and reflect real-time world state.

5. Business Transformation

Embracing AI is not just a new SKU or side project. It’s a holistic transformation of product development, sales, customer success, and company culture—mirroring MongoDB’s earlier pivot from on-prem to the cloud.

6. Trust and Reliability

Enterprises won’t risk mission-critical operations on “loose” generative AI. Achieving “last mile quality” is the crucial step to widespread adoption, requiring close attention to data, indexing, and retrieval architecture.


By keeping these insights in mind, organisations can navigate the ever-shifting frontier of AI and databases, ultimately delivering innovative products without compromising on quality or trust.
