Agentic RAGs: consolidated querying of SQL & Document repositories

1. Introduction

Given the way technology has evolved, we have become accustomed to thinking of certain solution constructs in terms of specific tools and platforms. For example, retrieval-augmented generation (RAG) is pervasive in the Gen AI world today. Given its emergence in the Gen AI era, we mostly think of RAG in terms of unstructured data / document processing pipelines only. In reality, however, the RAG solution construct can very well be applied to retrieving structured data from databases / data warehouses using SQL, and then augmenting a prompt with the retrieved structured data.

This observation was triggered by recent Gen AI use-cases and their solutioning discussions, where we increasingly see the need to query SQL and document repositories in parallel, consolidating the respective query pipelines.

For example, let us consider the following marketing data landscape with:

  • profile data of customers, products, and sales agents, stored in documents,
  • transactional data on orders placed by customers, including the involved sales agents, stored in a SQL DB.

Given this, we will need to get insights from both the profile (unstructured) and transactional (structured) data repositories to answer the below query:

Provide a detailed summary of the Top 3 sales agents for Product X in the year 2023.

Needless to say, resolving this requires a sequential decomposition of the user query:

  • first retrieving the Top 3 sales agents for Product X from the SQL DB, followed by
  • retrieving their respective profiles from the profiles document repository, followed by
  • generating a summary based on the retrieved profiles and sales data.

In what follows, we outline a solution flow for the above scenario, leveraging an Agentic AI framework for the query decomposition part, followed by Snowflake Cortex AI as the unified platform to implement RAGs over structured and unstructured data repositories.

2. RAGs using Snowflake Cortex

Following the theme of this article, Snowflake is a good example of a platform known for its data processing prowess in the structured data world. However, it has made significant strides in the Gen AI world recently, positioning itself as a unified platform of choice for both Data & AI / Gen AI pipelines (refer to my previous article on the topic: link).

Cortex AI (Fig. 1) is Snowflake’s Gen AI / large language model (LLM) platform with Snowflake ML catering to the more traditional AI (data science / predictive analytics) capabilities.

Fig. 1: Snowflake’s unified Data & AI platform (Source: Snowflake)

We start by showing how easy it is to build RAG pipelines on documents / unstructured data leveraging Cortex AI. We then extend the same to a consolidated architecture implementing RAGs over both structured and unstructured data repositories.

Together with fine-tuning and reinforcement learning from human feedback (RLHF), RAG forms one of the primary mechanisms for 'adapting' a pre-trained LLM with enterprise data, making it more contextual and reducing hallucinations in the process; this is captured in the Gen AI lifecycle stages illustrated in Fig. 2.

Fig. 2: Gen AI lifecycle stages

Given a user query, a RAG pipeline consists of the three phases below (Fig. 3), with a minimal code sketch following the figure:

  • Retrieve: transform the user query into an embedding (vector format) and use similarity search to find the most relevant content.
  • Augment: enrich the prompt with the search results / context retrieved from a vector store that is kept current and in sync with the underlying document repository.
  • Generate: produce a contextualized response by making the retrieved chunks part of the prompt template, giving the LLM additional context on how to answer the query.

Fig. 3: RAG pipeline on Snowflake
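As a minimal sketch of these three phases on Snowflake, consider the Python snippet below. It assumes a hypothetical doc_chunks table with a chunk text column and a chunk_vec VECTOR column already populated (a sketch of building that table follows the component list below); the model names and table are illustrative, not prescriptive.

```python
# Minimal RAG sketch over Snowflake Cortex (table and model names are
# illustrative assumptions). Assumes an existing Snowpark session and a
# doc_chunks table with columns: chunk (VARCHAR), chunk_vec (VECTOR).
from snowflake.snowpark import Session

def answer(session: Session, question: str) -> str:
    # Retrieve: embed the question and find the most similar chunks.
    rows = session.sql("""
        SELECT chunk
        FROM doc_chunks
        ORDER BY VECTOR_COSINE_SIMILARITY(
            chunk_vec,
            SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', ?)
        ) DESC
        LIMIT 3
    """, params=[question]).collect()

    # Augment: make the retrieved chunks part of the prompt template.
    context = "\n".join(r["CHUNK"] for r in rows)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"

    # Generate: serverless LLM inference via Cortex COMPLETE.
    return session.sql(
        "SELECT SNOWFLAKE.CORTEX.COMPLETE('mistral-large', ?)",
        params=[prompt],
    ).collect()[0][0]
```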

We highlight the Snowflake components below that map to the key architecture artefacts of the above RAG pipeline:

  • Context repository: The knowledge repository can be easily updated and governed using Snowflake stages. Once documents are loaded, all the data preparation, including generating chunks (smaller, contextually rich blocks of text), can be performed with Snowpark. For the chunking in particular, teams can also seamlessly leverage LangChain as part of a Snowpark User Defined Function (a chunking sketch follows this list).
  • Vector search: Thanks to the native support of VECTOR as a data type in Snowflake, there is no need to integrate and govern a separate store or service. VECTOR data can be stored in Snowflake tables and similarity queries executed with system-defined similarity functions (L2, cosine, or inner-product distance).
  • LLM inference: Snowflake Cortex completes the workflow with serverless functions for embedding and text completion inference (using Mistral AI, Llama, or Gemma LLMs).
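To make the first two bullets concrete, here is a rough sketch that registers a chunking UDF wrapping LangChain's text splitter and materializes the doc_chunks table used in the earlier sketch. The raw_docs source table, chunk sizes, and the availability of langchain as a Snowpark package are assumptions to adapt.

```python
# Sketch: chunking documents via a Snowpark UDF wrapping LangChain
# (source table, chunk sizes, and package availability are assumptions).
from snowflake.snowpark import Session
from snowflake.snowpark.functions import udf
from snowflake.snowpark.types import ArrayType, StringType

@udf(name="chunk_text", packages=["langchain"], replace=True,
     input_types=[StringType()], return_type=ArrayType(StringType()))
def chunk_text(doc: str) -> list:
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    return splitter.split_text(doc)

# Flatten the chunks into a table and store embeddings alongside,
# using the native VECTOR type (no separate vector store needed).
def build_chunk_table(session: Session) -> None:
    session.sql("""
        CREATE OR REPLACE TABLE doc_chunks AS
        SELECT
            c.value::VARCHAR AS chunk,
            SNOWFLAKE.CORTEX.EMBED_TEXT_768(
                'snowflake-arctic-embed-m', c.value::VARCHAR
            ) AS chunk_vec
        FROM raw_docs,
             LATERAL FLATTEN(input => chunk_text(raw_docs.content)) c
    """).collect()
```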

Snowflake also provides Cortex Search, a managed service implementing the retrieval and augmentation parts of a RAG pipeline.

It can thus be combined with Cortex LLM Functions to create a full-fledged RAG pipeline using data in Snowflake as the underlying context repository, as illustrated in Fig. 4; a sketch follows the figure.

Fig. 4: Leveraging Cortex Search for RAG pipelines (Source: Snowflake)
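A hedged sketch of this combination using the Cortex Search Python API is shown below; the service, database, and schema names are assumptions, and the API surface should be checked against the current snowflake.core documentation.

```python
# Sketch: retrieval via a pre-created Cortex Search service, followed by
# generation via Cortex COMPLETE. All object names are assumptions.
from snowflake.core import Root
from snowflake.snowpark import Session

def search_and_answer(session: Session, question: str) -> str:
    root = Root(session)
    svc = (root.databases["MARKETING_DB"]
               .schemas["PUBLIC"]
               .cortex_search_services["PROFILE_SEARCH"])

    # Retrieval and augmentation are handled by the managed service.
    resp = svc.search(query=question, columns=["chunk"], limit=3)
    context = "\n".join(r["chunk"] for r in resp.results)

    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return session.sql(
        "SELECT SNOWFLAKE.CORTEX.COMPLETE('mistral-large', ?)",
        params=[prompt],
    ).collect()[0][0]
```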

3. Text2SQL: Natural Language Queries over Structured Data

In this section, we extend the natural language querying capabilities to structured data stored in a SQL data repository. This capability is called Text2SQL, also referred to as Conversational BI, and we have explored Snowflake's capabilities to deliver this in a managed and serverless fashion in a previous article (link).

Cortex Analyst is the Text2SQL equivalent of Cortex Search to query structured data / SQL databases and data warehouses.

Fig. 5: Cortex Analyst & Search for querying structured & unstructured data (Source: Snowflake)

Cortex Analyst is actually an end-to-end text-to-answer solution, as it returns the generated SQL as well as the final query response. It is very easy to deploy (Fig. 6): it is available as an API, and Snowflake also provides a simple Streamlit application that can be deployed with a few lines of code; a sketch of an API call follows the figure.

Fig. 6: Cortex Analyst architecture (Source: Snowflake)
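As a rough illustration, the snippet below calls the Cortex Analyst REST API. The account URL placeholder, stage path, and token handling are assumptions; check the current API documentation for the exact request and response shapes.

```python
# Sketch: calling the Cortex Analyst REST API (account URL, semantic model
# stage path, and auth token handling are illustrative assumptions).
import requests

ACCOUNT_URL = "https://<account_identifier>.snowflakecomputing.com"  # assumption

def ask_analyst(question: str, token: str) -> dict:
    resp = requests.post(
        f"{ACCOUNT_URL}/api/v2/cortex/analyst/message",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        json={
            "messages": [
                {"role": "user",
                 "content": [{"type": "text", "text": question}]}
            ],
            # Semantic model YAML uploaded to a stage (path is illustrative).
            "semantic_model_file": "@MARKETING_DB.PUBLIC.MODELS/sales.yaml",
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()  # contains the interpretation, generated SQL, etc.
```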

Overall, it provides the following three differentiating features with respect to other Text2SQL tools in the market:

  • User intent validation and query explanation
  • Lightweight semantic model
  • Flexibility with respect to the underlying LLM

We all know that LLMs hallucinate, so the first (and best) thing to do is to validate the system's understanding of a given query with the user, before responding with the final answer. Cortex Analyst enables this by:

  • having a dialogue with the user: for any given query, it first presents its understanding of the query, with an explanation of how it generated the SQL query;
  • providing suggestions to make the query more concrete in case of ambiguities.

Secondly, Cortex Analyst addresses the data repository metadata mapping problem with the help of a semantic model.

The semantic model is the bridge mapping the domain- or business-specific terms used by users to the database schemas.

These additional semantic details, like more descriptive names or synonyms, enable Cortex Analyst to answer natural language queries more reliably; an illustrative sketch of such a model follows.
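For illustration, a fragment of the kind of semantic model Cortex Analyst consumes is sketched below, held as a Python string for upload to a stage. The field names follow the general shape of the spec but are assumptions; verify them against Snowflake's semantic model documentation.

```python
# Sketch of a Cortex Analyst semantic model fragment (field names and
# values are illustrative; verify against the official spec).
SEMANTIC_MODEL_YAML = """
name: sales_insights
tables:
  - name: orders
    description: Customer orders with the involved sales agents.
    base_table:
      database: MARKETING_DB
      schema: PUBLIC
      table: ORDERS
    dimensions:
      - name: sales_agent
        expr: AGENT_NAME
        data_type: varchar
        synonyms: ["rep", "salesperson", "account manager"]
    measures:
      - name: total_sales
        expr: ORDER_AMOUNT
        data_type: number
        default_aggregation: sum
"""
```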

Finally, Cortex Analyst provides a lot of flexibility with respect to choosing the underlying LLM. By default, Cortex Analyst leverages Snowflake-hosted Cortex LLMs, which have been heavily fine-tuned for text-to-SQL generation tasks and are among the most powerful LLMs available today. It is however possible to explicitly opt-in to allow Cortex Analyst to use the latest OpenAI GPT models, hosted by Microsoft Azure, alongside the Snowflake-hosted models.

At runtime, Cortex Analyst will select the optimal combination of models to ensure the highest accuracy and performance for each query.

4. RAG Agents consolidating SQL and Document Queries

In this section, we tie everything together by outlining an Agentic AI framework to build RAG pipelines that work seamlessly over both structured and unstructured data stored in Snowflake. The reference architecture of such a RAG Agent is illustrated in Fig. 7.

Fig. 7: Reference architecture of a RAG Agent querying both SQL & Document repositories

We first focus on the query decomposition aspect. Given a user task, the goal of an AI agent is to identify (compose) an agent (or group of agents) capable of executing the given task. This is achieved by leveraging an LLM as a reasoning engine capable of decomposing the task (the query in this case) into sub-tasks, with the execution of the respective agents orchestrated by an orchestration engine, e.g. LangGraph.

The Supervisor agent is responsible for coordinating the overall execution, routing (via if-then-else logic) to the individual agents. As such, the Router is primarily used to route the (SQL and/or document related) sub-queries to the SQL & Document Query agents.

The SQL & Document Query agents are self-explanatory, and leverage the respective Snowflake Cortex Analyst and Search components detailed earlier to query the underlying SQL and Document repositories. Finally, to complete the RAG pipeline, the retrieved data is added to the original query (prompt), leading to the generation of a contextualized response; a LangGraph-style sketch of this flow follows.
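The snippet below sketches the reference architecture of Fig. 7 as a LangGraph graph. The state shape, the sequential SQL-then-documents routing, and the stubbed agent bodies are illustrative assumptions; in practice, the Supervisor would itself use an LLM to decompose the query, and the agent nodes would call the Cortex Analyst and Cortex Search sketches shown earlier.

```python
# Sketch of the RAG Agent reference architecture as a LangGraph graph
# (routing, state shape, and agent bodies are illustrative assumptions).
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    query: str
    sql_result: str
    doc_result: str
    answer: str

def sql_agent(state: AgentState) -> dict:
    # Would call Cortex Analyst here (see the Section 3 sketch).
    return {"sql_result": f"Top 3 agents for: {state['query']}"}

def doc_agent(state: AgentState) -> dict:
    # Would call Cortex Search here (see the Section 2 sketch).
    return {"doc_result": f"Profiles relevant to: {state['query']}"}

def generate(state: AgentState) -> dict:
    # Would call Cortex COMPLETE with both retrieved contexts in the prompt.
    return {"answer": state["sql_result"] + "\n" + state["doc_result"]}

graph = StateGraph(AgentState)
graph.add_node("sql_agent", sql_agent)
graph.add_node("doc_agent", doc_agent)
graph.add_node("generate", generate)

# Supervisor/Router: sequential decomposition, SQL first, then documents.
graph.set_entry_point("sql_agent")
graph.add_edge("sql_agent", "doc_agent")
graph.add_edge("doc_agent", "generate")
graph.add_edge("generate", END)

app = graph.compile()
result = app.invoke({"query": "Top 3 sales agents for Product X in 2023",
                     "sql_result": "", "doc_result": "", "answer": ""})
print(result["answer"])
```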

To conclude, the lines between Data and AI / Gen AI platforms are blurring, and companies like Snowflake are making it ever easier to leverage Gen AI / LLM capabilities on secure and governed data already stored in Snowflake. So it is high time to give Snowflake's Cortex AI a shot for your strategic Gen AI use-cases.

Debmalya Biswas

AI/Analytics @ Wipro | x- Nokia, SAP, Oracle | 50+ patents | PhD - INRIA
