Comparing RAG Chatbot implementations with Databricks, Snowflake, and Azure OpenAI
ELCA Group
The rapid rise of Generative AI (GenAI) has significantly transformed how businesses interact with their data. Still, it comes with important challenges, such as ensuring the truthfulness and accuracy of the generated content. This is where Retrieval-Augmented Generation (RAG) chatbots step in. RAG chatbots combine the power of a generative language model with a company’s internal knowledge base, ensuring that responses are not only contextually accurate but also grounded in real, trusted data.
The idea behind a RAG chatbot is that when a user asks a question, your private knowledge base, which can be stored in tables, PDFs, or any other unstructured format, is searched for relevant information. The retrieved data, along with the original query, forms an augmented prompt, which is then processed by a generative LLM to deliver a more accurate and precise response. This significantly reduces the risk of hallucination, a common problem where AI models produce plausible but factually incorrect responses.
In this article, we will review three different cloud services — Azure OpenAI, Azure Databricks, and Snowflake — that offer powerful tools for implementing a Retrieval-Augmented Generation chatbot. We’ll dive into how each service approaches the task, comparing their implementation methods and highlighting the pros and cons of each. The database serving as your private knowledge base can contain various types of data — structured, semi-structured, or unstructured — and may reside in different locations, whether on-premises or in the cloud. However, to make this comparison meaningful, we will assume the same initial setup for all three approaches: unstructured data, specifically PDF files, stored in Azure Blob Storage.
Both Azure Databricks and Snowflake require linking your workspace to this storage to enable data access and retrieval. With Databricks, this also requires access to Unity Catalog, through which access to your data is managed. With Snowflake, you need to create an external stage and then copy the data from the Azure storage.
With Azure OpenAI, however, this step isn’t necessary; instead, you create a data source within Azure Cognitive Search which can directly access your data.
Three Essential Steps in a RAG Chatbot Framework
1. Setting Up Your Knowledge Base
Before your knowledge base can be queried to retrieve documents for the chatbot’s responses, it’s essential to properly prepare your database. This initial setup is a crucial step; depending on the type of data you have, it should be processed into text chunks and mapped into embeddings to create a vector database. Each platform — Azure OpenAI, Azure Databricks, and Snowflake — takes a slightly different approach, resulting in vector databases optimized for querying. You will also need to choose an embedding model, considering that models offer different strengths depending on the use case.
Databricks: With Azure Databricks, the processing and chunking of data can be done either through the user interface or programmatically within a notebook. The resulting text chunks are stored in a Delta table, which is the default table format in Databricks. Then, with Databricks’ Mosaic AI Vector Search, you can compute embeddings and construct a vector database that is optimized for efficient storage and querying of these embeddings.
Assuming your initial PDF files are in the Azure storage and accessed through Databricks’ Unity Catalog, the steps are as follows:
First, you need to process your data into text chunks, for example using LangChain’s RecursiveCharacterTextSplitter. At this point, your data can be stored as text chunks in a Source Delta table, as in the illustration.
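A minimal sketch of this chunking step, assuming the raw PDF text has already been extracted into a Spark DataFrame raw_docs_df with a text column per document (this DataFrame, the column names, and the chunk sizes are illustrative), and that catalog, db, and text_table_name identify the target Delta table:

from langchain.text_splitter import RecursiveCharacterTextSplitter
from pyspark.sql.functions import explode, monotonically_increasing_id, udf
from pyspark.sql.types import ArrayType, StringType

@udf(returnType=ArrayType(StringType()))
def split_text(text):
    # Split one document into overlapping chunks suited for embedding
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    return splitter.split_text(text or "")

chunks_df = (
    raw_docs_df  # one row per PDF, extracted text in a "text" column
    .withColumn("text_chunk", explode(split_text("text")))
    .withColumn("id", monotonically_increasing_id())  # primary key for the vector index
    .select("id", "text_chunk")
)

# Persist the chunks as the source Delta table; Change Data Feed is needed
# later for a Delta Sync vector index.
chunks_df.write.format("delta").mode("overwrite").saveAsTable(f"{catalog}.{db}.{text_table_name}")
spark.sql(f"ALTER TABLE {catalog}.{db}.{text_table_name} "
          "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")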
Next, create a Vector Search endpoint, used to access your vector indexes:
from databricks.vector_search.client import VectorSearchClient

VECTOR_SEARCH_ENDPOINT_NAME = 'my_vector_search_endpoint'

vsc = VectorSearchClient()
vsc.create_endpoint(name=VECTOR_SEARCH_ENDPOINT_NAME, endpoint_type="STANDARD")
Then, create a model serving endpoint for the embedding model through the UI in your Databricks workspace. This is where you decide which model to use to compute the embeddings of your knowledge base; the same model will be used to embed the input question once you actually use your chatbot. Databricks offers a large choice of embedding models, including foundation models made available by Databricks as well as external or custom models.
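As a quick check, you can also query the embedding endpoint programmatically before wiring it into the index. The sketch below assumes one of Databricks’ pay-per-token foundation model endpoints (here databricks-bge-large-en); any other serving endpoint name would work the same way:

from mlflow.deployments import get_deploy_client

deploy_client = get_deploy_client("databricks")

# Send a test sentence to the embedding endpoint and inspect the result
response = deploy_client.predict(
    endpoint="databricks-bge-large-en",  # assumed pay-per-token endpoint name
    inputs={"input": ["What is a vector database?"]},
)
print(response)  # OpenAI-style payload containing the embedding vector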
Finally, create a Vector Search index, with the option to update it manually or continuously as data is added to your database. With this you create your vector database, which will be queried in the next step. Embeddings are also stored in a table, as seen in the illustration, that you can query like a regular table.
# The table we want to index
source_table_fullname = f"{catalog}.{db}.{text_table_name}"
# Where we want to store the index
vs_index_fullname = f"{catalog}.{db}.{vs_index}"

vsc.create_delta_sync_index(
    endpoint_name=VECTOR_SEARCH_ENDPOINT_NAME,
    index_name=vs_index_fullname,
    source_table_name=source_table_fullname,
    pipeline_type="TRIGGERED",                       # sync needs to be manually triggered
    primary_key="id-column",                         # primary key column
    embedding_source_column='text-chunk-column',     # column containing the text chunks
    embedding_model_endpoint_name='embedding-model'  # embedding model serving endpoint
)
Snowflake: In Snowflake you can work either in Python or SQL worksheets, or with notebooks that support both SQL and Python. Assuming you copied your data from the Azure storage through an external stage in Snowflake, text chunks and embeddings are stored in tables that serve as the vector store to be queried. Snowflake Cortex is used to compute embeddings and query your vector store.
Start by chunking your data, for example using LangChain’s RecursiveCharacterTextSplitter. At this point your data is stored as text chunks in a table.
Then, compute the embedding of each text chunk. Snowflake offers a SQL syntax for this:
insert into docs_chunks_table (
    relative_path, size,
    file_url, scoped_file_url,
    chunk,
    chunk_vec
)
select
    relative_path,
    size,
    file_url,
    build_scoped_file_url(@docs, relative_path) as scoped_file_url,
    func.chunk as chunk,
    SNOWFLAKE.CORTEX.EMBED_TEXT_768('e5-base-v2', chunk) as chunk_vec
from
    directory(@docs),
    TABLE(pdf_text_chunker(build_scoped_file_url(@docs, relative_path))) as func;
Here pdf_text_chunker would be a Python UDTF created by you to compute the text chunks, and @docs would be the stage containing your PDF files.
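One possible shape for such a UDTF is sketched below as its Python handler class; the packages, chunk sizes, and names are illustrative, and in Snowflake it would be registered with CREATE FUNCTION pdf_text_chunker(file_url STRING) RETURNS TABLE (chunk VARCHAR) LANGUAGE PYTHON ... HANDLER = 'PdfTextChunker':

from io import BytesIO

from PyPDF2 import PdfReader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from snowflake.snowpark.files import SnowflakeFile


class PdfTextChunker:
    """Handler for a Python UDTF that turns a staged PDF into text chunks."""

    def process(self, file_url: str):
        # Open the staged file via its scoped URL and extract the raw text
        with SnowflakeFile.open(file_url, "rb") as f:
            reader = PdfReader(BytesIO(f.read()))
        text = "\n".join(page.extract_text() or "" for page in reader.pages)

        # Split into overlapping chunks sized for the embedding model
        splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
        for chunk in splitter.split_text(text):
            yield (chunk,)  # one output row per chunk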
The focus here, however, is on SNOWFLAKE.CORTEX.EMBED_TEXT_768('embed-model', chunk), which lets you compute embeddings with a model that you choose. The choice is a bit more restricted compared to Databricks, at least for now, although Snowflake is making more and more models available.
Azure OpenAI: Since your data is already in Azure storage, the initial setup is slightly different here.
Given that your input documents are in a storage account, create a primary index and a projection index enabling vector search, as well as a data source linked to your storage container.
Next, create a skillset, with one skill for chunking and one for text embedding.
Finally, create and run the indexer: data is accessed through the data source, and the chunking and embedding skills are applied. Text embeddings are stored in the primary index, which is mapped to the projection index in a one-to-many relation. The projection index is then ready for vector search. Note that the indexer can be run manually or on a schedule for updates.
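As an illustration, the data source and indexer parts of this setup could look roughly as follows with the azure-search-documents Python SDK; this sketch assumes the primary index and the chunking/embedding skillset already exist, and all names, keys, and endpoints are placeholders:

from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexerClient
from azure.search.documents.indexes.models import (
    SearchIndexer,
    SearchIndexerDataContainer,
    SearchIndexerDataSourceConnection,
)

indexer_client = SearchIndexerClient(
    endpoint="https://<search-service>.search.windows.net",
    credential=AzureKeyCredential("<search-admin-key>"),
)

# Data source pointing at the blob container that holds the PDF files
data_source = SearchIndexerDataSourceConnection(
    name="pdf-datasource",
    type="azureblob",
    connection_string="<storage-connection-string>",
    container=SearchIndexerDataContainer(name="pdf-container"),
)
indexer_client.create_or_update_data_source_connection(data_source)

# Indexer that reads from the data source, applies the chunking and
# embedding skillset, and writes into the primary index
indexer = SearchIndexer(
    name="pdf-indexer",
    data_source_name="pdf-datasource",
    target_index_name="primary-index",
    skillset_name="chunk-and-embed-skillset",
)
indexer_client.create_or_update_indexer(indexer)
indexer_client.run_indexer("pdf-indexer")  # can also be scheduled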
2. Document Retrieval: Finding the Right Information
Once the vector database is set up, we can begin querying it using similarity search algorithms to find the most relevant documents. Each platform — Azure OpenAI, Azure Databricks, and Snowflake — offers its own tools for this step, all optimized to efficiently handle similarity searches within the RAG framework.
Databricks: Databricks again makes use of Mosaic AI Vector Search to retrieve the documents that are most relevant to the user’s question. With the following syntax, relevant documents from your vector database are retrieved. Here you have some flexibility regarding the search method to use, the number of documents to retrieve, or a threshold for the minimum similarity. In particular, you can choose a pure semantic search or a hybrid search, which additionally uses keyword matching.
question = "A sample question?"

results = vsc.get_index(VECTOR_SEARCH_ENDPOINT_NAME, vs_index_fullname).similarity_search(
    query_text=question,
    columns=["id", "text_chunk"],
    num_results=1
)
Snowflake: With Snowflake you essentially chain two functions: the first computes the embedding of the question, and the second performs the similarity search by computing the similarity between two vectors. Snowflake offers other similarity metrics as well.
SNOWFLAKE.CORTEX.EMBED_TEXT_768( 'embedding-model', question )
VECTOR_COSINE_SIMILARITY( question_embedding, chunk_embedding )
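Put together, a retrieval query could look like the following sketch, here executed from Python with the Snowflake connector; the table and column names follow the earlier example, and the connection parameters and number of retrieved chunks are illustrative:

import snowflake.connector

conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="<warehouse>", database="<database>", schema="<schema>",
)

question = "A sample question?"

# Embed the question and rank chunks by cosine similarity to its embedding
retrieval_sql = """
    select chunk,
           VECTOR_COSINE_SIMILARITY(
               chunk_vec,
               SNOWFLAKE.CORTEX.EMBED_TEXT_768('e5-base-v2', %s)
           ) as similarity
    from docs_chunks_table
    order by similarity desc
    limit 3
"""

with conn.cursor() as cur:
    cur.execute(retrieval_sql, (question,))
    context_chunks = [row[0] for row in cur.fetchall()]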
Azure OpenAI: With Azure OpenAI you first create a SearchClient; then, using the search method as follows, you can query your index with the provided text. In addition, you choose the number of most relevant documents to retrieve and whether to use a keyword-based, semantic, or hybrid search method.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizableTextQuery

# SearchClient pointing at the projection index (placeholder endpoint, index and key)
search_client = SearchClient(
    endpoint="https://<search-service>.search.windows.net",
    index_name="projection-index",
    credential=AzureKeyCredential("<search-query-key>"))

vector_question = VectorizableTextQuery(
    text=question,
    k_nearest_neighbors=5,
    fields="vector"
)
results = search_client.search(
    vector_queries=[vector_question],
    top=5
)
3. Generating Responses with Precision
In the response generation phase, there are fewer differences between the platforms, as they all follow a similar approach. A prompt is constructed using both the original question and the retrieved documents, which can be fine-tuned and engineered to fit the specific use case. All three platforms then rely on a generative LLM, which you define, to generate a response based on the retrieved information. Each platform also provides a specific list of models to choose from, which evolves over time to make some of the latest models available.
In addition, you can construct this step so that the chatbot keeps the conversation history in memory. There are several possible approaches for this, the simplest being to concatenate messages. Another option is to repeatedly summarize the chat history with another LLM. It is up to you, depending on the use case, to decide how to leverage the different tools made available by Databricks, Snowflake, and Azure OpenAI.
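As an illustration of the simplest approach, the platform-agnostic sketch below (all names are ours) builds the message list for the generative model from a system prompt, the retrieved chunks, and a rolling window of previous turns:

def build_messages(question, context_chunks, history, max_history_turns=5):
    """Assemble the augmented prompt from context, history, and the new question."""
    context = "\n\n".join(context_chunks)
    system_prompt = (
        "Answer the user's question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}"
    )
    messages = [{"role": "system", "content": system_prompt}]
    # Keep only the most recent turns so the prompt stays within the context window
    messages.extend(history[-2 * max_history_turns:])
    messages.append({"role": "user", "content": question})
    return messages

# After each answer, append both turns so the next call sees the history:
# history += [{"role": "user", "content": question},
#             {"role": "assistant", "content": answer}]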
Databricks
from langchain_databricks import ChatDatabricks

model = ChatDatabricks(
    endpoint='gen_model_endpoint',
    extra_params={"temperature": 0.1}  # temperature etc., optional
)

# You can pass a list of messages
messages = [
    ("system", "You are a chatbot that can answer questions about Databricks."),
    ("user", "What is Databricks Model Serving?")
]
model.invoke(messages)
Snowflake
SELECT SNOWFLAKE.CORTEX.COMPLETE(
    'llama2-70b-chat',
    [
        {'role': 'system', 'content': 'You are a chatbot that can answer questions about Snowflake.'},
        {'role': 'user', 'content': 'What is Snowflake Cortex Search?'}
    ],
    {}  -- an options object is required when passing a message history
) as response;
Azure OpenAI
response = client.chat.completions.create(
    model="gpt-model",
    messages=[
        {"role": "system", "content": "Assistant is a large language model trained by OpenAI."},
        {"role": "user", "content": "Who were the founders of Microsoft?"}
    ]
)
Conclusion
In conclusion, while all three platforms — Azure OpenAI, Databricks, and Snowflake — offer robust tools for building a RAG chatbot, their strengths lie in different areas.
Snowflake is ideal for teams seeking simplicity and quick implementation, although it sacrifices some flexibility, particularly in the choice of language models and document retrieval options.
Databricks, on the other hand, requires a more complex implementation but offers great flexibility, for example in the choice of LLMs. This flexibility comes with a steeper learning curve, which may not suit teams looking for a faster solution. In addition, Databricks’ Unity Catalog offers a single place to administer data access policies with high granularity, a considerable asset when working with proprietary data that needs to be well governed.
In terms of cost, Snowflake tends to be pricier than Databricks and Azure OpenAI due to its billing model for LLMs, but its rapid release of new features means it continues to evolve as a competitive player in the generative AI space.
Azure OpenAI might strike a middle ground. While it requires a bit more implementation effort than Snowflake, it offers seamless integration with Azure storage and a relatively straightforward approach to data retrieval and response generation, falling into the average cost category.
Ultimately, your choice between these platforms should be guided by your specific use case, considering factors such as your needs in terms of LLM choice, data governance and cost.
ELCA has assisted several businesses in implementing solutions based on GenAI, and RAG in particular; our experts are available to assist you.
Article posted originally on medium.com.