Integrating RAG API with Vertex AI Vector Search for Enhanced LLM Grounding
Mohammad Jazim
AI Product Owner at DoctusTech, building a portfolio of AI data products
Retrieval-Augmented Generation (RAG) combines the power of retrieval systems with generative models to answer queries based on both pre-trained knowledge and external datasets. By pairing the RAG API with Vertex AI Vector Search, developers can create scalable, efficient, and highly accurate systems for semantic retrieval and grounded generation.
In this guide, we walk you through the process of integrating the RAG API with Vertex AI Vector Search, leveraging Google Cloud's powerful tools for creating enhanced LLM applications.
Prerequisites
Before diving into the integration, ensure you have:
- A Google Cloud project with billing enabled
- Permissions to create Vertex AI Vector Search indexes and endpoints in that project
- A Python 3 environment (local or Google Colab) with the gcloud CLI available
- A Cloud Storage bucket or Google Drive folder containing the documents you want to index
1. Setting Up the Environment
Install and Initialize Vertex AI SDK
First, set up the Vertex AI SDK to interact with the platform:
pip install google-cloud-aiplatform
In your Python environment:
from google.cloud import aiplatform

PROJECT_ID = "your-project-id"  # Your Google Cloud project ID
LOCATION = "your-location"      # Region, e.g. "us-central1"

aiplatform.init(project=PROJECT_ID, location=LOCATION)
Authenticate
If working in Google Colab, authenticate with:
from google.colab import auth
auth.authenticate_user()
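If you are working locally instead of Colab, Application Default Credentials through the gcloud CLI are a common alternative:

gcloud auth application-default login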
Enable the necessary APIs:
! gcloud services enable compute.googleapis.com aiplatform.googleapis.com --project "$PROJECT_ID"
2. Create a Vertex AI Vector Search Index
The Vector Search index acts as the database for storing vector embeddings. It enables efficient similarity searches for documents or other data representations. Create the index with the following parameters:
my_index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
    display_name="my-index",
    description="Index for RAG",
    dimensions=768,  # Match embedding dimensions of your model
    distance_measure_type="DOT_PRODUCT_DISTANCE",
    index_update_method="STREAM_UPDATE",
    approximate_neighbors_count=10,
    leaf_node_embedding_count=500,
    leaf_nodes_to_search_percent=7,
)
Key Parameters:
- dimensions: must match the output dimension of your embedding model (768 here).
- distance_measure_type: the similarity metric used for matching; DOT_PRODUCT_DISTANCE is a common choice for text embeddings.
- index_update_method: STREAM_UPDATE lets you push new vectors into the index in near real time rather than rebuilding it in batch.
- approximate_neighbors_count: the default number of neighbors returned per query.
- leaf_node_embedding_count and leaf_nodes_to_search_percent: tree-AH tuning knobs that trade recall against query latency.
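To sanity-check the dimension before creating the index, you can embed a sample string with the model you plan to use. The snippet below is a minimal sketch that assumes the textembedding-gecko model, which produces 768-dimensional vectors:

from vertexai.language_models import TextEmbeddingModel

# Assumption: textembedding-gecko is the embedding model backing your corpus
embedding_model = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")
embedding = embedding_model.get_embeddings(["dimension check"])[0]
print(len(embedding.values))  # Should print 768, matching dimensions above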
3. Deploy the Index to an Endpoint
Next, create an endpoint to query the index. Endpoints provide a way to expose your index for integration with other applications:
my_index_endpoint = aiplatform.MatchingEngineIndexEndpoint.create(
    display_name="my-index-endpoint",
    public_endpoint_enabled=True,
)
DEPLOYED_INDEX_ID = "my-deployed-index"
my_index_endpoint.deploy_index(index=my_index, deployed_index_id=DEPLOYED_INDEX_ID)
Note: Deployment may take up to 30 minutes initially. You can check the status in the Google Cloud Console under the “Index endpoints” tab. After the first deployment, subsequent updates are processed much faster.
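If you would rather verify from code than from the console, the endpoint object exposes its deployed indexes. A quick check, assuming the deployment above:

# List the IDs of indexes currently live on the endpoint
deployed_ids = [deployed.id for deployed in my_index_endpoint.deployed_indexes]
print(DEPLOYED_INDEX_ID in deployed_ids)  # True once deployment completes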
4. Setting Up the RAG Corpus
A RAG corpus acts as the bridge between your data and the generative AI model. It structures and manages the data to be retrieved during generation.
Create and Link a RAG Corpus
from vertexai.preview import rag
CORPUS_DISPLAY_NAME = "my-rag-corpus"
vector_db = rag.VertexVectorSearch(
    index=my_index.resource_name,
    index_endpoint=my_index_endpoint.resource_name,
)
rag_corpus = rag.create_corpus(display_name=CORPUS_DISPLAY_NAME, vector_db=vector_db)
Alternatively, create an empty RAG corpus to update later:
rag_corpus = rag.create_corpus(display_name=CORPUS_DISPLAY_NAME)
Update the Corpus with Vector Search Information
rag.update_corpus(corpus_name=rag_corpus.name, vector_db=vector_db)
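To confirm the corpus is now wired to your Vector Search index, you can fetch it back from the service; a minimal sketch assuming the preview SDK's get_corpus helper:

# Fetch the corpus and inspect its configuration
corpus = rag.get_corpus(name=rag_corpus.name)
print(corpus)  # Should show the corpus with its Vector Search backend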
5. Importing Files into the RAG Corpus
Add your datasets to the RAG corpus for use during generation. This can include PDFs, text files, or other structured data. Use the ImportRagFiles API to import documents from Google Cloud Storage or Google Drive:
GCS_BUCKET = "your-bucket-name"
response = rag.import_files(
    corpus_name=rag_corpus.name,
    paths=[f"gs://{GCS_BUCKET}/your-file.pdf"],
    chunk_size=512,    # Adjust chunk size based on your use case
    chunk_overlap=100, # Optional
)
Tips for Importing:
- Supported sources include Cloud Storage paths (gs://...) and Google Drive URLs; for Drive, share the files with your project's service account first.
- Smaller chunk_size values make retrieval more precise but produce more chunks to store; larger values preserve more context per chunk.
- chunk_overlap helps avoid splitting sentences or ideas across chunk boundaries.
- After the import finishes, verify what landed in the corpus, as shown below.
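To check the import, you can list the files now attached to the corpus; a minimal sketch assuming the preview SDK's list_files helper:

# Print the display name of every file registered in the corpus
for rag_file in rag.list_files(corpus_name=rag_corpus.name):
    print(rag_file.display_name)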
6. Querying the RAG Corpus
After importing the data, you can query the corpus to retrieve relevant contexts for specific questions or inputs.
RETRIEVAL_QUERY = "Your search query here"
response = rag.retrieval_query(
    rag_resources=[rag.RagResource(rag_corpus=rag_corpus.name)],
    text=RETRIEVAL_QUERY,
    similarity_top_k=10,            # Optional
    vector_distance_threshold=0.3,  # Optional
)
print(response)
Understanding the Response:
The response contains the retrieved contexts, each pairing a chunk of text with the URI of its source document and a vector distance score; lower distances indicate closer semantic matches. similarity_top_k caps how many contexts come back, and vector_distance_threshold filters out matches whose distance exceeds that value.
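A short sketch of how you might walk the result, assuming the response exposes contexts as in the preview SDK:

# Print each retrieved chunk with its source and distance
for context in response.contexts.contexts:
    print(context.source_uri)
    print(context.text[:200])  # First 200 characters of the chunk
    print(context.distance)    # Lower means a closer match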
7. Grounding LLMs with RAG and Vertex AI
Grounding LLMs involves providing them with external data to improve their accuracy and relevance. By integrating RAG with Vertex AI, you can ensure your generative models are contextually aware.
Setting Up the Integration
from vertexai.preview.generative_models import GenerativeModel, Tool
tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_resources=[rag.RagResource(rag_corpus=rag_corpus.name)],
            similarity_top_k=10,
        ),
    )
)
model = GenerativeModel(model_name="gemini-1.5-flash-001", tools=[tool])
Generating Grounded Responses
PROMPT = "What is the cargo capacity of Cymbal Starlight?"
response = model.generate_content(PROMPT)
print(response.text)
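To see what grounding adds, it can be instructive to run the same prompt without the retrieval tool and compare the answers; a simple sketch:

# Baseline for comparison: same model and prompt, no retrieval tool
baseline_model = GenerativeModel(model_name="gemini-1.5-flash-001")
baseline_response = baseline_model.generate_content(PROMPT)
print(baseline_response.text)  # Likely vague or generic without the corpus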
Example Use Cases:
- Question answering over internal documentation, such as product manuals or policy handbooks (the Cymbal Starlight prompt above is exactly this pattern).
- Customer support assistants that answer from your own knowledge base rather than general pre-trained knowledge.
- Research and analysis tools that retrieve and summarize the most relevant passages from large document collections.
By integrating RAG API with Vertex AI Vector Search, you can build powerful, scalable systems for semantic search and retrieval-augmented generation. This combination allows your applications to handle large data volumes while delivering precise, contextually grounded responses.
Key Takeaways:
- Vertex AI Vector Search stores the embeddings; the RAG corpus links that index to your documents and to the generative model.
- Match the index dimensions to your embedding model, and use STREAM_UPDATE when you need to add vectors without rebuilding the index.
- Tune chunk_size, similarity_top_k, and vector_distance_threshold to balance retrieval precision against context coverage.
- Grounding Gemini through Tool.from_retrieval lets the model answer from your data instead of relying on pre-trained knowledge alone.