Implement Agentic RAG - The NextGen Intelligent Systems
Lakshminarasimhan S.
StoryListener | Polymath | PoliticalCritique | AgenticRAG Architect | Strategic Leadership | R&D
In the ever-evolving landscape of artificial intelligence, a new paradigm is emerging—one that shifts from passive, query-driven models to proactive, decision-making entities. This transformation is embodied in Agentic AI, an advanced form of AI that autonomously plans, decides, and executes actions, bridging the gap between static machine intelligence and dynamic human cognition.
The Evolution of AI: From Reactive to Agentic
Traditionally, AI systems have functioned as reactive tools, responding to queries and commands with pre-trained knowledge. These models, including retrieval-augmented generation (RAG) approaches, rely heavily on searching vast corpora of data and generating responses based on existing knowledge. While effective, they lack the capability to self-direct or adapt beyond what they have been explicitly trained on.
Agentic AI, however, takes a giant leap forward. Instead of merely retrieving and generating responses, these systems are designed to plan, decide, and execute actions autonomously.
How Agentic AI Works: A Python Implementation
To illustrate Agentic AI in action, let's examine a Python-based framework that integrates retrieval-augmented generation (RAG) with a decision-making agent. The system uses FAISS for vector-based retrieval, a local DeepSeek model loaded through llama-cpp-python (the llama_cpp package) for natural language generation, and a custom agent class that decides when retrieval is necessary and when direct generation is enough.
Step 1: Building a Knowledge Retrieval System
Using FAISS, we create a vector store of pre-embedded knowledge. Documents are encoded via Sentence Transformers, allowing efficient similarity searches.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer
import json

# Load embedding model
embedding_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Sample documents
documents = [
    "Jamsetji Tata's vision laid the foundation for India's industrial revolution.",
    "The Tata group has pioneered industries like steel, aviation, and IT.",
    "The Tata Trusts have contributed significantly to education and healthcare.",
]

# Generate embeddings (FAISS expects float32 vectors)
embeddings = np.array(embedding_model.encode(documents), dtype=np.float32)

# Create FAISS index (exact search over L2 distance)
dimension = embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(embeddings)

# Save index and documents
faiss.write_index(index, "vector_store.index")
with open("doc_map.json", "w") as f:
    json.dump(documents, f)
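To make the retrieval step less opaque, here is a minimal plain-Python sketch of what an exact L2 index such as IndexFlatL2 does conceptually: compare the query against every stored vector and return the k smallest squared Euclidean distances. The function name brute_force_l2_search and the toy 2-D vectors are illustrative assumptions; they are not part of the article's code or of the FAISS API.

```python
def brute_force_l2_search(vectors, query, k):
    """Return (distances, indices) of the k stored vectors closest to query.

    Distances are squared Euclidean distances, sorted in ascending order.
    """
    scored = []
    for i, vec in enumerate(vectors):
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, i))
    scored.sort()  # smallest distance first
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]


# Toy 2-D "embeddings" (hypothetical values, for illustration only)
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
distances, indices = brute_force_l2_search(toy_vectors, [1.0, 1.0], k=2)
print(indices, distances)
```

A flat index trades speed for exactness: every query scans all vectors, which is fine for a handful of documents like the sample above.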
Step 2: Implementing Retrieval-Augmented Generation (RAG)
With FAISS storing our knowledge base, we retrieve the most relevant documents for a given query and augment an LLM’s response.
def retrieve_relevant_documents(query, k=3):
    """Retrieve top-k relevant documents for a query using FAISS"""
    query_embedding = np.array(embedding_model.encode([query]), dtype=np.float32)
    distances, indices = index.search(query_embedding, k)
    with open("doc_map.json", "r") as f:
        document_list = json.load(f)
    # FAISS pads the result with -1 when the index holds fewer than k vectors
    return [document_list[i] for i in indices[0] if i >= 0]
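One possible refinement, which is not in the original code, is to drop hits whose distance exceeds a cutoff so that unrelated documents do not end up in the prompt. The sketch below works on one row of search output; the helper name select_documents and the cutoff value are assumptions, and a sensible cutoff would need tuning for the embedding model in use.

```python
def select_documents(documents, distances, indices, max_distance):
    """Map one row of search output to documents.

    Skips padded entries (index -1) and hits farther than max_distance.
    """
    selected = []
    for dist, idx in zip(distances, indices):
        if idx < 0:
            continue  # padding used when fewer than k results exist
        if dist > max_distance:
            continue  # too far from the query to be useful context
        selected.append(documents[idx])
    return selected


docs = ["doc A", "doc B", "doc C"]
hits = select_documents(docs, [0.4, 1.7, 3.4e38], [2, 0, -1], max_distance=1.0)
print(hits)
```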
Step 3: Integrating a Large Language Model (LLM)
We load a quantized DeepSeek LLM (GGUF format) through llama-cpp-python to generate responses based on the retrieved documents.
from llama_cpp import Llama

# Load on-prem model
llm = Llama(model_path="C:/models/deepseek-llm-7b-base.Q8_0.gguf")

def generate_response(query):
    """Generate a response using retrieved context and the LLM"""
    retrieved_docs = retrieve_relevant_documents(query)
    context = "\n".join(retrieved_docs)
    # Prompt lines are flush-left so no indentation leaks into the prompt text
    prompt = f"""
You are an AI agent using Retrieval-Augmented Generation (RAG).
Answer the query using the following retrieved documents:
{context}
Query: {query}
Answer:
"""
    response = llm(prompt, max_tokens=300)
    return response["choices"][0]["text"]
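Separating prompt construction from the model call makes the prompt easy to inspect and test without loading a multi-gigabyte model. The following is a sketch under that assumption; build_rag_prompt is a hypothetical helper name, and the bulleted context format is a choice made here, not something the original code does.

```python
def build_rag_prompt(query, retrieved_docs):
    """Assemble the RAG prompt from a query and a list of document strings."""
    context = "\n".join(f"- {doc}" for doc in retrieved_docs)
    return (
        "You are an AI agent using Retrieval-Augmented Generation (RAG).\n"
        "Answer the query using the following retrieved documents:\n"
        f"{context}\n"
        f"Query: {query}\n"
        "Answer:"
    )


prompt = build_rag_prompt(
    "Who founded Tata Steel?",
    ["The Tata group has pioneered industries like steel, aviation, and IT."],
)
print(prompt)
```

Ending the prompt at "Answer:" with no trailing newline is a common choice for base (non-chat) models, since the model then continues directly with the answer text.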
Step 4: Creating an Agent for Decision-Making
Rather than retrieving knowledge for every query, we implement an Agent that decides whether to rely on retrieval or generate a response independently.
class Agent:
    """Custom agent to decide whether to retrieve, generate, or refine responses"""

    def __init__(self, llm):
        self.llm = llm

    def decide_action(self, query):
        """Decide if retrieval is necessary or if LLM alone can answer"""
        # Prompt lines are flush-left so no indentation leaks into the prompt text
        prompt = f"""
Determine if the query requires external retrieval.
Respond with 'retrieve' if knowledge from documents is needed, otherwise 'generate':
Query: {query}
Answer:
"""
        response = self.llm(prompt, max_tokens=10)["choices"][0]["text"].strip().lower()
        return response

    def execute(self, query):
        """Execute the best approach based on decision"""
        action = self.decide_action(query)
        if "retrieve" in action:
            return generate_response(query)
        else:
            return self.llm(query, max_tokens=300)["choices"][0]["text"]

# Initialize agent
agent = Agent(llm)

# Example agent decision
query = "Who founded Tata Steel?"
response = agent.execute(query)
print(response)
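Because the decision comes back as free-form text, a base model may reply with extra words, both keywords, or neither. One way to harden the routing, sketched here as an assumption and not as the author's method, is to normalize the reply and fall back to retrieval when it is ambiguous. Both parse_action and fake_llm are hypothetical names; fake_llm only mimics the response shape used in the code above so the logic can run without a model.

```python
def parse_action(raw_text, default="retrieve"):
    """Normalize a free-form model reply to 'retrieve' or 'generate'."""
    text = raw_text.strip().lower()
    wants_retrieve = "retrieve" in text
    wants_generate = "generate" in text
    if wants_retrieve and not wants_generate:
        return "retrieve"
    if wants_generate and not wants_retrieve:
        return "generate"
    return default  # empty or ambiguous reply


def fake_llm(prompt, max_tokens=10):
    """Stand-in for the local model; returns a dict shaped like its output."""
    return {"choices": [{"text": " Retrieve\n"}]}


raw = fake_llm("Determine if the query requires external retrieval.")["choices"][0]["text"]
print(parse_action(raw))
print(parse_action("Generate"))
print(parse_action("I am not sure"))
```

Defaulting to retrieval is the conservative choice in this sketch: an unnecessary lookup costs a little time, while skipping a needed one risks an ungrounded answer.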
The Future of Agentic AI
Agentic AI holds immense potential across industries.
Conclusion
The transition from static AI models to Agentic AI marks a significant evolution in intelligent systems. With the ability to autonomously retrieve, generate, decide, and execute, these agents promise to revolutionize how AI interacts with and augments human capabilities.
As we step into this new frontier, the challenge lies in balancing autonomy with control, ensuring that these agents remain aligned with human values, objectives, and ethical considerations. The future of AI is not just about intelligence—it’s about agency.
Appendix
Here is the code for downloading the pretrained model used as the on-prem LLM.
from huggingface_hub import hf_hub_download, HfApi
import os

# Security note: never hardcode tokens. Set the HF_TOKEN environment variable instead.
# If HF_TOKEN is unset, None is passed and no explicit token is used.
hf_token = os.getenv("HF_TOKEN")

# List the files available in the model repository
api = HfApi()
files = api.list_repo_files(
    repo_id="TheBloke/deepseek-llm-7B-base-GGUF",
    token=hf_token
)
for filename in files:
    print(filename)

# Download the Q8_0 GGUF file into the directory the article loads it from
model_path = hf_hub_download(
    repo_id="TheBloke/deepseek-llm-7B-base-GGUF",
    filename="deepseek-llm-7b-base.Q8_0.gguf",
    token=hf_token,
    local_dir="C:/models"
)
You can find the working code here: https://github.com/sln2737/AgenticRAG