Quaestor-AI: An Extensible Framework for Advanced Retrieval-Augmented Generation

Introduction

Quaestor AI is an innovative framework designed to address the limitations of current Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems. It offers a flexible, extensible architecture that allows for customization at various levels, from knowledge base management to query processing and evaluation.

For the latest source code and implementation details, please refer to our GitHub repository: https://github.com/sanjivjha/Quaestor-AI

Key Features and System Architecture

1. Dynamic Knowledge Base

Quaestor AI employs a dynamic knowledge base that can be continuously updated and expanded:

  • PDF Ingestion: Allows uploading of domain-specific documents.
  • Wikipedia Integration: Dynamically fetches and integrates up-to-date information.

# Import paths shown for context; exact module locations vary across LangChain versions.
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

class SelfRAGSystem:
    def ingest_pdf(self, pdf_path: str) -> int:
        """Load a PDF, split it into overlapping chunks, and add them to the knowledge base."""
        loader = PyPDFLoader(pdf_path)
        documents = loader.load()
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
        texts = text_splitter.split_documents(documents)
        added_docs = self.knowledge_base.add_documents(texts)
        return len(added_docs)
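The Wikipedia integration can be sketched in the same shape. The helper names below are illustrative rather than the repository's actual API, and the fetcher is injected so the real MediaWiki client (or the `wikipedia` package) can be swapped in:

```python
from typing import Callable, List

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into overlapping chunks, mirroring the PDF splitter settings above."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def ingest_wikipedia(topic: str, fetch: Callable[[str], str]) -> List[str]:
    """Fetch article text for `topic` via the supplied fetcher and chunk it
    for insertion into the knowledge base."""
    return chunk_text(fetch(topic))
```

Injecting the fetcher also makes the ingestion path easy to unit-test without network access.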

2. Multi-Stage Query Processing

The system implements a sophisticated query processing pipeline:

  • Query Classification: Categorizes queries to determine the most appropriate retrieval strategy.
  • Contextual Retrieval: Uses FAISS for efficient, similarity-based information retrieval.
  • Answer Generation: Leverages advanced language models for response generation.
  • Answer Evaluation: Assesses the quality of generated responses.


class AnswerEvaluator:
    def evaluate(self, query: str, answer: str) -> AnswerEvaluation:
        eval_prompt = PromptTemplate.from_template(
            "Evaluate the following answer for relevance, completeness, and accuracy:\n\n"
            "Query: {query}\nAnswer: {answer}\n\n"
            "Provide scores (0-1) and explanations for each criterion."
        )
        eval_result = self.llm(eval_prompt.format(query=query, answer=answer))
        # Parse eval_result and return AnswerEvaluation object        
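The parsing step left as a comment above could be done, for example, by pulling the numeric scores out of the model's reply. This is a naive sketch: the `AnswerEvaluation` fields follow the evaluator example later in this article, and a structured-output parser would be more robust than a regex in production:

```python
import re
from dataclasses import dataclass

@dataclass
class AnswerEvaluation:
    relevance_score: float
    completeness_score: float
    accuracy_score: float

def parse_evaluation(eval_result: str) -> AnswerEvaluation:
    """Extract the first three 0-1 scores from a free-text evaluation.
    Naive: assumes the reply lists relevance, completeness, accuracy in order."""
    scores = [float(s) for s in re.findall(r"\b[01](?:\.\d+)?\b", eval_result)]
    if len(scores) < 3:
        raise ValueError(f"expected 3 scores in: {eval_result!r}")
    return AnswerEvaluation(*scores[:3])
```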

3. Iterative Query Refinement

To improve performance on complex queries, the system can refine queries based on initial results:

class QueryEnhancer:
    def enhance_query(self, original_query: str, context: str, previous_answer: str, evaluation: AnswerEvaluation) -> str:
        enhance_prompt = PromptTemplate.from_template(
            "Given the original query, context, previous answer, and evaluation, suggest an improved query:\n\n"
            "Original Query: {original_query}\nContext: {context}\n"
            "Previous Answer: {previous_answer}\nEvaluation: {evaluation}\n\n"
            "Improved Query:"
        )
        return self.llm(enhance_prompt.format(
            original_query=original_query,
            context=context,
            previous_answer=previous_answer,
            evaluation=evaluation
        ))        
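A driver loop for this refinement cycle might look like the following. The threshold and iteration budget are illustrative parameters, not values taken from the repository:

```python
from typing import Callable, Tuple

def refine_until_satisfactory(query: str,
                              answer_fn: Callable[[str], str],
                              score_fn: Callable[[str, str], float],
                              enhance_fn: Callable[[str, str], str],
                              threshold: float = 0.8,
                              max_iterations: int = 3) -> Tuple[str, float]:
    """Re-ask with enhanced queries until the evaluation score clears the
    threshold or the iteration budget runs out; keep the best answer seen."""
    current, best_answer, best_score = query, "", -1.0
    for _ in range(max_iterations):
        answer = answer_fn(current)
        score = score_fn(current, answer)
        if score > best_score:
            best_answer, best_score = answer, score
        if score >= threshold:
            break
        current = enhance_fn(current, answer)
    return best_answer, best_score
```

Capping the iteration count bounds both latency and LLM cost on queries that never converge.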

4. Transparent Processing

The system offers a debug mode that provides insights into the decision-making process:

def process_query(self, query, message_placeholder):
    self.log_action("Query Received", f"Query: {query}")
    try:
        response, iterations = self.rag_system.query(query)
        full_response = f"**Answer:** {response}\n\n"
        if st.session_state.debug_mode:
            full_response += "**Process Details:**\n"
            for iteration in iterations:
                full_response += f"Iteration {iteration.get('iteration', 'N/A')}:\n"
                full_response += f"- Strategy: {iteration.get('strategy', 'N/A')}\n"
                full_response += f"- Explanation: {iteration.get('explanation', 'N/A')}\n"
                # ... more debug information ...
        return full_response
    except Exception as e:
        error_message = f"Error processing query: {str(e)}"
        self.log_action("Query Processing Failed", error_message)
        return error_message        

Extensibility and Customization

Quaestor AI is designed as a flexible framework that can be extended and customized to meet specific needs. Here's how you can leverage its extensibility:

1. Federated Knowledge Structure

Instead of centralizing all information, Quaestor AI supports a federated approach:

# Import paths shown for context; they vary across LangChain versions.
from typing import Callable, List
from langchain_community.vectorstores import FAISS

class FederatedKnowledgeBase:
    def __init__(self, embeddings):
        self.embeddings = embeddings
        self.local_store = FAISS.from_texts(["Initial empty knowledge base"], self.embeddings)
        self.external_sources = {}

    def add_external_source(self, name: str, source: Callable):
        self.external_sources[name] = source

    def query(self, query: str, sources: List[str] = None):
        sources = sources or ["local"]
        results = []
        if "local" in sources:
            results.extend(self.local_store.similarity_search(query))
        for source in sources:
            if source in self.external_sources:
                results.extend(self.external_sources[source](query))
        return results

# Usage (embeddings is any LangChain embeddings instance, e.g. OpenAIEmbeddings())
knowledge_base = FederatedKnowledgeBase(embeddings)
knowledge_base.add_external_source("enterprise_db", query_enterprise_database)
knowledge_base.add_external_source("pubmed", query_pubmed_api)

This structure allows easy integration with enterprise knowledge bases or public databases without copying all data locally.
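Because an external source is just a callable from a query string to a list of results, cross-cutting concerns can be layered on without touching the knowledge base itself. For instance, a hypothetical caching wrapper (not part of the repository) keeps repeated queries from hitting a remote system twice:

```python
from typing import Callable, Dict, List

def make_cached_source(fetch: Callable[[str], List[str]]) -> Callable[[str], List[str]]:
    """Wrap an external fetcher with an in-memory cache keyed by query string."""
    cache: Dict[str, List[str]] = {}
    def connector(query: str) -> List[str]:
        if query not in cache:
            cache[query] = fetch(query)
        return cache[query]
    return connector
```

A source registered as `add_external_source("pubmed", make_cached_source(query_pubmed_api))` would behave identically to the uncached version, just cheaper on repeated queries.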

2. Custom Tool Integration

You can extend the system's capabilities by adding custom tools:

# Tool is the LangChain agent Tool class (import path varies by version).
from langchain.agents import Tool

class SelfRAGSystem:
    def add_tool(self, tool: Tool):
        self.tools.append(tool)
        if self.agent_executor:
            self.agent_executor.tools.append(tool)

# Example: Adding a custom PubMed search tool
class PubMedSearchTool:
    def search_pubmed(self, query: str) -> str:
        # Implement PubMed search logic here
        pass

    def get_tool(self) -> Tool:
        return Tool(
            name="PubMed Search",
            func=self.search_pubmed,
            description="Search PubMed for medical research papers"
        )

rag_system = SelfRAGSystem()
pubmed_tool = PubMedSearchTool()
rag_system.add_tool(pubmed_tool.get_tool())        

3. Pluggable Evaluation and Classification

The evaluation and classification mechanisms can be customized:

class SelfRAGSystem:
    def set_answer_evaluator(self, evaluator: BaseEvaluator):
        self.answer_evaluator = evaluator

    def set_query_classifier(self, classifier: BaseClassifier):
        self.query_classifier = classifier

# Custom Evaluation Example
class SentimentBasedEvaluator(BaseEvaluator):
    def evaluate(self, query: str, answer: str) -> AnswerEvaluation:
        sentiment = analyze_sentiment(answer)
        return AnswerEvaluation(
            relevance_score=sentiment.relevance,
            completeness_score=sentiment.completeness,
            accuracy_score=sentiment.accuracy
        )

# Custom Classification Example
class DomainSpecificClassifier(BaseClassifier):
    def classify(self, query: str) -> str:
        if "medical" in query.lower():
            return "medical_rag"
        elif "legal" in query.lower():
            return "legal_rag"
        else:
            return "general_rag"

rag_system.set_answer_evaluator(SentimentBasedEvaluator())
rag_system.set_query_classifier(DomainSpecificClassifier())        

4. Dynamic Query Enhancement

When the local knowledge base is insufficient, the system can dynamically query external sources:

class QueryProcessor:
    def process_query(self, query: str) -> str:
        local_answer = self.query_local_knowledge_base(query)
        if self.answer_evaluator.is_satisfactory(local_answer):
            return local_answer
        
        enhanced_query = self.query_enhancer.enhance(query, local_answer)
        external_answer = self.query_external_sources(enhanced_query)
        
        return self.combine_answers(local_answer, external_answer)

    def query_external_sources(self, query: str) -> str:
        for tool in self.external_tools:
            if tool.is_relevant(query):
                return tool.execute(query)
        return ""        
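The `combine_answers` step is not shown above. A minimal sketch could be the following; a production system would more likely ask the LLM to synthesize the two answers rather than concatenate them:

```python
def combine_answers(local_answer: str, external_answer: str) -> str:
    """Prefer whichever answer exists; concatenate with attribution when both do."""
    if not external_answer:
        return local_answer
    if not local_answer:
        return external_answer
    return (f"{local_answer}\n\n"
            f"Additional information from external sources:\n{external_answer}")
```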

Practical Implementation

Here's how you might use these features in practice:

# Initialize the system
rag_system = SelfRAGSystem()

# Add custom knowledge sources
rag_system.add_external_source("enterprise_db", EnterpriseDBConnector())
rag_system.add_external_source("pubmed", PubMedAPIConnector())
# Add custom tools
rag_system.add_tool(CustomCalculatorTool().get_tool())
rag_system.add_tool(DomainSpecificSearchTool().get_tool())

# Set custom evaluation and classification
rag_system.set_answer_evaluator(IndustrySpecificEvaluator())
rag_system.set_query_classifier(MultiLabelClassifier())

# Use the system
query = "What are the latest treatments for type 2 diabetes?"
answer, process_details = rag_system.query(query)

print(f"Answer: {answer}")
print("Process Details:")
for step in process_details:
    print(f"- {step['description']}: {step['result']}")        

Conclusion

Quaestor AI offers a comprehensive solution to the limitations of current LLM and RAG systems. Its flexible architecture allows for customization at every level, from knowledge base management to query processing and evaluation. By providing a federated knowledge structure, an extensible tool ecosystem, and pluggable components, it enables the creation of specialized, adaptive AI research assistants tailored to specific domains and use cases.

Whether you're integrating with enterprise systems, adding domain-specific tools, or implementing custom evaluation criteria, Quaestor AI provides the flexibility to build a system that meets your unique requirements while addressing the common challenges in information retrieval and knowledge synthesis.

For developers and researchers interested in contributing to or extending Quaestor AI, we encourage you to explore our GitHub repository at https://github.com/sanjivjha/Quaestor-AI, where you'll find detailed documentation, contribution guidelines, and the latest updates to the framework.

Kausik Sen

Leading the Analytics CoE for digital transformation of the largest energy company of India

5 days ago

Great effort! Developers just need to build connectors to different sources and it can be a part of this repository of knowledge. Will be of good help for Q and A on technical manuals, compliance guidelines etc.

More articles by Sanjiv Kumar Jha