Hybrid search - Retrieve from Graph and Embeddings.
Introduction
In today's digital world, the ability to extract and interact with information from various documents is crucial for businesses and individuals alike. Leveraging technologies like Streamlit, LangChain, and Neo4j, we can build sophisticated chatbots that not only retrieve information from documents but also integrate with knowledge graphs for enhanced insights. This article explores two powerful implementations of such chatbots. In an earlier article (linked below), I explained how to build the Streamlit chatbot and integrate it with Neo4j.
Takeaways from the article:
After this article and the earlier one, readers will understand:
1. How to build a Streamlit chatbot to create a Neo4j graph
This first implementation is covered in the earlier article. Please read that article and watch its video to implement the chatbot, set up Neo4j, create the graph, and display it in the chatbot.
Video URL of the first implementation:
2. Advanced RAG Chatbot with Neo4j Knowledge Graph
The second implementation enhances the basic RAG chatbot by integrating a Neo4j knowledge graph, which allows for more sophisticated data retrieval and visualization. We will implement hybrid search on the graph already created in Neo4j, combining Cypher query retrieval with embedding similarity search.
Setting Up the Environment
To start, we load necessary libraries and environment variables, and set up Neo4j credentials. This ensures that the environment is correctly configured for the subsequent operations.
import os
from dotenv import load_dotenv, find_dotenv
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.graphs import Neo4jGraph
from langchain_community.vectorstores import Neo4jVector
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_groq import ChatGroq

load_dotenv(find_dotenv())

class RAG_Graph:
    def __init__(self):
        # Neo4j connection details; adjust these to your own instance
        os.environ["NEO4J_URI"] = "bolt://localhost:7687"
        os.environ["NEO4J_USERNAME"] = "neo4j"
        os.environ["NEO4J_PASSWORD"] = "password"
        self.graph = Neo4jGraph()
        self.llm = ChatGroq(temperature=0.5, groq_api_key=os.getenv("GROQ_API_KEY"), model_name="llama3-70b-8192")
Embedding with Vector Index
The create_vector_index function connects to the existing Neo4j vector index named 'vector' using HuggingFace embeddings, so that unstructured content can be retrieved by similarity search.
    def create_vector_index(self):
        model_name = 'sentence-transformers/all-mpnet-base-v2'
        self.vector_index = Neo4jVector.from_existing_index(
            HuggingFaceEmbeddings(model_name=model_name, model_kwargs={'device': 'cpu'}),
            url=os.environ["NEO4J_URI"],
            username=os.environ["NEO4J_USERNAME"],
            password=os.environ["NEO4J_PASSWORD"],
            index_name="vector",
        )
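If the 'vector' index does not exist yet (it is normally created during the ingestion step covered in the first article), a minimal sketch of how it could be built directly from the existing graph is shown below. The node label and property names are assumptions and must match however your documents were stored:
    def create_vector_index_from_graph(self):
        # Hypothetical helper: build the 'vector' index from existing Document nodes.
        # Assumes each Document node has a 'text' property; embeddings are written
        # to an 'embedding' property and indexed under the name 'vector'.
        model_name = 'sentence-transformers/all-mpnet-base-v2'
        self.vector_index = Neo4jVector.from_existing_graph(
            HuggingFaceEmbeddings(model_name=model_name, model_kwargs={'device': 'cpu'}),
            url=os.environ["NEO4J_URI"],
            username=os.environ["NEO4J_USERNAME"],
            password=os.environ["NEO4J_PASSWORD"],
            index_name="vector",
            node_label="Document",
            text_node_properties=["text"],
            embedding_node_property="embedding",
        )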
Preparing the Chat Template
The prepare_chat_template function sets up a chat prompt template that instructs the LLM to extract entities (fields and business rules) from the user's question; these entities later drive the structured graph lookup.
    def prepare_chat_template(self):
        prompt = ChatPromptTemplate.from_messages(
            [
                ("system", "You are extracting fields and business rules from the text"),
                ("human", "Use this given format to extract the information from the following input: {question}"),
            ]
        )
        self.entity_chain = prompt | self.llm.with_structured_output(Entities)
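Note that with_structured_output references an Entities schema that is not shown above. A minimal sketch, assuming a Pydantic model with a single names field (which structured_retriever iterates over below), could look like this:
from typing import List
from pydantic import BaseModel, Field

class Entities(BaseModel):
    """Entity names extracted from the user's question (assumed schema)."""
    names: List[str] = Field(
        ...,
        description="All the field, business rule, or concept names that appear in the text",
    )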
Retrieving Data
The retriever function handles user queries by retrieving both structured and unstructured data. It combines the results into a final response.
    def retriever(self, question: str):
        # Structured context from the knowledge graph
        structured_data = self.structured_retriever(question)
        # Unstructured context from the vector similarity search
        unstructured_docs = [el.page_content for el in self.vector_index.similarity_search(question)]
        unstructured_data = "#Document ".join(unstructured_docs)
        final_data = f"Structured data:\n{structured_data}\nUnstructured data:\n{unstructured_data}"
        return final_data
Structured Data Retrieval
The structured_retriever function queries the Neo4j knowledge graph to retrieve structured data based on the user's question.
    def structured_retriever(self, question: str) -> str:
        result = ""
        entities = self.entity_chain.invoke({"question": question})
        for entity in entities.names:
            # Full-text search for the entity, then collect its immediate
            # neighborhood (excluding MENTIONS relationships) in both directions
            response = self.graph.query(
                """CALL db.index.fulltext.queryNodes('entity', $query, {limit:2})
                YIELD node, score
                CALL {
                    WITH node
                    MATCH (node)-[r:!MENTIONS]->(neighbor)
                    RETURN node.id + ' - ' + type(r) + ' -> ' + neighbor.id AS output
                    UNION ALL
                    WITH node
                    MATCH (node)<-[r:!MENTIONS]-(neighbor)
                    RETURN neighbor.id + ' - ' + type(r) + ' -> ' + node.id AS output
                }
                RETURN output LIMIT 50""",
                {"query": self.generate_full_text_query(entity)},
            )
            result += "\n".join([el['output'] for el in response])
        return result
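The query builder generate_full_text_query is not shown in the snippet above. A minimal sketch, assuming we simply turn each word of the entity name into a fuzzy Lucene term so small spelling variations still match:
    @staticmethod
    def generate_full_text_query(input_text: str) -> str:
        # Hypothetical helper: '~2' allows up to two character edits per word
        # (Lucene fuzzy matching), and terms are combined with AND
        words = [word for word in input_text.split() if word]
        return " AND ".join(f"{word}~2" for word in words)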
Asking Questions
The ask_question_chain function orchestrates the entire process: it creates the full-text index, sets up the vector index and chat template, and retrieves data to answer the user's question. This function is called from the Streamlit UI Python file.
    def ask_question_chain(self, query):
        # Create the full-text index used by structured_retriever (no-op if it already exists)
        self.graph.query("CREATE FULLTEXT INDEX entity IF NOT EXISTS FOR (e:__Entity__) ON EACH [e.id]")
        self.create_vector_index()
        self.prepare_chat_template()
        template = """Answer the question based only on the following context:
{context}

Question: {question}
Use natural language and be concise.
Answer: """
        prompt = ChatPromptTemplate.from_template(template)
        chain = (
            RunnableParallel({"context": self.retriever, "question": RunnablePassthrough()})
            | prompt
            | self.llm
            | StrOutputParser()
        )
        result = chain.invoke(query)
        return result
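For reference, here is a rough sketch of how the Streamlit UI file might call into this class. The file and module names are assumptions; the actual UI is covered in the first article:
# app.py - minimal sketch of the Streamlit side (assumed file and module names)
import streamlit as st
from rag_graph import RAG_Graph  # assumes the class above lives in rag_graph.py

st.title("Hybrid Search over the Neo4j Knowledge Graph")

rag = RAG_Graph()
question = st.text_input("Ask a question about your documents")

if question:
    answer = rag.ask_question_chain(question)
    st.write(answer)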
Video tutorial:
The video tutorial walks through the explanation and the complete code. You can follow along and implement it in your favorite editor while watching.
Conclusion
By combining Streamlit, LangChain, and Neo4j, we can create a powerful chatbot capable of extracting and interacting with information from documents. This approach leverages both structured and unstructured data retrieval to provide comprehensive answers to user queries. Explore the full code and customize it to fit your specific needs. Happy coding!