Developing an AI bot powered by RAG and Oracle Database

What is RAG?

An excellent introduction to Retrieval-Augmented Generation (RAG) can be found here.

We can build the Knowledge Assistant's UI with LangChain and OCI Generative AI, and store the document embeddings in Oracle 23ai Vector DB.

Customers can build this chatbot quickly using Oracle 23ai Vector DB and OCI Generative AI.

To test these functionalities, visit our GitHub repository for the Python RAG chatbot. Follow the instructions in the README file to install the appropriate versions of the required software libraries.

Code and functionalities may change as a result of customer feedback.


High-Level Use Case Diagram

Building a Full-Stack RAG Chatbot with OCI Generative AI and Oracle Vector Database (Python Powerhouse)

In the realm of chatbots, where responsiveness and knowledge are paramount, Retrieval-Augmented Generation (RAG) offers a compelling solution. This approach combines the power of large language models (LLMs) with the precision of database retrieval, making chatbots more informative and up-to-date. This blog delves into crafting a full-stack RAG chatbot using Oracle Cloud Infrastructure (OCI) Generative AI and Oracle Vector Database, all orchestrated by the versatile Python language.

Why OCI and Oracle 23ai Vector DB?

  • OCI Generative AI: This managed service streamlines LLM integration, providing pre-trained models like Cohere and Meta that can be fine-tuned for specific tasks.
  • Oracle Vector Database: This innovative database excels at semantic search, enabling the chatbot to find the most relevant information from your knowledge base using vector embeddings.
  • Python: Python's extensive libraries like Transformers and Gensim facilitate building the LLM and retrieval components, making development efficient.

Building Blocks of the Chatbot:

  • Knowledge Base Preparation: gather the documents (PDFs in our case) the chatbot should know about.
  • Vectorization: split the documents into chunks and convert each chunk into a vector embedding.
  • LLM Integration with OCI Generative AI: connect a pre-trained LLM to generate answers.
  • Retrieval Engine: search the vector store for the chunks most relevant to a query.
  • Chatbot Framework and User Interface: tie the components together behind a chat UI.

Putting it All Together:

  1. User Input: The user interacts with the chatbot interface, posing a question.
  2. Query Processing: The query is preprocessed and vectorized.
  3. Retrieval: The vectorized query is used to search the Oracle Vector Database for relevant documents.
  4. LLM Generation: The retrieved documents (or snippets) are fed into the LLM, prompting it to generate a response.
  5. Response Formatting: The response is refined and formatted for presentation to the user.
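
The entire flow fits in a few lines of Python. The sketch below is illustrative rather than verbatim repository code; the helper names match the functions walked through in the rest of this post:

# Illustrative end-to-end RAG flow; helper names follow this post, not verbatim repo code
all_pages = load_all_pages(BOOK_LIST)             # 1. load the knowledge base (PDFs)
document_splits = split_in_chunks(all_pages)      # 2. split pages into chunks
embedder = create_cached_embedder()               # 3. load the embeddings model
vectorstore = create_vector_store(VECTOR_STORE_NAME, document_splits, embedder)
retriever = create_retriever(vectorstore)         # 4. retrieval engine
rag_chain = (                                     # 5. retrieval + generation
    {"context": retriever, "question": RunnablePassthrough()} | rag_prompt | llm
)
response = get_answer(rag_chain, question)        # 6. answer the user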

Now let's elaborate on each step of the Python RAG chatbot, which is available on GitHub.

Step 1: The user asks the chatbot a question


RAG-based chatbot (screenshot)

Internally, our Python code passes the question to the RAG chain:


response = get_answer(rag_chain, question)        

Step 2: How to build the RAG chain

Building the RAG chain involves multiple steps, as outlined below:

  • Load a list of documents. For example, the GitHub code currently supports PDF files (a sketch of the loader follows the snippet below).


all_pages = load_all_pages(BOOK_LIST)  # BOOK_LIST = [BOOK1, BOOK2, BOOK3, BOOK4, BOOK5, BOOK6]
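
A minimal sketch of what load_all_pages can look like, assuming LangChain's PyPDFLoader; the repository's actual loader may differ:

# Hypothetical sketch of load_all_pages, assuming LangChain's PyPDFLoader
from langchain_community.document_loaders import PyPDFLoader

def load_all_pages(book_list):
    all_pages = []
    for book in book_list:
        loader = PyPDFLoader(book)
        # load() returns one LangChain Document per PDF page
        all_pages.extend(loader.load())
    print(f"Loaded {len(all_pages)} pages from {len(book_list)} PDFs")
    return all_pages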

  • Split the PDF pages into chunks (a sketch of the splitter follows the snippet below)


document_splits = split_in_chunks(all_pages)  # CHUNK_SIZE = 1000, CHUNK_OVERLAP = 50
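
A minimal sketch of split_in_chunks, assuming LangChain's RecursiveCharacterTextSplitter:

# Hypothetical sketch of split_in_chunks using RecursiveCharacterTextSplitter
from langchain.text_splitter import RecursiveCharacterTextSplitter

CHUNK_SIZE = 1000    # characters per chunk
CHUNK_OVERLAP = 50   # overlap preserves context across chunk boundaries

def split_in_chunks(all_pages):
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP
    )
    return splitter.split_documents(all_pages)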

  • Load the embeddings model

If the user opts for a local embedding model, they can use a Hugging Face embedding model; alternatively, they can use Cohere embeddings.


embedder = create_cached_embedder()        


def create_cached_embedder():
    # Initialize the embeddings model
    fs = LocalFileStore("./vector-cache/")  # introduced to cache embeddings and speed up re-indexing

    if EMBED_TYPE == "COHERE":
        # Load the Cohere embeddings model
        embed_model = CohereEmbeddings(
            model=EMBED_COHERE_MODEL_NAME, cohere_api_key=COHERE_API_KEY
        )
    elif EMBED_TYPE == "LOCAL":
        print(f"Loading HF Embeddings Model: {EMBED_HF_MODEL_NAME}")

        model_kwargs = {"device": "cpu"}
        # changed to True for BAAI, to use cosine similarity
        encode_kwargs = {"normalize_embeddings": True}

        embed_model = HuggingFaceEmbeddings(
            model_name=EMBED_HF_MODEL_NAME,
            model_kwargs=model_kwargs,
            encode_kwargs=encode_kwargs,
        )

    # the cache for embeddings
    cached_embedder = CacheBackedEmbeddings.from_bytes_store(
        embed_model, fs, namespace=embed_model.model_name
    )

    return cached_embedder        

  • Create a Vector Store and store embeddings within the Oracle 23ai database.

Our chatbot provides an option for customers to configure different databases for storing embeddings. The following code demonstrates this feature:


vectorstore = create_vector_store(VECTOR_STORE_NAME, document_splits, embedder)        

To use Oracle DB, set store_type to "ORACLEDB" in the RAG config file:


def create_vector_store(store_type, document_splits, embedder):
    global vectorstore
    print(f"Indexing: using {store_type} as Vector Store...")

    if store_type == "ORACLEDB":
        connection = oracledb.connect(user="ADMIN", password="XXXXXX", dsn="XXXXXXX")

        vectorstore = OracleVS.from_documents(
            documents=document_splits,
            embedding=embedder,
            client=connection,
            table_name="oravs",
            distance_strategy=DistanceStrategy.DOT_PRODUCT,
        )
        print(f"Vector Store Table: {vectorstore.table_name}")
    elif store_type == "FAISS":
        # modified to cache
        vectorstore = FAISS.from_documents(
            documents=document_splits, embedding=embedder
        )
    elif store_type == "CHROME":
        # modified to cache
        vectorstore = Chroma.from_documents(
            documents=document_splits, embedding=embedder
        )
    return vectorstore
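
Once the store is populated, you can sanity-check it with a direct similarity search before wiring up the full chain (illustrative; the query string is arbitrary):

# Quick sanity check of the vector store (illustrative)
docs = vectorstore.similarity_search("What is a vector embedding?", k=4)
for doc in docs:
    print(doc.metadata.get("source"), doc.page_content[:80])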

  • Create a retriever

The retriever queries the vector store and returns the document chunks most relevant to the user's question.

By default, reranking is disabled in our chatbot code (a sketch of the retriever factory follows the snippet below).


# optionally adds a reranker
retriever = create_retriever(vectorstore)        
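
A sketch of what create_retriever can look like; the optional reranking branch is an assumption based on LangChain's ContextualCompressionRetriever with CohereRerank, not necessarily the repository's exact implementation:

# Hypothetical sketch of create_retriever with an optional Cohere reranker
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CohereRerank

RERANK = False  # reranking is disabled by default in our chatbot

def create_retriever(vectorstore):
    retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
    if RERANK:
        # rerank the retrieved chunks by relevance before passing them to the LLM
        compressor = CohereRerank(cohere_api_key=COHERE_API_KEY)
        retriever = ContextualCompressionRetriever(
            base_compressor=compressor, base_retriever=retriever
        )
    return retriever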

  • Build the OCI Generative AI LLM using the code below (a quick smoke test follows the snippet)


if llm_type == "OCI":

    llm = OCIGenAI(
        model_id="cohere.command",
        service_endpoint="XXXXXXX",
        compartment_id="mycompartmentId",
        model_kwargs={"max_tokens": 200},
        auth_type='SECURITY_TOKEN',
    )        
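
Before plugging the LLM into the chain, a quick standalone call is a useful smoke test (illustrative):

# Smoke-test the OCI Generative AI LLM on its own (illustrative)
print(llm.invoke("In one sentence, what is Retrieval-Augmented Generation?"))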

  • Define the prompt (hard-coded for now)


template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
rag_prompt = ChatPromptTemplate.from_template(template)

# Build the entire RAG chain

print("Building rag_chain...")
rag_chain = (
        {"context": retriever, "question": RunnablePassthrough()} | rag_prompt | llm
)        

Step 3: Generate the answer

The user's question is passed through the RAG chain: the retriever supplies the relevant context, and the Generative AI LLM produces the answer.


def get_answer(rag_chain, question):
    response = rag_chain.invoke(question)
    return response


Highlighting a customer use case solved with the chatbot


Customer use case (diagram)

Benefits of a Full-Stack RAG Chatbot:

  • Accurate and Up-to-Date Responses: Combines LLMs with retrieval for reliable answers.
  • Scalability: The system can be seamlessly scaled to accommodate larger datasets and user loads.
  • Flexibility: New information can be readily added to the knowledge base, keeping the chatbot updated.
  • Customization: The LLM can be fine-tuned for specific domains or conversational styles.
  • Python Powerhouse: Python's rich ecosystem of libraries empowers this project. From data manipulation to vectorization and LLM interaction (Transformers), Python provides the tools to streamline development.


Why Oracle 23ai?

The "best" vector database depends heavily on specific use cases, performance requirements, integration needs, and cost constraints. It's crucial to evaluate these factors for your particular application.

That said, let's analyze the three options we have mentioned in the demo:

Database comparison (table)

ChromaDB, Oracle 23ai DB, and FAISS

ChromaDB is a popular open-source vector database designed for flexibility and ease of use. It's often preferred for rapid prototyping and smaller-scale projects due to its Python-centric nature.

Oracle 23ai DB is a proprietary vector database integrated into the Oracle database platform. It offers robust performance and scalability, especially for large-scale enterprise applications. However, it might have a steeper learning curve and higher costs associated with Oracle licensing.

FAISS (Facebook AI Similarity Search) is, strictly speaking, a library rather than a full database: it is a widely used open-source toolkit from Meta for efficient similarity search over dense vectors. It offers excellent in-memory performance, but persistence, metadata management, and access control are left to the application.

Key Factors to Consider

When choosing a vector database, focus on these aspects:

  • Performance: Evaluate query latency, throughput, and scalability to meet your application's demands.
  • Scalability: Consider how the database can handle increasing data volumes and query loads.
  • Features: Assess the availability of features like similarity search, indexing, filtering, and integrations.
  • Cost: Compare pricing models, including licensing fees, cloud storage costs, and operational expenses.
  • Ease of use: Evaluate the database's API, documentation, and community support.
  • Integration: Consider how the database integrates with your existing infrastructure and applications.

Recommendations

  • Start with ChromaDB: If you're new to vector databases or working on a smaller project, ChromaDB is a good starting point due to its simplicity and active community.
  • Evaluate Oracle 23ai DB: For large-scale enterprise applications with high performance and integration needs, Oracle 23ai DB might be worth considering, especially if you already have an Oracle database infrastructure.
  • Consider FAISS: If you need fast, in-memory similarity search embedded directly in your application and can manage persistence yourself, FAISS is worth benchmarking.

Additional Considerations

  • Open-source vs. proprietary: Evaluate the trade-offs between flexibility, cost, and support.
  • Cloud vs. on-premises: Consider the deployment options based on your infrastructure and security requirements.
  • Benchmarking: Conduct performance tests with your specific data and workloads to make an informed decision.

Conclusion

Leveraging OCI Generative AI, Oracle Vector Database, and Python empowers you to build a robust RAG chatbot that delivers engaging and informed user experiences across various domains. For enterprise-scale RAG chatbots, Oracle 23ai DB often emerges as a strong contender due to its performance, scalability, and integration with Oracle infrastructure.

