How to process unstructured data for RAG (3)

Prerequisite:

How to process unstructured data for RAG (2)


By now you have split the extracted data into chunks and saved them to the vector DB:

        docs = text_splitter.create_documents([texts])
        save_to_vector_db(docs, text_splitter, retriever)        

The next step is to test the LLM. When you ask questions about the document, some of the answers may be incorrect.

When you go back and check the chunk that was retrieved, you may find that the cause is the document parser: it did not parse the data into a format the LLM can easily understand.

If you run into this situation, one way to solve it is to write a summary of that chunk.

LangChain's MultiVector Retriever can handle this for you: it indexes the summary for search while still returning the original chunk to the LLM.

When the summaries are ready, save them to the vector DB to update your knowledge base.
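To make the idea concrete, here is a minimal sketch of the summary-indexing pattern in plain Python. It is not LangChain's API: the `SummaryIndexedRetriever` class, the toy bag-of-words `embed` function, and the sample chunks and summaries are all assumptions made up for illustration. In a real pipeline the summaries would come from an LLM and the vectors from an embedding model.

```python
import math
import re
import uuid
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (assumption); a real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SummaryIndexedRetriever:
    """Hypothetical helper: search over summaries, return the original chunks."""

    def __init__(self):
        self.vector_index = []  # list of (summary embedding, doc_id)
        self.docstore = {}      # doc_id -> original chunk

    def add(self, chunk, summary):
        # The summary is what gets embedded; the raw chunk is stored under the same id.
        doc_id = str(uuid.uuid4())
        self.docstore[doc_id] = chunk
        self.vector_index.append((embed(summary), doc_id))
        return doc_id

    def retrieve(self, query, k=1):
        # Rank summaries by similarity to the query, then look up the original chunks.
        q = embed(query)
        ranked = sorted(self.vector_index, key=lambda item: cosine(q, item[0]), reverse=True)
        return [self.docstore[doc_id] for _, doc_id in ranked[:k]]


# A badly parsed table chunk, with a hand-written stand-in for an LLM summary.
table_chunk = "Q1 Q2 Q3\n12.4 13.1 15.8\nAPAC EMEA"
table_summary = "Quarterly revenue by region in millions of dollars for Q1 to Q3."
prose_chunk = "The onboarding guide explains how new employees request a laptop."
prose_summary = "How new employees request a laptop during onboarding."

retriever = SummaryIndexedRetriever()
retriever.add(table_chunk, table_summary)
retriever.add(prose_chunk, prose_summary)

result = retriever.retrieve("What was the revenue by region?")
print(result[0])  # the raw table chunk, found through its summary
```

The question shares no words with the raw table chunk, so searching the chunk text directly would miss it. Because the summary is what gets indexed, the query still finds the chunk, and the LLM receives the original data rather than the summary.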


Ref: https://blog.langchain.dev/benchmarking-rag-on-tables/
