How to process unstructured data for RAG (2)

Prerequisite:

How to process unstructured data for RAG (1)


In this section I'll focus on chunking, which post-processes the extracted elements into more useful “chunks” for use cases such as Retrieval Augmented Generation (RAG).


Here is a visualization tool that lets you compare how different chunking strategies split the same text:

https://chunkviz.up.railway.app/


You can set up your own chunking strategy with the chunking methods in unstructured:

https://docs.unstructured.io/open-source/core-functionality/chunking
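To make the idea concrete, here is a minimal sketch of title-based chunking in plain Python. It is not the unstructured implementation: `Element`, `chunk_by_title_sketch`, and the `max_characters` default are hypothetical names and values chosen for illustration, assuming each extracted element carries a category and its text.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for one extracted document element."""
    category: str  # e.g. "Title" or "NarrativeText"
    text: str


def chunk_by_title_sketch(elements, max_characters=500):
    """Start a new chunk at every title, or when the chunk would grow too long."""
    chunks, current = [], []
    for el in elements:
        joined_len = len("\n".join(e.text for e in current))
        too_long = bool(current) and joined_len + 1 + len(el.text) > max_characters
        if current and (el.category == "Title" or too_long):
            # Close the running chunk before this element starts a new one.
            chunks.append("\n".join(e.text for e in current))
            current = []
        current.append(el)
    if current:
        chunks.append("\n".join(e.text for e in current))
    return chunks


elements = [
    Element("Title", "Intro"),
    Element("NarrativeText", "Hello world."),
    Element("Title", "Methods"),
    Element("NarrativeText", "We chunk."),
]
for chunk in chunk_by_title_sketch(elements):
    print(repr(chunk))
```

In this sketch an element longer than `max_characters` is kept whole as its own oversized chunk; a production chunker would usually split such text further.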


Among the chunking strategies, I recommend semantic chunking.

LangChain:

https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/semantic-chunker/

LlamaIndex:

https://docs.llamaindex.ai/en/stable/examples/node_parsers/semantic_chunking/
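The general idea behind semantic chunking can be sketched in a few lines: split the text into sentences, embed each sentence, and start a new chunk wherever two neighbouring sentences are far apart in meaning. The code below is only an illustration under stated assumptions, not how LangChain or LlamaIndex implement it: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, and the fixed `threshold` of 0.8 is an arbitrary choice.

```python
import math
import re
from collections import Counter


def toy_embed(sentence):
    """Toy bag-of-words vector; a real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def semantic_chunks(text, threshold=0.8):
    """Break the text wherever adjacent sentences are more than `threshold` apart."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    vectors = [toy_embed(s) for s in sentences]
    chunks, current = [], [sentences[0]]
    for prev_vec, vec, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine_distance(prev_vec, vec) > threshold:
            # Topic shift detected: close the running chunk.
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


text = (
    "Cats are small pets. Cats are playful pets. "
    "Stocks fell sharply today. Stocks fell again on Friday."
)
for chunk in semantic_chunks(text):
    print(chunk)
```

With a real embedding model the distances reflect meaning rather than shared words, and the breakpoint is often derived from the distribution of distances instead of being fixed by hand.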


With proper chunking, the data becomes easier for the language model (and for humans) to understand.

