How to process unstructured data for RAG (2)
YuYu(Anne) Chen
Software Engineer | GenAI, ML | Python, Go, NodeJS | PostgreSQL, Redis | Data Modeling, Airflow | Kubernetes, IaaC | AWS, Cloud | GitOps, CICD
Prerequisite:
In this section I'll focus on the chunking part, which lets you post-process elements into more useful “chunks” for use cases such as Retrieval Augmented Generation (RAG).
For reference, here is a visualization tool that lets you compare the different chunking strategies:
Set up your custom chunking strategy with the chunking methods in unstructured:
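To make the idea concrete, here is a minimal sketch of title-based chunking in plain Python. It does not call the unstructured library. The names `Element`, `chunk_by_heading`, and `max_characters` are hypothetical and exist only for this illustration. The assumption is that a document has already been partitioned into elements that each carry a category and some text.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for a partitioned document element."""
    category: str  # e.g. "Title" or "NarrativeText"
    text: str


def chunk_by_heading(elements, max_characters=500):
    """Group element texts into chunks.

    A new chunk starts at every "Title" element, or when adding the next
    element would push the joined chunk past max_characters. A single
    element longer than max_characters is kept whole as its own chunk.
    """
    chunks, current = [], []
    for el in elements:
        # Length of the chunk if this element were appended
        # (texts are joined with one newline each).
        length_if_added = sum(len(t) for t in current) + len(el.text) + len(current)
        starts_section = el.category == "Title"
        too_long = length_if_added > max_characters
        if current and (starts_section or too_long):
            chunks.append("\n".join(current))
            current = []
        current.append(el.text)
    if current:
        chunks.append("\n".join(current))
    return chunks


elements = [
    Element("Title", "Intro"),
    Element("NarrativeText", "RAG needs context."),
    Element("Title", "Chunking"),
    Element("NarrativeText", "Split documents into pieces."),
]
chunks = chunk_by_heading(elements)
small_chunks = chunk_by_heading(elements, max_characters=10)
```

With the default limit, each heading stays together with the text under it. With a very small limit, every element ends up in its own chunk. In a real pipeline the library's own chunking functions would replace this hand-written grouping.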
Among the chunking strategies, I recommend Semantic Chunking.
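The core idea of semantic chunking can be sketched in a few lines: split the text into sentences, compare each sentence with the next one, and start a new chunk where the similarity drops. The sketch below is only an illustration under stated assumptions. The `embed` function is a toy bag-of-words counter standing in for a real embedding model, and `semantic_chunks` and `threshold` are hypothetical names, not part of any library.

```python
import math
import re
from collections import Counter


def embed(sentence):
    """Toy bag-of-words vector (assumption: a real pipeline would call
    an embedding model here instead)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(text, threshold=0.15):
    """Start a new chunk whenever two adjacent sentences have a
    similarity below the threshold."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return []
    groups = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) < threshold:
            groups.append([cur])    # topic shift: open a new chunk
        else:
            groups[-1].append(cur)  # same topic: extend the current chunk
    return [" ".join(g) for g in groups]


text = (
    "Cats are small pets. Cats like to sleep all day. "
    "Kubernetes schedules containers. Kubernetes restarts failed containers."
)
result = semantic_chunks(text)
```

Here the two sentences about cats land in one chunk and the two about Kubernetes in another, because the middle pair of sentences shares no words. The threshold is a tuning knob: a higher value produces more, smaller chunks.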
LangChain:
LlamaIndex:
With proper chunking, the data becomes easier for the language model (and for humans) to understand.