How to process unstructured data for RAG (2)

YuYu(Anne) Chen

Software Engineer | GenAI, ML | Python, Go, NodeJS | PostgreSQL, Redis | Data Modeling, Airflow | Kubernetes, IaaC | AWS, Cloud | GitOps, CICD

Published: July 1, 2024

Prerequisite:

How to process unstructured data for RAG (1)

In this part I'll focus on chunking, which lets you post-process the extracted elements into more useful “chunks” for use cases such as Retrieval Augmented Generation (RAG).

Here is a visualization tool that shows how different chunking strategies split the same text:

https://chunkviz.up.railway.app/

You can set up a custom chunking strategy with the chunking methods in unstructured:

https://docs.unstructured.io/open-source/core-functionality/chunking
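To make the idea concrete, here is a minimal standard-library sketch of section-based chunking: start a new chunk at every title, and also when a chunk would grow past a size limit. It is not the unstructured API. The `Element` class, the `chunk_by_section` function, and the `max_characters` default are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for a parsed document element."""
    category: str  # e.g. "Title" or "NarrativeText"
    text: str


def chunk_by_section(elements, max_characters=500):
    """Group elements into chunks.

    A new chunk starts at each Title element, or when adding the next
    element would push the chunk past max_characters. A single element
    longer than max_characters is kept whole as an oversized chunk.
    """
    chunks, current = [], []
    current_len = 0
    for el in elements:
        starts_section = el.category == "Title"
        # +1 accounts for the newline that joins elements in a chunk.
        added_len = len(el.text) + (1 if current else 0)
        too_long = current_len + added_len > max_characters
        if current and (starts_section or too_long):
            chunks.append("\n".join(current))
            current, current_len = [], 0
            added_len = len(el.text)
        current.append(el.text)
        current_len += added_len
    if current:
        chunks.append("\n".join(current))
    return chunks


# Made-up elements, standing in for the output of a document parser.
elements = [
    Element("Title", "Intro"),
    Element("NarrativeText", "RAG needs context."),
    Element("Title", "Chunking"),
    Element("NarrativeText", "Split text into pieces."),
]
for chunk in chunk_by_section(elements):
    print(repr(chunk))
```

Keeping a title together with the text under it means each chunk carries its own heading as context, which tends to help retrieval.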

Of the chunking strategies, I recommend Semantic Chunking.

LangChain:

https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/semantic-chunker/

LlamaIndex:

https://docs.llamaindex.ai/en/stable/examples/node_parsers/semantic_chunking/
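The core idea of semantic chunking can be sketched in a few lines: split the text into sentences, compare each sentence with the next one, and start a new chunk wherever the similarity drops. The sketch below uses only the standard library and is not the LangChain or LlamaIndex implementation. `toy_embed` is a bag-of-words stand-in for a real embedding model, and the fixed `threshold` is an assumption of this sketch.

```python
import math
import re
from collections import Counter


def toy_embed(sentence):
    """Toy bag-of-words 'embedding' (an assumption for illustration;
    a real semantic chunker would call an embedding model here)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_chunks(text, threshold=0.2):
    """Split text into sentences, then start a new chunk wherever the
    similarity between neighbouring sentences falls below threshold."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine_similarity(toy_embed(prev), toy_embed(cur)) < threshold:
            chunks.append([cur])      # topic shift: open a new chunk
        else:
            chunks[-1].append(cur)    # same topic: extend current chunk
    return [" ".join(c) for c in chunks]


text = (
    "Cats are small pets. Cats and pets like to sleep. "
    "Kubernetes schedules containers. Containers run on Kubernetes nodes."
)
for chunk in semantic_chunks(text):
    print(repr(chunk))
```

With this sample text the two cat sentences end up in one chunk and the two Kubernetes sentences in another, because the only pair with no shared words is the one in the middle. Unlike fixed-size splitting, the boundaries follow the topic rather than a character count.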

After proper chunking, the data is easier for the language model (and for humans) to understand.
