How to process unstructured data for RAG (3)
YuYu(Anne) Chen
Software Engineer | GenAI, ML | Python, Go, NodeJS | PostgreSQL, Redis | Data Modeling, Airflow | Kubernetes, IaaC | AWS, Cloud | GitOps, CICD
Prerequisite:
At this point you have split the extracted data into chunks and saved them to the vector DB.
docs = text_splitter.create_documents([texts])
save_to_vector_db(docs, text_splitter, retriever)
The next step is to test the LLM. When you ask questions about the document, some of the answers may be incorrect.
If you go back and check the chunk, you may find that the cause is the document parser: it did not parse the data into a format the LLM can easily understand.
If you run into this situation, one tip is to write a summary of that chunk.
The MultiVector Retriever can handle this for you: it searches over the summaries and returns the original chunks.
When it's finished, save the result to the vector DB to update your knowledge base.
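To make the summary tip concrete, here is a minimal sketch of the idea behind a multi-vector setup: the summary is what gets searched, and the original chunk is what gets returned. It is not the MultiVector Retriever itself and does not call any library. The SummaryIndex class, the word-overlap score (a toy stand-in for embedding similarity), and the hand-written summaries (which an LLM would normally generate) are all assumptions made for illustration.

```python
import re
import uuid
from collections import Counter


def tokenize(text):
    """Lowercase word counts; a toy stand-in for an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(a, b):
    """Number of shared tokens between two bags (toy similarity score)."""
    return sum((a & b).values())


class SummaryIndex:
    """Hypothetical helper: search over summaries, return original chunks."""

    def __init__(self):
        self.summaries = []  # list of (doc_id, token bag of the summary)
        self.docstore = {}   # doc_id -> original chunk text

    def add(self, chunk, summary):
        # The shared doc_id links the searchable summary to its source chunk.
        doc_id = str(uuid.uuid4())
        self.summaries.append((doc_id, tokenize(summary)))
        self.docstore[doc_id] = chunk
        return doc_id

    def search(self, query, k=1):
        q = tokenize(query)
        ranked = sorted(
            self.summaries,
            key=lambda item: overlap(q, item[1]),
            reverse=True,
        )
        return [self.docstore[doc_id] for doc_id, _ in ranked[:k]]


# A chunk the parser flattened badly, plus a readable summary of it.
# In practice the summary would come from an LLM; here it is hand-written.
table_chunk = "Q1 | 120 | Q2 | 135 | Q3 | 150 Rev USD k"
table_summary = "Quarterly revenue in thousands of USD: Q1 120, Q2 135, Q3 150."

staff_chunk = "Name Dept Anne Eng Bob Sales"
staff_summary = "Employee list with departments: Anne in engineering, Bob in sales."

index = SummaryIndex()
index.add(table_chunk, table_summary)
index.add(staff_chunk, staff_summary)

# The query matches the summary wording, but the original chunk comes back.
results = index.search("What was the revenue in Q2?")
print(results)
```

The query never has to match the raw, poorly parsed text. It only has to match the summary, and the shared id brings back the original chunk so the LLM still sees the full data.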