Create a RAG Knowledge Base from Large Documents with ChatGPT and Code Analysis

Johan de Bruin

Generative AI | Experienced Software Engineer & Team Leader | Ex-Riot, Ex-Age of Learning

Published: March 16, 2024

Sometimes you may want to create a knowledge base for future queries, either to assist a custom GPT or to back a retrieval-augmented generation (RAG) system. Such a knowledge base needs normalized information in a consistent structure, such as multiple JSON files that share the same metadata attributes.

This article takes a research paper containing a diverse prompt pattern catalog, breaks it down into separate JSON files with uniform attributes, and compiles those files into a zip archive, using only ChatGPT and no external tools.

Extraction

The first step is to extract the text from the PDF. For this, we can use a prompt similar to the following.

Using code analysis extract each page of this PDF as a plain text 
in a separate file

This step is important as ChatGPT may not always be able to read the entire PDF. Now, all the text is accessible in separate text files for download.
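
For reference, the code ChatGPT runs for this step probably looks something like the sketch below. This is an illustration under assumptions, not the code ChatGPT actually generates: it assumes the third-party pypdf library is available, and the function names and the page_001.txt naming scheme are hypothetical.

```python
from pathlib import Path


def extract_page_texts(pdf_path):
    """Return the text of each page, in order (assumes pypdf is installed)."""
    # Assumption: pypdf is a third-party library that the article does not name.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() can come back empty (e.g. scanned pages), so fall back to "".
    return [page.extract_text() or "" for page in reader.pages]


def write_pages(page_texts, out_dir):
    """Write one UTF-8 text file per page and return the created paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, text in enumerate(page_texts, start=1):
        path = out / f"page_{number:03d}.txt"  # hypothetical naming scheme
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths
```

Calling write_pages(extract_page_texts("paper.pdf"), "pages") would then leave one text file per page in a pages folder, where paper.pdf is a placeholder file name.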

Next, we ask GPT to read the extracted text files. This loads the paper's content into its context.

Please read the different extracted text pages so that you can
answer questions about it.

Index

We want GPT to build an index that maps each prompt pattern to the pages that cover it. This helps GPT identify the prompt patterns and determine which parts of the original PDF to use when generating knowledge. Depending on the PDF, some documents may need to be indexed without Python. In this case, the paper has no clear headers to index on, so manual GPT indexing is necessary.

Create an index that maps the prompt patterns to the different
available pages
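
When a document does name its patterns consistently in the text, a simple keyword search can produce a first draft of the same kind of index. The sketch below assumes you already have the page texts and a list of pattern names; the function names are hypothetical, and plain substring matching is a stand-in that will miss patterns the paper refers to indirectly, which is where GPT's manual indexing helps.

```python
def normalize(text):
    """Lowercase and collapse whitespace so line-wrapped names still match."""
    return " ".join(text.lower().split())


def build_index(page_texts, pattern_names):
    """Map each pattern name to the 1-based page numbers that mention it."""
    normalized_pages = [normalize(text) for text in page_texts]
    index = {}
    for name in pattern_names:
        needle = normalize(name)
        index[name] = [
            number
            for number, page in enumerate(normalized_pages, start=1)
            if needle in page
        ]
    return index
```

An empty page list for a pattern is a useful signal that the pattern needs to be located by reading rather than by searching.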

Generation

I’ve found that the most effective way to generate content is to focus on one specific part at a time. In my case, I want a description, a template, and an example for each prompt pattern.

Here are the various prompts I’ve utilized for this task.

Now for each pattern create a small description explaining it

Now for each pattern create a small description on when would be 
appropriate to use it.

Now for each prompt pattern provide an example using this pattern. 
Make sure the example is the actual prompt we would provide to the LLM. 
Make sure to use examples targeted to the lay person, like creating a 
nutritionist expert in the persona pattern. Or generating a field trip 
plan with the flipped interaction pattern.

ChatGPT will promptly respond to the request and generate the desired content for each pattern listed in the index.

Formatting

The final step is to compile all this information and format it into JSON files. To do this, give ChatGPT a prompt that spells out the desired JSON structure.

Now for each pattern create a json structure using the text 
previously generated. The structure looks like this:

{
   "description": "The small description previously generated",
   "template": "The template for the pattern",
   "example": "The example of usage of the pattern"
}

With this prompt, ChatGPT will start generating the JSON structures. It might take multiple runs before it produces all the desired content.
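
Because the output can arrive over several runs, it helps to check which patterns are still incomplete before moving on. The sketch below is one way to do that, assuming the generated structures have been collected into a dictionary keyed by pattern name; REQUIRED_KEYS mirrors the structure in the prompt above, and find_incomplete is a hypothetical helper name.

```python
REQUIRED_KEYS = ("description", "template", "example")


def find_incomplete(records):
    """Return {pattern name: [missing or empty keys]} for incomplete records."""
    problems = {}
    for name, record in records.items():
        missing = []
        for key in REQUIRED_KEYS:
            value = record.get(key)
            # Treat absent, non-string, and blank values as missing.
            if not isinstance(value, str) or not value.strip():
                missing.append(key)
        if missing:
            problems[name] = missing
    return problems
```

Any pattern that shows up in the result can be sent back to ChatGPT with a follow-up prompt asking for just the missing parts.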

You can now copy and save these JSON files to your preferred knowledge base platform. Depending on your language model’s limitations and the compilation size, you may ask ChatGPT to compile these files into a zip file.
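
If you would rather build the archive yourself, or want to see roughly what ChatGPT does when asked to zip the files, the following sketch writes one JSON file per pattern into a single zip archive using only the Python standard library. The function name and the file naming scheme are assumptions made for illustration.

```python
import json
import zipfile


def zip_records(records, zip_path):
    """Write each record as <slug>.json inside one zip archive."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, record in records.items():
            slug = name.lower().replace(" ", "_")  # hypothetical file naming
            payload = json.dumps(record, indent=2, ensure_ascii=False)
            archive.writestr(f"{slug}.json", payload)
    return zip_path
```

Keeping one file per pattern means each entry can later be retrieved, updated, or re-generated on its own.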

Even though this method is potent, it requires a thorough understanding of the content you’re extracting and some manual prompting. Always review the original source and ensure the output meets your expectations. You might need to adjust some prompts until you achieve the desired output.

