Chat with PDFs using Generative AI Part 2 using gpt4all Model with FAISS as Vector DB
Satish Srinivasan
Cloud Architect | Cloud Security Analyst | Specialist - AWS & Azure Cloud | AWS Community Builder | AWS APN Ambassador
In this blog, we will demonstrate how to create a knowledge bot using the FAISS vector DB and gpt4all open-source models. GPT4All is an ecosystem to train and deploy powerful, customized large language models that run locally on consumer-grade CPUs.
A GPT4All model is a 3 GB to 8 GB file that we can download and plug into the GPT4All open-source ecosystem software. Nomic AI supports and maintains this software ecosystem to enforce quality and security, alongside spearheading the effort to allow any person or enterprise to easily train and deploy their own on-edge large language models.
LangChain is a framework designed to simplify the creation of applications using large language models. As a language model integration framework, LangChain's use-cases largely overlap with those of language models in general, including document analysis and summarization, chatbots, and code analysis.
We will be covering the following topics:
Architecture overview
The architecture diagram for the PDF bot using FAISS as the vector DB and embedding store
Prerequisites
An EC2 instance on which we will run the Python code.
Walkthrough
We will create a PDF bot using the FAISS vector DB and a gpt4all open-source model.
Let us create the necessary security groups, starting with the EC2 security group inbound rules.
Next, let us create the EC2 instance and install the necessary packages.
Press “Launch instance”.
The instance is up and running. We will “SSH” into the instance and show the setup required for running this demo.
These are the packages that are required.
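As a rough sketch of the baseline setup (assuming an Amazon Linux 2 AMI; package names will differ on other distributions):

```bash
# Baseline OS packages -- assuming an Amazon Linux 2 AMI
sudo yum update -y
sudo yum install -y python3 python3-pip git
```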
Once the packages are installed, we will download the model “ggml-gpt4all-j-v1.3-groovy.bin” locally.
Steps to set up a virtual environment. Let us first SSH to the EC2 instance.
Run the commands shown below to create and activate the virtual environment.
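A minimal sketch (the environment name "venv" is an arbitrary choice):

```bash
# Create a virtual environment named "venv" in the current directory
python3 -m venv venv

# Activate it for the current shell session
source venv/bin/activate
```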
Next, we will install the required Python packages. Once all the packages are installed, we need to download the model we are going to use for semantic search, "ggml-gpt4all-j-v1.3-groovy.bin".
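A hedged sketch of this step (the package list is an assumption based on the libraries used later in this walkthrough, and the model URL is the gpt4all.io registry location at the time of writing; pin versions as needed):

```bash
# Python dependencies used by the ingestion and query scripts below
pip install langchain gpt4all faiss-cpu pypdf sentence-transformers python-dotenv boto3

# Download the GPT4All-J "groovy" model (~3.8 GB); URL assumed from the
# gpt4all.io model registry at the time of writing
wget https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin
```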
Next, we will copy the PDF file we are going to use for the question-answer demo from the S3 bucket. The PDF file has already been uploaded to the S3 bucket. Attach a role with the S3 full-access policy to the EC2 instance, then copy the file from the S3 bucket to the required folder on the EC2 instance.
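For example (the bucket name, file name, and destination folder are placeholders):

```bash
# Copy the source PDF from S3 to the working folder on the instance
# (bucket and file names are hypothetical)
aws s3 cp s3://my-pdf-bucket/sample-document.pdf ~/pdf-bot/docs/
```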
Next, we will create the FAISS vector DB and store the DB index locally. In the next part, we will load this created DB and run a semantic search.
Index creation Part
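A minimal sketch of the index-creation script, assuming LangChain's PyPDFLoader and a local sentence-transformers embedding model ("all-MiniLM-L6-v2" is an assumed choice, as is the 250-character chunk size inferred from the index name):

```python
# ingest.py -- build and persist the FAISS index from the PDF (sketch)
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

# Load the PDF and split it into small chunks; the 250-character chunk
# size is an assumption inferred from the index name "faiss-index-250"
loader = PyPDFLoader("docs/sample-document.pdf")
documents = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=250, chunk_overlap=50)
chunks = splitter.split_documents(documents)

# Local sentence-transformers model for embeddings (assumed choice)
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Build the index and save it to disk for the query step
db = FAISS.from_documents(chunks, embeddings)
db.save_local("faiss-index-250")
```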
The index is created and saved locally under the name "faiss-index-250".
Next, we will load this index and do a semantic search using the local model we downloaded previously, "ggml-gpt4all-j-v1.3-groovy.bin".
The contents of the .env file
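The .env file is read by the query script at startup. A minimal example (both paths are assumptions matching the earlier steps):

```ini
# .env -- paths are illustrative
MODEL_PATH=./ggml-gpt4all-j-v1.3-groovy.bin
INDEX_PATH=./faiss-index-250
```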
Next, we will create the app.py file and run it to test the model by asking questions about the document we used for creating embeddings in the FAISS vector DB. The code in the app.py file is shown below.
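A hedged sketch of app.py, using LangChain's GPT4All wrapper and a RetrievalQA chain (the embedding model must match the one used at index time; the k=3 retriever setting is an arbitrary illustrative choice):

```python
# app.py -- load the saved FAISS index and answer questions locally (sketch)
import os
from dotenv import load_dotenv
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import GPT4All
from langchain.chains import RetrievalQA

load_dotenv()  # reads MODEL_PATH and INDEX_PATH from the .env file

# Embeddings must match the model used when the index was created
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
db = FAISS.load_local(os.environ["INDEX_PATH"], embeddings)

# Local GPT4All-J model; backend="gptj" selects the GPT-J architecture
llm = GPT4All(model=os.environ["MODEL_PATH"], backend="gptj", verbose=True)

# "stuff" chain: retrieved chunks are stuffed into a single prompt
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 3}),
)

while True:
    query = input("\nQuestion (type 'exit' to quit): ")
    if query.strip().lower() == "exit":
        break
    print(qa.run(query))
```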
Cleaning Up
Shut down and terminate the EC2 instance on which we deployed the bot. All these activities are done using the AWS Console.
Conclusion
In this blog, we learned how to create a PDF-based question-answer bot using gpt4all open-source models with the FAISS vector DB. This is a test project to validate the feasibility of a fully private solution for question answering using LLMs and vector embeddings, and it is not production ready. We need to experiment further to optimize the model for performance and production use.
In the next couple of blogs, we will experiment with incremental updates to the FAISS index and check out other open-source gpt4all models to see how they perform. We will also test the solution we designed in Part 1 of this blog with the FALCON LLM model and see how the performance varies. Finally, we will try different open-source models for creating embeddings and see whether they have any impact on the solution or its performance.