Open Source LLM Chatbot - Local Setup: with Llama2, Vector DB and PDF (RAG)


Overview of the article: This article shares how to create a chatbot, using the Llama2 LLM model, that can talk to a locally stored PDF file. Everything runs on your local computer. That means no subscription cost, no fine-tuning, and no worries about data security.

YouTube video of the code implementation

Benefits of this approach:

1. Power of LLM: A large language model uses deep learning techniques and massively large data sets to understand, summarize, generate, and predict new content.

2. RAG: Retrieving information from a specific document and tuning it to your needs is challenging. RAG (Retrieval Augmented Generation) is a step in that direction. RAG is a method that shifts the focus of the LLM from the outside world to your own documents and uses the power of NLP to retrieve the information.

3. Data security: Since this is a local setup, data security is a key benefit.

4. Cost benefit: No API subscription.


Steps:

1. Download llama-2-7b-chat.ggmlv3.q8_0.bin locally: download the model from the Meta site. After signing up, the download link will arrive at your email id.

2. Configure VS Code and a Python interpreter through venv (recommended), or just open a project folder. In case you need setup guidance, please refer to https://code.visualstudio.com/docs/python/python-tutorial

3. Load the data. The example here uses PDF, but the same approach works for HTML, JSON, Excel, etc. You need to import the proper LangChain document loader for that purpose.

from langchain.document_loaders import PyPDFDirectoryLoader

# Load PDF files from the data directory inside the project folder
loader = PyPDFDirectoryLoader('./data')
documents = loader.load()
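
As an optional sanity check (not part of the original steps), you can inspect what the loader returned; each item is a LangChain Document carrying page_content and metadata:

# Optional: inspect what the loader returned
print(f'Loaded {len(documents)} pages')
print(documents[0].page_content[:200])  # first 200 characters of the first page
print(documents[0].metadata)            # e.g. source file and page number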

4. Split the documents into chunks using RecursiveCharacterTextSplitter.

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split the loaded documents using RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
texts = text_splitter.split_documents(documents)
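
To build intuition for chunk_size and chunk_overlap, you can also run the splitter on a plain string (a small illustrative check, not part of the pipeline; the sample text is made up):

# Illustrative check: split a raw string instead of documents
sample = 'RAG retrieves the most relevant chunks from your documents. ' * 30
chunks = text_splitter.split_text(sample)
print(len(chunks))      # number of chunks produced
# With chunk_overlap=50, consecutive chunks share roughly 50 characters
print(chunks[0][-60:])
print(chunks[1][:60])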

5. Create embeddings and load them into a local FAISS setup:

An embedding is a numerical representation of information, for example text or audio. Representing information numerically makes it possible to perform semantic search or similarity comparison through calculations such as the distance between vectors.
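
For intuition, here is a small standalone sketch that embeds two sentences with the same model used below and compares them with cosine similarity (the sentences are made-up examples):

import numpy as np
from langchain.embeddings import HuggingFaceEmbeddings

emb = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
v1 = np.array(emb.embed_query('How do I reset my password?'))
v2 = np.array(emb.embed_query('Steps to recover a forgotten password'))
# Cosine similarity is close to 1 for semantically similar sentences
print(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))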

FAISS, the Facebook AI Similarity Search library, is used through LangChain to perform similarity search over text.

The following code embeds the chunks and saves the index locally:

from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

# Embed the chunks into a local FAISS vector DB
DB_FAISS_PATH = 'vectorstore/db_faiss'
embeddings = HuggingFaceEmbeddings(
    model_name='sentence-transformers/all-MiniLM-L6-v2',
    model_kwargs={'device': 'cpu'})
db = FAISS.from_documents(texts, embeddings)
db.save_local(DB_FAISS_PATH)
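
Before wiring in the LLM, you can query the index directly to confirm retrieval works (the query string here is just a placeholder):

# Fetch the 2 chunks most similar to a test query
for d in db.similarity_search('What is this document about?', k=2):
    print(d.page_content[:100])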


6. Running the above code creates two files, index.faiss and index.pkl, inside the FAISS path mentioned above.
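
On later runs, you can load the saved index instead of re-embedding everything (note: recent LangChain releases also require an allow_dangerous_deserialization=True argument here):

# Reload the persisted FAISS index instead of rebuilding it
db = FAISS.load_local(DB_FAISS_PATH, embeddings)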

7. Invoke the local Llama2 model using CTransformers.

from langchain.llms import CTransformers

# Local Llama2 LLM
llm = CTransformers(
    model='llama-2-7b-chat.ggmlv3.q8_0.bin',
    model_type='llama',
    max_new_tokens=512,
    temperature=0.5)
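
You can sanity-check the model on its own before connecting it to the retriever; in the LangChain API used here, the LLM object is directly callable on a prompt string:

# Direct call to the model, no retrieval involved yet
print(llm('Explain retrieval augmented generation in one sentence.'))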

8. Set up a question-and-answer chain with ConversationalRetrievalChain, a chatbot that performs a retrieval step first; it is one of LangChain's most popular chains.

from langchain.chains import ConversationalRetrievalChain

# Set up the Conversational Retrieval Chain
qa_chain = ConversationalRetrievalChain.from_llm(
    llm,
    db.as_retriever(search_kwargs={'k': 2}),
    return_source_documents=True)
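
Here search_kwargs={'k': 2} tells the retriever to pass the two most similar chunks to the LLM as context. A single turn looks like this (chat history is empty on the first call; the question is a placeholder):

result = qa_chain({'question': 'Summarize the document.', 'chat_history': []})
print(result['answer'])
# Because return_source_documents=True, the retrieved chunks are available too
print(result['source_documents'])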

9. Ask the user for a prompt and pass it to the chain.

import sys

# Start chatting with the chatbot
chat_history = []
while True:
    query = input('Prompt: ')
    if query.lower() in ["exit", "quit", "q"]:
        print('Exiting')
        sys.exit()
    result = qa_chain({'question': query, 'chat_history': chat_history})
    print('Answer: ' + result['answer'] + '\n')
    chat_history.append((query, result['answer']))

10. That’s it. Now run the Python program and ask it to summarize the document.

YouTube video of the code implementation

Note: This is very high level, but it can be a good starter. To implement it at enterprise level, the performance and accuracy of the responses should be measured. The chat can be built and shared via Chainlit, and through ngrok it can be accessed publicly. Thank you.
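
As a pointer for the Chainlit route mentioned above, a minimal sketch could look like the following (assuming a recent Chainlit version and that qa_chain is built as in step 8; run it with "chainlit run app.py"; for simplicity this sketch does not carry chat history across turns):

import chainlit as cl

@cl.on_message
async def main(message: cl.Message):
    # qa_chain is the ConversationalRetrievalChain from step 8
    result = qa_chain({'question': message.content, 'chat_history': []})
    await cl.Message(content=result['answer']).send()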

