Open Source LLM Chatbot - Local Setup: with Llama2, Vector DB and PDF (RAG)


Overview of the article: This article shares how to create a chatbot, using the Llama2 LLM model, that can talk to a locally stored PDF file. Everything runs on your local computer. That means no subscription cost, no fine-tuning, and no worries about data security.

YouTube video of the code implementation

Benefits of this approach:

1. Power of LLM: A large language model uses deep learning techniques and massively large data sets to understand, summarize, generate, and predict new content.

2. RAG: Retrieving information from a specific document and tuning it to your needs is challenging. RAG (Retrieval Augmented Generation) is a step in that direction. RAG is a method that shifts the focus of the LLM from the outside world to your own documents and uses the power of NLP to retrieve the information.

3. Data security: Since this is a local setup, data security is a key benefit.

4. Cost benefit: No API subscription.


Steps:

1. Download llama-2-7b-chat.ggmlv3.q8_0.bin locally: download the model from the Meta site. After signing up, the download link will arrive at your email id.

2. Configure VS Code and a Python interpreter through venv (recommended), or just open a project folder. In case you need setup guidance, please refer to https://code.visualstudio.com/docs/python/python-tutorial

3. Load the data. The example here uses PDF, but the same approach works for HTML, JSON, Excel, etc. You need to import the proper LangChain document loader for that purpose.

from langchain.document_loaders import PyPDFDirectoryLoader

# Load PDF files from the data directory inside the project folder
loader = PyPDFDirectoryLoader('./data')
documents = loader.load()
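
As an optional sanity check (not part of the original steps), you can inspect what the loader returned; each item is a LangChain Document carrying page_content and metadata:

# Optional: inspect what the loader returned
print(f'Loaded {len(documents)} pages')
print(documents[0].page_content[:200])  # first 200 characters of the first page
print(documents[0].metadata)            # e.g. source file and page number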

4. Split the documents into chunks using RecursiveCharacterTextSplitter.

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split the loaded documents using RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
texts = text_splitter.split_documents(documents)
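
To build intuition for chunk_size and chunk_overlap, you can also run the splitter on a plain string (a small illustrative check, not part of the pipeline; the sample text is made up):

# Illustrative check: split a raw string instead of documents
sample = 'RAG retrieves the most relevant chunks from your documents. ' * 30
chunks = text_splitter.split_text(sample)
print(len(chunks))      # number of chunks produced
# With chunk_overlap=50, consecutive chunks share roughly 50 characters
print(chunks[0][-60:])
print(chunks[1][:60])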

5. Create embeddings and load them into a local FAISS setup:

An embedding is a numerical representation of information, for example text or audio. Representing information numerically makes it possible to perform semantic search or similarity comparison through calculations such as the distance between vectors.
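
For intuition, here is a small standalone sketch that embeds two sentences with the same model used below and compares them with cosine similarity (the sentences are made-up examples):

import numpy as np
from langchain.embeddings import HuggingFaceEmbeddings

emb = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
v1 = np.array(emb.embed_query('How do I reset my password?'))
v2 = np.array(emb.embed_query('Steps to recover a forgotten password'))
# Cosine similarity is close to 1 for semantically similar sentences
print(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))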

FAISS, the Facebook AI Similarity Search library, is used through LangChain to perform similarity search over text.

The following code embeds the chunks and saves the index locally:

from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

# Embed the chunks into a local FAISS vector DB
DB_FAISS_PATH = 'vectorstore/db_faiss'
embeddings = HuggingFaceEmbeddings(
    model_name='sentence-transformers/all-MiniLM-L6-v2',
    model_kwargs={'device': 'cpu'})
db = FAISS.from_documents(texts, embeddings)
db.save_local(DB_FAISS_PATH)
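
Before wiring in the LLM, you can query the index directly to confirm retrieval works (the query string here is just a placeholder):

# Fetch the 2 chunks most similar to a test query
for d in db.similarity_search('What is this document about?', k=2):
    print(d.page_content[:100])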


6. Running the above code creates two files, index.faiss and index.pkl, inside the FAISS path mentioned above.
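
On later runs, you can load the saved index instead of re-embedding everything (note: recent LangChain releases also require an allow_dangerous_deserialization=True argument here):

# Reload the persisted FAISS index instead of rebuilding it
db = FAISS.load_local(DB_FAISS_PATH, embeddings)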

7. Invoke the local Llama2 model using CTransformers.

from langchain.llms import CTransformers

# Local Llama2 LLM
llm = CTransformers(
    model='llama-2-7b-chat.ggmlv3.q8_0.bin',
    model_type='llama',
    max_new_tokens=512,
    temperature=0.5)
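
You can sanity-check the model on its own before connecting it to the retriever; in the LangChain API used here, the LLM object is directly callable on a prompt string:

# Direct call to the model, no retrieval involved yet
print(llm('Explain retrieval augmented generation in one sentence.'))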

8. Set up a question-and-answer chain with ConversationalRetrievalChain, a chatbot that performs a retrieval step first; it is one of LangChain's most popular chains.

from langchain.chains import ConversationalRetrievalChain

# Set up the Conversational Retrieval Chain
qa_chain = ConversationalRetrievalChain.from_llm(
    llm,
    db.as_retriever(search_kwargs={'k': 2}),
    return_source_documents=True)
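
Here search_kwargs={'k': 2} tells the retriever to pass the two most similar chunks to the LLM as context. A single turn looks like this (chat history is empty on the first call; the question is a placeholder):

result = qa_chain({'question': 'Summarize the document.', 'chat_history': []})
print(result['answer'])
# Because return_source_documents=True, the retrieved chunks are available too
print(result['source_documents'])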

9. Ask the user for a prompt and pass it to the chain.

import sys

# Start chatting with the chatbot
chat_history = []
while True:
    query = input('Prompt: ')
    if query.lower() in ["exit", "quit", "q"]:
        print('Exiting')
        sys.exit()
    result = qa_chain({'question': query, 'chat_history': chat_history})
    print('Answer: ' + result['answer'] + '\n')
    chat_history.append((query, result['answer']))

10. That’s it. Now run the Python program and ask it to summarize the document.

YouTube video of the code implementation

Note: This is very high level, but it can be a good starter. To implement it at enterprise level, the performance and accuracy of the responses should be measured. The chat can be built and shared via Chainlit, and through ngrok it can be accessed publicly. Thank you.
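
As a pointer for the Chainlit route mentioned above, a minimal sketch could look like the following (assuming a recent Chainlit version and that qa_chain is built as in step 8; run it with "chainlit run app.py"; for simplicity this sketch does not carry chat history across turns):

import chainlit as cl

@cl.on_message
async def main(message: cl.Message):
    # qa_chain is the ConversationalRetrievalChain from step 8
    result = qa_chain({'question': message.content, 'chat_history': []})
    await cl.Message(content=result['answer']).send()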

