Geek Out Time: Play with LangChain

LangChain (https://www.langchain.com/) is an open-source framework designed for building applications using large language models (LLMs). It provides tools and abstractions to simplify the development of applications like chatbots and virtual agents, enabling these applications to be context-aware and capable of sophisticated reasoning.

I experimented with this framework on Google Colab to build a simple RAG (Retrieval-Augmented Generation) application: setting up the environment, installing the key Python libraries, authenticating with the OpenAI API, preparing the data, vectorizing it for efficient search, and crafting an AI conversation flow with LangChain. A RAG system like this delivers precise, context-aware responses by grounding the model in a knowledge base. Let’s start.

Set up the environment on Google Colab

  1. Open https://colab.research.google.com/ and create your notebook.
  2. Prepare the data.txt file, which contains the information you want your bot to use. I created a simple text file with a line of information that obviously cannot be obtained from the internet. Place the file in your Google Drive in a location accessible to the notebook:

Nedved yang likes to eat chicken rice        
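The notebook also needs the relevant libraries installed in the Colab runtime. The exact package set depends on your LangChain version; the line below is a sketch matching the classic langchain import paths used in this walkthrough:

!pip install langchain openai faiss-cpu tiktoken        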

Import the libraries and set up the OpenAI key

# Import the langchain modules used in this walkthrough
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.memory import ConversationBufferMemory

# For Google Colab users, mount Google Drive to access files
from google.colab import drive
drive.mount('/content/drive')

import os

# Request the OpenAI API key at runtime (getpass hides the key as you type)
from getpass import getpass
api_key = getpass("OpenAI API key: ")
os.environ["OPENAI_API_KEY"] = api_key
print("OPENAI_API_KEY has been successfully configured.")

# Display utilities from IPython for enhanced output formatting
from IPython.display import display, Markdown

# Note: This code snippet assumes you're working in a Google Colab environment and requires an OpenAI API key.
# It includes mounting Google Drive for accessing files and setting up environment variables for OpenAI API access.        

Load the data into the vector database

# Split the 'data.txt' file into chunks and create embeddings from those chunks.
# Ensure to check your OpenAI API quota before proceeding.

from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

# Define the path to your text file
text_file_path = '/content/drive/MyDrive/LCTest/data.txt'

# Load the text data from the specified file path
text_data_loader = TextLoader(file_path=text_file_path, encoding="utf-8")
text_data = text_data_loader.load()

# Initialize the text splitter with specific chunk size and overlap
splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

# Split the loaded text data into chunks
chunked_data = splitter.split_documents(text_data)

# Initialize the embeddings and vector store for the chunked data
embedder = OpenAIEmbeddings()
vector_store = FAISS.from_documents(chunked_data, embedding=embedder)        
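Before wiring the vector store into a chain, it is worth sanity-checking retrieval on its own. Here is a minimal check against the store built above (the test query string is my own):

# Retrieve the chunk most similar to a test query and show it.
docs = vector_store.similarity_search("favorite food", k=1)
print(docs[0].page_content)        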

Create a conversation chain


# Initialize a conversational chain with a language model for dynamic conversation handling.
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

# Set up the language model with specific parameters for conversation generation.
language_model = ChatOpenAI(temperature=0.1, model_name="gpt-3.5-turbo")

# Configure a memory buffer to store and retrieve conversation history.
conversation_memory = ConversationBufferMemory(
    memory_key='chat_history',  # Key to identify conversation history in memory.
    return_messages=True        # Option to return previous messages in the conversation.
)

# Create a conversational retrieval chain that leverages the language model,
# a specified retriever for information retrieval, and a memory buffer for context.
conversation_chain = ConversationalRetrievalChain.from_llm(
    llm=language_model,
    chain_type="custom",  # Specify the type of conversational chain. "stuff" is replaced with "custom" for clarity.
    retriever=vector_store.as_retriever(),  # Use the previously created vector store as the information retriever.
    memory=conversation_memory  # Include the conversation memory for context-aware conversations.
)
        

Query

# Formulate a query to find out Nedved Yang's favorite food using the conversational chain.
query_text = "What is the favorite food for Nedved Yang?"

# Execute the query through the conversation chain to obtain a response.
query_response = conversation_chain({"question": query_text})

# Extract the answer from the query response.
favorite_food_answer = query_response["answer"]

# Display the obtained answer.
favorite_food_answer        

The output below is from the data.txt file.

Nedved Yang likes to eat chicken rice.        
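Since display and Markdown were imported during setup, the same answer can also be rendered as formatted output in Colab:

# Render the answer with the IPython Markdown helper.
display(Markdown(favorite_food_answer))        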

Try another one:

# Ask for places in Singapore where Nedved Yang can buy his favorite food.
purchase_query = "Can you suggest places in Singapore for Nedved Yang to buy it?"

# Submit the query to the conversational chain and capture the response.
purchase_response = conversation_chain({"question": purchase_query})

# Extract the suggested places from the response.
suggested_places = purchase_response["answer"]

# Output the list of suggested places.
suggested_places        

The output is:

Nedved Yang can buy chicken rice, his favorite food, at various hawker centers and food courts in Singapore. Some popular places to try chicken rice include Maxwell Food Centre, Tian Tian Hainanese Chicken Rice at Maxwell Road, and Chinatown Complex Food Centre.        

The memory has been preserved, and the prompt containing the information retrieved from data.txt has been successfully sent to GPT-3.5 Turbo. It looks awesome.
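To see what the chain has actually remembered, the buffer can be inspected directly; a minimal sketch using the conversation_memory object configured earlier:

# Print every message accumulated in the conversation buffer.
for message in conversation_memory.chat_memory.messages:
    print(f"{message.type}: {message.content}")        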

LangChain is a powerful tool with a wide array of features waiting to be discovered. Dive in and explore to fully leverage its capabilities. Have fun experimenting and uncovering new ways to enhance your LLM projects.
