Geek Out Time: Play with LangChain

LangChain (https://www.langchain.com/) is an open-source framework designed for building applications using large language models (LLMs). It provides tools and abstractions to simplify the development of applications like chatbots and virtual agents, enabling these applications to be context-aware and capable of sophisticated reasoning.

I experimented with this framework on Google Colab to build a simple RAG (Retrieval-Augmented Generation) application: setting up the environment, installing the key Python libraries, authenticating with the OpenAI API, preparing the data, vectorizing it for efficient search, and crafting an AI conversation flow with LangChain. A RAG system like this delivers precise, context-aware responses by grounding the model in a knowledge base. Let’s start.

Set up the environment on Google Colab

  1. Open https://colab.research.google.com/ and create your notebook.
  2. Prepare the data.txt file, which contains the information you want your bot to use. I created a simple text file with a line of information that obviously cannot be obtained from the internet. Place the file in your Google Drive in a location accessible to the notebook:

Nedved yang likes to eat chicken rice        
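The notebook also needs the relevant libraries installed in the Colab runtime. The exact package set depends on your LangChain version; the line below is a sketch matching the classic langchain import paths used in this walkthrough:

!pip install langchain openai faiss-cpu tiktoken        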

Import the libraries and set up the OpenAI key

# Import the langchain modules used in this walkthrough
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.memory import ConversationBufferMemory

# For Google Colab users, mount Google Drive to access files
from google.colab import drive
drive.mount('/content/drive')

import os

# Request the OpenAI API key at runtime (getpass hides the key as you type)
from getpass import getpass
api_key = getpass("OpenAI API key: ")
os.environ["OPENAI_API_KEY"] = api_key
print("OPENAI_API_KEY has been successfully configured.")

# Display utilities from IPython for enhanced output formatting
from IPython.display import display, Markdown

# Note: This code snippet assumes you're working in a Google Colab environment and requires an OpenAI API key.
# It includes mounting Google Drive for accessing files and setting up environment variables for OpenAI API access.        

Load the data into the vector database

# Split the 'data.txt' file into chunks and create embeddings from those chunks.
# Ensure to check your OpenAI API quota before proceeding.

from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

# Define the path to your text file
text_file_path = '/content/drive/MyDrive/LCTest/data.txt'

# Load the text data from the specified file path
text_data_loader = TextLoader(file_path=text_file_path, encoding="utf-8")
text_data = text_data_loader.load()

# Initialize the text splitter with specific chunk size and overlap
splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

# Split the loaded text data into chunks
chunked_data = splitter.split_documents(text_data)

# Initialize the embeddings and vector store for the chunked data
embedder = OpenAIEmbeddings()
vector_store = FAISS.from_documents(chunked_data, embedding=embedder)        
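Before wiring the vector store into a chain, it is worth sanity-checking retrieval on its own. Here is a minimal check against the store built above (the test query string is my own):

# Retrieve the chunk most similar to a test query and show it.
docs = vector_store.similarity_search("favorite food", k=1)
print(docs[0].page_content)        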

Create a conversation chain


# Initialize a conversational chain with a language model for dynamic conversation handling.
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

# Set up the language model with specific parameters for conversation generation.
language_model = ChatOpenAI(temperature=0.1, model_name="gpt-3.5-turbo")

# Configure a memory buffer to store and retrieve conversation history.
conversation_memory = ConversationBufferMemory(
    memory_key='chat_history',  # Key to identify conversation history in memory.
    return_messages=True        # Option to return previous messages in the conversation.
)

# Create a conversational retrieval chain that leverages the language model,
# a specified retriever for information retrieval, and a memory buffer for context.
conversation_chain = ConversationalRetrievalChain.from_llm(
    llm=language_model,
    chain_type="custom",  # Specify the type of conversational chain. "stuff" is replaced with "custom" for clarity.
    retriever=vector_store.as_retriever(),  # Use the previously created vector store as the information retriever.
    memory=conversation_memory  # Include the conversation memory for context-aware conversations.
)
        

Query

# Formulate a query to find out Nedved Yang's favorite food using the conversational chain.
query_text = "What is the favorite food for Nedved Yang?"

# Execute the query through the conversation chain to obtain a response.
query_response = conversation_chain({"question": query_text})

# Extract the answer from the query response.
favorite_food_answer = query_response["answer"]

# Display the obtained answer.
favorite_food_answer        

The output below is from the data.txt file.

Nedved Yang likes to eat chicken rice.        
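Since display and Markdown were imported during setup, the same answer can also be rendered as formatted output in Colab:

# Render the answer with the IPython Markdown helper.
display(Markdown(favorite_food_answer))        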

Try another one:

# Ask for places in Singapore where Nedved Yang can buy his favorite food.
purchase_query = "Can you suggest places in Singapore for Nedved Yang to buy it?"

# Submit the query to the conversational chain and capture the response.
purchase_response = conversation_chain({"question": purchase_query})

# Extract the suggested places from the response.
suggested_places = purchase_response["answer"]

# Output the list of suggested places.
suggested_places        

The output is:

Nedved Yang can buy chicken rice, his favorite food, at various hawker centers and food courts in Singapore. Some popular places to try chicken rice include Maxwell Food Centre, Tian Tian Hainanese Chicken Rice at Maxwell Road, and Chinatown Complex Food Centre.        

The memory has been preserved, and the prompt containing the information retrieved from data.txt has been successfully sent to GPT-3.5 Turbo. It looks awesome.
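To see what the chain has actually remembered, the buffer can be inspected directly; a minimal sketch using the conversation_memory object configured earlier:

# Print every message accumulated in the conversation buffer.
for message in conversation_memory.chat_memory.messages:
    print(f"{message.type}: {message.content}")        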

LangChain is a powerful tool with a wide array of features waiting to be discovered. Dive in and explore to fully leverage its capabilities. Have fun experimenting and uncovering new ways to enhance your LLM projects.
