22-2-1 In-Memory Storage and Retrieval with Redis
Redis, renowned for its speed, is a pivotal component for enhancing the contextual understanding of user queries in LLM-powered applications. This versatile in-memory data structure store can serve as a database, cache, message broker, and streaming engine. Its primary advantage lies in operating on in-memory datasets, ensuring the rapid read and write operations crucial for real-time applications.
When considering the storage of conversational context in LLMs, traditional methods that involve directly saving user queries and responses to memory may lead to inefficiencies, particularly as this approach can swiftly reach the token limits of the LLM.
A more sophisticated strategy might involve indexing queries, allowing for the selective retrieval of pertinent parts of prior conversations. This method hinges on a robust storage engine and meticulous planning of indexing strategies to ensure both speed and relevance, especially given the time constraints inherent in making API calls.
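As a rough illustration of the difference, here is a minimal sketch. The word-overlap "index" is only a toy assumption for illustration, not the approach used later in this tutorial:

# Two ways to build context for the next LLM call (illustrative only).
conversation = []   # list of (query, response) pairs

def naive_context(conversation):
    # Replaying the entire history grows without bound and will
    # eventually exceed the model's token limit.
    return " ".join(q + " " + a for q, a in conversation)

def selective_context(conversation, current_query):
    # Toy "index": keep only prior turns sharing a word with the current
    # query; a real system would use a proper index or embeddings.
    words = set(current_query.lower().split())
    return " ".join(q + " " + a for q, a in conversation
                    if words & set(q.lower().split()))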
Redis emerges as a particularly suitable tool for caching conversations, owing to its swift and straightforward storage and retrieval capabilities. Later, we will explore how LangChain can achieve similar outcomes with less effort.
There are several ways to deploy Redis for an LLM-based system. Let's proceed with Redis Cloud, the managed database-as-a-service offering from Redis.
After signing up, we can create a free demo Redis database.
(Optional) To examine the stored data, download RedisInsight, the Redis GUI.
We are now ready to use Redis and LLMs together in our tutorial.
pip install redis
import redis

# Connect to the Redis Cloud instance (use your own endpoint and password)
r = redis.Redis(
    host='redis-10985.c294.ap-northeast-1-2.ec2.cloud.redislabs.com',
    port=10985,
    password='PASSWORD_GOES_HERE')
Connecting to a Redis instance is dead simple. For a locally hosted Redis, the host is localhost and no password is required by default.
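Before going further, a quick sanity check confirms the connection works (the healthcheck key is arbitrary):

# Quick sanity check against the Redis instance
print(r.ping())                 # True if the connection is healthy
r.set("healthcheck", "ok")
print(r.get("healthcheck"))     # b'ok' -- values come back as bytes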
Now we want to store data in Redis from an LLM-based application. Here is a high-level overview.
Earlier tutorials used text boxes as the input UI. Here we rely on Gradio for the chat user interface.
pip install gradio
To store messages we add key:value pairs; the key is how we will retrieve the message from Redis later.
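For example, a single hash-field round trip looks like this (the session:demo key is only for illustration):

# Store and read back one field of a Redis hash
r.hset("session:demo", "message", "Hello")
print(r.hget("session:demo", "message"))   # b'Hello'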
We will generate a unique session ID for the key.
import uuid

# Generate a random session ID
session_id = str(uuid.uuid4())

def store_message_in_redis(message):
    key = f"session:{session_id}"
    print(f"Storing message with key: {key}")
    # Retrieve the existing message if it exists
    existing_message = r.hget(key, "message")
    if existing_message is not None:
        existing_message = existing_message.decode('utf-8')
        # Append the new message to the existing message
        updated_message = existing_message + " " + message
    else:
        updated_message = message
    # Store the updated message
    r.hset(key, "message", updated_message)
    r.lpush("session_keys", key)
    return key
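Calling the function twice in the same session shows how messages accumulate under one key (the example queries are placeholders):

store_message_in_redis("Who was Edward Gibbon?")
store_message_in_redis("Why is his work considered significant?")
print(r.hget(f"session:{session_id}", "message"))
# b'Who was Edward Gibbon? Why is his work considered significant?'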
In the respond function, we call store_message_in_redis to store the user query.
import random
import time

def respond(message, chat_history):
    # Store user message in Redis
    store_message_in_redis(message)
    # Placeholder bot reply (replaced with an LLM below)
    bot_message = random.choice(["How are you?", "I love you", "I'm very hungry"])
    chat_history.append((message, bot_message))
    time.sleep(2)
    return "", chat_history
We now need to modify the function so that it responds to the user query with an LLM instead of generating random placeholder text.
After installing the openai library and saving the OpenAI API key, we initialize the client and define a function that generates responses.
## pip install openai
from openai import OpenAI

OPENAI_API_KEY = "TOKEN_KEY"

# Initialize OpenAI client
client = OpenAI(api_key=OPENAI_API_KEY)
# Function to generate response using GPT-3.5-turbo
def generate_response_with_chat(message):
    try:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": message}]
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"Error in generating response: {str(e)}"
We combine the chat generation with the function that writes user queries to Redis:
import gradio as gr

with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    msg = gr.Textbox()
    clear = gr.ClearButton([msg, chatbot])

    def respond(message, chat_history):
        # Store user message in Redis
        store_message_in_redis(message)
        # Generate response using GPT-3.5-turbo
        bot_message = generate_response_with_chat(message)
        chat_history.append((message, bot_message))
        time.sleep(2)
        return "", chat_history

    msg.submit(respond, [msg, chatbot], [msg, chatbot])

if __name__ == "__main__":
    demo.launch()
We need to add another function to retrieve past queries.
def retrieve_query_by_key(key):
    # Check if the key exists in Redis
    if r.exists(key):
        # Retrieve and decode the actual message stored under the hash
        return r.hget(key, "message").decode('utf-8')
    return None
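A quick check that retrieval works for the current session:

key = f"session:{session_id}"
print(retrieve_query_by_key(key))   # accumulated queries for this session, or None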
Let's piece it all together and examine the core components.
import uuid
import redis
import time
import gradio as gr
from openai import OpenAI

r = redis.Redis(
    host='REDIS_CLOUD_ENDPOINT',
    port=10985,
    password='PASSWORD')

# Generate a random session ID
session_id = str(uuid.uuid4())

def store_message_in_redis(message):
    key = f"session:{session_id}"
    print(f"Storing message with key: {key}")
    # Retrieve the existing message if it exists
    existing_message = r.hget(key, "message")
    if existing_message is not None:
        existing_message = existing_message.decode('utf-8')
        # Append the new message to the existing message
        updated_message = existing_message + " " + message
    else:
        updated_message = message
    # Store the updated message
    r.hset(key, "message", updated_message)
    r.lpush("session_keys", key)
    return key
def retrieve_query_by_key(key):
    # Check if the session key exists in Redis
    if r.exists(key):
        # Retrieve and decode the accumulated past queries stored under the hash
        past_query = r.hget(key, "message").decode('utf-8')
        print(f"Retrieved past message: {past_query}")  # Log the retrieved message
        return past_query
    return None
# Initialize OpenAI client (set your own API key)
OPENAI_API_KEY = "TOKEN_KEY"
client = OpenAI(api_key=OPENAI_API_KEY)
def generate_response_with_chat(message, key):
    past_query = retrieve_query_by_key(key)
    messages = []
    system_message = f"Previous conversation context: {past_query}. Respond to the following user query based on previous query."
    messages.append({"role": "system", "content": system_message})
    messages.append({"role": "user", "content": message})
    try:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=messages
        )
        print(f"Generated response: {response.choices[0].message.content}")  # Log the response
        return response.choices[0].message.content
    except Exception as e:
        print(f"Error in generating response: {str(e)}")  # Log errors
        return f"Error in generating response: {str(e)}"
with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    msg = gr.Textbox()
    clear = gr.ClearButton([msg, chatbot])

    def respond(message, chat_history):
        key = store_message_in_redis(message)
        bot_message = generate_response_with_chat(message, key)
        chat_history.append((message, bot_message))
        time.sleep(2)
        return "", chat_history

    msg.submit(respond, [msg, chatbot], [msg, chatbot])

if __name__ == "__main__":
    demo.launch(debug=True)
You can run the above code in Google Colab; it will display the chat interface inline or provide a temporary URL to access the Gradio application.
In the screenshots, the first query specifically asks about Edward Gibbon, while the second asks about the significance of his work without naming him. The application nevertheless connects the dots, thanks to the prior queries stored in Redis, contextualizing the second question in relation to Edward Gibbon and improving the relevance and accuracy of its response.
To follow along, here is the Colab notebook (remember to add your own Redis Cloud endpoint, password, and OpenAI credentials): https://colab.research.google.com/drive/1-71QU31-CcjMeBSp7o5Rfp4IEZVBwEzy?usp=sharing
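One housekeeping note: session data accumulates in Redis. If you want to reset a conversation, a minimal cleanup sketch could look like this (the clear_session helper is not part of the app above, just a suggested extension):

def clear_session(key):
    # Delete the accumulated messages for this session
    r.delete(key)
    # Remove every occurrence of the key from the session_keys list
    r.lrem("session_keys", 0, key)

clear_session(f"session:{session_id}")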
Next up, we will use LangChain to simplify the process of adding memory to our LLM apps. Besides Redis, we will review which other options are viable for quick storage and retrieval without going deep into the world of vector DBs.