22-2-1 In-Memory Storage and Retrieval with Redis

Redis, renowned for its speed and performance, is a pivotal component for enhancing the contextual understanding of user queries in LLM-powered applications. This versatile in-memory data structure store can serve as a database, cache, message broker, and streaming engine. Its primary advantage lies in operating on in-memory datasets, which ensures the rapid read and write operations crucial for real-time applications.

When considering the storage of conversational context in LLMs, traditional methods that involve directly saving user queries and responses to memory may lead to inefficiencies, particularly as this approach can swiftly reach the token limits of the LLM.
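To make the concern concrete, here is a minimal sketch using the tiktoken tokenizer (an assumption for illustration only; it is not used elsewhere in this tutorial) that shows how quickly a naively concatenated history consumes the model's context window:

## pip install tiktoken
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

# Simulate naively appending every turn of a conversation to one string
history = ""
for turn in ["Who was Edward Gibbon?", "Why is his work significant?"] * 50:
    history += " " + turn
print(len(enc.encode(history)))  # token count grows with every stored turn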

A more sophisticated strategy might involve indexing queries, allowing for the selective retrieval of pertinent parts of prior conversations. This method hinges on a robust storage engine and meticulous planning of indexing strategies to ensure both speed and relevance, especially given the time constraints inherent in making API calls.
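As a rough illustration of what such indexing could look like, the following hypothetical sketch stores each conversation turn under its own key and retrieves only the turns that share words with the new query. The names (store_turn, find_relevant_turns) and the keyword-overlap scoring are illustrative assumptions, not the approach used later in this tutorial:

def store_turn(r, session_id, turn_id, text):
    # Each turn gets its own key so it can be retrieved selectively later
    r.set(f"turn:{session_id}:{turn_id}", text)

def find_relevant_turns(r, session_id, query, max_turns=3):
    # Naive relevance: count words shared between the stored turn and the new query
    query_words = set(query.lower().split())
    scored = []
    for key in r.scan_iter(f"turn:{session_id}:*"):
        text = r.get(key).decode("utf-8")
        overlap = len(query_words & set(text.lower().split()))
        if overlap:
            scored.append((overlap, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:max_turns]]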

Redis emerges as a particularly suitable tool for caching conversations, owing to its swift and straightforward storage and retrieval capabilities. Additionally, a later exploration of Langchain will offer a pathway to achieving similar outcomes with greater efficiency.

To integrate Redis into your LLM-based system, there are several deployment options available:

  1. Local Installation via Homebrew (Mac): A convenient option for Mac users, providing an easy setup process through the popular package manager.
  2. Running with Docker: This approach offers a containerized environment, ensuring consistency across different development and production setups.
  3. Redis Cloud: A cloud-based solution that provides the benefits of Redis without the need for manual installation and maintenance.
  4. AWS Elasticache with Redis: For those seeking a more scalable and managed service, AWS Elasticache presents a robust platform, leveraging the power of Redis within the AWS ecosystem.

Let's proceed with Redis Cloud, the managed database-as-a-service offering from Redis.


After signing up, we can create a free demo Redis database:

  1. Create a database name
  2. Save the (default) user password
  3. Save the public endpoint

(Optional) To examine the stored data, download RedisInsight, the Redis GUI.

We are now ready to use Redis and LLMs together in our tutorial.

pip install redis        
import redis

# Replace host, port, and password with your own Redis Cloud details
r = redis.Redis(
  host='redis-10985.c294.ap-northeast-1-2.ec2.cloud.redislabs.com',
  port=10985,
  password='PASSWORD_GOES_HERE')

Connecting to a Redis instance is dead simple. For a locally hosted Redis, the host is localhost and no password is required.
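For example, a connection to a locally running instance (assuming the default port 6379) looks like this; ping() confirms the server is reachable:

import redis

# Local Redis on the default port; no password for a default install
r = redis.Redis(host='localhost', port=6379)
print(r.ping())  # True if the connection succeeds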

Now we want to store data in Redis from an LLM-based application. Here is a high-level overview.

  1. The user enters a query/prompt
  2. A write function saves the query to Redis
  3. The LLM responds to the query
  4. The user enters another, related query
  5. A read function retrieves the previous query
  6. The context + new query are sent to the LLM for a response

Earlier tutorials used plain text boxes as the input UI. Here we rely on Gradio for the chat user interface.

pip install gradio         

To store messages we need to add key:value pairs. The key is how we will retrieve the message from Redis.
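Under the hood we will use a Redis hash: hset writes a field under a key, and hget reads it back. A tiny illustration (assuming the connection r created above; the key name session:demo is just for this example):

r.hset("session:demo", "message", "hello")
print(r.hget("session:demo", "message").decode('utf-8'))  # -> hello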

We will generate a unique session ID for the key.

import uuid

# Generate a random session ID
session_id = str(uuid.uuid4())        
def store_message_in_redis(message):
    key = f"session:{session_id}"
    print(f"Storing message with key: {key}")

    # Retrieve the existing message if it exists
    existing_message = r.hget(key, "message")
    if existing_message is not None:
        existing_message = existing_message.decode('utf-8')
        # Append the new message to the existing message
        updated_message = existing_message + " " + message
    else:
        updated_message = message

    # Store the updated message
    r.hset(key, "message", updated_message)
    r.lpush("session_keys", key)

    return key        
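A quick way to verify the behaviour (assuming the Redis connection r and the session_id defined above) is to store two messages and read back the concatenated value:

key = store_message_in_redis("Who was Edward Gibbon?")
store_message_in_redis("Why is his work significant?")
print(r.hget(key, "message").decode('utf-8'))
# -> Who was Edward Gibbon? Why is his work significant?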

In the respond function, we include store_message_in_redis to store the user query.

import random
import time

def respond(message, chat_history):
    # Store user message in Redis
    store_message_in_redis(message)

    # Placeholder reply; we will swap in an LLM response next
    bot_message = random.choice(["How are you?", "I love you", "I'm very hungry"])
    chat_history.append((message, bot_message))
    time.sleep(2)
    return "", chat_history

We now need to modify the function to respond to the user query with an LLM instead of generating random placeholder text.

After installing the openai library and saving the OpenAI API key:

## pip install openai
from openai import OpenAI

OPENAI_API_KEY = "TOKEN_KEY"  # replace with your OpenAI API key

# initialize OpenAI client
client = OpenAI(api_key=OPENAI_API_KEY)

# Function to generate response using GPT-3.5-turbo
def generate_response_with_chat(message):
    try:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": message}]
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"Error in generating response: {str(e)}"        

We now combine the chat generation with the function that writes user queries to Redis:

with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    msg = gr.Textbox()
    clear = gr.ClearButton([msg, chatbot])

    def respond(message, chat_history):
        # Store user message in Redis
        key = store_message_in_redis(message)

        # Generate response using GPT-3.5-turbo
        bot_message = generate_response_with_chat(message, key)
        chat_history.append((message, bot_message))
        time.sleep(2)
        return "", chat_history

    msg.submit(respond, [msg, chatbot], [msg, chatbot])

if __name__ == "__main__":
    demo.launch()        

We need to add another function to retrieve past queries, and to extend generate_response_with_chat to accept the session key so it can pull that context into the prompt.


def retrieve_query_by_key(key):
    # Check if the key exists in Redis
    if r.exists(key):
        # Retrieve and decode the actual message stored under the hash
        return r.hget(key, "message").decode('utf-8')
    return None        
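For example, retrieving what has been stored for the current session (assuming the session_id generated earlier):

past = retrieve_query_by_key(f"session:{session_id}")
print(past)  # the accumulated user queries, or None if nothing has been stored yet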

Let's piece it all together and examine the core components.

  • Redis Connection: First, it connects to a Redis database. This is like setting up a storage locker, where we specify where the locker is (the endpoint), how to access it (the port), and the secret key to open it (the password).
  • Session ID Creation: For every user who chats with the bot, a unique ID is generated using something called uuid.uuid4(). Think of this as giving each user a special passcode so the bot can remember who they are.
  • Storing Messages in Redis: Whenever a user sends a message, the store_message_in_redis function kicks in. If they've talked before, it adds the new message to their existing conversation.
  • Retrieving Past Conversations: The retrieve_past_query_by_key function acts like a memory recall. It looks back at previous messages using the user's session ID. This helps the chatbot remember what was talked about before.
  • OpenAI Client Setup: The code then wakes up the OpenAI GPT model by handing it an API key. This is like giving the model a special access card to start generating smart, contextual responses.
  • Generating Chatbot Responses: With generate_response_with_chat, the chatbot doesn't just reply to the latest message. It also considers what was said earlier, giving a response that makes sense in the larger conversation. And if something goes wrong, it doesn't just freeze up; it logs the issue so it can be fixed.
  • Gradio Interface: The front desk of this chatbot is a Gradio interface. It's user-friendly, featuring a chat window and a text box for typing messages. There's even a 'clear' button to start a new conversation.
  • Chatbot Response Mechanism: When a user types a message, the respond function is like the engine room. It takes the message, stores it, gets the chatbot's reply, and updates the chat window.
  • Launching the Application: Finally, the if __name__ == "__main__": part is like the green light for the application. If this script is the main act (and not just a supporting character), the Gradio app launches in a special debug mode, ready for action.

import uuid
import redis
import time
import gradio as gr
from openai import OpenAI

# Replace with your own Redis Cloud endpoint, port, and password
r = redis.Redis(
  host='REDIS_CLOUD_ENDPOINT',
  port=10985,
  password='PASSWORD')

# Generate a random session ID
session_id = str(uuid.uuid4())

def store_message_in_redis(message):
    key = f"session:{session_id}"
    print(f"Storing message with key: {key}")

    # Retrieve the existing message if it exists
    existing_message = r.hget(key, "message")
    if existing_message is not None:
        existing_message = existing_message.decode('utf-8')
        # Append the new message to the existing message
        updated_message = existing_message + " " + message
    else:
        updated_message = message

    # Store the updated message
    r.hset(key, "message", updated_message)
    r.lpush("session_keys", key)

    return key

def retrieve_past_query_by_key(key):
    # Check if the session hash exists in Redis
    if r.exists(key):
        # Retrieve and decode the accumulated user queries stored under the hash
        past_query = r.hget(key, "message").decode('utf-8')
        print(f"Retrieved past message: {past_query}")  # Log the retrieved message
        return past_query
    return None


# Initialize OpenAI client (replace TOKEN_KEY with your OpenAI API key)
OPENAI_API_KEY = "TOKEN_KEY"
client = OpenAI(api_key=OPENAI_API_KEY)

def generate_response_with_chat(message, key):
    past_query = retrieve_past_query_by_key(key)
    messages = []
    if past_query:
        system_message = f"Previous conversation context: {past_query}. Respond to the following user query in light of this context."
        messages.append({"role": "system", "content": system_message})
    messages.append({"role": "user", "content": message})

    try:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=messages
        )
        print(f"Generated response: {response.choices[0].message.content}")  # Log the response
        return response.choices[0].message.content
    except Exception as e:
        print(f"Error in generating response: {str(e)}")  # Log errors
        return f"Error in generating response: {str(e)}"

with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    msg = gr.Textbox()
    clear = gr.ClearButton([msg, chatbot])

    def respond(message, chat_history):
        key = store_message_in_redis(message)
        bot_message = generate_response_with_chat(message, key)
        chat_history.append((message, bot_message))
        time.sleep(2)
        return "", chat_history

    msg.submit(respond, [msg, chatbot], [msg, chatbot])

if __name__ == "__main__":
    demo.launch(debug=True)        
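One detail the code above leaves out is cleanup: session hashes accumulate in Redis and are never deleted. A minimal cleanup helper for the current session, sketched here as an assumption (it is not part of the original code) and easy to wire to the Clear button:

def clear_session():
    # Delete the accumulated conversation and drop the key from the session index list
    key = f"session:{session_id}"
    r.delete(key)
    r.lrem("session_keys", 0, key)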

You can run the above code in Google Colab; it will display a chat interface or a temporary URL for accessing the Gradio application.
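If you want an explicit public link rather than the embedded interface, Gradio's share flag can be passed to launch (a standard Gradio option, shown here as a small variation on the code above):

demo.launch(debug=True, share=True)  # prints a temporary public URL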

In the example conversation, the first query specifically asks about Edward Gibbon, while the second delves into the significance of his work without directly naming him. Despite this, the application seamlessly connects the dots, thanks to the prior queries stored in Redis, and contextualizes the second question in relation to Edward Gibbon, improving the relevance and accuracy of its response.

To follow along, here is the Colab notebook; remember to add your own Redis Cloud endpoint, password, and OpenAI credentials: https://colab.research.google.com/drive/1-71QU31-CcjMeBSp7o5Rfp4IEZVBwEzy?usp=sharing

Next up, we will use Langchain to simplify the process of adding memory to our LLM apps. Besides Redis, we will review which other options are viable for quick storage and retrieval without going deep into the world of vector DBs.



