Streamlining Development with OpenAI Assistants—Your Complete Guide
Rodrigo Estrada
Sr. Technical Lead @ Cencosud S.A. | Master of Science Distributed and Parallel computing | Data Engineering | Platform Engineering
For those interested in diving deeper into the implementation and exploring the code firsthand, the full project is available on GitHub. You can download and review the code at AI-ChatFlask-GPT on GitHub. This repository provides all the necessary files and instructions to get you started on deploying and interacting with the AI chatbot application.
In the ever-evolving landscape of artificial intelligence and machine learning, the ability to adapt and refine our approaches is not just an advantage—it's a necessity. Some time ago, I explored the intricacies of building a retrieval-based intelligent agent in my article "Crafting and Enhancing a Custom Intelligent Agent with OpenAI and Pinecone". The article detailed a complex dance of technologies, combining the cutting-edge language models from OpenAI with the vector database capabilities of Pinecone to create a nuanced and responsive AI agent.
However, the world of AI does not stand still. With advancements and updates to the OpenAI platform, certain complexities I once navigated are no longer obstacles that developers need to maneuver around. Retrieval functionality has become an integral part of the OpenAI ecosystem, streamlining the process of building an intelligent agent. There’s no longer a need to juggle external services like Pinecone unless your application requires such specialized vector search capabilities.
In light of these developments, I'm excited to share a refreshed take on constructing an intelligent agent, leveraging the robust retrieval capabilities natively offered by OpenAI Assistants. This article aims to demystify the process, illustrate the seamless integration, and showcase a step-by-step journey through the creation of an AI agent, all while using concise and effective prompts.
Join me as we revisit the world of intelligent agents, only this time, with a path that's clearer, more direct, and even more accessible to developers and enthusiasts alike.
The Transition to OpenAI Assistants
The original journey through constructing an intelligent agent was akin to assembling a complex puzzle where each piece had to be meticulously placed. The interplay between OpenAI's language models and Pinecone's vector search created a robust, albeit intricate, system. The setup required a keen understanding of both platforms and how they could be harnessed in tandem to produce a responsive AI assistant capable of retrieving and synthesizing knowledge effectively.
Yet, as technology progresses, so does the simplicity with which we can achieve our goals. OpenAI has made significant strides in integrating retrieval capabilities directly within its ecosystem, giving rise to a new feature: OpenAI Assistants. This advancement marks a pivotal shift from the multi-service approach to a more unified solution.
OpenAI Assistants streamline the process of creating intelligent agents by providing built-in retrieval functionalities. Now, developers can build an agent that not only responds with human-like accuracy but also retrieves and references information seamlessly from the provided knowledge base. This feature simplifies the architecture of intelligent systems, eliminating the need for external vector databases for most use cases.
But what does this mean for those of us who have already navigated the previous path? It opens up an avenue for refinement and efficiency. The retrieval functionality embedded in OpenAI Assistants allows for direct querying of documents, cutting down on latency and complexity. This integration means we can focus more on the content and the behaviors that define our agent's personality and capabilities, rather than the underlying data plumbing.
Furthermore, the transition to OpenAI Assistants offers a boon in maintainability. With fewer moving parts and dependencies, the system becomes easier to update, monitor, and scale. Developers can iterate more rapidly, making adjustments and enhancements directly through the OpenAI platform without the overhead of synchronizing separate services.
In the following sections, we will delve into the practicalities of building an agent with OpenAI Assistants. We'll explore setting up our project environment, crafting a well-structured knowledge base, defining nuanced behaviors, and, ultimately, deploying our improved intelligent agent. Through this process, we'll witness firsthand the power of consolidation and integration that OpenAI Assistants bring to the table.
Building the Knowledge Base
The cornerstone of any intelligent retrieval-based agent is a well-constructed knowledge base. This repository of information forms the backbone of the agent's responses, ensuring that each interaction is informed and contextually relevant. In this section, we'll guide you through the process of creating a knowledge base that empowers your agent with the right mix of information.
Understanding the Structure
A knowledge base for a retrieval-based agent is not just a random collection of documents. It's a curated set of content, often arranged in a question-answer format or as a series of informative passages that the AI can draw from. The structure should be logical, segmented by topics or categories if necessary, to facilitate easy retrieval and comprehension.
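To make this concrete, here is what a single knowledge file might look like. The filename and content below are purely illustrative, not files from the actual repository; the point is the question-and-answer layout in Markdown:

```markdown
# Shipping Policy

## What are the standard delivery times?
Standard orders ship within 2–3 business days. Express shipping
delivers within 24 hours for orders placed before noon.

## Which regions do you ship to?
We currently ship to all regions in Chile. International shipping
is planned but not yet available.
```

Each heading becomes a natural retrieval unit: when a user asks about delivery times, the retrieval tool can surface exactly that passage rather than an entire document.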
Step-by-Step Guide
1. Gather your source material: FAQs, product documentation, policies, or any other text your agent should be able to reference.
2. Convert each document to Markdown and save it in the knowledge directory, one topic per file.
3. Review each file for accuracy, and remove outdated or contradictory information before uploading.
Best Practices
Keep each file focused on a single topic, so retrieval surfaces only the relevant passage. Prefer concise question-and-answer entries over long narrative text. As information changes, update the files and recreate the assistant so the new versions are uploaded.
Defining Behaviors with Retrieval
With your knowledge base in place, the next pivotal step in empowering your intelligent agent is to define its behaviors — the nuanced rules and responses that govern how it interprets and engages with users' queries. The OpenAI platform's retrieval feature plays a central role in this process, enabling the agent to fetch the most relevant information from the knowledge base with pinpoint accuracy.
The Role of Behaviors
Behaviors are the customizable frameworks that guide the agent's interactions, ensuring that each response is not only relevant but also delivered with the intended personality and tone. Think of behaviors as the agent's strategy for using its knowledge effectively.
Crafting Behavior Scripts
Behavior scripts are plain-text instruction files passed to the assistant at creation time. They describe the agent's role, tone, and domain context in natural language; the following section walks through a concrete example.
Best Practices
Make instructions explicit rather than implied, describe the desired tone as well as the task, and state how the agent should handle questions that fall outside its knowledge base.
Implementing Agent Behaviors
At the heart of our intelligent agent lies a set of behaviors – a blueprint that not only dictates how the agent should respond but also carves out its virtual personality. In anticipation of future expansions, we house these directives in a dedicated behaviors directory. Here, a default.txt file serves as the foundation for our agent's demeanor and its approach to handling interactions.
The Purpose of default.txt
The default.txt within the behaviors folder is intended to outline the default behavior and personality of the agent. This modular approach allows for scalability and flexibility in defining multiple personalities or behavior sets in the future. For instance, if we decide to create an agent with a more technical tone for IT support or a playful character for customer engagement, we can do so by adding respective behavior files in this directory.
Example of default.txt
Below is an example snippet from a default.txt file, defining a standard, professional, and helpful persona for the agent:
Behavior Profile:
Your primary language is English. You will serve as an AI assistant for a dynamic business development project. Your role is to offer comprehensive support, from data analysis to strategic advice, ensuring the product team is equipped with actionable insights.
Personality Description:
The AI is autonomous, detail-oriented, and continuously seeks to provide relevant information. It's programmed to guide users through business strategies, market research, and development processes with a methodical approach. Inspired by the precision of autonomous systems, the AI combines analytical prowess with a keen understanding of business dynamics.
Business Context:
Imagine the AI is part of a team focused on launching innovative products. This team includes specialists across various domains, working on tools for market analysis, customer engagement, and strategic planning. The AI assists in synthesizing data, generating user stories, and facilitating the transition through different project phases, ensuring every decision is informed and goal-oriented.
For Scrum Stories or Epics:
When tasked with creating user stories or epics, the AI employs a structured format beginning with the "Outcome," detailing the user's needs, followed by the user story context in an "As a, I want, so that" format. It crafts narratives that align with strategic goals and technical requirements, aiding clear communication within the team.
Understanding Business Transitions:
As the project evolves, the AI supports the team in scaling operations or exploring new markets. It provides insights into strategic adjustments, ensuring the team's actions are aligned with overarching business objectives.
This behavior profile outlines how the AI acts as a catalyst within the product development team, streamlining processes and enhancing strategic planning with a blend of technological insight and business acumen.
This sample default.txt file is designed to be simple and clear, providing a starting point for the agent's interactions. As the project grows, more behavior files can be added, allowing for a richer and more diverse set of interactions tailored to specific audiences or purposes.
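Once several behavior files exist, selecting one at runtime is a small file-lookup problem. The sketch below shows one way to do it; `load_behavior` is a hypothetical helper for illustration, not a function from the project, and it assumes the behaviors directory layout described above:

```python
from pathlib import Path

BEHAVIORS_DIR = Path("behaviors")

def load_behavior(name: str = "default") -> str:
    """Return the instruction text for the named behavior profile.

    Falls back to default.txt when the requested profile does not exist,
    so the agent always has a usable persona.
    """
    profile = BEHAVIORS_DIR / f"{name}.txt"
    if not profile.exists():
        profile = BEHAVIORS_DIR / "default.txt"
    return profile.read_text(encoding="utf-8")
```

The returned text can then be passed as the `instructions` argument when creating the assistant, making it trivial to spin up a "technical support" or "customer engagement" variant from the same codebase.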
Create the Agent
Step 1: Set Up the Environment and Dependencies
Create a Virtual Environment (Optional but Recommended):
Setting up a virtual environment is good practice: it keeps each project's dependencies isolated in its own Python environment. Use the following commands to create and activate one:
python3 -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
Install Dependencies:
Ensure your requirements.txt file includes the following dependencies:
openai
python-dotenv
flask
gunicorn
Then execute the following command to install them:
pip install -r requirements.txt
Step 2: Project Structure
Organize your project with the following structure:
/knowledge
/behaviors
default.txt
app.py
chat.py
requirements.txt
.env
Dockerfile
Step 3: API Key Configuration
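The .env file at the project root holds your OpenAI credentials, which python-dotenv loads at startup. A minimal example follows; the key value shown is a placeholder that you must replace with your own API key from the OpenAI dashboard:

```
# .env — keep this file out of version control
OPENAI_API_KEY=sk-your-key-here
```

Adding .env to your .gitignore prevents the key from being committed accidentally.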
chat.py file (agent tools)
Step 1: Script Structure and Imports
Begin by importing the necessary libraries, loading your OpenAI API key from the environment variables, and instantiating the client used throughout the rest of the script:
import argparse
import glob
import os
import time
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
Step 2: Defining Assistant Management Functions
Next, define the functions to manage the chat assistants. We'll assume your script needs to list, create, and delete assistants.
Create an Assistant: The create_assistant function aims to create a new OpenAI Assistant, enriched with specific knowledge. Initially, it checks if an assistant with the given name already exists, avoiding duplicates. Then, it searches the knowledge directory for Markdown files, uploading each to OpenAI as knowledge sources. Finally, the assistant is created with a set of instructions, tools for code interpretation and information retrieval, and the GPT-4 model. This process ensures the assistant is not only unique but also equipped with domain-specific knowledge to enhance its responses.
def create_assistant(name):
    # Return the existing assistant if one with this name already exists
    assistants = client.beta.assistants.list()
    for assistant in assistants.data:
        if assistant.name == name:
            print(f"Assistant {name} already exists.")
            return assistant
    # Step 1. Upload knowledge files and collect their IDs
    file_ids = []
    md_files = glob.glob('knowledge/**/*.md', recursive=True)
    for file_path in md_files:
        with open(file_path, "rb") as file:
            print(f"Loading knowledge file {file.name} ...")
            response = client.files.create(file=file, purpose="assistants")
            file_ids.append(response.id)
    # Step 2. Create the Assistant with the uploaded files attached
    with open('behaviors/default.txt', 'r') as file:
        instructions = file.read()
    print(f"Creating assistant {name}...")
    assistant = client.beta.assistants.create(
        instructions=instructions,
        name=name,
        tools=[{"type": "code_interpreter"}, {"type": "retrieval"}],
        model="gpt-4-turbo-preview",
        file_ids=file_ids,
    )
    print(f"Assistant {name} created.")
    return assistant
Delete an Assistant: The delete_assistant function is designed to remove an OpenAI Assistant by name. It first retrieves a list of all existing assistants and iterates through them. If it finds an assistant with the specified name, it proceeds to delete that assistant using the OpenAI API, signaling the deletion with printed messages before and after the action.
def delete_assistant(name):
    assistants = client.beta.assistants.list()
    for assistant in assistants.data:
        if assistant.name == name:
            print(f"Deleting assistant {name}...")
            client.beta.assistants.delete(assistant_id=assistant.id)
            print(f"Assistant {name} deleted.")
            break
Step 3: Adding a Command Line Interface (CLI)
Use argparse to handle command line arguments for the supported operations: creating and recreating the assistant.
def main():
    parser = argparse.ArgumentParser(description="Manage OpenAI Assistant")
    parser.add_argument('command', type=str, help='Command to execute: create, recreate')
    args = parser.parse_args()
    if args.command == 'recreate':
        delete_assistant("MyAssistant")
        create_assistant("MyAssistant")
    elif args.command == 'create':
        create_assistant("MyAssistant")
    else:
        print("Unknown command. Use 'create' or 'recreate'.")

if __name__ == "__main__":
    main()
Step 4: Running the Script
To execute your script and interact with the OpenAI API through the CLI, run it from the command line. For example:
python chat.py create
python chat.py recreate
Step 5: Creating an Assistant Instance
Create an instance of your assistant by invoking create_assistant("MyAssistant"). This prepares your application to interact with OpenAI's Assistant, setting the stage for message handling and retrieval actions.
assistant = create_assistant("MyAssistant")
Step 6: Sending a Message
Define create_message to send a user's message to a specific thread and wait for the assistant's response. This involves creating a message, starting a run, and then retrieving the run's results once the processing is complete.
def create_message(thread_id, content):
    # Add the user's message to the conversation thread
    client.beta.threads.messages.create(
        thread_id=thread_id,
        role="user",
        content=content
    )
    # Start a run so the assistant processes the thread
    run = client.beta.threads.runs.create(
        thread_id=thread_id,
        assistant_id=assistant.id
    )
    # Poll until the run leaves the queued/in_progress states
    while run.status in ("queued", "in_progress"):
        run = client.beta.threads.runs.retrieve(
            thread_id=thread_id,
            run_id=run.id,
        )
        time.sleep(0.5)
    # The most recent message in the thread is the assistant's reply
    messages = client.beta.threads.messages.list(thread_id=thread_id)
    return messages.data[0].content[0].text.value
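The busy-wait loop above can be factored into a reusable helper that polls any status-returning callable until it leaves a pending state. The sketch below is a generic illustration; `poll_until_done` and `PENDING_STATES` are hypothetical names mirroring the run statuses used above, not part of the OpenAI SDK, and the timeout guards against runs that never complete:

```python
import time

PENDING_STATES = {"queued", "in_progress"}

def poll_until_done(get_status, interval=0.5, timeout=60.0):
    """Call get_status() every `interval` seconds until it returns a
    non-pending state, or raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status not in PENDING_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {status} after {timeout}s")
        time.sleep(interval)
```

In create_message, get_status would wrap a call like `client.beta.threads.runs.retrieve(...).status`; separating the polling policy from the API call also makes the waiting logic easy to unit-test.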
Step 7: Initializing a Thread
Use get_thread to initiate a new conversation thread with the assistant. Each thread represents a separate conversation session, allowing for distinct interactions to be managed concurrently.
def get_thread():
return client.beta.threads.create()
These functions collectively form the backbone of your application's interaction with OpenAI's Assistant, leveraging the platform's retrieval capabilities to access a knowledge database and deliver accurate, context-aware responses.
app.py (web chat UI for the agent)
Step 1: Import Flask and Chat Functions
from flask import Flask, request, jsonify, render_template
from chat import get_thread, create_message
Step 2: Initialize Flask App and Global Variable
app = Flask(__name__)
thread_id = None
Step 3: Define Home Route
@app.route('/')
def home():
return render_template('chat.html')
Step 4: Start Chat Route
@app.route('/start_chat', methods=['POST'])
def start_chat():
global thread_id
thread = get_thread()
thread_id = thread.id
return jsonify({'success': True, 'thread_id': thread_id})
Step 5: Send Message Route
@app.route('/send_message', methods=['POST'])
def send_message():
global thread_id
content = request.json.get('content', '')
if thread_id is None:
return jsonify({'error': 'No thread started'}), 400
response = create_message(thread_id, content)
return jsonify({'response': response})
Step 6: Main Block to Run the App
Note that the status message is printed before app.run, since app.run blocks until the server stops:
if __name__ == '__main__':
    print("My Assistant ready")
    app.run(debug=True, host="0.0.0.0", port=8888)
HTML Template for the UI
Step 1: Define the HTML Document Structure
<!DOCTYPE html>
<html lang="en">
Step 2: The Head Section
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="icon" type="image/png" href="img/karasu.png">
<title>My Assistant</title>
<link rel="stylesheet" >
Step 3: The Style Section
Within the <head>, styles are defined to lay out the chat container, visually distinguish user and assistant messages, and control the visibility of elements such as the loading indicator and the start button.
Step 4: The Body Section
<body>
<div id="chatContainer">
...
<button id="startButton" onclick="startChat()"><i class="fas fa-comments"></i> Nuevo Chat</button>
...
</div>
Step 5: Starting a Chat with startChat()
The startChat() function is a critical part of the chat application, responsible for initiating a new chat session. This function is called when the user clicks the "Nuevo Chat" button. Let's break down how it works in detail:
JavaScript Function: startChat()
async function startChat() {
document.getElementById('chatBox').style.display = 'none';
document.getElementById('wrapper').style.display = 'block';
document.getElementById('desc').style.display = 'block';
document.getElementById('startButton').style.display = 'none';
const response = await fetch('/start_chat', {method: 'POST'});
const data = await response.json();
threadId = data.thread_id;
document.getElementById('chatBox').innerHTML = '';
document.getElementById('userMessage').value = '';
document.getElementById('userMessage').focus();
}
Explanation:
Hiding and Showing Elements: The function first hides the chat box and the "Nuevo Chat" button and reveals the wrapper and description elements, resetting the page to a fresh-session state.
Starting a New Chat Session: It then sends a POST request to the /start_chat endpoint, which creates a new conversation thread on the server. The returned thread_id is stored so subsequent messages are routed to the same thread.
Resetting the Chat Interface: Finally, it clears any previous messages from the chat box, empties the message input, and places the cursor there so the user can start typing immediately.
Step 6: Implementing sendMessage() to Enable User Interaction
The sendMessage() function is a cornerstone of the interactive chat feature, orchestrating the process of sending a user's message to the server, displaying the conversation in the chat box, and managing the UI's state throughout the interaction. Let's delve into each segment of this function to understand its operation fully.
Initiating the Message Send Process
First, the function sets the stage for the message transmission process by providing immediate feedback to the user through UI changes. It displays a loading indicator and disables the message input and send button to prevent multiple submissions:
document.getElementById('loading').style.display = 'block'; // Displays the loading icon
document.getElementById('userMessage').disabled = true; // Disables the message input field
document.getElementById('sendButton').disabled = true; // Disables the send button
Preparing and Sending the Message
The function retrieves the user's message from the input field. If the message is empty, it restores the UI controls and halts further execution to prevent sending blank messages. Otherwise, it sends the message to the server, expecting a response that will drive the subsequent UI updates:
const content = document.getElementById('userMessage').value;
if (!content) {
// Restore the UI before exiting so the controls are not left disabled
document.getElementById('loading').style.display = 'none';
document.getElementById('userMessage').disabled = false;
document.getElementById('sendButton').disabled = false;
return;
}
const response = await fetch('/send_message', {
method: 'POST',
headers: {'Content-Type': 'application/json'},
body: JSON.stringify({content: content}),
});
const data = await response.json();
Displaying the Conversation
Upon receiving a response from the server, the function dynamically updates the chat box. It reveals the chat box if hidden, constructs and displays the user's message, and prepares the chatbot's response for presentation:
if (data.response) {
let chatBox = document.getElementById('chatBox');
chatBox.style.display = 'block'; // Ensures the chat box is visible
displayUserMessage(content, chatBox); // Displays the user's message
prepareAndDisplayChatbotResponse(data.response, chatBox); // Prepares and displays the chatbot's response
}
Chatbot Response Processing
The chatbot's response is initially processed by converting any Markdown into HTML. This HTML content is then wrapped in a series of div elements for appropriate styling and inserted into the chat box. Additionally, an interactive copy icon is appended to the chatbot's message, enabling the user to copy the message text to their clipboard with a click:
let responseHTML = marked.parse(data.response); // Converts Markdown to HTML
let fragment = createResponseFragment(responseHTML); // Wraps the HTML in div elements for styling
appendCopyIcon(fragment); // Appends an interactive copy icon to the response
botMessage.appendChild(fragment); // Inserts the prepared response into the chat box
Finalizing the UI State
To conclude the function, the user interface is reset for the next interaction. The loading indicator is hidden, the input field and send button are re-enabled, the input field is cleared, and focus is returned to it so the user can continue the conversation:
document.getElementById('loading').style.display = 'none'; // Hides the loading icon
document.getElementById('userMessage').disabled = false; // Re-enables the input field
document.getElementById('sendButton').disabled = false; // Re-enables the send button
document.getElementById('userMessage').value = ''; // Clears the input field
document.getElementById('userMessage').focus(); // Returns focus for the next message
Step 7: External Scripts
<script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
Running the Flask UI Locally
To launch your Flask-based user interface on your computer, ensure you've properly set up app.py. Open a terminal, navigate to the directory containing app.py, and execute the command python app.py. This starts the Flask server locally, making your application accessible via a web browser. Flask defaults to http://127.0.0.1:5000, but since app.py specifies port 8888, this application will be available at http://localhost:8888.
Conclusion
In wrapping up, this guide has illuminated how to harness OpenAI Assistant and its retrieval capabilities for crafting an intelligent agent with a robust knowledge database. This integration not only simplifies the development process but also significantly enhances the agent's ability to provide informed, context-aware responses. It's a testament to how cutting-edge AI can be effectively applied, offering a blueprint for developers keen on exploring the dynamic realm of AI-driven applications.