Creating an AI data journalist with the new OpenAI Assistants API

Simon Smith

EVP Generative AI at Klick

Published: November 7, 2023

Yesterday I wrote about OpenAI's Dev Day announcements. One of the things I really wanted to try was creating a custom GPT. Unfortunately, I don't have access to that yet. (OpenAI, if you're reading, please grant it!) I do, however, have access to the new Assistants API, which allows something similar. I created my first application with it, and wanted to share the process. But first...

What is the Assistants API and why should you care?

Imagine a virtual assistant that you could task with an assignment, leave alone, and then come back to later for the output. This is different from ChatGPT, where you're engaged in a back-and-forth dialogue. Rather, it's true delegation.

Some people have tried to build tools for this, like AutoGPT, but in my experience they go off the rails fast and are unreliable.

The Assistants API is a step toward addressing this. In a nutshell, you:

Create an assistant. You can do this either programmatically, in code, or via the OpenAI Playground web interface. Importantly, the assistant is persistent. OpenAI stores it for reuse, which makes it almost like having a fine-tuned model available for specific tasks.
Give the assistant an identity, data, and tools. At the very least, you need to give it an identity: the system message. This tells it how you want it to act and what you want it to do. You can also upload data, such as knowledge you want it to have available for answering questions. Finally, you give it tools, including built-in tools like Code Interpreter, and definitions for custom tools you'll write and execute yourself. (Interesting note: OpenAI's documentation says that assistants can use up to 128 tools, which is incredible.)
Start a thread and execute a "run." Unlike the conversational chatbots we've gotten familiar with, the primary purpose of assistants seems to be executing tasks in the background. Read on to see how this works.
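
The three steps above can be sketched in code. This is a minimal sketch, not an official recipe: it assumes the `openai` Python package (v1.x) with the beta Assistants endpoints as they existed when this article was written. The function name `start_assistant_task`, the default model name, and the one-line instructions are illustrative assumptions. The client is passed in as a parameter so the flow stays easy to read and test.

```python
def start_assistant_task(client, task, model="gpt-4-1106-preview"):
    """Create an assistant, start a thread, and kick off a run.

    `client` is assumed to be an `openai.OpenAI()` instance (openai>=1.0).
    The model name is an assumption; use whichever model your account offers.
    """
    # Steps 1 and 2: a persistent assistant with an identity and a built-in tool
    assistant = client.beta.assistants.create(
        name="Data Journalist",
        instructions="You are an experienced data journalist.",
        model=model,
        tools=[{"type": "code_interpreter"}],
    )

    # Step 3a: a thread holds the messages for one task
    thread = client.beta.threads.create(
        messages=[{"role": "user", "content": task}]
    )

    # Step 3b: a run asks the assistant to work through the thread
    run = client.beta.threads.runs.create(
        thread_id=thread.id, assistant_id=assistant.id
    )
    return assistant.id, thread.id, run.id
```

The function returns as soon as the run is created. The work itself happens in the background, so the caller has to check the run's status later to find out when it has finished.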

Creating an AI data journalist

Okay, now, on to the steps! Here's what we're going to be creating:

Data journalist created with the OpenAI Assistants API and Streamlit

Step 1: Create an assistant

I'm going to assume that you already have an OpenAI developer account with an API key. If not, create one on the OpenAI platform first.

Once you're logged in, click "Playground" on the left, then select "Assistants" from the dropdown at the top.

Next, create a new assistant as follows:

Name: Data Journalist
System message: You are an experienced data journalist. You receive a CSV of data from a user. You write code to find interesting patterns in the data. You choose the most interesting of these patterns and write a 250-word article about them. You write the headline for the article and then the article itself. You do not ask for feedback from the user at any point. You independently look for trends, independently write the article, and then provide the article to the user to review.
Code interpreter: Active

Then click "Save."

That's literally it!

The AI data journalist will analyze an uploaded CSV, look for interesting information (like trends), and then write an article about the most interesting things that it finds.

At this point, you can even test it. Just upload a file in the preview box and ask for a data-driven story.

Your assistant should look like this:

OpenAI Playground with a Data Journalist assistant

Step 2: Create Streamlit app

Next, you'll create the app to run the assistant. The good news is that I've copied the code to a public Gist. You can simply click that link, copy the code into a Python file, input your assistant ID, and run it with streamlit run <filename>.py.

Or, you can copy the code from here, which also includes comments to explain how it works:

import os
import time

import openai
import streamlit as st

# Create an OpenAI client with your API key
openai_client = openai.Client(api_key=os.environ.get("OPENAI_API_KEY"))

# Retrieve the assistant you want to use
assistant = openai_client.beta.assistants.retrieve(
    "<assistant_id>"
)

# Create the title and subheader for the Streamlit page
st.title("Data Journalist")
st.subheader("Upload a CSV and get the story within:")

# Create a file input for the user to upload a CSV
uploaded_file = st.file_uploader(
    "Upload a CSV", type="csv", label_visibility="collapsed"
)

# If the user has uploaded a file, start the assistant process...
if uploaded_file is not None:
    # Create a status indicator to show the user the assistant is working
    with st.status("Starting work...", expanded=False) as status_box:
        # Upload the file to OpenAI
        file = openai_client.files.create(
            file=uploaded_file, purpose="assistants"
        )

        # Create a new thread with a message that has the uploaded file's ID
        thread = openai_client.beta.threads.create(
            messages=[
                {
                    "role": "user",
                    "content": "Write an article about this data.",
                    "file_ids": [file.id],
                }
            ]
        )

        # Create a run with the new thread
        run = openai_client.beta.threads.runs.create(
            thread_id=thread.id,
            assistant_id=assistant.id,
        )

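        # Poll while the run is still active, updating the status each time.
        # Checking for active states (rather than != "completed") avoids
        # looping forever if the run fails, expires, or is cancelled.
        while run.status in ("queued", "in_progress", "cancelling"):
            time.sleep(5)
            status_box.update(label=f"{run.status}...", state="running")
            run = openai_client.beta.threads.runs.retrieve(
                thread_id=thread.id, run_id=run.id
            )

        if run.status == "completed":
            # Update the status box and show the newest message's text
            status_box.update(label="Complete", state="complete", expanded=True)
            messages = openai_client.beta.threads.messages.list(
                thread_id=thread.id
            )
            st.markdown(messages.data[0].content[0].text.value)
        else:
            # Report any other final status instead of waiting indefinitely
            status_box.update(
                label=f"Run ended: {run.status}", state="error", expanded=True
            )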

        # Delete the uploaded file from OpenAI
        openai_client.files.delete(file.id)
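
One note on reading the result: `messages.data[0].content[0].text.value` assumes that the first content part of the newest message is text. Code Interpreter can also return image parts, so a slightly more defensive approach is to gather every text part instead. The helper below is a sketch: `extract_text` is a hypothetical name, and it assumes each content part exposes a `type` attribute and, for text parts, `text.value`, as in the code above.

```python
def extract_text(message):
    """Join all text parts of one thread message, skipping non-text parts."""
    parts = []
    for part in message.content:
        # Text parts carry their string in part.text.value; image parts do not
        if getattr(part, "type", None) == "text":
            parts.append(part.text.value)
    return "\n\n".join(parts)
```

In the app, this would replace the direct lookup with `st.markdown(extract_text(messages.data[0]))`.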

Bottom line: Creating AI agents that run background tasks just got much easier

The above code creates a web interface that shows a status as the assistant executes its tasks. But the web interface isn't necessary. You could have a different tool that allows you to schedule runs and receive results by email when they're ready.
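
As a rough illustration of running without a web interface, the same flow can be wrapped in a plain function that a script or scheduler could call. This is a sketch under stated assumptions: `write_article` and its parameters are hypothetical names, the client is assumed to be an `openai.OpenAI()` instance, and the API calls mirror the ones in the Streamlit code above. Delivering the result (for example, by email) is left out.

```python
import time

# Statuses in which a run is still being worked on
ACTIVE_STATUSES = {"queued", "in_progress", "cancelling"}


def write_article(client, assistant_id, csv_file, poll_seconds=5, timeout=600):
    """Upload a CSV, run the assistant on it, and return the article text.

    Raises RuntimeError if the run ends in any state other than "completed",
    and TimeoutError if it is still active after `timeout` seconds.
    """
    uploaded = client.files.create(file=csv_file, purpose="assistants")
    try:
        thread = client.beta.threads.create(
            messages=[{
                "role": "user",
                "content": "Write an article about this data.",
                "file_ids": [uploaded.id],
            }]
        )
        run = client.beta.threads.runs.create(
            thread_id=thread.id, assistant_id=assistant_id
        )

        # Poll until the run leaves its active states or the deadline passes
        deadline = time.monotonic() + timeout
        while run.status in ACTIVE_STATUSES:
            if time.monotonic() > deadline:
                raise TimeoutError(f"Run {run.id} still {run.status}")
            time.sleep(poll_seconds)
            run = client.beta.threads.runs.retrieve(
                thread_id=thread.id, run_id=run.id
            )

        if run.status != "completed":
            raise RuntimeError(f"Run ended with status {run.status}")

        # The newest message comes first in the list
        messages = client.beta.threads.messages.list(thread_id=thread.id)
        return messages.data[0].content[0].text.value
    finally:
        # Clean up the uploaded file whether or not the run succeeded
        client.files.delete(uploaded.id)
```

A script could open a CSV in binary mode and call `write_article(openai.OpenAI(), "<assistant_id>", f)`, then hand the returned text to whatever delivers it.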

What's most exciting to me is that we can now run background tasks powered by AI, with access to powerful tools. This opens up a lot more use cases beyond those for which a chat interface is ideal.


