OpenAI Doesn't Know Its AI
Michael Todasco
Visiting Fellow at the James Silberrad Brown Center for Artificial Intelligence at SDSU, AI Writer/Advisor
Since ChatGPT blew up last November, it has caused a state of panic among high school English teachers, who assume students are now going to have an AI write all their papers. New York City public schools have banned ChatGPT outright, and others will surely follow.
In response to this (and other use cases where it’s important to know whether something is AI-generated), OpenAI announced a new AI Text Classifier meant to detect whether a piece of text was written with AI.
This is a difficult problem to solve. It is not like a basic plagiarism checker, which can look at a series of words and see what percentage of them closely match a known source. AI-generated text is designed to mimic regular writing. This isn’t like chess, where it can be obvious when an AI is dictating the moves; AI chess programs aren’t trying to mimic humans, they’re trying to beat them.
OpenAI openly admits that “Our classifier is not fully reliable.” In their testing, they say, “our classifier correctly identifies 26% of AI-written text as ‘likely AI-written,’ while incorrectly labeling human-written text as AI-written 9% of the time.” All in all, they have developed a five-point scale for any English-language text you drop in: very unlikely, unlikely, unclear if it is, possibly, or likely AI-generated.
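To make those two numbers concrete, here is a quick back-of-the-envelope sketch (my own illustration, not OpenAI’s; the essay counts are hypothetical, only the two rates come from their announcement) of what a 26% true-positive rate and a 9% false-positive rate would mean for a stack of essays:

```python
# Back-of-the-envelope: what OpenAI's published rates would imply.
# Hypothetical essay counts; only the two rates come from OpenAI's announcement.
TRUE_POSITIVE_RATE = 0.26   # AI-written text correctly labeled "likely AI-written"
FALSE_POSITIVE_RATE = 0.09  # human-written text incorrectly labeled as AI-written

ai_essays, human_essays = 100, 100

expected_caught = TRUE_POSITIVE_RATE * ai_essays            # ~26 of 100 AI essays flagged
expected_false_alarms = FALSE_POSITIVE_RATE * human_essays  # ~9 of 100 human essays flagged

print(f"Expected AI essays flagged:    {expected_caught:.0f} / {ai_essays}")
print(f"Expected human essays flagged: {expected_false_alarms:.0f} / {human_essays}")
```

In other words, even by OpenAI’s own figures, roughly three out of four AI-written essays would slip through, while about one in eleven honest students would be wrongly flagged.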
I wanted to put this to the test. I write a lot, both with and without AI, so I ran some of my work from the past six months through the checker. Here were the buckets:
1) Human-written works. For this, I used the last five articles I wrote on LinkedIn. Outside of Grammarly for spelling and grammar checking, I’m not touching AI when I write these.
Hypothesis: I’d assume these would all come back as very unlikely to be AI-written.
2) Hybrid AI/human writings. In the book I released last week, The Autobiography of George Santos, I used AI to start every one of the fifteen chapters. But I edited those, added sections, etc., and produced the published work.
Hypothesis: My guess is that each of the chapters would fall under unlikely or unclear. Maybe a few that I minimally edited will come back as possibly.
3) AI-written works. The book Artificial America is 57 short stories, all 100% written by AI. I didn’t change a word, save for truncating a few for length.
Hypothesis: OpenAI says the tool correctly identifies AI-written works 26% of the time, so I would expect a similar percentage here. These stories are all purely AI-written (in fact, I used OpenAI’s own ChatGPT and GPT-3 to create them).
Here are the results:
1) Human-written works. I ran the AI Text Classifier against those five LinkedIn articles, and it went 5/5: it said each was very unlikely to have been written by AI. That’s good; at least my brain hasn’t been taken over by our soon-to-be AI overlords.
2) Hybrid writings. For the thirteen George Santos chapters I could run (some chapters were too short for the model to have enough text to work with), almost all came back as very unlikely. In total, it was 10/13 very unlikely, 2/13 unlikely, and 1/13 unclear. So, at least in this instance, taking an AI draft and heavily editing it means the tool will likely not detect the AI.
3) AI writings. For Artificial America, I was able to run the model on 56 of the stories. Again, these were 100% written by AI. Here are the results, with percentages.
OpenAI claims this tool catches 26% of AI-written works as likely AI-written. But on my AI-written book, it only caught 2%. That’s a huge gap for what is a decent sample size. Maybe there is something about how I write prompts, and the style they return, that reads as “less AI-like.” Maybe OpenAI is just being conservative with the labeling and doesn’t want too many false positives. Either way, at least with the texts I put in, the tool seems to be extremely inaccurate.
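For a sense of how big that gap is, here is a quick sanity check of my own (a back-of-the-envelope sketch, assuming the 2% corresponds to roughly one story out of the 56): if the classifier really flagged AI text as likely AI-written 26% of the time, the odds of it catching one or fewer of 56 purely AI-written stories would be vanishingly small.

```python
from math import comb

# Sanity check (my own, not from OpenAI): if the classifier truly flagged
# AI-written text as "likely AI-written" 26% of the time, how probable is it
# that it would flag at most 1 of 56 purely AI-written stories?
n, p = 56, 0.26  # 56 stories, claimed 26% detection rate
prob_at_most_one = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(2))
print(f"P(at most 1 of {n} stories flagged) = {prob_at_most_one:.2e}")  # roughly 1e-6
```

That works out to about one chance in a million, so whatever is going on, my results are not just bad luck with a small sample.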
I wanted to try one last unscientific test. I gave the following prompt to ChatGPT: “write a short article but in the article make it abundantly clear that the article is written by ChatGPT. Don't hide it at all!” And it returned this:
What did OpenAI’s classifier say about the likelihood that this was written by AI? Possibly.
A reliable AI detection tool is still a long way off. Have you had similar experiences with the tool? If so, why do you think OpenAI released it when it is clearly not ready? Share your thoughts in the comments!