OpenAI Doesn't Know Its AI
Created on Midjourney by Michael Todasco on January 31, 2023 from the prompt "a robot holds up a piece of paper with a red letter F on it --ar 3:2 --v 4 --q 2"

Since ChatGPT blew up last November, it has put every high school English teacher into a state of panic, assuming students will now have an AI write all their papers. New York City's public schools have banned ChatGPT outright, and others will surely follow.

In response to this (and other use cases where it's important to know whether text is AI-generated), OpenAI announced a new AI Text Classifier that aims to detect whether something was written with AI.

This is a difficult problem to solve. It is not like a basic plagiarism checker, which looks at a series of words and measures how closely they match a known source. AI-generated text is designed to mimic regular writing. This isn't like chess, where it can be clear that an AI is dictating the moves; chess engines aren't trying to mimic humans, they're trying to beat them.

OpenAI openly admits that “Our classifier is not fully reliable.” In their testing, they say, “our classifier correctly identifies 26% of AI-written text as ‘likely AI-written,’ while incorrectly labeling human-written text as AI-written 9% of the time.” All in all, they have developed a five-point scale for any English-language text you drop in:

  1. Very Unlikely (meaning least likely to be AI-written)
  2. Unlikely
  3. Unclear
  4. Possibly
  5. Likely (meaning it is likely to be AI-written)
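To put those two published numbers in perspective, here is a back-of-the-envelope sketch. The 50/50 corpus is my assumption, not OpenAI's actual test setup; only the 26% and 9% rates come from their announcement.

```python
# Back-of-the-envelope math on OpenAI's published rates, applied to a
# hypothetical, perfectly balanced corpus (the 100/100 split is an
# assumption for illustration, not OpenAI's methodology).
ai_docs, human_docs = 100, 100
true_positive_rate = 0.26   # AI text correctly flagged "likely AI-written"
false_positive_rate = 0.09  # human text wrongly flagged as AI-written

flagged_ai = ai_docs * true_positive_rate         # 26 AI docs flagged
flagged_human = human_docs * false_positive_rate  # 9 human docs flagged

# Of everything flagged, the share that is actually AI-written:
precision = flagged_ai / (flagged_ai + flagged_human)
print(f"{flagged_ai + flagged_human:.0f} docs flagged, {precision:.0%} actually AI")
```

In other words, even taking OpenAI's numbers at face value, roughly one in four "likely AI-written" flags on such a balanced sample would land on a human writer.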

I wanted to put this to the test. I write a lot, both with and without AI, so I ran some of my work from the past six months through the checker. Here are the buckets.

1) Human-written works. For this, I used the last five articles I wrote on LinkedIn. Outside of Grammarly for spelling and grammar checking, I'm not touching AI when I write these.

Hypothesis: I'd assume these would all come back as very unlikely to be AI-written.

2) Hybrid AI/human writings. In the book I released last week, The Autobiography of George Santos, I used AI to start each of the fifteen chapters. But I edited those openings, added sections, etc., to produce the published work.

Hypothesis: My guess is that each chapter will fall under unlikely or unclear, with maybe a few that I minimally edited coming back as possibly.

3) AI-written works. The book Artificial America is 57 short stories, all 100% written by AI. I didn't change a word, save for truncating a few for length.

Hypothesis: OpenAI says the tool correctly identifies AI-written works 26% of the time, so I would expect a similar percentage here. These stories are all purely AI-written (in fact, I used OpenAI's own ChatGPT and GPT-3 to create them).

Here's the result:

1) Human-written works. I ran the AI Text Classifier against these five articles:

  • The Battle of the AIs: Artificial Intelligence vs. Academic Integrity
  • What AI Likes About North Korea, and Dislikes About Where You’re From
  • Weird AI
  • The Perfect Holiday Gift
  • What AI Thinks CEOs Look Like

… and it was 5/5. It said each was very unlikely written by AI. That's good; at least my brain hasn't been taken over by our soon-to-be AI overlords.

2) Hybrid writings. Of the fifteen George Santos chapters, I could run thirteen (the others were too short for the model to evaluate). Almost all came back as very unlikely: 10/13 Very Unlikely, 2/13 Unlikely, and 1/13 Unclear. So if you take an AI draft and heavily edit it, at least in this instance, the tool is unlikely to detect the AI.

3) AI writings. For Artificial America, I was able to run the model on 56 of the stories. Again, these were 100% written by AI. Here are the results, with percentages.

  • Very Unlikely (12/56) 21%
  • Unlikely (26/56) 46%
  • Unclear (13/56) 23%
  • Possibly (4/56) 7%
  • Likely (1/56) 2%
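The tallies above can be reproduced in a few lines of Python; this is just arithmetic on the counts reported here, not a call to the classifier itself.

```python
# Label counts for the 56 Artificial America stories (all 100% AI-written)
# that the classifier could evaluate.
results = {
    "Very Unlikely": 12,
    "Unlikely": 26,
    "Unclear": 13,
    "Possibly": 4,
    "Likely": 1,
}

total = sum(results.values())  # 56 stories
for label, count in results.items():
    print(f"{label}: {count}/{total} = {count / total:.0%}")

# Only "Likely" counts as the tool catching the AI, per OpenAI's 26% claim:
caught = results["Likely"] / total
print(f"Caught as 'likely AI-written': {caught:.0%}")  # 2%, not 26%
```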

OpenAI claims this tool catches 26% of AI-written works as "likely AI-written." But on my AI-written book, it only caught 2%. That's a huge gap for what is a decent sample size. Maybe something about how I write prompts, and the style that comes back, is "less AI-like." Maybe OpenAI is just being conservative with the labeling and doesn't want too many false positives. Either way, at least with the texts I put in, the tool seems to be extremely inaccurate.

I wanted to try one last unscientific test. I gave the following prompt to ChatGPT: “write a short article but in the article make it abundantly clear that the article is written by ChatGPT. Don't hide it at all!” And it returned this:

[Image: ChatGPT's response]

What did OpenAI say about the likelihood that this was written by AI? Possibly.

[Image: the AI Text Classifier's verdict]

A reliable AI detection tool still has a long way to go. Have you had similar experiences with the tool? If so, why do you think OpenAI released it when it is clearly not ready? Share your thoughts in the comments!

Manoj Jhaveri

Founder | Executive | Engineer | Advisor | Mentor

2y

Interesting article. I think OpenAI released it sooo early to show that they are good and responsible stewards of this powerful tech. On another note, I went to a painting workshop today at the museum and felt like a "sucker" using actual paint and canvas to create "art" and not have it generated by DALL-E or Midjourney, etc. (I'm joking… sorta)!

John Kanagaraj

Curious Human | Writer | Husband, Father, GrandFather | Master of Data | ex-PayPal/eBay/Cisco

2y

My suspicion is that this accuracy will continue to improve over time as the data/results grow and AI “learns” (sic)!
