Is there a foolproof way to identify AI-generated text?


In a previous article - Don't want to be caught using ChatGPT? Then stop delving - I discussed how people can sniff out AI-generated text. There are, however, many tools that sell the idea that they can do it for you.

I have tried a few of them. Read on for my experiences.


Before we start!

If you like this article, and you want to support me:

  1. Comment on the article, or reshare it; LinkedIn appreciates it, and it helps spread the word
  2. Subscribe to TechTonic Shifts to get your daily dose of tech


If you spend enough time reading ChatGPT-generated content (and at this point, who doesn’t), you can probably spot some telltale signs. Like the way it repeats similar ideas in slightly different words. And the way it keeps drifting from specific details back to a general overview. And the lists. ChatGPT loves bulleted lists more than a middle manager writing his first PowerPoint.

Most overused ChatGPT words

  • Delve
  • Tapestry
  • Vibrant
  • Landscape
  • Realm
  • Embark
  • Excels
  • Vital
  • Comprehensive
  • Intricate
  • Pivotal
  • Moreover
  • Arguably
  • Notably


Most overused ChatGPT phrases

  • Dive into…
  • It’s important to note…
  • Important to consider…
  • Based on the information provided…
  • Remember that…
  • Navigating the [landscape]/[complexities of]
  • Delving into the intricacies of...
  • A testament to…


But despite the way ChatGPT content feels (vague, very structured, and as bland as white bread), is there a foolproof way to identify it? There are certainly plenty of people hoping that the answer is yes. Educators want to know when their students are hiding their ignorance of Beowulf behind a carefully constructed prompt and a 1.7 trillion parameter LLM. Media companies keep getting embarrassed when writers violate their self-righteous AI policies. And so on.

Right now, there are dozens of tools that promise to spot the difference. They’ll give you a very formal-sounding analysis that scores the portion of AI used in a text, or the likelihood of AI assistance, all down to a very specific percentage (for example, “34% of this text was written by AI”). Amusingly, some AI detectors even offer to humanize text so it will pass an AI detection test — which is just one AI robot trying to fool another.
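As an aside, many of these tools lean on statistical signals such as perplexity: the hunch is that LLM output looks unusually predictable to another language model. Here is a rough sketch of that idea using the open GPT-2 model from Hugging Face's transformers library. The cutoff of 40 is a number I invented purely for illustration; real products combine more signals than this, and as you'll see, even those don't hold up.

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    # A small reference model; detector products use fancier versions of this signal.
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def perplexity(text: str) -> float:
        """Perplexity of the text under GPT-2: lower means more predictable."""
        enc = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            loss = model(enc.input_ids, labels=enc.input_ids).loss
        return torch.exp(loss).item()

    ppl = perplexity("It is important to note that the vibrant landscape is a tapestry.")
    # The threshold below is made up for illustration only.
    print(f"perplexity = {ppl:.1f} ->", "suspiciously smooth" if ppl < 40 else "plausibly human")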

But does it really work?

Perhaps you’re skeptical. But there’s a huge amount of money, time, and talent hoping to prove that AI-generated text has distinct fingerprints that will always betray its origin. Let’s see how it all shakes out.

Early tests and insta-fails

When I first decided to experiment with AI detectors, I thought of plenty of techniques I might use to deceive them. I could add human touch-ups, like inserting punctuation mistakes, adding unexpected variance (using the Oxford comma in one sentence but not in the next), or deliberately pasting in bits of human text. But first, I went for the most direct approach. I tested some carefully chosen but unaltered examples of my own writing, using the online AI detectors that are just a Google search away. And I went in cold, with texts that I thought might resemble ChatGPT's style.

Here’s my second result:

Needless to say, my use of ChatGPT was ahead of its time. I wrote this content for a project in 2022, before the robot writers arrived.

I expected to find an arms race, with AI detectors getting better at recognizing generated content, and AI tools getting better at obfuscating the telltale signs of their influence. Reality was closer to an Emperor-has-no-clothes moment.

Having secured some false positives, I decided to try what I expected to be a harder challenge — getting a false negative, where an AI detector labels AI-generated text as human-penned. I approached this challenge by politely asking an LLM (Gemini, in this case) to write slightly different text than it usually does. Here’s my first attempt:


I think it’s safe to say that most humans would still recognize this word salad as AI-generated. It reads like the machine-written dreck in some self-published Amazon ebooks.

But AI detectors were not so sure. Some detectors thought it was AI, while others stumbled hard:

After these failures, I didn’t bother to try intermingled texts, deliberate errors, programmatic humanizers, and the other trickery I had planned.
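For what it's worth, that trickery wouldn't have required anything sophisticated. A "humanizer" can be as dumb as a few lines of Python that inject inconsistency, like dropping the Oxford comma at random so the punctuation habits vary mid-text. This toy version is entirely my own sketch, not any real product:

    import random

    def roughen(text: str, seed: int = 42) -> str:
        """Inject small 'human' inconsistencies: here, randomly dropping
        the Oxford comma so usage varies from sentence to sentence."""
        random.seed(seed)
        sentences = text.split(". ")
        for i, s in enumerate(sentences):
            if ", and" in s and random.random() < 0.5:
                sentences[i] = s.replace(", and", " and", 1)
        return ". ".join(sentences)

    print(roughen("I bought bread, milk, and eggs. She sang, danced, and left."))
    # -> I bought bread, milk, and eggs. She sang, danced and left.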

Can this tool be saved?

The biggest surprise was that the AI detectors were all over the map. They disagreed as often as they agreed, which I didn’t expect. This discrepancy can give you the illusion that one is smarter than the next, but that’s not what my testing found. Instead, they all performed equally poorly, as long as I gave them enough chances to fail.

That doesn’t automatically mean AI detectors are useless. For example, AI detectors can still provide value if they can give us an overall indication of AI influence in a body of writing. For example, maybe we want to assess the contribution of AI in all the books published in 2024, or in all the scientific papers written about a specific topic. In this context, the individual judgements don’t need to be perfectly reliable if the averaged-out scores are useful in aggregate. Shockingly, the AI detectors might not be good enough for this use either.

The chief problem is that the AI detector’s confidence score doesn’t seem to mean anything. In my tests, a 90% certainty was not significantly less likely to produce false positives than a 60% certainty. And that’s a big issue.

After all, it doesn’t really matter what the overall success rate is for AI detection. You could create a reasonably accurate testing tool for academic environments that just says “contains AI” for every example, and it would almost certainly be right more than half the time. What really matters is being able to confidently identify extreme cases — papers entirely written with ChatGPT, for example. But if a testing tool is over 90% confident that a given sample of text isn’t human-made when it is, then that tool is never trustworthy for practical use. And any human who relies on it to make value judgments about their colleagues, students, or employees, is doing them very dirty indeed.

Reality check: Should we worry about AI content?

Is undetectable AI content the societal problem we fear it is? There’s a good argument that if a writer can incorporate a ChatGPT-written paragraph in their work with sensitivity and care (and after verifying it wasn't all made up, of course), then that’s a useful skill. And that’s especially true for people who don’t have bigger writerly ambitions.

AI can certainly lure us into bad habits. For example, there's that AI-induced laziness that leads us, when faced with the slightest bit of frustration in forming a logical argument, to give up the fight and ask ChatGPT to spit out something semi-relevant. After all, having an AI writer that can put something acceptable in the void where our thinking was supposed to take place is a relief. It's comfortable! But if you want to develop rhetorical skills, like the ability to debate bad ideas, make yourself heard, and evaluate complex compromises, ChatGPT ain't the way to get there.

But the idea that an AI detector can put these demons back in Pandora's box and return us to the days of unambiguous human authorship is a fool's dream. AI detectors will get better, but they'll still be easily defeated. Even humans can't consistently recognize human writing, as evinced by the recent case of a scientific researcher who had her painstakingly written words dismissed as the obvious work of ChatGPT.

Ultimately, there’s no tech that can solve this problem, just human common sense and labor. If you don’t want to read AI, search for writers with strong, idiosyncratic voices. (We can make LLMs that duplicate different styles, but the off-the-shelf models don’t play that). If you’re a teacher hoping to catch out students, remember that they can run their ChatGPT-written work through an AI detector just as easily as you can, and then tweak it by hand (or with another free tool) until it flies safely under the radar. And if you’re in charge of educational or corporate policy, for heaven’s sake — save your money. You’re not going to stop the use of LLMs with any more success than the world’s massively wealthy media companies stopped music piracy.

And anyone writing anything should heed the most important lesson — have something to say and make it worth our while. Because in 2024, a big wall of logically consistent, impeccably punctuated text just isn’t that special.


Well, that's a wrap for today. Tomorrow, I'll have a fresh episode of TechTonic Shifts for you. If you enjoy my writing and want to support my work, feel free to buy me a coffee.

Think a friend would enjoy this too? Share the newsletter and let them join the conversation. LinkedIn rewards your likes by showing my articles to more readers.

Signing off - Marco


