ChatGPT, try to avoid ChatGPT Zero
Image: own creation, Stable Diffusion algorithm


What if we try to avoid ChatGPT Zero detection by ... asking ChatGPT (in)directly? Let's see what we get - and don't overlook my disclaimer: it's just for fun.

A quick intro, asked of the same AI

Since its debut, ChatGPT, a language model created by OpenAI that uses machine learning to produce responses that resemble those of humans, has experienced tremendous growth in popularity. This is because it can interpret and answer questions in natural language, making it a useful tool for a variety of applications.

Yet, because of how widely used it is, there are now worries that it could be abused to spread false information or carry out fraudulent activities.

Tools to identify text produced by ChatGPT have been created, such as ChatGPT Zero, in order to allay these worries. ChatGPT Zero - probably the most famous one - is made to distinguish between text produced by ChatGPT and text produced by humans, enabling users to recognize and block out automated responses. These technologies are becoming increasingly crucial in assuring the responsible usage of language models like ChatGPT in numerous industries.

The cat and mouse detection game

Needless to say, this creates an interesting playing field, where one side may try to avoid detection by improving the generation algorithm and the opponent then has to improve the detection algorithm.

Of course this can be done by professionals in the field, but I would say that - given ChatGPT's ever-improving understanding and its ability to generalize - it would be even more interesting to ask ChatGPT directly to do it!

This is what I tried to do for fun in this article. Now, I've been using version 3.5 (maybe V4 is for another article?), but the results are already interesting.

Big disclaimer: the objective of this analysis is not to show how to evade these useful detection tools (in any case the tests are limited and the tools are evolving, so…) but to raise awareness of the topic, give food for thought, kick off some brainstorming and finally also have fun :)

How detection works

Although AI-generated text detectors may seem complex, they are essentially simple to understand. These sophisticated technologies discern between text created by machines and text created by humans using a combination of (guess what) machine learning and natural language processing techniques.

To recognize machine-generated text, the detectors evaluate patterns and properties typical of AI-generated text, such as recurring phrases or uncommon sentence structures. Specifically, ChatGPT Zero seems to use (maybe among other elements) perplexity and burstiness.

What are those? Let's ask ChatGPT directly:

Perplexity is a metric used in natural language processing to measure how well a language model predicts a sequence of words. It calculates the uncertainty or "surprise" of the model when presented with new or unseen text. A lower perplexity score indicates that the model is better at predicting the next word in the sequence.

Burstiness, on the other hand, refers to the uneven distribution of words or phrases in a corpus of text. It describes how frequently certain words or phrases occur in a given context. For example, some words like "the" or "and" are very common, while others are rare. Burstiness can impact the performance of language models and other natural language processing tools, as it can affect the accuracy of their predictions and the quality of their output. [ChatGPT v3.5]
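
To make the two definitions more concrete, here is a minimal Python sketch. It assumes we already have the probability a language model assigned to each token, and the numbers below are made up for illustration. Perplexity follows the standard formula (the exponential of the average negative log-probability). The burstiness function is only one possible reading, namely how much perplexity varies from sentence to sentence; this is an assumption and not a confirmed detail of how any detector works.

```python
import math
import statistics


def perplexity(token_probs):
    """Exponential of the average negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


def burstiness(sentence_token_probs):
    """Spread (population std dev) of per-sentence perplexity.

    Assumed definition for illustration, not a documented detector rule.
    """
    scores = [perplexity(probs) for probs in sentence_token_probs]
    return statistics.pstdev(scores)


# Hypothetical token probabilities, one inner list per sentence.
predictable = [[0.5, 0.5, 0.5], [0.5, 0.5]]  # every sentence equally "easy"
varied = [[0.5, 0.5, 0.5], [0.1, 0.1]]       # second sentence far more surprising

print("perplexity:", perplexity([0.25, 0.25, 0.25, 0.25]))
print("burstiness (predictable):", burstiness(predictable))
print("burstiness (varied):", burstiness(varied))
```

In this toy setup the text with one very surprising sentence gets the higher burstiness score, which matches the intuition that human writing tends to be less uniform.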

While we don't know the detection rule (ML ...) in detail, we can try to force the text generator to vary those parameters and see what happens in the detection ;)

Asking ChatGPT to indirectly change the features - changing the text

What I am interested in is whether we can ask ChatGPT to rewrite the text in such a way that the above-mentioned values deviate a lot from the ones typical of AI-generated text.

I ran the experiments with a few texts - answers from ChatGPT to different questions I asked - and found similar results. I report only one here for brevity and because it's... on point :)


To influence the detection features, one may think about adding randomness to the text or changing it so that the next word in a sequence is not the one ChatGPT considers most likely (but e.g. the second best). Also, as I mention in the next paragraphs, length seems to affect the detection. My thought was that some kind of compression would help; therefore, after generating the initial text, I asked ChatGPT to 'make it shorter'. This trick works quite well for a few of the texts I tried.
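
The 'second best word' idea can be sketched with a toy example. Everything here is hypothetical: the three next-word distributions are invented, and `pick` and `generate` are illustrative helpers, not part of any ChatGPT API. The perplexity is also computed under the same toy distribution that generated the words, whereas a real detector would score the text with its own model.

```python
import math


def pick(dist, rank=0):
    """Return the (word, probability) pair at the given rank; 0 = most likely."""
    ranked = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[min(rank, len(ranked) - 1)]


def generate(steps, rank):
    """Pick the rank-th word at every step and report the sequence perplexity."""
    words, probs = zip(*(pick(dist, rank) for dist in steps))
    ppl = math.exp(-sum(math.log(p) for p in probs) / len(probs))
    return " ".join(words), ppl


# Invented next-word distributions for three consecutive positions.
steps = [
    {"the": 0.6, "a": 0.3, "one": 0.1},
    {"cat": 0.5, "dog": 0.4, "fox": 0.1},
    {"sleeps": 0.7, "naps": 0.2, "dozes": 0.1},
]

for rank in (0, 1):
    text, ppl = generate(steps, rank)
    print(f"rank {rank}: {text!r} perplexity={ppl:.2f}")
```

Always taking the second-best word produces a less predictable sequence, so its perplexity under the toy model is higher than that of the greedy choice.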


Longer texts offer more 'detection surface'?

Another element I tested that seems to affect detection is text length. While ChatGPT (at least v3.5) respects a requested word count only approximately, we can still ask for answers of different lengths. Longer texts seem to be detected more easily, probably because perplexity and burstiness can be estimated more reliably. This also applies to 'stitched' pieces of answers, which cannot trick the detector when it is applied to the overall text.
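
Why would length help the detector? A rough statistical intuition, sketched below under simplifying assumptions: if we treat each token's 'surprise' as a random draw (here from an arbitrary Gaussian, which is purely hypothetical), the average over a long text fluctuates much less than the average over a short one. `estimate_spread` is an illustrative helper, not something a real detector exposes.

```python
import random
import statistics


def estimate_spread(n_tokens, trials=500, seed=0):
    """Std dev of the mean per-token surprise across many simulated texts."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # Hypothetical per-token surprise values: mean 3.0, std dev 1.0.
        sample = [rng.gauss(3.0, 1.0) for _ in range(n_tokens)]
        means.append(statistics.fmean(sample))
    return statistics.pstdev(means)


for n in (25, 100, 400):
    print(f"{n:4d} tokens -> spread of the estimate: {estimate_spread(n):.3f}")
```

The spread shrinks roughly with the square root of the text length, so a score computed on a long text is a steadier basis for a decision than one computed on a couple of sentences.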

Colloquial tone: sounds good, doesn't work

Another test was to ask ChatGPT to make the text "more colloquial" or to add "friendly phrases". ChatGPT does an incredible job here: the generated text really sounds more like it came from a person. Simpler words, a few "!" here and there, and phrases that make the text more relaxed. Here's an example of the same text as before:

"AI-generated text detectors may sound technical, but they're actually easy to understand! These clever tools use a combination of natural language processing and machine learning algorithms to distinguish between text written by humans and text generated by machines."

While impressive to us, it does not impress ChatGPT Zero, which continues to detect the text. Sure, the features used seem to be affected, but in only a few cases did detection drop, and then only for some parts of the text.

We all make "misakes"

What about adding some human element to our text? Since perplexity is related to how likely a specific sequence is predicted to be, words that are 'unknown' (to ChatGPT and to the detector) may confuse the algorithm. So I added some minor typos to the original ChatGPT text. As expected, the detection rate dropped quite a lot. We are all human after all :) Below is an example from another text I tried, asking "how car engines work". The unchanged version is flagged entirely as AI generated. The errors made it 'more human like', while the remaining, untouched part is still detected.


Image: some minor typos in the original ChatGPT text. Oups...
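
I added the typos by hand, but the same idea can be sketched in a few lines of Python. The helper below is hypothetical (the name `add_typos`, the `rate` and the `seed` are my own choices): it swaps two neighbouring inner letters in a fraction of the longer words. Whether a given detector reacts to this is not something the code can tell us.

```python
import random


def add_typos(text, rate=0.15, seed=42):
    """Swap two adjacent inner letters in roughly `rate` of the longer words."""
    rng = random.Random(seed)
    out = []
    for word in text.split(" "):
        # Only touch purely alphabetic words of five or more letters.
        if len(word) > 4 and word.isalpha() and rng.random() < rate:
            i = rng.randrange(1, len(word) - 2)  # keep first and last letter
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        out.append(word)
    return " ".join(out)


sample = "engines convert fuel into motion"
print(add_typos(sample, rate=1.0))
```

The first and last letter of each word stay in place, which keeps the result fairly readable for a human while changing the token sequence a model sees. A swap of two identical letters (as in "letter") leaves the word unchanged.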

The next step would be to keep the text useful despite the mistakes. Maybe with non-printable characters? But this is for another article perhaps.

To be continued and conclusion...

In this article I wanted to have fun checking what ChatGPT 3.5 could do to avoid its own detection. In the end, some tricks can lower the detection rate, but the most important point is to write our own texts when AI is not allowed.

Once again, what I mentioned here was just for fun. It's not meant to fuel cheating or to help anyone use these tools where they are not allowed. Research is also fueled by questioning the status quo ;)

That said, I explored some more elements that I didn't mention here in an already long article. In a future one, you may find my tests with a mix of different AIs, different AI versions (ehehehe you know which one....) or even old systems that must not be forgotten in the history of NLP!


***

DISCLAIMER

While trying to be accurate, this article is not intended as professional advice and should not be relied upon as such. It represents my personal opinions and it is not connected with any views of my current or previous employers. The article is provided solely for educational and informational purposes and is not making any endorsement, representation, or warranty about the accuracy or completeness of the information presented.

And, full disclosure, yes: text is (partially) sometimes improved by AI, but always under human supervision and hard work ;) Ideas are still human, so far...

***
