- DetectGPT’s method leverages log-probabilities of the text. “If an LLM produces text, each token has a conditional probability of appearing based on the previous tokens. Multiply all these conditional probabilities to obtain the (joint) probability for the text.”(source)
- What DetectGPT does is it perturbs the text and then compares the log probabilities of both, pre-perturbed and perturbed text. If the new log-probability is significantly lower, its AI generated, otherwise it is human generated.
- The image below, (source), displays the perturbation, scoring and comparison process.
- Stylometry is the study of linguistic style, and it’s often used to attribute authorship to anonymous or disputed documents.
- In AI detection, stylometry can be used to identify the distinct ‘style’ of a particular AI model, based on certain features like word usage, sentence structure, and other linguistic patterns.
- Stylometry is a field of study within computational linguistics and digital humanities that involves the quantification and analysis of literary style through various statistical and machine learning methods. The core premise of stylometry is that authors have a distinct and quantifiable literary “fingerprint” that can be analyzed and compared.
- Stylometric analysis is performed using features such as word length, sentence length, vocabulary richness, frequency of function words, and usage of certain phrases or structures. These features are then subjected to various statistical analyses to identify patterns.
- Here are a few applications and techniques used in stylometry:Authorship Attribution: Stylometry can help identify or confirm the author of a text based on stylistic features. This can be useful in literary studies, forensics, and even in cases of disputed authorship or anonymous texts.Author Profiling: By analyzing the stylistic features of a text, it’s possible to make predictions about the author’s demographics, including age, gender, native language, or even psychological traits.Text Categorization: Stylometry can also be used to classify texts into different genres, types (fiction vs non-fiction), or time periods based on stylistic features.Machine Learning Techniques in Stylometry: In recent years, machine learning techniques have been increasingly applied to stylometry. Methods such as Support Vector Machines (SVM), Random Forests, and Neural Networks are used to classify texts based on stylistic features.N-gram Analysis: This is a common technique in stylometry that involves counting sequences of ‘n’ words or characters. N-gram analysis can help capture patterns of language use that are distinctive to a particular author.Function Words Analysis: Function words are words that have little meaning on their own but play a crucial role in grammar (like ‘and’, ‘the’, ‘is’). Authors tend to use function words unconsciously and consistently, making them a valuable tool for stylometric analysis.
- It should be noted that while stylometry can be powerful, it also has its limitations. The results can be influenced by factors such as the genre of the text, the author’s conscious changes in style, and the influence of co-authors or editors. Furthermore, while stylometry can suggest patterns and correlations, it can’t definitively prove authorship or intent.
- “GPTZero computes perplexity values. The perplexity is related to the log-probability of the text mentioned for DetectGPT above. The perplexity is the exponent of the negative log-probability. So, the lower the perplexity, the less random the text. Large language models learn to maximize the text probability, which means minimizing the negative log-probability, which in turn means minimizing the perplexity.GPTZero then assumes the lower perplexity are more likely generated by an AI.Limitations: see DetectGPT above. Furthermore, GPTZero only approximates the perplexity values by using a linear model.” Sebastian RaschkaIn addition, as we an see in the image below, (source), it now takes into consideration burstiness to check how similar the text is to AI patterns of writing.Human written text as changes in style and tone whereas AI content remains more-so consistent throughout.
AI Educator | Built a 100K+ AI Community | Talk about AI, Tech, SaaS & Business Growth ( AI | ChatGPT | Career Coach | Marketing Pro)
1 年Great way to tackle the challenge of detecting AI models!
That's fascinating! DetectGPT's approach seems really promising. ??