Some painful, untold facts about AI tools: an explanation with a forensic linguistics outlook
Akos Bardoczi
Open-Source Intelligence Expert | cybersecurity | legal | Google Cloud Platform | threat hunting | growth hacker | Python | writer and lector
DISCLAIMER: the following article contains rude, disturbing, creepy, and scary observations, based on my best knowledge. If you don't like such content, please don't read this article! Really! And sorry again: nope, I basically wrote this article for professionals, NOT for average readers.
We often hear about the power of AI-generated content, and related criticism as well, without any valuable explanation. I mean evidence-based elaborations built on empirical observations.
The question is more difficult than "are AI tools your friends or foes?"
In my own LinkedIn timeline I often see text posts which follow a very similar structure, with hidden, encapsulated meanings from a semantic point of view. In almost every case, most of these posts seem:
0x100. Grammatically too perfect, with similar nuances
Yep, search engines and suggestion algorithms prefer good grammar and style, and down-rank texts which show poor grammar. But the world is continuously changing: grammar that is too perfect is more than suspicious, and a big red flag for me.
The most affected languages, not surprisingly: English, Spanish, and most probably Russian and Chinese, which I haven't examined.
Bear in mind the following: perfect grammar simply doesn't exist; grammar is considered a descriptive discipline within linguistics. And no one can write grammatically perfect texts, including native speakers. The hidden logic under the hood can be detected through a lesser-known property of a text: the idiolect. In other words: your sentences and communicative acts are your unique fingerprint, which is more than useful in forensic linguistic applications.
A given person's idiolectal markers appear at all levels of the linguistic "layers": syntax, morphology, semantics, and pragmatics. While cutting-edge spell checkers and similar applied language technology tools can easily fix typos and grammar mistakes, the typical logical structure of a sentence originating from a person stays essentially the same, and is a representation of the individual's logical and decision schemes. These attributes are almost persistent; we use the same structures during our entire lives.
The distribution of punctuation marks in a text is also individual, but authorship attribution needs a bigger text sample for successful author identification.
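To make this concrete, here is a minimal Python sketch (my own toy illustration, not a forensic-grade tool; the two sample strings are made up): it compares the punctuation-mark distributions of two text samples with cosine similarity. A real authorship attribution workflow uses much larger samples and many more idiolectal features.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:!?-()\"'"

def punctuation_profile(text: str) -> dict:
    """Relative frequency of each punctuation mark in the text."""
    counts = Counter(ch for ch in text if ch in PUNCTUATION)
    total = sum(counts.values()) or 1
    return {mark: counts[mark] / total for mark in PUNCTUATION}

def cosine_similarity(p: dict, q: dict) -> float:
    """Cosine similarity of two punctuation profiles (1.0 = identical)."""
    dot = sum(p[m] * q[m] for m in PUNCTUATION)
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0

# Hypothetical samples; a real comparison needs much longer texts.
known_sample = "Well, I think - as usual - that this is fine; really, it is."
questioned_sample = "Honestly, this is fine. It is, as always, acceptable."
print(cosine_similarity(punctuation_profile(known_sample),
                        punctuation_profile(questioned_sample)))
```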
Nope, you cannot be smarter than forensic linguistics experts, even if you are familiar with language technology or work as a forensic linguist yourself.
First example: if somebody writes content anonymously in their own style and for some reason this person - who might be a criminal or an activist - tries anti-forensic techniques, the most common attempts are:
Breaking long sentences into shorter ones.
Putting grammar mistakes into the text intentionally. Just think about it: if you read a text which, based on the phrases used and the logical structure of the sentences, clearly originates from a well-educated person, a grammar mistake is more than suspicious at first sight.
I'll try to create a similar example in my native language [Hungarian]:
Original:
"Komolyan falra mászok a félm?velt, álintellektuális kontentgyárosoktól, akik egy maroknyi k?nyvet nem ovlastak életükben és ?k osztják az eszet arról, hogy milyen a jó tartalom, holott tudjuk, hogy ami a jó tartalom készítésének egyik feltétele a megfelel? olvasási kultúra és a témában való jártasság egy általános intellektus mellett."
Manipulated #1, without proper punctuation:
"Komolyan falra mászok a félm?velt álintellektuális kontentgyárosoktól akik egy maroknyi k?nyvet nem ovlastak életükben és ?k osztják az eszet arról hogy milyen a jó tartalom holott tudjuk, hogy ami a jó tartalom készítésének egyik feltétele a megfelel? olvasási kultúra és a témában való jártasság egy általános intellektus mellett."
The most common Hungarian conjunctions are:

de, mert, hogy, mint, ha ["but", "because", "that", "than/as", "if"]

These words establish the logical connection between parts of a sentence, and they are almost always set off by a comma.
The anomaly is trivial: if someone writes (1) on a semi-scientific topic, (2) educational content, (3) mixing terminus technicus with informal words, it is very unlikely that the author doesn't follow elementary grammar rules.
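A toy version of this comma check in Python (my own sketch, assuming a heavily simplified reading of the Hungarian punctuation rules; the example sentences are made up):

```python
import re

# Common Hungarian conjunctions that are normally preceded by a comma
# when they join clauses (sentence-initial occurrences are skipped).
CONJUNCTIONS = {"de", "mert", "hogy", "mint", "ha"}

def missing_comma_ratio(text: str) -> float:
    """Share of conjunction occurrences that lack a preceding comma."""
    tokens = re.findall(r"\w+|[.,;:!?]", text)
    hits, misses = 0, 0
    for i, tok in enumerate(tokens):
        if tok.lower() in CONJUNCTIONS and i > 0:
            hits += 1
            if tokens[i - 1] != ",":
                misses += 1
    return misses / hits if hits else 0.0

with_commas = "Tudjuk, hogy a jó tartalomhoz olvasási kultúra kell, de ez ritka."
without_commas = "Tudjuk hogy a jó tartalomhoz olvasási kultúra kell de ez ritka."
print(missing_comma_ratio(with_commas))     # 0.0 - commas where expected
print(missing_comma_ratio(without_commas))  # 1.0 - every comma is missing
```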
Manipulated #2, fragmented into short sentences:
"Komolyan falra mászok a félm?velt álintellektuális kontentgyárosoktól. Akik egy maroknyi k?nyvet nem ovlastak életükben és ?k osztják az eszet arról hogy milyen a jó tartalom. Holott tudjuk, hogy ami a jó tartalom készítésének egyik feltétele a megfelel? olvasási kultúra. és a témában való jártasság egy általános intellektus mellett."
The long sentence is fragmented into shorter ones, which is at least unusual and unfamiliar. If this appears in a longer written text where many other sentences are long and not fragmented, it is in fact serious evidence that the text was manipulated somehow.
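This fragmentation anomaly is easy to approximate in code. A quick-and-dirty Python sketch (illustrative only; the sentence splitter is naive and the threshold is an arbitrary assumption of mine): it flags sentences that are unusually short for the given text.

```python
import re
import statistics

def sentence_lengths(text: str) -> list[int]:
    """Word counts per sentence, using a naive sentence splitter."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def fragmentation_outliers(text: str, z_threshold: float = -0.8) -> list[int]:
    """Indices of sentences that are unusually short for this text."""
    lengths = sentence_lengths(text)
    if len(lengths) < 3:
        return []
    mean = statistics.mean(lengths)
    stdev = statistics.stdev(lengths) or 1.0
    return [i for i, n in enumerate(lengths) if (n - mean) / stdev < z_threshold]

text = ("This is a long sentence that keeps going with many clauses and words. "
        "Another long sentence follows with a comparable number of words in it. "
        "Short. Very short. And one more long sentence appears here at the end "
        "with plenty of words again.")
print(fragmentation_outliers(text))  # flags the two suspiciously short sentences
```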
Another example
Just imagine the following sentence; I will write the example in Hungarian again:
Original:
"Ezek annyira kóklerek, hogy a mérésnél nem jegyezték pontosan sem az átlagot, sem a szórásnégyzetet, sem sem a módusz, sem a mediánt, r?viden szólva dilettáns hülyék."
Manipulated:
"Ezek annyira kóklerek, hogy a mérésnél nem jegyezték pontosan sem az átlagot, sem a szórásnégyzetet, sem sem a módusz, sem a mediánt, r?viden szólva dilettáns hüjék."
If somebody pays attention to allegedly serious research mistakes and uses multiple terminus technicus, a hair-raising, serious spelling mistake at the end of the sentence is more than unlikely in this context: "hüjék" instead of "hülyék". Moreover, the author of the text used a word which is strongly tied to literacy: "dilettáns" is sometimes used by educated people as a synonym of "hülye", but only by educated people.
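The same register-mismatch heuristic can be sketched in a few lines of Python (my illustration; both word lists are hypothetical stand-ins, and a real tool would need proper lexicons):

```python
# Hypothetical stand-in lists: technical vocabulary versus crude,
# "uneducated" misspellings that clash with that vocabulary.
TECHNICAL_TERMS = {"szórásnégyzet", "módusz", "medián", "átlag"}
SEVERE_MISSPELLINGS = {"hüjék"}  # the expected form is "hülyék"

def register_mismatch(text: str) -> bool:
    """True if the text mixes technical terms with severe misspellings."""
    words = {w.strip(".,;:!?").lower() for w in text.split()}
    uses_terms = any(t in w for t in TECHNICAL_TERMS for w in words)
    has_errors = any(e in words for e in SEVERE_MISSPELLINGS)
    return uses_terms and has_errors

sample = ("Ezek annyira kóklerek, hogy nem jegyezték sem az átlagot, "
          "sem a móduszt, sem a mediánt, röviden szólva dilettáns hüjék.")
print(register_mismatch(sample))  # True: technical terms plus a crude misspelling
```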
In summary: grammar is never perfect, even if the text originates from a native speaker. The cheap generative AI solutions can only generate suspiciously perfect texts, without any mistakes, AND they often create weird, awkward, false logical connections between facts within a sentence. Creepy or not, the too-perfect grammar hijacks your cognition, and you probably won't detect the meaningless, dumb content.
If you have read many books and then read a generative-AI-written article, you feel that something is unfamiliar, but you don't evaluate the text with systematic methods. If linguistics is a crucial part of your daily job, that is a different situation.
Many times I read suspiciously popular English posts, especially on LinkedIn; I know these are AI-generated posts, but I'm not able to prove it, because I'm not a linguistics expert. But I have some observations about the common properties and attributes of these posts. Some of these:
1. The posts begin with a sentence which seems an insightful, grandiose, deep thought, and which offers a solution to a common issue/problem that affects almost everybody. The perfect clickbait!
2. The post continues with some bullet points: structured, therefore easy to skim quickly, and not dependent on the previous knowledge, proficiency, literacy, or intellectual capacity of the readers. In other words: the message targets average people, which is - by the definition of average - the widest audience.
Nobody will feel too dumb, because the text is easy for everybody to understand. In addition, the post isn't too dumb, so educated people will also read it. One of the causes is the "emotional hack": while visitors read the content, they feel that the values conveyed in the post harmonize with theirs. The content generators can create almost fully unbiased, politically correct content. Language philosophy and intercultural communication are deep waters. Don't get me wrong: politically correct and unbiased communication matters. But! A lot of evidence-based research explains how politically correct and unbiased communication can be counterproductive or dangerous in many cases.
Let's take a break at this point. As a former volunteer, I talked to many people from different parts of the world: for example, citizens who live in the most dangerous parts of the globe, e.g. in Palestine, Israel, Ukraine, Russia, and I'm very proud that I was empathic in every case. In other cases, I talked to others who live in deep poverty or in horrifically unequal cultures. My suggestion might seem passive-aggressive or arrogant to many readers: we are humans. Please try to help directly, for free, women who live in states where they have no rights, or help veterans who literally lost everything, before you say anything about global issues.
Back to the original topic: an article, a post, or even a strictly peer-reviewed scientific publication which is fully accepted by the audience without any criticism is completely unrealistic. Fun fact: peer-reviewed publications in the field of higher mathematics aren't an exception!
3. Another marker of generative-AI-created posts: these posts frequently end with a sophisticatedly written call-to-action and offer a quick and easy solution to a general or specific issue.
4. Awkward or not, these posts receive a lot of reactions and comments, and I have some hypotheses about why. The psychologically primed reader will at least react to or comment on a post which might be overall fully dumb and 100% bullshit, or which contains a few trivial but basically useless things. If the readers somehow feel addressed, then after the first comment others will feel involved in commenting on the same article. Some readers will overthink the article and write comments, or continue the discourse started by the originally contentless article with another one, which in some cases contains unique, valuable, and meaningful thoughts. But frequently the new article will be similarly meaningless.
Let's recognize another risk: while cheap AI tools are available to everybody and most people use these tools without proper knowledge, this silently generates a hard bubble effect which is reorganizing current communities; they will be more separated than ever, and inequality will be bigger than ever before in the history of mankind. People will not recognize the unchained singularity before it happens. Many people and companies will get stuck with the dumb AI they unintentionally got for free, and many businesses will be ruined. As I mentioned earlier, it's a bit similar to the dotcom bubble of 1999, just bigger, regarding the elevated impact of ICT on the entire economy.
Nope, a cheap artificial intelligence tool basically can't write better articles than a human. A discipline-specialized artificial intelligence can probably write high-quality content, but never alone: this way of content creation still needs machine-human interaction. For example, an expert must know how to configure the system to write an academic paper in a great style, regarding the audience, etc., but that is not the final version. The professionals must review their almost-final paper to spot and fix the mistakes.
One more interesting historical outlook for zoomers: the "machine professors" are older than you think. A notable example is the SCIgen article generator from the early 2000s. In 2005 SCIgen authored the scientific paper
"Rooter: A Methodology for the Typical Unification of Access Points and Redundancy"
which doesn't contain any meaningful information; but the World Multiconference on Systemics, Cybernetics and Informatics program committee accepted the paper and invited the authors to the conference. IMHO the main causes of SCIgen's success: the generated scientific paper perfectly followed the required academic format and partially the terminology, the topic seemed too difficult, and the members of the program committee somehow didn't spot the cheat. In addition, publication pressure interfered with this.
One more time: basically, don't try to be smarter than others, and don't try to be smarter than the machine brains. The useful platforms, not just the social media platforms, will spot AI-generated content and silently down-rank it after assigning some additional risk score to the content. Obviously, the penalty will affect the user who published the AI-generated trash.
If you hear about a not-too-expensive tool which can stylize and boost your text before you publish it somewhere, it definitely sounds good. Again: but! In the worst case it might also be dangerous. My native language is Hungarian; my grammar grades in my native language were just average in high school [an average grade on a 1-5 scale]. I have published thousands of different writings in the past two decades and was a co-author of a book many years ago - I wrote these in Hungarian, and my readers never found my articles less valuable. Now I study at Eötvös Loránd University Faculty of Law in Budapest; I have learnt a lot about legal terminology [Hungarian and English], but I think my grammar skills didn't improve. In my interpretation this is strong evidence that authenticity, logic, individuality, and added value are valued more than perfect grammar.
My suggestion: if you write something in one of your learned foreign languages, check the text with a dumb spell checker to spot the accidental serious mistakes, but don't try to polish the original text. If the AI-generation detection on the platform where you publish your material makes a mistake, the penalty is more painful than an article with a somewhat dumb style but meaningful content.
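A minimal sketch of this "dumb spell checker" workflow, assuming the pyspellchecker package (pip install pyspellchecker): it only reports likely misspellings with suggestions, and it leaves the original text untouched instead of "stylizing" it.

```python
import re
from spellchecker import SpellChecker

def report_typos(text: str, language: str = "en") -> None:
    """Print likely misspellings and suggestions; never rewrite the text."""
    spell = SpellChecker(language=language)
    words = re.findall(r"[A-Za-z']+", text)
    for word in sorted(spell.unknown(words)):
        # Suggest a correction, but let the author decide what to change.
        print(f"possible typo: {word!r} -> suggestion: {spell.correction(word)!r}")

report_typos("Thiss sentence contains a seriuos mistake.")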