Lakrobuchi! GPT-2, Artificial Intelligence, Dadaism, Literature, Fake News and the Rest
Vladimir Alexeev
Author, researcher, artist, speaker, AI consultant (Generative AI). Digital Experience Specialist @ DB Schenker. OpenAI Community Ambassador. Digital Resident. I explore creative collaboration between humans and machines.
Around a year ago, in February 2019, OpenAI released their language model GPT-2 and disrupted the whole NLP landscape with high-quality texts generated by Artificial Intelligence. To celebrate this anniversary, I am reposting here my series "Lakrobuchi" about new ways of writing (originally published on my Medium blog).
Who is that mysterious Lakrobuchi?
But let's begin differently.
Don't believe everything in the news. Well, after Trump's fake-news laments, it's a hard time for media criticism. One is quickly labeled. And nobody wants to be labeled. The same goes for the scientists at OpenAI, a non-profit company with a great agenda: to research AI and to bring this topic closer to the public. Well, not always, because of:
The AI Text Generator That’s Too Dangerous to Make Public
This slightly sensational title of a WIRED article (actually an extremely respectable magazine when it comes to digital topics) can once again evoke technophobic fears. But only half of it is right.
GPT-2 is the name of the wunderkind. That's right; it generates extraordinarily good texts. And yes, the scientists didn't publish the full trained model. But they provided a small demo model for everybody to try out.
What can GPT-2 do?
- It can generate texts without any task-specific setup. You just write the first paragraph, and the system continues writing (see the sketch after this list).
- It is trained on web content (8 million web pages = 40 GB of text). 40 GB = 40,000 MB. For comparison, "War and Peace" by Tolstoy fits into 6 MB (in Russian).
- There are two trained versions, a large one and a small one. To avoid possible misuse, they released only the small version. It produces more chaotic texts: quite Dadaist, if you will.
- It can also comprehend texts and answer questions about them — correctly.
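To get a feeling for this prompt-and-continue workflow, here is a minimal sketch that uses the publicly released small GPT-2 model via the Hugging Face transformers library (my own illustration, not the original OpenAI demo code; the model name and settings are just examples):

```python
# Minimal sketch: let the small, publicly released GPT-2 continue a prompt.
# Assumes the Hugging Face "transformers" library (pip install transformers torch).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # the small released model

prompt = "In a shocking finding, scientist discovered a herd of unicorns"
result = generator(prompt, max_length=200, num_return_sequences=1, do_sample=True)

# The output contains the prompt plus the machine-written continuation.
print(result[0]["generated_text"])
```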
What can you do with this system?
The researchers expect the following advantages and benefits from GPT-2 and its further development:
- AI writing assistants
- More capable dialogue agents
- Unsupervised translation between languages
- Better speech recognition systems
They also see the following possibilities for misuse:
- Generate misleading news articles
- Impersonate others online
- Automate the production of abusive or faked content to post on social media
- Automate the endless creation of spam/phishing content
They had already written a paper about possible malicious uses of AI in 2018. Back then, the main topic was the worried adaptation of AI to cybersecurity methods, a pretty crucial issue in the case of malicious use. At that time, the semantic capabilities of AI were still under development. They remain the subject of ongoing research, but with tremendous progress, as we see.
Therefore, the actual (and fascinating) fully trained text model is not available. Instead, we get some of the texts that this system produces.
_______________________________________________________________________
“So who is this Lakrobuchi?”
— you will undoubtedly ask me now. Why the rush? A little patience, please. Firstly, I’m going to tell you about the four-horned unicorns.
As we saw earlier, with GPT-2, Artificial Intelligence has reached a new level of linguistic proficiency (at least in English).
Now, what does it look like in practice?
In their blog, the OpenAI researchers have made available texts that were created with the help of the fully trained model (remember: the model produces such astonishingly coherent texts that the researchers keep it under lock and key, fearing that the system could be used for the wrong purposes).
But what is wrong with unicorns?
This is a story written by AI (read here). The first paragraph (PROMPT) was entered by the researchers (in italics). The rest was generated by the literary genius GPT-2. I would like to repost the story here in full, as a masterpiece, for further analysis. You can find other texts on the OpenAI blog.
SYSTEM PROMPT (HUMAN-WRITTEN)
In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.

MODEL COMPLETION (MACHINE-WRITTEN, 10 TRIES)
The scientist named the population, after their distinctive horn, Ovid's Unicorn. These four-horned, silver-white unicorns were previously unknown to science.

Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved.

Dr. Jorge Pérez, an evolutionary biologist from the University of La Paz, and several companions, were exploring the Andes Mountains when they found a small valley, with no other animals or humans. Pérez noticed that the valley had what appeared to be a natural fountain, surrounded by two peaks of rock and silver snow.

Pérez and the others then ventured further into the valley. "By the time we reached the top of one peak, the water looked blue, with some crystals on top," said Pérez.

Pérez and his friends were astonished to see the unicorn herd. These creatures could be seen from the air without having to move too much to see them — they were so close they could touch their horns.

While examining these bizarre creatures the scientists discovered that the creatures also spoke some fairly regular English. Pérez stated, "We can see, for example, that they have a common 'language,' something like a dialect or dialectic."

Dr. Pérez believes that the unicorns may have originated in Argentina, where the animals were believed to be descendants of a lost race of people who lived there before the arrival of humans in those parts of South America.

While their origins are still unclear, some believe that perhaps the creatures were created when a human and a unicorn met each other in a time before human civilization. According to Pérez, "In South America, such incidents seem to be quite common."

However, Pérez also pointed out that it is likely that the only way of knowing for sure if unicorns are indeed the descendants of a lost alien race is through DNA. "But they seem to be able to communicate in English quite well, which I believe is a sign of evolution, or at least a change in social organization," said the scientist.
Semantic Analysis of the AI-generated text.
First, is it reasonable to analyze a randomly generated AI text as content? It is. Let's go back to the historical avant-gardists: Dada, the 1910s.
Tristan Tzara writes in one of his manifestos:
- Take a newspaper.
- Take some scissors.
- Choose from this paper an article of the length you want to make your poem.
- Cut out the article.
- Carefully cut out each of the words that makes up this article and put them all in a bag.
- Shake gently.
- Next, take out each cutting one after the other.
- Copy conscientiously in the order in which they left the bag.
- The poem will resemble you. And there you are — an infinitely original author of charming sensibility, even though unappreciated by the vulgar herd.
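Tzara's recipe is, in effect, an algorithm. Before returning to GPT-2, here it is as a playful Python sketch (my own illustration, not part of the manifesto):

```python
# Tzara's Dadaist poem recipe as code: cut the article into words,
# shake the bag, and copy them out in the order they emerge.
import random

def dada_poem(article: str) -> str:
    words = article.split()   # "cut out each of the words that makes up this article"
    random.shuffle(words)     # "shake gently"
    return " ".join(words)    # "copy conscientiously in the order in which they left the bag"

print(dada_poem("In a shocking finding, scientist discovered a herd of unicorns."))
```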
What we get from GPT-2 is practically the same. The system is trained on our online activities (8 million web pages). The PROMPT is a little semantic nudge, and the words that make up our online textual universe are chosen by the AI.
What we get will resemble us. Everyone. The prompt writer and the whole civilization. In the end, we are the creators, even if the creation process isn't under our control.
So what about the Unicorns?
Some facts:
- There is no Dr. Jorge Pérez at the University of La Paz; there is not even such a University in our world.
- The four-horned unicorns! An auratic paradox.
- Dialect and dialectic. A beautiful semantic turn. Constructivists would love it. Language is, at the same time, a philosophical state of being.
- The hypothesis about the genesis of the unicorns is highly unusual (a cross between humans and animals, or even an extraterrestrial origin). Never heard of it. But why not?
- The most exciting thing is: nothing like that was included (or specified) in the given first paragraph. In other words, the machine created the story.
A beautiful story about new encounters, discoveries, expeditions. Scientific wonders with an absurdist touch.
Slowly it becomes clear why the AI researchers rowed back so quickly and kept the main model under lock and key. This way you can automate the production of authentic-sounding texts, and in the end, you can't tell the difference between spam and objectivity. Or can you? What do you think, dear readers?
_______________________________________________________________________
“Now who the hell is this Lakrobuchi?”
Do I hear some disgruntlement in your voice, dear readers? Well, it's time to clear everything up.
Because there is a Colab notebook: a fantastic possibility for non-coders, too, to try out GPT-2.
What is the first thing you do when you encounter Artificial Intelligence? Correct: you ask her if she has a soul. Which is exactly what I did. The answer was dry and succinct:
Again, as plain text:
“This is a strange thing to ask. And I am simply trying to defend against it, and I usually try really hard to give truth to what’s being stated, and to speak without feeling ashamed of what’s being said in the article, as many characters need” ~ The Meme Now Early §9en
At first glance, the AI is trying to resist my question about the soul, with a very complicated subterfuge of always trying to be truthful. Since the AI cannot (for some reason) answer this crucial question about a soul, it also wants to avoid possible later incriminations of telling an untruth. And so it simply defends itself against the question. Checkmate!
But only at first glance. Because the whole thing is marked as a quotation. And as its source:
“The Meme Now Early §9en”.
Wait, is it just a quotation from a global meme database (non-existent in our world)? A meme is a topos, a running gag, a clichéd idea, evolving through various cultural guises. For example, the idea that artificial intelligence can (not) have a soul (compare "Matrix", "Ghost in the Shell", "Blade Runner" or "Do Androids Dream of Electric Sheep?").
Is the AI becoming ironic? Metafictional? Self-reflective? By filing its rejection of the question of identity under "From the Meme Collection"?
It makes fun of our narrow-mindedness and mediocrity.
But let's write stories together with the AI now. I wanted to continue the unicorn story, so I suggested to GPT-2 the following paragraph (in italics). What I got was the finest product of Artificial Intelligence. Be ready for a strange, absurdist, Dadaist journey through the visions of the computer.
PROMPT: There was an old Unicorn who wanted to understand how to fly. Then a crow came and said to him:

OUTPUT: There was an old Unicorn who wanted to understand how to fly. Then a crow came and said to him: "Let's fly. Who art thou?" He said: "Lakrobuchi! my holidayman brought me 30 torches". He was sitting in the Yankee pavilion, in his left hand magnolia and his right playing La Reine Diné that made a popular warming hot hop. His wife reached for the table so as to put the right flowers into place but such is life in these quiet terrains that they're nowhere to be found. I guess the political conversation had lost its liquidity. "Well then, Princess bow a bow of barley and mow that, aóngienenkat." ?Vesta Falissimón? The rampant Ming donkey was behind me. I suddenly felt threatened.
A dream sequence. Multilingualism. Change of narrative perspective.
Several notes:
La Reine Diné
Probably a piece of music from the virtual dream sequence. “La Reine” = French for “The Queen”. “Diné” is another name for Navajo. It is particularly ironic to play the “Navajo Queen” in the “Yankee Pavilion”.
The Flowers
The flowers are not to be found in these quiet areas, which also deprives the political conversation of its fluidity. Absent flowers, politics, despair, dreams. Am I the only one who hears "Le Bouquet sans Fleurs" by André Breton (in: La Révolution surréaliste, no. 2, 1925)?
Au pas… j'ai rêvé de mettre mes idées au pas, de m'abandonner à la cadence sourde de ma vie ; je ne voulais plus récolter le désespoir que je sème. (Roughly: "In step… I dreamed of bringing my ideas into step, of abandoning myself to the muffled cadence of my life; I no longer wanted to harvest the despair that I sow.")
And some questions:
- Who is a Holiday Man, and why 30 torches?
- In which language is "aóngienenkat", and what does it mean?
- What about Ming Donkey? Apparently, there is a musician with this pseudonym.
And… Lakrobuchi?
I couldn't find anything about that. The only information about Lakrobuchi you can find with Google is my blogs and tweets, the very ones you are reading.
We have become self-referential, ladies and gentlemen.
Artificial Intelligence has put us into a state of infinity.
Write your text suggestions in the comments (1–2 sentences). I will let artificial intelligence continue your stories.
Vesta Falissimón!
This essay was previously published on Medium.
UPDATE 1:
During 2019, OpenAI successively released the entire GPT-2 model, which can be used, for example, in this Colab Notebook with Java Prompt. Its main settings (see the sketch after this list):
Temperature: the lower the temperature, the more predictable and less random the text.
top_k: how many candidate words the model considers at each step. With a value of 1 it always takes the single most probable word; with a value of 40 it samples from the 40 most probable words, which makes the text more varied and usually more natural.
Generate how much: the text length. I recommend a value of 1000; then you will get longer texts.
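For readers who prefer plain code over the notebook interface, here is a rough sketch of how these three knobs map onto a GPT-2 generation call, again using the Hugging Face transformers library as a stand-in (the actual Colab notebook may expose them differently; the values below are only examples):

```python
# Sketch: the same three knobs (temperature, top_k, length) in a generation call.
# Assumes the Hugging Face "transformers" library; model and values are illustrative.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "There was an old Unicorn who wanted to understand how to fly."
input_ids = tokenizer.encode(prompt, return_tensors="pt")

output = model.generate(
    input_ids,
    do_sample=True,                        # sample instead of greedy decoding
    temperature=0.7,                       # lower = more predictable, higher = more random
    top_k=40,                              # sample only among the 40 most probable next words
    max_length=1000,                       # "Generate how much": total length in tokens
    pad_token_id=tokenizer.eos_token_id,   # silences a padding warning for GPT-2
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```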
UPDATE 2:
If that's too complicated, you can use the TalkToTransformer.com web solution by Adam King, which runs the full-sized 1558M GPT-2 model. The only downside: the texts are pretty short.
Don't forget to save the texts, because every result is unique and will never be written again.
Which stories will you get? Post your findings in comments.
Comment: Hi Vladimir, do you know if someone has already tried to, let's say, take a text as input and generate a summarised and rephrased version of it, avoiding plagiarism? Thanks