From Childhood Dreams to the AI Revolution
DALL-E 3-generated version of "Robbi, Tobbi und das Fliewatüt".


...or how I happened to plunge head-first into the most fascinating technological revolution we have witnessed since the Internet and the smartphone.


Back in my childhood days I binge-listened to a kids' radio play called "Robbi, Tobbi und das Fliewatüt" – the story of a boy inventor who goes on an adventure with a robot his own "age" to solve three riddles the robot needs answered in order to pass "Robot Class". The robot specifically needed a human counterpart his age to solve them. After finishing the entire three-part cassette series I was filled with the urge to build a robot myself: a companion with whom I could go on adventures of my own – to the North Pole, to the black-and-yellow-striped lighthouse, or even to explore the secrets of a supposedly haunted castle. That very same week I promised my mom I would invent a robot for her that would help her with household chores so she would have more time for the things she actually wanted to do. I would finish my robot before I turned 25.

40 years have passed.

Did I build that robot? No. Did I buy that robot? No.

But we still talk about it today, and she sometimes smiles when I report the latest innovations coming out of generative AI, then tells me it's about time I fulfilled my early promise. Let's just say I'm still working on it and we seem to be some time off – but secretly I honestly hope she'll get a chance to see "her robot" soon.

As things are currently developing, she might even have a chance.

My mom as envisioned by a customized Stable Diffusion model using Automatic1111.

Let's recap: last year it seemed the world was heading into the Metaverse any day now. Back in February 2022, I conceptualized a Roblox experience for our Sales & Service department at Deutsche Telekom. With no designer and neither enough time nor budget, I was hard pressed when we were asked to deliver virtual wallpapers for a building we wanted to construct within "Beatland", our Telekom residence inside Roblox. I set out to create these digital assets myself, but it quickly became clear I was nowhere near a professional designer. It must have been around that time that I discovered OpenAI's DALL-E 2 by accident. I knew the company from my regular research on the web, but I hadn't been aware they were developing something that would kick off my personal AI journey: there was an empty text field waiting for me to enter some text, a description of what I wanted to see – as an IMAGE!

Inside our Roblox experience - left and right walls covered in AI-generated wallpapers.


I was blown away by the prospect of entering text and receiving an image based on that description. As a long-time copywriter in my professional past, writing precise copy to achieve a specific outcome truly resonated with me. As I started experimenting with DALL-E 2, I immediately realized the immense potential this technology held for anyone able to write a coherent sentence. I wanted to do more, but then my credits ran out. They were just enough to create some decent digital wallpapers for our virtual world in Roblox, and our external partner as well as my colleagues were quite surprised when I produced these new digital assets. Where did you download them? Are they public domain? Can we legally use them? How much did you pay? The classic questions were mostly unnecessary: according to the terms of service, we were allowed to use the results of my prompts (the textual inputs into the image-creating machine) commercially.

Inside our Roblox experience - the record-wallpaper in the back was AI-generated using DALL-E 2.


But what if I could create my own images? On my own computer? Would that even be possible, with all the data and AI shenanigans going on in the backend? After all, OpenAI was a huge cloud-operated company – but maybe…?

Just a couple of months later I was able to install Automatic1111, a locally run server on my own Windows PC that let me prompt for images without any connection to an API or cloud instance. The first images were rather crude and wouldn't have stood a chance against DALL-E 2. But just a month later a huge global community had formed around Automatic1111. New extensions, plugins, and modules were being developed almost daily. Some were mediocre and hastily pieced together, but some were real game-changers. Soon new image-generation models popped up, so-called "checkpoints", later distributed as "safetensors". While checkpoint files were rather risky to download and install (they can contain arbitrary executable code), safetensors were... well, safer. I started experimenting with every model I could find, trying out various styles of models, negative prompts, LoRAs, and new add-ons to the base server software.

My installation of Automatic1111 running a custom model trained on my face mixed with a downloaded model.
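Beyond the web UI, Automatic1111 can also expose a small REST API when launched with the `--api` flag, which is how other tools script it. As a minimal sketch (the endpoint path and port are the defaults as I understand them; the prompt text and parameter values are purely illustrative):

```python
# Sketch: driving a locally running Automatic1111 server via its REST API.
# Assumes the webui was started with the --api flag and listens on the
# default address http://127.0.0.1:7860. No cloud, no external API.
import json
import urllib.request

def build_txt2img_payload(prompt, negative_prompt="", steps=20,
                          width=512, height=512):
    """Assemble the JSON body for the /sdapi/v1/txt2img endpoint."""
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "steps": steps,
        "width": width,
        "height": height,
    }

def request_image(payload, url="http://127.0.0.1:7860/sdapi/v1/txt2img"):
    """POST the payload to the local server and return its parsed JSON,
    which contains the generated image(s) as base64 strings."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Build a request; sending it requires the server to actually be running:
payload = build_txt2img_payload(
    prompt="a black and yellow striped lighthouse, oil painting",
    negative_prompt="blurry, low quality",
)
# result = request_image(payload)  # uncomment with a live local server
```

The same payload shape also carries the extras mentioned above: negative prompts go in `negative_prompt`, and community add-ons typically extend this dictionary with their own fields.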


Watching forums grow and communities thrive showed me how these developments were catapulting the generative AI scene forward at an exponential pace. I was blown away by what people from all walks of life could achieve when working together on a common goal – globally! With multiple models popping up, I also gave training a model myself a shot. Working through YouTube tutorials and reading across GitHub, I gathered all the information I needed to start my first image-model training. Again, Automatic1111 came to the rescue, with almost everything built into the software (plus hours on end researching error messages in forums and on GitHub). I quickly trained models of my father and mother just in time to give them a cute Christmas present in the form of a stylized portrait of the two of them.

Christmas gift portrait created using Automatic1111, out- and inpainting techniques.


In December something got my attention. It seemed to work exactly like the image prompting I had gotten used to, but this time the outcome was yet another text, created by a machine. OpenAI had released ChatGPT, which would go on to have a lasting impact on an entire generation of students, employees, scientists, entrepreneurs, and all other "knowledge workers". Things seemed to speed up from there. A couple of months later I was curious whether my gut feeling of an accelerating pace of innovation in generative AI would hold up against the facts. So, in March, I took the liberty of documenting new AI innovations day by day. Sure enough, the list grew quickly: each day companies like Microsoft, Unreal, Google, or Baidu released model after model, innovation on top of innovation. And in that immense gold rush for the next big thing in AI, one company truly stood out: NVIDIA. They sold the shovels for a gold rush in the making. Judging by their skyrocketing stock value, this was not just my personal feeling.

Generative AI timeline as of March, 2023.


As with image creation, I asked myself again whether "someone" would come up with a personal version of something like ChatGPT that we humble AI nerds could run on our own machines. Thus came the AI chatbot revolution for our private computers, running entirely locally without any cloud or API connection: Oobabooga. And just as with Automatic1111, Oobabooga's stated claim was to "become the AUTOMATIC1111 webui for text generation". And they delivered. Again, a plethora of extensions followed, created by a global open-source community pushing each iteration to the max. And again, large language models (LLMs) like the one driving ChatGPT were released one after the other. As of today, ChatGPT with GPT-4 still holds the crown in terms of performance, but open-source models are closing in as I write this.

The ChatGPT Plus interface. For $20 a month you get an image generator, data analytics, image comprehension, multiple extensions, and web browsing rolled into one smooth user interface.

After installing the latest version of Oobabooga's pretty-easy-to-use "one-click installer" on my hard drive, I downloaded a couple of models via the web UI's own download mechanism and created a bunch of different avatars/chatbots within the app, using a JSON file for each character. This way I was able to create various "experts" for different fields of interest. Depending on the model I used to run these characters, the precision of their answers would differ. After a while, German-speaking models were added, as well as voice understanding via a local version of OpenAI's Whisper (the whisper_stt extension) and voice output using the local silero_tts module. So, in essence, I was able to audibly chat with my locally installed chatbot companions before OpenAI released its voice feature for ChatGPT Plus users.

Screenshot of the character gallery within the Oobabooga chatbot UI.
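Each of those "experts" boiled down to one small JSON file. As a sketch – the field names follow the older text-generation-webui character format as far as I recall, and the character itself is made up for illustration:

```json
{
  "char_name": "TelcoExpert",
  "char_persona": "A patient telecommunications engineer who explains networking concepts in plain language and admits when something is outside their expertise.",
  "char_greeting": "Hi! Ask me anything about networks and connectivity.",
  "world_scenario": "A casual one-on-one tech consultation via chat.",
  "example_dialogue": "You: What does DSL actually stand for?\nTelcoExpert: Digital Subscriber Line – it carries data over ordinary telephone lines alongside voice."
}
```

Dropping a file like this into the UI's characters folder made a new persona selectable in the gallery; the persona and example dialogue get prepended to the model's context, which is why the same character can feel sharper or duller depending on the underlying model.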

Now we stand at the cusp of yet another evolutionary push in (generative?) AI: multimodality. This means models will be capable of understanding more than just the text people enter as a prompt: they will increasingly be able to make sense of uploaded photos, videos, and audio. Spatial data, which can be used to create so-called "3D Gaussian Splatting" scenes, is an increasingly hot topic as well. 2023 will see further developments in this regard, connecting the literal dots between multifaceted media assets such as image, video, sound, and space.

ChatGPT Plus already offers image generation and recognition. Its corresponding mobile app includes voice in- and output.

As we approach 2024, things will not stop. Autonomous AI agents are already in the open-source pipeline and rumored to be in development at OpenAI. Bot-to-bot communication will also be a hot topic, especially for the service industry, which will have to face the reality that chatbots may effectively "DDoS" human service agents in every field. Why should humans make the call if their personalized chatbot can work out contractual details with their energy, telco, and utilities providers? Companies like DoNotPay lead the way here. AR and VR will be infused with AI, as image processing and generation go hand in hand with developing these environments – sometimes even on the fly.

Homepage of "DoNotPay".

There is also no doubt that, with all these developments, things might become quite ugly as bad actors learn to leverage AI for their rather sinister purposes. "Dark UX" has been around for a long time already, and AI will add a whole new quality to that area. In about a year from now there will be elections in the US. Generative AI will probably play a key role in the run-up to the event and beyond. The societal contract will undergo a stress test as we become unable to differentiate between what is real and what has been artificially generated. A complete loss of trust in any media may very well be the outcome – an outcome that other technologies, such as blockchain-based hashes for media assets, may be able to remedy. But let's be clear: we started an arms race with technology a long time ago. AI is just a new kind of weapon we will wield in our repertoire. As society adapts (as we always have) to this new AI reality, we will learn, iterate, and refine our approaches. Here's to hoping that we choose wisely, with the question of how AI can benefit society as a whole – and not profit – as the first priority on our agenda.
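The idea behind blockchain-anchored media hashes rests on a simple primitive worth seeing once: a cryptographic hash is a short fingerprint of a file's exact bytes, so any later edit produces a different fingerprint. A minimal sketch (the blockchain-anchoring step itself is out of scope here; the example bytes are made up):

```python
# Sketch of content fingerprinting for media assets with SHA-256.
# Publishing such a fingerprint somewhere tamper-evident (e.g. a
# blockchain) would let anyone later verify that a given file is the
# unmodified original.
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of a media asset's bytes."""
    return hashlib.sha256(data).hexdigest()

original = b"raw bytes of a photo"
tampered = b"raw bytes of a photo, subtly edited"

print(fingerprint(original))                           # 64 hex characters
print(fingerprint(original) == fingerprint(tampered))  # False: any change shows
```

The fingerprint says nothing about *who* made the asset or *how* – it only proves the bytes are unchanged since the hash was recorded, which is exactly why it is pitched as one remedy for the "is this real?" problem rather than a complete one.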

Created using ideogram [dot] ai.



Nadine Brunner

Empower tomorrow: Digital Wave | Magenta Women

1 yr

And it's just like this… glad I met you on your journey (and thanks for the xmas tip)

Robert Reichert

The future isn't finished yet

1 yr

Ok Arno Selhorst, let's get this robot built, damn it!

Eelko Lommers

I lead the experience design team at Zooplus. Global senior executive & thought leader. Transformation by design

1 yr

Great read, buddy. So recognisable.

Dennis Schmedt

AI Enabling Manager, AI@HR, HR IT, Deutsche Telekom

1 yr

Arno! Many thanks for sharing your insights on your journey. Really impressive. Let's get ready for 2024!

Stefan Kirschnick

Sr. Insights Analyst LinkedIn | GenAI Explorer

1 yr

Jesus: check it out... nice read ;) You can probably find yourself in here as well? hehe.
