A quick guide on Artificial Intelligence for data designers and curious minds.

I want to thank Boldtron, Eduard Corral, and Víctor Perez for their constant contribution and generosity on the path to Artificial wisdom.

This article is the first in a series about how the latest artificial intelligence technologies can be integrated into, and will impact, the data visualization and information design industry.

Early this year, I was lucky enough to be close to some of the first Dalle2 beta testers, and I’ve been able to follow the evolution of text-to-image tools from the very early stages of their public release. The first thing I have learned is how fast written content around this topic becomes obsolete, so, acknowledging that, I’ll do my best to ensure that in a couple of months at least 50% of what I share here isn’t already ancient knowledge.

First, a quick disclaimer about where we stand at Domestic Data Streamers regarding artificial intelligence: even if text-to-image is a fantastic tool producing astonishing results, we still believe the technology we are using is quite dumb. As you will soon discover, you shouldn’t be misled by the term ‘artificial intelligence’; it is not that intelligent.

Noam Chomsky put it amazingly well in this article. To summarize: the technology we are using is just statistics; it doesn’t know or understand anything. It follows the orders of very complex mathematical models. As Chomsky explains, if you point a camera at a window with these AIs for a year, the algorithm can predict what will happen next in the window. However, the intelligent thing is not knowing “what” will happen but “why”, and today’s AI is still a long way from that kind of intelligence.

1- THE THREE BIG PLAYERS

I will make a very simplified summary for people who are just getting started with text-to-image tools and know nearly nothing. There are many programs available, but I will talk about the three most prolific actors today: Dalle2 by OpenAI (backed by Microsoft), Stable Diffusion, and Midjourney.

In my experience so far, Stable Diffusion is much better at details and quality, and Dalle2 is better at composing complex scenes and photographic finishes. Midjourney, on the other hand, has mastered a specific style; with a recently trained eye, it is easy to spot whether an image has been created with Midjourney, but with every new model you can see how it gets closer to the others. (You should check Midjourney’s latest version here.)

The results you get from Midjourney are different from Dalle2 and Stable Diffusion because of a conscious decision by their team. David Holz (Midjourney’s CEO) has repeatedly explained that he doesn’t want the software to create images that can be mistaken for authentic photographs, because of the potential danger of contributing to fake news and other misuses of the technology. Midjourney is not alone in this worry: last month Adobe, Leica, and Canon started working together on new credential software that can certify that an image was taken with a camera and has not been through an AI. Yes, a bit dystopian; it feels like asking every person you meet for a birth certificate because otherwise, how can you be sure it’s a real human being? (Blade Runner fans are getting excited.)

Now, going back to Stable Diffusion: its main characteristic is that it is open source. You can access the model, train it yourself on new words, objects, and ideas, and run it on your own computer without any corporate supervision or internet connection; you can download it for Mac and for PC. That’s both a blessing and a danger, of course. As the algorithm runs without any barrier or censorship, all kinds of creations have emerged, and we only need to go through a few Reddit forums to see what’s coming up…

But if you don’t want to, or can’t, set up your computer to run the algorithm, you can always use Stable Diffusion within a hosted service: Stable Cog, AI Playground, or Dreamstudio, and all of them work amazingly well. Bear in mind that they have different costs; here is a quick price comparison:

AI Playground (Stable Diffusion): $0.00025/image ($10/month for 2,000 images a day)

Midjourney: $30/month for unlimited images

Dreambooth (Stable Diffusion): $0.01/image ($10 for 1,000 image generations)

Dalle2: $0.032/image ($15 for 460 image generations)
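To compare these plans on an equal footing, the per-image cost can be derived from each bundle price. A quick sketch in Python (the prices are the approximate figures quoted above and change often, so treat the numbers as a snapshot, not a reference):

```python
# Rough per-image cost from the bundle prices quoted above.
# Figures are approximate and the plans change frequently.
bundles = {
    "Dalle2": (15.00, 460),            # $15 for ~460 generations
    "Dreambooth (SD)": (10.00, 1000),  # $10 for ~1,000 generations
}

for name, (price_usd, images) in bundles.items():
    per_image = price_usd / images
    print(f"{name}: ${per_image:.4f}/image")
```

Running this shows why Stable Diffusion-based services are currently an order of magnitude cheaper per image than Dalle2.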

2- TOOL CATALOG

Now, having introduced our three main agents, I want to introduce some other tools that have been essential to my process of better understanding this technology:

2.1- Prompt Portals

My favorite, without a doubt, is Krea.ai, a platform in constant transformation that helps you find the best results for Stable Diffusion; this one is a must. It lets you navigate a Pinterest-style environment while discovering the prompts behind each image. That’s the basis of any learning process: copy and replicate what you like and what you love, test the prompts, modify them, and explore how that affects your creations. Krea is also working on building a platform similar to AI Playground but with many more capabilities; if I were to recommend just one platform to watch as it evolves, this would be the one. You can also check out Public Prompts, a platform developed by a junior medical doctor from Lebanon as a stand against prompt marketplaces, or Lexica.

2.2- Prompt Books

This is the panacea not only for learning but also for getting inspired. The first prompt book I ever read was one on Dalle2, and it changed my understanding of the tool and encouraged me to experiment on a whole new level. In the studio, we’ve now developed our own Basic Prompt Book and another focused explicitly on data visualization prompts.

I strongly recommend that, as an exercise, you produce your own. In the process, you will learn what works and what doesn’t; for my students and work colleagues, it’s been a game changer.

2.3- Reversing Prompts

Sometimes, to truly understand something, you have to break it. In the case of text-to-image, I have used two kinds of tools. The first reverse-engineers an image to search for the prompt that created it; Img2prompt and CLIP do precisely that. The second helps you understand how an algorithm has been trained: which images define what the algorithm will build? For that, we have Have I Been Trained?. I believe this method is fundamental because it shows that, as we use these tools, we are drawing from millions of images by other artists, photographers, and random people that make up the datasets from which we generate new creations (or, as we could call them, spawns).

2.4- Prompt creators

Now that you understand the main idea behind text-to-image, you will need assistance crafting the best prompts to keep innovating and improving your results. For that, there are several other tools we can use. We’ve got Phrase models that give you a lot of different variables to think about. For Dalle2 specifically, I recommend this Chrome extension; it turns the prompting process into a point-and-click experience. For Midjourney, you’ve got Noonshot, which is amazing in terms of how technical you can get. If none of these convince you, you can also test Promptmania.

2.5- Technical Semantics

If you have made it this far, I’m sure you have realized the importance of language: it is not the same to ask for “a cyberpunk woman portrait” as for “a highly detailed portrait of a post-cyberpunk Spanish old woman, 4k resolution, Mad Max inspired, pastel pink, light blue, brown, white and black color scheme with symbols and graffiti, cinematic light, 35mm, Kodak filters”. The more specific the prompt, the better the results, so I recommend you search for the correct vocabulary to define each aspect, from painting techniques to cinematic techniques to the kinds of photographic film you can use. You can find some of these treatments in our prompt books, but you’ll also benefit greatly from exploring references outside the classic prompt books and finding the specialized semantics of the space you truly want to study, from camera types to film stocks, art movements, and painting techniques.
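To make the idea concrete, here is a minimal sketch in Python of how you might assemble a specific prompt from separate semantic fields. The field names and vocabulary below are illustrative, not an official API of any of these tools; swap in terms from your own prompt book:

```python
# Assemble a detailed text-to-image prompt from separate semantic fields.
# All vocabulary below is illustrative; build your own from your prompt book.
semantic_fields = {
    "subject": "portrait of a post-cyberpunk Spanish old woman",
    "detail": "highly detailed, 4k resolution",
    "influence": "Mad Max inspired",
    "palette": "pastel pink, light blue, brown, white and black color scheme",
    "lighting": "cinematic light",
    "camera": "35mm, Kodak filters",
}

def build_prompt(fields):
    """Join non-empty semantic fields into one comma-separated prompt."""
    return ", ".join(value for value in fields.values() if value)

print(build_prompt(semantic_fields))
```

Keeping each aspect in its own field makes it easy to vary one dimension (say, the lighting) while holding the rest constant, which is exactly how you learn what each term contributes.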

2.6- Enhancing

At a certain point, you will need to improve the quality of the images you create. The truth is that all of the text-to-image tools currently output fairly modest resolutions: Dalle2 is limited to 1024 x 1024 px, while Midjourney can upscale up to 1664 x 1664 px. Stable Diffusion, of course, can build much bigger images if you program it; in the studio we have already diffused images at 4K, and Boldtron is up to 6K, but you will need to build your own pipeline for that. If needed, you can also use dedicated upscaling tools; I use Photoshop, but Bigjpg or VanceAI work as well.
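For intuition, here is the naive baseline that every serious upscaler competes against: nearest-neighbour resampling, sketched in pure Python on a small grid of pixel values. Tools like Bigjpg or VanceAI instead use learned models that invent plausible detail; this sketch only shows the basic idea of mapping each target pixel back to a source pixel:

```python
# Naive nearest-neighbour upscaling sketch (pure Python, no dependencies).
# Real AI upscalers hallucinate plausible detail; this baseline only
# repeats existing pixels, which is why results look blocky.
def upscale(pixels, factor):
    """pixels: 2D list of values; returns a grid `factor` times larger."""
    return [
        [pixels[y // factor][x // factor]
         for x in range(len(pixels[0]) * factor)]
        for y in range(len(pixels) * factor)
    ]

small = [[0, 1],
         [2, 3]]
big = upscale(small, 2)
# Each source pixel becomes a 2x2 block in the 4x4 result.
```

The gap between this blocky repetition and what GFPGAN-style models produce is exactly the “invented” detail that makes AI upscaling both impressive and, for authenticity purposes, tricky.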

Beyond sheer size, you may want to enhance a specific part of an image, sharpen the borders, or fix a blurry face. As I said before, Dalle2 is impressive at composing a scene, but it sucks at making more than one face at a time. Luckily, we can use tools like GFPGAN or FaceRestoration to enhance faces with good results. This has become a fundamental part of building new images: the best approach is always to cross over tools.

3- DATA VISUALIZATION

Right now, we are in the early stages of understanding how this can be a tool for us data storytellers, and we have started from the very beginning: the study of semantics and quantitative semantics. In this prompt book, you will find research on some of the essential qualities of a visualization (shape, color, density, contrast) and the other basics that our beloved cartographer Jacques Bertin established as the foundation of data visualization. In the book, you will also find a simple guide to creating your own semantic fields, plus some early experiments we have done using them, like the ones below.

4- POTENTIAL USES

I'm confident that these tools will become mainstream as they get easier to use, better designed, more centralized (sadly), and more results-oriented. Right now, we are at the moment where most users are early adopters, learning and experimenting without yet showing a clear intention behind it. Still, I've started to see the first cases that hint at the tools' potential, like this one, working on civic engagement around the future architecture of a place.

Text-to-image has, first, the power to eliminate interface barriers and, second, the potential to render almost anything in the mind of the user. That puts enormous power in the hands of people who often go unheard and are disempowered by the lack of projection and design tools. This is just one case, but soon we will disclose more possibilities we are already developing in our studio.

5- CONCERNS

These tools have opened a massive debate on authorship and the threat they pose to many creative professions, so I will share the main insights from two of the best articles I’ve read on the topic (which are somewhat antagonistic).

The first, from Future History by Daniel Jeffries, asks whether AI is stealing “art” and how to manage intellectual property. Here is an excerpt:

“There's a growing fear of AI training on big datasets where they didn't get the consent of every single image owner in their archive. This kind of thinking is deeply misguided and it reminds me of early internet critics who wanted to force people to get the permission of anyone they linked to. Imagine I had to go get the permission of Time magazine to link to them in this article, or Disney, or a random blog post with someone's information I couldn't find. It would turn all of us into one person IP request departments. Would people even respond? Would they care? What if they were getting millions of requests per day? It wouldn't scale. And what a colossal waste of time and creativity!”

The second article, Towards a Sustainable Generative AI Revolution by Javier Ideami, offers a more profound insight into how these models work compared to our brains and builds a critical argument around how these AI models have been constructed without artists' consent. Summarizing his summary:

The good news:

  • Generative AI won’t replace human creativity. It will enhance it.
  • This technology demystifies creativity. Think of what Edison said: Genius is 99% perspiration (combination, recombination, productive work and experimentation) and 1% inspiration (establishing the seeds, polishing, etc). Thanks to this new technology, we now realize that we can automate a large percentage of the creative process.
  • Although some jobs are in danger, it is also highly likely that new roles that we cannot yet imagine, will emerge from the need to manage and interact with this technology.
  • A good number of people who may not be professional artists, but that have a natural predisposition to exercise their creative muscles, will thrive with this new technology. They will strengthen those muscles in faster and easier ways, and they will enjoy new opportunities to augment and amplify their creative potential.

The tricky:

  • AI generative systems are only possible because of the giant datasets of images, videos, and text used to train them. Some of the data in these datasets is public domain.
  • But a good part of the data belongs to living artists who have not declared it public domain.
  • These are artists who make their living by selling such work (decades of hard work that have produced a specific style and a series of pieces).
  • These artists are, indeed, the foundation on which this revolution supports itself in its meteoric rise. And so, an increasing chorus of living artists is complaining about this. Some state that the works of living artists should not be included in these datasets. According to some, their complaints have fallen on deaf ears, largely being ignored (at least so far).
  • I will always support generative AI, but above all, I will support and defend my fellow creatives (because people and their lives should always matter more than technology).


That’s it! I hope you enjoyed this article. In the next one, we will explore applying the prompt book to real data visualization projects and take a first look at working with AI video creation and data. If you are interested, subscribe to our data newsletter!
