How to be specific in the AI image creation process
Robert Svebeck
Driving Responsible AI Implementation in Region Stockholm / Karolinska University Hospital
Disclaimer: The pace of development in this area is so incredible that writing a "how to" article is almost a waste of time, because at the time of finishing the article, the method is likely already obsolete and replaced by a better method. So anyway, this is the way to do it in early October 2022... ;-)
What to do when you need a very specific image but you can't draw it yourself.
This article might seem like an instruction, and part of it is. But at the end, I am also going to approach the topic from a strict business perspective. How can these new AI image generators have an impact on your business? From this perspective, I am addressing artists, photographers, content creators and their customers.
But first, let's look at what is already possible today, so you are aware of the new process for content creation.
The instruction part
A friend of mine wanted to make a Star Wars inspired retake on the famous painting "The Last Supper" by Leonardo da Vinci.
As many of you know by now, this is rather easy to do today using one of the existing AI image generators, like DALL-E, Midjourney or any of the different open source versions of Stable Diffusion. It is easy and cheap, and sometimes even completely free. All you need to do is give the AI a simple and descriptive prompt, and the AI will generate perhaps something like this:
The prompt I used here is:
a mural painting of (star wars) as the last supper, darth vader at the center, centered, highly detailed oil painting, Albrecht
I will use this text style throughout this article for all prompts used. However, this is not an article about prompt writing; there are already plenty of those around.
As you can see, the image above has only a pretty weak Star Wars theme. It's not at all an amazing image at this point. We have a Darth Vader in the center as a Jesus figure and some kind of weird laser sword thing, and that's about it. (Btw... is that Walter White from Breaking Bad next to Darth?!)
If you spend more time on the construction of the prompt, you can get hundreds of amazing results. You might be able to add more Star Wars content into the image... but there is one big problem. The content of the images will be highly random, and writing a very long prompt does not help much, because the AI will not follow it well.
But often, especially in professional use cases, you need the image to have more specific content.
Imagine that what your client is asking for is exactly this:
Luke Skywalker on the left side and next to him The Mandalorian, then a Sith guy on his right side. You want to keep Darth Vader as Jesus in the center but have a stormtrooper on his right and then Chewbacca next to the stormtrooper on the far right side. Finally, your customer also wants R2-D2 under the table, partly hidden by the tablecloth, and the whole scene should be inside a Star Wars spaceship.
Then what? Is this the moment you call an artist and ask for a quote?
Maybe later, but not yet. You can still create that very specific image without any drawing skills. I will get back to when you will need that artist.
Inpainting
The process for achieving this specific, highly detailed image is called "inpainting" and is available in many existing AI tools, even as a plugin for Photoshop.
Inpainting means gradually, piece by piece, replacing parts of an existing image with something new.
The process
We will use the above image, keep Darth Vader in the center and gradually replace everything else until we are happy with the image - until the image has the content we are looking for.
Let's start by changing the character to the right of Darth Vader into a stormtrooper. In the software we use (which has an inpainting feature), we mark the area we want to work with (painting it white, like below):
And for that particular (white) area we create a new prompt. This will give us something like this:
man sitting at table wearing a storm trooper helmet from star wars
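For the technically curious, here is a minimal sketch of what the white area means. It is not how any particular tool is implemented, just an illustration under simple assumptions: the images are tiny grids of grey values, the box coordinates are made up, and the "generated" image is a stand-in for whatever the AI paints from the new prompt. A real inpainting model also looks at the untouched surroundings so the new content blends in; the sketch only shows which pixels are allowed to change.

```python
# Pure-Python sketch: an image is a list of rows, each pixel one grey value (0-255).
# Real tools work on colour images, but the masking rule is the same for every pixel.

WHITE = 255  # mask value meaning "replace this pixel"
BLACK = 0    # mask value meaning "keep the original pixel"


def make_mask(width, height, box):
    """Return a mask that is white inside box=(left, top, right, bottom), black elsewhere."""
    left, top, right, bottom = box
    return [
        [WHITE if left <= x < right and top <= y < bottom else BLACK
         for x in range(width)]
        for y in range(height)
    ]


def composite(original, generated, mask):
    """Take pixels from `generated` where the mask is white, from `original` elsewhere."""
    return [
        [g if m == WHITE else o for o, g, m in zip(orig_row, gen_row, mask_row)]
        for orig_row, gen_row, mask_row in zip(original, generated, mask)
    ]


original = [[10] * 6 for _ in range(4)]   # stand-in for the first image
generated = [[99] * 6 for _ in range(4)]  # stand-in for the AI output for the new prompt
mask = make_mask(6, 4, box=(3, 1, 5, 3))  # hypothetical area right of the centre

result = composite(original, generated, mask)
for row in result:
    print(row)
```

Everything outside the white box stays exactly as it was, which is why Darth Vader in the center survives every step.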
The process continues in this way: replacing one thing at a time, each with a new prompt, the picture develops step by step.
Chewbacca from star wars sitting at a table holding a light saber
emperor palpatine from star wars sitting at a table (wearing a black cloak and hood)
darth maul from star wars sitting at a table
Continue by changing the ceiling to look like a Star Wars spaceship, and add more detail to the stormtrooper's body.
bulkhead walls of a star wars space ship with portholes in middle
R2-D2 from star wars hiding beneath a tablecloth, legs sticking out
luke skywalker face, side profile
darth vader's hands grasping a table
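The step-by-step process above can be sketched as a simple loop. This is only an illustration: the prompts are the ones quoted in this article, but the box coordinates are made-up placeholders, and `fake_inpaint` is a hypothetical stand-in for whatever inpainting tool you use (it just records what was asked for, it does not generate anything).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class Step:
    box: Box     # area painted white in the mask (placeholder coordinates)
    prompt: str  # prompt used for that area only


# Prompts as quoted in the article; every box is an invented example value.
STEPS: List[Step] = [
    Step((300, 180, 380, 330), "man sitting at table wearing a storm trooper helmet from star wars"),
    Step((400, 170, 500, 340), "Chewbacca from star wars sitting at a table holding a light saber"),
    Step((90, 180, 170, 330), "emperor palpatine from star wars sitting at a table (wearing a black cloak and hood)"),
    Step((170, 180, 240, 330), "darth maul from star wars sitting at a table"),
    Step((0, 0, 512, 150), "bulkhead walls of a star wars space ship with portholes in middle"),
    Step((220, 360, 300, 460), "R2-D2 from star wars hiding beneath a tablecloth, legs sticking out"),
    Step((20, 190, 80, 260), "luke skywalker face, side profile"),
    Step((220, 300, 300, 340), "darth vader's hands grasping a table"),
]


def run_inpainting(image, steps: List[Step], inpaint: Callable):
    """Apply the steps in order; each step starts from the previous step's result."""
    history = [image]
    for step in steps:
        image = inpaint(image, step.box, step.prompt)
        history.append(image)  # keep every intermediate so a bad step can be redone
    return image, history


def fake_inpaint(image, box, prompt):
    """Hypothetical stand-in for a real tool: records which prompt was applied where."""
    return image + [(box, prompt)]


final, history = run_inpainting([], STEPS, fake_inpaint)
print(len(history) - 1, "inpainting steps applied")
```

Keeping the intermediate images is a deliberate choice: in practice you often regenerate a single area several times before you are happy, and you do not want to lose the parts that already work.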
At the end, after going through the process with more and more details, we get a final image, which we "upscale" to a higher resolution (1024x1024).
Image is done.
(Btw... "Upscalers" - another growing AI area, worth another article...)
The business perspective
Can this also be done with images of professional photographic quality? Some might say yes, but in my opinion the end quality is not good enough yet. At the very least, it would take quite a lot of time to reach that quality, and it is probably still more efficient to hire an artist. Eventually it will surely be possible, probably already next year (or next month).
The artist is still needed - but you will become a better customer to them
So if this image was for a professional production to be used in marketing, you would most likely still need to get a real artist or a photographer involved to reach that perfect professional quality. But the advantage is that you will have a much smoother process with the creator when you describe the result you expect.
And photographers and artists should expect their clients to start delivering much more precise requests.
For a private project, like a birthday card, school project or a club membership invite, this is absolutely good enough. Did we take the job from an artist? Not yet. Professional users still need the artist, but will be able to give a very detailed description of the job. Private use cases most likely rarely involved a paid artist anyway. They were solved by someone using copy/paste in MS Paint or similar, or by some amateur painter drawing it by hand for free.
How and when will this change your process?
I am very interested to hear your thoughts, and if you are an artist or photographer and want to learn more about this, please reach out. I am also absolutely interested in showing you what this is and how it works, and most of all, talk about how you can include this in your workflow. Just reach out!
Thanks for reading,
Robert Svebeck
Special thanks to my friend @supermegason who was the one doing the actual images for this article. He is active on Discord, Youtube and Reddit under that username.