How is AI supercharging Human Creativity?
Yasmine M.
AI/ML Tech Lead | 7+ years in Data & AI | #sustainability #data #machinelearning #future | Ex-Microsoft, Ubisoft
AI image generators are becoming ubiquitous and saw a significant rise last summer with the release of models with tremendous capabilities, such as MidJourney and the open-source Stable Diffusion.
This topic is one I am deeply interested in, and the advances in the field have been mind-blowing, especially since I am both an artist and an ML developer. Thus, I wanted to dig deeper into these new tools and give you an overview and my current thoughts.
This article is an extended version of a short talk I gave at Microsoft during an internal sharing session around various topics. I am part of the Canadian AI specialised cloud architects, where we help every organization to achieve more with data & AI.
First: a primer on text-to-art generators
Prompt: “the Little Prince Saint Exupéry and his fox friend in the forest watching over a bunch of flowers” with MidJourney
This image was generated with such a tool: I wrote the text prompt “the Little Prince Saint Exupéry and his fox friend in the forest watching over a bunch of flowers”.
Of course, I had a certain vision in mind: the Little Prince is a book, and I wanted an illustration with those characters. In that case, it seems to be a pretty good artwork, quite abstract in a sense with an ethereal feel, but still pretty good in its "reinterpretation". It's shocking in that sense, as we could call this "artistic". However, let's see how it works behind the scenes and what makes these tools produce such "believable" artworks.
Most of the recent image generator tools work like that: they take a prompt as input, then take a few seconds, up to 30 seconds I'd say, to generate a few images.
Under the hood, they mostly rely on the latest breakthrough in text-to-image models: diffusion models.
The "text-guided" part refers to the fact that the model learns associations between millions of caption and image pairs. That's how it learns to generate images linked to a specific text prompt.
As for the second part, diffusion, it refers to the fact that the model essentially tries to do the reverse process, understanding how each paint droplet was put on an artwork.
Basically, it learns by destroying the training data by adding noise, so that's the first step in the figure below:
Then it learns to recover the data step by step by reversing the noising process, which can simply be illustrated like that.
In fact, the model has learned from all of this and will output pixel values step by step until a full picture is depicted.
It doesn't necessarily use the images themselves; rather, it has millions to billions of model parameters that are used each time it tries to generate something new. So it just needs a text prompt and it will generate something from noise. You can also regenerate from the same text prompt and you'll usually get different results.
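To make the "destroy by adding noise" step concrete, here is a minimal NumPy sketch of the forward noising process used in DDPM-style diffusion models. The noise schedule (beta from 1e-4 to 0.02 over 1000 steps) follows the common DDPM setup; this illustrates the principle, not any particular tool's implementation.

```python
import numpy as np

def forward_noise(x0, t, T=1000, beta_min=1e-4, beta_max=0.02, rng=None):
    """Destroy data by mixing in Gaussian noise, DDPM-style.

    x0 : clean image as a float array, t : diffusion step in [0, T-1].
    Returns the noised sample x_t and the noise that was added,
    which is what the model learns to predict and subtract back out.
    """
    rng = np.random.default_rng(rng)
    betas = np.linspace(beta_min, beta_max, T)   # noise schedule
    alpha_bar = np.cumprod(1.0 - betas)[t]       # fraction of signal kept at step t
    eps = rng.standard_normal(x0.shape)          # fresh Gaussian noise
    x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return x_t, eps

# A tiny "image": at t=0 it is barely noised, near t=T it is almost pure noise.
img = np.ones((4, 4))
x_early, _ = forward_noise(img, t=0, rng=0)
x_late, _ = forward_noise(img, t=999, rng=0)
```

Training teaches a network to predict `eps` from `x_t`; generation then starts from pure noise and applies that prediction step by step in reverse, which is why you see a blurry image sharpen progressively.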
Let's compare different image generation tools
DALL-E by OpenAI
DALL-E by OpenAI (a private company) was the first to gain critical exposure through its text-to-image capabilities. How does it compare with these newcomers?
The name DALL-E is a combination of WALL-E (the Pixar movie) and Salvador Dalí the artist.
Here, I generated images depicting an astronaut in a tropical resort. The generated images are interesting, and I wanted to modify the astronaut so that it was distinctively a female astronaut sipping a cocktail (yes, in space). With DALL-E, one can modify part of an image by highlighting it and changing the prompt; as you can see here, it was quite effective.
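Conceptually, this "highlight and re-prompt" edit works with a mask: only the selected pixels get repainted by the model, everything else is preserved. A tiny NumPy sketch of that mask-compositing idea (the `repainted` array stands in for real model output, which would come from the prompt; this is an illustration, not the DALL-E API itself):

```python
import numpy as np

def apply_edit(image, mask, repainted):
    """Mask-based editing: pixels where the mask is 1 are replaced
    by newly generated content, the rest of the original image is
    kept untouched."""
    mask = np.asarray(mask, bool)
    return np.where(mask, repainted, image)

original = np.zeros((2, 2))       # stand-in for the astronaut photo
mask = [[0, 1], [0, 1]]           # the "highlighted" right column
generated = np.full((2, 2), 9.0)  # stand-in for the model's repaint
edited = apply_edit(original, mask, generated)
```

In the real tool, the model also conditions on the surrounding unmasked pixels, which is why the repainted region blends in instead of looking pasted on.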
One can also go from an image and generate variations from it.
DALL-E also prides itself on product-design ideation: I just put what I had in mind as a product into a text prompt and ta-da, I got super realistic photos! Impressive!
MidJourney's tool
MidJourney offers a Discord bot to interact with in order to generate images, and we have two choices: upscale an image or generate other variations from it. One text prompt generates 4 different images in a 2x2 grid:
Here I chose to do a string of operations to observe MidJourney's capabilities in concept art & illustration, as it's mostly well-known for that. It gave me some nice options even though it didn't exactly follow my prompt as I wanted. This process of iterating and re-engineering the words you use is not dissimilar to getting to know a new search engine and "fine-tuning" your keywords; it's now almost an art, called "prompt engineering".
Who knows, maybe we'll have expert prompt engineers in the future x) Note: people are already publishing books and content related to prompt engineering and leveraging AI tools for artistic creation.
I tried generating variations from the generated images by using the "v3" button, which refers to the bottom-left image.
MidJourney wasn’t good with faces; however, it’s been evolving pretty quickly.
You can also regenerate from the same original prompt and get different results.
Overall it wasn't necessarily what I had in mind in terms of artwork, but it definitely was moving and an exciting experiment. Let's try some other prompts.
Midjourney concept art
Here, the concept art looks pretty good and abstract, whereas I was referencing literature (Dune, Spice) and had a specific character in mind. When I upscaled the bottom-right image, more details started appearing, which was interesting. The rendering does add more pixels, and sometimes the image differs a lot from the minimized version. In the right image, we can maybe see a "galactic princess" hiding in the sand?
I also used some video game characters as references. The faces were somewhat good. The reference to Thanatos is linked to death in Greek mythology, which it seems to have picked up (it didn't get the references from the game that much, but the generated artworks still look pretty "cool"!).
So I wasn't expecting MidJourney to know them, but because it was trained on so many text and image pairs, I was wondering if it would output something coherent.
In the end it did. It is aesthetically pleasing and realistic enough to have been "made" by a human.
About the process: at first you get something really blurry, and it refines itself bit by bit, so you can see the model progressing step by step while using MidJourney's tool.
Just seeing the process can bring a sense of wonderment and joy, going from the first pixels to the final picture!
Dreamstudio (Stable Diffusion)
Dreamstudio by Stability AI is another tool, leveraging the open-source model "Stable Diffusion".
There definitely is a controversy surrounding Stable Diffusion, and with reason: as we can see, it performs marvelously well at creating images in the "style" of well-known contemporary artists. The reason is that it has been trained on a dataset containing images from these artists (which is a grey area, we'll ponder that next).
The others haven't been able to produce something quite similar, although MidJourney prides itself on generating "popular" art (it definitely has a distinctive style and is able to appeal to a large audience).
Recap
From what I tried out, I made a table depicting the "specialties" of each of these tools.
At the moment, the artistic creations of an AI can't be protected under copyright laws. *DALL-E now allows faces
About the limitations
Overall, there is a lack of transparency and accountability: it was a bit hard to get info on which datasets had been used. This poses a risk of rampant fraud and plagiarism because of the mainstream availability of these tools and technologies on the market.
Regarding MidJourney, billions of images were used for training, including copyrighted art from ArtStation and DeviantArt; that also explains why it's good at generating mainstream concept art and illustration. As for Dreamstudio (Stable Diffusion), it's pretty similar as well.
There also is an open-source debate: most of the private companies keep their models under wraps because they assume they are not safe for public release (until there are mechanisms in place to prevent abuse), and make them accessible as a beta first. It's a pretty good principle, in my opinion, in terms of trying to apply responsible AI.
It seems more ethical to train a model on your own past creations, or to limit a training dataset to artistic works from the public domain and open-source images.
Potential – Technology at the service of creation
Technology is already being used to help automate tedious processes in 2D and 3D animation: AI fills in the in-between frames by calculating realistic trajectories based on training. The following image sequence is such an example for a video game character (Ubisoft La Forge).
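The in-betweening idea can be illustrated with a deliberately naive version: given two keyframe poses, generate the frames between them. Learned systems like the one above produce non-linear, physically plausible trajectories; this sketch uses plain linear interpolation as a stand-in, so it shows the task, not their method.

```python
import numpy as np

def inbetween(pose_a, pose_b, n_frames):
    """Naive in-betweening: linearly interpolate joint positions
    between two keyframe poses. A trained model would replace this
    straight line with a realistic, learned trajectory."""
    pose_a = np.asarray(pose_a, float)
    pose_b = np.asarray(pose_b, float)
    ts = np.linspace(0.0, 1.0, n_frames + 2)  # include both keyframes
    return [(1 - t) * pose_a + t * pose_b for t in ts]

# Two keyframes of a 2-joint "skeleton", each joint an (x, y) position:
frames = inbetween([[0, 0], [1, 0]], [[0, 2], [1, 2]], n_frames=3)
# frames[0] is the first keyframe, frames[-1] the second.
```

The artist still sets the keyframes and the intent; the system just saves the repetitive work of drawing every intermediate frame.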
There are also features in digital painting software that aim to help artists, such as:
Design at everyone's reach
Thus, the biggest change with this technology is that design, from ideation (you still need to start from text-based prompts for now...) to a rendered design to share, is accessible to anyone. This will have a tremendous impact on the way we relate to, interact with, and go about creating art.
Some creators are not worried, and it's true that it doesn't necessarily remove the human being in the loop. Human collaboration, vision and reworks still exist in these kinds of technical jobs dealing with an artistic vision. It actually enables creativity by bridging the gap between ideas and technical skills; it mostly feels like a superpowered search engine.
On my side, I wouldn't dare claim an artwork generated by it as my own, because it would make no sense for me and for how I relate to art practice and my craft. However, I can see the potential for product design, but also for generating mood boards and getting inspiration from them.
Here are some examples of creators using these tools in new ways:
Using DALL-E & other tools to generate outfits in a walking fake video by @karenxchang: this creator generated outfits with DALL-E and combined them all in a nice little demo. The image given to DALL-E was their entire body, and DALL-E changed the outfit every time; they just had to assemble the static images.
This video uses AI to generate images matching music lyrics. It still required additional input beyond just the song lyrics to achieve the music video he was looking for: they modified lyrics, added keyframes for camera motion and synchronized them, so it did take human effort.
In conclusion, what does it mean for creators?
In the end these tools enable creators to:
That’s why it’s realistic to say we still need artists, and the models would still need them too, to keep up to date with trends… This appeals now, but what about the future, or a really specific style that emerges tomorrow?
Potential sectors impacted
By the way, Microsoft Designer just released in closed beta (it uses DALL-E)!
To wrap it up, here's DALL-E's newest feature, "outpainting", which is absolutely enthralling to see.
“Girl with a Pearl Earring” with an “outpainting”-filled background. August Kamp / OpenAI / Johannes Vermeer (I especially like the fact that they credited all the creators here)
Final thoughts around the topic
Overall, it was super fun to experiment with, even though I was wary of its capabilities due to the hype and the wariness coming from the art world.
There is the thrill of generating something from words in seconds and seeing what the AI will output, fed by all this human imagination.