How is AI supercharging Human Creativity?
Image generated using MidJourney


AI image generators are becoming ubiquitous and saw a significant rise last summer with the release of tremendously capable models, such as the open-source Stable Diffusion, alongside tools like MidJourney.

This topic is one I am deeply interested in, and the advances in the field have been mind-blowing, especially since I am both an artist and an ML developer. Thus, I wanted to dig deeper into these new tools and give you an overview and my current thoughts.

This article is an extended version of a short talk I gave at Microsoft during an internal sharing session on various topics. I am part of the Canadian AI-specialized cloud architects, where we help organizations achieve more with data & AI.

First: a primer on text-to-art generators


Prompt: “the Little Prince Saint Exupéry and his fox friend in the forest watching over a bunch of flowers” with MidJourney

This image was generated with such a tool: I wrote the text prompt “the Little Prince Saint Exupéry and his fox friend in the forest watching over a bunch of flowers”.

Of course, I had a certain vision in mind: The Little Prince is a book, and I wanted an illustration featuring its characters. The result seems like a pretty good artwork, quite abstract with an ethereal feel, but convincing in its "reinterpretation". It's striking in that sense, as we could call this "artistic". Let's see how it works behind the scenes and what makes this tool produce such "believable" artworks.

Most of the recent image generation tools work the same way: they take a prompt as input and, within a few seconds (up to 30, I'd say), generate a few images.

Under the hood, they mostly rely on the latest breakthrough in text-to-image models: diffusion models.

The "text guided" part refers to the fact that the model is learning associations between millions of captions and image pairs.?That's how it can learn to generate images linked to a specific text prompt.?

As for the second part, diffusion, it refers to the fact that the model tries to do the reverse process, a bit like understanding how each paint droplet was put on an artwork.

Basically, it learns by destroying the training data through added noise; that's the first step in the figure below:

[Figure: training images progressively destroyed by added noise, then recovered step by step]

Then it learns to recover the data step by step by reversing the noising process, which is what the figure illustrates.

Once trained, the model outputs pixel values step by step until a full picture is depicted.

It doesn't use the images themselves at generation time; instead, it has millions to billions of model parameters, tuned during training, that are used each time it generates something new. It just needs a text prompt, and it will generate something from noise. You can also regenerate from the same text prompt, and you'll usually get different results.
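To make this more concrete, here is a minimal NumPy sketch of the forward (noising) half of the process. The image size, schedule values, and names are illustrative assumptions on my part, and the large neural network that learns to undo the noise is left out.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # a different seed -> different generations

# A linear noise schedule over T steps (illustrative values).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)  # how much of the original survives at step t

x0 = rng.uniform(-1.0, 1.0, size=(64, 64))  # stand-in for a training image

def add_noise(x0, t):
    """Forward diffusion: destroy the image by mixing in Gaussian noise.
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    """
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

# Training: the model (a large neural net, not shown) sees the noisy image x_t
# plus a text embedding, and learns to predict the noise eps that was added.
x_noisy, eps = add_noise(x0, t=500)

# Generation runs the process in reverse: start from pure noise and denoise
# step by step, which is why previews start blurry and sharpen progressively.
```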


Let's compare different image generation tools

DALL-E by OpenAI

DALL-E by OpenAI (a private company) was the first to gain critical exposure for its text-to-image capabilities. How does it compare with the newcomers?

The name DALL-E is a combination of WALL-E (the Pixar movie) and Salvador Dalí the artist.

[Images: an astronaut in a tropical resort, with an inpainted edit]

Here, I generated images depicting an astronaut in a tropical resort. The results are interesting, and I wanted to modify the astronaut into a distinctly female astronaut sipping a cocktail (yes, in space). With DALL-E, one can modify part of an image by highlighting it and changing the prompt; as you can see here, it was quite effective.

One can also start from an image and generate variations of it.

[Images: variations generated from a source image]

DALL-E also prides itself on going from product design ideation to image: I just described the product I had in mind as a text prompt and, ta-da, I got super realistic photos! Impressive!

MidJourney's tool

MidJourney offers a Discord bot to interact with in order to generate images, and we have two choices: upscale an image or generate further variations of it. One text prompt generates 4 different images in a 2x2 grid:

[Images: MidJourney's 2x2 grids of generated images]

Here I chose to run a string of operations to observe MidJourney's capabilities in concept art & illustration, as that's what it's mostly well known for. It gave me some nice options, even though it didn't follow my prompt exactly as I wanted. This process of iterating and re-engineering the words you use is not dissimilar to getting to know a new search engine and "fine-tuning" your keywords; it's now almost an art, called "prompt engineering".


Who knows, maybe we'll have expert prompt engineers in the future x) Note: people are already publishing books and content about prompt engineering and leveraging AI tools for artistic creation.


[Image: variations generated from the bottom-left image]

I tried generating variations of the generated images by using the "V3" button, which refers to the bottom-left image.

MidJourney wasn't good with faces; however, it's been evolving pretty quickly.

You can also regenerate from the same original prompt and get different results.


Overall, it wasn't necessarily what I had in mind in terms of artwork, but it was definitely moving and an exciting experiment. Let's try some other prompts.

MidJourney concept art

[Images: Dune-inspired concept art grid and an upscaled result]

Here, the concept art looks pretty good, though abstract, whereas I was referencing literature (Dune, the spice) and had a specific character in mind. When I upscaled the bottom-right image, more details started appearing, which was interesting: the rendering adds in more pixels, and sometimes the image differs a lot from the small version. In the right image, we can maybe see a "galactic princess" hiding in the sand?

[Images: generations referencing video game characters]

I also used some video game characters as references. The faces came out somewhat well. Thanatos is linked to death in Greek mythology, which the model seems to have picked up (it didn't get the references from the game that much, but the generated artworks still look pretty "cool"!).

I wasn't expecting MidJourney to know these characters, but because it was trained on so many text and image pairs, I was wondering whether it would output something coherent.

In the end, it did. The result is aesthetically pleasing and realistic enough to have been "made" by a human.

[Image: intermediate, progressively sharper steps of a MidJourney generation]

About the process: at first you get something really blurry, and it sharpens itself bit by bit, so you can watch the model progress step by step while using MidJourney's tool.

Just watching the process, from the first pixels to the final picture, can bring a sense of wonder and joy!

Dreamstudio (Stable Diffusion)

DreamStudio by Stability AI is another tool, this one leveraging the open-source model "Stable Diffusion".
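Because the Stable Diffusion weights are open, you can also run the model yourself. Here is a minimal sketch using Hugging Face's diffusers library; it assumes you have a CUDA GPU and have accepted the model license, and the model ID and defaults may have changed since this article was written.

```python
import torch
from diffusers import StableDiffusionPipeline

# Download the open-source Stable Diffusion weights and move them to the GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Fixing the random seed makes the result reproducible; change the seed (or
# drop the generator) to get different images from the same prompt.
generator = torch.Generator("cuda").manual_seed(42)

image = pipe(
    "the Little Prince and his fox friend in the forest, ethereal illustration",
    generator=generator,
).images[0]
image.save("little_prince.png")
```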

[Images: generations in the style of well-known contemporary artists]

There definitely is a controversy surrounding Stable Diffusion, and with reason: as we can see, it performs marvelously well at creating images in the "style" of well-known contemporary artists. The reason is that it was trained on a dataset including images from these artists (a grey area we'll ponder next).

The other tools haven't been able to produce anything quite similar, although MidJourney prides itself on generating "popular" art (it definitely has a distinctive style and appeals to a large audience).


Recap

From what I tried out, I made a table depicting the "specialties" of each tool.

[Table: the "specialties" of each tool]

At the moment, the artistic creations of an AI can't be protected under copyright law. *DALL-E now allows faces.


About the limitations

  1. Training such models is polluting. GPU machines were kept running for days (around 30 for some), requiring a lot of compute, and it was hard to get information about that. The models also have billions of parameters that are used each time they generate something new! Models are becoming more and more optimized, though: you can now run the Stable Diffusion model on consumer-grade graphics cards because it "only" has around 890 million parameters.
  2. There is bias within the models. Because you need a large training dataset, these companies have been using web-scraped datasets such as LAION, which is authorized for research purposes only; it is "not meant for real-world production or application", yet these tools then become available to the public for commercial use, which is problematic. Scraped data is not curated: it comes with inherent biases and can include harmful stereotypes and representations that the model then learns. DALL-E did remove the most explicit content from its training data. Some companies also have content policies under which specific prompts are not allowed, and you have to comply with their safety policy.
  3. The field is fast-paced. It keeps evolving quickly, and legislation has a hard time keeping up: there is no legislation over what is generated, and AI-generated images are not copyrighted, even though some of the companies behind these tools claim copyright over them and offer commercial use under conditions to their customers.
  4. Ethics is also a sensitive topic: who is the creator in the end? The user, the model, or the original artists (or a combination of all)? For research purposes it's fine to use scraped data; the problem arises when these tools are provided as services that can be used commercially. Furthermore, how could artists take their art out of the training data if it was used against their will? They can't, because the model has already been trained on it, so the situation remains blurry.

Overall, there is a lack of transparency and accountability: it was a bit hard to get info on which datasets were used. This all poses a risk of rampant fraud and plagiarism, given the mainstream availability of these tools and technologies on the market.

Regarding MidJourney, billions of images were used for training, including copyrighted art from ArtStation and DeviantArt; that also explains why it's good at generating mainstream concept art and illustration. As for DreamStudio (Stable Diffusion), it's pretty similar.

There is also an open-source debate: most private companies keep their models under wraps because they deem them unsafe for public release (until mechanisms are in place to prevent abuse) and make them accessible as a beta first. That's a pretty good principle, in my opinion, in terms of trying to apply responsible AI.

It seems more ethical to train a model on your own past creations, or to limit a training dataset to artistic works from the public domain and open-source images.


Potential – Technology at the service of creation

Technology is already being used to help automate tedious processes in 2D and 3D animation: AI fills in the in-between frames by calculating realistic trajectories learned during training. The following image sequence is such an example, for a video game character (Ubisoft La Forge).

[Image sequence: AI-generated in-between frames for a video game character]
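As a rough illustration of what "filling the in-betweens" means, here is a naive sketch that linearly interpolates joint positions between two keyframe poses. The poses and joint layout are made up for the example; research like Ubisoft La Forge's replaces this straight-line guess with trajectories predicted by a model trained on motion data.

```python
import numpy as np

def inbetween(key_a, key_b, n_frames):
    """Naive in-betweening: linearly interpolate joint positions between two
    keyframes. A learned model would output more realistic trajectories."""
    ts = np.linspace(0.0, 1.0, n_frames + 2)[1:-1]  # skip the keyframes themselves
    return [key_a + (key_b - key_a) * t for t in ts]

# Two keyframes for a tiny 3-joint "skeleton", as (x, y) positions per joint.
pose_start = np.array([[0.0, 0.0], [0.5, 1.0], [1.0, 0.0]])
pose_end   = np.array([[1.0, 0.2], [1.5, 1.2], [2.0, 0.1]])

for frame in inbetween(pose_start, pose_end, n_frames=3):
    print(frame.round(2))
```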

There are also features in digital painting software aiming to help artists, such as:

  • Smart filling: knowing where to color based on a line art (see the naive sketch after this list).
  • Denoising: resampling images and erasing JPEG noise so you can reuse past work at a larger size.
  • Colorization: still in an experimentation phase in most software, and not working that well yet.
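For intuition, the simplest version of smart filling is a plain bucket fill: paint every empty pixel reachable from a starting point without crossing a line. This toy sketch is my own illustration; real smart-fill features also close small gaps in the line art, which this naive version doesn't.

```python
import numpy as np
from collections import deque

def flood_fill(lineart, seed, color):
    """Bucket fill: color every empty pixel reachable from `seed` without
    crossing a line. `lineart` is a 2D array where nonzero means a line."""
    h, w = lineart.shape
    filled = lineart.copy()
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        if 0 <= y < h and 0 <= x < w and filled[y, x] == 0:
            filled[y, x] = color
            queue.extend([(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)])
    return filled

# A tiny 5x5 canvas: a closed square of lines (1) with an empty interior.
canvas = np.zeros((5, 5), dtype=int)
canvas[0, :] = canvas[-1, :] = canvas[:, 0] = canvas[:, -1] = 1
print(flood_fill(canvas, seed=(2, 2), color=7))
```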






Design within everyone's reach

Thus, the biggest change with this technology is that design, going from ideation (you still need to start from text-based prompts for now...) to a rendered design you can share, is accessible to anyone. This will have a tremendous impact on the way we relate to, interact with, and go about creating art.

Some creators are not worried, and it's true that it doesn't necessarily remove the human from the loop. Human collaboration, vision, and reworks still exist in these kinds of technical jobs dealing with an artistic vision. It actually enables creativity by bridging the gap between ideas and technical skills; it mostly feels like a superpowered search engine.

For my part, I wouldn't dare claim an artwork it generated as my own, because that would make no sense to me given how I relate to art practice and my craft. However, I can see the potential for product design, but also for generating mood boards and drawing inspiration from them.

Here are some examples of creators using these tools in new ways:

Using DALL-E & other tools to generate outfits in a fake walking video by @karenxchang: this creator generated outfits with DALL-E and combined them all into a nice little demo. The image given to DALL-E was their entire body, and DALL-E changed the outfit each time; they just had to assemble the static images.

Another video uses AI to generate images relating to music lyrics. It still required additional input beyond just the song lyrics to achieve the music video the creator was looking for: they modified lyrics, added keyframes for camera motion, and synchronized them, so it did take human effort.

In conclusion, what does it mean for creators?

In the end these tools enable creators to:

  • remove the technical barrier associated with mastering craft and tools: people can go from ideas to generated art or realistic photos, which can inspire them and be shared.
  • get inspired quickly and effectively (reference images) and integrate these tools into their workflow (inspiration, rework).
  • but also risk getting too reliant: image generators are still bound by the training dataset they were fed, equivalent to a "knowledge base"; this can limit creativity and restrict vision if one relies only on that.

That's why it's realistic to say we still need artists, and the models need them too in order to keep up to date with trends… these tools appeal now, but what about the future, or a really specific style that emerges tomorrow?

Potential sectors impacted

  • Media industry
  • Design and fashion industry
  • Etc.

By the way, Microsoft Designer was just released in closed beta (it uses DALL-E)!

To wrap it up, here's DALL-E's newest feature, "outpainting", which is absolutely enthralling to see:


“Girl with a Pearl Earring” with an “outpainting”-filled background. August Kamp / OpenAI / Johannes Vermeer (I especially like that they credited all the creators here).

Final thoughts on the topic

Overall, these tools were super fun to experiment with, even though I was wary of their capabilities given both the hype and the wariness coming from the art world.

  • There needs to be an open conversation, because this impacts how we relate to, interact with, and understand art and creativity in general.
  • The changes are hard to follow, but I'd advise anyone to try to keep up with the new possibilities out there. In my case, it's when communities I am part of are impacted that I try to learn more. I prefer to demystify things: as a technologist and artist, I use popular software features daily to automate some digital painting tasks, but this is a whole other level. Personally, it gives me no joy, and it wouldn't come to my mind, to claim that something an AI model generated is mine.

However, there is the thrill of generating something from words in seconds and seeing what the AI will output, fed by all this human imagination.


If you found this interesting, you can look for my other articles, subscribe to get notified when I publish, and follow me or reach out on LinkedIn. Thank you for reading :)
