Artificial Misinformation
Miss Arty Pants Collage https://missartypants.blogspot.com/2014/04/surrealism-collage.html


Like many others, I've been sharing a lot of stories and comments about the new AI tools exploding into the business world, and this has generated a lot of conversation: losing the human touch, destroying livelihoods, bias, the coming apocalypse. As a designer and artist with a Bachelor of Fine Arts, I have a fair number of creative connections, and most have a similar take on AI: that it's stealing. There is clearly a lot of misinformation about how AI image tools work. Many think they work like an elementary school art class, where you cut up existing images with scissors and glue them into a collage. I want to take the time to address these misconceptions from the perspective of a creator and bystander, rather than a machine learning programmer.

Annie's Art Room: https://anniesartroom.weebly.com/elementary-art/kindergarten-sun-collages

AI tools like DALL-E, MidJourney, etc. are considered by most, a vast majority of whom have neither used them nor researched them, to be simple collage tools: stealing bits and pieces from here and there, gluing them together to create a copyright-infringement Frankenstein's monster. Far from stealing, they are learning, hence the term "machine learning". Tools like MidJourney use GANs, or generative adversarial networks. The process works more like a college-level drawing class: learning by repetition and observation. While studying for my BFA at the New England School of Art & Design, our early "foundation" classes in color and drawing demanded repetitive tasks. Fill this board with 144 shades of red, from lighter to darker, using only red, white, and black. The next assignment: do it again with red, white, and green. In drawing, we would be assigned a vegetable (mine was a green pepper) and tasked with ten 5-minute drawings, four 15-minute drawings, two half-hour drawings, and one 1-hour drawing. Same paper, same charcoal, same angle.

Personal Student Work BFA Year 1. Medium: Charcoal on Paper


If you ever wondered why art students look thin and wrung out, now you know. We skipped a lot of meals to buy all that paint, and yes, we really did have 5-8 hours of homework every single day. And, according to my RISD-trained teachers, we had it easy.

Repetition is the key to how people learn and how machines learn. Imagine giving a child a crayon and asking it to draw a hat. For the better part of a year, the child is going to do one of a few things: drop the crayon, eat the crayon, or stick the crayon up its nose. At some point, it will begin scribbling on paper. Even when it does connect with the paper, it still doesn't know what a "hat" is, until you start to show it. Crayon after crayon, paper after paper, day after day, diaper after diaper, the child keeps drawing as the parent tries on hat after hat and holds up pictures of hats. If you've only been showing the child the same hat, you're going to get the same image of a hat back. But if you show it lots of hats, it will understand the concept of how hats COULD look, giving it the ability to draw something it's never seen, trained on previous input.

Red Fedora. Medium: MidJourney

This is essentially how GANs work, but a bit more brutal. A GAN is a two-part program, like an if/then chart. One part, the Generator, creates an image. The other part, the Discriminator, says "yes" or "no", "hot" or "cold". Think of a ballet master yelling "again!" over and over as the machine goes from a scribble to a photorealistic image after billions of attempts a day. The machine literally learns by doing it wrong and doing it better, while being its own taskmaster.
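To make that concrete, here's a minimal GAN training loop in PyTorch. This is an illustrative sketch, not MidJourney's actual code: the "images" are just 2-D points, and all the names and numbers are placeholders. It shows the two-part structure: the Generator turns random noise into a fake sample, the Discriminator labels samples real or fake, and each one's mistakes become the other's training signal.

```python
import torch
import torch.nn as nn

# The two halves of the GAN, kept deliberately tiny.
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))  # noise -> fake "image"
D = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))  # "image" -> real/fake logit

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def real_batch(n=64):
    # Stand-in for "millions of training photos": points clustered near (1, 1).
    return torch.randn(n, 2) * 0.1 + 1.0

for step in range(5000):
    # Discriminator's turn: real samples are labeled 1, the generator's fakes 0.
    real = real_batch()
    fake = G(torch.randn(64, 8)).detach()  # detach: don't train G on this pass
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator's turn: try to make D call its fakes "real" -- the "again!" signal.
    fake = G(torch.randn(64, 8))
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

Note the design point this illustrates: the generator never touches a training image. It only ever receives the discriminator's verdict, which matches the diagram discussed below.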

Not only did this happen during the training of the tools, it continues every single time a new image is created, as it's being created. "Does this look like a hat?" "No." "Does this look like a hat?" "Close. Again." Scroll through the slides below to see how MidJourney "develops" an image.
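As a rough sketch of that "does this look like a hat?" loop, the snippet below (reusing the G and D networks from the sketch above) keeps nudging a candidate until the discriminator scores it as convincingly real. This is one generic way to code check-while-creating feedback, not necessarily what MidJourney does internally.

```python
import torch

# Start from a random "scribble" (latent noise) and keep adjusting it
# until the discriminator is convinced. Reuses G and D from above.
z = torch.randn(1, 8, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.05)

for attempt in range(200):
    score = torch.sigmoid(D(G(z)))      # "does this look like a hat?"
    if score.item() > 0.95:             # "yes" -- good enough to stop
        break
    loss = -torch.log(score).mean()     # "close, again" -- push toward "real"
    opt.zero_grad(); loss.backward(); opt.step()
```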

The generator basically scribbles, and the discriminator holds the result up against the millions of sample hats it has seen until it is good enough. This takes place over millions of iterations, getting more efficient every time, until the discriminator can't tell the difference between source material and the AI image. And because tools like MidJourney accept prompts much more complicated than "hat", including type of lighting, mood, type of camera, lens and film, color, angle, artistic style, medium, hat style, hat material, background dressing, etc., you're looking at two trillion or more computer calculations.
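Those extra prompt details (lighting, lens, mood, and so on) have to reach the generator as numbers. Here's a hypothetical sketch, with a made-up mini vocabulary and an untrained embedding table standing in for a real text encoder: the prompt is encoded to a vector and concatenated with the noise, so the same scribble-and-refine machinery is steered by the words.

```python
import torch
import torch.nn as nn

# Made-up mini vocabulary; a real system uses a large trained text encoder.
VOCAB = {"red": 0, "fedora": 1, "soft": 2, "lighting": 3, "35mm": 4, "film": 5}
table = nn.Embedding(len(VOCAB), 16)  # untrained stand-in word vectors

def embed_prompt(prompt: str) -> torch.Tensor:
    # Average the word vectors -- a crude stand-in for real prompt encoding.
    ids = torch.tensor([VOCAB[w] for w in prompt.split() if w in VOCAB])
    return table(ids).mean(dim=0)

noise = torch.randn(16)
cond = embed_prompt("red fedora soft lighting 35mm film")
generator_input = torch.cat([noise, cond])  # noise + prompt steer the image together
print(generator_input.shape)                # torch.Size([32])
```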

Dusty Brown Fedora. Medium: MidJourney

This diagram shows how sample images (in the millions) are fed to the discriminator, NOT the generator. The generator is making its scribble without access to references, while the discriminator matches the samples against the generator's output and provides feedback to the generator.

https://developers.google.com/machine-learning/gan/gan_structure

This image shows how the generator output is matched against the samples in a GAN application.

https://developers.google.com/machine-learning/gan/gan_structure

The discriminator keeps providing feedback until it can no longer tell whether the generator's image is fake or real.
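One way to express "can no longer tell" in code is to check whether the discriminator's verdicts on a mixed batch are close to a coin flip. This continues with G, D, and real_batch from the training sketch above, and is purely illustrative.

```python
import torch

# When training converges, the discriminator's verdicts approach chance:
# it calls roughly half of everything "real", whether it is or not.
with torch.no_grad():
    real = real_batch()
    fake = G(torch.randn(64, 8))
    real_as_real = (torch.sigmoid(D(real)) > 0.5).float().mean().item()
    fake_as_real = (torch.sigmoid(D(fake)) > 0.5).float().mean().item()

# Near equilibrium both fractions hover around 0.5 -- the fakes pass.
print(f"D calls {real_as_real:.0%} of real and {fake_as_real:.0%} of fakes 'real'")
```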

It's impossible to talk about AI image tools without addressing copyright infringement. As you can see from the process above, the tool isn't "stealing" and chopping up images. One part of the tool creates while the other checks against a database of sample images. The sample images are where copyright infringement could come into play. MidJourney, DALL-E, Stability, etc. used images from the LAION image database project, a non-profit that collected hundreds of millions of public images to build a sample database for training text-to-image tools. They license the database to developers and researchers under the Creative Commons CC-BY 4.0 license. There has been some question that their non-curated, automated image scraping may also have swept up rights-protected images. For instance, they were found to have scraped Flickr images, which they removed. It's also possible that images posted online by their owners were not properly rights-managed or copyrighted, or that unscrupulous web users took images belonging to others and reposted them without permission or consent. Anyone who has ever posted a meme is responsible for this type of copyright infringement.

All rights reserved: Star Trek: The Next Generation, Paramount/CBS.


Essentially, responsibility for copyright protection, in my mind (and I'm not a lawyer), would fall on LAION, which is distributing its image database as rights-free when it may not be completely.

Hopefully this dispels the misinformation that MidJourney, DALL-E, and others are merely cutting and pasting pieces of copyright-protected art together. For more information on how text-to-image generation works, or to verify the accuracy of my article, visit this PBS story: https://www.pbs.org/newshour/science/how-ai-makes-images-based-on-a-few-words



Dave Jenkins

VP of Product and Research at Iterate.ai

1 year ago

Nice post-- pretty accurate on how the tech works. Some of the image generators may or may not be using discriminator models now-- but that was certainly a major part of the evolution.
