Spot the differences: How is AI art getting so much better, so fast?

See how two halves of a neural network collaborate in such programs, with one constantly telling the other: That’s not good enough. Over and over. Until it is.

Think about it, and it could soon be a picture. On the newest frontiers of generative AI image-making, the user inputs neither text nor image, just thought.

What does that mean for the future of storytelling, art, and the human-AI relationship? We’ll get to that. First, a bit of recap.

Algorithm-generated art isn’t, of course, new. Computer programs have been creating patterns and shapes since the 1960s. The Matrix Multiplication series is an early example: created by German mathematician Frieder Nake in 1967, these 12 artworks feature lines and geometric shapes produced by feeding mathematical operations into an algorithm.

A few years later came computer-aided design (CAD), and complex programs such as AARON, created by artist Harold Cohen in 1973, which produced accurate design drafts. Advanced software such as Adobe’s Photoshop, released commercially in 1990, made it possible to alter an existing image, and to merge images. It became harder to tell fact from imagination.

Computer-generated imagery (CGI) and special effects (also called SFX or FX), geared towards moving visuals and filmmaking, emerged.

The next big leap for still images arrived in 2015, with the first artificial intelligence (AI)-driven generative network for art: Google’s DeepDream. It used real images to create surreal art; Slate magazine called its work dazzling and creepy.

In 2021, this niche tool was replaced by an overnight sensation: OpenAI’s Dall-E. Within months, Stability AI’s Stable Diffusion was out, and so was Microsoft’s Nuwa-Infinity. By 2022, Midjourney, NightCafe and StarryAI had been added to the list.

Dall-E 2 now forms part of the foundation of the Microsoft Designer app and the Bing AI suite. And this month, Stability AI released its newest tool, Stable Doodle, which can turn a rough sketch into a realistic image inspired either by a style of drawing or photography.

Meanwhile, this year was marked by two significant inflection points (both in March).

[Image: The two AI-generated images that made news in March. One fooled most viewers into thinking Pope Francis had stepped out in a puffy white coat; the other won a prize at the Sony World Photography Awards.]

First, Pope Francis was depicted in a puffy white coat, on social media platforms, in an image that most viewers thought was real. And an AI-made image fooled judges at the Sony World Photography Awards and walked away with a prize; it was submitted by Berlin-based Boris Eldagsen, to prove a point.

Those two works were created using Midjourney and Dall-E 2 respectively.

The latter allows the user to “control” settings for a virtual lens and aperture, which is partly why the fake photo that won the contest — a 1950s-style vintage-look portrait of two women — looked so real.

While that image was the work of a professional photo-artist, the image of the pope was created by a 31-year-old construction worker named Pablo Xavier (last name withheld), from Chicago. Without taking away from the latter’s viral achievement, that’s how easy it is.

Quick draw

So, how does AI create images with such high levels of visual authenticity?

Well, these programs run on what are called Generative Adversarial Networks, or GANs. And GANs work a bit like a live digital-art class.

Say a user types in a text cue: photograph-style image of woman at India Gate, with windblown hair. Two neural networks now get to work (they’re called neural networks because they’re modelled on the human brain’s rapid-fire input and response mechanisms).

The first neural network is the generator. Its job is to create the image it has been asked for. Behind the scenes, the second neural network acts as the discriminator.

As the image takes shape, the discriminator scans it for inaccuracies, points them out to the generator and says: That’s not good enough.

How does the discriminator network know? It analyses the created image against elements from a pre-learned database — woman, India Gate, windblown, hair. It examines each version created by the generator network for misrepresentations or inaccuracies in object, aesthetics, lighting, style.

When there is nothing left to challenge or flag, the result is sent out to the screen, and the user.
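The back-and-forth described above can be sketched in a few lines of Python. This is a toy illustration, not a real GAN: the “generator” proposes a single number instead of an image, the “discriminator” measures how far it is from a hidden target (standing in for the pre-learned database), and the loop repeats until there is nothing left to flag. All names and values are illustrative.

```python
# Toy sketch of the generator/discriminator loop. Real GANs use
# neural networks trained with gradients; here the generator simply
# nudges its candidate toward whatever the discriminator flags.

TARGET = 7.0          # stands in for "what a real image looks like"
TOLERANCE = 0.01      # discriminator's threshold for "good enough"

def discriminator(candidate):
    """Return a 'flaw report': how far the candidate is from real data."""
    return TARGET - candidate

def generator_loop(start=0.0, learning_rate=0.5, max_rounds=100):
    candidate = start
    for round_no in range(max_rounds):
        flaw = discriminator(candidate)
        if abs(flaw) < TOLERANCE:          # nothing left to flag: send it out
            return candidate, round_no
        candidate += learning_rate * flaw  # generator adjusts and tries again
    return candidate, max_rounds

image, rounds = generator_loop()
print(image, rounds)
```

After a handful of rounds the candidate settles within tolerance of the target, mirroring the article’s “Over and over. Until it is.”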

As with all AI-driven models, the more these programs are used, the more they learn. In recent months, they have received millions of new data points that detail exactly what went wrong. Which images were tagged as fake, and why? The networks now have some of the answers.

Incidentally, as the program continues to be trained, such information is released to the two halves separately.

“We keep the generator constant during the discriminator training phase. As discriminator training tries to figure out how to distinguish real data from fake, it has to learn how to recognize the generator’s flaws. Similarly, we keep the discriminator constant during the generator training phase. Otherwise the generator would be trying to hit a moving target and might never converge,” is how Google explained it, in a blog from last year.
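The alternating schedule Google describes can be sketched as a simple loop. This assumes nothing about the real training code: each “network” below is just a counter, and freezing one simply means it receives no updates while the other trains.

```python
# Illustrative sketch of alternating GAN training phases:
# freeze the generator while the discriminator trains, then
# freeze the discriminator while the generator trains.

class ToyNetwork:
    def __init__(self, name):
        self.name = name
        self.frozen = False
        self.updates = 0

    def train_step(self):
        if not self.frozen:       # frozen networks receive no updates
            self.updates += 1

def alternating_training(rounds=3, d_steps=2, g_steps=1):
    gen, disc = ToyNetwork("generator"), ToyNetwork("discriminator")
    for _ in range(rounds):
        # Phase 1: discriminator trains, generator held constant
        gen.frozen, disc.frozen = True, False
        for _ in range(d_steps):
            disc.train_step()
        # Phase 2: generator trains, discriminator held constant
        gen.frozen, disc.frozen = False, True
        for _ in range(g_steps):
            gen.train_step()
    return gen.updates, disc.updates

print(alternating_training())  # (generator updates, discriminator updates)
```

Because each side trains against a fixed opponent within a phase, neither is chasing a moving target, which is the convergence point the Google blog makes.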

Next up: A possible “thought” model, in which an AI-driven art program reproduces what the user is thinking.

In a pre-print research paper released in December, systems neuroscientists Yu Takagi and Shinji Nishimoto from Osaka University say they have created a model that can capture neural activity with about 80% accuracy, to reproduce thoughts. Among other tools, the researchers used an advanced latent diffusion model (LDM) paired with Stable Diffusion.

For now, there are no regulations in place to govern the use of these technological innovations. Applications that can help detect AI-generated images are still being built; it is unlikely these will ever achieve 100% accuracy.

The world is sailing into uncharted territory. Europol, the European Union’s law enforcement agency, predicts that as much as 90% of content on the internet may be created or edited with the use of AI, by 2026.

What would a future in this world look like? Just think about the implications, let the image take shape, and say when.

By Vishal Mathur


