Generative AI for Image Generation - GAN

Generative adversarial networks (GANs) are one of the hottest topics in deep learning. They can generate an unlimited number of new image samples that resemble a given dataset. The underlying idea behind a GAN is that it contains two neural networks that compete against each other in a zero-sum game framework: a generator and a discriminator.

Welcome! It is always a great idea to tell a story through a picture. Here is a story about a rather clever forger. He sells fake milk, and the milk shop owner can tell that it is fake. The forger is clever because he learns from the feedback the shop owner gives him. Each time, he produces slightly less fake milk, learning from the feedback. Eventually, the forger outsmarts the shop owner and produces milk that is almost indistinguishable from the real thing.

Congratulations! You have just understood a GAN. Here the forger is the generator, and the shop owner is the discriminator. A GAN consists of two deep learning models: one generator model and one discriminator model. In short, we can summarize a GAN as follows:

  1. The generator (the forger) creates the milk.
  2. The discriminator (the milk shop owner, an expert who can tell real milk from fake) judges it.
  3. In the first round, the generator (forger) produces milk and the discriminator (shop owner) can tell that it is fake. Learning from this feedback (the loss function), the generator improves the next time; the discriminator again says the milk is fake, but it is less fake than before. In this way, feedback helps the generator improve the milk's quality until the discriminator says the milk is real.

Here is a more formal look at the GAN architecture.

GANs are inspired by zero-sum, non-cooperative games: if one player wins, the other loses. A zero-sum game is also known as a minimax game. Player A wants to maximize the value function, while player B wants to minimize it. In game theory, a GAN converges when the discriminator (player A) and the generator (player B) reach a Nash equilibrium, which is the optimal point of the minimax equation.
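Formally, the minimax game described above is usually written as the following value function, where the discriminator D tries to maximize V and the generator G tries to minimize it:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

Here D(x) is the discriminator's estimated probability that x is a real sample, and G(z) is the sample the generator produces from random noise z.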

Training a GAN is equivalent to minimizing the JS divergence (a symmetrized variant of the KL divergence) between the probability distribution q (the estimated distribution, produced by the generator) and the probability distribution p (the real-world data distribution). In layman's terms, JS divergence (or KL divergence) measures the distance between two probability distributions.
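To make the divergence concrete, here is a minimal sketch (using NumPy; the helper names and the example distributions are illustrative, not from the article) that computes KL and JS divergence for two discrete distributions:

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; assumes q > 0 wherever p > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0                       # terms with p == 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by log(2)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)                  # mixture distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])
print(js_divergence(p, p))  # identical distributions -> 0.0
print(js_divergence(p, q))  # a positive "distance"
```

Unlike KL, the JS divergence is symmetric in its arguments, which is one reason it appears in the GAN analysis.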

Generator

The generator takes random noise as input and produces samples as output. Its goal is to generate samples that fool the discriminator into thinking it is seeing real images when it is actually seeing fakes. We can think of the generator as a counterfeiter.
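As a rough sketch of this noise-to-sample mapping, the following NumPy toy generator (the layer sizes and random, untrained weights are hypothetical, chosen purely for illustration) turns noise vectors into fake "image" vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration: 16-dim noise -> 28x28 "image".
NOISE_DIM, HIDDEN, IMG_DIM = 16, 32, 28 * 28

# Randomly initialised weights stand in for a trained generator.
W1 = rng.normal(0.0, 0.1, (NOISE_DIM, HIDDEN))
W2 = rng.normal(0.0, 0.1, (HIDDEN, IMG_DIM))

def generator(z):
    """Map a batch of noise vectors to fake samples with pixels in [-1, 1]."""
    h = np.maximum(0.0, z @ W1)        # ReLU hidden layer
    return np.tanh(h @ W2)             # tanh keeps outputs in [-1, 1]

z = rng.normal(size=(4, NOISE_DIM))    # batch of 4 noise vectors
fakes = generator(z)
print(fakes.shape)                     # (4, 784)
```

In a real GAN the weights would of course be learned; the point here is only the shape of the computation: noise in, sample out.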

Discriminator

The discriminator takes both real images from the input dataset and fake images from the generator, and outputs a verdict on whether a given image is real or fake. We can think of the discriminator as a policeman trying to catch the bad guys while letting the good guys go free.
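The discriminator's computation can be sketched the same way (again with hypothetical sizes and untrained weights): a small NumPy network that maps an image vector to a probability that the image is real:

```python
import numpy as np

rng = np.random.default_rng(1)
IMG_DIM, HIDDEN = 28 * 28, 32          # hypothetical sizes, for illustration

# Randomly initialised weights stand in for a trained discriminator.
V1 = rng.normal(0.0, 0.1, (IMG_DIM, HIDDEN))
V2 = rng.normal(0.0, 0.1, (HIDDEN, 1))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator(x):
    """Return P(real) in (0, 1) for each image in the batch."""
    h = np.maximum(0.0, x @ V1)        # ReLU hidden layer
    return sigmoid(h @ V2).ravel()     # sigmoid -> probability per image

batch = rng.normal(size=(4, IMG_DIM))  # 4 stand-in "images"
scores = discriminator(batch)
print(scores.shape)                    # (4,)
```

This is exactly the binary classifier described below: one probability per input image.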

The discriminator has the task of determining whether a given image looks natural (that is, comes from the dataset) or has been artificially created; it is essentially a binary classifier, typically taking the form of a standard CNN. The task of the generator is to create natural-looking images that follow the original data distribution closely enough to fool the discriminator. First, random noise is fed to the generator, which uses it to create fake images; these fake images, along with real images, are then sent to the discriminator.

The generator tries to fool the discriminator, while the discriminator tries not to be fooled. As the models train through alternating optimization, both improve until the fake images are indistinguishable from the dataset images.
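The alternating optimization described above can be sketched end to end on a toy problem. This is a deliberately minimal, hand-derived 1-D GAN, and everything in it is an assumption for illustration: the real data is N(3, 1), the generator is a linear map of noise, and the discriminator is a single logistic unit with gradients written out by hand:

```python
import numpy as np

rng = np.random.default_rng(0)
REAL_MEAN = 3.0                        # toy "dataset": samples from N(3, 1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Generator x = a*z + b and discriminator D(x) = sigmoid(w*x + c);
# a minimal parameterisation so the gradients fit in a few lines.
a, b = 1.0, 0.0                        # generator starts far from the data
w, c = 0.1, 0.0
lr, batch = 0.05, 64

for step in range(2000):
    z = rng.normal(size=batch)
    x_fake = a * z + b
    x_real = rng.normal(loc=REAL_MEAN, size=batch)

    # --- discriminator step: push D(real) -> 1 and D(fake) -> 0 ---
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    g_logit_real = d_real - 1.0        # d(BCE)/d(logit) for real labels
    g_logit_fake = d_fake              # d(BCE)/d(logit) for fake labels
    w -= lr * np.mean(g_logit_real * x_real + g_logit_fake * x_fake)
    c -= lr * np.mean(g_logit_real + g_logit_fake)

    # --- generator step: push D(fake) -> 1 (non-saturating loss) ---
    d_fake = sigmoid(w * x_fake + c)
    g_x = (d_fake - 1.0) * w           # gradient flows through D into x_fake
    a -= lr * np.mean(g_x * z)
    b -= lr * np.mean(g_x)

print(f"learned generator offset b = {b:.2f}")
```

With this setup the generator's offset b should drift from 0 toward the real mean of 3.0: each discriminator step re-draws the boundary between real and fake, and each generator step moves the fakes across it, which is the minimax game in miniature.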

The content is inspired by the book https://www.amazon.in/Generative-Adversarial-Networks-Industrial-Cases/dp/9389423856

