Deep Learning: InfoGAN
Ibrahim Sobh - PhD
Senior Expert of Artificial Intelligence, Valeo Group
InfoGAN is an information-theoretic extension to the Generative Adversarial Network (GAN) that is able to learn disentangled representations in a completely unsupervised manner.
Experiments show that InfoGAN learns interpretable representations that are competitive with representations learned by existing supervised methods. InfoGAN successfully disentangles writing styles from digit shapes on the MNIST dataset and background digits from the central digit on the SVHN dataset. It also discovers visual concepts such as hair styles, the presence or absence of eyeglasses, and emotions on the CelebA face dataset, all in an unsupervised manner.
Disentangled representation explicitly represents the salient attributes of a data instance. For example, for a dataset of faces, a useful disentangled representation may allocate a separate set of dimensions for each of the following attributes: facial expression, eye color, hairstyle, presence or absence of eyeglasses, and the identity of the corresponding person.
Basic Idea:
InfoGAN splits the generator input into two parts: the traditional noise vector z and a new "latent code" vector c. It proposes a simple modification to the GAN objective that encourages learning interpretable and meaningful representations by maximizing the mutual information between the latent code and the generator's output. Despite its simplicity, the authors found the method to be surprisingly effective.
This framework is implemented by merely adding a regularization term to the original GAN objective:

min_G max_D V_InfoGAN(D, G) = V(D, G) − λ I(c; G(z, c))

where λ (lambda) is the regularization constant and I(c; G(z, c)) is the mutual information between the latent code c and the generator output G(z, c). The mutual information is hard to compute explicitly, so a variational lower bound is maximized instead, using an auxiliary distribution Q(c|x) to approximate the true posterior P(c|x).
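For a discrete code with a uniform prior, the lower bound reduces to L_I = E[log Q(c|x)] + H(c), i.e. a cross-entropy between the code fed to the generator and the auxiliary network's prediction, plus a constant entropy term. Below is a minimal NumPy sketch of that bound; the `q_logits` input stands in for the output of a hypothetical Q head that, as in the paper, would share layers with the discriminator:

```python
import numpy as np

def log_softmax(x):
    # Numerically stable log-softmax over the last axis.
    x = x - x.max(axis=1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=1, keepdims=True))

def mi_lower_bound(q_logits, c_onehot):
    """Variational lower bound L_I = E[log Q(c|x)] + H(c) for a discrete code.

    q_logits: (batch, n_cat) logits from the auxiliary network Q (assumed head).
    c_onehot: (batch, n_cat) one-hot codes that were fed to the generator.
    """
    n_cat = q_logits.shape[1]
    expected_log_q = (c_onehot * log_softmax(q_logits)).sum(axis=1).mean()
    entropy_c = np.log(n_cat)  # H(c) for a uniform categorical prior (a constant)
    return expected_log_q + entropy_c

# If Q recovers the code perfectly, the bound reaches its maximum H(c) = log(10);
# if Q is uninformative (uniform), the bound is 0 and carries no gradient reward.
c = np.eye(10)[np.array([0, 3, 7])]
print(mi_lower_bound(50.0 * c, c))          # ~ log(10) ≈ 2.3026
print(mi_lower_bound(np.zeros((3, 10)), c))  # ~ 0.0
```

The generator's loss then subtracts λ times this bound, so maximizing it amounts to minimizing a standard cross-entropy, which is why the extra cost over a plain GAN is negligible.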
GANs are known to be difficult to train, so the experiments build on the existing training techniques introduced by DCGAN.
Results:
For MNIST (a hand-written digit dataset), the authors specified a 10-state discrete code c1 (hoping it would map to the digit class) and two continuous codes, c2 and c3, each ranging from −1 to +1.
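A minimal sketch of sampling this split generator input, using the 62-dimensional noise vector from the paper's MNIST setup (the dimensions and helper name are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(batch, noise_dim=62, n_cat=10, n_cont=2):
    """Sample the MNIST InfoGAN input: noise z, one-hot c1, continuous c2, c3."""
    z = rng.standard_normal((batch, noise_dim))        # incompressible noise
    c1 = np.eye(n_cat)[rng.integers(0, n_cat, batch)]  # 10-state discrete code
    c23 = rng.uniform(-1.0, 1.0, (batch, n_cont))      # continuous codes in [-1, 1]
    return np.concatenate([z, c1, c23], axis=1)        # one vector fed to G

print(sample_latent(4).shape)  # (4, 74): 62 noise + 10 discrete + 2 continuous
```

To "manipulate" a code at test time, one fixes z and varies a single entry of c, e.g. sweeping c2 across [−1, 1] while holding c1 and c3 constant.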
Manipulating latent codes on 3D Chairs:
In (a), the continuous code captures the pose of the chair while preserving its shape; in (b), the continuous code captures the widths of different chair types and smoothly interpolates between them.
Change in emotion, roughly ordered from sad to happy.
In conclusion ...
InfoGAN is completely unsupervised and learns interpretable and disentangled representations on challenging datasets.
InfoGAN adds only negligible computation cost on top of GAN and is easy to train.
Best Regards