Artificial Intelligence - Part 7.1 - GENERATIVE AI - GANs
Alessandro Ciappei
Senior Manager | Cloud Infrastructure, Edge Devices Technical Lead | Datacentre Model Transformation | Artificial Intelligence
Generative Adversarial Networks (GANs): A Comprehensive Guide
Generative Adversarial Networks (GANs) are among the most transformative innovations in Artificial Intelligence (AI). Introduced in 2014 by Ian Goodfellow and his colleagues, GANs enable machines to generate data resembling human-created content, making them a pivotal technology in generative AI. Their applications span various fields, including art, entertainment, healthcare, and data science, driving innovations that were once deemed science fiction.
This expanded article delves into the intricacies of GANs, including their mechanisms, training process, implementation, advanced concepts, and challenges, alongside real-world examples and applications.
What Are GANs?
A Generative Adversarial Network (GAN) is a type of neural network architecture designed to generate new data that resembles a given dataset. GANs consist of two neural networks:
- The generator, which takes random noise as input and produces synthetic data.
- The discriminator, which receives both real and generated samples and estimates the probability that each one is real.
These two networks are trained simultaneously in a process akin to a game in which each tries to outperform the other.
How Do GANs Work?
GANs operate on a competitive learning principle where the generator and discriminator are in a zero-sum game. The generator's goal is to "fool" the discriminator, while the discriminator's objective is to accurately differentiate real data from fake.
Step-by-Step Process
1. The generator takes random noise (e.g., a vector of random numbers) as input and transforms it into synthetic data, such as an image or audio sample.
2. The discriminator evaluates the synthetic data alongside real data from the training set, assigning each sample a probability of being real or fake.
3. The generator receives feedback based on the discriminator's evaluation and adjusts its parameters to produce more realistic data.
4. Simultaneously, the discriminator improves its ability to detect fake data.
5. Over many iterations, both networks become increasingly skilled, leading to the production of highly realistic data.
Mathematical Foundation
The training objective of GANs is formulated as a minimax optimization problem:

min_G max_D V(D, G) = E_{x ~ p_data(x)}[log D(x)] + E_{z ~ p_z(z)}[log(1 − D(G(z)))]

Where:
- D(x) is the discriminator's estimate of the probability that a sample x is real.
- G(z) is the generator's output for a noise vector z drawn from a prior distribution p_z (e.g., a standard normal).
- p_data is the distribution of the real training data.
- The discriminator tries to maximize V(D, G), while the generator tries to minimize it.
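For a fixed generator G, the discriminator that maximizes this objective has a closed form, derived in the original 2014 paper:

D*(x) = p_data(x) / (p_data(x) + p_g(x))

where p_g is the distribution of the generator's samples. Substituting D* back into V(D, G) shows that the global optimum is reached exactly when p_g = p_data, i.e., when generated data is statistically indistinguishable from real data.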
Variants of GANs
As GANs evolved, researchers introduced various architectures tailored for specific use cases:
Conditional GAN (cGAN): Adds conditional information (e.g., class labels) to both the generator and discriminator. Example: generating images of specific categories, like "dogs" or "cars" (see the sketch after this list).
CycleGAN: Enables image-to-image translation without paired training examples. Example: converting photos from summer to winter landscapes.
StyleGAN: Produces highly detailed and controllable images by manipulating latent space. Example: creating hyper-realistic human faces.
Progressive GAN (ProGAN): Gradually increases the resolution of generated images during training. Example: high-resolution medical imaging.
DCGAN (Deep Convolutional GAN): Utilizes convolutional layers for improved image generation. Example: generating artwork or textures.
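To make the conditional variant concrete, here is a minimal sketch of a conditional generator for MNIST-sized images, assuming the TensorFlow/Keras imports used later in this article. The label is embedded and concatenated with the noise vector before the dense stack; the layer sizes and embedding dimension are illustrative assumptions, not prescriptions:

def build_conditional_generator(latent_dim, num_classes=10):
    noise = layers.Input(shape=(latent_dim,))
    label = layers.Input(shape=(1,), dtype='int32')
    # Embed the class label and flatten it into a vector
    label_vec = layers.Flatten()(layers.Embedding(num_classes, latent_dim)(label))
    # Condition the noise on the label by concatenation
    x = layers.Concatenate()([noise, label_vec])
    x = layers.Dense(256, activation='relu')(x)
    x = layers.Dense(28 * 28, activation='tanh')(x)
    img = layers.Reshape((28, 28, 1))(x)
    return tf.keras.Model([noise, label], img)

The discriminator is conditioned the same way, so it learns to judge whether an image is realistic for that particular label.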
Implementing GANs
Below is an implementation of a simple GAN using Python and TensorFlow/Keras to generate handwritten digits (like those in the MNIST dataset).
Step 1: Import Libraries
import tensorflow as tf
from tensorflow.keras import layers
import numpy as np
import matplotlib.pyplot as plt
Step 2: Define the Generator
def build_generator(latent_dim):
    model = tf.keras.Sequential([
        layers.Dense(128, activation='relu', input_dim=latent_dim),
        layers.Dense(256, activation='relu'),
        layers.Dense(784, activation='sigmoid'),  # Output size for MNIST (28x28 flattened)
        layers.Reshape((28, 28, 1))
    ])
    return model
Step 3: Define the Discriminator
def build_discriminator():
    model = tf.keras.Sequential([
        layers.Flatten(input_shape=(28, 28, 1)),
        layers.Dense(256, activation='relu'),
        layers.Dense(128, activation='relu'),
        layers.Dense(1, activation='sigmoid')  # Output: Real (1) or Fake (0)
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model
Step 4: Build the GAN
def build_gan(generator, discriminator):
    discriminator.trainable = False  # Freeze the discriminator during generator training
    gan = tf.keras.Sequential([generator, discriminator])
    gan.compile(optimizer='adam', loss='binary_crossentropy')
    return gan
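Wiring the pieces together might look like this; latent_dim = 100 is a conventional choice, assumed here rather than taken from the article:

latent_dim = 100
discriminator = build_discriminator()  # compiled with its own optimizer while still trainable
generator = build_generator(latent_dim)
gan = build_gan(generator, discriminator)

Because the discriminator was compiled before its trainable flag was flipped, discriminator.train_on_batch still updates its weights; only the combined gan model treats them as frozen. This is what lets the two training calls in the next step alternate cleanly.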
Step 5: Train the GAN
def train_gan(generator, discriminator, gan, epochs, batch_size, latent_dim, data):
    for epoch in range(epochs):
        # Train the discriminator on a half-real, half-fake batch
        real_images = data[np.random.randint(0, data.shape[0], batch_size)]
        fake_images = generator.predict(np.random.normal(0, 1, (batch_size, latent_dim)))
        real_labels = np.ones((batch_size, 1))
        fake_labels = np.zeros((batch_size, 1))
        d_loss_real = discriminator.train_on_batch(real_images, real_labels)
        d_loss_fake = discriminator.train_on_batch(fake_images, fake_labels)
        d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)

        # Train the generator via the combined model (discriminator frozen)
        noise = np.random.normal(0, 1, (batch_size, latent_dim))
        valid_labels = np.ones((batch_size, 1))  # Generator wants the discriminator to output 1
        g_loss = gan.train_on_batch(noise, valid_labels)

        # Print progress
        if epoch % 100 == 0:
            print(f"Epoch {epoch}: [D loss: {d_loss[0]} | D accuracy: {d_loss[1]}] [G loss: {g_loss}]")
Step 6: Visualize Results
def generate_and_plot_images(generator, latent_dim):
    noise = np.random.normal(0, 1, (16, latent_dim))
    generated_images = generator.predict(noise)
    # Rescale from [-1, 1] to [0, 1] (needed for a tanh generator, as in the expanded
    # walkthrough below; omit this line for the sigmoid generator above)
    generated_images = 0.5 * generated_images + 0.5
    fig, axs = plt.subplots(4, 4, figsize=(4, 4))
    for i in range(4):
        for j in range(4):
            axs[i, j].imshow(generated_images[i * 4 + j, :, :, 0], cmap='gray')
            axs[i, j].axis('off')
    plt.show()
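A minimal end-to-end driver for this first implementation might look as follows. Since the generator ends in a sigmoid, the training images are scaled to [0, 1]; the epoch count and batch size are illustrative assumptions:

from tensorflow.keras.datasets import mnist

# Load MNIST and scale to [0, 1] to match the generator's sigmoid output
(X_train, _), (_, _) = mnist.load_data()
X_train = X_train.astype(np.float32) / 255.0
X_train = np.expand_dims(X_train, axis=-1)

latent_dim = 100
discriminator = build_discriminator()
generator = build_generator(latent_dim)
gan = build_gan(generator, discriminator)

train_gan(generator, discriminator, gan, epochs=10000, batch_size=64,
          latent_dim=latent_dim, data=X_train)
generate_and_plot_images(generator, latent_dim)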
Implementing GANs: Expanded Walkthrough
Here, we will expand on the implementation of a basic GAN to provide a more detailed guide.
Step 1: Dataset Preparation
from tensorflow.keras.datasets import mnist
import numpy as np
# Load MNIST dataset
(X_train, _), (_, _) = mnist.load_data()
# Normalize data to range [-1, 1]
X_train = (X_train.astype(np.float32) - 127.5) / 127.5
X_train = np.expand_dims(X_train, axis=-1)
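A quick sanity check on the prepared tensors confirms the shape and value range before training:

print(X_train.shape)                 # (60000, 28, 28, 1)
print(X_train.min(), X_train.max())  # -1.0 1.0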
Step 2: Build Generator and Discriminator
These networks can be expanded with more layers for greater complexity.
Generator
def build_generator(latent_dim):
    model = tf.keras.Sequential([
        layers.Dense(256, activation='relu', input_dim=latent_dim),
        layers.BatchNormalization(),
        layers.Dense(512, activation='relu'),
        layers.BatchNormalization(),
        layers.Dense(1024, activation='relu'),
        layers.BatchNormalization(),
        layers.Dense(28 * 28, activation='tanh'),  # tanh matches data scaled to [-1, 1]
        layers.Reshape((28, 28, 1))
    ])
    return model
Discriminator
def build_discriminator():
    model = tf.keras.Sequential([
        layers.Flatten(input_shape=(28, 28, 1)),
        layers.Dense(512, activation='relu'),
        layers.Dropout(0.3),  # Dropout regularizes the discriminator
        layers.Dense(256, activation='relu'),
        layers.Dropout(0.3),
        layers.Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0002),
                  loss='binary_crossentropy', metrics=['accuracy'])
    return model
Step 3: Train GAN
Expanded with periodic logging; a checkpointing and visualisation hook is sketched after the loop.
def train_gan(generator, discriminator, gan, epochs, batch_size, latent_dim, data):
    for epoch in range(epochs):
        # Training discriminator
        real_images = data[np.random.randint(0, data.shape[0], batch_size)]
        fake_images = generator.predict(np.random.normal(0, 1, (batch_size, latent_dim)))
        real_labels = np.ones((batch_size, 1))
        fake_labels = np.zeros((batch_size, 1))
        d_loss_real = discriminator.train_on_batch(real_images, real_labels)
        d_loss_fake = discriminator.train_on_batch(fake_images, fake_labels)

        # Training generator
        noise = np.random.normal(0, 1, (batch_size, latent_dim))
        valid_labels = np.ones((batch_size, 1))  # Generator aims to fool the discriminator
        g_loss = gan.train_on_batch(noise, valid_labels)

        # Logging
        if epoch % 100 == 0:
            print(f"Epoch {epoch}: [D loss: {0.5 * (d_loss_real[0] + d_loss_fake[0])}] [G loss: {g_loss}]")
Advanced Topics in GANs
Latent Space Exploration
Manipulating the latent vector z to control the attributes of generated data.
Example: Adjusting the smile intensity in a generated face.
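A simple way to explore the latent space is to interpolate linearly between two noise vectors and generate an image at each step; smooth visual transitions suggest a well-structured latent space. A minimal sketch for the MNIST generator above:

def interpolate_latent(generator, latent_dim, steps=8):
    z1 = np.random.normal(0, 1, (1, latent_dim))
    z2 = np.random.normal(0, 1, (1, latent_dim))
    # Blend the two latent vectors and generate one image per step
    alphas = np.linspace(0.0, 1.0, steps).reshape(-1, 1)
    images = generator.predict((1 - alphas) * z1 + alphas * z2)
    fig, axs = plt.subplots(1, steps, figsize=(steps, 1))
    for i in range(steps):
        axs[i].imshow(0.5 * images[i, :, :, 0] + 0.5, cmap='gray')
        axs[i].axis('off')
    plt.show()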
Feature Matching
Improves training stability by modifying the discriminator’s objective to focus on intermediate feature representations rather than raw outputs.
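A hedged sketch of what feature matching can look like in Keras: expose an intermediate discriminator layer as a feature extractor and penalise the distance between the mean feature activations of real and generated batches. The layer index is an illustrative assumption:

def feature_matching_loss(discriminator, real_images, fake_images, layer_index=-2):
    # Reuse an intermediate discriminator layer as a feature extractor
    feature_extractor = tf.keras.Model(
        inputs=discriminator.inputs,
        outputs=discriminator.layers[layer_index].output)
    real_features = feature_extractor(real_images)
    fake_features = feature_extractor(fake_images)
    # Match mean activations across the batch rather than raw real/fake outputs
    return tf.reduce_mean(tf.square(
        tf.reduce_mean(real_features, axis=0) - tf.reduce_mean(fake_features, axis=0)))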
GAN Evaluation Metrics
Inception Score (IS): Measures the quality and diversity of generated images.
Fréchet Inception Distance (FID): Evaluates similarity between real and generated distributions.
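The FID computation itself is compact once real and generated images have been embedded with an Inception network: fit a Gaussian to each set of activations and compute the Fréchet distance between them. A sketch of that final step (the Inception embedding is omitted):

from scipy.linalg import sqrtm

def frechet_distance(real_activations, fake_activations):
    # Fit a Gaussian (mean and covariance) to each activation set
    mu_r, cov_r = real_activations.mean(axis=0), np.cov(real_activations, rowvar=False)
    mu_g, cov_g = fake_activations.mean(axis=0), np.cov(fake_activations, rowvar=False)
    covmean = sqrtm(cov_r.dot(cov_g))
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerical error
    return np.sum((mu_r - mu_g) ** 2) + np.trace(cov_r + cov_g - 2.0 * covmean)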
Real-World Applications of GANs
Image synthesis: creating realistic human faces using tools like This Person Does Not Exist.
Deepfakes: generating videos where a person appears to say or do something they didn't.
Style transfer: applications like Prisma or AI art generators use GANs to apply artistic styles to images.
Data augmentation: generating synthetic training data to improve machine learning model performance.
Medical imaging: enhancing low-resolution medical scans for better diagnosis.
Gaming and virtual worlds: creating realistic textures and environments procedurally.
Challenges of GANs
Mode collapse: the generator produces limited types of data, failing to capture the diversity of the real dataset.
Training instability: the generator and discriminator can fail to converge, leading to suboptimal performance.
Computational cost: GANs require significant computational resources for training.
Conclusion
Generative Adversarial Networks have opened new frontiers in AI, enabling machines to create content that is often indistinguishable from human-made data. While their applications are vast, their implementation requires careful tuning and significant resources. By understanding how GANs work and leveraging frameworks like TensorFlow or PyTorch, businesses and researchers can harness their power for innovative solutions across industries.