Day 2/60 Reviewing AI & Machine Learning: Generative Deep Learning

Deep Learning

A branch of machine learning loosely inspired by the way the human brain learns. Most deep learning systems are neural networks, which contain multiple stacked hidden layers. More broadly, any machine learning or AI system that employs many layers to learn a high-level representation of input data is also considered deep learning (e.g. deep Boltzmann machines).

A neural network consists of an input layer, where the input data is passed in, and an output layer, where the output is returned. Layers between the input and output layers are referred to as “hidden” layers. Successive hidden layers can capture increasingly advanced and sophisticated aspects of the original input.

The goal of training a neural network is to find the set of weights for each layer that makes accurate predictions, which is achieved by training on a suitable dataset. When we initialize a neural network, the weights and biases are random; the more we train, the more they are adjusted toward values that produce accurate predictions.

Here is a list of projects you can try:

  • A neural network that can recognize handwritten digits (use the MNIST dataset for training and testing; a minimal sketch follows this list)
  • A neural network that can classify 10 common objects we see in our everyday lives (use the CIFAR-10 dataset for training and testing)
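
Here is a minimal sketch of the first project above, assuming TensorFlow/Keras is installed; the layer sizes and number of epochs are illustrative choices, not tuned values.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Load MNIST: 60,000 training and 10,000 test images of handwritten digits (28x28 pixels)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixel values to [0, 1]

# Input layer -> one hidden layer -> output layer
model = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),    # input layer: 28x28 pixels flattened to 784 values
    layers.Dense(128, activation="relu"),    # hidden layer
    layers.Dense(10, activation="softmax"),  # output layer: one probability per digit 0-9
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5)  # training adjusts the randomly initialized weights/biases
model.evaluate(x_test, y_test)         # check how well the model generalizes to unseen data
```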


The core principle of machine learning is to ensure the model generalizes to unseen data rather than simply memorizing the training dataset. It is like a student who memorizes physics formulas instead of truly understanding them.

Dropout layers have been widely used to help neural networks achieve this. The idea is very simple: during training, each dropout layer chooses a random set of inputs from the preceding layer and sets them to zero.

Adding dropout layers drastically reduces overfitting by ensuring that the model does not become over-reliant on its training set.

Dropout layers are most commonly used after fully connected layers, which are the most prone to overfitting due to their large number of weights.
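
As a rough sketch of how this looks in practice (assuming Keras), a dropout layer is typically placed right after a fully connected layer; the 0.5 rate here is just an illustrative choice.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(256, activation="relu"),   # fully connected layer: many weights, prone to overfitting
    layers.Dropout(0.5),                    # randomly zeroes 50% of the preceding layer's outputs
    layers.Dense(10, activation="softmax"),
])
```

Note that Keras only applies dropout during training; at inference time the layer passes its inputs through unchanged.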

Batch normalization also has a regularizing effect, and some modern architectures rely on it rather than dropout for regularization. However, there is no set rule: different regularization techniques suit different situations.


Autoencoders in Generative AI:

Autoencoder: a neural network made up of two parts, an encoder and a decoder

Encoder - compresses input data into a compact representation vector

Decoder - decompresses a given representation vector back into the form of the original input

Therefore, an autoencoder aims to minimize the loss between the original input and the reconstruction of the input after its trip through the neural network.

Let’s say you have a picture of a flower.

  1. Picture → input
  2. Encoder processes this picture and reduces it to a simpler form, capturing the most important features (like edges, shapes, etc.). This simpler form is called the latent space or bottleneck.
  3. Latent Space: This is a compact version of the original picture that still contains the essential information.
  4. Decoder: The decoder takes this latent space representation and tries to reconstruct the original picture of the flower.

By minimizing the difference between the original input and the reconstructed output, we can fine-tune our model. This allows us to generate complex images from the latent space, which has been optimized based on our training dataset or input data.
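
A bare-bones autoencoder sketch along these lines, assuming Keras and images flattened to 784-dimensional vectors scaled to [0, 1]; the 32-dimensional latent space is an arbitrary illustrative choice.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

latent_dim = 32  # size of the latent space / bottleneck

# Encoder: compresses a flattened image into a 32-dimensional representation vector
encoder_input = layers.Input(shape=(784,))
latent = layers.Dense(latent_dim, activation="relu")(encoder_input)
encoder = models.Model(encoder_input, latent, name="encoder")

# Decoder: reconstructs the image from the representation vector
decoder_input = layers.Input(shape=(latent_dim,))
reconstruction = layers.Dense(784, activation="sigmoid")(decoder_input)
decoder = models.Model(decoder_input, reconstruction, name="decoder")

# Autoencoder: encoder followed by decoder, trained to reproduce its own input
autoencoder = models.Model(encoder_input, decoder(encoder(encoder_input)))
autoencoder.compile(optimizer="adam", loss="mse")  # minimize the reconstruction loss

# Training pairs each image with itself, i.e. input == target:
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=256)
```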



Generative Adversarial Networks (GANs)

A Generative Adversarial Network (GAN) is a type of neural network architecture used in machine learning, specifically in the field of generative modeling. It consists of two main parts:

  • Generator
  • Discriminator

Generator: creates new data that looks similar to real data. For example, if we are working with images, the generator will try to create new images that look like the real ones it has seen during training.

Discriminator: looks at data and determines whether it is real (from the training dataset) or fake (created by the generator). It acts like a detective trying to catch counterfeit currency.

Training Process

  • The generator creates fake data and sends it to the discriminator.
  • The discriminator evaluates the data and decides if it is real or fake.

Based on the discriminator's feedback, the generator tries to improve its fake data to fool the discriminator next time.

The discriminator also keeps improving to better distinguish between real and fake data.

This process is repeated, improving the generated data (e.g. images, text, etc.) until the discriminator can no longer accurately distinguish fake data from real data.
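
Here is a rough sketch of a single training step along these lines, assuming Keras models named `generator` and `discriminator` where the discriminator outputs raw logits; it is an illustrative loop, not a tuned recipe.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

def train_step(real_images, generator, discriminator, latent_dim=100):
    noise = tf.random.normal([tf.shape(real_images)[0], latent_dim])

    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(noise, training=True)             # generator creates fake data
        real_logits = discriminator(real_images, training=True)   # discriminator judges real data
        fake_logits = discriminator(fake_images, training=True)   # ...and the generator's fakes

        # Discriminator: label real images 1 and fake images 0
        d_loss = bce(tf.ones_like(real_logits), real_logits) + \
                 bce(tf.zeros_like(fake_logits), fake_logits)
        # Generator: wants the discriminator to call its fakes real (label 1)
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)

    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return d_loss, g_loss
```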

GAN challenges:

  • Oscillating loss: occurs when the generator and discriminator keep improving, but they don't stabilize, making training very unpredictable.
  • Mode collapse: occurs when the generator keeps creating the same kind of fake data over and over again, ignoring the variety in real data.
  • Uninformative loss: occurs when the feedback from the discriminator to the generator is not helpful, making it hard for the generator to improve.
  • Hyperparameters: the settings you choose before starting the training process. They affect how the GAN learns and performs, and poorly chosen hyperparameters can hurt your GAN's performance.

Tackling the GAN challenges:

  • Wasserstein loss: a meaningful loss metric that correlates with the generator’s convergence and sample quality. In other words, it aims to improve the stability of the training process and reduce the oscillating loss.
  • Lipschitz constraint: a constraint put into place that keeps the discriminator’s output from changing too rapidly or wildly. It ensures that the changes are smooth and gradual.
  • Weight clipping: limits the values of the discriminator's weights to a small fixed range, a simple (if crude) way of enforcing the Lipschitz constraint and keeping the model's behavior under control
  • Gradient penalty loss: encourages smooth and consistent changes in the discriminator by penalizing it when its gradients become too extreme (see the sketch after this list)
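
A sketch of the gradient penalty term, assuming image tensors of shape (batch, height, width, channels) and a Keras-style discriminator; the interpolation-and-norm approach follows the common WGAN-GP formulation.

```python
import tensorflow as tf

def gradient_penalty(discriminator, real_images, fake_images):
    """Penalize the discriminator when its gradient norm drifts away from 1."""
    batch_size = tf.shape(real_images)[0]
    # Random points on the lines between real and fake samples
    alpha = tf.random.uniform([batch_size, 1, 1, 1], 0.0, 1.0)
    interpolated = alpha * real_images + (1.0 - alpha) * fake_images

    with tf.GradientTape() as tape:
        tape.watch(interpolated)
        scores = discriminator(interpolated, training=True)

    grads = tape.gradient(scores, interpolated)
    grad_norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]) + 1e-12)
    return tf.reduce_mean((grad_norm - 1.0) ** 2)
```

In practice this penalty is added to the discriminator's Wasserstein loss with a weight (10 is the value used in the original WGAN-GP paper).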


How do machines draw, compose, and write?

Draw:

CycleGAN: A type of GAN that can transform images from one domain to another

It uses two generators and two discriminators. One generator learns to transform images from Domain A to Domain B, and another generator learns to do the reverse. Cycle consistency loss ensures that if you convert an image to another domain and back, you get the original image.
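
A small sketch of the cycle consistency term, assuming two Keras-style generators `g_ab` (Domain A to B) and `g_ba` (B to A); the L1 distance and the weight of 10 are common but illustrative choices.

```python
import tensorflow as tf

def cycle_consistency_loss(g_ab, g_ba, real_a, real_b, weight=10.0):
    # A -> B -> back to A should recover the original Domain A image
    reconstructed_a = g_ba(g_ab(real_a, training=True), training=True)
    # B -> A -> back to B should recover the original Domain B image
    reconstructed_b = g_ab(g_ba(real_b, training=True), training=True)
    loss = tf.reduce_mean(tf.abs(real_a - reconstructed_a)) + \
           tf.reduce_mean(tf.abs(real_b - reconstructed_b))
    return weight * loss  # the weight balances this term against the adversarial losses
```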

Neural Style Transfer: takes two images—a content image and a style image—and blends them so that the output image looks like the content image but in the style of the style image.

It uses a convolutional neural network (CNN) to extract features from both images, then minimizes a loss function that combines the content loss (difference in content features) and the style loss (difference in style features) to generate the final image.
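
A sketch of that combined objective, assuming the content features and the style features (usually represented as Gram matrices of CNN activations, e.g. from VGG19) have already been extracted elsewhere; `alpha` and `beta` are illustrative weights.

```python
import tensorflow as tf

def style_transfer_loss(content_features, generated_content,
                        style_grams, generated_grams,
                        alpha=1.0, beta=1e4):
    # Content loss: difference between the content image's features and the output's features
    content_loss = tf.reduce_mean(tf.square(generated_content - content_features))
    # Style loss: difference between the Gram matrices (feature correlations) of style and output
    style_loss = tf.add_n([tf.reduce_mean(tf.square(g - s))
                           for g, s in zip(generated_grams, style_grams)])
    return alpha * content_loss + beta * style_loss
```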

Modern-day models such as Stable Diffusion and DALL-E have indeed gained significant attention and popularity, often eclipsing earlier techniques like CycleGAN and Neural Style Transfer.

Compose & Write:

Recurrent Neural Networks (RNNs): great for sequential data and can remember previous inputs in the sequence. They are useful for tasks like composing music or writing text. This type of model is more suitable for generating monophonic music, with a single melody line (e.g. a solo singer humming without any additional harmony or other notes played together). Nowadays, other models are better suited to generating polyphonic music, which I do not have much knowledge about yet.

Long Short-Term Memory Networks (LSTMs): A type of RNN designed to remember information for long periods, making them better at handling long sequences of data.

Transformers: handle sequential data without the limitations of RNNs. They use a mechanism called self-attention to weigh the importance of each word in a sequence relative to the others.
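
A bare-bones NumPy sketch of scaled dot-product self-attention; real transformers learn separate query/key/value projections and use multiple attention heads, so this only shows the core computation.

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over a sequence x of shape (seq_len, d_model)."""
    d_model = x.shape[-1]
    q, k, v = x, x, x                    # queries, keys and values (no learned projections here)
    scores = q @ k.T / np.sqrt(d_model)  # how much each word should attend to every other word
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ v                   # each output is a weighted sum of the values

# Example: a "sentence" of 4 tokens, each represented by an 8-dimensional vector
tokens = np.random.randn(4, 8)
print(self_attention(tokens).shape)  # (4, 8)
```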

Generative Pre-trained Transformers (GPTs): A specific type of transformer model designed for generating human-like text. It’s trained on large datasets of text and can produce coherent and contextually relevant sentences.


Let me know your thoughts in the comments below. See you soon in day 3!
