Variational autoencoders (VAEs)

Introduction :


An autoencoder is a class of neural network designed to learn efficient representations of input data. It consists of an encoder that maps the input data to a latent space and a decoder that reverses the mapping. The encoder and decoder are typically trained together to minimize the discrepancy between the input data and the reconstructed data.
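As a rough illustration (the layer sizes and the mean-squared-error loss here are arbitrary choices, not details prescribed by this article), a plain autoencoder in PyTorch might look like the following sketch:

import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder: maps the input to a lower-dimensional latent code.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Decoder: reverses the mapping, from the latent code back to data space.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Encoder and decoder are trained jointly to minimize reconstruction error,
# e.g. loss = nn.functional.mse_loss(model(x), x).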

Autoencoders were developed to address the problem of feature engineering, the manual selection and transformation of input features to improve the performance of a machine-learning model. Feature engineering can be time-consuming and error-prone, since it requires domain expertise and can introduce the engineer's own biases. Autoencoders automate this process by learning effective representations of the input data directly from the data itself.


Autoencoders vs Variational autoencoders :


What distinguishes variational autoencoders (VAEs) from regular autoencoders is their use of probabilistic modeling. Conventional autoencoders are purely deterministic and aim to reconstruct the input data as precisely as possible. VAEs, by contrast, learn a probability distribution over the latent variables that best explain the input data, and can then sample new data from that distribution.

Specifically, VAEs model the latent variables as being generated from a prior distribution, usually a standard Gaussian. The encoder maps the input data to a distribution in the latent space, and the decoder maps points in the latent space back to the data space. During training, VAEs minimize a loss function that combines a reconstruction term (measuring how accurately the input data is reconstructed) with a regularization term (measuring the difference between the learned latent-variable distribution and the prior distribution).
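In the standard formulation introduced in the original VAE paper (linked at the end of this article), this loss is the negative evidence lower bound (ELBO). Writing the encoder as q_φ(z|x), the decoder as p_θ(x|z), and the prior as p(z) = N(0, I), one common way to express it is:

    L(θ, φ; x) = −E_{q_φ(z|x)}[ log p_θ(x|z) ] + D_KL( q_φ(z|x) ‖ p(z) )

The first term is the reconstruction loss and the second is the regularization term described above.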

The probabilistic modeling in VAEs makes it possible to generate new data by sampling from the learned latent-variable distribution, which is useful for tasks such as data synthesis and data augmentation. VAEs have also been shown to work well for learning disentangled representations of the input data, in which each dimension of the latent space corresponds to a distinct source of variation in the data.

In the context of machine learning, probabilistic modeling is used to build models that can handle uncertain or incomplete data. In image recognition, for example, images are often partially occluded, and probabilistic models can account for this variability and still recognize the image accurately.

VAE architecture :

  • The encoder is a neural network that takes an input data point and maps it to a distribution in the latent space. The latent space is a lower-dimensional representation of the input data that captures its underlying structure. The encoder typically consists of several fully connected layers that transform the input into the parameters of this latent distribution.
  • The decoder is a neural network that takes a point in the latent space and maps it back to the original data space. By learning the structure of the data from the latent space, the decoder can generate new data points that resemble the input data. It is typically a mirror image of the encoder, also composed of several fully connected layers (see the code sketch after this list).



  • The latent space is the central component of a VAE. It is a lower-dimensional representation of the input data that captures the data's underlying structure. Its distribution is usually chosen to be Gaussian, which makes optimization easier. The latent space is also the key to generating new data points: sample points from it and use the decoder to map them back to the original data space.
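To make this concrete, here is a minimal sketch of such an encoder/decoder pair in PyTorch. The layer sizes (784 inputs, 400 hidden units, 20 latent dimensions) are arbitrary choices for illustration, not values this article prescribes; the encoder outputs the mean and log-variance of a Gaussian in the latent space:

import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=400, latent_dim=20):
        super().__init__()
        # Encoder: maps the input to the parameters (mean, log-variance)
        # of a Gaussian distribution in the latent space.
        self.enc_hidden = nn.Linear(input_dim, hidden_dim)
        self.enc_mu = nn.Linear(hidden_dim, latent_dim)
        self.enc_logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder: a mirror image of the encoder, mapping a latent point
        # back to the original data space.
        self.dec_hidden = nn.Linear(latent_dim, hidden_dim)
        self.dec_out = nn.Linear(hidden_dim, input_dim)

    def encode(self, x):
        h = torch.relu(self.enc_hidden(x))
        return self.enc_mu(h), self.enc_logvar(h)

    def decode(self, z):
        h = torch.relu(self.dec_hidden(z))
        return torch.sigmoid(self.dec_out(h))  # outputs in [0, 1], e.g. pixel intensities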



VAE applications :

  • Image generation: The decoder component of the VAE can be used to generate new images. By sampling from the latent space, the VAE can generate a new image that is similar to the input images that were used to train it.
  • Data compression: The encoder component of the VAE can be used to compress data. By mapping the input data to a lower-dimensional latent space, the VAE can effectively compress the data. This can be useful in situations where storage space is limited.
  • Data denoising: The VAE can be trained to denoise noisy data. By adding noise to the input data and training the VAE to reconstruct the original data, the VAE can learn to remove noise from the input data.

In each of these applications, the VAE is trained on a dataset of input data. During training, the VAE learns to encode the input data into a lower-dimensional latent space, and then decode the latent space representation to reconstruct the original data. Once the VAE has been trained, it can be used to perform the desired task, such as generating new images, compressing data, or denoising data.
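For the image-generation use case, for example, a trained decoder can be sampled directly from the prior. Here is a minimal sketch; the decode method, the 20-dimensional latent space, and the 28x28 image size are assumptions carried over from the sketch above, not details from the article:

import torch

def generate_images(vae, latent_dim=20, num_samples=16):
    # Sample latent points from the N(0, I) prior and decode them.
    with torch.no_grad():
        z = torch.randn(num_samples, latent_dim)
        generated = vae.decode(z)                   # map latent samples back to data space
    return generated.view(num_samples, 1, 28, 28)   # e.g. reshape into 28x28 images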

VAE loss function :

The loss function used in training a VAE is composed of two parts: the reconstruction loss and the regularization loss.


The reconstruction loss measures the difference between the input and the output of the VAE, while the regularization loss ensures that the distribution produced by the encoder stays close to a unit Gaussian. The regularization loss is typically computed as the Kullback-Leibler (KL) divergence between the encoder's distribution and this target distribution.
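As a minimal sketch of this loss in PyTorch (assuming the encoder outputs a mean and log-variance, and using binary cross-entropy as the reconstruction term for data scaled to [0, 1]; the KL term below is the standard closed form for a diagonal Gaussian against a unit Gaussian):

import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, logvar):
    # Reconstruction loss: how well the decoder output matches the input.
    recon_loss = F.binary_cross_entropy(recon_x, x, reduction="sum")
    # Regularization loss: KL divergence between the encoder's Gaussian
    # N(mu, sigma^2) and the unit Gaussian prior N(0, I), in closed form.
    kl_loss = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl_loss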

The optimization techniques used in VAE training are similar to those used in other neural network architectures, such as stochastic gradient descent (SGD) and its variants. However, due to the probabilistic nature of VAEs, other techniques such as the reparameterization trick and importance sampling are also used to improve training stability and convergence. Additionally, techniques such as early stopping and learning rate schedules can also be used to optimize VAE training.
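Putting the pieces together, a typical training loop might look like the sketch below. The vae model, the vae_loss function, and the data_loader of (input, label) batches are assumptions carried over from the earlier sketches, not details from the article; Adam is used here simply as a common SGD variant:

import torch

optimizer = torch.optim.Adam(vae.parameters(), lr=1e-3)

for epoch in range(10):
    for x, _ in data_loader:
        x = x.view(x.size(0), -1)                 # flatten each input
        mu, logvar = vae.encode(x)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)      # reparameterization trick (next section)
        recon_x = vae.decode(z)
        loss = vae_loss(recon_x, x, mu, logvar)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()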

The Reparameterization trick :

The reparameterization trick is a method used in variational autoencoders (VAEs) to make the model trainable with backpropagation. In a VAE, the encoder produces a distribution over the latent variables, and a sample from that distribution is passed to the decoder. The trick is to reformulate this stochastic sampling step so that the randomness is isolated from the model parameters: instead of sampling directly from the encoder's distribution, one samples from a fixed distribution (e.g. a standard normal) and transforms that sample with a deterministic function of the encoder's output. Because the latent sample is now a deterministic function of the parameters plus independent noise, gradients can propagate through it and the model can be optimized with stochastic gradient descent. The reparameterization trick is key to the successful training of VAEs and has become a standard technique in deep generative models.
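A minimal sketch of the trick for a Gaussian latent distribution: instead of sampling z directly from N(mu, sigma^2), sample eps from a standard normal and compute z as a deterministic function of mu, sigma, and eps, so that gradients can flow through mu and logvar:

import torch

def reparameterize(mu, logvar):
    # z = mu + sigma * eps, with eps ~ N(0, I).
    # The randomness lives entirely in `eps`, so gradients can
    # propagate through `mu` and `logvar` during backpropagation.
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + std * eps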


VAEs and other generative models :

Variational autoencoders (VAEs) are one type of generative model, a family of models that learn the underlying structure of a dataset and generate new samples resembling the original data. While VAEs and generative adversarial networks (GANs) are both generative models, they differ in their approach to learning the latent space: GANs use a two-part network consisting of a generator and a discriminator, while VAEs use a single encoder-decoder network that learns to encode and decode the data.

Another type of generative model is the autoregressive model, which is based on predicting the probability of each individual element in the data based on the previous elements. Unlike VAEs, which learn a compressed representation of the data, autoregressive models preserve the structure of the data and can be used for tasks such as text generation.
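As a contrast with the VAE sketches above, here is a toy illustration of the autoregressive idea of predicting the next element from the previous ones (purely illustrative; the vocabulary and layer sizes are arbitrary):

import torch
import torch.nn as nn

# Toy autoregressive model: predict the next token from a fixed-size
# window of previous tokens.
vocab_size, context_len, hidden = 100, 8, 64
model = nn.Sequential(
    nn.Embedding(vocab_size, hidden),             # embed each previous token
    nn.Flatten(),                                 # concatenate the context window
    nn.Linear(context_len * hidden, vocab_size),  # logits for the next token
)

tokens = torch.randint(0, vocab_size, (1, context_len))
next_token_logits = model(tokens)                 # p(next element | previous elements)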

In summary, while VAEs, GANs, and autoregressive models are all generative models, they differ in their approach to learning the underlying structure of the data and generating new samples. The choice of which model to use depends on the specific task and the characteristics of the data being modeled.


VAE paper
My GitHub
