Autoencoders

These networks have a distinctive two-part architecture. The first part, known as the encoder, compresses the input and extracts its essential features; the second part, known as the decoder, reconstructs the input from the features produced by the encoder. The core idea of autoencoders really is that simple: extract features in the first part and reconstruct the original input from those features in the second.

Autoencoders are considered an unsupervised learning method for extracting hidden features from the input. These hidden features are not directly observable, but they describe the distribution of the data. During training, the network learns which hidden variables of the dataset, collectively known as the latent space, produce a more accurate reconstruction of the original data. In effect, the latent space retains only the essential information contained in the input images.
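
To make the structure concrete, here is a minimal sketch of an autoencoder in Keras. The layer sizes, the 32-dimensional latent space, and the use of flattened 28x28 images are illustrative assumptions for the sketch, not values prescribed by any particular design:

```python
# Minimal fully connected autoencoder sketch (illustrative sizes only).
import tensorflow as tf
from tensorflow.keras import layers, Model

INPUT_DIM = 784   # e.g. a flattened 28x28 grayscale image (assumption)
LATENT_DIM = 32   # size of the latent space / bottleneck (assumption)

inputs = layers.Input(shape=(INPUT_DIM,))

# Encoder: compresses the input down to the latent representation
h = layers.Dense(128, activation="relu")(inputs)
latent = layers.Dense(LATENT_DIM, activation="relu")(h)

# Decoder: reconstructs the input from the latent representation
h = layers.Dense(128, activation="relu")(latent)
outputs = layers.Dense(INPUT_DIM, activation="sigmoid")(h)

autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
```

The middle Dense layer is the bottleneck discussed later in this article; everything before it plays the role of the encoder and everything after it the decoder.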

In general, autoencoders are used for tasks such as feature extraction, data compression, image denoising, anomaly detection, and facial recognition. There are also various types of autoencoders, such as Variational Autoencoder (VAE), Adversarial Autoencoder (AAE), and Adaptive Autoencoder.

Autoencoders vs Encoder-Decoder

All autoencoder models consist of an encoder and a decoder, but not all encoder-decoder models are autoencoders.

Encoder-Decoder

In the encoder-decoder architecture, an encoder network extracts key features from the input, and a decoder network takes those extracted features as its input. This framework appears in many architectures, such as convolutional neural networks for tasks like image segmentation and RNNs for sequence-to-sequence (seq2seq) tasks.

In encoder-decoder architectures, the network output differs from the input. Consider an image segmentation network such as U-Net. In the first part of the network, the encoder extracts feature maps from the input image for semantic classification. In the decoder part, those feature maps are used to classify pixels and produce a mask for each object in the image. The goal of the model is to label every pixel with its semantic class, so it is trained and optimized in a supervised manner against ground-truth masks to predict the class of each pixel.
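
As a rough illustration of this supervised encoder-decoder setup, here is a heavily simplified, U-Net-flavored segmentation model in Keras. The real U-Net is much deeper and uses skip connections at every resolution level; the image size, channel counts, and NUM_CLASSES below are assumptions made for the sketch:

```python
# Simplified encoder-decoder for semantic segmentation (U-Net-like sketch).
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_CLASSES = 3  # hypothetical number of semantic classes

inputs = layers.Input(shape=(128, 128, 3))

# Encoder: extract feature maps while reducing spatial resolution
c1 = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
p1 = layers.MaxPooling2D()(c1)
c2 = layers.Conv2D(32, 3, padding="same", activation="relu")(p1)
p2 = layers.MaxPooling2D()(c2)

# Decoder: upsample back to the input resolution; concatenations reuse
# encoder features, in the spirit of U-Net's skip connections
u1 = layers.UpSampling2D()(p2)
u1 = layers.Concatenate()([u1, c2])
c3 = layers.Conv2D(32, 3, padding="same", activation="relu")(u1)
u2 = layers.UpSampling2D()(c3)
u2 = layers.Concatenate()([u2, c1])
c4 = layers.Conv2D(16, 3, padding="same", activation="relu")(u2)

# One class probability per pixel: the output is not a copy of the input
outputs = layers.Conv2D(NUM_CLASSES, 1, activation="softmax")(c4)

segmenter = Model(inputs, outputs)
segmenter.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

Unlike an autoencoder, this model is trained against ground-truth masks, not against its own input.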

Autoencoders

On the other hand, autoencoders are a subset of encoder-decoders trained through unsupervised learning to reconstruct the input data. Since these models have no labels to train on, they uncover hidden patterns in the data rather than predicting known ones. However, unlike other unsupervised methods, these networks use the input data itself as the yardstick for evaluating the output. For this reason, they are often described as self-supervised.
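
In code, this self-supervision simply means the input array is passed as both the data and the target. Assuming the autoencoder sketched earlier and a hypothetical `x_train` of flattened images scaled to [0, 1]:

```python
# Self-supervised training: the input doubles as the training target.
# `autoencoder` is the model sketched earlier; `x_train` is assumed to be an
# array of flattened images normalized to [0, 1].
autoencoder.fit(
    x_train, x_train,      # no external labels: reconstruct the input itself
    epochs=20,
    batch_size=256,
    validation_split=0.1,
)
```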

How Autoencoders Work

Autoencoders extract latent variables by forcing the data through a bottleneck before it reaches the decoder. This constraint ensures that the encoder extracts only the hidden features that are useful for accurately reconstructing the input image.

Components of Autoencoders

As previously mentioned, there are various types of autoencoders, and the details of their architectures differ, but they share the same key components:

  • Encoder: The first part of the network, responsible for compressing the input data by reducing its dimensionality. In a simple architecture, the encoder consists of hidden layers that gradually shrink in size from the input layer onward. As the data passes through these layers, it is squeezed into a lower-dimensional representation.
  • Bottleneck: The bottleneck holds the most compressed representation and sits between the encoder and the decoder. As mentioned earlier, the goal of the encoder is to extract the smallest set of features that is still sufficient to reconstruct the input data. The features held in the bottleneck are what the decoder receives.
  • Decoder: Like the encoder, this part consists of hidden layers of a neural network, but its layers gradually grow in size, decompressing the features from the bottleneck and reconstructing the original input. After reconstruction, the output is compared with the ground truth (for an autoencoder, the input itself) to evaluate the model's performance. The error between the output and the ground truth is called the reconstruction error, illustrated in the sketch below.
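
As a small illustration, the reconstruction error for the earlier sketch can be computed per sample; this per-sample error is also the quantity that anomaly-detection setups typically threshold. `x_test` is a hypothetical test array:

```python
import numpy as np

# `autoencoder` is the model sketched earlier; `x_test` is assumed to be an
# array of flattened test images normalized to [0, 1].
reconstructions = autoencoder.predict(x_test)

# Mean squared error between each input and its reconstruction:
# one reconstruction-error value per sample.
reconstruction_error = np.mean((x_test - reconstructions) ** 2, axis=1)
```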

In some cases, for example when the learned representation feeds other architectures such as GANs, the decoder is discarded after training and only the encoder is kept. In other cases, such as VAE networks, the decoder continues to be used after training, this time to generate new outputs.
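
As an illustration of decoder-only use, the trained decoder layers of the earlier sketch can be wrapped in their own model and fed a latent vector directly. Note that, unlike a VAE, a plain autoencoder's latent space is not structured for sampling, so a random latent vector will not necessarily decode into anything meaningful; the layer indices below match the earlier sketch and are assumptions:

```python
import numpy as np
from tensorflow.keras import layers, Model

# Reuse the trained decoder layers from the earlier sketch
# (layers[-2] and layers[-1] in that particular architecture).
latent_inputs = layers.Input(shape=(LATENT_DIM,))
x = autoencoder.layers[-2](latent_inputs)
decoded = autoencoder.layers[-1](x)
decoder = Model(latent_inputs, decoded)

# Decode a random latent vector; generating meaningful new samples this way
# is what a VAE's structured latent space is designed for.
sample = decoder.predict(np.random.normal(size=(1, LATENT_DIM)))
```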

Advantages

One of the most significant advantages of autoencoder networks over methods such as PCA is their ability to capture more complex, non-linear correlations. The reason is the use of non-linear activation functions, such as sigmoid, in their layers.
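
A quick way to see the difference: an encoder built only from layers with no activation function can learn nothing more than a linear projection, roughly the subspace PCA would find, whereas adding non-linear activations lets it model curved structure in the data. The dimensions below are illustrative:

```python
from tensorflow.keras import layers, Model

inputs = layers.Input(shape=(784,))

# Linear encoder: no activation, so it can only learn a linear projection
# (comparable in expressive power to PCA).
linear_encoder = Model(inputs, layers.Dense(32, activation=None)(inputs))

# Non-linear encoder: ReLU/sigmoid layers let it capture non-linear correlations.
h = layers.Dense(128, activation="relu")(inputs)
nonlinear_encoder = Model(inputs, layers.Dense(32, activation="sigmoid")(h))
```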

Designing an Autoencoder

The various types of autoencoders typically follow the structure described above. Besides the choice of neural network, such as a convolutional neural network or an RNN like an LSTM, several other hyperparameters matter when designing the network, including:

  • Bottleneck size: The size of the bottleneck determines the degree of data compression and also acts as a form of regularization against overfitting or underfitting.
  • Number of layers: The depth of an autoencoder is measured by the number of layers in the encoder and decoder. Greater depth allows more complex patterns to be modeled, while a shallower network is faster to train and run.
  • Number of neurons per layer: Generally, the number of neurons per layer decreases through the encoder, reaches its minimum at the bottleneck, and increases again through the decoder. (This rule does not always hold: in sparse autoencoders the neuron counts vary, and networks that work with large images need more neurons than those working with small ones.)
  • Loss function: In general, this function measures the reconstruction error between the output and the input, and gradient descent uses it to optimize the model weights during backpropagation. The appropriate loss function depends on the task the autoencoder is built for, as in the sketch after this list.
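
As a sketch of that last point, two common choices for the model built earlier would be mean squared error for real-valued inputs and binary cross-entropy for inputs scaled to [0, 1]; which one is appropriate depends on the data and the task:

```python
# Two common, illustrative loss choices for the autoencoder sketched earlier.
autoencoder.compile(optimizer="adam", loss="mse")                    # real-valued inputs
# autoencoder.compile(optimizer="adam", loss="binary_crossentropy")  # pixels in [0, 1]
```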


Link to a practical example of an autoencoder.
