KL Divergence: Prerequisite to Variational AutoEncoder (VAE)

KL Divergence

The Kullback-Leibler divergence (KL divergence) measures the inefficiency of approximating the true probability distribution (P) with a predicted one (Q). Denoted by D_KL(P || Q), it quantifies the additional information, on average, required to describe reality when the predicted distribution is used in place of the true one. A higher KL divergence therefore indicates a larger discrepancy between predicted and actual outcomes. Later, we will see how this measure serves as a penalty term in the VAE loss function.

Figure 1. Formula of KL divergence for continuous distributions
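Since the figure itself is an image, here is the standard continuous-distribution form of the formula in text:

D_KL(P || Q) = ∫ p(x) ln( p(x) / q(x) ) dx

where p(x) and q(x) are the density functions of P and Q.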

The KL divergence formula above penalizes poor predicted distributions (Q) through the logarithm of the ratio p(x)/q(x). When Q significantly underestimates the probability that the actual distribution (P) assigns, the ratio inflates the logarithm, magnifying the discrepancy. In addition, the formula weights each term by P, emphasizing penalties in regions where the actual distribution places high probability. This ensures that KL divergence prioritizes accurate predictions for frequently occurring events.
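As a quick illustration of this weighting effect (a minimal sketch, not from the article, assuming NumPy is available), underestimating a frequent outcome costs far more than underestimating a rare one, even when the rare outcome is off by a larger ratio:

import numpy as np

def kl_divergence(p, q):
    """Discrete D_KL(P || Q) in nats."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

p = [0.7, 0.2, 0.1]                    # actual distribution; the first outcome is frequent
q_misses_frequent = [0.4, 0.35, 0.25]  # underestimates the frequent outcome (0.7 -> 0.4)
q_misses_rare = [0.75, 0.22, 0.03]     # underestimates the rare outcome (0.1 -> 0.03)

print(kl_divergence(p, q_misses_frequent))  # ~0.188 nats: large penalty
print(kl_divergence(p, q_misses_rare))      # ~0.053 nats: small penalty, despite the bigger ratio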

Properties of KL divergence

  1. D_KL(P || Q) ≥ 0 and D_KL(Q || P) ≥ 0, with equality only when P and Q are identical (non-negativity)
  2. D_KL(P || Q) ≠ D_KL(Q || P) in general, so KL divergence is not symmetric and is not a true distance metric
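Both properties are easy to check numerically; here is a small sketch with two hand-picked two-outcome distributions (values chosen purely for illustration):

import math

p, q = [0.5, 0.5], [0.9, 0.1]

kl_pq = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))  # ~0.511 nats
kl_qp = sum(qi * math.log(qi / pi) for qi, pi in zip(q, p))  # ~0.368 nats

print(kl_pq >= 0 and kl_qp >= 0)  # True  (property 1)
print(kl_pq != kl_qp)             # True  (property 2: not symmetric)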

Discrete Distributions

Let's work through an example with discrete distributions:

Figure 2. Discrete distributions P (uniform distribution) and Q

D_KL(P || Q) = 0.25 ln(0.25/0.18) + 0.25 ln(0.25/0.23) + 0.25 ln(0.25/0.15) + 0.25 ln(0.25/0.44) ≈ 0.0893 nats
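The same number can be reproduced in a couple of lines, assuming SciPy is installed (scipy.stats.entropy returns the relative entropy in nats when two distributions are passed):

from scipy.stats import entropy

p = [0.25, 0.25, 0.25, 0.25]  # uniform distribution P from Figure 2
q = [0.18, 0.23, 0.15, 0.44]  # predicted distribution Q from Figure 2

print(entropy(p, q))  # ~0.0893 nats, matching the calculation above
print(entropy(q, p))  # a different value, illustrating the asymmetry noted earlier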

Multivariate Normal Distributions

We now have two multivariate normal distributions, P and Q, with means μ1 and μ2 and covariance matrices Σ1 and Σ2, respectively. Here, x is a vector of length k.

Figure 3. P and Q are multivariate normal distributions

For these distributions, the KL divergence evaluates in closed form to:

Figure 4. KL divergence for multivariate normal distributions
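Since Figure 4 is an image, here is a small NumPy sketch of the standard closed-form expression it refers to, D_KL = 0.5 [ tr(Σ2⁻¹ Σ1) + (μ2 − μ1)ᵀ Σ2⁻¹ (μ2 − μ1) − k + ln(det Σ2 / det Σ1) ], assuming both covariance matrices are full rank; the example means and covariances are made up for illustration:

import numpy as np

def kl_mvn(mu1, Sigma1, mu2, Sigma2):
    """D_KL( N(mu1, Sigma1) || N(mu2, Sigma2) ) in nats."""
    k = mu1.shape[0]
    Sigma2_inv = np.linalg.inv(Sigma2)
    diff = mu2 - mu1
    trace_term = np.trace(Sigma2_inv @ Sigma1)
    quad_term = diff @ Sigma2_inv @ diff
    # slogdet is numerically safer than log(det(...))
    log_det_term = np.linalg.slogdet(Sigma2)[1] - np.linalg.slogdet(Sigma1)[1]
    return 0.5 * (trace_term + quad_term - k + log_det_term)

mu1, Sigma1 = np.zeros(2), np.eye(2)
mu2, Sigma2 = np.array([1.0, 0.0]), np.array([[2.0, 0.3], [0.3, 1.0]])

print(kl_mvn(mu1, Sigma1, mu2, Sigma2))  # positive value: the distributions differ
print(kl_mvn(mu1, Sigma1, mu1, Sigma1))  # 0.0 when P and Q are identical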

