A Simple Introduction to KL Divergence through Python Code

KL divergence, or Kullback-Leibler divergence, is a commonly used loss metric in machine learning. With such an intimidating name, the concept can seem hard to understand. This article is meant to be an approachable tutorial, especially for software engineers, who often speak code better than equations.

Let's start with the basics. Imagine a bag containing 10 red balls and 15 blue balls. If you reach into the bag and pull out a ball at random, there is a 40% chance (10/25) that you will get a red ball and a 60% chance (15/25) that you will get a blue ball. Here, the variable is the color of the ball. Since there are only two (that is, a finite number of) possibilities, this is a random variable with discrete states. A mathematical function that emulates this bag would be a discrete probability distribution.

Now, let's try to mimic this distribution with some Python code. We can generate a random integer between 1 and 10: if the number is 4 or below, we call it a red ball; otherwise it is a blue ball. If we run this only a few times, the observed fraction of red balls will probably not be exactly 40%. However, the more times we run it, the closer the observed frequencies tend toward the 40:60 split.
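The generator described above can be sketched in a few lines (the function names here are illustrative, not from the original tutorial):

```python
import random

def draw_ball():
    # Draw a random integer from 1 to 10; 1-4 (40%) means red, else blue.
    return "red" if random.randint(1, 10) <= 4 else "blue"

def estimate_red_probability(n_draws):
    # Empirical fraction of red balls over n_draws simulated draws.
    draws = [draw_ball() for _ in range(n_draws)]
    return draws.count("red") / n_draws

# With more draws, the estimate tends toward the true 40%.
print(estimate_red_probability(100))
print(estimate_red_probability(100_000))
```

Running this a few times shows the small-sample estimate bouncing around, while the 100,000-draw estimate stays close to 0.4.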

In machine learning, observed values come from some underlying distribution, and our model approximates that distribution, much like the random number generator above models the bag of red and blue balls. How do we measure how close our model is to reality? The mathematical form of that measurement is the KL divergence.
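For discrete distributions, KL divergence is the sum of p * log(p/q) over all states. A minimal sketch for the two-state bag example (variable names are my own, not from the tutorial):

```python
import math

def kl_divergence(p, q):
    # KL(P || Q) = sum_i p_i * log(p_i / q_i) for discrete distributions
    # given as lists of probabilities over the same states.
    # Terms with p_i == 0 contribute nothing, so they are skipped.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

true_dist = [0.4, 0.6]     # the bag: 40% red, 60% blue
model_dist = [0.38, 0.62]  # an imperfect estimate of the bag

print(kl_divergence(true_dist, model_dist))  # small positive value
print(kl_divergence(true_dist, true_dist))   # exactly 0.0
```

Note that the divergence is zero only when the two distributions match, and it grows as the model drifts away from reality, which is what makes it usable as a loss.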

To read the details of the KL divergence equation, along with working Python code for the above example, please read the full tutorial.

(P.S.: I find it really hard to format code in the limited editing tools provided by LinkedIn, hence the link to the main article on my blog.)

Ashish B.

Founder, AI Leader, Author, ex-Google/Amazon/Twitch/Twitter, Speaker, Life-long learner

6y

The idea of the counting is to show how KL divergence works. In a deep learning scenario, the training data gives the actual distribution and the trained model provides the other distribution. Your aim is to determine how close your learned distribution is to the real data distribution, as a way to ascertain the effectiveness of your training.
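The counting approach described in this reply can be sketched as follows: draw samples from a known distribution, estimate probabilities by counting, and compare the estimate to the truth with KL divergence (the helper names here are illustrative):

```python
import math
import random
from collections import Counter

def empirical_distribution(samples, states):
    # Estimate probabilities by counting how often each state occurs.
    counts = Counter(samples)
    n = len(samples)
    return [counts[s] / n for s in states]

random.seed(42)  # seeded for reproducibility
states = ["red", "blue"]
true_p = [0.4, 0.6]  # the real data distribution (the bag)
samples = random.choices(states, weights=true_p, k=10_000)
learned_q = empirical_distribution(samples, states)

# KL(P || Q): how far the learned distribution is from the true one.
kl = sum(p * math.log(p / q) for p, q in zip(true_p, learned_q) if p > 0)
print(kl)  # close to 0, since 10,000 counts estimate the bag well
```

For the continuous case raised in the next comment (real-valued activations), one common workaround is to bin the values into a histogram first and then apply the same counting, though that introduces a bin-size choice the counting method does not need.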

Amir Hussein

PhD researcher at Johns Hopkins University

6y

Hi Mr. Ashish, thank you for this wonderful explanation of the KL divergence implementation. As far as I understood, you use a simple counting method to estimate the probabilities. If I want to use this method to measure the divergence between the corresponding activations of a deep neural network, would the counting method still work? The activations are real numbers.
