A Simple Introduction to KL Divergence through Python Code

KL divergence, or Kullback-Leibler divergence, is a commonly used loss metric in machine learning. With such an intimidating name, the concept can seem hard to understand. This article is meant to be an approachable tutorial, especially for software engineers, who often speak code better than equations.

Let's start with the basics. Imagine a bag containing 10 red balls and 15 blue balls. If you reach into the bag and pull out a ball at random, there is a 40% chance (10/25) that you will get a red ball and a 60% chance (15/25) that you will get a blue ball. Here, the variable is the color of the ball. Since there are only two (that is, a finite number of) possibilities, this is a random variable with discrete states. A mathematical function that emulates this bag would be a discrete probability distribution.

Now, let's try to mimic this distribution with some Python code. We can generate a random integer between 1 and 10: if the number is 4 or below, we call it a red ball; otherwise it is a blue ball. If we run this only a few times, the observed fraction of red balls will probably not be exactly 40%. However, the more times we run it, the closer the observed frequencies tend toward the 40:60 split.
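The generator described above can be sketched in a few lines (the function names here are illustrative, not from the original tutorial):

```python
import random

def draw_ball():
    # Draw a random integer from 1 to 10; 1-4 (40%) means red, else blue.
    return "red" if random.randint(1, 10) <= 4 else "blue"

def estimate_red_probability(n_draws):
    # Empirical fraction of red balls over n_draws simulated draws.
    draws = [draw_ball() for _ in range(n_draws)]
    return draws.count("red") / n_draws

# With more draws, the estimate tends toward the true 40%.
print(estimate_red_probability(100))
print(estimate_red_probability(100_000))
```

Running this a few times shows the small-sample estimate bouncing around, while the 100,000-draw estimate stays close to 0.4.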

In machine learning, observed values come from some underlying distribution, and our model approximates that distribution, much like the random number generator above models the bag of red and blue balls. How do we measure how close our model is to reality? The mathematical form of that measurement is the KL divergence.
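For discrete distributions, KL divergence is the sum of p * log(p/q) over all states. A minimal sketch for the two-state bag example (variable names are my own, not from the tutorial):

```python
import math

def kl_divergence(p, q):
    # KL(P || Q) = sum_i p_i * log(p_i / q_i) for discrete distributions
    # given as lists of probabilities over the same states.
    # Terms with p_i == 0 contribute nothing, so they are skipped.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

true_dist = [0.4, 0.6]     # the bag: 40% red, 60% blue
model_dist = [0.38, 0.62]  # an imperfect estimate of the bag

print(kl_divergence(true_dist, model_dist))  # small positive value
print(kl_divergence(true_dist, true_dist))   # exactly 0.0
```

Note that the divergence is zero only when the two distributions match, and it grows as the model drifts away from reality, which is what makes it usable as a loss.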

To read the details of the KL divergence equation, along with working Python code for the above example, please read the full tutorial.

(P.S.: I find it really hard to format code in the limited editing tools provided by LinkedIn, hence the link to the main article on my blog.)

Ashish B.

Founder, AI Leader, Author, ex-Google/Amazon/Twitch/Twitter, Speaker, Life-long learner

6y

The idea of the counting is to show how KL divergence works. In a deep learning scenario, the training data gives the actual distribution and the trained model provides the other distribution. Your aim is to determine how close your learned distribution is to the real data distribution, as a way to ascertain the effectiveness of your training.
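The counting approach described in this reply can be sketched as follows: draw samples from a known distribution, estimate probabilities by counting, and compare the estimate to the truth with KL divergence (the helper names here are illustrative):

```python
import math
import random
from collections import Counter

def empirical_distribution(samples, states):
    # Estimate probabilities by counting how often each state occurs.
    counts = Counter(samples)
    n = len(samples)
    return [counts[s] / n for s in states]

random.seed(42)  # seeded for reproducibility
states = ["red", "blue"]
true_p = [0.4, 0.6]  # the real data distribution (the bag)
samples = random.choices(states, weights=true_p, k=10_000)
learned_q = empirical_distribution(samples, states)

# KL(P || Q): how far the learned distribution is from the true one.
kl = sum(p * math.log(p / q) for p, q in zip(true_p, learned_q) if p > 0)
print(kl)  # close to 0, since 10,000 counts estimate the bag well
```

For the continuous case raised in the next comment (real-valued activations), one common workaround is to bin the values into a histogram first and then apply the same counting, though that introduces a bin-size choice the counting method does not need.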

Amir Hussein

PhD researcher at Johns Hopkins University

6y

Hi Mr. Ashish, thank you for this wonderful explanation of the KL divergence implementation. As far as I understood, you use a simple counting method to estimate the probabilities. If I want to use this method to measure the divergence between the corresponding activations of a deep neural network, would the counting method still work? The activations are real numbers.
