Probability Inequalities in Machine Learning

Two of the common prerequisites that students are asked to master before diving into machine learning are probability and linear algebra.

During one of my journeys through the famed Bangalore traffic, staring deep into the abyss (metaphorically, and literally, from the metro), I found myself wondering, after years of experience, what in probability is actually useful in practice and in theory.

Probability for ML Practice:

In my experience, a basic understanding of probability mass functions, density functions, expectations, variance, and covariance is usually enough to follow most new papers with a predominantly practical focus.

Probability for ML Theory:

On the other hand, even a basic understanding of machine learning theory requires knowing a handful of probability inequalities that, I think, are not taught as part of most undergraduate courses. Probability inequalities upper-bound the probability of a random variable taking a value beyond a specified target. These inequalities are useful for theoretically guaranteeing that, with very high probability, the risk cannot exceed a certain value.

In this post, I will briefly list a few probability inequalities and point to references for the proofs. Selfishly, I hope that by writing this post I will remember them when needed and not forget them completely, as I usually do. (Aside: how do most of you remember concepts? I seem to forget nearly everything I read!)

1. Markov Inequality

The Markov inequality states that the probability of a non-negative random variable X taking a value of at least a (> 0) is bounded as follows:

P(X ≥ a) ≤ E[X] / a, for a > 0 and X ≥ 0 (Markov inequality)

Note that, apart from non-negativity, there is no constraint on the type of random variable or its distribution. This is also not considered a "tight" bound, since in many cases a quantity smaller than the RHS would also upper-bound the probability.
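To see how loose it can be, here is a small Python sketch (my own illustration, not from the references): it compares the Markov bound with the empirical tail probability of an exponential random variable with mean 1.

import numpy as np

# Empirically check Markov's inequality, P(X >= a) <= E[X] / a,
# for a non-negative random variable (exponential with mean 1).
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=100_000)

a = 3.0
empirical_tail = np.mean(x >= a)   # estimate of P(X >= a)
markov_bound = x.mean() / a        # E[X] / a, estimated from the same samples

print(f"P(X >= {a}) ~ {empirical_tail:.4f}")  # around 0.05
print(f"Markov bound   {markov_bound:.4f}")   # around 0.33

The empirical tail probability (about 0.05) sits far below the bound (about 0.33), which is exactly the sense in which Markov is loose.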

2. Chebyshev Inequality

The Chebyshev inequality is used to bound the probability that a random variable takes values more than k standard deviations away from its mean. As one can expect, this bound is weaker than what we could obtain if we knew the exact distribution.

P(|X − μ| ≥ kσ) ≤ 1 / k² (Chebyshev inequality)
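A similar sanity check, again just an illustrative sketch of mine, this time for a uniform random variable on [0, 1]:

import numpy as np

# Empirically check Chebyshev's inequality, P(|X - mu| >= k*sigma) <= 1/k^2,
# for a uniform random variable on [0, 1].
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=100_000)
mu, sigma = x.mean(), x.std()

for k in (1.5, 2.0, 3.0):
    empirical = np.mean(np.abs(x - mu) >= k * sigma)
    print(f"k={k}: empirical {empirical:.4f} <= bound {1 / k**2:.4f}")

For k ≥ 2 the empirical probability is essentially zero while the bound is still positive, so the bound holds with plenty of slack for this particular distribution.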

3. Chernoff Bounds

Chernoff bounds are obtained by applying the Markov inequality to exp(tX) instead of X. This makes the right-hand side a moment generating function (a function of t). For the uninitiated, the moment generating function is an expression that allows us to generate the different moments of a distribution by differentiating with respect to the parameter t (the mean is the first moment, the variance is derived from the second moment, and so on). Since the right-hand side is now a function of t, we can optimize it for the distribution at hand and make the bound tighter than Markov; in many applications the optimized bound decays exponentially, which is much tighter than the variance-based Chebyshev bound.

P(X ≥ a) ≤ min over t > 0 of E[exp(tX)] / exp(ta) (Chernoff bound)
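As an illustration (the standard normal is my own example choice, not from the post), take X to be standard normal, whose moment generating function is E[exp(tX)] = exp(t²/2). Optimizing over t can be done numerically or in closed form:

import math
import numpy as np

# Chernoff bound for a standard normal X:
#   P(X >= a) <= min over t > 0 of exp(t^2/2 - t*a),
# which is minimized at t = a, giving exp(-a^2 / 2).
a = 2.0
t_grid = np.linspace(1e-3, 10.0, 10_000)
chernoff = np.min(np.exp(t_grid**2 / 2 - t_grid * a))  # numerical optimization over t

true_tail = 0.5 * math.erfc(a / math.sqrt(2))            # exact P(X >= a) for a standard normal

print(f"Chernoff bound (grid):   {chernoff:.4f}")        # ~ exp(-2) = 0.1353
print(f"Closed form exp(-a^2/2): {math.exp(-a**2 / 2):.4f}")
print(f"True tail P(X >= 2):     {true_tail:.4f}")       # ~ 0.0228

The grid search agrees with the closed-form optimum, and as a grows, the exponential decay of the Chernoff bound makes it far tighter than Chebyshev's 1/k² decay.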

4. Hoeffding Inequality

The Hoeffding inequality bounds the probability that the sum of n bounded iid random variables deviates from its expectation. This is particularly useful in machine learning, as the samples in a dataset are treated as iid random variables; hence Hoeffding is perhaps one of the most used inequalities in machine learning.

P(|S_n − E[S_n]| ≥ t) ≤ 2 exp( −2t² / Σ_{i=1..n} (b_i − a_i)² ) (Hoeffding inequality)

Here, S_n denotes the sum of n iid random variables X_1, ..., X_n, with each X_i bounded in [a_i, b_i].
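This is where the connection to learning theory is most direct: if each X_i is a bounded loss (say a 0/1 classification error, which is my own example choice here), Hoeffding tells us how far the empirical risk S_n/n can stray from the true risk for a fixed predictor. A small sketch under that assumption:

import math

# Hoeffding for n iid variables bounded in [0, 1] (e.g. 0/1 losses):
#   P(|S_n/n - E[X]| >= eps) <= 2 * exp(-2 * n * eps**2)
def hoeffding_bound(n: int, eps: float) -> float:
    """Upper bound on P(|empirical mean - true mean| >= eps)."""
    return 2.0 * math.exp(-2.0 * n * eps**2)

def samples_needed(eps: float, delta: float) -> int:
    """Smallest n for which the Hoeffding bound drops below delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps**2))

print(hoeffding_bound(n=1000, eps=0.05))    # ~ 0.0135
print(samples_needed(eps=0.05, delta=0.05)) # 738

So roughly 738 samples are enough to guarantee, with probability at least 0.95, that the empirical risk of a fixed predictor is within 0.05 of its true risk.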


I think beyond these there are a couple of other inequalities, like Hölder's inequality, but I am not sure how much they are really used in practice.

References and Proofs:

  1. ch15M.pdf (uconn.edu)
  2. hoeffding.pdf (stanford.edu)
