Maths for AI 101: Fundamentals of Probability
What is Probability?
Probability is all about measuring uncertainty or the chances of something happening. For example, when we say, "the probability of a coin landing heads is 0.5," what are we really saying? There are two main ways to think about this: the Frequentist and the Bayesian views.
The Frequentist view sees probability as the long-term frequency of an event happening if you repeated it many times. So, if we say a coin has a 0.5 probability of landing heads, it means that if we flip the coin over and over again, we’d expect it to come up heads about half the time.
The Bayesian view, however, treats probability more like a measure of our own uncertainty or belief about an event. So, saying there's a 0.5 chance of getting heads simply means we're equally unsure whether the next flip will be heads or tails, without needing to flip it hundreds of times.
Even though these two interpretations look at probability differently, the core rules stay the same. Moving forward, we’ll dive deeper into probability from a Bayesian perspective, especially focusing on Bayes' Theorem, which is the backbone of Bayesian statistics and a key tool for updating our beliefs as we get new information.
Basic Terminologies
Probability helps us deal with uncertainty, but what exactly are we uncertain about? For example, when we toss a coin, we are uncertain whether we will get heads or tails. In this case, we know the possible outcomes—heads or tails—which together form our sample space. Once we perform the experiment of tossing the coin and, say, it lands on heads, this result is called the event of getting a head.
Now, if we are interested in an outcome that is not getting a head, we refer to this as the complement of the event. The complement of an event includes everything in the sample space that does not belong to the original event. For instance, if our event is getting a head, the complement of this event is getting a tail. The probabilities of an event and its complement always add up to 1. Thus, if the probability of getting a head is 0.5, the probability of getting a tail (the complement) is also 0.5.
Now let's consider rolling a die, where the sample space is S = {1, 2, 3, 4, 5, 6}. In probability experiments, we may encounter situations where some events overlap and others do not. For instance, let Event E be rolling an odd number {1, 3, 5} and Event F be rolling an even number {2, 4, 6}. These two events have no outcomes in common and do not overlap; such events are called disjoint events. On the other hand, consider Event E as rolling a number greater than 3 {4, 5, 6} and Event F as rolling an even number less than 6 {2, 4}. In this case, these events overlap because both include the outcome 4; such events are referred to as joint events.
When we have multiple events, we can perform operations on them to combine outcomes in different ways. If we want to find all outcomes that are present in both events, this operation is called intersection. On the other hand, if we want to find all outcomes that are present in either of the events, this operation is called union.
- Event E: Rolling an even number {2, 4, 6}
- Event F: Rolling a number greater than 3 {4, 5, 6}
- Intersection (E ∩ F): {4, 6}
- Event E: Rolling an even number {2, 4, 6}
- Event F: Rolling a number greater than 3 {4, 5, 6}
- Union (E ∪ F): {2, 4, 5, 6}
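These event operations map directly onto Python's built-in set operations. A minimal sketch using the die events above (the variable names are our own choices):

```python
# Die-roll events represented as Python sets.
E = {2, 4, 6}      # rolling an even number
F = {4, 5, 6}      # rolling a number greater than 3
odd = {1, 3, 5}    # rolling an odd number

print(E & F)       # intersection E ∩ F → {4, 6}
print(E | F)       # union E ∪ F → {2, 4, 5, 6}
print(E & odd)     # disjoint events share no outcomes → set()
```

An empty intersection is exactly the test for disjointness: E and `odd` are disjoint, while E and F are joint events because they share the outcome 4.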
Axioms Of Probability
Probability rests on three basic axioms: (1) non-negativity — for any event E, P(E) ≥ 0; (2) normalization — P(S) = 1, where S is the sample space; and (3) additivity — for disjoint events E and F, P(E ∪ F) = P(E) + P(F). These axioms provide a foundation for deriving additional rules that will guide us toward understanding Bayes' Theorem.
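A quick sketch checking the three axioms for a fair six-sided die; the uniform 1/6 probabilities are an illustrative assumption, and `fractions` keeps the arithmetic exact:

```python
from fractions import Fraction

# Fair six-sided die: each outcome has probability 1/6 (our assumption).
sample_space = {1, 2, 3, 4, 5, 6}
P = {outcome: Fraction(1, 6) for outcome in sample_space}

def prob(event):
    """Probability of an event (a subset of the sample space)."""
    return sum(P[o] for o in event)

# Axiom 1: probabilities are non-negative.
assert all(p >= 0 for p in P.values())

# Axiom 2: the probability of the whole sample space is 1.
assert prob(sample_space) == 1

# Axiom 3: for disjoint events, probabilities add.
odd, even = {1, 3, 5}, {2, 4, 6}
assert prob(odd | even) == prob(odd) + prob(even)
```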
Let's understand this with an example.
Imagine we have a standard deck of 52 playing cards. Let's define two events:
- Event E: drawing a Heart
- Event F: drawing a Queen
We want to find the conditional probability of drawing a Queen, given that we have drawn a Heart.
Since we know we have drawn a Heart (Event E), our sample space is reduced to the 13 Hearts in the deck. We are no longer considering the entire deck of 52 cards but only those 13 Hearts.
Now, we need to find the probability that we have drawn both a Heart and a Queen. There is only 1 Queen of Hearts in the deck, so P(E ∩ F) = 1/52.
Thus our formula, P(F|E) = P(E ∩ F)/P(E) = (1/52)/(13/52) = 1/13, normalizes by dividing the joint probability of E and F by the probability of E. This effectively adjusts the probability from the original sample space (52 cards) to the new sample space where only Hearts (13 cards) are considered.
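The card example can be verified with a short sketch; the tuple encoding of the deck below is just one convenient representation:

```python
from fractions import Fraction

# Model a 52-card deck as (rank, suit) pairs.
ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['Hearts', 'Diamonds', 'Clubs', 'Spades']
deck = [(r, s) for r in ranks for s in suits]

hearts = [c for c in deck if c[1] == 'Hearts']                # event E
queen_and_heart = [c for c in deck if c == ('Q', 'Hearts')]   # E ∩ F

p_E = Fraction(len(hearts), len(deck))                # 13/52
p_E_and_F = Fraction(len(queen_and_heart), len(deck)) # 1/52

# Conditional probability: P(F|E) = P(E ∩ F) / P(E)
p_F_given_E = p_E_and_F / p_E
print(p_F_given_E)  # → 1/13
```

Note that 1/13 is exactly the chance of drawing a Queen from the 13 Hearts alone, which is what "reducing the sample space" means.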
Suppose our sample space S has a more complicated event structure: events A₁, A₂, …, Aₙ form a partition of S provided that Aᵢ ∩ Aⱼ = ∅ for i ≠ j and the union of all the Aᵢ is the entire sample space S.
Now we place a disc on top of it, representing Event B. What, then, is the probability of event B?
As per the Law of Total Probability, it is: P(B) = P(B|A₁)P(A₁) + P(B|A₂)P(A₂) + … + P(B|Aₙ)P(Aₙ) = Σᵢ P(B|Aᵢ)P(Aᵢ).
From Fig.10 and the formula provided in Fig.11, we see that Event B and the event spaces Aᵢ overlap. This means that B intersects with each partitioned event space Aᵢ, forming new sub-events that belong to both B and Aᵢ. To find the total probability of event B, we use the Law of Total Probability, which tells us how to express the probability of an event that spans multiple partitions of the sample space.
Think of each partition Aᵢ as a separate "world" in which event B might occur. The conditional probability P(B|Aᵢ) tells us how likely B is to occur within that specific "world," and P(Aᵢ) tells us the likelihood of being in that "world" in the first place. The Law of Total Probability essentially combines these possibilities to give us the overall probability of B happening across the entire sample space S.
Let's understand it with the example of a pizza:
Imagine you have a big pizza (the Sample Space, S), and this pizza is cut into different slices of different flavors (these slices are like the event spaces: A1, A2, A3,...). Now, each slice has a different flavor, and together, they make up the whole pizza.
Now, let's say you put a piece of cheese on the pizza. The cheese lands on some parts of different slices (this is Event B). The question is, how much of the pizza has cheese on it?
To figure that out, you need to look at each slice separately. You see how much cheese is on each slice and add it all up. This is what the formula does! It adds up all the little parts of the slices that have cheese (the cheese on A1, the cheese on A2, and so on) to get the total amount of cheese on the pizza.
So, the formula is like saying, "Let's see how much cheese is on each flavor slice and then add them all together to know how much of the whole pizza has cheese!"
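The pizza analogy can be sketched numerically. The slice sizes and cheese fractions below are hypothetical numbers chosen purely for illustration; the only requirement is that the slice probabilities sum to 1:

```python
from fractions import Fraction

# Hypothetical partition: three flavor slices covering the whole pizza.
# P(A_i): the fraction of the pizza each slice takes up (must sum to 1).
p_A = {'A1': Fraction(1, 2), 'A2': Fraction(1, 3), 'A3': Fraction(1, 6)}

# P(B | A_i): the fraction of each slice covered by cheese (hypothetical).
p_B_given_A = {'A1': Fraction(1, 4), 'A2': Fraction(1, 2), 'A3': Fraction(1, 1)}

# Sanity check: the slices form a partition of the pizza.
assert sum(p_A.values()) == 1

# Law of Total Probability: P(B) = Σᵢ P(B|Aᵢ) · P(Aᵢ)
p_B = sum(p_B_given_A[a] * p_A[a] for a in p_A)
print(p_B)  # → 11/24
```

Each term of the sum is "how much cheese is on this slice, weighted by how big the slice is," and adding the terms gives the total cheese-covered fraction of the pizza.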
Bayes' Theorem
Bayes' Theorem states: P(H|E) = P(E|H) P(H) / P(E). Just by looking at this formula, you may have noticed that it is derived from conditional probability and uses the Law of Total Probability to compute the denominator P(E).
Let's understand this with an example.
Imagine you have a toy box with two types of toy cars: red cars and blue cars. Some cars make noise, and some are quiet. You want to find a red car that makes noise. But you don't know which car will be noisy until you pull one out!
What We Know About the Toy Box
- There are 10 red cars and 10 blue cars in the toy box (20 cars total).
- Out of the 10 red cars, 2 are noisy.
- Out of the 10 blue cars, 6 are noisy.
At the start, you might think, "If I pick a noisy car, it could be red or blue." But as you get more information (like hearing noise), you need to update your belief.
New Information — You Hear Noise!
You reach into the toy box without looking and pull out a car. You hear it making noise! Now you know it’s a noisy car, but you don’t know if it’s red or blue.
You start to wonder, "How likely is it that this noisy car is red?"
Bayes' Theorem helps us update our belief (hypothesis) based on the new evidence (information).
- Hypothesis (H): "The noisy car is red."
- Evidence (E): "The car makes noise."
In the formula P(H|E) = P(E|H) × P(H) / P(E):
- P(H|E): The updated belief — probability that the car is red given that it makes noise.
- P(E|H): The likelihood — probability that a red car makes noise. Since 2 out of 10 red cars make noise, P(E|H) = 2/10 = 0.2.
- P(H): The prior belief — probability that any car you pick is red before knowing if it makes noise. Since there are 10 red cars out of 20 total cars, P(H) = 10/20 = 0.5.
- P(E): The total evidence — probability of picking any noisy car, regardless of color. There are 2 noisy red cars and 6 noisy blue cars, so P(E) = (2 + 6)/20 = 8/20 = 0.4.
Applying Bayes' Theorem
Now, let’s calculate the probability that the noisy car is red using Bayes' Theorem:
P(H|E) = (0.2 × 0.5) / 0.4 = 0.25
- P(H|E) = 0.25 means there is a 25% chance the noisy car is red after hearing it makes noise.
- At first, you might think there's a good chance a noisy car could be red or blue. But after hearing the noise and knowing that more noisy cars are blue, you realize the chance of it being a noisy red car is actually lower (only 25%).
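The whole toy-box calculation can be checked with a short sketch using exact fractions, with the counts taken directly from the example above:

```python
from fractions import Fraction

# Toy box counts from the example.
red_total, blue_total = 10, 10
red_noisy, blue_noisy = 2, 6
total = red_total + blue_total

p_H = Fraction(red_total, total)               # prior P(H): car is red = 1/2
p_E_given_H = Fraction(red_noisy, red_total)   # likelihood P(E|H) = 2/10
p_E = Fraction(red_noisy + blue_noisy, total)  # evidence P(E): noisy = 8/20

# Bayes' Theorem: P(H|E) = P(E|H) · P(H) / P(E)
p_H_given_E = p_E_given_H * p_H / p_E
print(p_H_given_E)  # → 1/4, i.e. a 25% chance the noisy car is red
```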
Bayes' Theorem is like being a clever detective. You start with a guess (hypothesis) — "I might have a noisy red car." But when you get new evidence (you hear the noise), you use Bayes' Theorem to update your guess. Now, you realize it's more likely a noisy blue car because there are more noisy blue cars in the box. This is how Bayes' Theorem helps us change our belief when we learn something new!