Bayesian probabilities visualized 2

In the previous article we covered the basics of what some of the words and phrases used in the Bayesian world really mean from a more logic-based or computer programming point of view, e.g. posterior = output value.

This will be continued here by demystifying each of the parts and operations used when applying Bayesian probabilities. All of these math concepts can be described as generalizations of more basic ideas from the computer programming world. For me at least, when I see a bunch of talk about "probability mass functions" or "hypotheses" etc. it comes across as something much more complicated than it actually turns out to be if you go through the trouble of translating from math to computer programming artifacts. In fact, once that process is done I am left wondering why these things are given fancy sounding names like "probability mass function" at all. I hope this helps others onramp to being productive much more quickly than it has been for me.

Also keep in mind that, unlike what you may have learned in school, most math is really only useful when chaining operations together into workflows (e.g. an operation being repeated by looping over lists of numbers).


Monty Hall Problem

This story problem is often cited in books about statistics. It is based on an old TV game show you can look up on the Internet if you want to know more about it. In short, the player has three doors to pick from, one of which has a valuable prize behind it. After the player has chosen their door, the show host "Monty" opens one of the other doors. Monty never opens the door the player selected and never opens the door with the prize. This leaves two unopened doors. The player is then given the chance to change their mind and pick the other unopened door.

This is a multiphase workflow where the probabilities change as the workflow progresses.

[Image: the Monty Hall workflow from the player's perspective, showing the probabilities at each phase]

Here you can see, from the perspective of the player, the workflow and the percentages involved for one scenario.

Why the 66% in Phase 3? The idea is that you are comparing the probabilities between two populations: the Door 1 population (chosen by the player) and the Door 2 + Door 3 population (not chosen). Monty then eliminates one of the doors that was not chosen, yet the total probability for the "not chosen" population remains the same. Monty opening one of the doors is just a red herring.
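Here is that population argument as a few lines of Python (a minimal sketch; the variable names are mine):

p_chosen = 1 / 3       # Door 1 population (the player's pick)
p_not_chosen = 2 / 3   # Door 2 + Door 3 population

# Monty opens door 3, which never has the prize, so the whole
# "not chosen" probability now sits on door 2 alone.
p_door_2_after_reveal = p_not_chosen
print(p_chosen, p_door_2_after_reveal)   # 0.333... 0.666...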

So what is a hypothesis? One way to think of it: first you have a scenario, and the scenario has multiple possible outcomes. A hypothesis is then one possibility from the set of possibilities within the scenario. Each possibility in the set has its own variables that go into a calculation. You will tend to set up a loop over the possibilities (hypotheses) and do the same calculation using the variables for each possibility. In the end you will either sum up the calculated values or pick the possibility with the highest calculated probability, etc.

In the picture above the scenario is the Monty Hall problem where the player has selected door 1 and Monty has opened door 3. The possibilities (hypotheses) are that the prize is behind door 1, the prize is behind door 2, and the prize is behind door 3. Calculations are then done for each possibility to determine something about the scenario (here, the probability of the prize being behind each door), following the loop shape sketched below.
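As a minimal sketch of that loop shape in Python (the strings are just labels I made up; the real per-possibility calculation is developed below):

hypotheses = ["prize behind door 1", "prize behind door 2", "prize behind door 3"]
for hypothesis in hypotheses:
    # The same calculation runs once per possibility, using that
    # possibility's own variables (we fill in the real math below).
    print(hypothesis)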

Prize location | Initial % | % for Monty choice | Output %
Door 1 (row 1) | 33%+      | 50%                | 33%+
Door 2 (row 2) | 33%+      | 100%               | 66%+
Door 3 (row 3) | 33%+      | 0%                 | 0%

In the table above the column "% for Monty choice" is describing the probability for the choice made by Monty in the scenario.

Here is the scenario: The Monty Hall problem where the player has selected door 1 and Monty has opened door 3.

So, the column "% for Monty choice" is giving the probability Monty opened door 3 given the player chose door 1 and Monty knows which door has the prize.

There was a 50% chance Monty would open door 3 if the prize is behind door 1 (row 1).

There was a 100% chance Monty would open door 3 if the prize is behind door 2 (row 2).

There was a 0% chance Monty would open door 3 if the prize is behind door 3 (row 3).

Remember the rules say Monty cannot open the door the player chose (door 1) and Monty cannot open the door he knows the prize is behind.

The hardest part of doing this kind of work is probably figuring out what the different parts are and what the probabilities should be for each. Once you have these, the math is straightforward. Frame the question so that you list each thing involved and how those things relate to each other. For example, above you need to realize that the "% for Monty choice" for each possibility of where the prize could be is required.

Just like in computer programming you have data structures and processing that gets done with the data in those data structures.

One basic data structure is the list. You can have lists of things, lists of numbers, even lists of lists.

In the Python programming language the bracket symbols [ ... ] represent a container: a list of things. To access an item within the list you provide an index for a slot in the list. List indexes start at zero and count up from there.
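For example, using a made-up list of door names:

doors = ["door 1", "door 2", "door 3"]
print(doors[0])   # index zero -> door 1
print(doors[2])   # index two  -> door 3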

In the Monty Hall example above the data structure you need to come up with is a list of lists like this:

possibilities = [
    [33%+, 50%],
    [33%+, 100%],
    [33%+, 0%],
]

Putting this in decimal form for doing the actual math:

possibilities = [
    [0.3333, 0.5],
    [0.3333, 1],
    [0.3333, 0],
]

Looping over this list of lists and doing step one of the math...

outputs = []
initial, monty = 0, 1   # column indexes: initial %, % for Monty choice
for row in possibilities:
    # Multiply the two percentages for this possibility and append
    # the result (appending avoids indexing into an empty list).
    outputs.append(row[initial] * row[monty])

The outputs here are not the final outputs. We have multiplied the percentages of the variables for each possibility and stored the results in the outputs list. The final output we want is a percentage for each possibility such that they sum up to 100% across the entire set of possibilities. This final step is a transformation of the values we calculated so they are relative to the total population of all possibilities instead of only being relative to themselves.

Normalization

This is another one of those words from the math world that can use some explaining. One way to think of this is normalization = "the normal scale". What normal scale? The scale we normally want to see. So when talking about percentages we normally want to say 100% is the max, which represents "all", and everything else is a subset of 100%. The total of all the subsets sums up to 100%. That is the scale we normally want to see when we are talking about percentages. So once we get done doing the math on the possibilities we can rescale or "normalize" the outputs to that scale. This also implies that there are other scales and we could just as easily rescale to match those (like temperature being scaled to Celsius, Fahrenheit, or Kelvin), and this is true. The transformation math is a little different for each scale type. Here we are going to normalize to the percentage scale 0-100%.

First let's look at the values we got inside the list "outputs" in the Python code above.

outputs = [0.16665, 0.3333, 0]

So here you can see that the output for the first row (0.16665) is about half of the output for row two (0.3333). That relative difference is going to be maintained when we rescale. In the first article we talked about how a baking recipe can be "two parts of one thing plus three parts of another thing plus five parts of something else. You can scale the amounts for a small dinner or a large dinner, but the 'relative parts' stay the same." This is the same concept here.
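Here is that recipe idea as a quick sketch (the part counts are made up for illustration):

parts = [2, 3, 5]        # relative parts of each ingredient
total = sum(parts)       # 10 parts altogether
fractions = []
for part in parts:
    fractions.append(part / total)
print(fractions)         # [0.2, 0.3, 0.5] -- the ratios stay the same at any scale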

Also, let's remind ourselves what each row of the data stands for.

row one = The probability (%) that Monty would open door three given the player chose door one and the prize was behind door one.

row two = The probability (%) that Monty would open door three given the player chose door one and the prize was behind door two.

row three = The probability (%) that Monty would open door three given the player chose door one and the prize was behind door three.

All of these possibilities (hypotheses) are within the scenario where the player chose door one and Monty opened door three. This allows us to decide what the player should do next by calculating the percentages for what remains and then choosing the possibility with the highest percentage.
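Once those percentages are computed (we do that next), picking the best possibility is just a scan for the highest value. A minimal sketch, using the final values we arrive at below (the labels list is mine):

outputs = [0.3333, 0.6666, 0]   # the normalized values computed below
labels = ["prize behind door 1", "prize behind door 2", "prize behind door 3"]

best_index = 0
for i in range(len(outputs)):
    if outputs[i] > outputs[best_index]:
        best_index = i
print(labels[best_index])   # -> "prize behind door 2", so the player should switch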

So, now let's normalize the output values.

Step one is to sum up the values.

total = sum([0.16665, 0.3333, 0])

The total is now equal to approximately 0.49995 (it would be exactly 0.5 if we had not rounded 1/3 to 0.3333 earlier).

We next loop over the output values and divide each by the total.

row_index = 0
for output in outputs:
    outputs[row_index] = output / total
    row_index += 1

Now the outputs list contains:

outputs = [0.3333, 0.6666, 0]

In other words, [33%+, 66%+, 0]

If you take the rounding into account these sum up to 100%.

That is all there is to it for doing a "normalization".
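Putting both steps together, the whole normalization can be wrapped up in one small reusable function (a minimal sketch, not from any particular library; it assumes the values do not all equal zero):

def normalize(values):
    # Rescale the values so they sum to 1.0 (i.e., 100%),
    # keeping their relative proportions the same.
    total = sum(values)
    outputs = []
    for value in values:
        outputs.append(value / total)
    return outputs

print(normalize([0.16665, 0.3333, 0]))        # -> [0.3333..., 0.6666..., 0.0]
print(sum(normalize([0.16665, 0.3333, 0])))   # -> 1.0, i.e. 100%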

If we look at the table from above again you can see this is what we said the output values were.


Also remember there are other types of normalization for different types of scales similar to how a temperature value can be fitted to the different scales Celsius, Fahrenheit, or Kelvin.
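For instance, converting a temperature between two of those scales is just a different transformation formula, a shift-and-stretch rather than a divide-by-total (a standard conversion, shown here only to make the point concrete):

def celsius_to_fahrenheit(c):
    # A different kind of rescaling: multiply by a factor, then shift.
    return c * 9 / 5 + 32

print(celsius_to_fahrenheit(100))   # -> 212.0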

One example is where you take a list of numbers and instead of summing them to get a total and then dividing each number by the total, you just take the max value from the list and divide each value in the list by that max value. This makes the percentages relative to the max value instead of relative to the sum of all the values.

Like this:

values = [1, 3, 6, 8, 9, 11]
max_value = max(values)   # 11
outputs = []
for value in values:
    outputs.append(value / max_value)

Which results in this output list rounded and converted to percentages:

outputs = [9%, 27%, 55%, 73%, 82%, 100%]

This is saying that:

1 is 9% of 11

3 is 27% of 11

6 is 55% of 11

And so on...


The takeaways for this article are:

1) We can apply the concepts of population subgroups to separate the parts of a story problem and figure out percentages (probabilities) for each, which we can then use to calculate other answers.

2) The word "hypothesis" can be thought of as "possibility" instead.

3) The word "normalization" can be thought of as rescaling values to a form we like.

In the next article we will continue to simplify the language used in the statistics world by mapping it to easier to understand words and operations. I mentioned at the top of this article something called "probability mass functions". We will show how simple a thing this really is, and we will also look a bit more into using Python to do operations across entire lists of things without using loops directly like we did in this article.
