Estimating Proportions - Again
I was not particularly happy with my post yesterday, Think Bayes Estimating Proportions. My goal is to make these posts accessible both to folks familiar with software, particularly Python, and to folks who do not have that technical background. Yesterday, I focused on non-technical readers, but I am not happy with the results. So I'm trying again.
As a reminder, Bayes' theorem is expressed mathematically as

P(H | D) = P(H) × P(D | H) / P(D)
The left-hand side of the equation can be read as "the probability that the hypothesis is true given that some data is observed." If you think this relates to science, it does.
The right-hand side has three terms. The term P(H) is the probability that the hypothesis is true before seeing any data; this is known as the prior. The term P(D | H) is the probability of seeing the observed data if the hypothesis is true; this is known as the likelihood. The final term, P(D), is the probability of observing the data averaged over all hypotheses. This final term is sometimes called the marginal likelihood or the model evidence.
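To make the arithmetic concrete, here is a minimal Python sketch of the theorem applied to a single hypothesis. The function and the example numbers are mine, not from Downey's code:

```python
def bayes_update(prior, likelihood, evidence):
    """Bayes' theorem for one hypothesis:
    P(H | D) = P(H) * P(D | H) / P(D)."""
    return prior * likelihood / evidence

# Hypothetical example: the hypothesis "the coin is fair" with prior 0.5,
# the data is one flip of heads, so P(heads | fair) = 0.5, and we suppose
# the overall probability of heads, P(D), also works out to 0.5.
print(bayes_update(prior=0.5, likelihood=0.5, evidence=0.5))  # -> 0.5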
For the problem of assessing the bias of a coin, let us posit (I avoid the word "hypothesize" here only because it would get confusing) that the bias of the coin ranges from 0 (a two-tailed coin) through 0.5 (a fair coin, heads and tails equally likely) to 1 (a two-headed coin). You can think of this as a set of 101 urns, each containing 100 balls labeled either 'H' or 'T'. The first urn has 100 balls labeled 'T'. The second urn contains one ball labeled 'H' and 99 labeled 'T'. Each successive urn has one more ball labeled 'H' and one fewer labeled 'T', until the last urn has 100 balls labeled 'H'. The fraction of balls labeled 'H' in each urn corresponds to a single hypothesis that we can use in Bayes' theorem.
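In Python, this set of hypotheses is just a list of 101 fractions. This is a sketch with my own variable names, not Downey's:

```python
# hypos[i] is the fraction of 'H' balls in urn i: 0.00, 0.01, ..., 1.00.
hypos = [i / 100 for i in range(101)]
```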
The term P(D | H) is the likelihood: the probability of the observed data given that a specific hypothesis is true. For example, suppose we flipped our Euro coin and it came up heads. For our "zeroth" urn, with all 100 balls labeled 'T', the likelihood of seeing heads is zero. For the next urn, the likelihood of drawing a ball labeled 'H' (or seeing heads after our flip) is 1/100. The likelihood increases by 1/100 for each successive hypothesis, ending with the last urn, whose 100 balls are all labeled 'H' and whose likelihood of heads is 1.
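Continuing the sketch (again with my own names, not Downey's), the likelihood of one flip under each hypothesis is easy to write down:

```python
def likelihood(outcome, hypo):
    """P(one flip's outcome | urn with fraction `hypo` of 'H' balls)."""
    return hypo if outcome == 'H' else 1 - hypo

hypos = [i / 100 for i in range(101)]
likes = [likelihood('H', h) for h in hypos]  # 0.00, 0.01, ..., 1.00
```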
The most difficult term to calculate is P(D), the probability of the data under all hypotheses. Because this term does not depend on the hypothesis, we can sometimes avoid calculating it. For example, suppose we have only two hypotheses and all we care about is the ratio of their probabilities; in that ratio, P(D) cancels out. A second simplifying case occurs when the hypotheses we care about are mutually exclusive and collectively exhaustive, as our urns are: then P(D) is simply the sum, over all hypotheses, of P(H) × P(D | H) (see https://en.wikipedia.org/wiki/Law_of_total_probability). In the general case, as Wikipedia reports (https://en.wikipedia.org/wiki/Marginal_likelihood), "Unfortunately, marginal likelihoods are difficult to compute."
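For our 101 urns, that sum is a one-liner, and dividing by it normalizes the posterior. Here is a self-contained sketch under the same assumptions as above, using a uniform prior over the urns:

```python
hypos = [i / 100 for i in range(101)]        # bias of each urn
prior = [1 / len(hypos)] * len(hypos)        # uniform prior over the urns
likes = [h for h in hypos]                   # P(heads | urn with fraction h)

# Law of total probability: P(D) is the prior-weighted sum of likelihoods,
# valid because the urns are mutually exclusive and collectively exhaustive.
p_data = sum(p * lk for p, lk in zip(prior, likes))

posterior = [p * lk / p_data for p, lk in zip(prior, likes)]
print(p_data)          # -> 0.5 for one flip of heads under a uniform prior
print(sum(posterior))  # -> 1.0 (the posterior is properly normalized)
```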
I could continue and use these details to explain how they apply to our problem of estimating proportions (and the Python code written by Downey), but I think I'll wait. As older TV shows said, "Tune in next time."