Estimating Proportions - Again
I was not particularly happy with my post yesterday, Think Bayes Estimating Proportions. My goal is to make these posts accessible both to folks familiar with software, particularly Python, and to folks who do not have that technical background. Yesterday, I focused on non-technical readers, but I am not happy with the results. So I'm trying again.
As a reminder, Bayes' theorem is expressed mathematically as

P(H | D) = P(H) × P(D | H) / P(D)
The left-hand side of the equation can be read as "the probability that the hypothesis is true given that some data is observed." If you think this relates to science, it does.
The right-hand side has three terms. The term P(H) is the probability that the hypothesis is true before seeing any data; this is known as the prior. The term P(D | H) is the probability of seeing the observed data if the hypothesis is true; this is known as the likelihood. The final term, P(D), is the probability of observing the data averaged over all hypotheses. This final term is sometimes called the marginal likelihood or the model evidence.
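To make the arithmetic concrete, here is a minimal Python sketch of the theorem applied to a single hypothesis. The function and the example numbers are mine, not from Downey's code:

```python
def bayes_update(prior, likelihood, evidence):
    """Bayes' theorem for one hypothesis:
    P(H | D) = P(H) * P(D | H) / P(D)."""
    return prior * likelihood / evidence

# Hypothetical example: the hypothesis "the coin is fair" with prior 0.5,
# the data is one flip of heads, so P(heads | fair) = 0.5, and we suppose
# the overall probability of heads, P(D), also works out to 0.5.
print(bayes_update(prior=0.5, likelihood=0.5, evidence=0.5))  # -> 0.5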
For the problem of assessing the bias of a coin, let us posit (I avoid the word "hypothesize" here only because it would get confusing) that the bias of the coin ranges from 0 (a two-tailed coin) through 0.5 (a fair coin, heads and tails equally likely) to 1 (a two-headed coin). You can think of this as a set of 101 urns, each containing 100 balls labeled either 'H' or 'T'. The first urn has 100 balls labeled 'T'. The second urn contains one ball labeled 'H' and 99 labeled 'T'. Each successive urn has one more ball labeled 'H' and one fewer labeled 'T', until the last urn has 100 balls labeled 'H'. The fraction of balls labeled 'H' in each urn corresponds to a single hypothesis that we can use in Bayes' theorem.
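In Python, this set of hypotheses is just a list of 101 fractions. This is a sketch with my own variable names, not Downey's:

```python
# hypos[i] is the fraction of 'H' balls in urn i: 0.00, 0.01, ..., 1.00.
hypos = [i / 100 for i in range(101)]
```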
The term P(D | H) is the likelihood: the probability of the observed data given that a specific hypothesis is true. For example, suppose we flipped our Euro coin and it came up heads. For our "zeroth" urn, with all 100 balls labeled 'T', the likelihood of seeing heads is zero. For the next urn, the likelihood of drawing a ball labeled 'H' (or seeing heads after our flip) is 1/100. The likelihood increases by 1/100 for each successive hypothesis, ending with the last urn, whose 100 balls are all labeled 'H' and whose likelihood of heads is 1.
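Continuing the sketch (again with my own names, not Downey's), the likelihood of one flip under each hypothesis is easy to write down:

```python
def likelihood(outcome, hypo):
    """P(one flip's outcome | urn with fraction `hypo` of 'H' balls)."""
    return hypo if outcome == 'H' else 1 - hypo

hypos = [i / 100 for i in range(101)]
likes = [likelihood('H', h) for h in hypos]  # 0.00, 0.01, ..., 1.00
```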
The most difficult term to calculate is P(D), the probability of the data under all hypotheses. Because this term does not depend on the hypothesis, we can sometimes avoid calculating it. For example, suppose we have only two hypotheses and all we care about is the ratio of their probabilities; in that ratio, P(D) cancels out. A second simplifying case occurs when the hypotheses we care about are mutually exclusive and collectively exhaustive, as our urns are: then P(D) is simply the sum, over all hypotheses, of P(H) × P(D | H) (see https://en.wikipedia.org/wiki/Law_of_total_probability). In the general case, as Wikipedia reports (https://en.wikipedia.org/wiki/Marginal_likelihood), "Unfortunately, marginal likelihoods are difficult to compute."
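For our 101 urns, that sum is a one-liner, and dividing by it normalizes the posterior. Here is a self-contained sketch under the same assumptions as above, using a uniform prior over the urns:

```python
hypos = [i / 100 for i in range(101)]        # bias of each urn
prior = [1 / len(hypos)] * len(hypos)        # uniform prior over the urns
likes = [h for h in hypos]                   # P(heads | urn with fraction h)

# Law of total probability: P(D) is the prior-weighted sum of likelihoods,
# valid because the urns are mutually exclusive and collectively exhaustive.
p_data = sum(p * lk for p, lk in zip(prior, likes))

posterior = [p * lk / p_data for p, lk in zip(prior, likes)]
print(p_data)          # -> 0.5 for one flip of heads under a uniform prior
print(sum(posterior))  # -> 1.0 (the posterior is properly normalized)
```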
I could continue and use these details to explain how they apply to our problem of estimating proportions (and the Python code written by Downey), but I think I'll wait. As older TV shows said, "Tune in next time."