Brief Guide to Understanding Bayes’ Theorem

Before you begin using Bayes’ Theorem for practical tasks, it helps to know a little about its history. This background is useful because, at first sight, Bayes’ Theorem doesn’t seem capable of doing everything it purports to do, which is why many statisticians rejected it outright.

Once you have a basic knowledge of how Bayes’ Theorem came into being, you can look at the theorem itself. The following sections first provide that history and then present the theorem from a practical perspective.

A little Bayes history

You might wonder why anyone would name an algorithm Naïve Bayes (yet you find this algorithm among the most effective machine learning algorithms in packages such as Scikit-learn). The naïve part of the name comes from its formulation: it makes some extreme simplifications to standard probability calculations. The Bayes part of the name refers to the Reverend Thomas Bayes and his theorem on probability.

The Reverend Thomas Bayes (1701–1761) was an English statistician and a philosopher who formulated his theorem during the first half of the eighteenth century. Bayes’ Theorem is based on a thought experiment and then a demonstration using the simplest of means.

Reverend Bayes wanted to determine the probability of a future event based on the number of times it occurred in the past. It’s hard to contemplate how to accomplish this task with any accuracy.

The demonstration relied on the use of two balls. An assistant would drop the first ball on a table, where it was equally likely to come to rest at any location, without telling Bayes where it landed. The assistant would then drop a second ball, tell Bayes the position of the second ball, and describe the position of the first ball relative to the second.

The assistant would then drop the second ball a number of additional times, each time telling Bayes the location of the second ball and the position of the first ball relative to the second. After each drop of the second ball, Bayes would attempt to guess the position of the first. Eventually, he was to guess the position of the first ball based on the evidence provided by the second ball.

The theorem was never published while Bayes was alive. His friend Richard Price found Bayes’ notes after his death in 1761 and published the material on his behalf, but at first no one seemed to read it. Bayes’ Theorem went on to revolutionize the theory of probability by introducing the idea of conditional probability: probability conditioned by evidence.

The critics saw problems with Bayes’ Theorem that you can summarize as follows:

  • Guessing has no place in rigorous mathematics.
  • If Bayes didn’t know what to guess, he would simply assign all possible outcomes an equal probability of occurring.
  • Using the prior calculations to make a new guess presented an insurmountable problem.

Often, it takes a problem to illuminate the need for a previously defined solution, which is what happened with Bayes’ Theorem. By the late eighteenth century, the need to study astronomy and make sense of the observations made by the

  • Chinese in 1100 BC
  • Greeks in 200 BC
  • Romans in AD 100
  • Arabs in AD 1000

became essential. The readings made by these civilizations not only reflected social and other biases but also were unreliable because of the differing methods of observation and the technology used.

You might wonder why the study of astronomy suddenly became essential, and the short answer is money. Navigation of the late eighteenth century relied heavily on accurate celestial observations, so anyone who could make the readings more accurate could reduce the time required to ship goods from one part of the world to another.

Pierre-Simon Laplace wanted to solve the problem, but he couldn’t just dive into the astronomy data without first having a means to dig through all that data to find out which was correct and which wasn’t. He encountered Richard Price, who told him about Bayes’ Theorem.

Laplace first used the theorem on an easier problem: the ratio of male to female births. Some people had noticed that more boys than girls were born each year, but no proof existed for this observation. Applying Bayes’ Theorem to the birth records, Laplace proved that more boys are indeed born each year than girls.

Other statisticians took notice and started using the theorem, often secretly, for a host of other calculations, such as the calculation of the masses of Jupiter and Saturn from a wide variety of observations by Alexis Bouvard.

The basic Bayes theorem formula

When thinking about Bayes’ Theorem, it helps to start from the beginning — that is, probability itself. Probability tells you the likelihood of an event and is expressed in a numeric form.

The probability of an event is measured in the range from 0 to 1 (from 0 percent to 100 percent) and it’s empirically derived from counting the number of times a specific event happens with respect to all the events. You can calculate it from data!

When you observe events (for example, when a feature has a certain characteristic) and you want to estimate the probability associated with the event, you count the number of times the characteristic appears in the data and divide that figure by the total number of observations available. The result is a number ranging from 0 to 1, which expresses the probability.
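
As a minimal sketch in Python (the observations here are made up purely for illustration), the counting looks like this:

# Ten hypothetical observations of a single characteristic (hair length)
observations = ["long", "short", "short", "long", "short",
                "short", "long", "short", "short", "short"]

# Probability = times the characteristic appears / total observations
p_long_hair = observations.count("long") / len(observations)
print(p_long_hair)  # 0.3, a number in the range from 0 to 1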

When you estimate the probability of an event, you tend to believe that you can apply it in every situation. The term for this belief is a priori because it constitutes the first estimate of probability with regard to an event (the one that comes to mind first).

For example, if you estimate the probability of an unknown person’s being a female, you might say, after some counting, that it’s 50 percent, which is the prior, or the first, probability that you will stick with.

The prior probability can change in the face of evidence, that is, something that can radically modify your expectations.

For example, the evidence of whether a person is male or female could be whether the person’s hair is long or short. You can estimate having long hair as an event with 35 percent probability for the general population, but within the female population, it’s 60 percent. Because the percentage in the female population is higher than the general probability (the prior for having long hair), that’s useful information for making a prediction.
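
To see where such figures can come from, here is a minimal sketch in Python that estimates them by cross-tabulating the evidence against the outcome. The dataset is invented so that its counts reproduce the percentages used in this example:

import pandas as pd

# Hypothetical sample of 20 people, built so the counts match the text:
# P(female) = 0.50, P(long hair) = 0.35, P(long hair | female) = 0.60
data = pd.DataFrame({
    "sex":  ["female"] * 10 + ["male"] * 10,
    "hair": ["long"] * 6 + ["short"] * 4 + ["long"] * 1 + ["short"] * 9,
})

# Cross-tabulate the evidence (hair) against the target outcome (sex)
table = pd.crosstab(data["sex"], data["hair"])

p_female = (data["sex"] == "female").mean()  # 0.50, the prior for female
p_long = (data["hair"] == "long").mean()     # 0.35, the prior for long hair
p_long_given_female = table.loc["female", "long"] / table.loc["female"].sum()  # 0.60
print(p_female, p_long, p_long_given_female)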

Imagine that you have to guess whether a person is male or female, and the evidence is that the person has long hair. This sounds like a predictive problem, and in the end, it’s similar to predicting a categorical variable from data: you have a target variable with different categories, and you have to guess the probability of each category based on the evidence, the data. Reverend Bayes provided a useful formula:

P(B|E) = P(E|B)*P(B) / P(E)

The formula looks like statistical jargon and is a bit counterintuitive, so it needs to be explained in depth. Reading the formula using the previous example as input makes the meaning behind the formula quite a bit clearer:

  • P(B|E): The probability of being a female (the belief B) given long hair (the evidence E). This part of the formula defines what you want to predict. In short, it says to predict y given x where y is an outcome (male or female) and x is the evidence (long or short hair).
  • P(E|B): The probability of having long hair when a person is female, the likelihood of the evidence. In this case, you already know that it’s 60 percent. In every data problem, you can obtain this figure easily by a simple cross-tabulation of the features against the target outcome, as in the sketch shown earlier.
  • P(B): The probability of being a female, which has a 50 percent general chance (a prior).
  • P(E): The probability of having long hair in general, which is 35 percent (another prior).

When reading parts of the formula such as P(B|E), read them as follows: the probability of B given E. The | symbol translates as "given." A probability expressed in this way is a conditional probability, because it’s the probability of a belief, B, conditioned by the evidence presented by E. In this example, plugging the numbers into the formula yields:

60% * 50% / 35% = 85.7%

Therefore, getting back to the previous example, even if being a female is a 50 percent probability, just knowing evidence like long hair takes it up to 85.7 percent, which is a more favorable chance for the guess. You can be more confident in guessing that the person with long hair is a female because you have a bit less than a 15 percent chance of being wrong.
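
As a quick check of this arithmetic, here is the same calculation written as a small Python helper (the function name is just for illustration, not part of any library):

def bayes(p_e_given_b, p_b, p_e):
    # Bayes' Theorem: P(B|E) = P(E|B) * P(B) / P(E)
    return p_e_given_b * p_b / p_e

# P(female | long hair) using the percentages from the text
print(bayes(p_e_given_b=0.60, p_b=0.50, p_e=0.35))  # 0.857..., about 85.7%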

ABOUT THE AUTHOR

Vartul Mittal is a Technology & Innovation Specialist who focuses on helping clients accelerate their digital transformation journey. He has more than 14 years of global business transformation experience in management consulting and global in-house centres, managing technology and business teams in Intelligent Automation, Advanced Analytics, and Cloud Adoption. He is passionate about extending customer relationships beyond the current project, with the longer-term goal of becoming a trusted adviser and bringing greater value to businesses through digital disruption.

He has lived and worked across multiple countries and cultures, engaging senior client stakeholders from various industries, including financial services, FMCG, and retail. He has delivered engagements for Fortune 500 organizations such as Coca Cola India, Kotak Mahindra Bank, IBM, Royal Bank of Scotland, Standard Life Insurance, Citibank, and Barclays. He is also a renowned speaker on Analytics, Automation, AI, and Innovation at top global universities and international conferences.

