Probably Dangerous Misunderstanding Probability
Some years ago, a long-time friend lamented the difficult decision he had faced over whether to go ahead with his father’s cancer surgery.
“Possible to postpone the operation?” I asked.
“If postponed, he’d have to live with pain, though it’d be manageable,” he replied.
“What did the doc say about the chance of success for the surgery?” I continued.
“Well, he said something like 51% chance …”
“Of success?”
“Yeah.”
“But what did that mean? Did he show you some research indicating that 51 out of a hundred similar surgeries led to success?” I asked in disbelief.
“No, he just mentioned it off the cuff,” he said, somewhat helplessly.
“And based on that you went ahead with the surgery?”
“Yeah, it’s 51%.”
Since 51% is higher than 49%, how could a son say no to a potentially life-saving surgery for his father, especially when other family members and relatives are watching? But seriously, the difference between a probability of 51% and one of 49% is not the sort of difference most people can actually fathom. Having a 2% higher probability of success than failure does not have the same “feel” and reality as having 2 more dollars or 2 more apples. What was in the doctor’s mind when he said 51% chance of success? Was he rounding because the researched probability was 0.505? Was he leaving the second decimal place to the patient’s interpretation, as if such high-precision probability values mattered to our decision-making? Or could it be that, in the doctor’s opinion, it was better to operate, so he uttered 51% instead of 50%?
We typically pick the choice which we have calculated, or think, would have the highest chance of success. We all do that, and more often than not it seems a legitimate and rational way to reach a conclusion. When we explain our decisions as choosing the highest probability of success, our managers, directors, shareholders, customers, patients, the public and so on get it – they nod their heads in agreement because that is how they would conclude as well. When the reasoning method is acceptable, such as making the decision that yields the highest probability of success, the decision maker is exonerated even in the face of an adverse outcome. A venture fund investment manager would not be questioned further if he could show that the failed start-up was the one most likely to succeed at the point of investment. A marketing manager who chose a particular TV advertising clip over three other proposals would not be questioned too much for a failure if she had selected the clip with the highest probability of success at the time. A doctor would not be faulted for a patient’s demise if the treatment chosen out of three methods was known to give the highest probability of recovery.
As I often mention to my students in statistics class, most of us go about life without using much of the theory and mathematics of statistics to produce probabilities. However, almost all of us have to read about and consume probability values. The danger of not fully appreciating a consumed probability value, or of haphazardly producing one (I am avoiding the word “irresponsibly”), is very much amplified when a single value is used to make extremely critical decisions, be it a life-saving surgery, taking up a career assignment, investing in stocks, launching a marketing campaign, a price war, and so on. But how and why would the producer and consumer of a probability value differ in their intended meanings? Isn’t it easy to align them – just go back to the definition of probability and everyone would share the same understanding of what a probability value means, right? Not quite. I’m afraid the probabilisticians, or rather the statisticians (because I googled “probabilisticians” and learned it is not yet a word), still don’t quite agree with one another on what “probability” means. How would the rest of us mere mortals be any clearer?
To begin, there are THREE definitions of probability.
The first definition goes as follows: first count the occurrences of the event whose probability we want to measure, then divide that count by the total number of observations made. This gives us a fraction called the relative frequency – not yet a probability. Relative frequency is based on, and conveys a message only about, past events, whereas probability tells us about future events. For instance, if you throw an unknown die discarded from a casino 12 times (total count of observations) and see “six” come up 3 times (count of the event of interest), you may say the relative frequency of “six” for that die is 3/12 = 0.25. To make it a probability, you “declare” that this relative frequency equals the probability of observing “six”. What you are declaring is that in future, no matter in what order the “six” outcomes occur, the proportion of “six” to the total count will always be 25%. To illustrate, you are saying that in future, any of the following universes could happen:
- 1 1 6 6 2 2 3 3 …
- 1 6 1 6 2 2 3 3 …
- 6 1 2 5 3 4 6 2 …
- 2 3 3 4 5 1 6 6 …
so long as the proportion of occurrence of our event of interest, “six”, is always fixed at 25%.
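The counting in the first definition is easy to mechanise. Below is a minimal Python sketch – the 12 toss outcomes are made up purely for illustration – that turns observed counts into a relative frequency, which we then “declare” to be a probability:

```python
# Hypothetical outcomes from 12 throws of the unknown casino die
tosses = [6, 1, 2, 6, 3, 4, 6, 2, 5, 1, 3, 4]

# Relative frequency = count of the event of interest / total observations
count_six = tosses.count(6)
relative_frequency = count_six / len(tosses)

print(count_six)            # 3
print(relative_frequency)   # 0.25
```

Declaring this 0.25 to be a probability is precisely the leap from past data to a claim about all future throws.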
For instance, suppose a national swimmer won the gold medal twice (count of the event of interest) out of three appearances (total count of observations) in recent Olympic men’s 100-metre freestyle events. He is about to enter the next Olympic 100-metre freestyle in Tokyo 2020. What is the probability that he will win another gold medal? From past information we gather a relative frequency of gold medals of 2 out of 3. So, with some bold bravado, we say the probability of winning another gold medal is 2/3, or about 0.6667. We are also implying that in future the swimmer will always win two out of three, or four out of six, or six out of nine Olympic swims.
An entrepreneur who started 6 companies, failed four times and exited successfully twice, has a relative frequency of 2/6 = 0.3333, or a 33.33% success rate. A venture capital manager deciding whether or not to back this entrepreneur might like to know the probability of success if he is funded again. The entrepreneur could present a probability of success of 33.33% based on the evidence of relative frequency. The manager could readily accept this reasoning and rank this entrepreneur as a better bet than founders with a 0% success record, though other factors might ultimately determine whether the fund invests. Note what the entrepreneur is conveying: that if he were funded 100 times in future, he would deliver successful returns about 33 times – and those successes could well come only after fifty investments. He is not saying that the current deal being discussed will be the next successful outcome.
The second definition of probability also involves counting, but allows the counting to be done on theoretical possibilities rather than real occurrences. Take the six-sided die again as an example. Under this definition, we know theoretically that a perfect die has six equal faces, each with an equal chance of landing face up. Thus, to give a probability of “six” occurring (i.e. being shown as the top face), we count one “six” among the possibilities and divide by the total of 6 possibilities to get 1/6 = 0.1667. Notice that in getting this probability of “six”, we have not even touched a die in real life but merely considered the potential outcomes. Now, suppose the “six” face has been defaced so that it looks like a “one”. We would then say the probability of observing a “six” is 0/6 = 0, since we count 0 faces showing a “six”.
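Counting theoretical possibilities is just as mechanical. In this sketch (the list representation of each die is my own device for illustration), a die is a list of faces, and the probability of a face is the count of matching faces over the total number of faces:

```python
from fractions import Fraction

# Represent each die as its list of faces
perfect_die = [1, 2, 3, 4, 5, 6]
defaced_die = [1, 2, 3, 4, 5, 1]   # the "six" face now looks like a "one"

def theoretical_probability(die, face):
    """Count faces showing `face`, divided by the total number of faces."""
    return Fraction(die.count(face), len(die))

print(theoretical_probability(perfect_die, 6))  # 1/6
print(theoretical_probability(defaced_die, 6))  # 0
print(theoretical_probability(defaced_die, 1))  # 1/3  (two "one" faces)
```

No die is ever thrown: the probabilities come entirely from enumerating the possible outcomes.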
This definition of probability is often used in what is known as updating a probability based on additional information. For example, suppose there is a huge jar of dice, 60% of which are known to be undefaced perfect dice while the remaining 40% are defaced dice whose “six” has been made to look like a “one” (so they have two “one” faces). A die is now randomly picked from the jar and tossed. If we observe a “six”, what is the probability that it is an undefaced die? That’s right, 1! We are absolutely sure the die must be undefaced, because a defaced die can never show a “six”. But if we observe a “one”, what is the probability that the die is undefaced? This is harder to answer. Based on one throw alone, it is almost impossible to answer using the first definition of probability. But here, without tossing any die at all, we know the theoretical probabilities for undefaced and defaced dice, which are:

- Undefaced die (60% of the jar): P(“six”) = 1/6, P(“one”) = 1/6
- Defaced die (40% of the jar): P(“six”) = 0, P(“one”) = 2/6
Those of you who have heard of Bayes’ Theorem will find that it applies nicely here. The short story is that we multiply each probability by the corresponding proportion in the jar to get what is called a joint probability. For example, the joint probability of a defaced die showing “one” is 0.4 × 2/6 = 0.1333. We then gather the appropriate joint probabilities to calculate the updated probability: given an observation of “one”, what is the probability of the die being undefaced? Applying Bayes’ Theorem, we get the following probabilities:

- Joint probability of an undefaced die and “one”: 0.6 × 1/6 = 0.1
- Joint probability of a defaced die and “one”: 0.4 × 2/6 = 0.1333
- P(undefaced | “one”) = 0.1 / (0.1 + 0.1333) = 3/7 ≈ 0.4286
In other words, the probability of the die being undefaced when it shows a “one” is 3/7, or 0.4286. Interestingly, if we force the first definition of probability onto this problem, we could insist that had we thrown this specific die 1,000 times and counted roughly 167 occurrences of “one”, we would likewise conclude that the die is most likely undefaced. But there are two problems with this argument: (i) the chosen die can only be thrown once, and (ii) determining the probability of the die being undefaced is different from determining whether the chosen die is indeed undefaced.
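The jar-of-dice update can be reproduced numerically. This minimal sketch (variable names are mine; exact fractions avoid rounding) multiplies each prior proportion by its likelihood of showing “one”, then normalises:

```python
from fractions import Fraction

# Prior proportions in the jar
p_undefaced = Fraction(6, 10)   # 60% perfect dice
p_defaced   = Fraction(4, 10)   # 40% defaced dice

# Likelihood of showing "one" for each kind of die
p_one_given_undefaced = Fraction(1, 6)
p_one_given_defaced   = Fraction(2, 6)   # two "one" faces

# Joint probabilities: prior x likelihood
joint_undefaced = p_undefaced * p_one_given_undefaced  # 1/10
joint_defaced   = p_defaced * p_one_given_defaced      # 2/15

# Bayes' Theorem: posterior = joint / total probability of observing "one"
posterior_undefaced = joint_undefaced / (joint_undefaced + joint_defaced)

print(posterior_undefaced)  # 3/7
```

The exact arithmetic confirms the 3/7 ≈ 0.4286 figure above.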
There are many practical situations in which the equivalent “die” can only be thrown once, and its true nature – defaced or otherwise – can never be ascertained. For example, suppose historical data showed that 5% of inmates were wrongly imprisoned. A lie detector was known to be 80% accurate in detecting whether an inmate was telling the truth. An inmate screamed for justice, claiming to be wrongly imprisoned, and the lie detector indicated that he was telling the truth. What is the probability that the inmate was wrongly imprisoned?
The way to answer this probability question is the same as before. First we formulate the known probabilities:

- P(wrongly imprisoned) = 0.05; P(rightly imprisoned) = 0.95
- P(detector indicates truth | wrongly imprisoned) = 0.8 (an innocent inmate claiming innocence is telling the truth, and the detector reads him correctly 80% of the time)
- P(detector indicates truth | rightly imprisoned) = 0.2 (a guilty inmate claiming innocence is lying, and the detector misreads him as truthful 20% of the time)
Without the lie detector’s information (that this particular inmate was telling the truth), it would be objective to tell the inmate that with 95% probability he was not wrongly imprisoned and should stop his objection. But with the lie detector’s information, we can apply Bayes’ Theorem to update our prior probabilities (i.e. 5% and 95%) to get:

- Joint probability of wrongly imprisoned and a truth reading: 0.05 × 0.8 = 0.04
- Joint probability of rightly imprisoned and a truth reading: 0.95 × 0.2 = 0.19
- P(wrongly imprisoned | truth reading) = 0.04 / (0.04 + 0.19) ≈ 0.174
So the probability that the screaming inmate was wrongly imprisoned, given that the lie detector indicated he was telling the truth, is 0.174. Such a probability is alarming, since inmates must have gone through the rigors of prosecution to be imprisoned, and it could constitute justifiable grounds for his case to be reviewed.
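The lie detector update follows the same Bayes pattern. In this sketch (variable names are mine; I assume the 80%-accurate detector reads a truthful inmate as truthful 80% of the time and a lying inmate as truthful 20% of the time):

```python
# Priors from historical data
p_wrong = 0.05   # wrongly imprisoned
p_right = 0.95   # rightly imprisoned

# Likelihood of the detector indicating "truth" for each kind of inmate
p_truth_given_wrong = 0.80   # innocent inmate claiming innocence: truthful
p_truth_given_right = 0.20   # guilty inmate claiming innocence: lying

# Joint probabilities: prior x likelihood
joint_wrong = p_wrong * p_truth_given_wrong   # 0.04
joint_right = p_right * p_truth_given_right   # 0.19

# Posterior: probability of wrongful imprisonment given the truth reading
posterior_wrong = joint_wrong / (joint_wrong + joint_right)

print(round(posterior_wrong, 3))  # 0.174
```

Note how the detector’s reading lifted the prior of 5% to a posterior of about 17% – substantial, but still far from certainty.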
The third definition of probability appears less scientific: a probability is the number between 0 and 1 that one feels justified in giving for the occurrence of a particular event. It is subjective and less evidential. Most academic textbooks I’ve seen give this definition a miss, and the one textbook I have seen mention it did so only cursorily. Despite its apparent weakness, it is quite commonly heard. The event with the highest subjective probability is commonly known as a “hunch”. Crime dramas typically feature a seasoned detective whose hunches about the crux of a crime are almost always in the right direction. Experienced jungle navigators can trek through an uncharted part of the jungle through educated “guesses”.
Students who know their own study habits well can usually give a pretty good estimate of their probability of passing or getting an A+ without going through the counting and analysis used in the first two definitions. Such “subjective” probabilities go well beyond students assessing their chances of passing. Faced with a waist-level hurdle, people usually have a good probability estimate of whether they could clear it or get injured along the way. You can readily provide a probability for your career success in 10 years’ time, or of getting a promotion, or of your start-up company succeeding, and so on. If you are a surgeon, you could readily give a probability of success for a particular surgery on a patient just “from the gut”. Stock trading firms hire “sharp-shooter” in-house traders – those with very good “gut instinct” about which stock to trade, whether to go long or short, and by how much. Companies like to hire “rain-maker” salespeople – those who somehow just know how to spend their time to maximise the probability of clinching the next deal. Sports teams in basketball, badminton, table tennis and so on like players with “ball sense” – those who move their arms and limbs in just the way that achieves the highest probability of successful defence and attack. The list goes on.
You get the idea – you can indeed come up with a pretty meaningful probability number that is neither 1 nor 0. Calling such a probability “subjective” or “unscientific” might not sufficiently acknowledge the validity of these numbers – they could well be the most appropriate values, derived from an unconscious sum total of information, experience, background knowledge and the interactions among them, in ways beyond what we currently understand.
Being consumers of probabilities ourselves, and given that there are three different interpretations, we have to be extra careful before acting on the choice with the highest probability of success. When presented with probability proposals, we should at least probe which definition the values were based on: were they produced, and meant to be interpreted, under the relative-frequency definition, the Bayes’ Theorem definition, or the intuitive definition? If you face a major surgery and are told by the doctor that the probability of success is 90%, you probably wouldn’t want to act on the number alone before clarifying how it was obtained and how it should be interpreted. If it was based on 100 past similar surgeries at a reputable institution, 90 of which were successful, you might be more reassured. But that is not to say an intuitive probability should be written off immediately. If your surgeon, who had known you for decades, said that even though the published relative-frequency success rate was 60%, his hunch was that the surgery would be completely safe in your case, that would be very helpful information for your consideration too, don’t you think?
As for my friend’s father, he didn’t make it – my friend and I had the conversation about the doctor’s probability of successful surgery at the wake. I certainly hope he made the decision to go for the surgery with knowledge of the three definitions of probability.