COVID-19 test accuracy versus testing accuracy - so does Elon Musk have it or not? PART 2

In PART 1 (https://www.dhirubhai.net/pulse/covid-19-test-accuracy-versus-testing-so-does-elon-musk-waters) of this article we looked at the difference between test accuracy and testing accuracy. And we learned that the two terms are very different - for example, if 0.1% of a population is infected and we test everyone with a 99.9% accurate test, roughly half of all positive results will be false positives, so 50% of the time a positive result incorrectly tells a patient that he or she is infected.
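To make that recap concrete, here is a minimal Python sketch of the part 1 calculation (assuming, as in part 1, that the 99.9% figure is both the true-positive and the true-negative rate):

```python
# Part 1 recap: at low prevalence even a very accurate test produces
# about as many false positives as true positives.
prevalence = 0.001     # 0.1% of the population is infected
sensitivity = 0.999    # P(test positive | infected)
specificity = 0.999    # P(test negative | not infected)

true_positives = prevalence * sensitivity
false_positives = (1 - prevalence) * (1 - specificity)

# Probability that someone who tests positive is actually infected
ppv = true_positives / (true_positives + false_positives)
print(f"P(infected | positive test) = {ppv:.1%}")  # ~50%
```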

But that was a scenario when we only test somebody once.

Musk's multiple tests

Often we can test somebody multiple times. That's what happened to Musk, who was tested four times within around one hour and received positive, negative, negative, positive results. So how do we interpret a series of test results?

Well, to do that we need to understand Bayesian probability so that we can calculate the likelihood that Musk has COVID-19.

Bayes' theorem (or Bayes' rule) describes the probability of something being true based on whether or not something else is true. Let's clarify by way of an example or two.

Example of Bayes' Theorem

Consider that we have an opaque bag containing four balls that are identical except for their colors. Imagine, for example, that they are four pool balls or snooker balls. Now let's say that two of the balls are red and the other two are blue.

If we put our hand in the bag and extract a ball without looking, what's the probability that it's red? That would be 0.5 or 50%. This is a result from a universe that doesn't care whether or not Bayes ever existed.

Now if we don't return that ball to the bag and we take out another ball, what's the probability that it is red? Now it's 0.33 or 33%.

Because we know that there's a red ball missing, we know there's only a 1 in 3 chance of taking out a red ball from the bag.

But if we don't know the color of the first ball

Now let's consider the case that we had taken out a ball from the bag and discarded it without looking at it. So now we have no idea whether the bag contains one red and two blues or one blue and two reds.

So now when we take a second ball out of the bag the chance of a red is still 0.5 or 50%. Schrödinger might say that the bag contains both 2 reds / 1 blue and 2 blues / 1 red at the same time. Then again he might just say that amateur physicists should keep their mouths shut because they don't understand quantum mechanics and his thought experiment. Or he might just say nothing because he's dead.
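If you'd prefer a simulation to an argument, here is a small Python sketch (my own illustration, not part of the original article) that draws balls from the bag many times and tallies the colour of the second ball under both scenarios:

```python
import random

def chance_second_is_red(first_known_red: bool, trials: int = 100_000) -> float:
    """Estimate P(second ball is red), optionally given that the first
    ball drawn (and not returned) was seen to be red."""
    hits = total = 0
    for _ in range(trials):
        bag = ["red", "red", "blue", "blue"]
        random.shuffle(bag)
        first, second = bag[0], bag[1]
        if first_known_red and first != "red":
            continue  # condition on the first draw being red
        total += 1
        hits += (second == "red")
    return hits / total

print(chance_second_is_red(first_known_red=True))   # ~0.33
print(chance_second_is_red(first_known_red=False))  # ~0.50, first ball discarded unseen
```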

Anyway, back in the Newtonian world, the differences between the probabilities are explained by Bayes' theorem which, in its simplest form in plain language, says that if we know event X happened then the probability that Y will happen is Z₁%, whereas if we don't know whether event X happened then the probability that Y will happen is Z₂%.

To use a more mathematically precise definition: Bayes' theorem relates conditional probabilities. If A and B denote two events, P(A|B) denotes the conditional probability of A occurring given that B occurs, and the theorem states that P(A|B) = P(B|A) * P(A) / P(B).

Multiple prior events

There might be more than one prior event, rather than just X or B in the definitions above. So we often refer to the "prior" conditions.

Finally let's end the math lesson by talking about Bernoulli random variables. A Bernoulli random variable can take only two values, 1 or 0. It takes the value 1 if an experiment with success probability p resulted in success and 0 otherwise. Examples of this type of experiment include flipping a coin, the gender of a newborn (well, of humans, not of fungi with 36,000 different genders), a random binary digit, and whether someone likes Marmite.

Let's say that X is a Bernoulli random variable. Then we write this in mathematical language as X ~ Ber(p):

Probability of success: P(X=1) = p (this, together with the probability of failure below, makes up the probability mass function, or PMF)

Probability of failure: P(X=0) = 1 - p

Expectation: E[X] = p

Variance: Var(X) = p(1 - p)
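As a quick sanity check of those formulas, here is a short Python sketch (purely illustrative, with an arbitrary p of 0.3) that samples a Bernoulli random variable and compares the empirical mean and variance with p and p(1 - p):

```python
import random

p = 0.3                 # probability of success (arbitrary choice)
n = 100_000
samples = [1 if random.random() < p else 0 for _ in range(n)]

mean = sum(samples) / n
variance = sum((x - mean) ** 2 for x in samples) / n

print(f"empirical mean     {mean:.3f}   vs  E[X] = p = {p}")
print(f"empirical variance {variance:.3f}   vs  Var(X) = p(1 - p) = {p * (1 - p):.3f}")
```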

So before you doze off, let's apply the math to Musk's test results.

Assumptions to analyze Musk's test results

Let's start by assuming that X is a Bernoulli random variable. If Musk has COVID-19 then X is 1 or true, and if he doesn't X is 0 or false.

Let's also assume that T is the test result, and T=1 if the result is positive and 0 if the result is negative. So we actually have T₁=1, T₂=0, T₃=0, T₄=1.

Now we know sensitivity and specificity (the concepts were described in part 1 of this article). Conveniently they correspond to the likelihood probabilities in Bayes' theorem.

Sensitivity is the probability that the test comes back positive given that the patient has the virus. This is called the true positive (TP) rate, or P of T = 1 given that X = 1, which we write as:

P(T=1 | X=1)

Specificity is the probability that the test comes back negative given that the patient does not have the virus - we call it the true negative (TN) rate, or P of T equals 0 given X = 0, written as:

P(T=0 | X=0)
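Those two numbers are all we need from the test itself, because the other two conditional probabilities fall straight out of them - something we'll use below. A minimal sketch, using the sensitivity (84%) and specificity (99.5%) figures quoted later in this article:

```python
sensitivity = 0.84    # P(T=1 | X=1), the true positive rate
specificity = 0.995   # P(T=0 | X=0), the true negative rate

p_false_negative = 1 - sensitivity   # P(T=0 | X=1)
p_false_positive = 1 - specificity   # P(T=1 | X=0)

print(p_false_negative)  # 0.16
print(p_false_positive)  # 0.005 (give or take floating-point noise)
```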

Applying to multiple tests

Given that Musk took a series of tests, another quantity that we're also interested in is the prior for each test. This is the probability that Musk has COVID-19 without any knowledge from any test. For the first test that he takes, a reasonable assumption would be that this is equal to the overall prevalence of the virus among the population. We don't know this number, and we're not sure how accurate our estimate is either. We covered this in the earlier part 1 article.

So let's think of this as a parameter to the problem and check how the answer varies as this changes. Let's start with 0.6% - so P(X=1) = 0.006.

Now what we're trying to find is P of X given T, or P(X|T). So here is where Bayes' theorem comes into play. We're given P of T given X, or P(T|X), and we're given P of X, or P(X) - the likelihood and prior respectively. We want to find P of X given T, or P(X|T) - which is also called the posterior.

Bayes' theorem tells us that this is equal to P of T given X times P of X divided by P of T:

P(X|T) = P(T|X) * P(X) / P(T)
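In code, a single application of the theorem to one test result might look like the sketch below (a hypothetical helper of my own, not from any library), with the denominator P(T) expanded over the two possible values of X exactly as described in the next few paragraphs:

```python
def bayes_update(prior: float, sensitivity: float, specificity: float,
                 positive: bool) -> float:
    """Return the posterior P(X=1 | T) given a prior P(X=1) and one test result."""
    if positive:
        likelihood_infected = sensitivity        # P(T=1 | X=1)
        likelihood_not = 1 - specificity         # P(T=1 | X=0)
    else:
        likelihood_infected = 1 - sensitivity    # P(T=0 | X=1)
        likelihood_not = specificity             # P(T=0 | X=0)

    numerator = likelihood_infected * prior                # P(T | X=1) * P(X=1)
    marginal = numerator + likelihood_not * (1 - prior)    # P(T), summed over X=1 and X=0
    return numerator / marginal
```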

Now our problem is a little more complicated because there are 4 tests - we have T₁=1, T₂=0, T₃=0, T₄=1. So our strategy to solve the problem:

1 - calculate P of X given T₁=1 using Bayes' theorem

2 - calculate P of X given T₂=0 and T₁=1

Now although P of X given T₁, P(X|T₁), is the posterior from the first step, it is also the prior in the second step. Hence on the second step we apply Bayes' theorem again but using the new prior.

Then this posterior becomes the prior for step 3, which gives us posterior 3, and we apply Bayes' theorem again using this new prior; then for step 4, posterior 3 becomes prior 4, and we use this to get posterior number 4.

So to calculate the first posterior, for the first test, the numerator is the sensitivity (the 84% true positive rate we're assuming for this test) multiplied by the prior. For the denominator we need to find the marginal probability that T₁=1, so we must sum over all the possible values of X - that is, X=1 and X=0.

When X=1 this is the same as the numerator.

When X=0 we have the probability that the test is positive given that Musk does not have the virus.

In this case we're not given the probability but we can calculate it from the specificity - we know that the probability of the test being negative given that Musk does not have the virus is 99.5% i.e. the probability of a true negative. Therefore the probability that the test is positive, given that Elon does not have the virus, is just 1 - 0.995 = 0.005 (the probability of a false positive). (Referring back to the graphics in part 1 of this article, we're eliminating the green upper-right rectangle from the right hand side to leave behind the purple lower-right rectangle.)

Similarly the prior probability of not having the virus is just 1 - 0.6% (prevalence).

Plugging in the numbers: 0.84 * 0.006 / (0.84 * 0.006 + 0.005 * 0.994) ≈ 0.50, so the result is 50%.

For the second test

Now for step 2 we calculate the posterior given the first two tests.

Remember that the step 1 posterior is the step 2 prior. And for simplicity we can drop the explicit conditioning on T₁ and just write this new prior as P₂(X).

Then it's just a matter of applying Bayes' theorem again. A difference here, though, is that T₂=0.

So in the numerator we have P of T = 0 given X = 1, P(T=0|X=1) - that is the probability of a false negative. Since the probability of a TP (the sensitivity) is 84%, this value is 1 - 0.84 = 16%. Applying Bayes' theorem with the new prior of roughly 50% gives a posterior of roughly 14% after the second test.

For the third and fourth tests

Repeating for step 3 gives us 2.6%, a further reduction because T₃=0 (the test result is negative).

And repeating again for step 4 we get 81.5%, an increase due to T₄=1 (the test result is positive).
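For anyone who wants to check the chain end to end, here is a self-contained sketch that applies the same update four times using the assumptions above (0.6% prevalence, 84% sensitivity, 99.5% specificity); it reproduces the roughly 50%, 14%, 2.6% and 81.5% posteriors step by step:

```python
def bayes_update(prior, sensitivity, specificity, positive):
    """One Bayesian update: return P(X=1 | T) from the prior P(X=1)."""
    like_infected = sensitivity if positive else 1 - sensitivity      # P(T | X=1)
    like_not = (1 - specificity) if positive else specificity         # P(T | X=0)
    return like_infected * prior / (like_infected * prior + like_not * (1 - prior))

prior = 0.006          # assumed prevalence, P(X=1)
sensitivity = 0.84     # assumed true positive rate
specificity = 0.995    # assumed true negative rate
results = [True, False, False, True]   # T1=1, T2=0, T3=0, T4=1

for i, positive in enumerate(results, start=1):
    prior = bayes_update(prior, sensitivity, specificity, positive)
    print(f"posterior after test {i}: {prior:.1%}")
# prints roughly 50%, 14%, 2.6% and 81.5%
```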

The final result and our conclusion

And that's the final result. With a True, False, False, True set of test results in that order, the likelihood that Musk is COVID-19 positive is 81.5% - subject to our assumptions about test accuracy, to our estimate of population-wide infection prevalence being accurate, and to population prevalence being an adequate prior for Musk.

So yes, it is more likely than not that Musk has it.
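Earlier we said we'd treat the prevalence as a parameter and check how the answer changes with it. As a final sketch (the prevalence values below are arbitrary illustrations, not estimates), here is what happens to the final posterior if we rerun the same four updates with different priors:

```python
def bayes_update(prior, sensitivity, specificity, positive):
    """One Bayesian update: return P(X=1 | T) from the prior P(X=1)."""
    like_infected = sensitivity if positive else 1 - sensitivity
    like_not = (1 - specificity) if positive else specificity
    return like_infected * prior / (like_infected * prior + like_not * (1 - prior))

results = [True, False, False, True]     # positive, negative, negative, positive

for prevalence in [0.001, 0.006, 0.02, 0.05]:     # 0.1%, 0.6%, 2%, 5%
    posterior = prevalence
    for positive in results:
        posterior = bayes_update(posterior, 0.84, 0.995, positive)
    print(f"prevalence {prevalence:.1%} -> final posterior {posterior:.0%}")
# roughly 42%, 82%, 94% and 97% respectively
```

So the conclusion is fairly robust: unless the true prevalence is well below the 0.6% we assumed, the two positives outweigh the two negatives.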


Further reading:

Fungi with 36,000 genders:



