What did the New York Times article get wrong about NIPT? Part One: Demystify the Confusion Matrix

On New Year’s Day 2022, the NYT published an article about NIPT (Non-Invasive Prenatal Testing) titled “When They Warn of Rare Disorders, These Prenatal Tests Are Usually Wrong”. Sounds very scary, right? But my reaction is, “So what?” If you find that outrageous, I will add, “Isn’t that expected, or even desired?” The article pointed out that when NIPTs report a positive finding for many rare disorders, the probability that the fetus actually has the disorder is very low, as low as 7% for Prader-Willi and Angelman syndromes. I agree with the author that many things in the whole process are worrisome, including inaccurate wording, lack of genetic counseling, and inadequate follow-up with confirmatory testing. However, I am NOT worried about the low positive predictive value (PPV) itself, given the very low prevalence of these diseases and the nature of screening. Why? Because this is a classic case of what people often call “the Bayesian Trap”.

To fully understand the Bayesian Trap, let’s consider this simple case. I have a friend, Harry, who is shy and introverted, likes to read books, and has a passion for detail. Do you think Harry is a farmer or a librarian? We all tend to think Harry is a librarian, but since this is a “trap”, you know the answer should be a farmer. But why?

To explain this, let’s stare at the famous error matrix, aka the “confusion matrix,” one more time. You have probably seen it many times and still get confused when people throw around terms like sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV).

Confusion Matrix

In this setting, the “actual condition” is on the left, and the “predicted condition” is at the top.
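
In that layout, the four cells are: a true positive (TP) when the actual and predicted conditions are both positive; a false negative (FN) when the actual condition is positive but the prediction is negative; a false positive (FP) when the actual condition is negative but the prediction is positive; and a true negative (TN) when both are negative.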

In Harry's case, the test is whether a person is “shy and introverted, likes to read books, and has a passion for detail.” If the answer is yes, the test predicts a librarian; if not, a farmer. Let’s call it the "bookworm test".

Sensitivity = when you really have the positive condition, how often the test predicts that you have it = TP/(TP+FN)

Let’s say the test sensitivity is 99%. That means, among all librarians, 99% pass the bookworm test.

Specificity = when you don’t have the positive condition, how often the test predicts that you don’t have it = TN/(TN+FP)

Let’s say the test specificity is 98%. That means among all farmers, 98% do not pass the bookworm test, but 2% do. These 2% of farmers will get a positive result despite not being librarians, i.e., they are false positives (FP).

Note that both of these parameters are calculated horizontally in the table. They are determined purely by the test’s performance and are therefore not affected by the ratio of librarians to farmers in the country.
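
To make this concrete, here is a minimal Python sketch; the counts are hypothetical, chosen to match a 99%-sensitive, 98%-specific test:

```python
def sensitivity(tp, fn):
    # Among all actual positives, the fraction the test flags as positive.
    return tp / (tp + fn)

def specificity(tn, fp):
    # Among all actual negatives, the fraction the test flags as negative.
    return tn / (tn + fp)

# Hypothetical counts for the bookworm test:
# 99 of 100 librarians pass it; 98,000 of 100,000 farmers do not.
print(sensitivity(tp=99, fn=1))          # 0.99 -> 99%
print(specificity(tn=98_000, fp=2_000))  # 0.98 -> 98%
```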

Now here is a different question: When the test gives a positive prediction, what is the probability that this person really has the condition?

In this case, the bookworm test predicts that Harry is a librarian. What is the chance that he is indeed a librarian? This probability is called the positive predictive value (PPV). It is calculated vertically in the table. Because you are now comparing the number of real positives (# of librarians) with the number of real negatives (# of farmers), their ratio in the population becomes important.

PPV = among all positive predictions, how likely it is to be a true positive = TP/(TP+FP)

Assume there are 100K farmers and 100 librarians in the country. An easy way to think about PPV is this: out of every 1,000 people, the test will flag about 2% of them, that is, roughly 20 false positives, while only about 1 of those 1,000 is a real librarian. When you have a positive test result, is it more likely to come from the 20 or from the 1? You can estimate that the probability of being a real librarian is roughly 1 in 21, slightly below 5%. So, Harry is much more likely to be a farmer than a librarian, even with a positive bookworm test result.

If you really want to calculate PPV precisely, here are the steps.

Among the 100 librarians, 99 are bookworms and 1 is not (SEN = 99%). Among the 100K farmers, 100,000 × 98% = 98,000 are not bookworms, and 2,000 are (SPC = 98%).

So, 99 + 2,000 = 2,099 people have positive bookworm test results. Among them, 99/2,099 ≈ 4.7% are true positives, i.e., librarians.
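
Here is the same calculation as a short Python sketch, assuming the 100 librarians and 100K farmers from above:

```python
librarians, farmers = 100, 100_000
sens, spec = 0.99, 0.98

tp = librarians * sens      # 99 librarians pass the bookworm test
fp = farmers * (1 - spec)   # ~2,000 farmers pass it anyway
ppv = tp / (tp + fp)
print(round(ppv, 3))        # 0.047 -> about 4.7%
```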

Negative predictive value (NPV) is defined similarly:

NPV = among all negative predictions, how likely it is to be a true negative = TN/(TN+FN)
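
A matching sketch for NPV, using the same hypothetical numbers:

```python
tn = 98_000   # farmers who correctly fail the bookworm test
fn = 1        # the one librarian who fails it
npv = tn / (tn + fn)
print(round(npv, 5))   # 0.99999 -> a negative result almost guarantees "farmer"
```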

Now, hopefully, you are not confused about these four very important measurements for a test. But naturally, you will have another question:

If the test is so good (SEN = 99%, SPC = 98%), why is a positive test result still so weak at determining whether an individual is a real positive? Is there any value in this very accurate test?

Oh, yes, there is. Think about the baseline probability of being a librarian: it is 100/(100 + 100K) ≈ 0.1%. That means a random individual has only about a 0.1% chance of being a librarian. Now, with a positive bookworm result, the probability is 4.7%, a 47-fold increase over the background. Not bad at all.

From this perspective, you can use Bayes’ rule to calculate the PPV. It can be expressed as a posterior probability (after the test):

Given a positive bookworm test result (BW), how likely is Harry to be a librarian (Lib)? We can apply Bayes’ rule as follows:

Bayes' Rule and Its Application in the Librarian Bookworm Test
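
Written out with the numbers from the librarian example (a sketch of what the figure shows), the calculation is:

P(Lib | BW) = P(BW | Lib) × P(Lib) / P(BW) = (0.99 × 100/100,100) / (2,099/100,100) = 99/2,099 ≈ 4.7%

Here P(BW) = 2,099/100,100 is the overall fraction of people who get a positive bookworm result.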

My favorite way of interpreting Bayesian statistics is that you constantly improve your beliefs with new data. In this case, without any test, everyone's probability of being a librarian is about 0.1% (the prior). Now, based on the data, we developed a bookworm test. With a positive test result, we improve our prediction that Harry is a librarian to 4.7%, a 47-fold increase over the prior. Isn't that impressive for a simple test? By the way, with some algebra gymnastics, you can expand all three components in the Bayes' rule formula above to recover the definition of PPV exactly.
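
If you want to see that algebra spelled out, write each component with the confusion-matrix counts (N is the total population): P(BW | Lib) = TP/(TP+FN), P(Lib) = (TP+FN)/N, and P(BW) = (TP+FP)/N. Multiply the first two, divide by the third, and the (TP+FN) and N terms cancel, leaving P(Lib | BW) = TP/(TP+FP), which is exactly the definition of PPV.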

To be continued in Part Two: What did the New York Times article get wrong about NIPT? – Clinical screening and testing.

References:

https://www.nytimes.com/2022/01/01/upshot/pregnancy-birth-genetic-testing.html

https://en.wikipedia.org/wiki/Confusion_matrix

https://www.nejm.org/doi/full/10.1056/NEJMoa2310336

https://www.nejm.org/doi/full/10.1056/NEJMoa2304714

Shan Yang

Expert in Life Sciences, Oncology, Diagnostics, Data, Marketing Strategy | Wharton MBA

5 months ago

Thanks, everyone, for liking, resharing, and commenting on part one of the story. I finally finished part two, which applies the principles illustrated here to NIPT testing. I also talk about practical considerations in selecting the threshold and share some thoughts about the future of NIPT and other non-invasive screening methods. You can read part two here: https://www.dhirubhai.net/posts/shanyang_nipt-mced-multimodality-activity-7211410034071674882-Js8K?utm_source=share&utm_medium=member_desktop

Geoff Nilsen

VP Omics at NashBio

7 months ago

Fantastic, Shan Yang! Like you, I expect the NYT to do better. Not as nicely stand-alone as your article, but there's also a fun treatment of this topic in The Cartoon Guide to Statistics (https://archive.org/details/TheCartoonGuideToStatistics/page/n53/mode/2up).

Zhanzhi H.

Tireless Advocate for Healthy Babies for All Families via Newborn Screening

8 months ago

Can't wait to read the second part! That misleading article has done so much harm, not just to NIPT but to any population-based screening, including newborn screening. Now, if I get asked about this article again, I know where to point them. Great job, Shan!

Jason Gottwals

Genomics | Diagnostics | Oncology | Therapeutics

8 months ago

It often feels like 95% of the stories I read on “science” in the mainstream press are either mischaracterizations, hyperbole, or just flat out wrong. Excellent work Shan Yang.
