Garbage In, Garbage Out?

(Image: Dall-E)

“How Long Will A.I.’s ‘Slop’ Era Last?”, a recent New York Times opinion piece by best-selling science writer and essayist David Wallace-Wells, inspired me to synthesize some thoughts from several of my prior posts.

Since I believe that only subscribers can read the article, I’ll do my best to provide a sufficient overview. It begins with a discussion of the financial community’s increasing scepticism of “large language models like ChatGPT, Gemini or Claude, each of which were trained on gobsmackingly large quantities of text to better simulate interaction with humans and bring them closer to approximations of humanlike thinking, at least in theory.”

Instead, it says, we may get what he calls “A.I. slop” – “often uncanny, frequently misleading material, now flooding web browsers and social-media platforms like spam in old inboxes.”

It goes on to say “Peer away from those chatbots and you can see a very different story, with different robot protagonists: machine-learning tools trained much more narrowly and focused less on producing a conversational, natural-language interface than on processing data dumps much more efficiently than human minds ever could. These products are less eerie, which means they have generated little existential angst. They are also — for now, at least — much more reliable and productive.”

Not the way I would say it, but definitely my view. In fact, to illustrate this, he uses Google DeepMind's AlphaFold, long my ‘go-to’ example of the best of AI.

Molecular shape is the key to chemistry, so protein structure is, in many respects, the key to life and medicine. Because of this, determining protein structure experimentally has for decades been a focus of intensive scientific research using expensive, time-consuming (often multi-year) techniques such as nuclear magnetic resonance and X-ray crystallography.

In his acceptance speech for the 1972 Nobel Prize in Chemistry, Christian Anfinsen postulated that, in theory, a protein’s amino acid sequence should fully determine its structure. In 1994, to stimulate progress on solving this incredibly important problem, Professors John Moult and Krzysztof Fidelis founded Critical Assessment of Structure Prediction (CASP), a biennial blind competition whose “goal is to help advance the methods of identifying protein structure from sequence”.

CASP’s measure of success is the Global Distance Test (GDT), which ranges from 0-100. It’s essentially the percentage of amino acid residues (beads in the protein chain) within a threshold distance from their correct position in the predicted structure. According to Professor Moult, a GDT score of around 90 is informally considered to be competitive with results obtained from experimental methods. Google DeepMind (based in London) set out to achieve this goal through its AlphaFold project.
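To make the metric concrete, here is a simplified sketch of a GDT_TS-style calculation. It assumes the two structures are already aligned; the real test performs an optimal superposition and, in its GDT_TS form, averages the percentage over cutoffs of 1, 2, 4, and 8 Å.

```python
import numpy as np

def gdt_ts(predicted, reference):
    """Toy GDT_TS: average, over four distance cutoffs, of the
    percentage of residues whose predicted C-alpha position lies
    within the cutoff of its reference position (pre-aligned)."""
    d = np.linalg.norm(predicted - reference, axis=1)  # distances in angstroms
    cutoffs = [1.0, 2.0, 4.0, 8.0]  # standard GDT_TS thresholds
    return 100.0 * np.mean([(d <= c).mean() for c in cutoffs])

# Three residues: two predicted nearly exactly, one about 5 A off.
ref = np.array([[0.0, 0, 0], [3.8, 0, 0], [7.6, 0, 0]])
pred = np.array([[0.1, 0, 0], [3.9, 0, 0], [12.6, 0, 0]])
print(round(gdt_ts(pred, ref), 1))  # -> 75.0
```

The off-position residue passes only the 8 Å cutoff, so the four per-cutoff percentages (67, 67, 67, 100) average to 75.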

You can see from the graphic that there was little progress over the decade from 2006-2016, and results were far from the objective of 90. However, in 2018 at CASP13, AlphaFold was entered for the first time, and it clearly exceeded the prior efforts. Two years later, at CASP14, AlphaFold 2 was entered, using what DeepMind described as “new deep learning architectures we’ve developed … enabling us to achieve unparalleled levels of accuracy.”


Prior to the first entry of AlphaFold in 2018 the CASP results were quite static and well below Professor Moult's GDT goal of 90 or above.

The CASP14 press release states that “AlphaFold produced models for about two-thirds of the CASP14 target proteins with global distance test scores above 90 out of 100”. It then quotes Professor Dame Janet Thornton, Director Emeritus of EMBL’s European Bioinformatics Institute and not affiliated with CASP or DeepMind, as saying: “One of biology’s biggest mysteries is how proteins fold to create exquisitely unique three-dimensional structures. Every living thing – from the smallest bacteria to plants, animals and humans – is defined and powered by the proteins that help it function at the molecular level. So far, this mystery remained unsolved, and determining a single protein structure often required years of experimental effort. It’s tremendous to see the triumph of human curiosity, endeavour and intelligence in solving this problem. A better understanding of protein structures and the ability to predict them using a computer means a better understanding of life, evolution and, of course, human health and disease.”

Of critical importance to our discussion, according to DeepMind “the model was trained on publicly available data consisting of ~170,000 protein structures from the protein data bank together with large databases containing protein sequences of unknown structure.”

This parallels the argument I’ve been trying to make with respect to the training and design of healthcare AI models. I’ve previously focused on the potential to train models using SNOMED CT “because it contains a vast amount of information about medical concepts and, importantly, their relationships. This accurately conveys meaning - accumulated over decades and carefully curated by human domain experts – to the model and embedding is the way of representing it in a computable form.”
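To make the idea of relationship-driven training data concrete, here is a minimal sketch of the triple structure such a terminology provides, the kind of input that knowledge-graph embedding methods consume. The concept names are hypothetical illustrations, not real SNOMED CT identifiers or content.

```python
# Curated relationships as (source, relationship, target) triples.
# Concept names are invented for illustration; real SNOMED CT uses
# numeric concept IDs and a much richer relationship model.
triples = [
    ("Pneumonia", "is_a", "Lung disease"),
    ("Pneumonia", "finding_site", "Lung"),
    ("Lung disease", "is_a", "Respiratory disorder"),
]

def neighbors(concept, triples):
    """Concepts directly related to `concept`, with the relationship type."""
    return [(rel, tgt) for src, rel, tgt in triples if src == concept]

print(neighbors("Pneumonia", triples))
# -> [('is_a', 'Lung disease'), ('finding_site', 'Lung')]
```

It is exactly this relational structure, rather than raw text, that carries the curated meaning an embedding can then represent in computable form.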

To "accelerate scientific research," DeepMind has posted over 200 million protein structure predictions. This example, the influenza virus NS1A binding protein, plays a key role in the contest between the influenza virus and host innate immunity, the body’s first line of defense.


AlphaFold’s predicted structure of the influenza virus NS1A binding protein, as displayed in UniProt.

Illustrating the role AlphaFold is already playing in research, the graphic is from UniProt, a widely supported collaboration between the European Bioinformatics Institute (EMBL-EBI), the SIB Swiss Institute of Bioinformatics and the Protein Information Resource (PIR). Note that this model shows a predicted local distance difference test (pLDDT) score between 0 and 100 for each area of the protein structure prediction. GDT measures the overall accuracy of a predicted protein structure compared to a reference structure, while pLDDT assesses the local confidence the model has for each residue in the predicted protein structure. You can see that most of the model has “Very High” confidence, above a pLDDT of 90.
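These confidence bands are easy to work with programmatically. Here is a minimal sketch of the standard AlphaFold Database banding; the per-residue pLDDT values themselves are stored in the B-factor column of AlphaFold’s PDB files, so they can be read with any standard structure parser.

```python
def plddt_band(plddt):
    """Map a per-residue pLDDT score (0-100) to the AlphaFold DB
    confidence band used for coloring, as in the UniProt viewer."""
    if plddt > 90:
        return "Very high"
    if plddt > 70:
        return "Confident"
    if plddt > 50:
        return "Low"
    return "Very low"

# Example per-residue scores (in AlphaFold PDB files these values
# occupy the B-factor column of each ATOM record).
scores = [96.2, 88.0, 61.5, 34.9]
print([plddt_band(s) for s in scores])
# -> ['Very high', 'Confident', 'Low', 'Very low']
```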

To further illustrate the important roles of health datasets and standards in AI, I return to Abstractive Health, a company I follow and have previously posted about. It cleverly uses standards and datasets to improve the performance of its summarization tool. As shown, an Abstractive Health summary of a patient’s hospital care, generated upon their discharge, consists of the three traditional parts: 1) History of the Present Illness (HPI), 2) Treatment Course, a summary of their care in the hospital, and 3) Follow-up Care suggested to the physician(s) who will take over the patient's treatment once they go home.


Abstractive Health’s hospital discharge summary consists of three parts, each of which benefits from the use of specialized health data standards and/or datasets.

Each section presents its own challenges and benefits from the use of specialized, health- and situation-specific standards or datasets. The HPI is now generated by Meta’s LLaMA-3, trained for summarization on over 300,000 CNN/Daily Mail news articles and on XSum's professionally written one-sentence summaries of them.

Unsurprisingly, the Wallace-Wells article discusses the well-known tendency of LLMs to ‘hallucinate’. Constrained Beam Search is a technique to exert more control over the output of text generation by LLMs, especially when we know exactly what we want inside the generated text. My prior discussion of Abstractive Health noted its use of SNOMED CT with Constrained Beam Search to find and eliminate hallucinations by assuring that medical concepts found in the output of their clinical summarization tool were found in the patient’s chart.

Here, the initial summary (done by Meta's BART at the time) contains diagnoses of altered mental status and hypotension (in red) that are not found in the patient's admission note. Constrained Beam Search, guided by SNOMED CT to identify clinical terms (including synonyms), replaces them with mitral regurgitation and diabetes (in gold and green) so the summary accurately reflects the patient's clinical situation.

Constrained Beam Search, guided by SNOMED CT, corrects the initial summary (done by Meta's BART at the time) by replacing altered mental status and hypotension with mitral regurgitation and diabetes.
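The verification idea behind this correction can be sketched in a few lines. To be clear, this is not Abstractive Health’s actual pipeline: a tiny hand-made term dictionary stands in for SNOMED CT, and rather than steering decoding with constrained beam search, the sketch only performs the downstream check of flagging generated concepts that are absent from the source note.

```python
# Toy stand-in for SNOMED CT: a small set of clinical terms.
# A real system would also match synonyms via the terminology.
TERMS = {"mitral regurgitation", "diabetes", "hypotension",
         "altered mental status"}

def flag_hallucinations(summary, source_note):
    """Return clinical terms that appear in the generated summary
    but nowhere in the patient's source note."""
    summary, source_note = summary.lower(), source_note.lower()
    found = {t for t in TERMS if t in summary}
    return sorted(t for t in found if t not in source_note)

note = "Patient admitted with mitral regurgitation; history of diabetes."
summary = "Admitted with hypotension and altered mental status."
print(flag_hallucinations(summary, note))
# -> ['altered mental status', 'hypotension']
```

Both flagged diagnoses mirror the hallucinations in the example above; constrained beam search goes further by preventing them during generation rather than catching them afterward.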

To help it determine what’s important for the Treatment Course section, Google's BERT, a family of Natural Language Processing tools, was trained on discharge summaries of 6,600 clinically complex neurology inpatients from the hospital where the tool was initially tested.

BERT also identifies needed follow-on care. For this, it was trained on the CLInical Follow-uP (CLIP) dataset of clinical action items derived from MIMIC-III, a dataset of over 40,000 critical care patients maintained at MIT.

I’ve also previously posted about GenHealth.ai and its unique (so far as I know) Large Medical Model trained on structured data (i.e., coded data rather than free text) from the claims and medical records of 140 million US patients. According to the company, as illustrated here, instead of being trained on and predicting the next word, training of their model and its output are in terms of healthcare events.

GenHealth's large medical model is trained on and outputs medical events.

Among many other use cases, the company says running the model multiple times can show, as illustrated here, probabilistic paths into the future that can be aggregated to present a patient's predicted clinical course.
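The aggregation step can be sketched with a toy stand-in for the model. The event names and probabilities below are invented for illustration; GenHealth’s actual model and event vocabulary are not public.

```python
import random

def sample_future(rng):
    """Toy stand-in for one generative run of an event model:
    returns a sampled sequence of future healthcare events."""
    events = ["office visit"]
    if rng.random() < 0.9:  # most sampled paths include the diagnosis
        events.append("Parkinson's disease diagnosis")
    return events

rng = random.Random(42)  # fixed seed so the sketch is repeatable
runs = [sample_future(rng) for _ in range(50)]
share = sum("Parkinson's disease diagnosis" in r for r in runs) / len(runs)
print(f"{share:.0%} of 50 runs predict the diagnosis")
```

Aggregating many stochastic runs in this way turns individual sampled paths into a stable probability estimate, which is the intuition behind the 40-50 run figure mentioned below.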

GenHealth combines multiple (perhaps 40-50) runs of the model to produce a stable prediction of a patient’s likely future clinical course. Here virtually all runs predict Parkinson’s Disease developing over the next year.

We will have to see how well that works but, in my view, AI designed and trained properly will be transformative for medicine and probably sooner than many expect.

So, for now, my bottom-line advice is to pay far more attention to the design and training of these models. Of course, design and training are typically 'invisible', so today the focus is far too often only on a model's output.
