Advances in AI and Medicine
Nashya Haider, PhD, MPhil
Founder and Director | Digital Transformation, Business Development, Medical Communications
Since the first landmark demonstrations of medical AI algorithms that can detect a disease from medical images at the level of experts, the landscape of medical AI has matured considerably.
The deployment of medical AI systems in routine clinical care presents a critical yet largely unfulfilled opportunity, as the medical AI community navigates the complex ethical, technical, and human-centered challenges required for safe and effective translation.
In this review, we summarize significant advances and highlight overarching trends, providing a concise overview of the state of medical AI.
Recent progress in the deployment of AI algorithms in medicine
Although AI systems have repeatedly been shown to be successful in a wide variety of retrospective medical studies, relatively few AI tools have been translated into medical practice. Critics point out that AI systems may in practice be less helpful than retrospective data would suggest; systems may be too slow or complicated to be useful in natural medical settings, or unforeseen complications may arise from how humans and AIs interact.
Moreover, retrospective in silico datasets undergo extensive filtering and cleaning, which may make them less representative of real-world medical practice. Randomized controlled trials (RCTs) and prospective studies can bridge this gap between theory and practice, more rigorously demonstrating that AI models can have a quantifiable, positive impact when deployed in natural healthcare settings. Recently, RCTs have tested the usefulness of AI systems in healthcare.
In addition to accuracy, a variety of other metrics have been used to assess the utility of AI, providing a holistic view of its impact on medical systems. For example, an RCT evaluating an AI system for managing insulin doses measured the amount of time patients spent within the target glucose range; a study that assessed a monitoring system for intraoperative hypotension tracked the average duration of hypotension episodes; and a system that flagged cases of intracranial hemorrhage for human review was judged by its reduction of turnaround time. Recent guidelines, such as AI-specific extensions to the SPIRIT and CONSORT guidelines and upcoming guidelines such as STARD-AI, may help standardize medical AI reporting, including clinical trial protocols and results, making it easier for the community to share findings and rigorously investigate the usefulness of medical AI.
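As a toy illustration of one of the clinical-utility metrics mentioned above, the sketch below computes the fraction of time spent in the target glucose range from a series of readings. The readings and the 70–180 mg/dL target bounds are assumptions for illustration, not values from the cited trial.

```python
# Minimal sketch (assumed data and target range): the "time in range" metric
# used to judge an insulin-dosing AI system, computed from glucose readings.
import numpy as np

def time_in_range(glucose_mg_dl: np.ndarray, low: float = 70.0, high: float = 180.0) -> float:
    """Return the fraction of readings that fall inside [low, high] mg/dL."""
    in_range = (glucose_mg_dl >= low) & (glucose_mg_dl <= high)
    return float(in_range.mean())

# Hypothetical continuous glucose monitor readings taken at regular intervals.
readings = np.array([95, 110, 150, 210, 190, 130, 88, 76, 250, 160], dtype=float)
print(f"Time in range: {time_in_range(readings):.0%}")  # -> 70%
```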
In recent years, some AI tools have moved past testing to deployment, winning administrative support and clearing regulatory hurdles. The Centers for Medicare & Medicaid Services (CMS), which determines reimbursement under public insurance, has facilitated the adoption of AI in clinical settings by allowing reimbursement for the use of two specific AI systems for medical image diagnosis.
Furthermore, a 2020 study found that the US Food and Drug Administration (FDA) is approving AI, particularly machine learning (ML; a type of AI), products at an accelerating rate. These advances primarily take the form of FDA clearances, which require products to meet a lower regulatory bar than full-fledged approvals do, but they are nonetheless clearing a path for AI/ML systems to be used in real clinical settings. It is essential to point out that the datasets used for these regulatory clearances are often made up of retrospective, single-institution data that are mostly unpublished and considered proprietary. To build trust in medical AI systems, more robust standards for reporting transparency and validation will be required, including demonstrations of impact on clinical outcomes.
Deep learning for the interpretation of medical images
In recent years, deep learning, in which neural networks learn patterns directly from raw data, has achieved remarkable success in image classification. Medical AI research has consequently blossomed in specialties that rely heavily on the interpretation of images, such as radiology, pathology, gastroenterology, and ophthalmology.
AI systems have achieved considerable improvements in accuracy for radiology tasks, including mammography interpretation, cardiac function assessment and lung cancer screening, tackling not only diagnosis but also risk prediction and treatment.
For instance, one AI system was trained to estimate 3-year lung cancer risk from radiologists’ computed tomography (CT) readings and other clinical information. These predictions could then be used to schedule follow-up CT scans for patients with cancer, augmenting current screening guidelines. Validation of such systems on multiple clinical sites and an increasing number of prospective evaluations have brought AI closer to being deployed and making a practical impact in the field of radiology.
Deep learning models have been applied widely in ophthalmology, making important advances toward deployment. Besides quantifying model performance, studies have investigated the human impact of such models on health systems.
For example, one study examined how an AI system for eye disease screening affected patient experience and medical workflows, using human observation and interviews. Other studies have looked at the financial impact of AI in the ophthalmology setting, finding that semi-automated or fully automated AI screening might provide cost savings in specific contexts, such as the detection of diabetic retinopathy.
Opportunities for the development of AI algorithms
Medical AI studies often follow a familiar pattern, tackling an image classification problem, using supervised learning on labeled data to train an AI system, and then evaluating the system by comparing it against human experts. Although such studies have achieved noteworthy advances, we present three other promising avenues of research that break from this mold. First, we address non-image data sources, such as text and chemical and genomic sequences, which provide rich medical insights. Second, we discuss problem formulations that go beyond supervised learning, obtaining insights from unlabeled or otherwise imperfect data through paradigms such as unsupervised or semi-supervised learning.
Finally, we look at AI systems that collaborate with humans instead of competing against them, which is a path toward achieving better performance than either AI or humans alone.
Medical data beyond images
Moving beyond image classification, deep learning models can learn from many kinds of input data, including numbers, text, or even combinations of input types. Recent work has drawn on various rich data sources involving molecular information, natural language, medical signals such as electroencephalogram (EEG) data and multimodal data. The following is a summary of applications using these data sources.
AI has enabled recent advances in biochemistry, improving understanding of the structure and behavior of biomolecules. The work of Senior et al. on AlphaFold represented a breakthrough in the key task of protein folding, which involves predicting the 3D structure of a protein from its chemical sequence.
Improvements in protein structure prediction can provide mechanistic insight into a range of phenomena, such as drug-protein interactions or the effects of mutations. Alley et al. also made strides in protein analysis, creating statistical summaries that capture critical properties of proteins and help neural networks learn with less data. By using such summaries rather than raw chemical sequences, models for downstream tasks like predicting molecular function may obtain high performance with much less labeled data.
Furthermore, AI is now beginning to accelerate the process of drug discovery. Deep learning models for molecular analysis have been shown to accelerate the discovery of novel drugs by reducing the need for slower, more costly physical experiments. Such models have proven useful for predicting relevant physical properties such as the bioactivity or toxicity of potential drugs. One study used AI to identify a drug that was subsequently proven to be effective at fighting antibiotic-resistant bacteria in experimental models. Another drug designed by AI was shown to inhibit DDR1 (a receptor implicated in several diseases, including fibrosis) in experimental models; remarkably, it was discovered in only 21 days and experimentally tested in 46 days, dramatically accelerating a process that usually takes several years. Importantly, deep learning models can select effective molecules that differ from existing drugs in clinically meaningful ways, thereby opening novel pathways for treatment and providing new tools in the fight against drug-resistant pathogens.
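To make the molecular-property-prediction idea concrete, here is a minimal sketch, assuming RDKit and scikit-learn are available. The SMILES strings and "toxicity" labels are made-up placeholders, not data from the studies cited above; real work would use a curated assay dataset and far more molecules.

```python
# Minimal sketch of molecular property prediction: encode molecules as Morgan
# fingerprints and fit a classifier to a (placeholder) toxicity label.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier

def featurize(smiles: str) -> np.ndarray:
    """Encode a molecule as a Morgan (circular) fingerprint bit vector."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=1024)
    return np.array(list(fp), dtype=np.int8)

# Hypothetical training molecules with invented binary "toxic" labels.
smiles = ["CCO", "CC(=O)Oc1ccccc1C(=O)O", "c1ccccc1", "CCN(CC)CC"]
labels = [0, 0, 1, 0]

X = np.stack([featurize(s) for s in smiles])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)

# Score an unseen candidate molecule.
candidate = featurize("Cc1ccccc1")
print(clf.predict_proba(candidate.reshape(1, -1))[0, 1])  # predicted probability of toxicity
```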
Recent research has exploited the availability of large medical text datasets for healthcare-related natural language processing tasks, taking advantage of technical advances like transformers and contextual word embeddings (two technologies that help models consider surrounding context when interpreting each part of a text). One study presented BioBERT, a model trained on a large corpus of medical texts that surpassed prior state-of-the-art performance on natural language tasks like answering biomedical questions. Such models have been used to improve performance on tasks such as learning from biomedical literature which drugs are known to interact with each other or automatically labeling radiology reports.
Thus, advances in natural language processing have opened up a wealth of new datasets and AI opportunities, although major limitations still exist due to the difficulty of extracting information from long text sequences.
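For readers unfamiliar with how such language models are used in practice, the sketch below loads a publicly released BioBERT checkpoint through the Hugging Face transformers library and produces a sentence embedding that could feed a downstream task such as drug–drug interaction detection. The specific model identifier and the example sentence are assumptions for illustration.

```python
# Minimal sketch: obtain contextual embeddings from a biomedical language model.
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed checkpoint; substitute one fine-tuned for your task (e.g., QA or NER).
MODEL_ID = "dmis-lab/biobert-base-cased-v1.1"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

sentence = "Warfarin interacts with aspirin, increasing bleeding risk."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the contextual token embeddings into one sentence vector.
sentence_embedding = outputs.last_hidden_state.mean(dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 768])
```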
Additionally, ML methods have been used to predict outcomes from medical signal data, such as EEG, electrocardiogram and audio data. For example, ML applied to EEG signals from clinically unresponsive patients with brain injuries allowed the detection of brain activity, a predictor of eventual recovery. Moreover, AI’s ability to directly transform brain waves to speech or text has remarkable potential value for patients with aphasia or locked-in syndrome who have had strokes. Medical signal data can also be passively collected outside a clinical setting in the real world by using wearable sensors such as smartwatches that enable remote health monitoring.
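A common recipe for such signal tasks is to extract spectral features from short windows and feed them to a classifier. The sketch below, with synthetic data and an assumed 256 Hz sampling rate, is only an illustration of that recipe, not the method used in the cited EEG study.

```python
# Minimal sketch: band-power features (Welch's method) + a simple classifier.
import numpy as np
from scipy.signal import welch
from sklearn.linear_model import LogisticRegression

FS = 256  # sampling rate in Hz (assumed)

def band_power_features(window: np.ndarray) -> np.ndarray:
    """Average spectral power in canonical EEG bands for one 1-D window."""
    freqs, psd = welch(window, fs=FS, nperseg=FS)
    bands = [(0.5, 4), (4, 8), (8, 13), (13, 30)]  # delta, theta, alpha, beta
    return np.array([psd[(freqs >= lo) & (freqs < hi)].mean() for lo, hi in bands])

rng = np.random.default_rng(0)
windows = rng.standard_normal((40, FS * 2))  # 40 two-second windows (synthetic)
labels = rng.integers(0, 2, size=40)         # placeholder labels

X = np.stack([band_power_features(w) for w in windows])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X[:5]))
```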
AI setups beyond supervised learning
In addition to using novel data sources, recent studies have tried unconventional problem formulations. Conventionally, datasets derive inputs and labels from real data, and models like neural networks are used to learn functions mapping from inputs to labels. However, because labeling can be expensive and time-consuming, datasets containing both accurate inputs and labels are often difficult to obtain and are frequently reused across many studies. Other paradigms, including unsupervised learning (specifically self-supervised learning), semi-supervised learning, causal inference, and reinforcement learning, have been used to tackle problems in which data are unlabeled or otherwise noisy. These advances have pushed the boundary of medical AI, enhancing existing technologies and deepening the understanding of diseases.
Unsupervised learning, which involves learning from data without any labels, has provided actionable insights, allowing models to find novel patterns and categories rather than being limited to existing labels, as in the supervised paradigm. For example, clustering algorithms, which organize unlabeled data points by grouping similar data points together, have been applied to conditions such as sepsis, breast cancer, and endometriosis, identifying clinically meaningful patient subgroups. These categories can reveal novel patterns in disease manifestation that may eventually help to determine diagnosis, prognosis and treatment.
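The sketch below shows the basic clustering workflow on a synthetic patient feature matrix; the data, number of clusters, and features are placeholders, not details from the cited studies.

```python
# Minimal sketch of unsupervised patient subgrouping with k-means clustering.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
patients = rng.standard_normal((200, 6))  # 200 patients x 6 clinical features (placeholder)

X = StandardScaler().fit_transform(patients)     # put features on a common scale
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

print("cluster sizes:", np.bincount(kmeans.labels_))
print("silhouette:", silhouette_score(X, kmeans.labels_))  # cohesion/separation check
```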
Other formulations rely on extracting information out of noisy or otherwise imperfect data, dramatically reducing the cost of data collection. As an example, Campanella et al. trained a weakly supervised model to diagnose several types of cancer from whole-slide images, using only the final diagnoses as labels and skipping the pixel-wise annotation expected in a supervised learning setup.
With this approach, they were able to achieve excellent classification results, even with annotation costs lowered. Unconventional problem formulations have also been used to enhance and reconstruct images. For instance, when creating a model to enhance spatial detail in low-quality magnetic resonance imaging (MRI) images, Masutani et al. synthetically generated input data; they took high-quality MRI images, randomly added noise and then trained a convolutional neural network (a type of neural network commonly used for image data) to recover the original high-quality MRI images from their simulated ‘low-quality’ inputs. Such formulations allow researchers to leverage large datasets, despite their imperfections, to train high-performing models.
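The following is a minimal sketch in the spirit of that noise-then-denoise formulation, not the authors' code: corrupt clean images with synthetic noise and train a small convolutional network to recover the originals. The images here are random placeholders standing in for real MRI data.

```python
# Minimal sketch: train a small CNN to map simulated low-quality images back
# to their high-quality originals (PyTorch).
import torch
import torch.nn as nn

class DenoiseCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

clean = torch.rand(32, 1, 64, 64)               # stand-in for high-quality images
noisy = clean + 0.1 * torch.randn_like(clean)   # simulated 'low-quality' inputs

model = DenoiseCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(noisy), clean)  # learn to map noisy -> clean
    loss.backward()
    optimizer.step()
print("final reconstruction loss:", loss.item())
```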
Setups beyond human versus AI
Although the majority of studies have focused on a head-to-head comparison of AI with humans, real-life medical practice is more likely to involve human-in-the-loop setups, wherein humans actively collaborate with AI systems.
Thus, recent studies have begun to explore such collaborative setups between AI and humans. These setups typically feature humans receiving assistance from AI, although occasionally AI and humans work separately and have their predictions averaged or otherwise combined afterward. Multiple studies on a variety of tasks have shown that clinical experts and AI in combination achieve better performance than experts alone.
For example, Sim et al. found that AI-assisted clinical experts surpassed both humans and AI alone when detecting malignant nodules on chest radiographs. The usefulness of human–AI collaboration will likely depend on the specifics of the task and the clinical context.
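To make the simplest combination scheme mentioned above concrete, the sketch below averages a clinician's estimated probability with an AI model's output and thresholds the result. The probabilities, weighting, and threshold are assumed placeholders, not values from any cited study.

```python
# Minimal sketch: combine a human estimate and an AI prediction by averaging.
def combined_prediction(human_prob: float, ai_prob: float,
                        ai_weight: float = 0.5, threshold: float = 0.5) -> bool:
    """Weighted average of two probabilities, thresholded into a yes/no decision."""
    p = ai_weight * ai_prob + (1.0 - ai_weight) * human_prob
    return p >= threshold

print(combined_prediction(human_prob=0.4, ai_prob=0.8))  # True: combined p = 0.6
```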
There are still open questions about exactly how AI assistance affects human performance. Furthermore, some clinicians may benefit more from AI assistance than others; studies suggest that less experienced clinicians, such as trainees, benefit more from AI input than their more experienced peers.
Technical considerations also play a major role in determining the effectiveness of AI assistance. Predictably, the accuracy of AI advice affects its usefulness: incorrect predictions have been found to hinder clinician performance, even as correct predictions prove helpful. Additionally, AI predictions can be communicated in multiple ways, appearing, for example, as probabilities, text recommendations or images edited to highlight areas of interest. The presentation format of AI assistance has been shown to affect its helpfulness to human users, so future work on optimizing medical AI assistance may draw on existing research on human–computer interaction.
Challenges for the future of the field
Despite striking advances, the field of medical AI faces major technical challenges, particularly in terms of building user trust in AI systems and composing training datasets. Questions also remain about the regulation of AI in medicine and the ways in which AI may shift and create responsibilities throughout the healthcare system, affecting researchers, physicians and patients alike. Finally, there exist important ethical concerns about data use and equity in medical AI.
Implementation challenges
Medical AI data often raise specific, practical challenges. Although it is hoped that AI will reduce medical costs, the devices required to obtain the inputs for AI systems can be prohibitively expensive. Specifically, the equipment needed to capture images of whole slides is costly and is therefore unavailable in many health systems, impeding both data collection for and deployment of AI systems for pathology.
Additional concerns arise from large image sizes, because the amount of memory required by a neural network can increase with both the complexity of the model and the number of pixels in the input. As a result, many medical images, especially whole-slide images, which can easily contain billions of pixels each, are too large to fit into the average neural network. There exist many ways to address this issue. Pictures may be resized at the expense of fine details, or they may be split into multiple small patches, although this will hinder the system’s ability to draw connections between different areas of the image. In other cases, humans may identify a smaller region of interest, such as part of a slide image that contains a tumor, and crop the image before feeding it into an AI system, though this intervention adds a manual step into what might otherwise be a fully automated workflow.
Some studies use large custom models that can accept whole medical images, but running these models can require expensive hardware with more memory. Thus, systems for medical image classification often involve trade-offs to make inputs compatible with neural networks.
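As a concrete illustration of the patch-based workaround described above, here is a minimal sketch that splits a large image into fixed-size tiles. It assumes the slide is already loaded as a NumPy array; real pipelines typically read whole-slide files lazily (for example via OpenSlide), which is not shown here.

```python
# Minimal sketch: split an oversized image into non-overlapping patches.
import numpy as np

def extract_patches(slide: np.ndarray, patch_size: int = 256) -> list[np.ndarray]:
    """Split an H x W x C image into non-overlapping patch_size x patch_size tiles."""
    h, w = slide.shape[:2]
    patches = []
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            patches.append(slide[y:y + patch_size, x:x + patch_size])
    return patches

slide = np.zeros((2048, 2048, 3), dtype=np.uint8)  # placeholder for a (much larger) slide
print(len(extract_patches(slide)))  # 64 patches of 256 x 256
```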
Another issue affecting images as well as many other types of medical data is a shortage of the labels required for supervised learning. Labels are often hand-assigned by medical experts, but this approach can prove difficult due to dataset size, time constraints or shortage of expertise. In other cases, labels can be provided by non-expert humans, for example, via crowdsourcing. However, such labels may be less accurate, and crowdsourced labeling projects face complications associated with privacy, as the data must be shared with many labelers. Labels can also be applied by other AI models, as in some weak-supervision setups, but these labels again carry the risk of noise. Currently, the difficulty of obtaining quality labels is a major obstacle for supervised learning projects, driving interest in platforms that make labeling more efficient and in weakly supervised and unsupervised setups that require less labeling effort.
Problems also arise when technological factors lead to bias in datasets. For example, single-source bias occurs when a single system generates an entire dataset, as when all the images in a collection come from a single camera with fixed settings. Models that exhibit single-source bias may underperform on inputs collected from other sources. To improve generalization, models can undergo site-specific training to adapt to the specific quirks of each place they are deployed, and they can also be trained and validated on datasets collected from different sources. However, the latter approach must be undertaken with care, especially when the distribution of labels differs dramatically across datasets. For instance, if a model is trained on datasets from two institutions, one containing only positive cases and one containing only negative cases, then it can achieve high performance through spurious ‘shortcuts’ without learning about the relevant pathology. An image classification model might thus base its predictions entirely on the differences between the two institutions’ cameras; such a model would likely learn nothing about the underlying disease and fail to generalize elsewhere. We therefore encourage researchers to be wary of technological bias, even when using data from diverse sources.
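One practical guard against single-source bias is to validate on a held-out site, as in the sketch below using scikit-learn's grouped splitting. The data, labels, and site assignments are synthetic placeholders; the point is that the test fold always comes from a hospital the model never saw during training.

```python
# Minimal sketch: leave-one-site-out validation to probe cross-site generalization.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 10))
y = rng.integers(0, 2, size=300)
sites = rng.integers(0, 3, size=300)  # which hospital each sample came from (placeholder)

for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=sites):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    held_out = sites[test_idx][0]
    print(f"held-out site {held_out}: accuracy {clf.score(X[test_idx], y[test_idx]):.2f}")
```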
A variety of qualities are desired for an AI system to garner user trust. For example, it is useful for AI systems to be reliable, convenient to use and easy to integrate into clinical workflows. AI systems can be packaged with easy-to-read instructions, explaining how and when they should be used; it may be helpful for such user manuals to be standardized across systems.
Explainability is another key aspect of earning trust, as it is easier to buy into an AI system’s predictions when the system can explain how it reached its conclusions. Because many AI systems currently function as uninterpretable ‘black boxes’, explaining their predictions poses a serious technical challenge. Some methods for explaining AI predictions exist, such as saliency methods that highlight the regions of an image that most contribute to a model’s prediction of a disease. However, these methods may not be reliable, and further research is necessary to interpret AI decision-making processes, quantify their reliability and convey those interpretations clearly to human audiences. In addition to building trust among users, enhanced explainability will allow developers to check models more thoroughly for errors and verify to what degree AI decision-making mirrors expert human approaches.
Moreover, when medical AI models achieve novel insights that go beyond current human knowledge, improved explainability may help researchers grasp those new insights and thus better understand the biological mechanisms behind the disease.
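The sketch below shows one of the simplest explanation techniques mentioned above, a gradient-based saliency map, assuming a PyTorch image classifier. The model and input are untrained placeholders used purely to demonstrate the mechanics.

```python
# Minimal sketch: gradient-based saliency for an image classifier (PyTorch).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                      nn.Flatten(), nn.Linear(8 * 64 * 64, 2))
model.eval()

image = torch.rand(1, 1, 64, 64, requires_grad=True)
scores = model(image)
scores[0, scores.argmax()].backward()  # gradient of the top class score w.r.t. pixels

saliency = image.grad.abs().squeeze()  # high values = pixels most influencing the prediction
print(saliency.shape)  # torch.Size([64, 64])
```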
Perhaps the most obvious component of trustworthiness is accuracy, because users are unlikely to trust a model that has not been rigorously shown to give correct predictions. Additionally, trustworthy AI studies should be reproducible, so that repeatedly training a model with a given dataset and protocol produces consistent results. Studies should also be replicable, so that models perform consistently even when trained with different samples of data. Unfortunately, proving the reproducibility and replicability of AI studies raises unique challenges. Datasets, code, and trained models are often not released publicly, making it difficult for the wider AI community to independently verify and build on previous results.
Accountability
Recent work highlights regulatory issues regarding the deployment of AI models for healthcare. Beyond accuracy, regulators can look at a variety of criteria to evaluate models. For example, they may require validation studies showing that AI systems are robust and generalizable across clinical settings and patient populations and ensure that systems protect patient privacy. Additionally, because the usefulness of AI systems can depend heavily on how humans provide input and interpret output, regulators may require testing of human factors and adequate training for the human users of medical AI systems.
Traditionally, regulators of AI systems approve only one locked set of parameters, yet this approach does not account for the necessity to update models, as data evolve due to changes in patient populations, data collection tools and care management. Regulators must therefore develop novel certification processes to handle such systems. Importantly, the FDA has recently proposed a framework for adaptive AI systems in which they would approve not only an initial model but also a process for updating it over time.
Shifts in responsibility
Although AI systems have the potential to empower humans in medical decision-making, they also run the risk of limiting personal autonomy and creating new obligations. As AI systems take on more responsibilities in the healthcare setting, one concern is that clinicians may become overly reliant on AI, perhaps seeing a gradual decline in their own skills or personal connections with patients. In turn, medical AI developers may gain outsized influence on healthcare and should thus be obliged to create safe, useful AI systems and responsibly influence public views on health. As medical decision-making becomes more reliant on potentially unexplained AI judgments, individual patients might lose some understanding of, or control over, their own care. Patients might at the same time gain new responsibilities as AI makes healthcare more pervasive in daily life. As an example, if smart devices provide patients with constant advice, then those patients may be expected to follow those recommendations or else be held responsible for negative health outcomes.
The proliferation of AI also raises concerns around accountability, as it is currently unclear whether developers, regulators, sellers or healthcare providers should be held accountable if a model makes mistakes even after being thoroughly clinically validated. Currently, doctors are held liable when they deviate from the standard of care and patient injury occurs. If doctors are generally skeptical of medical AI, then individual doctors may feel pressure to ignore AI recommendations that conflict with standard practice, even when those recommendations would be personalized and beneficial for a specific patient. However, if the standard of care shifts so that doctors routinely use AI tools, then there will be a strong medicolegal incentive for doctors to follow AI recommendations.
Ethical data use
There are concerns that bad actors interested in identity theft and other misconduct might take advantage of medical datasets, which often contain large amounts of sensitive information about real patients. Decentralizing data storage is one way to reduce the potential damage of any individual hack or data leak. The process of federated learning facilitates such decentralization while also making it easier to collaborate across institutions without complicated data-sharing agreements.
However, even after models are trained, there remains the risk that AI systems will face privacy attacks, which can sometimes reconstruct original data points used in training just by examining the resulting model. Patient data can be better protected from such attacks if inputs are encrypted before training, but this approach comes at the cost of model interpretability.
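The core step of federated learning described above can be sketched in a few lines: each site refines the current global model on its own data, and only the model parameters, never the raw patient records, are shared and averaged. The sites, the linear model, and the data below are placeholders for illustration.

```python
# Minimal sketch of federated averaging across three hypothetical hospitals.
import numpy as np

def local_update(weights: np.ndarray, X: np.ndarray, y: np.ndarray, lr: float = 0.1) -> np.ndarray:
    """One step of local gradient descent for a linear regression model (illustrative)."""
    grad = X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

rng = np.random.default_rng(0)
sites = [(rng.standard_normal((50, 5)), rng.standard_normal(50)) for _ in range(3)]

global_weights = np.zeros(5)
for round_ in range(10):
    # Each site trains locally on its own private data.
    local_weights = [local_update(global_weights.copy(), X, y) for X, y in sites]
    # The server averages parameters; raw patient data never leave the sites.
    global_weights = np.mean(local_weights, axis=0)
print(global_weights)
```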
Conclusion
The field of medical AI has made considerable progress toward large-scale deployment, especially through prospective studies such as RCTs and through medical image analysis, yet medical AI remains in an early phase of validation and implementation. To date, a limited number of studies have used external validation, prospective evaluation, and diverse metrics to explore the full impact of AI in real clinical settings, and the range of assessed use cases has been relatively narrow. Although the field requires more testing and practical solutions, there is also a need for bold imagination.
AI has proven capable of extracting insights from unexpected sources and drawing connections that humans would not normally anticipate, so we hope to see even more creative, out-of-the-box approaches to medical AI. There are rich opportunities for novel AI research involving non-image data types and unconventional problem formulations, which open a broader array of possible datasets.
Opportunities also exist in AI–human collaboration, an alternative to the AI-versus-human competitions common in research; we would like to see collaborative setups receive more study, as they may provide better results than either AI or humans alone and are more likely to reflect real medical practice. Despite the potential of the field, major technical and ethical questions remain for medical AI. As these pivotal issues are systematically addressed, the potential of AI to markedly improve the future of medicine may be realized.
Summarised by Nashya Haider; Source: Nat Med. 2022 Jan;28(1)