AI Medical Research and the Imperative of Data Sharing
Last January, a paper published in the scientific journal Nature announced an exciting breakthrough in the application of AI to medicine: a team of researchers from Google Health, Google’s DeepMind, and several prestigious universities had developed a deep learning algorithm that was better at detecting breast cancer from mammograms than human radiologists. The paper deservedly won immediate attention from many publications, including The Guardian, The New York Times, and Wired. In today’s post, I want to briefly summarize this research and then discuss a controversy it has inspired over the issue of data sharing.
With our daily fare of news reports about the latest coronavirus test results, most of us are by now familiar with the two key statistics used to assess the accuracy of a medical diagnostic test. A false negative is a test result that fails to detect the disease being tested for even though the patient really has it. In the case of life-threatening conditions like COVID-19 or cancer, this is a grave error indeed, because it means the patient won’t receive treatment that could save their life. A false positive is a test result that mistakenly says the patient has the disease when in fact they don’t. This is also a serious error, because it can lead to unnecessary treatments that may carry serious health risks, inflict needless pain, and waste money.
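To make these two definitions concrete, here is a minimal Python sketch of my own (an illustration, not code from the study) that computes both error rates from a list of test results and each patient’s true disease status.

```python
# Minimal illustration of false negative and false positive rates.
# This is my own sketch, not code from the Google study.
def error_rates(predictions, truth):
    """predictions and truth are lists of booleans: True means 'has the disease'."""
    false_negatives = sum(1 for p, t in zip(predictions, truth) if t and not p)
    false_positives = sum(1 for p, t in zip(predictions, truth) if p and not t)
    positives = sum(truth)              # patients who really have the disease
    negatives = len(truth) - positives  # patients who really don't
    return {
        "false_negative_rate": false_negatives / positives,  # missed cases
        "false_positive_rate": false_positives / negatives,  # false alarms
    }

# Toy data: 10 patients, 4 of whom actually have the disease.
truth       = [True, True, True, True, False, False, False, False, False, False]
predictions = [True, True, False, True, False, True, False, False, False, False]
print(error_rates(predictions, truth))
# {'false_negative_rate': 0.25, 'false_positive_rate': 0.166...}
```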
With these terms in hand, we can understand what the Google breast cancer study achieved. The study used two large data sets, one from the UK and one from the US, that together contained mammograms from many tens of thousands of women, a few hundred of whom had clinically confirmed breast cancer. The researchers trained a deep learning algorithm to recognize the cancer cases based on features in the images, and they then compared the algorithm’s predictions with the assessments made by the radiologists who originally read the mammograms, using the clinically confirmed outcomes as the ground truth. For the US data, the AI produced an absolute reduction of 5.7% in false positives and 9.4% in false negatives. For the UK data, the reductions were 1.2% and 2.7%, respectively.
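For clarity, the figures above are absolute reductions: differences between error rates, measured in percentage points. The sketch below shows how such a number would be computed; the individual rates are made up for illustration, since I’m quoting only the differences reported in the paper.

```python
# Hypothetical illustration: these rates are invented for the example,
# not taken from the Nature paper.
def absolute_reduction(human_rate, ai_rate):
    """Difference between two error rates, expressed in percentage points."""
    return (human_rate - ai_rate) * 100

# If human readers had missed 25.0% of cancers and the AI had missed 15.6%,
# the absolute reduction in false negatives would be 9.4 percentage points.
print(round(absolute_reduction(0.250, 0.156), 1))  # 9.4
```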
The AI’s retrospective improvement over the human experts in the US case is particularly noteworthy, with 9.4% fewer cases of missed cancer diagnoses. The improvement over human performance on the UK data was also positive but smaller, perhaps because in the UK each mammogram is read by two radiologists. The Google team also asked a panel of six US board-certified radiologists to diagnose a test set of 500 images drawn from their data and found that their AI algorithm beat all six of the human experts.
While it does a better job than human radiologists, the Google algorithm, with a false negative rate of roughly 30%, is still far from perfect. And it remains an academic study that has not yet been translated into clinical practice. Nevertheless, there is little doubt that this paper represents a milestone in the still young but rapidly growing field of AI cancer diagnosis. But this past August, Nature published a sharp critique of the Google paper by an all-star team of cancer researchers, computer scientists, and statisticians from Stanford, Harvard, MIT, the University of Toronto, and other prestigious institutions. The title of their critique was “The importance of transparency and reproducibility in artificial intelligence research,” and their complaint concerned not the quality of the original paper but the failure of its authors to share their data, along with certain critical details of their computer code.
The critics argue that withholding this information makes it impossible for other researchers to replicate the original findings or build upon them. They write:
“Many egregious failures of science were due to lack of public access to code and data used in the discovery process… Making one’s methods reproducible may surface biases or shortcomings to authors before publication. Preventing external validation of a model will likely reduce its impact and could lead to unintended consequences. The failure of [the Google paper authors] McKinney et al. to share key materials and information transforms their work from a scientific publication open to verification into a promotion of a closed technology.”
While it might seem self-evident that sharing data used to develop AIs that can diagnose cancer is a good thing, it’s only fair to point out that there are two sides to this debate. The Google researchers offered several substantive reasons for their stance. Contrary to what some might expect, concern about patient privacy was not one of them. The authors readily acknowledge that the unnamed US medical school, which shared its data with them, “judged that the potential benefits of the research outweighed the minimal privacy risks associated with sharing de-identified data with a trusted party capable of and committed to safeguarding these data.”
Having addressed privacy, the Google team advanced two arguments against sharing their data and code that appear more relevant in this context. First, they argued that providing other researchers with a fully functional copy of their code and data would be tantamount to releasing “medical device software,” which would raise regulatory and liability concerns. Second, they admitted their economic interest in a certain degree of secrecy by noting that “the development of impactful medical technologies must remain a sustainable venture to promote a vibrant ecosystem that supports future innovation.” (Regrettably, and somewhat ironically given the topic at issue, neither Nature nor the Google authors have made their reply to their critics freely available, but the journal will sell copies to interested readers.)
I don’t want to dismiss these reservations about medical data sharing. As the Google authors say, the patient privacy issue can be resolved with proper anonymization and other safeguards, as was clearly anticipated by Europe’s GDPR data protection law. But no one doubts that the fruits of AI medical research must pass regulatory review before being unleashed on the world. And certainly, entrepreneurs who make big bets on AI research with the potential to improve millions of lives have a right to seek a return on their investment.
A viable international system for safely and legally sharing the data that medical AI algorithms depend on is essential. The potential benefit to humanity is too great to be ignored. The consulting firm McKinsey has predicted that AI-based medicine could extend average human life expectancy by 1.3 years and save from $2 trillion to $10 trillion in healthcare costs annually by better tailoring drugs and treatments to patients. We don’t want to let data hoarding jeopardize those outcomes.
It’s worth pointing out that Europe is currently leading the world in thinking about a regulatory framework to govern the sharing of healthcare data for purposes of both scientific research and practical application in the clinic. Speaking at a conference in February about Europe’s new Beating Cancer Plan, European Commission President Ursula von der Leyen said:
“Within the data we have lies an incredible amount of missed opportunities, unknown improvements, potential correction of false hypothesis—but we must use and share these data. We are now setting up a Common Health Data Space, an infrastructure where scientists and medical clinicians will be able not only to store clinical and research data, but also to access other scientists' data.”
I’ve written before about the EU’s plans for shared data spaces in several key industries, of which healthcare is only one. There is no doubt that such data spaces will live in the cloud and will require a well-conceived framework of regulations and standards for open data sharing. Given the rapid progress of medical AI, and in particular of data-hungry, AI-based cancer diagnostics, the idea of a Common Health Data Space is especially important. The EC has promised to publish its strategy for this data space next year. It’s a development that bears watching, and I promise to revisit this topic in due course.