Healthcare: Overwhelmed by the mass of big data, we are lacking data

Genomics, multi-omics, imaging data, behavioral patterns, longitudinal healthcare data, environmental factors, and their influence (potentially mediated through the gut microbiome) on patient outcomes when treating cancer with immune checkpoint inhibitors: how do we investigate and understand the crosstalk between different data classes and between different levels of data granularity?

In many cases we are drowning in a vast ocean of data classes – an ocean of unknown depth.

Because our data sources from randomized clinical trials (RCTs) are often comparatively small, we are practically sailing across this gigantic ocean on a tiny raft. If these conditions are translated into quantitative (statistical / biometric) terms, the question arises whether the necessary basis for the use of artificial intelligence (AI) even exists.

Mathematics can be very relentless, but a few days ago I discovered an interesting review article (doi: 10.1002/1878-0261.12920):

“Artificial Intelligence in Cancer Research: learning at different levels of data granularity”

The authors, Davide Cirillo, Iker Núñez-Carpintero and Alfonso Valencia from the Barcelona Supercomputing Center, claim: “This review introduces the current challenges, limitations and solutions of artificial intelligence in the heterogeneous landscape of data granularity in cancer research…”

The introduction of this intriguing paper gets off to a promising start because it is realistic: “Data granularity refers to the level of detail observable in the data. The finer the granularity, the more detailed are the observations. In cancer research, data granularity reflects the amount of molecular and clinical information that is collected about a patient or a group of patients, not only in terms of dataset size but also in terms of diversity of measurements, scales, and data types. At present, the available data in cancer research may not always provide the level of granularity required for effective decision-making. For instance, health care resources exhibit a shortage of information about specific cancer subtypes…”. A few lines later the authors hit the nail on the head:

“Despite the availability of cancer big data, a prominent feature of the current data landscape in oncology is the imbalance in the depth of data per patient versus the cohort size. Indeed, while thousands to millions of observables per patient are routinely generated, a typical cohort size of specific groups of patients is relatively small.”
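
To translate that imbalance into something tangible, here is a minimal sketch (my own illustration, not from the paper): with 50 simulated “patients” and 20,000 random “observables”, an off-the-shelf classifier separates the training data perfectly while cross-validation stays at coin-flip level.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy cohort: 50 patients, 20,000 observables of pure noise,
# plus random binary "responder" labels. All numbers are illustrative.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20_000))
y = rng.integers(0, 2, size=50)

clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

# In 20,000 dimensions, 50 random points are almost surely linearly
# separable: training accuracy hits 1.0 without any real signal.
print("training accuracy:", clf.score(X, y))

# Held-out performance exposes the illusion: roughly chance level.
print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```

That is the quantitative core of the problem: with so many observables per patient and so few patients, a model can always find a “perfect” but meaningless fit.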

I have summarized it for myself as follows: precision medicine, the advocates of multi-omics approaches, and the real-world data (RWD) apologists all face the same data-source issue; we are all in the same boat. And even when the data and connections are captured, they often remain hidden in unstructured health records and patient files. Furthermore, they are sometimes hidden along the time axis, for example in the change of certain parameters such as laboratory values or images within (digital) pathology.

"DEEP LEARNING ARCHITECTURE FOR ANALYZING UNSTRUCTURED DATA"

To pass the time, I entered the search phrase "deep learning analyzing unstructured data medical records" in a well-known search engine. What I found was a patent application with an almost identical title (USPTO 20210027894), filed several months earlier (July 2020) by a well-known American (East Coast-based) company. The patent application from inventors at Flatiron Health comes with extensive descriptions, such as:

“machine learning architectures have been developed for analysis of relatively short documents. These techniques, however, often do not translate well to longer documents, such as patient medical records. For example, the implementation of long short-term memory (LSTM) models and other forms of recurrent neural networks may not be feasible using traditional architectures, due to the volume of text in the unstructured documents…”

or

“FIG. 2 illustrates an exemplary medical record 200 for a patient. Medical record 200 may be received from data sources 120 and processed by system 130 to identify whether a patient is associated with particular attributes, as described above. The records received from data sources 120 (or elsewhere) may include both structured data 210 and unstructured data 220, as shown in FIG. 2. Structured data 210 may include quantifiable or classifiable data about the patient, such as gender, age, race, weight, vital signs, lab results, date of diagnosis, diagnosis type, disease staging (e.g., billing codes), therapy timing, procedures performed, visit date, practice type, insurance carrier and start date, medication orders, medication administrations, or any other measurable data about the patient. Unstructured data may include information about the patient that is not quantifiable or easily classified, such as physician's notes or the patient's lab reports. Unstructured data 220 may include information such as a physician's description of a treatment plan, notes describing what happened at a visit, statements or accounts from a patient, subjective evaluations or descriptions of a patient's well-being, radiology reports, pathology reports, etc. …”
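
Translated into a data structure, the distinction this excerpt draws might look like the following minimal sketch. The field names are my own reading of the passage, not the patent's actual schema.

```python
from dataclasses import dataclass, field

# Illustrative split between structured and unstructured record content,
# loosely mirroring the FIG. 2 description quoted above.
@dataclass
class StructuredData:
    gender: str
    age: int
    lab_results: dict[str, float]   # e.g. {"hemoglobin_g_dl": 11.2}
    diagnosis_type: str
    disease_stage: str              # e.g. a staging / billing code

@dataclass
class MedicalRecord:
    patient_id: str
    structured: StructuredData
    # Free text: physician's notes, radiology and pathology reports, ...
    unstructured: list[str] = field(default_factory=list)

record = MedicalRecord(
    patient_id="demo-001",
    structured=StructuredData(
        gender="F", age=62,
        lab_results={"hemoglobin_g_dl": 11.2},
        diagnosis_type="NSCLC", disease_stage="IIIA",
    ),
    unstructured=["Patient reports fatigue; plan: continue therapy ..."],
)
```

The structured half is trivially machine-readable; the unstructured half is where the clinically decisive detail tends to live, and where the deep learning architectures come in.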

I find innovations like this very encouraging. But while there is tremendous hype around big data, eHealth, digital health businesses, and AI, we are still struggling with the very problems that this patent application and others describe.
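
To make the long-document problem from the first excerpt concrete: a common generic workaround (not necessarily what Flatiron patented) is to split a record into overlapping chunks that fit a model's input limit and aggregate the per-chunk outputs. A minimal sketch, with classify_chunk as a hypothetical stand-in for whatever model is actually used:

```python
from typing import Callable

def chunk_text(text: str, max_tokens: int = 512, overlap: int = 64) -> list[str]:
    """Split a long document into overlapping, whitespace-tokenized chunks."""
    tokens = text.split()
    step = max_tokens - overlap
    return [" ".join(tokens[i:i + max_tokens])
            for i in range(0, max(len(tokens), 1), step)]

def document_score(text: str, classify_chunk: Callable[[str], float]) -> float:
    """Aggregate per-chunk probabilities; here simply by taking the maximum."""
    return max(classify_chunk(chunk) for chunk in chunk_text(text))

# classify_chunk is hypothetical: any model mapping text -> probability works.
score = document_score("word " * 5000,
                       classify_chunk=lambda c: min(len(c) / 4000, 1.0))
print(score)
```

Taking the maximum over chunks is only one aggregation choice; averaging or learned attention over chunks are equally common, and which one is appropriate depends on whether the attribute of interest can be evidenced by a single passage or only by the record as a whole.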

Personally, I suspect that unnecessarily unstructured health data lead to unnecessarily high non-responder rates for innovative therapeutic approaches.

Given the digital health hype, one would think these problems were antiquated and, above all, had long been resolved. But they persist to this day and will do so for a long time to come. No matter whether you are a patient, politician, entrepreneur, or investor, you should keep that in mind.

 
