Healthcare: Overwhelmed by the mass of big data, we are lacking data

Genomics, multi-omics, imaging data, behavioral patterns, longitudinal healthcare data, environmental factors, and their influence (potentially mediated through the gut microbiome) on patient outcomes when treating cancer with immune checkpoint inhibitors: how do we investigate and understand the crosstalk between different data classes and between different levels of data granularity?

In many cases we are drowning in a vast ocean of data classes – an ocean of unknown depth.

Because our data sources from randomized clinical trials (RCTs) are often comparatively small, we are practically sailing across this gigantic ocean on a tiny raft. If these conditions are translated into quantitative (statistical / biometric) terms, the question arises whether the necessary basis for the use of artificial intelligence (AI) even exists.

Mathematics can be very relentless, but a few days ago I discovered an interesting review article (doi: 10.1002/1878-0261.12920):

“Artificial Intelligence in Cancer Research: learning at different levels of data granularity”

The authors, Davide Cirillo, Iker Núñez-Carpintero and Alfonso Valencia from the Barcelona Supercomputing Center, claim: “This review introduces the current challenges, limitations and solutions of artificial intelligence in the heterogeneous landscape of data granularity in cancer research…”

The introduction of this intriguing paper gets off to a promising start because it is realistic: “Data granularity refers to the level of detail observable in the data. The finer the granularity, the more detailed are the observations. In cancer research, data granularity reflects the amount of molecular and clinical information that is collected about a patient or a group of patients, not only in terms of dataset size but also in terms of diversity of measurements, scales, and data types. At present, the available data in cancer research may not always provide the level of granularity required for effective decision-making. For instance, health care resources exhibit a shortage of information about specific cancer subtypes…”. A few lines later the authors hit the nail on the head:

“Despite the availability of cancer big data, a prominent feature of the current data landscape in oncology is the imbalance in the depth of data per patient versus the cohort size. Indeed, while thousands to millions of observables per patient are routinely generated, a typical cohort size of specific groups of patients is relatively small.”
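
To translate that imbalance into something tangible, here is a minimal sketch (my own illustration, not from the paper): with 50 simulated “patients” and 20,000 random “observables”, an off-the-shelf classifier separates the training data perfectly while cross-validation stays at coin-flip level.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy cohort: 50 patients, 20,000 observables of pure noise,
# plus random binary "responder" labels. All numbers are illustrative.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20_000))
y = rng.integers(0, 2, size=50)

clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

# In 20,000 dimensions, 50 random points are almost surely linearly
# separable: training accuracy hits 1.0 without any real signal.
print("training accuracy:", clf.score(X, y))

# Held-out performance exposes the illusion: roughly chance level.
print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```

That is the quantitative core of the problem: with so many observables per patient and so few patients, a model can always find a “perfect” but meaningless fit.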

I have summarized it for myself as follows: precision medicine, the advocates of multi-omics approaches, and the real-world data (RWD) apologists all face the same data-source issue; we are all in the same boat. And even when the data and connections are captured, they often remain hidden in unstructured health records and patient files. Furthermore, they are sometimes hidden along the time axis, for example in the change of certain parameters such as laboratory values or images within (digital) pathology.

"DEEP LEARNING ARCHITECTURE FOR ANALYZING UNSTRUCTURED DATA"

To pass the time, I entered the search phrase "deep learning analyzing unstructured data medical records" in a well-known search engine. What I found was a patent application with an almost identical title (USPTO 20210027894), filed several months earlier (July 2020) by a well-known American (East Coast-based) company. The patent application from inventors at Flatiron Health comes with extensive descriptions, such as:

“machine learning architectures have been developed for analysis of relatively short documents. These techniques, however, often do not translate well to longer documents, such as patient medical records. For example, the implementation of long short-term memory (LSTM) models and other forms of recurrent neural networks may not be feasible using traditional architectures, due to the volume of text in the unstructured documents…”

or

“FIG. 2 illustrates an exemplary medical record 200 for a patient. Medical record 200 may be received from data sources 120 and processed by system 130 to identify whether a patient is associated with particular attributes, as described above. The records received from data sources 120 (or elsewhere) may include both structured data 210 and unstructured data 220, as shown in FIG. 2. Structured data 210 may include quantifiable or classifiable data about the patient, such as gender, age, race, weight, vital signs, lab results, date of diagnosis, diagnosis type, disease staging (e.g., billing codes), therapy timing, procedures performed, visit date, practice type, insurance carrier and start date, medication orders, medication administrations, or any other measurable data about the patient. Unstructured data may include information about the patient that is not quantifiable or easily classified, such as physician's notes or the patient's lab reports. Unstructured data 220 may include information such as a physician's description of a treatment plan, notes describing what happened at a visit, statements or accounts from a patient, subjective evaluations or descriptions of a patient's well-being, radiology reports, pathology reports, etc. …”
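
Translated into a data structure, the distinction this excerpt draws might look like the following minimal sketch. The field names are my own reading of the passage, not the patent's actual schema.

```python
from dataclasses import dataclass, field

# Illustrative split between structured and unstructured record content,
# loosely mirroring the FIG. 2 description quoted above.
@dataclass
class StructuredData:
    gender: str
    age: int
    lab_results: dict[str, float]   # e.g. {"hemoglobin_g_dl": 11.2}
    diagnosis_type: str
    disease_stage: str              # e.g. a staging / billing code

@dataclass
class MedicalRecord:
    patient_id: str
    structured: StructuredData
    # Free text: physician's notes, radiology and pathology reports, ...
    unstructured: list[str] = field(default_factory=list)

record = MedicalRecord(
    patient_id="demo-001",
    structured=StructuredData(
        gender="F", age=62,
        lab_results={"hemoglobin_g_dl": 11.2},
        diagnosis_type="NSCLC", disease_stage="IIIA",
    ),
    unstructured=["Patient reports fatigue; plan: continue therapy ..."],
)
```

The structured half is trivially machine-readable; the unstructured half is where the clinically decisive detail tends to live, and where the deep learning architectures come in.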

I find innovations like this very encouraging. But while there is tremendous hype around big data, eHealth, digital health businesses, and AI, we are still struggling with the very problems that this patent application and others describe.
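
To make the long-document problem from the first excerpt concrete: a common generic workaround (not necessarily what Flatiron patented) is to split a record into overlapping chunks that fit a model's input limit and aggregate the per-chunk outputs. A minimal sketch, with classify_chunk as a hypothetical stand-in for whatever model is actually used:

```python
from typing import Callable

def chunk_text(text: str, max_tokens: int = 512, overlap: int = 64) -> list[str]:
    """Split a long document into overlapping, whitespace-tokenized chunks."""
    tokens = text.split()
    step = max_tokens - overlap
    return [" ".join(tokens[i:i + max_tokens])
            for i in range(0, max(len(tokens), 1), step)]

def document_score(text: str, classify_chunk: Callable[[str], float]) -> float:
    """Aggregate per-chunk probabilities; here simply by taking the maximum."""
    return max(classify_chunk(chunk) for chunk in chunk_text(text))

# classify_chunk is hypothetical: any model mapping text -> probability works.
score = document_score("word " * 5000,
                       classify_chunk=lambda c: min(len(c) / 4000, 1.0))
print(score)
```

Taking the maximum over chunks is only one aggregation choice; averaging or learned attention over chunks are equally common, and which one is appropriate depends on whether the attribute of interest can be evidenced by a single passage or only by the record as a whole.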

Personally, I suspect that unnecessarily unstructured health data lead to unnecessarily high non-responder rates for innovative therapeutic approaches.

Given the digital health hype, one would think these problems were antiquated and, above all, had long been resolved. But they persist to this day and will do so for a long time to come. No matter whether you are a patient, politician, entrepreneur, or investor, you should keep that in mind.

 
