#PrecisionMedicine or "garbage in - garbage out"?
Thomas Wilckens (托馬斯)
MD, #PrecisionMedicine (精密医学) thought & technology leader, keynote speaker, industry advisor. 30K+ followers. #Biotech #Diagnostics #DrugDiscovery #Innovation #StartUps #ArtificialIntelligence #Investing
Following the development of data science and the hype around Big DATA's potential to solve major problems, particularly in medicine, I wonder whether data scientists will come to be regarded as modern alchemists who can convert mercury into gold. Are scientists fooling themselves?
The following post summarizes some background for a roundtable discussion at the Precision Medicine Leaders Summit, San Diego, August 2016.
Is most published science crap? To get started, read the following:
- The Economics of Reproducibility in Preclinical Research: an estimated US$28,000,000,000 (US$28B) per year is spent on preclinical research that is not reproducible.
- The reproducibility of biomedical research: Sleepers awake!
- How scientists fool themselves – and how they can stop.
- Securing reliability and validity in biomedical research: an essential task.
"If natural cortisol had been included as a logical control in glucocorticoid research, life-saving drugs like eplerenone would have been found 40 years ago..." Yet many still pursue elusive R&D programs built on an ill-defined and incomplete knowledge of the plethora of actions of natural cortisol (see Activation of the Glucocorticoid Receptor in Acute Inflammation: the SEDIGRAM Concept). We still lack a comprehensive view of the actions orchestrated by natural cortisol, so targeting speculative actions at the molecular level is most likely misleading, not least because cortisol affects more than 80% of our genes in a highly regulated manner.
Do not get me wrong: this is a statement in favor of data scientists as enablers of PRECISION MEDICINE! However, it seems to me that we expect data scientists to transmute base metals into gold or platinum, or even worse, garbage into value and insight. We seem to expect miracles and the discovery of the Holy Grail. How do I come to this judgment? To corroborate this contention, I would first like to ask some questions:
- If most published research findings are false, which data should be used for retrospective analysis and interpretation? What can Watson learn from public domain data?
- Furthermore, John P. A. Ioannidis asks Why Most Clinical Research Is Not Useful; what, then, can serve as the basis for knowledge mining using AI?
- Since there are no standards to make data comparable, particularly for new omics-derived data, how should such data be managed so that they become comparable?
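To make the comparability problem behind that last question concrete, here is a toy sketch (my illustration, not from any study): the same biological signal measured by two hypothetical platforms on different scales only lines up after per-platform standardization, here a simple z-score.

```python
# Toy illustration of why unstandardized data are incomparable: two
# hypothetical measurement platforms report the same relative expression
# pattern on different scales; only after per-platform standardization
# (a z-score) do the values become directly comparable.
from statistics import mean, stdev

def zscore(values):
    """Standardize a list of measurements to mean 0 and unit variance."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Same underlying pattern, different platform scales and offsets.
platform_a = [10.0, 12.0, 14.0, 16.0]
platform_b = [200.0, 240.0, 280.0, 320.0]

za, zb = zscore(platform_a), zscore(platform_b)
# After standardization the per-sample differences vanish.
print([round(abs(a - b), 9) for a, b in zip(za, zb)])
```

Real omics pipelines need far more than a z-score (batch-effect correction, platform calibration, reference materials), but the point stands: without an agreed transformation, raw values from different sources simply cannot be pooled.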
In a recent editorial, the National Biomarker Development Alliance (NBDA) calls biomarker R&D since 1990 a systematic failure.
“While acknowledging that most flawed biomarker studies may involve multiple, cascading errors, the penalty for mistakes in early discovery, especially bias in sample collection or lack of standards for collection and storage is manifest in late stage failure.”
- What data can be used to evaluate new R&D projects in drug discovery?
The NBDA further comments: "Most biomarker discovery still takes place in government-funded university laboratories that are ill equipped to undertake the myriad procedures required for stringent biomarker profiling on the scale required to achieve the evidence needed to attract larger investment in clinical trials."
- Who else, if not academia, should engage in harvesting reliable, valid and consistently reproducible data?
- In more general terms: if science is not self-correcting and dogmas are continually perpetuated, how will we know which data to choose and which paradigm or hypothesis to apply?
A logical consequence would be to start from scratch. Building on the concept developed by the NBDA and a recent review on data integration, I would like to add a few suggestions:
- Develop and apply new (your own) industry standards (standard operating procedures, SOPs) covering all steps from sample collection, processing and biobanking, through omics analysis, to data analysis, management and storage, including solutions for data safety & security. SOPs will become a key value driver in the creation of actionable information beyond biopharmaceutical R&D.
- Generate new, unbiased, uncontaminated datasets for distinct syndromes to enable a new molecular definition of disease without relying on previous disease classifications; compare the trend in cancer diagnostics away from tissue-based toward genomics-guided classification.
- Think big, start lean, plan to scale a priori, move fast; use and adopt what is already there and works reliably and validly, perhaps in other industries. Adoption may require converging domain competencies.
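As a minimal sketch of the first suggestion, an SOP can act as a gate on incoming samples: records that do not meet predefined collection and storage standards are rejected before analysis rather than silently contaminating the dataset. Everything below (the field names, the -70 °C storage threshold, the 60-minute time-to-freeze limit) is a hypothetical illustration, not a real standard.

```python
# Hypothetical SOP gate for incoming sample records. Field names and
# thresholds are illustrative assumptions, not an actual SOP.
REQUIRED_FIELDS = {"sample_id", "collection_protocol", "storage_temp_c",
                   "time_to_freeze_min", "assay_platform"}

def passes_sop(record: dict) -> tuple[bool, list[str]]:
    """Return (accepted, reasons) for a single sample record."""
    reasons = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        reasons.append(f"missing metadata: {sorted(missing)}")
    # Illustrative thresholds: samples must be stored at -70 °C or colder,
    # and frozen within 60 minutes of collection.
    if record.get("storage_temp_c", 0) > -70:
        reasons.append("storage temperature above -70 °C")
    if record.get("time_to_freeze_min", 10**9) > 60:
        reasons.append("time to freeze exceeds 60 min")
    return (not reasons, reasons)

samples = [
    {"sample_id": "S1", "collection_protocol": "SOP-7", "storage_temp_c": -80,
     "time_to_freeze_min": 35, "assay_platform": "RNA-seq"},
    {"sample_id": "S2", "collection_protocol": "SOP-7", "storage_temp_c": -20,
     "time_to_freeze_min": 240, "assay_platform": "RNA-seq"},
]

accepted = [s["sample_id"] for s in samples if passes_sop(s)[0]]
rejected = {s["sample_id"]: passes_sop(s)[1]
            for s in samples if not passes_sop(s)[0]}
print(accepted)   # only the compliant sample enters the analysis set
print(rejected)   # the non-compliant one is rejected with explicit reasons
```

The point is not the code but the discipline: every rejection reason is recorded, so the provenance of the final dataset is auditable end to end.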
Obviously this approach is bold and a considerable technical challenge, not even considering costs, but it is doable if rigorously planned and if all essential intersections are vertically integrated in a flagship approach; i.e., identify solutions and related business cases while addressing unprecedented frictions in lean, vertically integrated approaches, with a predefined strategy to build for scale.
How many companies apply such an approach to create value and content from proprietary raw data? Some companies curate data manually, which is obviously one of the best ways to secure value but not applicable to true Big DATA solutions; others rely on trends, i.e., confirmation of research results by repetition. That may work for some raw datasets, but not necessarily for the derived contextual interpretation, which may simply reflect mainstream thinking rather than being correct; compare the cortisol story above.
We see a plethora of companies engaging in data integration, content generation, ontologies, and data analysis and interpretation (often from published or open-source resources, with the obvious risk of integrating irrelevant or false data and thus delivering questionable value), using methods like machine learning to tackle the data deluge. Whatever data and sources are used, there is a risk that they will be analyzed anyway, at a steep price to customers, even though the raw data were not generated under standardized SOPs and conditions that make them comparable: an inherent risk of successively building invalid foundations for future R&D and the related investments. The question therefore arises whether such approaches can, a priori, succeed in creating value.
Looking closely at current trends in large companies as well as start-ups it seems some business models and data strategies may not realize the dimensions of the raw data problem. In fact, I am concerned that too few are aware that we currently risk contaminating new, high profile research data with “garbage” from the past.
This problem seems even more pressing and complex for new players like Calico, Human Longevity or NantHealth, to name a few that claim to tackle the holy grail of aging, for others that simply wish to deliver truly innovative Precision Medicine-based diagnostics or drugs, and for established players in biotech/pharma alike. Somehow it seems inevitable to rely on data generated mainly by academia, particularly in early-stage translational research and clinical trials. What different approaches should these players pursue to secure optimal ROI? Molecular Health and Foundation Medicine, to name two successful start-ups in PRECISION MEDICINE, have built scalable business models on rigid internal SOPs and standardized data processing; others have standardized biobanking. It must be emphasized, though, that sample processing in cancer analytics is comparatively straightforward next to multi-omics approaches and other emerging sensors of an individual's health status.
In more general terms, I wonder whether the basic and complex problem of sample collection, processing and subsequent data integration is actually recognized and addressed appropriately, particularly by executives who allocate investments, but also by those who generate and work with the data and the related analyses. The value of new SOPs that enable multi-omics analysis to become part of a clinical analytics ecosystem seems highly underappreciated. Value creation starts with "simple" procedures like blood/sample/tissue collection...
The challenge is open to innovators and first movers, start-ups and global leaders alike, while those who pay the price must push for value to be generated from the US$250 billion invested annually in biomedical R&D.
Currently chances are high that Big DATA will continue to fail medicine:
The following statement by Cleveland Clinic's Douglas Johnston, from a recent panel at MIT, applies not only to published data and health records but to the way we continue to harvest data in general.
“In healthcare in general we’ve been applying data science poorly. We have a medical literature that is contradictory, and we are relying on 100 year old transcription technology for our records. We still have to dig through those records to get the data. I see the results are failing because it's garbage in and garbage out.”
I do not see why data scientists would want to be branded modern alchemists, taking the blame for failure and waste, when the problem is rooted much deeper, though it is solvable! Big DATA has the potential to change medicine and eventually deliver PRECISION MEDICINE, but this will require creative initiatives, flagship projects, a concerted effort by all players engaged in data harvesting & analytics, and significant investments...
Further reading:
- Precision Medicine: Disrupting (pre-)clinical development for chronic debilitating diseases
- Fixing a broken R&D model in Precision Medicine
- PRECISION MEDICINE
- SYMBIOTIC INNOVATION: A paradigm shift in R&D
- Reinventing Biomedical/-pharmaceutical R&D
- The US$128 billion (US only) Arthritis Conundrum
- Will Google disrupt Medicine, Health Care
Disclosure:
InnVentis is a multi-omics/machine-learning company developed alongside the "Symbiotic Innovation" paradigm with the support of deep innovation GmbH, which is gratefully acknowledged. InnVentis's goal is to build a vertically integrated B2B and B2C solution for precise diagnostics, therapeutic decision making and real-time disease monitoring (supported by artificial intelligence) for major chronic inflammatory diseases; i.e., to enable Precision Medicine.
For further discussions please join the LinkedIn group:
CSO at GeneCentric Therapeutics, Inc. · 6y
Excellent post. I plead with many entities I work with to stop rummaging through the dumpster of junk studies (or hoping that a team of data scientists will sort it all out) and just roll up their sleeves and do it right themselves. Most don't like this suggestion and turn back to the dumpster, hoping to find an old master's painting at the bottom. Pity.