Next Generation Clinical Data Platform
https://ckg.readthedocs.io/en/latest/INTRO.html


The pandemic disrupted clinical research perhaps more than any other domain. The drug pipeline was abruptly halted across phases and trials were put on hold, creating a business continuity crisis for life sciences enterprises. Programs and capabilities that were generally categorized as ‘good to have’, such as virtual trials, decentralized trials, and telehealth, became ‘must haves’ for enterprises.

Efficient clinical trial data analysis is essential to prove the efficacy and safety of new investigational products and therapies. For clinical data management practices to be efficient, it is paramount that platforms keep pace with the evolving landscape of study protocols, increasing regulatory constraints, and globally distributed data management teams.

In the current context, commercially available clinical data management platforms offer a containerized model with variable extensibility and scalability, and minimal or no artificial intelligence or machine learning (AI/ML) capabilities. This blog post highlights the salient features of knowledge graph-based clinical data platforms and compares them with the data lake and cluster approaches.

Multiple technologies and standards are used to collect clinical trial data 

For various business requirements, including monitoring and reporting, clinical data must be integrated, consolidated, transformed, and managed. Processing voluminous data with inconsistent formats against changing business requirements brings significant challenges. Data collected outside electronic data capture (EDC) systems, categorized as ‘laboratory’, ‘biomarker’, ‘imaging’, and ‘patient-reported’, accounts for most of the information collected during a clinical trial.

Typical clinical data platforms are designed around traditional ways of working: the EDC platform collects data as per the electronic case report form (eCRF); the clinical trial management system (CTMS) defines visits, tracks progress, and more; the eCRF designer helps with questionnaire configuration and edit checks; and the clinical data warehouse enables holistic clinical trial data analysis, among many other examples.

The current implementation of clinical data platforms offers manual or semi-automatic data management and transformation processes. The ability to validate clinical trial data integrity is limited to anticipated edit checks and significant manual data review. Furthermore, the current processes do not intrinsically improve or self-learn as data is processed: manual and tedious data management, multiple data copies, raw/review models, and inadequate metadata management all extend cycle times.
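To make the notion of an anticipated edit check concrete, here is a minimal sketch of a pre-programmed range check of the kind such platforms run. The field names, units, and ranges are invented for illustration, not taken from any real eCRF or standard.

```python
# Hypothetical anticipated edit checks: each field has a pre-defined
# expected range, and values outside it raise a data query.
EDIT_CHECKS = {
    "ALT": (0, 55),     # alanine aminotransferase, U/L (assumed range)
    "SBP": (70, 200),   # systolic blood pressure, mmHg (assumed range)
}

def run_edit_checks(record):
    """Return query messages for any out-of-range values in a record."""
    queries = []
    for field, (low, high) in EDIT_CHECKS.items():
        value = record.get(field)
        if value is not None and not (low <= value <= high):
            queries.append(f"{field}={value} outside expected range {low}-{high}")
    return queries

print(run_edit_checks({"ALT": 120, "SBP": 118}))
```

The limitation the paragraph above describes is visible here: only deviations that were anticipated at design time are caught, and anything outside these rules still requires manual review.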

With the surge in wearables and mobile apps, along with 5G connectivity, the amount of patient data being generated is rising multifold. It becomes even more challenging for enterprises to manage clinical data from such an unprecedented number of data sets and diverse sources: electronic case report forms (eCRF), wearables, apps, devices, and more.

A question to ponder: does the current technology platform support the transition to clinical data science?

Traditional data management techniques rely on manual effort coupled with some programmable routines offered within the platform, generally at the level of a domain. A library maintained at the therapeutic area, domain, or questionnaire level is used to configure a protocol.

Clinicians, statisticians, and medical monitors, along with safety and other business groups, expect a ‘clean’ version of clinical data to be available as soon as it is collected. With the surge in data from various sources (more than 70% is generated outside of EDC), delivering a ‘clean’ version of the data at the earliest becomes an uphill task, and sometimes these demands are simply not feasible. They may become achievable if the platform supports the deployment of newer technologies such as AI/ML and natural language processing (NLP) across the business processes that manage large data sets.

Can next-generation technologies be deployed on the current implementation of clinical data platforms?

The answer to this question is most likely yes if all data is consolidated into a data lake. However, a data lake has its own challenges: it struggles with structured query language (SQL) concurrency, and in the rush to ingest incoming data, that data is often stored without domain context and precise metadata, which hinders meaningful retrieval.

The other approach, Spark-based clusters, entails huge data engineering effort, leading to higher cost and complexity and a lack of lineage, and the results are often not repeatable.

To meet the desired clinical data management (CDM) paradigm, a new avatar of the clinical data platform needs to be envisioned: one that ingests a variety of data from several sources and provides the business with a longitudinal view of patient data for data management, safety reviews, and other activities on a real-time basis with minimal manual intervention. This new avatar would also need the ability to leverage technological advancements such as AI/ML/NLP algorithms.

Introducing the knowledge graph 

Relationships and connectivity are the most important characteristics of today’s data and entities, whether in power grids, retail, and supply chains, or in patient care data in electronic medical records (EMRs) and patient data in EDC. As ecosystems become increasingly interconnected and complex, technologies that leverage relationships and their characteristics become significant. The ability to define and store relationships as part of the data itself is the single most differentiating factor of a graph database, alongside scalability, performance, and the ability to accommodate change.

Graph database technologies provide flexible ways to extend study data models to new definitions, preserving the contextual meaning of the original data.

This helps uncover relationships among clinical and broader R&D data and builds the ability to analyze data that one did not know needed to be analyzed. A few examples:

1. What is the relationship between a concomitant medication and an adverse event (AE)?

2. In studies where subjects were treated with both compound 1 and compound 2, what other adverse events have been reported for patients with elevated liver values who received compound 1?

3. Which of the male or female patients who were given a dose of drug A have had high blood pressure measurements during episodes of severe headache?
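Questions like example 3 above reduce to traversals over a property graph. As a minimal sketch with entirely hypothetical data (the node identifiers, labels, relation names, and the 140 mmHg threshold are all invented for illustration), an in-memory version of that traversal might look like:

```python
# Minimal property-graph sketch: nodes carry labels/properties,
# edges are (source, relation, target) triples.
nodes = {
    "p1": {"label": "Patient", "sex": "F"},
    "p2": {"label": "Patient", "sex": "M"},
    "drugA": {"label": "Drug", "name": "Drug A"},
    "obs1": {"label": "Observation", "sbp": 165, "during": "severe headache"},
    "obs2": {"label": "Observation", "sbp": 120, "during": "severe headache"},
}
edges = [
    ("p1", "RECEIVED", "drugA"),
    ("p2", "RECEIVED", "drugA"),
    ("p1", "HAS_OBSERVATION", "obs1"),
    ("p2", "HAS_OBSERVATION", "obs2"),
]

def neighbors(node, relation):
    """Targets reachable from `node` via edges labeled `relation`."""
    return [t for s, r, t in edges if s == node and r == relation]

# Example 3: patients who received Drug A and had high systolic blood
# pressure during an episode of severe headache.
matches = [
    p for p, props in nodes.items()
    if props["label"] == "Patient"
    and "drugA" in neighbors(p, "RECEIVED")
    and any(nodes[o]["sbp"] > 140 and nodes[o]["during"] == "severe headache"
            for o in neighbors(p, "HAS_OBSERVATION"))
]
print(matches)
```

In a real graph database the same question would be a short declarative query; the point of the sketch is that the relationships themselves (RECEIVED, HAS_OBSERVATION) carry the information the analysis needs.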

A next-generation graph-powered clinical data management platform enables rapid analysis of clinical trial patient data, both within individual trials and across multiple trials for meta-analyses. A Clinical Data Interchange Standards Consortium (CDISC)-based clinical data ontology enables quick integration of patient data from EDC into a submission-ready format. The ontology also seamlessly enables pooled analyses and the combination of real-world data, which helps guide future clinical trial designs and adaptations.
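As a rough illustration of the EDC-to-submission mapping idea, the sketch below renames raw eCRF fields to CDISC SDTM adverse-event variables. USUBJID, AETERM, AESEV, and AESTDTC are genuine SDTM AE-domain variable names; the raw field names on the left, and the mapping itself, are hypothetical.

```python
# Hypothetical mapping from raw EDC field names (invented) to CDISC
# SDTM AE-domain variable names (standard).
EDC_TO_SDTM = {
    "subject_id": "USUBJID",      # unique subject identifier
    "ae_term": "AETERM",          # reported adverse event term
    "ae_severity": "AESEV",       # severity/intensity
    "ae_start_date": "AESTDTC",   # start date/time (ISO 8601)
}

def to_sdtm(raw_record):
    """Rename raw EDC fields to SDTM variables, dropping unmapped fields."""
    return {sdtm: raw_record[raw]
            for raw, sdtm in EDC_TO_SDTM.items() if raw in raw_record}

row = to_sdtm({"subject_id": "STUDY1-001", "ae_term": "Headache",
               "ae_severity": "MILD", "ae_start_date": "2023-04-01"})
print(row)
```

An ontology-driven platform would hold mappings like this as metadata in the graph itself rather than in code, which is what allows new sources to be integrated without re-engineering the pipeline.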

The road ahead for clinical trial data management

Efficient clinical trial data management is crucial to the continued success of biopharmaceutical enterprises. The complexity of, and rapid changes to, the clinical data management landscape require new tools to keep pace with business initiatives. Graph technology provides a strong foundation for next-generation clinical data platforms, with the ability to develop and deploy AI/ML solutions that leverage the inherent characteristics of graphs: metadata stored together with data, lower sensitivity to structural changes, and scalable, robust performance.

More articles by Ajay Tomar
