ClinTech - Doing Better (2) - Data Integration with AI
Second in my series of posts on how we can improve technologies used in clinical trials. I cover quite a lot of ground here and it's a bit techie for a blog post, so I would suggest bookmarking this one and re-reading later.
I am going to suggest how effective integration can help resolve some of the problems we see in clinical research.
Technology in Clinical Trials
We have had clinical research technologies such as EDC, RTSM, eCOA, CTMS and eTMF for 20 years. They have met a relatively constant set of functional requirements over this time. The Clinical Trial Management System (CTMS) that I worked on at IBM back in the late 90s is not massively different from the systems you see today. What has changed is the power of the internet, which has evolved considerably.
What has not significantly improved is the ease of integration. As mentioned in Part (1), the lack of good integration between these different technologies is often the underlying cause of inefficiencies, quality problems and delays.
Fragile / Non validated Integrations
I would class many of the clintech application integrations used in clinical trials today as being in a questionable state of validation.
In my definition of 'validated', if the end-user can adversely affect the safe running of software, then it is not validated. There are 2 common situations for this:
Most clintech integrations rely on the synchronisation of meta information ('external keys') such as study id, site id, patient id or visit id. If the app on either side of an integration allows any of these external identifiers to be changed, then that breaks the integration.
Now, I can hear the 'get out of jail' argument against this - if the integration fails due to a key change, then an alert is raised and a manual intervention occurs. Downstream, this typically means that reports and dashboards are out of date until the integration is fixed.
To do this properly, the integrations need to be more intelligent, and the applications that are being integrated need to maintain internal IDs that do not change if the external id changes. This means that integrations are NOT broken if the user changes a key on the front end. This is not as easy, which, given the limited time available to implement an integration, is probably why we do not routinely do it.
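As a minimal sketch of that idea (not any vendor's implementation), the Python below keeps an immutable internal ID alongside an editable external key. The class name `SubjectRegistry`, the example subject numbers and the `ctms-record-42` reference are all hypothetical, made up purely for illustration.

```python
import uuid


class SubjectRegistry:
    """Hypothetical registry: internal IDs never change, external keys can."""

    def __init__(self):
        self._external_by_internal = {}
        self._internal_by_external = {}

    def register(self, external_id):
        # The internal ID is generated once and never shown for editing.
        internal_id = str(uuid.uuid4())
        self._external_by_internal[internal_id] = external_id
        self._internal_by_external[external_id] = internal_id
        return internal_id

    def rename(self, old_external_id, new_external_id):
        # A front-end key change only touches the external label.
        internal_id = self._internal_by_external.pop(old_external_id)
        self._external_by_internal[internal_id] = new_external_id
        self._internal_by_external[new_external_id] = internal_id

    def external_id(self, internal_id):
        return self._external_by_internal[internal_id]


registry = SubjectRegistry()
internal = registry.register("SITE01-0007")

# The integration links records by internal ID only (hypothetical link table).
edc_to_ctms_link = {internal: "ctms-record-42"}

# A user corrects the subject number on the front end.
registry.rename("SITE01-0007", "SITE01-0070")

# The link is untouched, and the new label resolves from the same internal ID.
linked_record = edc_to_ctms_link[internal]
current_label = registry.external_id(internal)
```

Because the link table never stores the external key, the rename does not need an alert or a manual repair; the integration simply looks up the current label when it needs to display one.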
Data Lag
A lot of the cost in clinical trials is measured in 'lag'. We have the time to complete an activity and the lag between each instance of each activity. Cost is measured by the time spent performing a task, and by the overhead of staying prepared during the lag, when no work is being done. Costs can also be measured by the inability to make timely decisions. If it takes, let's say, 6 weeks to gather information on monitoring visits (that is, to execute, record, clean and report), that is 6 weeks before a decision can be taken on the information contained. These delays accumulate.
Integration Lag
Clinical trials are slow because data is slow. Data is slow in part because integrations are poor or non-existent.
With no integration, you have the lag between data being entered in System A and being manually re-keyed into System B. In many cases, following this transposition, the data cannot be classed as 'clean' until it has been QC checked. The time to QC check amplifies the delay before data is considered trustworthy and usable.
Lag can fall back to the 'lowest common denominator'. The laggiest data determines the timeline. To some degree, a patient's data is not considered clean until all of that patient's data is clean. Classifying data importance (e.g. primary endpoint significant) as a consideration in the determination of 'clean' occurs in places, Biostats for example, but rarely in status roll-up and data cleaning workflows.
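To illustrate what an importance-aware roll-up could look like, here is a small Python sketch. The `DataItem` fields, the status labels and the sample items are my own assumptions for the example, not taken from any real data cleaning workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItem:
    name: str
    critical: bool  # assumed flag, e.g. primary endpoint significant
    clean: bool


def rollup_status(items):
    """Roll up item-level cleanliness into a patient-level status."""
    if all(item.clean for item in items):
        return "clean"
    # Not everything is clean, but the critical data may still be usable.
    if all(item.clean for item in items if item.critical):
        return "critical-clean"
    return "not-clean"


patient_items = [
    DataItem("primary endpoint assessment", critical=True, clean=True),
    DataItem("concomitant medication start date", critical=False, clean=False),
]
status = rollup_status(patient_items)
```

With an all-or-nothing rule this patient would simply be 'not clean'. The intermediate status lets the endpoint data be used while the less important query is still open.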
Reducing variability to reduce complexity
The greater the variance between systems, technologies and standards, the more difficult it is to have System A speak reliably to System B. The most obvious answer to that problem is to have System A and System B be part of the same product: single platform solutions. We see solutions like this from companies such as Clincase, where traditionally separate software modules such as EDC and IVRS are part of the same instance. These systems share the same study, site and patient records, so interfacing is less complex.
The second form of simplification is for software products to share the same platform, as with Veeva. The Veeva Vault Platform helps ensure a level of consistency across software products. If you are able to use Veeva Vault CTMS, you are likely to be able to pick up another Veeva Vault product with limited training. This also extends to integration, because the software for one Vault to Vault integration is similar to the next.
A hybrid solution that I have not yet seen is the application of platform solutions combined with the centralization of common (Master) data used by all connected applications.
We do not need multiple copies of a study, a site, a patient, a patient visit or even key data within a patient except where we need to manifest this data 'somewhere else'. Within a clinical trial platform the most efficient and reliable method is a means to be able to refer to the same master data.
The challenges of complex data mapping
Having previously worked on the designs of mapping EHR data from large scale primary care repositories to clinical trial systems, I have seen first-hand the variability and complexity of the mapping of data. This is especially the case when the EHR sources differ from site to site. To elaborate, here are 3 examples of where mapping can be nasty:
None of the above issues are insurmountable, but they are multiplied if each site operates with differing EHR or source record systems.
When it comes to mapping between 2 products used in a single clinical trial, we often revert to programming. Products such as Veeva Vault and Medidata Rave have well established application programming interfaces (APIs). However, as both products are 'configured' with metadata specific to each clinical trial, any interface code that you write needs to be advanced enough to read this configured metadata and use it as the basis for the mapping and rules that transfer data from System A to System B. That is far from easy. If you are a CRO thrown a set of disparate technologies with only weeks between the configured systems being ready and First Patient In, a validated integration is hard, if not impossible, to deliver.
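To show what 'reading the configured metadata' means in practice, here is a simplified Python sketch. It does not use the real Veeva Vault or Medidata Rave APIs. The metadata shape (a list of fields, each with a `name` and a shared `code`) and the functions `build_mapping` and `transfer` are assumptions invented for the example.

```python
def build_mapping(source_meta, target_meta):
    """Derive a field mapping from study metadata instead of hard-coding it."""
    target_by_code = {field["code"]: field["name"] for field in target_meta}
    mapping, unmapped = {}, []
    for field in source_meta:
        target_name = target_by_code.get(field["code"])
        if target_name is None:
            # No counterpart configured in the target for this study.
            unmapped.append(field["name"])
        else:
            mapping[field["name"]] = target_name
    return mapping, unmapped


def transfer(record, mapping):
    """Rename the mapped fields of one source record for the target system."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


# Hypothetical per-study configuration as exposed by each system.
system_a_meta = [
    {"name": "SUBJ", "code": "SUBJECT_ID"},
    {"name": "VISDT", "code": "VISIT_DATE"},
    {"name": "SITECMT", "code": "SITE_COMMENT"},
]
system_b_meta = [
    {"name": "subject_number", "code": "SUBJECT_ID"},
    {"name": "visit_date", "code": "VISIT_DATE"},
]

mapping, unmapped = build_mapping(system_a_meta, system_b_meta)
outgoing = transfer({"SUBJ": "0007", "VISDT": "2024-03-01", "SITECMT": "n/a"}, mapping)
```

The sketch assumes both systems expose a shared code for each field. In real studies that shared vocabulary is often missing or inconsistent, which is exactly where the difficulty lies, and the `unmapped` list is what would be handed to a person to resolve.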
The role of AI
A solution to this problem is for vendors to implement visual, high-end integration components within their products, ideally supported by retrieval-augmented generation (RAG) enhanced AI that helps smooth out the complexity of mapping that cannot be defined up front.
That is a big sentence. Let me break this down. Integrations should have a user interface. Without a user interface they remain with the techies. One complexity of a user interface is the representation and configuration of mapping - how data from one system maps to data in another system. RAG enhanced Artificial Intelligence can be used to automate the default mapping between systems - partly from convention, partly from loaded business knowledge.
Human involvement will, initially at least, continue to play a role where weak meta information on either side of an integration is insufficient for AI based mapping. The excellent work carried out by Andrew Mitchell and team at Yeza, integrating SAE PDFs with Safety case management systems, is a perfect example of this semi-automated, AI based data mapping.
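The split between automated default mapping and human review can be sketched as below. To be clear, this is not AI or RAG: plain string similarity from Python's standard `difflib` module stands in for the model so that the example stays runnable. The field names, the `propose_mappings` function and the 0.8 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Reduce a field name to lowercase letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def propose_mappings(source_fields, target_fields, threshold=0.8):
    """Auto-map confident matches and queue the rest for a person to review.

    Assumes target_fields is not empty.
    """
    auto, review = {}, []
    for source in source_fields:
        scored = [
            (SequenceMatcher(None, normalise(source), normalise(target)).ratio(), target)
            for target in target_fields
        ]
        score, best = max(scored)
        if score >= threshold:
            auto[source] = best
        else:
            # Weak match: keep the best guess as a suggestion only.
            review.append((source, best))
    return auto, review


auto, review = propose_mappings(
    ["Subject_ID", "visit-date", "AE term"],
    ["subjectId", "visitDate", "adverseEventDescription"],
)
```

Here the first two fields map by naming convention alone, while 'AE term' falls below the threshold and lands in the review queue with a suggestion attached. A RAG enhanced model loaded with business knowledge would aim to shrink that queue, not remove the reviewer.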
Conclusion
Poor or non-existent integration between both systems and processes is leading to hidden costs and delays. The proliferation of autonomous modular technologies is compounding this issue, leading to inefficiencies across the clinical trial lifecycle. Augmented AI has the potential to smooth out some of the problem areas.
Our ability to effectively manage change in clinical R&D is not as good as it should be. We tend to implement the 'new' without phasing out the 'old'. AI's success relies on it NOT becoming another one of these long-term additive layers that add further cost and complexity.
I will describe the impact of a lack of process integration in a following post.