The Power of Data Fusion: Linking Clinical, Claims, and Study Data for More (and Better) Research
Bryan Farrow
Senior Director of Product Marketing leading a cross-functional marketing team at Certara
Smash together enough hydrogen atoms, and you can light up eight planets. That’s the simple version of solar fusion. Dive below the surface, and you’ll find gamma rays, neutrinos, and a bit of quantum tunneling. Suffice it to say, the process is complex, but for anyone looking for the best experience here on earth, so very worthwhile.
The same could be said for data fusion in clinical research. EHR, claims, study datasets, registries – joining these pieces together could unleash knowledge with life-giving power. Like the stellar variety, fusion in our field takes a lot of kinetic energy (read: will and ingenuity, with tokens in the place of protons). But I’d like to show, with the examples that follow, just how much heat and light we stand to generate.
Linking EHR and claims data
We know that lovastatin reduces the risk of cardiovascular disease in patients with high cholesterol. Hundreds of statin trials, not to mention 30-plus years of clinical practice, have confirmed the benefit and safety of these drugs. But could either source of insight give us precise models of how, for example, treatment duration, total cholesterol, and risk of MI all relate?
Not likely. EHR data doesn’t reflect activity outside of the care setting, such as prescription fills. For a patient treated at more than one hospital, any one EHR record will be incomplete. Claims data, on the other hand, tells a continuous clinical story. Sourced at the level of the payer, claims data represents care across providers and pharmacies. The story is rich in plot, covering nearly all diagnoses, procedures, and prescriptions. But it’s short on detail. Lab values, for example, aren’t reported in claims.
Datasets assembled from both sources allow researchers to define cohorts characterized by clinical factors and treatment as it occurred in and outside of the hospital. Best of all, the data-rich picture that emerges has virtually no missed events—the scourge of large, outcome-based studies—allowing researchers to place more confidence in their results.
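For readers who like to see the mechanics, here is a minimal sketch of that kind of linkage in Python. Everything in it is an assumption made for illustration: the patient rows are invented, the field names are hypothetical, and `make_token` is a simple salted hash standing in for whatever privacy-preserving tokenization a real linkage effort would use. It is not a description of any particular vendor's method.

```python
import hashlib
from collections import defaultdict


def make_token(first, last, dob, salt="demo-salt"):
    """Hypothetical de-identified token: a salted SHA-256 hash of
    normalized identifiers. Real tokenization schemes are more involved."""
    key = f"{first.strip().lower()}|{last.strip().lower()}|{dob}|{salt}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


# Invented EHR rows: lab values, but no pharmacy fills.
ehr_rows = [
    {"first": "Ana", "last": "Diaz", "dob": "1960-02-11", "total_chol_mg_dl": 262},
    {"first": "Lee", "last": "Park", "dob": "1955-07-30", "total_chol_mg_dl": 241},
]

# Invented claims rows: fills across pharmacies, but no lab values.
claims_rows = [
    {"first": "ana ", "last": "DIAZ", "dob": "1960-02-11", "drug": "statin", "days_supply": 30},
    {"first": "Ana", "last": "Diaz", "dob": "1960-02-11", "drug": "statin", "days_supply": 90},
    {"first": "Sam", "last": "Roy", "dob": "1971-01-05", "drug": "statin", "days_supply": 30},
]


def link(ehr_rows, claims_rows):
    """Attach total days of statin supply (from claims) to each EHR
    patient, matching on the token rather than on names."""
    days_by_token = defaultdict(int)
    for row in claims_rows:
        token = make_token(row["first"], row["last"], row["dob"])
        days_by_token[token] += row["days_supply"]

    linked = []
    for row in ehr_rows:
        token = make_token(row["first"], row["last"], row["dob"])
        linked.append({
            "token": token,
            "total_chol_mg_dl": row["total_chol_mg_dl"],
            # Patients with no matching claims get zero days of supply.
            "statin_days": days_by_token.get(token, 0),
        })
    return linked


linked = link(ehr_rows, claims_rows)
for record in linked:
    print(record["token"][:8], record["total_chol_mg_dl"], record["statin_days"])
```

Each linked record now carries a lab value from the EHR and a treatment duration from claims, with names and birth dates left behind. Matching on exact normalized identifiers is the simplest possible choice; real-world linkage has to cope with typos, name changes, and missing fields.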
Linking trial and clinical care data
Nothing rivals the randomized controlled trial for assessing the safety and efficacy of an intervention. But this gold standard has its limitations. Consider the level of detail typically captured in screening CRFs, compared to the lifetime of symptoms, findings, and treatments participants bring to a study. Unless required by the protocol, forms for medical history and concomitant medications don’t usually collect details like genomic variants or complete blood panel results. But these details, which frequently are captured in electronic health records (EHR), biobanks, and registries, could explain differences in how participants respond to treatment or experience adverse events. What looks like normal variation in response may, in the light of richer baseline data, point to important biomarkers.
And what about long-term data? All studies need to end eventually, but real-world data can extend their reach, and enrich their results, with no additional burden on participants. How might time-to-event curves change if biostatisticians could factor in progressions, remissions, and hospitalizations captured after the end-of-study visit? For some studies, this kind of trial extension could be critical. The FDA will likely authorize a COVID-19 vaccine under an emergency use authorization (EUA), with limited data on the product’s long-term safety and efficacy. If so, the agency will almost certainly require the manufacturer to collect more data through a post-market study. Real-world data holds the key to powering these studies efficiently, allowing biostatisticians to monitor infection and symptoms among treated cohorts for years after the official end of the Phase III trial.
Who benefits?
Solar and wind power may yet improve life for all of us. But who benefits from the power of fused data? The short answer is, everyone who contributes. Hospitals and clinics willing to link their EHR data with data from claims, study datasets, and other sources become attractive partners to trial sponsors. Drug developers, too, will have better access to the data needed to conduct pragmatic trials, tying creatinine levels to co-pays to help payers, providers, and regulators make more informed decisions.
Patients have the most to gain. That’s as it should be. Arguably, no possible dataset can completely describe a human being. Certainly, no isolated one can. If combining data sources can shed light on which treatments improve our lives, not just our labs, the project deserves our time. With costs rising and access to care in short supply, data fusion may just light our path to a better system.
Want in on the energy source of the future? Ask me about our mission at TriNetX, or visit trinetx.com.