Real-world data: A brief review of the methods, applications, challenges and opportunities
Naveen Kumar Yethirajula
Junior Scientific writer || M.s Pharmacoinformatics || scientific writing ||Bioinformatics Enthusiast
Real-world data (RWD) in the medical and healthcare field “are the data relating to patient health status and/or the delivery of health care routinely collected from a variety of sources”.
The wide use of the internet, social media, wearable and mobile devices, claims and billing activities, product and disease registries, electronic health records (EHRs), e-health services, and other technology-driven services, together with increased data storage capacity, has led to the rapid generation and availability of digital RWD.
Characteristics, types and applications of RWD
RWD have several characteristics that distinguish them from data collected in randomized trials in controlled settings.
First, RWD are observational as opposed to data gathered in a controlled setting. Second, many types of RWD are unstructured (e.g., texts, imaging, networks) and at times inconsistent due to entry variations across providers and health systems.
Third, RWD may be generated in a high-frequency manner (e.g., measurements at the millisecond level from wearables), resulting in voluminous and dynamic data.
Fourth, RWD may be incomplete and lack key endpoints for an analysis given that the original collection is not for such a purpose.
Fifth, RWD may be subject to bias and measurement errors (random and non-random).
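The fourth point, incompleteness, is often the first thing checked before RWD are reused for a new analysis. The snippet below is a minimal sketch of such a check, not a prescribed method: the field names (`age`, `hba1c`) and the toy records are hypothetical and invented purely for illustration.

```python
def missing_rate(records, fields):
    """Return the fraction of records in which each field is absent or None."""
    n = len(records)
    return {
        f: sum(1 for r in records if r.get(f) is None) / n
        for f in fields
    }

# Hypothetical toy records (illustrative only, not real patient data).
records = [
    {"age": 54, "hba1c": 7.1},
    {"age": 61, "hba1c": None},  # endpoint recorded as missing
    {"age": 47},                 # endpoint never collected
    {"age": 70, "hba1c": 6.4},
]

rates = missing_rate(records, ["age", "hba1c"])
print(rates)
```

A field with a high missing rate, especially one that serves as a study endpoint, signals that the source may not support the intended analysis without further data collection or careful handling of missingness.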
A few common RWD types (EHRs, registry data, claims data, patient-reported outcome (PRO) data, and data collected from wearables) serve as examples to demonstrate the variety of RWD and the purposes for which they can be used.
Registry data enable the identification and sharing of best clinical practices, improve the accuracy of estimates, and provide valuable data for supporting regulatory decision-making.
Claims data refer to data generated during the processing of healthcare claims in health insurance plans or from practice management systems.
PRO data refer to data reported directly by patients on their health status. PRO data have been used to provide real-world evidence (RWE) on the effectiveness of interventions, symptom monitoring, and relationships between exposure and outcomes, among others.
Pragmatic clinical trials are trials designed to test the effectiveness of an intervention in the real-world clinical setting.
Target trial emulation is the application of trial design and analysis principles from (target) randomized trials to the analysis of observational data.
Challenges and opportunities
Various challenges, from data gathering to data quality control to decision making, still exist at all stages of the RWD life cycle despite all the excitement around their transformative potential.
Data quality: RWD are now often used for purposes other than those for which they were originally collected, and thus may lack information on critical endpoints and may not always be suited to generating regulatory-grade evidence.
Efficient and practical ML and statistical procedures: The fast growth of digital medical data, together with the workforce and investment flooding into the field, is driving the rapid development and adoption of modern statistical procedures and machine learning (ML) algorithms to analyse the data.
Explainability and interpretability: Modern ML approaches are often employed in a black-box fashion, and there is a lack of understanding of the relationships between input and output and of causal effects.
Reproducibility and replicability: Reproducibility and replicability are major principles in scientific research, RWD included. If an analytical procedure is not robust and its output is not reproducible or replicable, the public would call into question the scientific rigor of the work and doubt the conclusions of an RWD-based study.
Privacy: Information in RWD is often sensitive, such as medical histories, disease status, financial situations, and social behaviours, among others. Privacy risk can increase dramatically when different databases (e.g., EHR, wearables, claims) are linked together, a common practice in the analysis of RWD.
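To see why linkage raises privacy risk, consider how many records become unique on their combination of attributes once a second source is joined in. The sketch below is only an illustration under stated assumptions: the records, the field names (`pid`, `birth_year`, `sex`, `region`) and the uniqueness measure are all hypothetical and not taken from any particular study or standard.

```python
from collections import Counter

def unique_fraction(records, keys):
    """Fraction of records whose combination of `keys` values occurs only once."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)

# Hypothetical EHR extract (illustrative only).
ehr = [
    {"pid": 1, "birth_year": 1980, "sex": "F"},
    {"pid": 2, "birth_year": 1980, "sex": "F"},
    {"pid": 3, "birth_year": 1975, "sex": "M"},
    {"pid": 4, "birth_year": 1975, "sex": "M"},
]

# Hypothetical claims extract mapping patient id to a coarse region code.
claims_region = {1: "A", 2: "B", 3: "A", 4: "A"}

# Link the two sources on the shared patient id.
linked = [{**r, "region": claims_region[r["pid"]]} for r in ehr]

before = unique_fraction(ehr, ["birth_year", "sex"])
after = unique_fraction(linked, ["birth_year", "sex", "region"])
print(before, after)
```

In this toy data no record is unique on birth year and sex alone, but half of them become unique once the region from the claims source is added. Each extra linked attribute can shrink the group a person hides in, which is the mechanism behind the increased risk.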
Diversity, Equity, Algorithmic fairness, and Transparency (DEAT): RWD may contain information from various demographic groups, which can be used to generate RWE with improved generalizability compared to data collected in controlled settings.
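One simple way to probe algorithmic fairness, offered here as an assumed example and not as a method described in the article, is to compare a model's sensitivity (true positive rate) across demographic groups. The group labels and predictions below are hypothetical.

```python
from collections import defaultdict

def sensitivity_by_group(rows):
    """rows: iterable of (group, y_true, y_pred) with 0/1 labels.

    Returns the true positive rate per group, over groups that have
    at least one positive case.
    """
    true_pos = defaultdict(int)
    positives = defaultdict(int)
    for group, y_true, y_pred in rows:
        if y_true == 1:
            positives[group] += 1
            true_pos[group] += int(y_pred == 1)
    return {g: true_pos[g] / positives[g] for g in positives}

# Hypothetical predictions (illustrative only).
rows = [
    ("group_a", 1, 1), ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 0),
    ("group_b", 1, 0), ("group_b", 1, 1), ("group_b", 0, 0), ("group_b", 1, 0),
]

rates = sensitivity_by_group(rows)
print(rates)
```

A large gap between groups, as in this toy data, would suggest the model detects the outcome less reliably for one group and warrants a closer look at how that group is represented in the underlying RWD.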
Conclusions
RWD provide a valuable and rich data source beyond the confines of traditional epidemiological studies, clinical trials, and lab-based experiments, with lower cost in data collection compared to the latter.
If used and analyzed appropriately, RWD have the potential to generate valid and unbiased RWE with savings in both cost and time, compared to controlled trials, and to enhance the efficiency of medical and health-related research and decision-making.
Procedures that improve the quality of the data and overcome the limitations of RWD, so as to make the best of them, have been and will continue to be developed.