Creating a data ecosystem - best practices from the Childhood Cancer Data Initiative Symposium
Boris Bikbov
Senior Researcher | Public and global health | Data management and data visualization | Advanced statistical analysis of both real-world evidence databases, clinical trials and surveys
The Childhood Cancer Data Initiative (CCDI) Data Ecosystem connects new and existing data that come from different institutions and are stored on various platforms through a centralized portal. Data connected through this ecosystem need to follow certain standards. Standardizing clinical and research data from multiple sources will make it easier and quicker for researchers to find the data they need to answer key scientific questions, which in turn can speed up childhood cancer research progress.
The CCDI 2023 Annual Symposium highlighted different aspects of creating a data ecosystem and harmonizing data between different entities. The organizers and participants have made publicly available a short summary of the symposium, the complete six-hour recording, and all presentation slides.
In this article I would like to highlight some key concepts and information presented at the symposium.
Web resources related to the CCDI Data Ecosystem
CCDI Participant Index (CPI)
The CCDI Participant Index will help researchers study cohorts by linking patient data from many institutions to a single ID that will be easily searchable, enhance interoperability of data, and preserve patient privacy.
The CCDI Participant Index (CPI) provides a digital ID mapping and matching reference service for the CCDI Data Ecosystem. It leverages direct and transitive associations between known identifiers that represent the same person in different databases and in the different institutions that care for this person. CPI uses two kinds of information: (1) publicly shareable research IDs (available to all consortium participants), and (2) Personal Identifying Information IDs (stored only in a separate database or institution). A Privacy-Preserving Record Linkage (PPRL) service links these two components, with sensitive data replaced by random numbers (i.e. tokenized). Institutions retain full control of their Personal Identifying Information IDs, metadata, and datasets. At the same time, each consortium participant will know that, in addition to the information held in its local database, other pieces of data for the same person exist in other institutions or databases.
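To make the idea of direct and transitive associations more concrete, here is a minimal Python sketch. It is not the CPI implementation: the class name IdentifierIndex, the source names, and the IDs are all hypothetical, and it uses a plain union-find structure only to show how two research IDs can be recognized as the same person through a chain of pairwise links.

```python
class IdentifierIndex:
    """Toy index of research IDs; each ID is a (source, local_id) tuple.

    Hypothetical illustration only, not the actual CPI data model.
    """

    def __init__(self):
        self._parent = {}

    def _find(self, ident):
        # Register unseen identifiers as their own group.
        self._parent.setdefault(ident, ident)
        root = ident
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited node straight at the root.
        while self._parent[ident] != root:
            self._parent[ident], ident = root, self._parent[ident]
        return root

    def link(self, a, b):
        """Record a direct association: a and b denote the same person."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_person(self, a, b):
        """True if a and b are connected directly or transitively."""
        return self._find(a) == self._find(b)

    def linked_ids(self, ident):
        """All known identifiers in the same group as ident."""
        root = self._find(ident)
        return sorted(i for i in list(self._parent) if self._find(i) == root)


# Made-up research IDs from three made-up sources.
index = IdentifierIndex()
index.link(("hospital_a", "A-001"), ("registry_b", "B-778"))
index.link(("registry_b", "B-778"), ("biobank_c", "C-042"))

# hospital_a and biobank_c were never linked directly,
# but they are associated through registry_b.
transitive = index.same_person(("hospital_a", "A-001"), ("biobank_c", "C-042"))
unrelated = index.same_person(("hospital_a", "A-001"), ("hospital_a", "A-999"))
```

In this sketch only shareable research IDs are stored; the identifying information that justifies each link would stay with the institution or linkage service, in line with the separation described above.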
HL7 Fast Healthcare Interoperability Resources (FHIR)
HL7 Fast Healthcare Interoperability Resources (FHIR) is a standard used to exchange electronic health record (EHR) data across platforms.
First, the EHR vendor must support FHIR, and its installation requires mapping to identify the data elements of interest. Second, FHIR facilitates data extraction but does not normalize the data after extraction, so additional post-extraction processing is required to make the data usable. Cleaning/processing packages such as CleanEHR and GradeEHR can standardize post-extraction cleaning across sites. Subsequent Natural Language Processing (NLP) can process extracted unstructured data with a reported accuracy of 90-95%.
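As an illustration of why post-extraction processing is still needed, the following Python sketch flattens two FHIR Observation resources that record body weight in different units and converts both to kilograms. The field names (resourceType, code.coding, valueQuantity, subject.reference, effectiveDateTime) follow the FHIR Observation resource; the sample records, the function name flatten_weight, and the small conversion table are assumptions made for this example and are not taken from the symposium or from any of the packages named above.

```python
import json

# Assumed conversion table for this example only: UCUM unit code -> kilograms.
TO_KG = {"kg": 1.0, "g": 0.001, "[lb_av]": 0.45359237}


def flatten_weight(observation):
    """Turn one FHIR Observation (as a dict) into a flat row with weight in kg.

    Returns None when the unit is unknown or the value is missing,
    so that the record can be set aside for manual review.
    """
    if observation.get("resourceType") != "Observation":
        raise ValueError("expected a FHIR Observation resource")
    quantity = observation.get("valueQuantity", {})
    unit = quantity.get("code") or quantity.get("unit")
    if unit not in TO_KG or "value" not in quantity:
        return None
    codings = observation.get("code", {}).get("coding") or [{}]
    return {
        "patient": observation.get("subject", {}).get("reference"),
        "loinc": codings[0].get("code"),
        "date": observation.get("effectiveDateTime"),
        "weight_kg": round(quantity["value"] * TO_KG[unit], 2),
    }


# Made-up extract: two sites report the same measurement type in different units.
raw = json.loads("""
[
  {"resourceType": "Observation",
   "code": {"coding": [{"system": "http://loinc.org", "code": "29463-7"}]},
   "subject": {"reference": "Patient/example-1"},
   "effectiveDateTime": "2023-03-01",
   "valueQuantity": {"value": 22.5, "unit": "kg", "code": "kg"}},
  {"resourceType": "Observation",
   "code": {"coding": [{"system": "http://loinc.org", "code": "29463-7"}]},
   "subject": {"reference": "Patient/example-2"},
   "effectiveDateTime": "2023-03-02",
   "valueQuantity": {"value": 50, "unit": "lb", "code": "[lb_av]"}}
]
""")

rows = [flatten_weight(obs) for obs in raw]
```

Both rows end up with the same column names and the same unit, which is the kind of harmonization that FHIR extraction alone does not provide.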
Recently Released FDA RWE Guidances
(from the lecture by Donna R. Rivera)