Creating a data ecosystem - best practices from the Childhood Cancer Data Initiative Symposium
Slide from the Childhood Cancer Data Initiative Symposium


The Childhood Cancer Data Initiative (CCDI) Data Ecosystem connects new and existing data that come from different institutions and are stored on various platforms through a centralized portal. Data connected through this ecosystem need to follow certain standards. Standardizing clinical and research data from multiple sources will make it easier and quicker for researchers to find the data they need to answer key scientific questions, which in turn can speed up progress in childhood cancer research.

The CCDI 2023 Annual Symposium highlighted different aspects of creating a data ecosystem and harmonizing data across different entities. The organizers and participants have made publicly available a short summary of the symposium, the complete 6-hour recording, and all presentation slides.

In this article I would like to highlight some key concepts and information presented at the symposium.

Web resources related to the CCDI Data Ecosystem

  • Childhood Cancer Data Initiative Hub: The CCDI Hub is an entry point for basic scientists, doctors, data scientists, advocates, patients, and families looking to use and connect with CCDI-related data.
  • The CCDI Data Ecosystem: Includes platforms and tools that researchers, doctors, and patients can use to access and explore childhood cancer data. Tools include the Childhood Cancer Data Catalog, the National Childhood Cancer Registry and NCCR*Explorer, and the Molecular Targets Platform.
  • Childhood Cancer Data Catalog: An inventory of pediatric oncology data resources. Each resource page includes a summary description, data content types, and links to access the data. The inventory includes childhood cancer repositories, registries, data commons, websites, tools, and catalogs that manage and refer to data.
  • National Childhood Cancer Registry (NCCR): NCCR data come from hospitals, research centers, health care administrations, and other sources and are accessible through NCCR*Explorer. NCCR's primary goal is to collect data from children, adolescents, and young adults with cancer, regardless of where they receive care, to better understand the causes, outcomes, effective treatments, and late effects of childhood cancer.
  • Cancer Ontologies
  • Molecular Targets Platform (MTP): Browse and identify associations between molecular targets, diseases, and drugs.
  • Childhood Cancer Clinical Data Commons (C3DC): Allows researchers to search for participant-level data collected from multiple studies and facilitates longitudinal data collection and analysis. The C3DC data model is available on GitHub.
  • NCI Data Archive: Provides publicly available studies and metadata.
  • Pediatric Cancer Data Commons (PCDC): PCDC brings pediatric, adolescent and young adult (AYA), and adult cancer clinical data from around the world into a single unified platform for research. It maintains consensus-based data dictionaries and maps all clinical data in the PCDC to standardized terms.
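
To make the idea of mapping local values to standardized terms more concrete, here is a minimal Python sketch of dictionary-based harmonization. The field names, local codes, and `DATA_DICTIONARY` are hypothetical and invented for illustration. They are not taken from the PCDC or C3DC data dictionaries.

```python
# Hypothetical data dictionary: each site's local value -> one harmonized term.
# These mappings are invented for illustration, not taken from PCDC or C3DC.
DATA_DICTIONARY = {
    "sex": {
        "M": "Male", "male": "Male", "1": "Male",
        "F": "Female", "female": "Female", "2": "Female",
    },
    "vital_status": {
        "alive": "Alive", "A": "Alive",
        "dead": "Dead", "deceased": "Dead", "D": "Dead",
    },
}


def harmonize(record: dict) -> dict:
    """Map each field of a site record to its standardized term.

    Fields without a dictionary entry pass through unchanged. Values with
    no mapping become "Unknown" so that gaps stay visible.
    """
    out = {}
    for field, value in record.items():
        mapping = DATA_DICTIONARY.get(field)
        if mapping is None:
            out[field] = value
        else:
            out[field] = mapping.get(str(value).strip(), "Unknown")
    return out


# Two sites encode the same concepts differently.
site_a = {"participant_id": "A-001", "sex": "M", "vital_status": "alive"}
site_b = {"participant_id": "B-917", "sex": "2", "vital_status": "deceased"}
print(harmonize(site_a))  # {'participant_id': 'A-001', 'sex': 'Male', 'vital_status': 'Alive'}
print(harmonize(site_b))  # {'participant_id': 'B-917', 'sex': 'Female', 'vital_status': 'Dead'}
```

This sketch maps unrecognized values to "Unknown" instead of passing them through. That choice makes missing dictionary entries show up during quality checks rather than leak into the harmonized data.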

CCDI Participant Index (CPI)

The CCDI Participant Index will help researchers study cohorts by linking patient data from many institutions to a single, easily searchable ID, while enhancing data interoperability and preserving patient privacy.

Slide from the lecture of Subhashini Jagu

The CCDI Participant Index (CPI) provides a digital ID mapping and matching reference service for the CCDI Data Ecosystem. It uses direct and transitive associations between known identifiers that represent the same person in different databases and at the different institutions that care for this person. CPI works with two types of information: (1) publicly shareable research IDs, available to all consortium participants, and (2) Personally Identifiable Information (PII), stored only in a separate database or at the institution. A Privacy-Preserving Record Linkage (PPRL) service links these two components by tokenizing the data, which replaces sensitive values with non-identifying tokens. Institutions retain full control of their PII, metadata, and datasets. At the same time, each consortium participant will know that, in addition to the information in its local database, other data for the same person exist at other institutions or in other databases.
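
The tokenization step can be illustrated with a minimal Python sketch. It assumes a keyed one-way hash (HMAC-SHA256) computed over a normalized name and date of birth. This is one common PPRL approach, and not necessarily the one used by the CPI's linkage service. `SHARED_KEY`, `normalize`, and `make_token` are hypothetical names. Because every site derives the same token for the same person, tokens can be compared without exchanging PII.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical shared secret agreed on by participating sites. In a real
# deployment it would be managed by the linkage service, not hard-coded.
SHARED_KEY = b"example-linkage-key"


def normalize(value: str) -> str:
    """Lower-case the value and strip accents and non-alphanumerics, so
    trivial formatting differences between sites do not break the match."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def make_token(first: str, last: str, dob: str, key: bytes = SHARED_KEY) -> str:
    """Derive a one-way token from PII with HMAC-SHA256.

    The same inputs always give the same token, but the token cannot be
    reversed to recover the name or date of birth without brute force."""
    message = "|".join([normalize(first), normalize(last), normalize(dob)])
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


# Two institutions record the same child slightly differently.
token_site_a = make_token("José", "García", "2015-03-07")
token_site_b = make_token("JOSE", "Garcia", "2015-03-07")
print(token_site_a == token_site_b)  # True
```

Exact-match hashing like this fails when a name contains a typo. For that reason, PPRL systems often use fuzzier encodings, such as Bloom-filter encodings of name fragments, to tolerate small differences.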

Slide from the lecture of Subhashini Jagu
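
The idea of "direct and transitive associations" between identifiers can be sketched in Python with a union-find (disjoint-set) structure. This illustration rests on assumptions and is not the CPI implementation. Each pair in the hypothetical `links` list stands for one known direct association. Identifiers connected through any chain of such pairs end up in the same group, which represents one presumed person.

```python
class DisjointSet:
    """Union-find over identifiers. Each resulting set is one presumed person."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        # Path halving shortens chains, which keeps later lookups fast.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def group_identifiers(links):
    """Group identifiers that are linked directly or transitively."""
    ds = DisjointSet()
    for a, b in links:
        ds.union(a, b)
    groups = {}
    for identifier in list(ds.parent):
        groups.setdefault(ds.find(identifier), set()).add(identifier)
    return list(groups.values())


# Hypothetical direct associations between local IDs, registry IDs, and tokens.
links = [
    ("hospital_A:123", "token:ab12"),
    ("hospital_B:987", "token:ab12"),
    ("registry:R-55", "hospital_B:987"),
    ("hospital_C:444", "token:ff90"),
]
for group in group_identifiers(links):
    print(sorted(group))
```

In this example, "hospital_A:123" and "registry:R-55" are never linked directly. They still land in the same group, because a chain of direct links connects them through the shared token.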

HL7 Fast Healthcare Interoperability Resources (FHIR)

HL7 Fast Healthcare Interoperability Resources (FHIR) is a standard for exchanging electronic health record (EHR) data across platforms.
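
The following small Python sketch shows what FHIR-formatted data can look like after extraction. It reads a hand-written FHIR R4 `Bundle` and pulls out resources by type. The bundle contents and the helper `resources_by_type` are hypothetical. The fields used (`resourceType`, `entry`, `birthDate`, `valueQuantity`) are standard FHIR R4 elements, but real servers return much richer resources.

```python
import json

# A tiny, made-up FHIR R4 Bundle of the kind a FHIR search might return.
bundle_json = """
{
  "resourceType": "Bundle",
  "type": "searchset",
  "entry": [
    {"resource": {"resourceType": "Patient", "id": "pat-1",
                  "gender": "female", "birthDate": "2012-06-01"}},
    {"resource": {"resourceType": "Observation", "id": "obs-1",
                  "status": "final",
                  "code": {"text": "Hemoglobin"},
                  "subject": {"reference": "Patient/pat-1"},
                  "valueQuantity": {"value": 118, "unit": "g/L"}}}
  ]
}
"""


def resources_by_type(bundle: dict, resource_type: str) -> list:
    """Return every resource of one type from a Bundle's entries."""
    return [
        entry["resource"]
        for entry in bundle.get("entry", [])
        if entry.get("resource", {}).get("resourceType") == resource_type
    ]


bundle = json.loads(bundle_json)
patients = resources_by_type(bundle, "Patient")
observations = resources_by_type(bundle, "Observation")
print(patients[0]["birthDate"])          # 2012-06-01
print(observations[0]["valueQuantity"])  # {'value': 118, 'unit': 'g/L'}
```

The extracted value still carries its site-specific unit (g/L in this example). This is why FHIR extraction alone does not make data from different sites directly comparable.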

First, the EHR vendor must support FHIR, and setting it up requires mapping to identify the data elements of interest. Second, FHIR facilitates data extraction but does not normalize the data afterwards, so additional post-extraction processing is required to make the data usable. Cleaning and processing packages such as CleanEHR and GradeEHR can standardize post-extraction cleaning across sites. Natural Language Processing (NLP) can then process the extracted unstructured data with 90-95% accuracy.
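
Post-extraction normalization can be sketched in Python as a few explicit rules. This is a hypothetical illustration of the kind of step that the cleaning packages mentioned above automate, and it does not reproduce their APIs. The sketch converts hemoglobin results reported in different units to g/dL, drops values outside an assumed plausibility range, and removes duplicate same-day measurements contributed by different sites. The function name, cut-offs, and records are all invented for illustration.

```python
# Conversion factors to the target unit g/dL (1 g/dL = 10 g/L = 1000 mg/dL).
TO_G_PER_DL = {"g/dL": 1.0, "g/L": 0.1, "mg/dL": 0.001}
PLAUSIBLE_RANGE = (2.0, 25.0)  # hypothetical cut-offs, for illustration only


def clean_hemoglobin(rows):
    """Normalize units, drop unconvertible or implausible values, and keep
    only the first measurement per (patient, date)."""
    low, high = PLAUSIBLE_RANGE
    cleaned = {}
    for row in rows:
        factor = TO_G_PER_DL.get(row["unit"])
        if factor is None:
            continue  # unknown unit: exclude rather than guess
        value = round(row["value"] * factor, 2)
        if not low <= value <= high:
            continue  # likely a data entry or unit error
        key = (row["patient_id"], row["date"])
        cleaned.setdefault(key, {"patient_id": row["patient_id"],
                                 "date": row["date"], "value_g_dl": value})
    return list(cleaned.values())


raw = [
    {"patient_id": "p1", "date": "2023-01-05", "value": 118, "unit": "g/L"},
    {"patient_id": "p1", "date": "2023-01-05", "value": 11.8, "unit": "g/dL"},  # same test, second site
    {"patient_id": "p2", "date": "2023-01-06", "value": 1180, "unit": "g/dL"},  # implausible
    {"patient_id": "p3", "date": "2023-01-07", "value": 7.4, "unit": "mmol/L"},  # unit not handled here
]
print(clean_hemoglobin(raw))
# [{'patient_id': 'p1', 'date': '2023-01-05', 'value_g_dl': 11.8}]
```

Applying the same explicit rules at every site is what makes the cleaned values comparable. Keeping the rules in one shared function, rather than in ad hoc per-site scripts, is the point of standardizing post-extraction cleaning.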

Slide from the lecture of Tamara P. Miller


Recently Released FDA Real-World Evidence (RWE) Guidances

(from the lecture of Donna R. Rivera)

  • Real-World Data: Assessing Electronic Health Records and Medical Claims Data To Support Regulatory Decision-Making for Drug and Biological Products. Draft Guidance for Industry, September 2021
  • Data Standards for Drug and Biological Product Submissions Containing Real-World Data. Draft Guidance for Industry, October 2021
  • Real-World Data: Assessing Registries to Support Regulatory Decision-Making for Drug and Biological Products. Draft Guidance for Industry, November 2021
  • Considerations for the Use of Real-World Data and Real-World Evidence To Support Regulatory Decision-Making for Drug and Biological Products. Draft Guidance for Industry, December 2021
  • Submitting Documents Using Real-World Data and Real-World Evidence to FDA for Drug and Biological Products. Guidance for Industry, September 2022
  • Considerations for the Design and Conduct of Externally Controlled Trials for Drug and Biological Products. Draft Guidance for Industry, February 2023

