The Data Puzzle Defined
In my two decades in real world data (RWD) generation, I have heard two clear messages: ‘I have a limited budget’ and ‘I need data quickly.’ I spent hours on the phone with clients in life sciences talking about ways to meet their data generation needs, coming up with creative ways to reduce timelines and stretch budgets without reducing quality. One area was particularly difficult to work in – rare diseases. Almost always, the right solution to RWD generation required more time and bigger budgets. A solution to this time and cost data puzzle did not exist.
This needs to change – and change quickly. Drug development in rare diseases, often referred to as orphan diseases, is on the rise. The number of drugs receiving orphan drug designations (ODDs) has increased to over 300 per year, a clear indicator of growing activity in the sector. Recent years have seen approvals of ground-breaking therapies like Zolgensma for spinal muscular atrophy and Luxturna for a rare form of inherited vision loss.
Despite these developments, the unmet need at the macro level is clear – according to a European Commission estimate, there are between 6,000 and 8,000 distinct rare diseases, many without any treatment options. A credible evidence package in rare diseases must often be developed with limited data. This challenge has historically led to greater use of non-clinical trial evidence, particularly RWD, and to regulators’ openness to viewing it as a key piece of evidence.
Over the next few weeks, I will explore the evolving landscape of RWD generation as it applies to rare diseases, highlighting innovative approaches and solutions that address the unique challenges faced by researchers, healthcare providers, and drug developers. The low incidence and prevalence of these conditions create significant data generation challenges and result in approaches that often produce sub-standard answers. To solve the data puzzle, I must first describe the three critical factors that impact those involved in RWD generation in rare diseases: patient dispersal, specific aspects of data fragmentation, and the issue of data ownership and accessibility.
Patient Dispersal
Let’s start with the obvious and yet most challenging issue. The rarity of these conditions means affected individuals are scattered across different regions and countries; therefore, any robust research in rare diseases requires wide geographical coverage, which in turn means working across multiple locations and healthcare systems. And here’s where the first piece of the time and cost data puzzle appears.
Traditional data collection methods are particularly ineffective in this context. Natural history studies, which track patients over long periods, often wait for new patients to present (become incident cases) and thus require geographically diverse locations. These studies often gather valuable, rich clinical information, biomarkers and outcomes; however, the slow accrual of data can significantly delay research progress and the development of new treatments. For instance, a study on a rare genetic disorder might only recruit a handful of patients each year, making it impossible to gather enough information to yield statistically significant results without waiting many years.
Large-scale data collection initiatives, such as international patient registries, can help mitigate this issue but require substantial funding and coordination. Decentralization of data collection is helping to overcome these challenges but will not remove them completely. This is not to say that traditional data collection methods do not have value – once completed, large-scale data collection studies lead to high-quality information and impactful publications. My point here is that rare disease researchers will always struggle with time and budget if relying on traditional approaches. I will cover more novel approaches to data generation in subsequent posts.
Fragmented Data Sources
RWD fragmentation is a well-established phenomenon. Existing data sets may only cover one health care sector (e.g., primary care) or one geography. For rare diseases, this means that many underlying data sources must be used and/or acquired to achieve even the smallest credible sample size with the necessary data elements. The cost of such endeavours is substantial; another piece of the time and cost data puzzle.
But other challenges also exist. The need for highly specific, clinically rich data means that data fragmentation is even greater for rare diseases. These data exist across diverse sources such as patient registries, electronic medical records, and claims databases, each using different formats and standards. This heterogeneity complicates the creation of comprehensive and standardized datasets necessary for robust analysis. Additionally, many rare diseases require detailed genetic and molecular data, which are often stored in specialized databases that are not easily integrated with clinical data. This need for specialized data adds another layer of complexity, making it difficult to gain a full understanding of the disease.
Regulatory and ethical considerations further exacerbate this challenge. Strict privacy regulations and the need for informed consent due to the sensitive nature of rare disease data create barriers to data sharing and integration. The process for obtaining consent varies depending on geography. Data recording practices are also inherently more variable, since many providers only see one or two cases of a given condition in their lifetime. These challenges require solutions such as the European Health Data Space, another topic for a future post.
Data Ownership and Accessibility
Another layer of complexity in rare disease research is the issue of data ownership. RWD are held by various entities, including hospitals, clinics, insurance companies, and research institutions. Fundamentally, the data are owned (or should be) by the patients themselves, a fact that has profound implications for rare disease research. Patients have a vested interest in contributing to research and may be more willing to share their data if they are assured of its use for advancing understanding and treatment of their condition. However, strict application of regulations like the General Data Protection Regulation (GDPR), which emphasises patient consent and control, can create additional hurdles for researchers who need to access and use these data for scientific purposes, even though such regulations are essential for privacy protection. Do I need to mention time and cost again?
Establishing clear guidelines and frameworks for data ownership and sharing is therefore essential. These frameworks, as applied to rare diseases, should ensure that patient data are used ethically and effectively while protecting patient privacy. Solutions such as patient consent management systems and secure data sharing platforms can help balance these needs and move the field further towards more collaborative and cost-efficient data use.
Next steps
The goal of this series is to highlight the urgent need for innovative methods and approaches to collect and analyse RWD for rare diseases. I will return to the challenges highlighted in this post throughout the series. By exploring new strategies and technologies, particularly AI, I hope to foster a deeper understanding of rare conditions and support the development of effective treatments. Stay tuned for my next post (due August 8th), where I will dive deeper into the issue of patient dispersal and why it matters so much in rare disease research.
CEO, Inka Health | University of Toronto | Epidemiology PhD
(6 months ago) Thanks, Radek, for this insightful post. There is definitely a need to highlight the (perennial?) critical challenges in rare disease research. As you note, time and cost constraints have long been barriers to progress. I've been keen to explore the potential of synthetic data to complement traditional methods. While not a perfect solution, I wonder whether synthetic data could help address both the question of cost and timeliness by generating large, high-quality datasets that reflect real-world complexities without the need for prolonged data collection efforts (and addressing patient privacy as it's not "real" data, but it is real information). Of course, transparency will be key, and implementing peek and test approaches may require some form of pre-specification that could help ensure that synthetic data is both credible and robust. I’m looking forward to your next post on patient dispersal and the ways we might overcome these challenges. Thanks for starting this series!
Healthtech Founder | Real-World Evidence | Synthetic Data
(7 months ago) Interesting series! Would love to see more.
Co-founder & CEO
(7 months ago) Good stuff - definitely count me in!
President/Founder | Board Member | Innovative Technology Pioneer | Executive Management & Team Leadership | Strategy & Execution | Access & Commercialization | HEOR & RWE | Integrator | Health System Change
(7 months ago) Sign me up, Radek!
HEOR/RWE leader with a passion for publication
(7 months ago) Sign me up! Another barrier to research on rare diseases is that some conditions do not have specific ICD codes to identify patients, even when a source - such as US claims data - might actually contain a reasonable sample size.