data42 - unlocking value from clinical data

Four years ago, we created Novartis’ seminal data42 program to deepen our understanding of diseases, medicines, and patients with insights from existing clinical trial data. Since then, we have linked Novartis’ clinical, omics, and image data, harmonized data for findability and analysis, added pre-clinical and real-world data, provided access at scale, and enabled hundreds of research projects, all while ensuring patient privacy and compliant handling of data. Our users have created new insights on topics ranging from natural history of disease to biomarkers to patient subgroups to polygenic risk scores to external control arms, all with the aim of bringing more medicines to more patients, faster. As I transition from data42 to an exciting new role in the Novartis Foundation, I wanted to share my view on four key success factors for a program that tackles complex questions with large, diverse, and sensitive data.

  1. Define your project – and value story – carefully
  2. Understand your audience and obsess about your users
  3. Be agile as you curate data and develop tools and platform
  4. Collaborate, collaborate, collaborate

In data42, we have built an entirely new platform to enable innovative governance, data pipelines, scalability, and flexibility. However, the success factors apply generally to complex, data-driven projects, from comprehensive curation of data in a data lake or data mesh to analytical use cases.

Define your project – and value story – carefully

With many business and scientific questions, a lot of data, and much hype around AI, it is tempting to jump to action too quickly. This can lead to a Sisyphean data curation effort or to insights that are not actionable, so it is essential to carefully define the problem you are aiming to solve. Make sure you engage the right stakeholders (including users!) in the discussion, from senior leaders to technical experts in the relevant functions. Engage them early and continue engaging them systematically throughout the project.

  1. Prioritize which evidence gaps and use cases you will address, based on value opportunity, company priorities, and existing evidence plans.
  2. Identify the data currently available to generate the evidence, as well as opportunities to close data gaps. Keep in mind that analyzing health data requires careful approaches to ensuring patient privacy and protecting company assets.
  3. Identify the analysis team(s) who will turn data into evidence. Whether they sit in your project team or are users of your data, make sure there will be sufficient (wo)manpower to turn data into evidence. I’m calling this analysis and not data science because not all scientific questions require data science, machine learning or AI.
  4. Define the technology context and evaluate whether existing tools and platforms will address your needs, or whether you need an entirely new platform. Make sure that your setup can provide the required governance, scalability, flexibility and – for data going to regulators – validation.

This is a highly iterative process in which additional data, new analytic methods, or cutting-edge tools can enable entirely new solutions. Once you have defined the project, capture your ambitions succinctly. Objectives and Key Results (OKRs) are a very pragmatic approach that helps you be both aspirational and specific about deliverables (see John Doerr’s excellent book “Measure What Matters”). OKRs should include key results for value generation, but also for the specific steps as you turn data into evidence and evidence into value. And remember that you may have limited control over the actual value generation, as your users will use – or even generate – the evidence and turn it into value. As you review your overall progress, you may need to go back to the drawing board, re-invent the project, and pivot.

Side note on our program name: in Douglas Adams’ comedy science fiction franchise “The Hitchhiker’s Guide to the Galaxy”, the supercomputer Deep Thought is tasked with answering the "ultimate question of life, the universe, and everything". The answer is perplexing: 42!? As the computer wisely adds: if you don’t understand the answer, you probably didn’t understand the question in the first place. This holds true for use cases, which require a clear scientific question, as well as for the project overall, which needs a clear project definition.

Photo by Jason Goodman on Unsplash

Understand your audience and obsess about your users

The use of your data will be multi-faceted. Users may range from scientists formulating hypotheses to data scientists implementing analytics to senior executives using evidence for decision making. Therefore, it is essential to identify your key user groups and avoid trying to build everything for everyone. Prioritize users based on the likelihood of them adopting your platform and leveraging your data and evidence to generate value. Understanding how these (key) users work, use data, and collaborate with others is essential for a successful project.

  1. Conduct in-depth user research to identify and characterize targeted user segments, in our case across the drug development continuum from research to development to launch to patient access. Understand their key needs, motivations, and collaboration patterns, and review currently available evidence, data, and analytics tools. Research operations (ResOps) methodologies can help you hit the ground running.
  2. Leverage user experience (UX) methodologies to build the best possible experience. Personas are an important tool: user archetypes capturing key needs and characteristics along with (made-up) personal details that make them relatable for product teams, e.g., “Mary, the Data Scientist”.
  3. Encourage, collect, and act on feedback frequently and systematically, from product design and development to ongoing use (FeedbackOps), and make acting on feedback a priority for your team(s).
  4. Provide effective services and support, and be crystal clear about how (and how fast) users can get help. There will often be a data file or software package missing, and users need clarity on how fast they can be unblocked.

In data42, we created a dedicated Customer Success team. The team ensured user centricity and conducted user research, engaged and onboarded users across the organization, managed feedback in the program, and helped users accomplish their goals on our platform, from training to problem solving to data and analytic services to community engagement.

Be agile as you curate data and develop tools and platform

Innovating with personal health data isn’t easy. Data and analytic approaches can be complex, the scale of data can be staggering, and trying to implement a final, fully scaled solution immediately is likely to fail. Implementing data curation and products in an incremental, iterative fashion is more likely to help you develop the best solution.

  1. Get first versions of data and products in front of users early: develop products iteratively, from a proof of concept (PoC) showing feasibility, to a minimum viable product (MVP) showing scalability and utility, to a full product. Collecting user feedback early allows you to adjust or even stop development when appropriate. As LinkedIn co-founder Reid Hoffman put it: “If you are not embarrassed by the first version of your product, you have launched too late.”
  2. Ensure that each product has a clearly defined problem statement and target user group. Implement products with “two-pizza teams” led by product owners who are fully accountable for ensuring the products are fit for purpose.
  3. Create a project-wide rhythm (called “sprints” in agile methodology) to agree on a set amount of work to be completed, synchronize activities across teams, and ensure regular demos of new features to users.

As you build your project in an agile fashion, make sure you communicate your successes and value stories systematically to users and other stakeholders. This will encourage your project team(s) and users to push the envelope further and help you secure additional funding to take your project to the next level.

Photo by Matteo Vistocco on Unsplash

Collaborate, collaborate, collaborate

Unlocking value from complex data is a team sport.

  1. Enable close collaboration across science, data, analytics, and tech as you define, design, and implement your project. In data42, we were lucky to have all this expertise in one team, which was vital for breakthrough innovation.
  2. Collaborate with users to design products and efficiently get data ready for analysis. It is not practical to make clinical trial data fully FAIR (findable, accessible, interoperable, and reusable). Instead, work closely with users to create the data needed for analysis, and FAIRify where possible.
  3. Enable users to build on the work of others, connect, and innovate together by creating communities, providing tools like project and code libraries, and sharing best practices and success stories.
  4. Engage with the health data ecosystem beyond your organization: use industry standards to enable collaboration, share data where possible, and engage in collaborative research opportunities with other partners.

It was terrific to be part of the leadership team bringing an ambitious idea to fruition as an internal startup at Novartis. In entrepreneurial fashion, we pivoted several times as we balanced democratization of health data across Novartis (going broad) with comprehensively enabling complex research agendas in selected disease areas (going deep). I took on different roles as the program expanded and matured. We collaborated with world-class experts from science to data to analytics to technology across Novartis and beyond. But most importantly, we tackled these challenges together with a highly skilled, cross-functional and close-knit team passionate about improving health by accelerating pharma R&D with data and analytics.

In that spirit, let’s collaborate on improving health with data. I spent the last four years unlocking value from clinical trial (and related) data as Head of Products and Customer Success with our innovative data42 internal startup team at Novartis. I spent seven years enabling insights with health data from over 200 countries working as Chief Data & Technology Officer with the incredible team at IHME on the Global Burden of Disease study. And I am always happy to share insights and learn from you. I look forward to your comments, and feel free to ping me for a virtual or in-person discussion.


#health #analytics #innovation #pharma #healthcare

Sebastien Carnel

Global Head of IT

4 months

Great article, thanks for sharing! In our search for the 'Great Machine,' data science is driving pharma research with unprecedented ambition and speed towards faster new treatments.

Joan Wong

Data Strategy Leader | Bioinformatics Expert | Driving Insights in Plant, Microbe & Infectious Disease Research | 20+ Years Experience

4 months

Thanks Peter for this very insightful article! We will be taking these lessons to heart as we build out our data strategy.

Denis Davydov

Data Scientist and Analyst | BI architect | Customer insight researcher | Power BI developer | Computational psychometrics expert | 10+ years AI Implementation | PhD

8 months

data42 is an impressive project and I was glad to hear about it. It’s great that you are overcoming organizational and legal hurdles as well as the inconsistency of RWD. I once tried to put together a data lake for a large medical organization, but everything drowned in bureaucracy. It was also difficult to trust medical records because sometimes you have to “read between the lines.” It seems to me that today, when LLMs extract structured data from RWD, we can use this data with great confidence, and the models will be more robust. Good luck to data42!

Charlie Gordon Fuller

Strategic Account Executive EMEA @ Dotmatics | Lab Informatics Specialist

1 year

Very insightful, Peter! It's been a while since I caught up on data42; I previously had some conversations about the initiative with several teams at Novartis back in 2020. You mention identifying 'data gaps' as a key piece of the puzzle. We found that only 12% of R&D scientists' observations get captured directly in an ELN, with 88% of information going on paper or staying in someone's head. Obviously, that can lead to a huge data gap, which could make any subsequent AI/ML/advanced analytics pretty sub-optimal. Do you see this challenge at Novartis? How do you address it? Happy to connect and share some thoughts and ideas if you are interested.

More articles by Peter Speyer
