How could we use generative AI to empower key personas in clinical trials?
Image credit: DALL-E 3


There are three key personas we will consider for this experiment, and the scope will be limited to protocol-based information access:

  • Patients: Patients are the participants in a clinical trial. Their role is to provide informed consent to participate, adhere to the trial protocols (such as taking medications, undergoing tests, or attending appointments), and provide feedback or data about their experiences, side effects, or outcomes. They are the central focus of the trial, as the data collected from them is critical for assessing the safety and efficacy of the treatment being studied.
  • Clinical Research Coordinators (CRCs): CRCs are responsible for the day-to-day operations and management of the clinical trial at the study site. They work closely with the investigators to ensure the trial runs smoothly. Their duties include recruiting and screening patients, obtaining patient consent, ensuring protocol compliance, collecting and managing patient data, coordinating with other staff, and maintaining accurate records. They serve as a key liaison between the patients, the research team, and the sponsors.
  • Sponsors: Sponsors are typically pharmaceutical companies, biotechnology firms, or research institutions that fund the clinical trial. They are responsible for designing the study protocol, providing the treatments or devices being tested, and analyzing the collected data. Sponsors also ensure compliance with regulatory requirements, monitor the progress of the trial, and report the results. Their role is crucial in providing the necessary resources and oversight to conduct the trial and in ensuring that the trial meets its objectives and adheres to ethical standards.


The two primary goals are:

  • Developing a method for handling sensitive data in a secure way.
  • Providing clinical trial stakeholders with on-demand, protocol-based information.

Retrieval-augmented generation (RAG) has become a popular approach for reducing hallucinations. By deploying RAG over a local, high-quality data source and pairing it with a locally hosted open-access LLM, we can extract insights securely and accurately without relying on any external service, including the internet. Although some enterprise products (such as GPT-4 for developers) do offer LLM access that does not use your data to train the model, in our case privacy by design eliminates many worst-case scenarios by keeping you in control.
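To make the idea concrete, here is a minimal, self-contained sketch of that local RAG loop: score protocol chunks against a query, retrieve the best matches, and ground the prompt in them before handing it to a locally hosted model. This is not the actual implementation. Keyword-overlap scoring stands in for real vector embeddings, and `local_llm` is a hypothetical placeholder for whatever local model serves the answer.

```python
# Minimal local-RAG sketch: all data stays on the machine.
# NOTE: keyword overlap stands in for real embeddings, and the call to a
# local LLM is only indicated in a comment (local_llm is hypothetical).

def score(query: str, chunk: str) -> int:
    """Naive relevance score: number of words shared with the query."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k protocol chunks most relevant to the query."""
    return sorted(chunks, key=lambda ch: score(query, ch), reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Ground the model in retrieved protocol text to curb hallucination."""
    ctx = "\n---\n".join(context)
    return f"Answer only from the protocol excerpts below.\n{ctx}\n\nQuestion: {query}"

if __name__ == "__main__":
    protocol_chunks = [
        "Visit 3: vital signs and blood draw at week 4.",
        "Exclusion: receipt of any live attenuated vaccine within 28 days.",
        "The study enrolls participants across 40 sites.",
    ]
    query = "Is a live attenuated vaccine allowed before enrollment?"
    prompt = build_prompt(query, retrieve(query, protocol_chunks, k=1))
    # prompt would now be sent to a local model, e.g. local_llm(prompt)
    print(prompt)
```

In the real pipeline the scoring step is replaced by embedding similarity over a vector store, but the control flow (retrieve, then ground, then generate) is the same.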

Regarding our second goal, we cannot expect patients to always read and understand these lengthy documents, or clinical research coordinators to compose complex queries for the information they require. So, if implemented correctly, this system can greatly benefit patients, clinical research coordinators, and sponsors, leading to a well-coordinated trial, potentially with fewer protocol deviations and patient dropouts.

The queries from these stakeholders are in natural language, and the answers they seek are often embedded in unstructured text. This scenario is ideally suited for LLM-based retrieval.


Here is the setup used for testing:

- a 97-page open-access clinical trial protocol

- a local instance of the LLM

- a RAG pipeline (loading, splitting, and storing the embeddings)

- three models: Llama 2 7B, Llama 2 13B, and Mistral 7B (with 4-bit integer quantization)
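Of the pipeline steps above, the splitting step is the one that most affects retrieval quality. A rough sketch of it, assuming word-based chunks with a sliding overlap (the chunk size and overlap values here are illustrative, not the ones used in this experiment):

```python
# Sketch of the "splitting" step of the RAG pipeline: break the long
# protocol into overlapping chunks before embedding and storing them.
# chunk_size/overlap values below are illustrative assumptions.

def split_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word-based chunks with a sliding overlap, so that
    content cut at a chunk boundary still appears intact in one chunk."""
    assert 0 <= overlap < chunk_size
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last chunk already covers the tail of the document
    return chunks
```

Each chunk is then embedded and written to a local vector store; the overlap is what keeps eligibility criteria or visit schedules from being split mid-sentence across two chunks.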


A quick summary of the experiment:

  • The Llama 2 13B model provides much more precise answers than the 7B model.
  • The Llama 2 13B model also corrects some of the wrong or ambiguous answers given by the 7B model (e.g., on study arms and post-enrollment pregnancy).
  • The 13B model still fails to answer some questions, such as the countries associated with the trial and lead-investigator details.


Details of the experiment:

Sharing the responses from Llama 2 13B only, to keep this post short. The code and other responses are here.

Sample Patient Queries:

This certainly has the potential to provide on-demand information to the trial participants and build trust by providing objective answers.

Sample CRC queries:

Comments:

  1. It is impressive that for the first query, the model not only identified vaccines that might affect the patient's eligibility but also confirmed the specific vaccine (MMR) the patient received is from the attenuated class prohibited in this protocol.
  2. This can also allow investigators to query the study design aspects.

Sponsor Queries:

The site count is correct, but the models failed to retrieve the country-related information. We are essentially asking two questions, and all three models failed to provide the list of associated countries.

Here, the goal was not only to find all the visits but also to count them like a math agent, since the count is not stated in the protocol. Although the model correctly listed all visits associated with vital signs data collection from page 45 of the protocol, it failed to calculate the count.
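One pragmatic workaround is to keep retrieval with the LLM but do the arithmetic deterministically in code: have the model list the visits, then count them in a post-processing step. A sketch, where the `model_answer` string is an illustrative stand-in for the model's actual output:

```python
# Sketch: let the LLM answer "which visits collect vital signs?" and do
# the counting in code instead of trusting the model's arithmetic.
# The model_answer string below is an illustrative assumption, not real output.
import re

def count_visits(model_answer: str) -> int:
    """Count distinct visit numbers mentioned in the model's answer."""
    visits = set(re.findall(r"[Vv]isit\s+(\d+)", model_answer))
    return len(visits)

model_answer = (
    "Vital signs are collected at Visit 1, Visit 2, Visit 4, "
    "Visit 6, and Visit 8 per the schedule of assessments."
)
print(count_visits(model_answer))  # deterministic count: 5
```

Using a `set` also deduplicates visits the model mentions more than once, which is exactly the kind of slip that derails an LLM's own counting.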

Considering that we are using watered-down (4-bit quantized) versions of the smaller models in the Llama family, these results are not bad at all. I would imagine the 70B model at float16 would do much better.

Note: Opinions expressed here are my own.
