How could we use generative AI to empower key personas in clinical trials?
There are three key personas we will consider for this experiment: patients, clinical research coordinators (CRCs), and sponsors. The scope will be limited to protocol-based information access.
The two primary goals are:
- Developing a method for handling sensitive data in a secure way.
- Providing clinical trial stakeholders with on-demand protocol-based information.
Retrieval-augmented generation (RAG) has been a popular architecture for reducing hallucinations. By deploying RAG over a local, high-quality data source and pairing it with a locally hosted open-access LLM, we can extract insights securely and accurately without relying on any external service, including the internet. Some enterprise products (such as GPT-4 for developers) do offer LLM access that does not use your data to train the model, but in our case privacy by design eliminates many worst-case scenarios by keeping you in control.
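To make the privacy point concrete, here is a minimal sketch of loading a quantized model entirely on local hardware. The post does not specify the exact runtime, so the use of llama-cpp-python, the model path, and the parameters below are illustrative assumptions; nothing in this snippet touches the network.

```python
# A minimal sketch (assuming llama-cpp-python and a local GGUF model file):
# the model weights and the prompt both stay on the local machine.
from llama_cpp import Llama

# Hypothetical local path to a 4-bit-quantized Llama 2 chat model.
llm = Llama(model_path="./models/llama-2-13b-chat.Q4_K_M.gguf", n_ctx=4096)

response = llm(
    "Which visits require vital signs collection?",  # example query
    max_tokens=256,
    temperature=0.0,  # deterministic output for factual lookups
)
print(response["choices"][0]["text"])
```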
Regarding our second goal, we cannot expect patients to always read and understand these lengthy documents, or clinical research coordinators to compose complex queries for the information they require. If implemented correctly, this system can greatly benefit patients, clinical research coordinators, and sponsors, leading to a well-coordinated trial with potentially fewer protocol deviations and patient dropouts.
The queries from these stakeholders are in natural language, and the answers they seek are often embedded in unstructured text. This scenario is ideally suited for LLM-based retrieval.
Here is the setup used for testing:
- a 97-page open-access clinical trial protocol
- a local LLM instance
- a RAG pipeline (loading, splitting, and storing the embeddings), sketched below
- three models: Llama 2 7B, Llama 2 13B, and Mistral 7B (all with 4-bit integer quantization)
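The sketch below shows one plausible shape for this pipeline, using LangChain with a local LlamaCpp model. The library choices, model paths, chunk sizes, and retrieval settings are illustrative assumptions, not the exact configuration used in the experiment.

```python
# RAG pipeline sketch: load, split, embed/store, then retrieve + generate,
# all locally. Paths and parameters are hypothetical.
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import LlamaCpp
from langchain.chains import RetrievalQA

# 1. Load: read the 97-page protocol PDF (hypothetical filename).
docs = PyPDFLoader("protocol.pdf").load()

# 2. Split: chunk the text so each piece fits the model's context window.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# 3. Store: embed each chunk with a local embedding model and index the vectors.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
index = FAISS.from_documents(chunks, embeddings)

# 4. Retrieve + generate: a 4-bit quantized Llama 2 answers from retrieved chunks.
llm = LlamaCpp(model_path="./models/llama-2-13b-chat.Q4_K_M.gguf", n_ctx=4096, temperature=0.0)
qa = RetrievalQA.from_chain_type(llm=llm, retriever=index.as_retriever(search_kwargs={"k": 4}))

print(qa.run("How many sites are participating in this trial?"))
```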
A quick summary of the experiment:
Details of the experiment:
Sharing the responses from Llama 2 13B only, to keep this post short. The code and other responses are here.
Sample Patient Queries:
This certainly has the potential to give trial participants on-demand information and build trust through objective answers.
Sample CRC Queries:
Comments:
Sample Sponsor Queries:
The site count is correct, but the models failed to get the country-related information. We are essentially asking two questions at once here, and all three models failed to provide the list of associated countries.
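A plausible mitigation, sketched below and reusing the hypothetical `qa` chain from the pipeline sketch, is to decompose the compound question into two single-fact queries rather than expecting one retrieval pass to cover both.

```python
# One possible workaround (a sketch, not tested against these models):
# ask for each fact separately against the same retrieval chain.
for question in [
    "How many sites are participating in this trial?",
    "Which countries are the trial sites located in?",
]:
    print(question, "->", qa.run(question))
```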
Here, the goal was not only to find all the visits but also to count them like a math agent, since the count itself is not stated in the protocol. Although the model correctly provided all visits associated with vital signs data collection from page 45 of the protocol, it failed to calculate the count.
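One way around this limitation, sketched below under the same assumptions as the earlier pipeline, is to let the LLM do only the retrieval and listing, and move the arithmetic into ordinary code (or a tool-using agent).

```python
# A hedged sketch: ask the model only to list the visits, then count them
# deterministically in code. The prompt wording and the line-based parsing
# are illustrative assumptions; `qa` is the chain from the sketch above.
answer = qa.run("List, one per line, every visit at which vital signs are collected.")
visits = [line.strip("-* ").strip() for line in answer.splitlines() if line.strip()]
print(f"{len(visits)} visits found: {visits}")
```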
Considering that we are using watered-down (4-bit quantized) versions of the low-end models from the Llama family, these results are not bad at all. I would imagine the 70B model with float16 would do much better.
Note: Opinions expressed here are my own.