Gemini Pro's Unexpected Success in Identifying Patient Eligibility for Clinical Trials: A Comparative Analysis
RxDataScience
Pioneering life sciences transformation through AI, advanced analytics, and data science
As more Generative AI (Gen AI) tools become available and existing ones continue to evolve, our team is continuously testing and experimenting with each one to ensure we are using the most effective application to yield the best results. This week we took the just-released Gemini Pro for a test spin to see how it stacks up against similar Gen AI models in interpreting patient eligibility for clinical trials. Our test was a success and came with a few surprises! Read on for more about the experiment led by Dr. Rajeshwari Punekar, MPH, Ph.D. and Nataraj Dasgupta.
Objective: To assess the comparative accuracy of multiple large language models (LLMs) in identifying patients eligible for clinical trials, based on the patients' demographic characteristics (e.g., age, gender), clinical characteristics (e.g., stage, medication history, allergies), and the inclusion/exclusion (I/E) criteria of a sample of non-small cell lung cancer (NSCLC) studies.
Data: The RxDataScience team selected 10 US-based, drug-only interventional NSCLC clinical trials from clinicaltrials.gov. Two sets of adult patient profiles (50 eligible and 50 non-eligible, for a total of 100 patients) were manually created by an epidemiologist, who ensured the set included a mix of very complex and more straightforward profiles.
Analysis: The team used 5-shot prompting, a technique in which worked-out question-answer examples are included in the prompt before the new (unknown) question is asked. Here, the prompt contained a sample trial's I/E criteria, 5 patient profiles with their corresponding eligibility answers, followed by the I/E criteria of the new NSCLC trial. Five LLMs were evaluated: GPT-4, Gemini Pro, GPT-4 Turbo, Llama 2, and GPT-3.5. Each model was scored on how accurately it identified eligible patient profiles.
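For readers who want to try a similar setup, here is a minimal sketch of how a 5-shot eligibility prompt and the accuracy scoring could be wired up in Python. The criteria text, patient profiles, and the call_llm placeholder are illustrative assumptions, not our actual data or pipeline; swap in whichever vendor SDK you prefer.

```python
# A minimal, illustrative sketch of the 5-shot prompting setup described above.
# The criteria strings, patient profiles, and call_llm() are placeholders made
# up for illustration; they are not RxDataScience's actual data or code.

def build_five_shot_prompt(example_criteria: str,
                           worked_examples: list,
                           new_criteria: str,
                           new_profile: str) -> str:
    """Assemble a prompt: 5 worked eligibility examples, then the new case."""
    parts = [
        "Task: Decide whether each patient is ELIGIBLE or NOT ELIGIBLE for the "
        "trial, based on its inclusion/exclusion (I/E) criteria.",
        "Example trial I/E criteria:\n" + example_criteria,
    ]
    for i, (profile, answer) in enumerate(worked_examples, start=1):
        parts.append(f"Example patient {i}:\n{profile}\nAnswer: {answer}")
    parts.append("New trial (NSCLC) I/E criteria:\n" + new_criteria)
    parts.append(f"New patient:\n{new_profile}\nAnswer:")
    return "\n\n".join(parts)


def call_llm(prompt: str) -> str:
    """Placeholder for a call to GPT-4, Gemini Pro, Llama 2, etc.

    Replace with the vendor SDK of your choice; it should return a string
    containing either 'ELIGIBLE' or 'NOT ELIGIBLE'.
    """
    raise NotImplementedError


def accuracy(predictions: list, labels: list) -> float:
    """Share of patient profiles whose eligibility was identified correctly."""
    correct = sum(p.strip().upper() == t.strip().upper()
                  for p, t in zip(predictions, labels))
    return correct / len(labels)
```

Under this kind of scoring, a prediction counts as correct only when it matches the manually assigned eligibility label; the per-model percentages below correspond to an accuracy score of roughly this form.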
Results: OpenAI's GPT-4 (96%) was the most successful at interpreting the complex I/E criteria and identifying eligible patient profiles, followed by Google's Gemini Pro (88%).
Conclusion: The results from Gemini Pro surprised us. Gemini Pro, available for free, has been described as Google's equivalent of OpenAI's GPT-3.5, which scored only 30% in this experiment. Yet at 88%, Gemini Pro even edged out GPT-4 Turbo, OpenAI's premium offering, which scored 85%.
Although the output format from Gemini Pro was not always consistent, its eligibility determinations were surprisingly accurate, even across the very complex patient profiles. The findings are somewhat counterintuitive, since public benchmarks have shown suboptimal results for Gemini Pro relative to GPT-4, but there may be specific, nuanced use cases where Google's new LLM yields surprising results. This is the first study to compare the accuracy of patient eligibility identification across five well-known LLMs; however, these results are preliminary and the generalizability of the study is very limited. Further experimentation with larger sample sizes and real-world patient profiles from electronic medical records (EMR) or electronic health records (EHR) is therefore needed to formally assess performance.
What do you think? Have you tried Gemini Pro yet? Do you have another experiment idea for the RxDataScience team? We’d love to hear from you!