LLM Prompting For NMA Feasibility Assessments

Brian Hutton

Research scientist - network meta-analysis, indirect comparisons, real-world evidence, medical writing, systematic reviews

Published: September 18, 2024

Overview

The use of large language models (LLMs) like ChatGPT in health economics and outcomes research (HEOR) is expanding rapidly and is changing the workflows that many researchers have historically followed. A barrier that many researchers still wrestle with is how to write prompts that get the most out of their work with LLMs like ChatGPT. Learning opportunities and user experiences are growing quickly; however, case studies from one’s own field of work can be especially helpful for accelerating learning and building experience.

I’ve long had a research focus in knowledge syntheses like systematic reviews (SRs), where the race to use LLMs for efficiencies in both SR completion and budget reduction is extremely competitive. I’ve also had a long-standing interest in network meta-analysis (NMA) for the performance of indirect treatment comparisons (ITC). While the use of LLMs to generate and/or debug statistical code for platforms like R during the conduct of NMAs is perhaps familiar to biostatistical researchers working in this space, their use by non-statistical team members to explore NMA data sets appears less well explored.

This article is intended for those who may not have experience with and/or an interest in statistical programming, but who still have a keen interest in exploring their NMA data sets! The example presented is based upon an anonymized NMA data set from past research, which was prepared in Microsoft Excel and provided to ChatGPT in this format. The article presents the rationale, the prompts used within ChatGPT-4o and the corresponding visualizations that resulted. I hope this can help other researchers in the exploration of their own NMA data sets.

Step 1: Read the Data into ChatGPT

Task summary:

Initially we need to read in our data set to be inspected, with a header of variable names to identify the study names, comparators, outcome data, and other measures such as study-level covariates. We can provide the data in spreadsheet format and identify these inputs by name to facilitate subsequent steps in the process. The screenshot below illustrates our data set format for this example.

The prompt:

Please review the attached file that contains a data set I wish to inspect as I prepare to perform an NMA. The file contains columns for a study identifier (called ‘study’), identifiers for the treatments compared (‘t1’, ‘t2’, ‘t3’), the numbers of patients with the event of interest per treatment group (‘r1’, ‘r2’, ‘r3’), the sample size per treatment group (‘n1’, ‘n2’, ‘n3’), and study-level covariates that will be of interest to investigate when judging the comparability of trials (‘Age’, ‘Female%’, ‘Surgery_type’, ‘RoB’).
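
Before moving on, it can help to confirm that the file really has the layout described in the prompt. ChatGPT often handles such requests by writing Python behind the scenes; the following is a minimal sketch of that kind of check, not the code ChatGPT actually ran. The toy rows, the function name check_layout and the use of None for an unused third arm are all assumptions made for illustration.

```python
# Column names taken from the prompt above; everything else is hypothetical.
EXPECTED_COLUMNS = [
    "study", "t1", "t2", "t3", "r1", "r2", "r3",
    "n1", "n2", "n3", "Age", "Female%", "Surgery_type", "RoB",
]


def check_layout(rows):
    """Return a list of problems found in a list of row dictionaries."""
    problems = []
    for row in rows:
        missing = [c for c in EXPECTED_COLUMNS if c not in row]
        if missing:
            problems.append(f"{row.get('study', '?')}: missing columns {missing}")
            continue
        for k in (1, 2, 3):
            t, r, n = row[f"t{k}"], row[f"r{k}"], row[f"n{k}"]
            if t is None:
                continue  # arm not used (e.g. t3 in a two-arm trial)
            # Events must be present and cannot exceed the sample size.
            if r is None or n is None or not (0 <= r <= n):
                problems.append(f"{row['study']}: arm {k} has events={r}, n={n}")
    return problems


# Toy data (hypothetical): Trial 2 has more events than patients in arm 1.
rows = [
    {"study": "Trial 1", "t1": "A", "t2": "B", "t3": None,
     "r1": 12, "r2": 8, "r3": None, "n1": 100, "n2": 98, "n3": None,
     "Age": 61.0, "Female%": 45.0, "Surgery_type": "orthopedic", "RoB": "L"},
    {"study": "Trial 2", "t1": "A", "t2": "C", "t3": None,
     "r1": 30, "r2": 25, "r3": None, "n1": 20, "n2": 150, "n3": None,
     "Age": 67.5, "Female%": 52.0, "Surgery_type": "oncology", "RoB": "H"},
]

problems = check_layout(rows)
print(problems)
```

A similar request can be made in plain language, for example by asking ChatGPT to list any rows where the number of events exceeds the sample size.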

Step 2: Understand Your Data and Establish Connectivity

The task:

Let’s visualize the evidence for this outcome. Preparing a network diagram can: show us how many treatments we’re comparing and what comparisons have direct data available; help us establish whether a connected network of trials is present; determine the scope of the analysis (i.e., treatments to be compared); and can also possibly shed light on considerations for our analysis based on the sparsity of the network (e.g., fixed versus random effects analysis).

The Prompt:

I would like to generate a publication-quality network diagram with the following specifications: 1. Each node represents a treatment and should be sized proportionally to the total number of patients in the studies; 2. Each edge represents a direct comparison between treatments and should be weighted by the number of trials; 3. Use a circular layout to evenly distribute the nodes; 4. Ensure the treatment labels appear within the nodes; 5. Use a clear and readable font size for all labels to make sure the plot is readable. Make the diagram aesthetically pleasing and suitable for publication.
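
The quantities behind such a diagram are simple counts. As a rough sketch under stated assumptions (the toy trials and the function name network_summary are hypothetical, and the drawing step is left out), node sizes, edge weights and connectivity could be derived as follows. Note one assumption in particular: a three-arm trial is counted as contributing to all three of its pairwise comparisons.

```python
from collections import defaultdict
from itertools import combinations

# Toy trials (hypothetical): treatment -> sample size for each arm.
trials = {
    "Trial 1": {"A": 100, "B": 98},
    "Trial 2": {"A": 150, "C": 152},
    "Trial 3": {"A": 80, "B": 82, "C": 79},  # three-arm trial
    "Trial 4": {"A": 60, "D": 61},
}


def network_summary(trials):
    """Return node sizes (patients), edge weights (trials) and connectivity."""
    node_size = defaultdict(int)
    edge_weight = defaultdict(int)
    for arms in trials.values():
        for treatment, n in arms.items():
            node_size[treatment] += n
        # Every pair of arms within a trial is one direct comparison.
        for pair in combinations(sorted(arms), 2):
            edge_weight[pair] += 1

    # Depth-first search: the network is connected if every treatment
    # can be reached from an arbitrary starting treatment.
    neighbours = defaultdict(set)
    for a, b in edge_weight:
        neighbours[a].add(b)
        neighbours[b].add(a)
    start = next(iter(node_size))
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbours[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    connected = seen == set(node_size)
    return dict(node_size), dict(edge_weight), connected


nodes, edges, connected = network_summary(trials)
print(nodes, edges, connected)
```

If the connectivity check fails, the disconnected treatments cannot be compared with the rest of the network through the available trials, which is worth knowing before any plotting is attempted.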

Bonus prompt:

Sometimes the plot will be made too large for the plot window; if so, try the following prompt: “Please proportionally shrink the sizes of the nodes so that the network diagram fits in the plot window”. Adjustments for network edges and font can similarly be made when needed.

The takeaway: we have a connected network available for an NMA. Most of the evidence compares active treatments against inactive treatment A, and thus direct evidence between active treatments is sparse.

Step 3: Investigate Distributions of Effect Modifiers

The Task:

Developing an understanding of whether the study populations from the included trials are sufficiently similar (i.e., ‘jointly randomizable’) is an important task toward producing a valid NMA. Comparison of patient eligibility criteria is one important step. Another is to inspect visualizations such as box plots and bar plots of established effect modifiers to evaluate differences between study populations. In this example, the distributions of age, % female participants and surgery type are known to be clinically important.

The Prompt:

Using the data provided, please create a boxplot where the x-axis is the categories of treatment comparison (e.g. A vs B), and the y-axis is the average age across studies for the set of studies that corresponds to each comparison. Please also generate a similar plot for the ‘Female%’ variable. For the surgery variable, please generate a stacked bar chart which reflects the proportion of trials for each type of surgery (i.e., oncology, orthopedic, major non-oncology, cardio-vascular) for each treatment comparison.
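
The grouping that underlies these plots can be sketched in a few lines. This is an illustration only, assuming hypothetical toy studies and a hypothetical helper ages_by_comparison; it reports the minimum, median and maximum per comparison rather than drawing the box plot itself.

```python
from collections import defaultdict
from itertools import combinations
from statistics import median

# Toy studies (hypothetical): treatments compared and mean age per study.
studies = [
    {"study": "Trial 1", "treatments": ["A", "B"], "Age": 61.0},
    {"study": "Trial 2", "treatments": ["A", "B"], "Age": 67.0},
    {"study": "Trial 3", "treatments": ["A", "C"], "Age": 55.0},
    {"study": "Trial 4", "treatments": ["A", "B", "C"], "Age": 70.0},
]


def ages_by_comparison(studies):
    """Collect the study-level mean ages under each pairwise comparison."""
    groups = defaultdict(list)
    for s in studies:
        # A multi-arm study is listed under each of its pairwise comparisons.
        for a, b in combinations(sorted(s["treatments"]), 2):
            groups[f"{a} vs {b}"].append(s["Age"])
    return dict(groups)


groups = ages_by_comparison(studies)
summary = {label: (min(v), median(v), max(v)) for label, v in groups.items()}
for label, stats in summary.items():
    print(label, stats)
```

The same grouping applies to the ‘Female%’ variable, and counting surgery types within each group gives the proportions for the stacked bar chart.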

The takeaway: we have some variability in both mean age and % of female patients across treatment comparisons, including a small number of outlier studies. The mixture of surgical populations also varies by treatment comparison. The clinical relevance of these differences and the need to consider regression adjustments, subgroup analyses and/or sensitivity analyses should be discussed with clinical experts from the team.

Bonus Prompt:

Some researchers find box plots a bit challenging to interpret and prefer a simpler look at the distribution of a covariate across studies. One can do this with a bar plot. A prompt to do this for the age covariate is as follows: “Please generate a bar plot of the age covariate (y-axis is age, x-axis is study). Please calculate a weighted average of age across all studies, and add a horizontal red dashed line to the bar plot to indicate this value. Slightly shrink the font size to enhance readability on the x axis. Also, please order the studies in decreasing order of age.”
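
The prompt does not say which weights to use for the weighted average, so it is worth checking what ChatGPT chose. A minimal sketch, assuming each study is weighted by its total sample size (the toy values and the n_total field are hypothetical):

```python
# Toy studies (hypothetical): mean age and total sample size per study.
studies = [
    {"study": "Trial 1", "Age": 60.0, "n_total": 100},
    {"study": "Trial 2", "Age": 70.0, "n_total": 300},
    {"study": "Trial 3", "Age": 50.0, "n_total": 100},
]


def weighted_mean(values, weights):
    """Weighted average: sum of value * weight divided by sum of weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Assumption: weights are total sample sizes, so larger studies count more.
overall_age = weighted_mean(
    [s["Age"] for s in studies], [s["n_total"] for s in studies]
)

# Studies in decreasing order of age, as requested in the prompt.
ordered = [s["study"] for s in sorted(studies, key=lambda s: s["Age"], reverse=True)]
print(overall_age, ordered)
```

Here the weighted average (64) sits above the simple average of the three study means (60) because the largest study also has the oldest population.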

Step 4: Investigate Risk of Bias of Included Studies in the Network

The Task:

It can be of interest to explore the extent of potential bias within different components/comparisons of the evidence network, as this can have implications for the ability to generate highly valid findings from an NMA. Reviewing findings from risk of bias (RoB) assessments can provide the research team with helpful insights in this regard.

The Prompt:

Generate a horizontal stacked bar chart summarizing the risk of bias for each treatment comparison in the clinical trials dataset. Each bar should represent one unique treatment comparison, with segments showing the percentages of studies classified as high risk (red), low risk (green), and unclear risk (yellow) for bias. Use columns 't1', 't2', 't3' for treatments and 'RoB' for risk of bias (H=high, L=low, U=unclear). Calculate the percentages based on the total number of studies for each comparison. Please also label each segment of each bar with the percentage, rounded to one decimal place.
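
The percentages requested in this prompt can be checked by hand on a small subset. The sketch below is illustrative only: the toy studies and the function name rob_percentages are hypothetical, and it assumes an unused third arm is stored as None.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Toy studies (hypothetical), using the column names from the prompt.
studies = [
    {"study": "Trial 1", "t1": "A", "t2": "B", "t3": None, "RoB": "L"},
    {"study": "Trial 2", "t1": "A", "t2": "B", "t3": None, "RoB": "H"},
    {"study": "Trial 3", "t1": "A", "t2": "B", "t3": None, "RoB": "L"},
    {"study": "Trial 4", "t1": "A", "t2": "C", "t3": None, "RoB": "U"},
    {"study": "Trial 5", "t1": "A", "t2": "B", "t3": "C", "RoB": "H"},
]


def rob_percentages(studies):
    """Percentage of studies at high/low/unclear risk per comparison."""
    counts = defaultdict(Counter)
    for s in studies:
        arms = sorted(t for t in (s["t1"], s["t2"], s["t3"]) if t is not None)
        for a, b in combinations(arms, 2):
            counts[f"{a} vs {b}"][s["RoB"]] += 1
    table = {}
    for label, c in counts.items():
        total = sum(c.values())  # number of studies in this comparison
        table[label] = {lvl: round(100 * c[lvl] / total, 1) for lvl in "HLU"}
    return table


table = rob_percentages(studies)
for label, row in table.items():
    print(label, row)
```

Comparisons informed by a single study will show 100% in one category, so the number of studies behind each bar should be read alongside the percentages.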

Variations in the proportion of studies at high risk of bias across comparisons may inform decisions about sensitivity analyses within the systematic review and warrant discussion by the review team.

Step 5: Investigate Outcome Event Rates By Treatment and Comparison

The Task:

Exploring variability in event rates across studies for each treatment within the evidence network can be informative toward judgements about between-trial heterogeneity. For example, variability in event rates may be a marker for differences between trials in outcome definitions, study populations, settings, and/or co-interventions.

The Prompt:

Please assess the comparisons made within each of the trials in the dataset using columns t1, t2 and t3. Tell me how many treatments are in the network and list their names. Also tell me the numbers of trials, patients and events that are associated with each treatment in the network. Please calculate the risk of the event in each treatment group of each trial internally. Please add to the table above the minimum risk, median risk and maximum risk observed across trials for each treatment. Output the summary of trials, patients, events and risk levels to a table containing one row per treatment; for any decimal values, round to two decimal places.
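
To make the requested table concrete, here is a small sketch of the per-treatment summary. It assumes the wide-format columns (t1 to t3, r1 to r3, n1 to n3) have already been reshaped into one record per trial arm; the toy arms and the function name summarise_by_treatment are hypothetical.

```python
from collections import defaultdict
from statistics import median

# Toy arms (hypothetical): (study, treatment, events, patients).
arms = [
    ("Trial 1", "A", 10, 100), ("Trial 1", "B", 5, 100),
    ("Trial 2", "A", 30, 150), ("Trial 2", "C", 15, 150),
    ("Trial 3", "A", 4, 80), ("Trial 3", "B", 12, 80),
]


def summarise_by_treatment(arms):
    """One row per treatment: trials, patients, events and risk range."""
    by_treatment = defaultdict(list)
    for _study, treatment, events, patients in arms:
        by_treatment[treatment].append((events, patients))
    table = {}
    for treatment, cells in sorted(by_treatment.items()):
        risks = [r / n for r, n in cells]  # arm-level risk within each trial
        table[treatment] = {
            "trials": len(cells),
            "patients": sum(n for _, n in cells),
            "events": sum(r for r, _ in cells),
            "min_risk": round(min(risks), 2),
            "median_risk": round(median(risks), 2),
            "max_risk": round(max(risks), 2),
        }
    return table


table = summarise_by_treatment(arms)
for treatment, row in table.items():
    print(treatment, row)
```

In this toy example the risk for treatment A ranges from 0.05 to 0.20 across trials, which is the kind of spread that would prompt a closer look at outcome definitions and populations.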

The takeaways: we gain understanding of the different amounts of evidence available for each intervention. Differences in the min/max risk for several interventions suggest it is worthwhile to ensure similarity of outcomes, populations, & settings.

Bonus prompt:

It can also be informative to examine the extent of evidence within each treatment comparison in the network. This can be done with the following prompt: “Next please summarize the numbers of patients (n1, n2, n3), events (r1, r2, r3), and number of studies underlying each pairwise comparison of treatments in the data. Also summarize the proportion of all patients experiencing an outcome event within each pairwise comparison. Please use the data to generate a table summarizing this information, rounding to two decimal places where necessary.”

Step 6: Exploration of Control Group Event Rates

The Task:

When dealing with binary outcomes, investigation of the event rates within trials can be helpful. Identification of trials associated with single or all zero cells can identify the potential need to exclude certain trials from analyses. Detection of the presence of significant variation in the event rate for the control group in an evidence network (e.g., placebo or treatment as usual) can also help identify situations where a meta-regression analysis accounting for control group risk may be needed. This is known to be especially important in some clinical areas such as psoriasis and rheumatoid arthritis.

The prompt:

I wish to create a bar plot that allows me to examine the risk of the outcome across trials in treatment group A. For all studies with t1 having a value of "A", please calculate the risk in the A group by computing it as r1/n1. Please then generate a bar plot of the risk in the A group restricting to only studies where t1 has a value of A. Please calculate a weighted average of this risk across this set of studies, and add a red dashed line to the bar plot to indicate this value. Slightly shrink the font size to enhance readability on the x axis. Finally, please order the studies in decreasing order of risk.
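
The calculation behind this plot is short enough to verify by hand. A minimal sketch follows, with hypothetical toy data and a hypothetical function name, and with the assumption that the weighted average uses each study's control-arm sample size (n1) as the weight. Under that assumption the weighted average is simply total events divided by total patients in the control arms.

```python
# Toy studies (hypothetical), using the column names from the prompt.
studies = [
    {"study": "Trial 1", "t1": "A", "r1": 10, "n1": 100},
    {"study": "Trial 2", "t1": "A", "r1": 45, "n1": 150},
    {"study": "Trial 3", "t1": "B", "r1": 9, "n1": 90},  # excluded: t1 is not "A"
    {"study": "Trial 4", "t1": "A", "r1": 5, "n1": 250},
]


def control_group_risks(studies, control="A"):
    """Risk (r1/n1) per study with t1 == control, sorted from high to low."""
    rows = [
        {"study": s["study"], "risk": s["r1"] / s["n1"], "n": s["n1"]}
        for s in studies
        if s["t1"] == control
    ]
    rows.sort(key=lambda row: row["risk"], reverse=True)
    # Sample-size-weighted average of the study-level risks (assumption).
    weighted = sum(r["risk"] * r["n"] for r in rows) / sum(r["n"] for r in rows)
    return rows, weighted


rows, weighted_risk = control_group_risks(studies)
print([r["study"] for r in rows], round(weighted_risk, 2))
```

Note that the prompt only picks up studies where treatment A appears in the t1 column, so it relies on the control arm always being entered first.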

Takeaways: we can see from this plot that there is considerable variability in event rates within the control group. This suggests that a review of outcome definitions, measurement approaches, populations and settings, and possibly a network meta-regression adjustment for control group risk, could be worthwhile.

Bonus prompt: It can be of interest to identify studies associated with one or multiple zero cells, which can require special handling during analysis. The following prompt can identify these studies: “Tell me any studies that have a value of 0 for either r1, r2 or r3. Please generate a list of these studies that includes the values for ‘study’, t1-t3, r1-r3 and n1-n3.”
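
One detail worth checking in the response is that an empty third arm in a two-arm trial is not mistaken for a zero cell. A small sketch of the distinction, using hypothetical toy data and a hypothetical function name, and assuming unused arms are stored as None:

```python
# Toy studies (hypothetical); t3/r3/n3 are None for two-arm trials.
studies = [
    {"study": "Trial 1", "t1": "A", "t2": "B", "t3": None,
     "r1": 0, "r2": 3, "r3": None, "n1": 50, "n2": 50, "n3": None},
    {"study": "Trial 2", "t1": "A", "t2": "C", "t3": None,
     "r1": 4, "r2": 6, "r3": None, "n1": 60, "n2": 60, "n3": None},
    {"study": "Trial 3", "t1": "A", "t2": "B", "t3": "C",
     "r1": 0, "r2": 0, "r3": 0, "n1": 40, "n2": 40, "n3": 40},
]


def zero_cell_studies(studies):
    """Flag studies with zero events in some arms or in all arms."""
    flagged = []
    for s in studies:
        # Only look at arms that exist; a missing arm is not a zero cell.
        events = [s[f"r{k}"] for k in (1, 2, 3) if s[f"t{k}"] is not None]
        zeros = sum(1 for r in events if r == 0)
        if zeros:
            kind = "all-zero" if zeros == len(events) else "some-zero"
            flagged.append((s["study"], kind))
    return flagged


flagged = zero_cell_studies(studies)
print(flagged)
```

Separating the two cases is useful because studies with zero events in every arm are the ones most often considered for exclusion, as noted in the task description above.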

What Else Might be Helpful and Intriguing for Exploration?

This article presents some general approaches to using LLMs to explore an NMA data set; however, a variety of additional strategies may be useful. It’s worth exploring these and other ideas to see if they might be suitable for your workflow.

Review of eligibility criteria, and patient/study traits

LLMs can perform well in reviewing tables of information and identifying/synthesizing nuances based on user-provided prompts. Their ability to review user-provided evidence tables and identify variations in patient eligibility criteria, baseline patient measures and study design information is also likely to be helpful. This may be an additional helpful 'trick' to improve the way we assess the key assumptions that underlie NMA.

Development of plots within subgroups

The principles applied above also work well to develop similar plots within subgroups of interest (e.g., surgery type within the example used above). Looking deeper into the data can enhance decision-making about the analytic plan to be carried out for an NMA of interest.

Developing code for additional statistical explorations

LLMs like ChatGPT are also very helpful for developing code (for example, in R) for tasks such as pairwise meta-analyses, which explore the ‘direct’ evidence within the NMA, and assessments of the consistency of direct and indirect evidence. Both can provide insights beyond those shared above and can play valuable roles when deciding upon the appropriateness of performing and presenting an NMA.

And of course, always use prompts to refine output...

As I’ve found in using LLMs and prompting for many different purposes, it can often be helpful to provide additional instruction when you're not entirely pleased with the initial output (e.g., coloring within plots, font sizing, and other such details). All such guidance can be useful for optimizing your outputs from the above prompts. Having LLMs do most of the heavy lifting makes life a little easier!

Wrap-up

I'd love to hear from others exploring the use of LLMs in different ways to explore the conduct of NMAs and other ITC approaches. Don't hesitate to reach out!

Kristian Thorlund

5 months ago

Brilliant article, Brian! Very soon, everyone will be able to do SLRs and NMAs in just a few minutes.

Mohd.Kashif Siddiqui

5 months ago

Thank you for sharing this insightful post, Brian. I completely agree with the potential of AI and LLMs in transforming tasks like screening of abstracts and titles, full text screening and to some extent data extractions for systematic reviews. There are a few developments in indirect treatment comparisons (ITC) and HE-modelling space too. At our end, we've developed internal tools that are highly efficient, allowing us to run processes like network meta-analyses and survival extrapolations with just a single click. In a matter of minutes, we get all the outputs needed without worrying about the programming behind it. Though these are not still utilizing any AI/ML algorithms. In fact, we’re already incorporating AI into our tools to make them even faster and more robust, taking things to the next level. Looking forward to seeing how this space evolves and to learn more from others in the field!

Stephen Brown

5 months ago

Great read Brian! It is also very timely as I was just having this conversation today.
