Exemplar project/tutorial. Sample size calculation for multi-arm Clinical trials using R

Darko Medin

Data Scientist and a Biostatistician. Developer of ML/AI models. Researcher in the fields of Biology and Clinical Research. Helping companies with Digital products, Artificial intelligence, Machine Learning.

Published: November 30, 2022

One of the most important aspects of any Clinical study is determining the sample size the study needs in order to detect the effect of interest. This is critical in multi-arm and multi-stage Clinical trials, where sample size calculation is a bit more complex. However, in this exemplar tutorial I will try to simplify the process using R [1] and the packages 'MAMS' [2] and 'multiarm' [3]. For this tutorial we will use the RStudio IDE [4].

So let's start:

The function we will be using, mams(), is very straightforward: we only need to set its arguments.

Here are some syntax rules to learn first.

K - Use this argument to set the number of experimental arms (note that these do not include any control arms). So if we set K to 3, the actual number of arms in the study is at least 4, because we need at least one control arm.
J - With J we can set the number of stages (let's keep it at 1 stage for now).
alpha - Use this to set the level of statistical significance. The general standard in the academic world is 0.05, which means all values of p<0.05 will be considered significant, so the sample size calculation is based on the 0.05 threshold. The lower we set the threshold, the larger the required sample size will be, and vice versa.
delta - This is the effect of interest we want to identify. Note that this is defined in communication with Medical experts, but there are also statistical scales for what counts as a small, medium or large effect. The smaller the effect of interest we are looking for, the larger the required sample will be.
delta0 - The effect that we consider not interesting. This is very important for the algorithm to differentiate between uninteresting and interesting effects.
sd - Sample size also depends on the standard deviation of the outcome. Keep in mind that the larger the standard deviation, the larger the variability in the data and the larger the required sample size will be.
power - This is one of the most important arguments. Setting the right power can define the success or failure of the sample size calculation. Most researchers think that 80% or 0.8 power would be enough, but this is not the case most of the time. I will initially set this argument to 90% or 0.9.
For now, just notice how I set p=NULL and p0=NULL. I will talk about this later.
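As a rough sketch of how the arguments above could fit together in one call (this is not the article's original code): K, J, alpha, power, delta and sd take the values discussed in the text, while delta0 = 0.1 and r = 1, r0 = 1 are assumptions made only for illustration. The call is skipped when 'MAMS' is not installed, and the n and N elements are the ones the text refers to.

```r
# One-stage design with 3 experimental arms; the control arm is not counted in K.
K <- 3
n_arms <- K + 1  # total number of arms, including one control arm

if (requireNamespace("MAMS", quietly = TRUE)) {
  onestage <- MAMS::mams(
    K = K, J = 1,
    alpha = 0.05, power = 0.9,
    r = 1, r0 = 1,            # assumed: single stage, equal allocation
    p = NULL, p0 = NULL,      # effect is given through delta, delta0 and sd instead
    delta = 0.44,
    delta0 = 0.1,             # assumed value, not stated in the article
    sd = 0.9
  )
  print(onestage$n)  # per-arm sample size before any dropout adjustment
  print(onestage$N)  # overall sample size before any dropout adjustment
}
```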

One of the biggest mistakes at this point would be to output the results directly. We have to account for potential uncertainty around the dropout rates. Typically the dropout rate is defined in communication with the medical experts running the Clinical trial, but let's say that for this tutorial they gave me feedback that the expected dropout rate is 10%. I typically like to add a bit more dropout on top of what the experts say, to be more certain, so I will consider the potential dropout to actually be 15%. You can observe this in the part of the code where I multiplied the output results using onsestage$n *1.15 and onsestage$N *1.15. These correspond to the sample sizes with the added 15% dropout rate.

You can see that the initial calculated sample size was 102 per arm and 408 overall (as I said, in this package K counts only the experimental arms, so with one control arm added the overall size is the per arm size * 4).

But this would not be a good estimate without the added 15% for dropout, so the final required sample size is actually 118 per arm and 472 overall. To interpret this we have to get back to the code: K was 3 and J was 1. For a one-stage design with 3 experimental arms and one control arm, an effect of 0.44 and a standard deviation of 0.9, we need at least (note the words 'at least') 118 subjects in each arm, or 472 overall, to achieve 90% Statistical power. I say at least because the sample could be larger and it would still achieve 90% power or more.
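The dropout arithmetic can be checked with a few lines of base R. This is a minimal sketch that starts from the per arm result reported in the text (102) rather than from a mams() call; the variable names are illustrative only.

```r
n_per_arm <- 102          # per-arm size reported in the text, before dropout
n_arms <- 4               # 3 experimental arms + 1 control arm
dropout_inflation <- 1.15 # 15% added for expected dropout

N_initial <- n_per_arm * n_arms                    # overall size before dropout
n_final <- ceiling(n_per_arm * dropout_inflation)  # round up to whole subjects
N_final <- n_final * n_arms                        # overall size after dropout

print(c(N_initial = N_initial, n_final = n_final, N_final = N_final))
```

Note that the overall size is obtained from the rounded per arm value, which keeps the arms equal in size.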

Now let's try to build a two-stage design with a 1:2 allocation ratio.

Before I start explaining the entire block of code, take a look at how I again set p=NULL and p0=NULL. If I didn't set these and still used the delta argument, I would receive an error telling me to specify the effect in only one of the two ways.

Now to the entire block of code...

So this time K=4 and J=2, for a 4-arm two-stage design. I will keep the same alpha, but for designs with multiple stages and multiple groups I prefer one of two approaches: either a high-powered calculation or a multiple comparison correction method. For this block of code I will set the power to 0.98 and complement that by lowering the effect size a bit compared to the defined one (0.44). This makes sure the sample size I calculate can compensate for some of the uncertainty around the multi-stage design. Let's see the results...
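A two-stage call along these lines might look like the sketch below. It is not the article's original code: the lowered effect size (0.40) and delta0 (0.1) are assumed values, r = 1:2 and r0 = 1:2 are assumed to correspond to the 1:2 ratio mentioned in the text, and the stopping boundaries are left at the package defaults. The call is skipped when 'MAMS' is not installed.

```r
K <- 4      # number of experimental arms, as set in the text
J <- 2      # number of stages
r <- 1:2    # assumed: ratios for the experimental arms at stages 1 and 2
r0 <- 1:2   # assumed: ratios for the control arm at stages 1 and 2

if (requireNamespace("MAMS", quietly = TRUE)) {
  twostage <- MAMS::mams(
    K = K, J = J,
    alpha = 0.05, power = 0.98,
    r = r, r0 = r0,
    p = NULL, p0 = NULL,   # effect is given through delta, delta0 and sd instead
    delta = 0.40,          # assumed: slightly below the defined effect of 0.44
    delta0 = 0.1,          # assumed value, not stated in the article
    sd = 0.9
  )
  print(twostage$n)  # stage 1 per-arm size before dropout
  print(twostage$N)  # maximum overall size before dropout
}
```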

The final sample size with added dropout rates is 152 per arm and 608 overall for stage 1, but for stage two it is actually 1320 overall once the allocation ratio is accounted for. One important detail that I used in both designs is ceiling() when calculating the final per arm sample size. This function rounds the value up to the next integer number of subjects. Do not use round() for this, because you might end up one subject short if the decimal part of the calculated value is below 0.5.
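The difference between the two rounding functions is easy to see in base R. The sketch below reuses the one-stage value from earlier in the text (102 per arm with 15% added); the variable names are illustrative.

```r
raw_n <- 102 * 1.15        # about 117.3 subjects per arm, not a whole number

n_ceiling <- ceiling(raw_n)  # always rounds up
n_round <- round(raw_n)      # rounds to nearest, so a fraction below 0.5 is dropped

print(c(ceiling = n_ceiling, round = n_round))
print(n_ceiling - n_round)   # how many subjects per arm round() would lose here
```

With round() the design would fall one subject per arm below the calculated requirement, which is why ceiling() is the safer choice.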

Next lesson - using another package, 'multiarm', and adding the multiple comparison correction. Note that to install this package we will need to use devtools and the install_github() function.

Next - creating the one-stage 7-arm design with Dunnett's multiple comparison correction. Check the code block below...

Notice how I had to add 7 sigmas even though K=6. Exactly as in the 'MAMS' package, K=6 means this is at least a 7-arm design, because we need at least 1 control group, which is not included in K. This means I have to define sigma, or standard deviation, for 6 experimental arms + 1 control arm. I will make all sigmas 0.9 and keep the other settings the same as before, but you may notice new syntax now. 'beta' is used to define the level of power, but in the reverse direction compared to 'MAMS', where we had the power argument set to 0.9 or 0.98. Here 'beta' is used to set 1-power, so if I set beta=0.1, the power will be 0.9. The power argument in 'multiarm' is used to define the type of Statistical power (e.g. Conjunctive vs Disjunctive). For this tutorial I will use Conjunctive power.

The most important part of the syntax in this case is the argument for multiple comparison correction, which is essential with 7 arms. To define the correction, simply use the 'correction' argument. I will use Dunnett's correction, which is well suited to multi-arm Clinical trials, so correction = "dunnett" will do the work.
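A call matching this description might look like the sketch below, using des_ss_norm() from 'multiarm'. It is not the article's original code: delta0 = 0.1 is an assumed value, the GitHub repository name in the comment is inferred from reference [3], and the n and N elements of the result are assumed to hold the per arm and overall sizes. The call is skipped when 'multiarm' is not installed.

```r
# One-stage design: 6 experimental arms plus 1 control arm (7 arms in total).
K <- 6
sigma <- rep(0.9, K + 1)  # one standard deviation per arm, control included

# Installation from GitHub (repository name assumed from the package website):
# devtools::install_github("mjg211/multiarm")

if (requireNamespace("multiarm", quietly = TRUE)) {
  des_dunnett <- multiarm::des_ss_norm(
    K = K,
    alpha = 0.05,
    beta = 0.1,               # 1 - power, so power is 0.9
    delta1 = 0.44,            # effect of interest
    delta0 = 0.1,             # assumed value, not stated in the article
    sigma = sigma,
    correction = "dunnett",   # multiple comparison correction
    power = "conjunctive"     # type of power
  )
  print(des_dunnett$n)  # assumed: per-arm sample sizes before dropout
  print(des_dunnett$N)  # assumed: overall sample size before dropout
}
```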

Let's finalize the calculation by adding the 15% dropout rate and view the results...

The required corrected sample size is 176 per arm and 1232 overall for the study. Now let's try another correction method - the Bonferroni multiple testing correction.

As you can see, I changed the correction argument to correction = "bonferroni". The Bonferroni correction is more conservative than Dunnett's, so we can expect a larger required sample size using this method.
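To see where the extra conservatism comes from, the Bonferroni threshold can be worked out in base R. This is only an illustration under the assumption of one-sided tests on a normal scale; it does not reproduce the package's internal calculation and does not compute Dunnett's critical value, which needs the joint distribution of the test statistics.

```r
alpha <- 0.05  # overall significance level
K <- 6         # number of comparisons against control

alpha_bonf <- alpha / K                 # per-comparison level under Bonferroni
z_unadjusted <- qnorm(1 - alpha)        # critical value without any correction
z_bonferroni <- qnorm(1 - alpha_bonf)   # stricter critical value with Bonferroni

print(c(alpha_bonf = alpha_bonf,
        z_unadjusted = z_unadjusted,
        z_bonferroni = z_bonferroni))
```

Each comparison has to clear a higher critical value, and more subjects are needed to keep the same power against that stricter threshold.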

Let's check the result...

As expected, the required sample size is larger with the Bonferroni correction, so the final result for this hypothetical one-stage 7-arm trial is 184 per arm or 1288 overall.

In this article, the principles of using different multi-arm design packages were shown in RStudio with the R programming language. In the next tutorial in the Sample size calculation series, Simulations for Sample size requirements will be the main theme. Thanks for reading!

By Darko Medin,

Biostatistician and a Data scientist

References:

1. https://www.r-project.org/

2. https://cran.r-project.org/web/packages/MAMS/MAMS.pdf

3. https://mjg211.github.io/multiarm/

4. https://posit.co/
