Network Update #62
Statistical Programming
PharmaSUG 2024 - the #BestPaperAward for #AdvancedProgramming went to David Bosak, Archytas Clinical Solutions, for his paper on the procs package. David demonstrates how this R package encapsulates commonly used statistical procedures, such as PROC FREQ, PROC MEANS, PROC TTEST, and PROC REG. The package also replicates PROC TRANSPOSE, PROC SORT, and some of PROC PRINT. Read the full paper in the conference proceedings: Link
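For readers curious what this looks like in practice, here is a minimal sketch based on the function names described in the paper (proc_freq(), proc_means(), proc_sort()); exact arguments may differ, so check the package documentation.

```r
# Minimal sketch of the procs package (argument names assumed from the paper;
# see the package docs for exact signatures)
library(procs)

data(mtcars)

# Frequency counts, in the spirit of PROC FREQ
freq_res <- proc_freq(mtcars, tables = "cyl")

# Summary statistics, in the spirit of PROC MEANS
means_res <- proc_means(mtcars, var = c("mpg", "hp"),
                        stats = c("n", "mean", "std", "min", "max"))

# Sorting, in the spirit of PROC SORT
sorted <- proc_sort(mtcars, by = "mpg")
```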
Also from David Bosak: "Dear SAS and R friends, here is a short video on data libraries in R. This concept is implemented in the libr package and was inspired by the LIBNAME statement in SAS. The package allows you to load all the datasets in a directory in one line of code. It supports several file formats, such as sas7bdat, xlsx, and xpt. There are also functions to perform basic data management tasks, for example copying a library, adding data to a library, and exporting to a different file format!" Link
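A minimal sketch of the libr workflow described in the video (the directory path and dataset names below are hypothetical):

```r
# Minimal sketch of a data library in R with the libr package
library(libr)

# Create a data library from every dataset in a directory (one line)
libname(dat, "./data/sdtm", engine = "sas7bdat")

# Load the library's datasets into the workspace and work with them;
# they become available as <library>.<dataset>
lib_load(dat)
summary(dat.dm)      # "dm" is a hypothetical dataset name
lib_unload(dat)

# The package also provides helpers such as lib_copy() and lib_export()
# for copying a library or writing it out in a different file format
```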
Edoardo Mancini - {admiral} 1.1.1 is now out on CRAN!
Take a look at what awaits you in this new release of the package, including:
- Improved error messaging across the admiral family
- Improved "Visits and Period" vignettes
- Overhauled "Get started" pages
- New functionality to use ISO 3166 country codes
- ... much, much more!
Full blog post here: Link, where we also share updates from all the other members of the admiral family... including our two most recent additions, {admiralpeds} and {admiralmetabolic}!
Justin Bélair - #R #tip - Ever get stuck waiting for a long block of code to run? I've got the trick ;) I've been working on a massive health records dataset. One of the steps in my analysis takes about 2 hours to run - how do I know? I use the tic() and toc() functions to time it: the timer starts on tic() and stops on toc(). Pretty neat! (A short timing sketch follows below this item.)
Check the post to see why Justin uses the beepr::beep() function! (something to do with the Legend of Zelda).
Original post here - Link
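A minimal sketch of the timing trick, with an audible alert at the end; the long-running analysis step is stood in for by Sys.sleep():

```r
# Time a long-running step with tictoc, then play a sound with beepr
library(tictoc)
library(beepr)

tic("model fit")     # start the timer
Sys.sleep(2)         # stand-in for the long-running analysis step
toc()                # stop the timer and print the elapsed time

beep()               # play a sound so you know the job is done
# beep() accepts several built-in sounds (see ?beep) -- pick your favourite
```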
Hengwei Liu - I drafted a short paper about some basic data wrangling steps in R that are needed to create tables in clinical trials, such as keeping, dropping, and renaming variables, transposing, merging, left joins, cross joins, subsetting, sorting, etc. If you have just started learning to use R to create tables, this paper can be helpful. Link
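As a flavour of the kinds of steps the paper covers, here is a minimal sketch using dplyr/tidyr (dataset and variable names are hypothetical, not taken from the paper):

```r
# Typical table-building wrangling steps in R (illustrative data)
library(dplyr)
library(tidyr)

adsl <- data.frame(USUBJID = c("01", "02", "03"),
                   ARM = c("A", "B", "A"),
                   AGE = c(54, 61, 47))
advs <- data.frame(USUBJID = c("01", "01", "02"),
                   PARAMCD = c("SYSBP", "DIABP", "SYSBP"),
                   AVAL = c(120, 80, 135))

result <- adsl %>%
  select(-AGE) %>%                      # drop a variable
  rename(TRT01A = ARM) %>%              # rename a variable
  left_join(advs, by = "USUBJID") %>%   # left join / merge
  filter(PARAMCD == "SYSBP") %>%        # subset rows
  arrange(USUBJID) %>%                  # sort
  pivot_wider(names_from = PARAMCD,     # transpose long -> wide
              values_from = AVAL)
```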
Stu Sztukowski - #SASTipTuesday: Creating a sequential list of macro variables from a dataset in #SAS is as easy as a single #SQL clause: "into :macvar1-". That's it! SAS handles the rest for you.
Sometimes you need to create a sequential list of macro variables out of rows of a dataset. For example: &macvar1, &macvar2, &macvar3. You don't need to write any complex multi-step queries or macro logic to do this. Here are a few very simple examples of my two favorite ways to create these using SQL and the DATA Step. Full article - Link
Biostatistics
Kuo Meng Lin - I recently saw a post on our corporate Teams channel requesting support for Tendril plots. After doing some research, I found the concept of this plot to be quite intriguing and worth exploring.
The tendril plot algorithm was inspired by an impressive piece of work by Stefaner et al., whose goal was to visualize the flow of discussions over time about whether to keep or delete articles on Wikipedia.
What Question Does the Plot Address? The Tendril plot, invented by Martin Karpefors and James Weatherall, is a method to simultaneously represent the relative significance of risks and the temporal pattern of Adverse Events (AEs) in a study.
Check this link to learn more about Tendril plots - Link
Robert Rachford writes about estimands and comments on a very interesting paper - "Liraglutide 3.0 mg and Intensive Behavioral Therapy (IBT) for Obesity in Primary Care: The SCALE IBT Randomized Controlled Trial". In this article, the researchers looked at different scientific questions about weight management. And the best part? Different scientific questions mean multiple estimands, so this paper is chock-full of examples to help our understanding of what estimands look like in practice. To estimate the intervention effect of the study drug, the researchers implemented the treatment policy estimand. This estimand evaluated the effect of the study drug vs placebo at week 56 for all randomized individuals, regardless of premature discontinuation of trial product (recall that the treatment policy strategy is the most basic "nothing done" strategy for dealing with ICEs). See the full article - Link
Robert Rachford also wrote about a very important part of growing your career as a biostatistician - experiencing different study designs. Please read this post to learn more about the factorial design: A factorial study is a clinical trial design used to evaluate the effects of two or more independent variables (factors) on a dependent variable. In a factorial design, all possible combinations of the levels of the factors are investigated, allowing us to examine not only the individual effects of each factor but also any interactions between the factors. Link
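A minimal sketch of what the analysis of a 2 x 2 factorial design can look like in R, using simulated data (not taken from the post):

```r
# Simulate and analyse a 2 x 2 factorial design
set.seed(123)

n <- 40                                            # subjects per cell
design <- expand.grid(factorA = c("low", "high"),
                      factorB = c("placebo", "active"))
df <- design[rep(1:4, each = n), ]

# Outcome with two main effects and an interaction
df$y <- 5 +
  2.0 * (df$factorA == "high") +
  1.5 * (df$factorB == "active") +
  1.0 * (df$factorA == "high") * (df$factorB == "active") +
  rnorm(nrow(df))

# factorA * factorB fits both main effects and their interaction
fit <- aov(y ~ factorA * factorB, data = df)
summary(fit)
```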
Estimating the variance of covariate-adjusted estimators of average treatment effects in clinical trials with binary endpoints. Mark Baillie, Alexander Przybylski, Craig Wang, and Dominic Magirr released a paper (together with simulation code and an R package) on variance estimation for covariate-adjusted estimators of average treatment effects in clinical trials with binary endpoints. "We hope this work provides much-needed clarity on a fundamental issue in the primary analysis of clinical trials and enables straightforward and robust implementation." Link
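As a generic illustration of the estimator under discussion (a sketch, not the authors' package or simulation code), a covariate-adjusted risk difference can be obtained by standardization from a logistic working model, here with a simple bootstrap as one possible variance estimator:

```r
# Covariate-adjusted risk difference via standardization (g-computation);
# illustrative only, not the authors' implementation
set.seed(1)
n   <- 400
x   <- rnorm(n)                                  # baseline covariate
trt <- rbinom(n, 1, 0.5)                         # 1:1 randomization
y   <- rbinom(n, 1, plogis(-0.5 + 0.8 * trt + 0.6 * x))
dat <- data.frame(y, trt, x)

adj_rd <- function(d) {
  fit <- glm(y ~ trt + x, family = binomial, data = d)
  p1  <- mean(predict(fit, transform(d, trt = 1), type = "response"))
  p0  <- mean(predict(fit, transform(d, trt = 0), type = "response"))
  p1 - p0                                        # marginal risk difference
}

est <- adj_rd(dat)

# Simple bootstrap standard error, one of several possible variance options
boot <- replicate(500, adj_rd(dat[sample(n, replace = TRUE), ]))
c(estimate = est, se = sd(boot))
```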
Josie Hayes PhD writes about a predictive biomarker and explains how to correctly establish such a biomarker. Most importantly, you need to compare it to a standard therapy. Read more in her original post - Link
Ryan Batten, PhD(c) - Regression output can sometimes be challenging to interpret; however, you can write the equation out and plot a few points to get a basic understanding of how it behaves. Link
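A minimal sketch of the tip in R, writing out a fitted equation and plotting a few predicted points (model and variables chosen purely for illustration):

```r
# Write the fitted equation out, then plot a few points to see how it behaves
fit <- lm(mpg ~ wt + hp, data = mtcars)
coef(fit)                      # intercept and slopes of the fitted equation

# Predicted mpg across a grid of weights, holding hp at its median
grid <- data.frame(wt = seq(1.5, 5.5, by = 0.5),
                   hp = median(mtcars$hp))
grid$pred <- predict(fit, newdata = grid)

plot(pred ~ wt, data = grid, type = "b",
     xlab = "Weight (1000 lbs)", ylab = "Predicted mpg")
```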
Kaspar Rufibach - writes about non-proportional hazards in drug development. Last week I was teaching at a summer school of the IBS German and Austrian-Swiss Region in Strobl at the beautiful Wolfgangsee. The theme of the summer school was “Time-to-Event Analysis”. Thanks again to the organizers for inviting me! One of the topics I discussed was non-proportional hazards (NPH) in drug development. - Part 1 Part 2
Kaspar Rufibach, Marcel Wolbers, Ray Lin, yi liu, Godwin Yung, PhD - Balancing events, not patients, maximizes power of the logrank test: and other insights on unequal randomization in survival trials. We revisit the question of what randomization ratio (RR) maximizes power of the logrank test in event-driven survival trials under proportional hazards (PH). By comparing three approximations of the logrank test (Schoenfeld, Freedman, Rubinstein) to empirical simulations, we find that the RR that maximizes power is the RR that balances number of events across treatment arms at the end of the trial. This contradicts the common misconception implied by Schoenfeld's approximation that 1:1 randomization maximizes power. We perform simulations to better understand how unequal randomization might impact these factors in practice. Altogether, we derive 6 insights to guide statisticians in the design of survival trials considering unequal randomization. Link
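As a back-of-the-envelope companion to the paper (a sketch, not the authors' code), the Schoenfeld approximation they revisit expresses logrank power in terms of the total number of events and the allocation proportion:

```r
# Schoenfeld approximation for logrank power under proportional hazards
schoenfeld_power <- function(d, hr, ratio, alpha = 0.05) {
  # d:     total number of events
  # hr:    hazard ratio under the alternative
  # ratio: allocation ratio experimental:control (1 for 1:1, 2 for 2:1, ...)
  pi1 <- ratio / (1 + ratio)               # allocation proportion
  pnorm(sqrt(d * pi1 * (1 - pi1)) * abs(log(hr)) - qnorm(1 - alpha / 2))
}

# Under this approximation power is maximized at 1:1 allocation -- the very
# misconception the paper revisits with better approximations and simulation
sapply(c(1, 1.5, 2, 3),
       function(r) schoenfeld_power(d = 300, hr = 0.75, ratio = r))
```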
Assessing the performance of methods for central statistical monitoring of a binary or continuous outcome in multi-center trials: A simulation study, by Zhongkai (Kai) Wang, Ph.D. et al. Quality study monitoring is fundamental to patient safety and data integrity. Regulators and industry consortia have increasingly advocated for risk-based monitoring (RBM) and central statistical monitoring (CSM) for more effective and efficient monitoring. Our evaluation explored the merits and drawbacks of multiple CSM methods, and found that relying on sensitivity and specificity alone is likely insufficient to fully measure predictive performance. The finite mixture method demonstrated more consistent performance across scenarios by mitigating the influence of outliers. In practice, it is important to weigh the study-specific costs of false positives/negatives against the resources available for monitoring. Link
Zhimao Weng, Xun Zhang, Yingqiu Li - The score-goldilocks design for phase 3 clinical trials. In this paper, we propose a new Bayesian adaptive design, the score-goldilocks design, which has the same algorithmic idea as the goldilocks design. The score-goldilocks design leads to a uniform formula for calculating the probability of trial success for different endpoint trials by using the normal approximation. The simulation results show that the score-goldilocks design is not only very similar to the goldilocks design in terms of operating characteristics such as type I error, power, average sample size, probability of stopping for futility, and probability of early stopping for success, but also greatly reduces computation time and improves efficiency. Link
Designing a Bayesian adaptive clinical trial to evaluate novel mechanical ventilation strategies in acute respiratory failure using integrated nested Laplace approximations. Adaptive trials usually require simulations to determine values for design parameters, demonstrate error rates, and establish the sample size. We designed a Bayesian adaptive trial comparing ventilation strategies for patients with acute hypoxemic respiratory failure using simulations. The complexity of the analysis would usually require computationally expensive Markov Chain Monte Carlo methods, but this barrier to simulation was overcome using the Integrated Nested Laplace Approximations (INLA) algorithm to provide fast, approximate Bayesian inference. Link (A toy INLA call is sketched after the author list below.)
Reyhaneh Hosseini, Ziming(Jocelyn) Chen, Ewan Goligher, Eddy Fan, Niall Ferguson, Michael Harhay, Sarina Sahetya, Martin Urner, Christopher Yarnell, Anna Heath
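A toy example of a Bayesian logistic regression fitted with INLA, vastly simpler than the trial's actual model and assuming the R-INLA package is installed from its own repository (see https://www.r-inla.org):

```r
# Fast, approximate Bayesian inference with INLA (toy example, simulated data)
library(INLA)   # R-INLA is distributed via its own repository, not CRAN

set.seed(42)
df <- data.frame(trt = rbinom(200, 1, 0.5))
df$y <- rbinom(200, 1, plogis(-0.3 + 0.5 * df$trt))

fit <- inla(y ~ trt, family = "binomial", Ntrials = 1, data = df)
summary(fit)    # approximate posterior summaries, no MCMC required
```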
Why you should avoid using multiple Fine–Gray models: insights from (attempts at) simulating proportional subdistribution hazards data. Studies considering competing risks will often aim to estimate the cumulative incidence functions conditional on an individual’s baseline characteristics. While the Fine–Gray subdistribution hazard model is tailor-made for analysing only one of the competing events, it may still be used in settings where multiple competing events are of scientific interest, where it is specified for each cause in turn. In this work, we provide an overview of data-generating mechanisms where proportional subdistribution hazards hold for at least one cause. We use these to motivate why the use of multiple Fine–Gray models should be avoided in favour of better alternatives such as cause-specific hazard models. Link
Edouard Bonneville, Liesbeth de Wreede, Hein Putter - Journal of the Royal Statistical Society Series A: Statistics in Society
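A minimal sketch contrasting the two modelling approaches discussed above with the survival package (simulated data and hypothetical variable names, not the authors' code):

```r
# Cause-specific Cox models vs a single Fine-Gray model for one cause
library(survival)

set.seed(11)
df <- data.frame(
  time   = rexp(300),
  status = sample(0:2, 300, replace = TRUE, prob = c(0.3, 0.4, 0.3)),
  age    = rnorm(300, 60, 10)
)
df$event <- factor(df$status, 0:2, labels = c("censor", "relapse", "death"))

# Cause-specific hazard models: one Cox model per cause,
# treating the competing event as censoring
cs_relapse <- coxph(Surv(time, status == 1) ~ age, data = df)
cs_death   <- coxph(Surv(time, status == 2) ~ age, data = df)

# Fine-Gray subdistribution hazard model for a single cause of interest,
# via finegray() weighting; the paper cautions against fitting one of these
# for every competing cause
fg_data    <- finegray(Surv(time, event) ~ age, data = df, etype = "relapse")
fg_relapse <- coxph(Surv(fgstart, fgstop, fgstatus) ~ age,
                    weights = fgwt, data = fg_data)
```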
Jonathan Bartlett et al. - Fitting a Fine-Gray model for competing risks but facing the problem of missing values in some covariates? Check out "Multiple imputation of missing covariates when using the Fine-Gray model", Edouard Bonneville's new paper exploring this question, and a short blog post from Jonathan - Link
The Fine-Gray model for the subdistribution hazard is commonly used for estimating associations between covariates and competing risks outcomes. When there are missing values in the covariates included in a given model, researchers may wish to multiply impute them. Assuming interest lies in estimating the risk of only one of the competing events, this paper develops a substantive-model-compatible multiple imputation approach that exploits the parallels between the Fine-Gray model and the standard (single-event) Cox model.
Peter Austin - Multiple imputation with competing risk outcomes. In time-to-event analyses, a competing risk is an event whose occurrence precludes the occurrence of the event of interest. Settings with competing risks occur frequently in clinical research. Missing data, which is a common problem in research, occurs when the value of a variable is recorded for some, but not all, records in the dataset. Multiple Imputation (MI) is a popular method to address the presence of missing data. MI uses an imputation model to generate M (M > 1) values for each variable that is missing, resulting in the creation of M complete datasets. A popular algorithm for imputing missing data is multivariate imputation using chained equations (MICE). We used a complex simulation design with covariates and missing data patterns reflective of patients hospitalized with acute myocardial infarction (AMI) to compare three strategies for imputing missing predictor variables when the analysis model is a cause-specific hazard model and there are three different event types. Link
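A minimal sketch of the general MI-then-analyse workflow with mice and a cause-specific Cox model (much simplified relative to the imputation strategies the paper compares):

```r
# Multiple imputation, cause-specific Cox analysis, and Rubin's rules pooling
library(mice)
library(survival)

set.seed(99)
df <- data.frame(
  time   = rexp(300),
  status = sample(0:2, 300, replace = TRUE),   # 0 censored, 1 & 2 competing events
  age    = rnorm(300, 65, 10),
  sysbp  = rnorm(300, 130, 15)
)
df$sysbp[sample(300, 60)] <- NA                # introduce missingness

imp  <- mice(df, m = 5, printFlag = FALSE)     # chained-equations imputation
fits <- with(imp, coxph(Surv(time, status == 1) ~ age + sysbp))
summary(pool(fits))                            # pool estimates across imputations
```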
Real World Evidence
Alexandros S. - comments on the paper "Examining the Use of Real World Evidence in the Regulatory Process":
- The authors discuss three types of studies used for comparative evidence: a) #virtual comparative effectiveness studies; b) single-arm studies using #historical #secondary_data as control; c) studies using a #synthetic_arm to pair with an uncontrolled arm.
I very much enjoyed reading this paper, which reminds me that serving the RWE space should not make us over-defend its use, but rather acknowledge its limitations and jointly work with regulators on developing new and more efficient solutions.
Please check this Link to see the full comment.
Yoshita Paliwal, PhD - The E-value is a relatively new concept in observational research that is still in the early stages of being explored and applied. Sharing below two articles presenting contradictory views on the E-value and its interpretation. See the post to learn more about the E-value - Link
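For reference, the point-estimate E-value of VanderWeele and Ding has a simple closed form: E = RR + sqrt(RR × (RR − 1)) for a risk ratio RR ≥ 1 (invert RR first when RR < 1). A minimal sketch:

```r
# Point-estimate E-value (VanderWeele & Ding)
evalue_point <- function(rr) {
  rr <- ifelse(rr < 1, 1 / rr, rr)   # work on the scale away from the null
  rr + sqrt(rr * (rr - 1))
}

evalue_point(1.8)   # = 3.0: an unmeasured confounder would need RR ~ 3 with
                    # both exposure and outcome to explain away an observed 1.8
```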
Events & Webinars
Hands On Clinical Reporting Using R has now launched on Coursera! After years of building the end-to-end open-source pipeline for clinical insights generation and sharing our tools publicly with the community, we are now launching this course for free to introduce all these tools and concepts to people who'd like to learn more about clinical study reporting in the pharmaceutical industry. Link
Sunil Gupta - presenting on "Time to Explore Pharmaverse: What Does R Have to Offer?" R offers SAS statistical programmers an opportunity to expand their programming skill set by applying the trending R language and Pharmaverse packages to traditional clinical study programming tasks. Instead of developing siloed SAS macros, smarter organizations leverage Pharmaverse packages for common programming tasks. This webinar will explore several Pharmaverse resources, including tidyCDISC from R-Guru.com/pharma, that can directly streamline the clinical workflow process. Link
July PHUSE Wednesday Webinar on July 31 at 7 am PT!
Causal Machine Learning for Biomarker Subgroup Discovery in Randomised Trials. Paul Newcombe at the London School of Hygiene and Tropical Medicine, University of London, and online, 16th July. The speaker will describe three causal machine learning methods for responder subgroup detection: the "Modified Covariate Lasso", "Causal Forests", and the "X-Learner". He will compare and assess their performance in a modest simulation study motivated by real biomarker trial datasets being generated within GlaxoSmithKline. He will then share some early (gene-anonymised) results from an ongoing application of these methods to detect and predict responder subgroups from transcriptomic data measured in two Phase 3 Lupus trials. The speaker will close with a discussion of the benefits and limitations he found with existing methods in this space. Link
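A minimal sketch of one of the three methods (Causal Forests) using the grf package, on simulated data rather than the trial data discussed in the talk:

```r
# Causal forest for heterogeneous treatment effect / responder detection
library(grf)

set.seed(7)
n <- 1000
X <- matrix(rnorm(n * 5), n, 5)         # biomarkers / covariates
W <- rbinom(n, 1, 0.5)                  # randomised treatment assignment
tau <- 1 * (X[, 1] > 0)                 # true effect only when biomarker 1 > 0
Y <- X[, 2] + W * tau + rnorm(n)

cf <- causal_forest(X, Y, W)
tau_hat <- predict(cf)$predictions      # individual treatment effect estimates
average_treatment_effect(cf)            # overall ATE with standard error
```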
Boston Area SAS Users Group (BASUG) - "Validating User-Submitted Data Files with Base SAS," with Michael Raithel from Westat on July 17.
This paper presents a rigorous methodology for validating user-submitted data sets using Base SAS. Readers can use this methodology and the SAS code examples to set up their own data QC regimen. Link
Ready to dive deeper into the world of basket trials? Join Cytel's upcoming webinar on August 1, 2024, at 9 AM ET / 3 PM CET to explore cutting-edge advancements in basket trials for oncology drug development! Learn how Bayesian methods can enhance trial precision and efficiency. Key Topics: