The Advantages of Using R for Mixed Effects Models with Longitudinal Clinical Data
MAHESH DIVAKARAN
Clinical research frequently uses longitudinal data: records gathered repeatedly from the same individuals over time. In a randomized trial, for example, patients might undergo multiple assessments of an outcome variable after receiving an intervention or a placebo. Analyzing longitudinal data can be difficult because of the intricate dependencies between repeated measurements and the possibility of missing data from dropout or attrition.
Mixed effects models, also called linear mixed models or hierarchical linear models, are a common and effective method for analyzing longitudinal data. These models account for both the fixed effects of explanatory variables (such as treatment group, time, or baseline covariates) and the random effects of individual subjects (such as their baseline level or their rate of change over time). Mixed effects models can also handle missing data under the missing-at-random assumption, which means that the probability of missingness may depend on observed data but not on the unobserved outcome values.
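In the simplest case described above, a random-intercept-and-slope model for subject i at time t_ij can be written as follows (a standard textbook formulation, not specific to this post):

```latex
y_{ij} = \underbrace{\beta_0 + \beta_1 t_{ij}}_{\text{fixed effects}}
       + \underbrace{b_{0i} + b_{1i} t_{ij}}_{\text{random effects}}
       + \varepsilon_{ij},
\qquad
\begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix} \sim N(\mathbf{0}, \Sigma),
\qquad
\varepsilon_{ij} \sim N(0, \sigma^2)
```

Here the betas are population-level (fixed) effects shared by all subjects, while b_0i and b_1i are subject-specific deviations in baseline level and rate of change, drawn from a common distribution.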
In clinical research, analyzing longitudinal data therefore requires methods that handle both the dependencies between repeated assessments and missing observations, and mixed effects models are one successful approach. Many statistical software programs can fit mixed effects models, but this post argues that R is a particularly good choice for the task.
Advantages of using R for mixed effects models:
Many statistical software packages can fit mixed effects models, but R has several advantages over other options: a rich set of packages (such as lme4 and nlme) covering linear and nonlinear mixed models, concise and user-friendly model syntax, strong data manipulation, visualization, and reporting capabilities, and a free, open-source license backed by an active community. For example, the core lme4 syntax for a model with a random intercept and slope looks like this:
library(lme4)
model <- lmer(y ~ time + (time | subject), data = data)
Examples of fitting mixed effects models in R
Let's now look at some examples of how to fit mixed effects models in R using different packages and techniques.
The lme4 package in R is one of the most popular packages for fitting linear mixed models, which are models that assume normally distributed random effects and a linear relationship between the outcome variable and the fixed effects. To illustrate how to use this package, let's consider a hypothetical example of a clinical trial investigating the effect of a new drug on blood pressure over time.
First, we will load the required packages and simulate some data:
library(lme4)
set.seed(123)
n <- 100 # number of patients
t <- 4 # number of time points
id <- rep(1:n, each=t) # patient ID
time <- rep(1:t, n) # time
group <- rep(c("drug", "placebo"), each=n*t/2) # treatment group
bp <- rnorm(n*t, mean = 120 + 10*(group == "drug") + 2*time, sd = 5) # blood pressure
data <- data.frame(id, time, group, bp)
This code creates a dataset with 100 patients, each with blood pressure measurements at four time points. Half of the patients receive the new drug, and half receive a placebo. The blood pressure measurements are normally distributed with a mean that depends on the treatment group and time, and a standard deviation of 5.
Next, we can fit a linear mixed model using the lme4 package with a random intercept and slope for each patient and a fixed effect of treatment group and time:
model <- lmer(bp ~ group*time + (1+time|id), data=data)
summary(model)
The formula bp ~ group*time + (1+time|id) specifies that the outcome variable bp is modeled as a function of the fixed effects of treatment group, time, and their interaction, plus random effects for each patient (id). The (1+time|id) notation indicates a random intercept and a random slope for time per patient. The summary of the model output shows the estimated coefficients, standard errors, and t-values for each fixed effect, along with the variance components for the random effects. (Note that lme4 does not report p-values by default; loading the lmerTest package adds them to the summary.)
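Beyond summary(), lme4 provides accessor functions for the individual pieces of the fit. A quick sketch, assuming the simulated data and the model fitted above:

```r
library(lme4)

# population-level estimates: intercept, group effect, time effect, interaction
fixef(model)

# subject-specific deviations: random intercept and slope for each patient
head(ranef(model)$id)

# estimated variance-covariance of the random effects and residual SD
VarCorr(model)

# confidence intervals for the fixed effects (Wald approximation is fastest)
confint(model, method = "Wald")
```

These accessors are convenient when building tables or plots of the treatment effect, since they return plain named vectors and data frames rather than printed output.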
We can also use the predict function to obtain the predicted blood pressure values for each patient at each time point:
newdata <- expand.grid(id=unique(data$id), time=1:t, group=unique(data$group))
newdata$bp_pred <- predict(model, newdata)
This code creates a new dataset with all possible combinations of patient IDs, time points, and treatment groups, and uses the predict function to calculate the predicted blood pressure value for each combination from the fitted model. Note that crossing every patient with both treatment groups produces counterfactual predictions for the arm a patient was not assigned to; to predict only for the observed design, simply call predict(model) on the original data.
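By default, predict() for an lmer fit includes each patient's estimated random effects. To obtain population-level predictions based on the fixed effects only, for example when plotting the average trajectory per arm, the re.form argument can be set to NA. A sketch, assuming the model and simulated data from above:

```r
library(lme4)

# population-level trajectory for each arm, ignoring subject-specific effects
pop_grid <- expand.grid(time = 1:4, group = c("drug", "placebo"))
pop_grid$bp_pred <- predict(model, newdata = pop_grid, re.form = NA)
pop_grid
```

This is often what a trial report needs: the expected outcome for a typical patient in each arm at each visit, rather than predictions conditional on a particular patient's random effects.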
Using nlme package for nonlinear mixed models
The nlme package allows us to fit a wide range of nonlinear mixed effects models, including models with nonlinear functions of the predictors, models with non-constant variances, and models with correlated random effects. To fit a nonlinear mixed effects model using nlme, we specify the model structure with a model formula plus fixed, random, and start arguments, in a spirit similar to the lme4 package.
Here's an example of how we can fit a nonlinear mixed effects model to the blood pressure data using the nlme package:
# Load nlme package
library(nlme)
# Fit the nonlinear mixed effects model
nls_model <- nlme(bp ~ a * (1 - exp(-b * time)) + c,
                  data = data,
                  fixed = list(a ~ group, b ~ 1, c ~ 1),
                  random = list(id = pdLogChol(a + b ~ 1)),
                  start = c(100, 0, 0.1, 120), # a intercept, a group effect, b, c
                  na.action = na.exclude)
# Print model summary
summary(nls_model)
This model uses a nonlinear function of time in which the asymptote a depends on the treatment group, while the rate parameter b and the offset c are shared across groups. The random effects give each patient (id) a subject-specific asymptote and rate, which are allowed to be correlated. We also supply starting values for all four fixed-effect parameters, which nonlinear fitting requires.
The pdLogChol and pdDiag functions specify the variance-covariance structure of the random effects: pdLogChol parameterizes a general positive-definite matrix through its Cholesky factorization (allowing correlated random effects), while pdDiag specifies a diagonal matrix (independent random effects).
The na.action = na.exclude argument handles missing data: observations with missing values are excluded from the fit, but predictions and residuals are padded back to the original data length, keeping them aligned with the input rows.
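The effect of na.exclude is easiest to see with the linear model from earlier in the post. A sketch that deliberately introduces missing outcomes into the simulated data and refits with lmer (the nlme behavior is analogous):

```r
library(lme4)

# assume `data` is the simulated dataset from earlier in the post
data_miss <- data
set.seed(456)
data_miss$bp[sample(nrow(data_miss), 20)] <- NA  # knock out 20 outcomes

model_miss <- lmer(bp ~ group*time + (1+time|id),
                   data = data_miss, na.action = na.exclude)

# with na.exclude, predictions are padded with NA at the missing rows,
# so their length still matches the original data
preds <- predict(model_miss)
length(preds) == nrow(data_miss)
sum(is.na(preds))
```

With na.omit instead, the missing rows would simply be dropped and the prediction vector would be 20 elements shorter, which makes merging results back onto the source data error-prone.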
Finally, we print the summary of the model to see the estimated fixed and random effects, as well as other model diagnostics.
Overall, the nlme package provides a powerful tool for fitting nonlinear mixed effects models to longitudinal data. By specifying the appropriate model structure and starting values, we can capture the complex relationships between repeated assessments and the presence of missing data.
Conclusion:
In conclusion, longitudinal data analysis is a vital aspect of clinical research that requires sophisticated modeling techniques to account for complex dependencies and missing data. Mixed effects models, also known as linear mixed models or hierarchical linear models, are a common and effective approach for analyzing continuous data that involves both fixed and random effects. R, a powerful and flexible statistical software package, is an excellent choice for fitting mixed-effects models due to its rich set of packages, user-friendly syntax, data manipulation, visualization, reporting capabilities, and active community.
In this blog post, we have shown how to use R and two popular packages, lme4 and nlme, to fit linear and nonlinear mixed effects models on a simulated example dataset. We hope this post has provided a useful introduction to the topic and encourages readers to explore mixed effects models further in their own research.
Comment from a reader (Clinical Trials Biostatistician at 2KMM, a 100% R-based CRO):

MAHESH DIVAKARAN Oh, mixed models. I haven't used them for about a decade for longitudinal studies, replacing them entirely with MMRM (GLS and GEE). In R it's nlme::gls() or, recently, the mmrm package plus emmeans. In SAS it's the "REPEATED" clause rather than "RANDOM". The biggest problem was that I couldn't properly mimic even simple compound symmetry (it won't work for negative correlations, which I faced a number of times, but mixed models zero them out, badly biasing the results) or unstructured covariance, not to mention other structures; these models also impose additional assumptions on the random effects. Also, for non-Gaussian conditional responses, GLMM doesn't give us the marginal results necessary in RCTs (GEE does), or one has to integrate over the random effects; the equivalence in interpretation holds only for the LMM. I have already counted 4 rejections from statistical reviewers who expected the covariance to be modelled exclusively via the residuals with certain structures. Sadly, lme4 cannot do this. I do use GLMM, however, "implicitly" during MICE imputation for longitudinal studies with binary outcomes.
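The MMRM-style approach mentioned in the comment, where the within-patient covariance is modelled entirely through the residuals with no random effects, can be sketched with nlme::gls() on the simulated data from the post. corSymm and varIdent are standard nlme arguments giving an unstructured correlation with visit-specific variances; the dataset itself is assumed from earlier:

```r
library(nlme)

# MMRM-style GLS fit: no random effects; the within-patient covariance
# is modelled through the residuals instead
gls_model <- gls(bp ~ group * time,
                 data = data,
                 # unstructured correlation across the 4 visits within a patient
                 correlation = corSymm(form = ~ time | id),
                 # allow a different residual variance at each visit
                 weights = varIdent(form = ~ 1 | time),
                 na.action = na.exclude)
summary(gls_model)
```

Unlike the random-effects formulations above, this places no distributional assumptions on subject-specific effects and can accommodate negative within-patient correlations, which is the limitation the comment describes.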