Code
#install.packages("remotes")
#remotes::install_github("DevPsyLab/petersenlab")#install.packages("remotes")
#remotes::install_github("DevPsyLab/petersenlab")mydata <- read.csv("https://osf.io/cqn3d/download")These models go by a variety of different terms:
It can be helpful to center the age/time variable so that the intercept in a growth curve model has meaning. For instance, we can subtract the youngest participant age to set the intercepts to be the earliest age in the sample.
mydata$ageYears <- mydata$age / 12
mydata$ageMonthsCentered <- mydata$age - min(mydata$age, na.rm = TRUE)
mydata$ageYearsCentered <- mydata$ageMonthsCentered / 12
mydata$ageYearsCenteredSquared <- mydata$ageYearsCentered ^ 2
mydata$sex <- factor(
mydata$female,
levels = c(0, 1),
labels = c("male", "female")
)For small sample sizes, restricted maximum likelihood (REML) is preferred over maximum likelihood (ML). ML preferred when there is a small number (< 4) of fixed effects; REML is preferred when there are more (> 4) fixed effects. The greater the number of fixed effects, the greater the difference between REML and ML estimates. Likelihood ratio (LR) tests for REML require exactly the same fixed effects specification in both models. So, to compare models with different fixed effects with an LR test (to determine whether to include a particular fixed effect), ML must be used. In contrast to the maximum likelihood estimation, REML can produce unbiased estimates of variance and covariance parameters, variance estimates are larger in REML than ML. To compare whether an effect should be fixed or random, use REML. To simultaneously compare fixed and random effects, use ML.
The following models are models that are fit in a linear mixed modeling framework.
ggplot(
data = mydata,
mapping = aes(
x = ageYears,
y = math,
group = id)) +
geom_line() +
labs(
x = "Age (years)",
y = "Math Score"
) +
theme_classic()lme4
linearMixedModel <- lmer(
math ~ female + ageYearsCentered + female:ageYearsCentered + (1 + ageYearsCentered | id), # random intercepts and slopes; sex as a fixed-effect predictor of the intercepts and slopes
data = mydata,
REML = FALSE, #for ML
na.action = na.exclude,
control = lmerControl(optimizer = "bobyqa"))
summary(linearMixedModel)Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
method [lmerModLmerTest]
Formula: math ~ female + ageYearsCentered + female:ageYearsCentered +
(1 + ageYearsCentered | id)
Data: mydata
Control: lmerControl(optimizer = "bobyqa")
AIC BIC logLik -2*log(L) df.resid
15857.9 15903.5 -7920.9 15841.9 2213
Scaled residuals:
Min 1Q Median 3Q Max
-3.3750 -0.5174 0.0051 0.5239 2.6396
Random effects:
Groups Name Variance Std.Dev. Corr
id (Intercept) 62.5365 7.9080
ageYearsCentered 0.6767 0.8226 0.08
Residual 32.1505 5.6701
Number of obs: 2221, groups: id, 932
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 30.51401 0.56142 752.48747 54.352 <2e-16 ***
female -0.61290 0.79482 736.39886 -0.771 0.441
ageYearsCentered 4.26792 0.11253 610.09410 37.925 <2e-16 ***
female:ageYearsCentered -0.02558 0.16092 598.89155 -0.159 0.874
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) female agYrsC
female -0.706
ageYrsCntrd -0.635 0.448
fml:gYrsCnt 0.444 -0.631 -0.699
print(effectsize::standardize_parameters(
linearMixedModel,
method = "refit"),
digits = 2)# Standardization method: refit
Parameter | Std. Coef. | 95% CI
------------------------------------------------------
(Intercept) | 1.16e-03 | [-0.05, 0.05]
female | -0.03 | [-0.08, 0.02]
ageYearsCentered | 0.61 | [ 0.59, 0.63]
female × ageYearsCentered | -1.84e-03 | [-0.02, 0.02]
newData <- expand.grid(
female = c(0, 1),
ageYears = c(
min(mydata$ageYears, na.rm = TRUE),
max(mydata$ageYears, na.rm = TRUE))
)
newData$ageYearsCentered <- newData$ageYears - min(newData$ageYears)
newData$sex <- NA
newData$sex[which(newData$female == 0)] <- "male"
newData$sex[which(newData$female == 1)] <- "female"
newData$sex <- as.factor(newData$sex)
newData$predictedValue <- predict( # predict.merMod
linearMixedModel,
newdata = newData,
re.form = NA
)
ggplot(
data = newData,
mapping = aes(
x = ageYears,
y = predictedValue,
color = sex)) +
geom_line() +
labs(
x = "Age (years)",
y = "Math Score",
color = "Sex"
) +
theme_classic()ggplot(
data = mydata,
mapping = aes(
x = ageYears,
y = predictedValue,
group = id)) +
geom_line(
color = "gray",
alpha = 0.5
) +
geom_line(
data = newData,
mapping = aes(
x = ageYears,
y = predictedValue,
group = sex,
color = sex),
linewidth = 2) +
labs(
x = "Age (years)",
y = "Math Score",
color = "Sex"
) +
theme_classic()ranef(linearMixedModel)$id
(Intercept) ageYearsCentered
201 1.125314 0.2140257
303 -12.515509 -0.6614897
2702 12.257492 0.4307630
4303 2.727958 0.2850021
5002 1.943701 0.1707896
5005 4.045983 0.1202867
with conditional variances for "id"
nlme
linearMixedModel_nlme <- lme(
math ~ female + ageYearsCentered + female:ageYearsCentered, # sex as a fixed-effect predictor of the intercepts and slopes
random = ~ 1 + ageYearsCentered|id, # random intercepts and slopes
data = mydata,
method = "ML",
na.action = na.exclude)
summary(linearMixedModel_nlme)Linear mixed-effects model fit by maximum likelihood
Data: mydata
AIC BIC logLik
15857.85 15903.5 -7920.926
Random effects:
Formula: ~1 + ageYearsCentered | id
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 7.9079987 (Intr)
ageYearsCentered 0.8225933 0.082
Residual 5.6701380
Fixed effects: math ~ female + ageYearsCentered + female:ageYearsCentered
Value Std.Error DF t-value p-value
(Intercept) 30.514011 0.5619217 1287 54.30296 0.0000
female -0.612896 0.7955333 930 -0.77042 0.4412
ageYearsCentered 4.267923 0.1126360 1287 37.89130 0.0000
female:ageYearsCentered -0.025585 0.1610671 1287 -0.15885 0.8738
Correlation:
(Intr) female agYrsC
female -0.706
ageYearsCentered -0.635 0.448
female:ageYearsCentered 0.444 -0.631 -0.699
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-3.375034869 -0.517409797 0.005105047 0.523910718 2.639557775
Number of Observations: 2221
Number of Groups: 932
print(effectsize::standardize_parameters(
linearMixedModel_nlme,
method = "refit"),
digits = 2)# Standardization method: refit
Parameter | Std. Coef. | 95% CI
------------------------------------------------------
(Intercept) | 1.16e-03 | [-0.05, 0.05]
female | -0.03 | [-0.08, 0.02]
ageYearsCentered | 0.61 | [ 0.59, 0.63]
female × ageYearsCentered | -1.84e-03 | [-0.02, 0.02]
Adapted from Usami & Murayama (2018):
timepointSpecificErrorsMixedModel <- lmer(
math ~ female + ageYearsCentered + female:ageYearsCentered + (1 | id) + (1 | ageYearsCentered), # timepoint-specific errors: observations are cross-classified with person and timepoint; sex as a fixed-effect predictor of the intercepts and slopes
data = mydata,
REML = FALSE, #for ML
na.action = na.exclude,
control = lmerControl(optimizer = "bobyqa"))
summary(timepointSpecificErrorsMixedModel)Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
method [lmerModLmerTest]
Formula: math ~ female + ageYearsCentered + female:ageYearsCentered +
(1 | id) + (1 | ageYearsCentered)
Data: mydata
Control: lmerControl(optimizer = "bobyqa")
AIC BIC logLik -2*log(L) df.resid
15808.6 15848.5 -7897.3 15794.6 2214
Scaled residuals:
Min 1Q Median 3Q Max
-3.5409 -0.5191 0.0051 0.5093 2.7345
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 76.219 8.730
ageYearsCentered (Intercept) 4.503 2.122
Residual 30.842 5.554
Number of obs: 2221, groups: id, 932; ageYearsCentered, 94
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 29.87438 0.75758 211.81838 39.434 <2e-16 ***
female -0.38123 0.82243 1960.43213 -0.464 0.643
ageYearsCentered 4.26419 0.14813 132.95739 28.786 <2e-16 ***
female:ageYearsCentered -0.08143 0.14596 1364.56012 -0.558 0.577
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) female agYrsC
female -0.545
ageYrsCntrd -0.758 0.321
fml:gYrsCnt 0.356 -0.652 -0.482
print(effectsize::standardize_parameters(
timepointSpecificErrorsMixedModel,
method = "refit"),
digits = 2)# Standardization method: refit
Parameter | Std. Coef. | 95% CI
-------------------------------------------------------
(Intercept) | -1.28 | [-1.38, -1.19]
female | -0.01 | [-0.08, 0.05]
ageYearsCentered | 0.33 | [ 0.31, 0.35]
female × ageYearsCentered | -3.18e-03 | [-0.01, 0.01]
When using higher-order polynomials, we could specify contrast codes for time to reduce multicollinearity between the linear and quadratic growth factors: https://tdjorgensen.github.io/SEM-in-Ed-compendium/ch27.html#saturated-growth-model
1 2
[1,] -0.6708204 0.5
[2,] -0.2236068 -0.5
[3,] 0.2236068 -0.5
[4,] 0.6708204 0.5
attr(,"coefs")
attr(,"coefs")$alpha
[1] 1.5 1.5
attr(,"coefs")$norm2
[1] 1 4 5 4
attr(,"degree")
[1] 1 2
attr(,"class")
[1] "poly" "matrix"
linearLoadings <- factorLoadings[,1]
quadraticLoadings <- factorLoadings[,2]
linearLoadings[1] -0.6708204 -0.2236068 0.2236068 0.6708204
quadraticLoadings[1] 0.5 -0.5 -0.5 0.5
quadraticGCM <- lmer(
math ~ female + ageYearsCentered + ageYearsCenteredSquared + female:ageYearsCentered + female:ageYearsCenteredSquared + (1 + ageYearsCentered | id), # random intercepts and linear slopes; fixed quadratic slopes; sex as a fixed-effect predictor of the intercepts and slopes
data = mydata,
REML = FALSE, #for ML
na.action = na.exclude,
control = lmerControl(optimizer = "bobyqa"))
summary(quadraticGCM)Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
method [lmerModLmerTest]
Formula: math ~ female + ageYearsCentered + ageYearsCenteredSquared +
female:ageYearsCentered + female:ageYearsCenteredSquared +
(1 + ageYearsCentered | id)
Data: mydata
Control: lmerControl(optimizer = "bobyqa")
AIC BIC logLik -2*log(L) df.resid
15666.9 15724.0 -7823.5 15646.9 2211
Scaled residuals:
Min 1Q Median 3Q Max
-3.3660 -0.4945 0.0048 0.5085 2.4377
Random effects:
Groups Name Variance Std.Dev. Corr
id (Intercept) 69.8860 8.3598
ageYearsCentered 0.7099 0.8426 -0.05
Residual 27.1959 5.2150
Number of obs: 2221, groups: id, 932
Fixed effects:
Estimate Std. Error df t value
(Intercept) 23.43112 0.84914 1380.85816 27.594
female 0.61167 1.18734 1370.91791 0.515
ageYearsCentered 8.79334 0.42555 1381.21301 20.664
ageYearsCenteredSquared -0.57623 0.05255 1382.10819 -10.966
female:ageYearsCentered -0.62379 0.60301 1386.34429 -1.034
female:ageYearsCenteredSquared 0.05700 0.07582 1393.17373 0.752
Pr(>|t|)
(Intercept) <2e-16 ***
female 0.607
ageYearsCentered <2e-16 ***
ageYearsCenteredSquared <2e-16 ***
female:ageYearsCentered 0.301
female:ageYearsCenteredSquared 0.452
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) female agYrsC agYrCS fml:YC
female -0.715
ageYrsCntrd -0.836 0.598
agYrsCntrdS 0.759 -0.543 -0.969
fml:gYrsCnt 0.590 -0.829 -0.706 0.683
fml:gYrsCnS -0.526 0.751 0.671 -0.693 -0.968
print(effectsize::standardize_parameters(
quadraticGCM,
method = "refit"),
digits = 2)# Standardization method: refit
Parameter | Std. Coef. | 95% CI
--------------------------------------------------------------
(Intercept) | -0.01 | [-0.06, 0.04]
female | -0.03 | [-0.08, 0.02]
ageYearsCentered | 1.22 | [ 1.13, 1.30]
ageYearsCenteredSquared | -0.63 | [-0.72, -0.55]
female × ageYearsCentered | -0.04 | [-0.13, 0.04]
female × ageYearsCenteredSquared | 0.03 | [-0.05, 0.12]
This is equivalent to:
quadraticGCM <- lmer(
math ~ female + ageYearsCentered + I(ageYearsCentered^2) + female:ageYearsCentered + female:I(ageYearsCentered^2) + (1 + ageYearsCentered | id), # random intercepts and slopes; sex as a fixed-effect predictor of the intercepts and slopes
data = mydata,
REML = FALSE, #for ML
na.action = na.exclude,
control = lmerControl(optimizer = "bobyqa"))
summary(quadraticGCM)Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
method [lmerModLmerTest]
Formula:
math ~ female + ageYearsCentered + I(ageYearsCentered^2) + female:ageYearsCentered +
female:I(ageYearsCentered^2) + (1 + ageYearsCentered | id)
Data: mydata
Control: lmerControl(optimizer = "bobyqa")
AIC BIC logLik -2*log(L) df.resid
15666.9 15724.0 -7823.5 15646.9 2211
Scaled residuals:
Min 1Q Median 3Q Max
-3.3660 -0.4945 0.0048 0.5085 2.4377
Random effects:
Groups Name Variance Std.Dev. Corr
id (Intercept) 69.8860 8.3598
ageYearsCentered 0.7099 0.8426 -0.05
Residual 27.1959 5.2150
Number of obs: 2221, groups: id, 932
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 23.43112 0.84914 1380.85816 27.594 <2e-16
female 0.61167 1.18734 1370.91791 0.515 0.607
ageYearsCentered 8.79334 0.42555 1381.21301 20.664 <2e-16
I(ageYearsCentered^2) -0.57623 0.05255 1382.10819 -10.966 <2e-16
female:ageYearsCentered -0.62379 0.60301 1386.34429 -1.034 0.301
female:I(ageYearsCentered^2) 0.05700 0.07582 1393.17373 0.752 0.452
(Intercept) ***
female
ageYearsCentered ***
I(ageYearsCentered^2) ***
female:ageYearsCentered
female:I(ageYearsCentered^2)
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) female agYrsC I(YC^2 fml:YC
female -0.715
ageYrsCntrd -0.836 0.598
I(gYrsCn^2) 0.759 -0.543 -0.969
fml:gYrsCnt 0.590 -0.829 -0.706 0.683
fml:I(YC^2) -0.526 0.751 0.671 -0.693 -0.968
print(effectsize::standardize_parameters(
quadraticGCM,
method = "refit"),
digits = 2)# Standardization method: refit
Parameter | Std. Coef. | 95% CI
---------------------------------------------------------
(Intercept) | 0.13 | [ 0.08, 0.18]
female | -0.04 | [-0.09, 0.01]
ageYearsCentered | 0.63 | [ 0.61, 0.65]
ageYearsCentered^2 | -0.14 | [-0.16, -0.13]
female × ageYearsCentered | -0.01 | [-0.04, 0.01]
female × ageYearsCentered^2 | 7.53e-03 | [-0.01, 0.03]
newData <- expand.grid(
female = c(0, 1),
ageYears = seq(from = min(mydata$ageYears, na.rm = TRUE), to = max(mydata$ageYears, na.rm = TRUE), length.out = 10000))
newData$ageYearsCentered <- newData$ageYears - min(newData$ageYears)
newData$ageYearsCenteredSquared <- newData$ageYearsCentered ^ 2
newData$sex <- NA
newData$sex[which(newData$female == 0)] <- "male"
newData$sex[which(newData$female == 1)] <- "female"
newData$sex <- as.factor(newData$sex)
newData$predictedValue <- predict( # predict.merMod
quadraticGCM,
newdata = newData,
re.form = NA
)
ggplot(
data = newData,
mapping = aes(
x = ageYears,
y = predictedValue,
color = sex)) +
geom_line() +
labs(
x = "Age (years)",
y = "Math Score",
color = "Sex"
) +
theme_classic()mydata$predictedValue <- predict(
quadraticGCM,
newdata = mydata,
re.form = NULL
)
ggplot(
data = mydata,
mapping = aes(
x = ageYears,
y = predictedValue,
group = id,
color = sex)) +
geom_line(
stat = "smooth",
method = "lm",
formula = y ~ x + I(x^2),
se = FALSE,
linewidth = 0.5,
alpha = 0.4
) +
labs(
x = "Age (years)",
y = "Math Score",
color = "Sex"
) +
theme_classic()ggplot(
data = mydata,
mapping = aes(
x = ageYears,
y = predictedValue,
group = id)) +
geom_line(
stat = "smooth",
method = "lm",
formula = y ~ x + I(x^2),
se = FALSE,
linewidth = 0.5,
color = "gray",
alpha = 0.4
) +
geom_line(
data = newData,
mapping = aes(
x = ageYears,
y = predictedValue,
group = sex,
color = sex),
linewidth = 2) +
labs(
x = "Age (years)",
y = "Math Score",
color = "Sex"
) +
theme_classic()ranef(quadraticGCM)$id
(Intercept) ageYearsCentered
201 0.3392871 0.2048916
303 -14.2395541 -0.5404362
2702 13.6789236 0.6053891
4303 1.8792650 0.2622509
5002 1.6186752 0.3766872
5005 5.0034217 0.1103062
with conditional variances for "id"
splineGCM <- lmer(
math ~ female + ageYearsCentered + female:ageYearsCentered + knot + knot:ageYearsCentered + female:knot + female:knot:ageYearsCentered + (1 + ageYearsCentered | id), # random intercepts and linear slopes; fixed quadratic slopes; sex as a fixed-effect predictor of the intercepts and slopes
data = mydata,
REML = FALSE, #for ML
na.action = na.exclude,
control = lmerControl(optimizer = "bobyqa"))
summary(splineGCM)Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
method [lmerModLmerTest]
Formula: math ~ female + ageYearsCentered + female:ageYearsCentered +
knot + knot:ageYearsCentered + female:knot + female:knot:ageYearsCentered +
(1 + ageYearsCentered | id)
Data: mydata
Control: lmerControl(optimizer = "bobyqa")
AIC BIC logLik -2*log(L) df.resid
15691.2 15759.6 -7833.6 15667.2 2209
Scaled residuals:
Min 1Q Median 3Q Max
-3.3041 -0.5021 -0.0075 0.5034 2.4850
Random effects:
Groups Name Variance Std.Dev. Corr
id (Intercept) 68.3568 8.2678
ageYearsCentered 0.7018 0.8377 -0.03
Residual 27.7772 5.2704
Number of obs: 2221, groups: id, 932
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 25.9318 0.7222 1223.0139 35.909 < 2e-16
female 0.3473 1.0060 1208.6807 0.345 0.730
ageYearsCentered 6.2092 0.2328 1374.3269 26.674 < 2e-16
knot 12.6515 1.8238 1241.0685 6.937 6.43e-12
female:ageYearsCentered -0.3785 0.3240 1373.1043 -1.168 0.243
ageYearsCentered:knot -3.4908 0.3746 1283.6771 -9.318 < 2e-16
female:knot -1.2105 2.6098 1234.2550 -0.464 0.643
female:ageYearsCentered:knot 0.3530 0.5383 1279.7386 0.656 0.512
(Intercept) ***
female
ageYearsCentered ***
knot ***
female:ageYearsCentered
ageYearsCentered:knot ***
female:knot
female:ageYearsCentered:knot
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) female agYrsC knot fml:YC agYrC: fml:kn
female -0.718
ageYrsCntrd -0.786 0.564
knot -0.267 0.192 0.266
fml:gYrsCnt 0.565 -0.776 -0.718 -0.191
agYrsCntrd: 0.470 -0.337 -0.568 -0.925 0.408
female:knot 0.187 -0.259 -0.186 -0.699 0.262 0.646
fml:gYrsCn: -0.327 0.455 0.395 0.644 -0.559 -0.696 -0.927
print(effectsize::standardize_parameters(
splineGCM,
method = "refit"),
digits = 2)# Standardization method: refit
Parameter | Std. Coef. | 95% CI
----------------------------------------------------------------
(Intercept) | 0.19 | [ 0.13, 0.24]
female | -0.04 | [-0.10, 0.02]
ageYearsCentered | 0.67 | [ 0.63, 0.71]
knot | -0.01 | [-0.05, 0.03]
female × ageYearsCentered | -0.02 | [-0.06, 0.02]
ageYearsCentered × knot | -0.23 | [-0.27, -0.20]
female × knot | 2.12e-03 | [-0.04, 0.04]
(female × ageYearsCentered) × knot | 0.01 | [-0.02, 0.05]
newData <- expand.grid(
female = c(0, 1),
ageYears = seq(from = min(mydata$ageYears, na.rm = TRUE), to = max(mydata$ageYears, na.rm = TRUE), length.out = 10000))
newData$ageYearsCentered <- newData$ageYears - min(newData$ageYears)
newData$knot <- NA
newData$knot[which(newData$ageYears <= 11)] <- 0
newData$knot[which(newData$ageYears > 11)] <- 1
newData$sex <- NA
newData$sex[which(newData$female == 0)] <- "male"
newData$sex[which(newData$female == 1)] <- "female"
newData$sex <- as.factor(newData$sex)
newData$predictedValue <- predict( # predict.merMod
splineGCM,
newdata = newData,
re.form = NA
)
ggplot(
data = newData,
mapping = aes(
x = ageYears,
y = predictedValue,
color = sex)) +
geom_line() +
labs(
x = "Age (years)",
y = "Math Score",
color = "Sex"
) +
theme_classic()ggplot(
data = mydata,
mapping = aes(
x = ageYears,
y = predictedValue,
group = id)) +
geom_line(
color = "gray",
alpha = 0.5
) +
geom_line(
data = newData,
mapping = aes(
x = ageYears,
y = predictedValue,
group = sex,
color = sex),
linewidth = 2) +
labs(
x = "Age (years)",
y = "Math Score",
color = "Sex"
) +
theme_classic()ranef(splineGCM)$id
(Intercept) ageYearsCentered
201 0.652145 0.2083235
303 -13.900383 -0.5384101
2702 13.248702 0.4885446
4303 2.123618 0.2695865
5002 1.813505 0.3311535
5005 4.914571 0.1306295
with conditional variances for "id"
https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html (archived at https://perma.cc/9RFS-BCE7; source code: https://github.com/bbolker/mixedmodels-misc/blob/master/glmmFAQ.rmd)
lmer
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: poisson ( log )
Formula: outcome ~ female + ageYearsCentered + (ageYearsCentered | id)
Data: mydata
AIC BIC logLik -2*log(L) df.resid
9329.7 9363.9 -4658.8 9317.7 2215
Scaled residuals:
Min 1Q Median 3Q Max
-2.0331 -0.5546 -0.0205 0.5350 5.0222
Random effects:
Groups Name Variance Std.Dev. Corr
id (Intercept) 0.0058309 0.07636
ageYearsCentered 0.0002845 0.01687 -1.00
Number of obs: 2221, groups: id, 932
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.343986 0.026879 50.001 <2e-16 ***
female 0.007439 0.021268 0.350 0.7265
ageYearsCentered 0.010279 0.005804 1.771 0.0766 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) female
female -0.421
ageYrsCntrd -0.831 0.036
optimizer (Nelder_Mead) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
print(effectsize::standardize_parameters(
generalizedLinearMixedModel,
method = "refit"),
digits = 2)# Standardization method: refit
Parameter | Std. Coef. | 95% CI
---------------------------------------------
(Intercept) | 1.39 | [ 1.37, 1.41]
female | 3.72e-03 | [-0.02, 0.02]
ageYearsCentered | 0.02 | [ 0.00, 0.04]
- Response is unstandardized.
MASS
Linear mixed-effects model fit by maximum likelihood
Data: mydata
AIC BIC logLik
NA NA NA
Random effects:
Formula: ~1 + ageYearsCentered | id
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 5.533290e-05 (Intr)
ageYearsCentered 9.320922e-08 0
Residual 1.014184e+00
Variance function:
Structure: fixed weights
Formula: ~invwt
Fixed effects: outcome ~ female + ageYearsCentered
Value Std.Error DF t-value p-value
(Intercept) 1.3453017 0.027100741 1288 49.64077 0.0000
female 0.0074130 0.021543030 930 0.34410 0.7308
ageYearsCentered 0.0100806 0.005850753 1288 1.72296 0.0851
Correlation:
(Intr) female
female -0.423
ageYearsCentered -0.829 0.036
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.00970268 -0.55110605 -0.02158036 0.53179575 4.98914420
Number of Observations: 2221
Number of Groups: 932
print(effectsize::standardize_parameters(
glmmPQLmodel,
method = "refit"),
digits = 2)# Standardization method: refit
Parameter | Std. Coef. | 95% CI
---------------------------------------------
(Intercept) | 1.39 | [ 1.37, 1.41]
female | 3.71e-03 | [-0.02, 0.02]
ageYearsCentered | 0.02 | [ 0.00, 0.04]
- Response is unstandardized.
MCMCglmm
MCMC iteration = 0
Acceptance ratio for liability set 1 = 0.000410
MCMC iteration = 1000
Acceptance ratio for liability set 1 = 0.439819
MCMC iteration = 2000
Acceptance ratio for liability set 1 = 0.439977
MCMC iteration = 3000
Acceptance ratio for liability set 1 = 0.444608
MCMC iteration = 4000
Acceptance ratio for liability set 1 = 0.496749
MCMC iteration = 5000
Acceptance ratio for liability set 1 = 0.490778
MCMC iteration = 6000
Acceptance ratio for liability set 1 = 0.512362
MCMC iteration = 7000
Acceptance ratio for liability set 1 = 0.428231
MCMC iteration = 8000
Acceptance ratio for liability set 1 = 0.412336
MCMC iteration = 9000
Acceptance ratio for liability set 1 = 0.471726
MCMC iteration = 10000
Acceptance ratio for liability set 1 = 0.428491
MCMC iteration = 11000
Acceptance ratio for liability set 1 = 0.400078
MCMC iteration = 12000
Acceptance ratio for liability set 1 = 0.346435
MCMC iteration = 13000
Acceptance ratio for liability set 1 = 0.276170
summary(MCMCglmmModel)
Iterations = 3001:12991
Thinning interval = 10
Sample size = 1000
DIC: 9323.524
G-structure: ~us(ageYearsCentered):id
post.mean l-95% CI u-95% CI eff.samp
ageYearsCentered:ageYearsCentered.id 6.754e-06 1.087e-08 4.053e-05 7.788
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 0.00877 0.001634 0.01728 7.797
Location effects: outcome ~ female + ageYearsCentered
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) 1.338707 1.281096 1.387614 55.58 <0.001 ***
female 0.007688 -0.031193 0.052271 58.65 0.706
ageYearsCentered 0.010561 0.000813 0.021420 57.32 0.046 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
num_cores <- parallelly::availableCores() - 1
num_true_cores <- parallelly::availableCores(logical = FALSE) - 1
Family: gaussian
Link function: identity
Formula:
math ~ s(ageYearsCentered, by = sex) + s(id, ageYearsCentered,
bs = "re")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 49.9227 0.3718 134.3 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(ageYearsCentered):sexmale 2.8831 3.613 219.7 <2e-16 ***
s(ageYearsCentered):sexfemale 6.2237 6.805 120.3 <2e-16 ***
s(id,ageYearsCentered) 0.9932 1.000 155.5 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Rank: 18/20
R-sq.(adj) = 0.396 Deviance explained = 39.9%
GCV = 99.429 Scale est. = 98.933 n = 2221
print(effectsize::standardize_parameters(
gamModel,
method = "refit"),
digits = 2)# Standardization method: refit
Component | Parameter | Std. Coef. | 95% CI
--------------------------------------------------------------------------------------
conditional | (Intercept) | -1.80e-03 | [-0.04, 0.03]
smooth_terms | Smooth term (ageYearsCentered) × sexmale | |
smooth_terms | Smooth term (ageYearsCentered) × sexfemale | |
smooth_terms | Smooth term (id,ageYearsCentered) | |
newData <- expand.grid(
female = c(0, 1),
ageYears = seq(from = min(mydata$ageYears, na.rm = TRUE), to = max(mydata$ageYears, na.rm = TRUE), length.out = 10000))
newData$ageYearsCentered <- newData$ageYears - min(newData$ageYears)
newData$ageYearsCenteredSquared <- newData$ageYearsCentered ^ 2
newData$sex <- NA
newData$sex[which(newData$female == 0)] <- "male"
newData$sex[which(newData$female == 1)] <- "female"
newData$sex <- as.factor(newData$sex)
# add a dummy id
newData$id <- mydata$id[1]
newData$predictedValue <- predict(
gamModel,
newdata = newData,
exclude = "s(ageYearsCentered,id)"
)
ggplot(
data = newData,
mapping = aes(
x = ageYears,
y = predictedValue,
color = sex)) +
geom_line() +
labs(
x = "Age (years)",
y = "Math Score",
color = "Sex"
) +
theme_classic()mydata$predictedValue <- predict(
gamModel,
newdata = mydata
)
gamMethod <- function(formula, data, weights, ...) { # from here: https://stackoverflow.com/a/76762792/2029527
if(nrow(data) < 4) return(lm(y ~ x, data = data, ...))
else mgcv::gam(y ~ s(x, bs = "cs", k = 3), data = data, ...)
}
ggplot(
data = mydata,
mapping = aes(
x = ageYears,
y = predictedValue,
group = id,
color = sex)) +
geom_line(
stat = "smooth",
method = gamMethod,
se = FALSE,
linewidth = 0.5,
alpha = 0.4
) +
labs(
x = "Age (years)",
y = "Math Score",
color = "Sex"
) +
theme_classic()ggplot(
data = mydata,
mapping = aes(
x = ageYears,
y = predictedValue,
group = id)) +
geom_line(
stat = "smooth",
method = gamMethod,
se = FALSE,
linewidth = 0.5,
color = "gray",
alpha = 0.4
) +
geom_line(
data = newData,
mapping = aes(
x = ageYears,
y = predictedValue,
group = sex,
color = sex),
linewidth = 2) +
labs(
x = "Age (years)",
y = "Math Score",
color = "Sex"
) +
theme_classic()Nonlinear mixed-effects model fit by maximum likelihood
Model: height ~ SSasymp(age, Asym, R0, lrc)
Data: Loblolly
AIC BIC logLik
239.486 251.6401 -114.743
Random effects:
Formula: Asym ~ 1 | Seed
Asym Residual
StdDev: 3.650645 0.7188624
Fixed effects: list(Asym ~ 1, R0 ~ 1, lrc ~ 1)
Value Std.Error DF t-value p-value
Asym 101.44830 2.4616151 68 41.21209 0
R0 -8.62749 0.3179519 68 -27.13459 0
lrc -3.23373 0.0342695 68 -94.36168 0
Correlation:
Asym R0
R0 0.704
lrc -0.908 -0.827
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.23604174 -0.62389999 0.05912777 0.65724316 1.95790785
Number of Observations: 84
Number of Groups: 14
To evaluate the extent to which a finding could driven by outliers, this could be done in a number of different ways, such as:
The within-group errors:
The random effects:
Pinheiro and Bates (2000) book (p. 174, section 4.3.1)
https://stats.stackexchange.com/questions/77891/checking-assumptions-lmer-lme-mixed-models-in-r (archived at https://perma.cc/J5GC-PCUT)
Make QQ plots for each level of the random effects. Vary the level from 0, 1, to 2 so that you can check the between- and within-subject residuals.
ppPlot(linearMixedModel)plot(linearMixedModel)Linear mixed-effects model fit by maximum likelihood
Data: mydata
AIC BIC logLik
15857.83 15903.48 -7920.915
Random effects:
Formula: ~1 + ageYearsCentered | id
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 7.9177716 (Intr)
ageYearsCentered 0.8278343 0.076
Residual 5.6410162
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | female
Parameter estimates:
1 0
1.000000 1.009161
Fixed effects: math ~ female + ageYearsCentered
Value Std.Error DF t-value p-value
(Intercept) 30.554856 0.5040373 1288 60.62022 0.0000
female -0.692653 0.6172485 930 -1.12216 0.2621
ageYearsCentered 4.255258 0.0805531 1288 52.82553 0.0000
Correlation:
(Intr) female
female -0.614
ageYearsCentered -0.507 0.014
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-3.37982974 -0.51663213 0.00445497 0.52228733 2.63205084
Number of Observations: 2221
Number of Groups: 932
Make QQ plots for each level of the random effects. Vary the level from 0, 1, to 2 so that you can check the between- and within-subject residuals.
varFixed: fixed variancevarIdent: different variances per stratumvarPower: power of covariatevarExp: exponential of covariatevarConstPower: constant plus power of covariatevarComb: combination of variance functionscorCompSymm: compound symmetrycorSymm: generalcorAR1: autoregressive of order 1corCAR1: continuous-time AR(1)corARMA: autoregressive-moving averagecorExp: exponentialcorGaus: GaussiancorLin: linearcorRatio: rational quadraticcorSpher: sphericalR version 4.5.2 (2025-10-31)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
time zone: UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_4.0.1 performance_0.15.3 parallelly_1.46.0 mgcv_1.9-3
[5] MCMCglmm_2.36 ape_5.8-1 coda_0.19-4.1 MASS_7.3-65
[9] lmerTest_3.1-3 nlme_3.1-168 lme4_1.1-38 Matrix_1.7-4
[13] petersenlab_1.2.2
loaded via a namespace (and not attached):
[1] tidyselect_1.2.1 psych_2.5.6 viridisLite_0.4.2
[4] dplyr_1.1.4 farver_2.1.2 S7_0.2.1
[7] fastmap_1.2.0 tensorA_0.36.2.1 bayestestR_0.17.0
[10] digest_0.6.39 rpart_4.1.24 lifecycle_1.0.4
[13] cluster_2.1.8.1 magrittr_2.0.4 compiler_4.5.2
[16] rlang_1.1.6 Hmisc_5.2-4 tools_4.5.2
[19] yaml_2.3.12 data.table_1.18.0 knitr_1.51
[22] labeling_0.4.3 htmlwidgets_1.6.4 mnormt_2.1.1
[25] plyr_1.8.9 RColorBrewer_1.1-3 withr_3.0.2
[28] foreign_0.8-90 purrr_1.2.0 numDeriv_2016.8-1.1
[31] datawizard_1.3.0 nnet_7.3-20 grid_4.5.2
[34] stats4_4.5.2 lavaan_0.6-21 xtable_1.8-4
[37] colorspace_2.1-2 scales_1.4.0 insight_1.4.4
[40] cli_3.6.5 mvtnorm_1.3-3 rmarkdown_2.30
[43] reformulas_0.4.3 generics_0.1.4 otel_0.2.0
[46] rstudioapi_0.17.1 reshape2_1.4.5 parameters_0.28.3
[49] minqa_1.2.8 DBI_1.2.3 stringr_1.6.0
[52] splines_4.5.2 parallel_4.5.2 effectsize_1.0.1
[55] base64enc_0.1-3 mitools_2.4 vctrs_0.6.5
[58] boot_1.3-32 jsonlite_2.0.0 Formula_1.2-5
[61] htmlTable_2.4.3 glue_1.8.0 nloptr_2.2.1
[64] cubature_2.1.4-1 stringi_1.8.7 gtable_0.3.6
[67] quadprog_1.5-8 tibble_3.3.0 pillar_1.11.1
[70] htmltools_0.5.9 R6_2.6.1 Rdpack_2.6.4
[73] mix_1.0-13 evaluate_1.0.5 pbivnorm_0.6.0
[76] lattice_0.22-7 rbibutils_2.4 backports_1.5.0
[79] corpcor_1.6.10 Rcpp_1.1.0 gridExtra_2.3
[82] checkmate_2.3.3 xfun_0.55 pkgconfig_2.0.3
---
title: "Hierarchical Linear Modeling"
output:
html_document:
code_folding: show
---
# Preamble
## Install Libraries
```{r}
#install.packages("remotes")
#remotes::install_github("DevPsyLab/petersenlab")
```
## Load Libraries
```{r}
library("petersenlab")
library("lme4")
library("nlme")
library("lmerTest")
library("MASS")
library("MCMCglmm")
library("mgcv")
library("parallelly")
library("performance")
library("ggplot2")
```
## Import Data
```{r}
#| eval: false
mydata <- read.csv("https://osf.io/cqn3d/download")
```
```{r}
#| include: false
mydata <- read.csv("./data/nlsy_math_long.csv") #https://osf.io/cqn3d/download
```
## Simulate Data
```{r}
set.seed(52242)
mydata$outcome <- rpois(nrow(mydata), 4)
```
# Terms
These models go by a variety of different terms:
- hierarchical linear model (HLM)
- multilevel model (MLM)
- mixed effects model
- mixed model
# Overview
- <https://isaactpetersen.github.io/Principles-Psychological-Assessment/reliability.html#sec-mixedModels>
- <https://isaactpetersen.github.io/Fantasy-Football-Analytics-Textbook/mixed-models.html>
# Pre-Model Computation
It can be helpful to center the age/time variable so that the intercept in a growth curve model has meaning.
For instance, we can subtract the youngest participant age to set the intercepts to be the earliest age in the sample.
```{r}
mydata$ageYears <- mydata$age / 12
mydata$ageMonthsCentered <- mydata$age - min(mydata$age, na.rm = TRUE)
mydata$ageYearsCentered <- mydata$ageMonthsCentered / 12
mydata$ageYearsCenteredSquared <- mydata$ageYearsCentered ^ 2
mydata$sex <- factor(
mydata$female,
levels = c(0, 1),
labels = c("male", "female")
)
```
# Estimator: ML or REML
For small sample sizes, restricted maximum likelihood (REML) is preferred over maximum likelihood (ML).
ML preferred when there is a small number (< 4) of fixed effects; REML is preferred when there are more (> 4) fixed effects.
The greater the number of fixed effects, the greater the difference between REML and ML estimates.
Likelihood ratio (LR) tests for REML require exactly the same fixed effects specification in both models.
So, to compare models with different fixed effects with an LR test (to determine whether to include a particular fixed effect), ML must be used.
In contrast to the maximum likelihood estimation, REML can produce unbiased estimates of variance and covariance parameters, variance estimates are larger in REML than ML.
To compare whether an effect should be fixed or random, use REML.
To simultaneously compare fixed and random effects, use ML.
# Linear Mixed Models {#sec-linear}
The following models are models that are fit in a linear mixed modeling framework.
## Growth Curve Models {#sec-gcm}
### Linear Growth Curve Model {#sec-linearGCM}
#### Plot Observed Growth Curves
```{r}
ggplot(
data = mydata,
mapping = aes(
x = ageYears,
y = math,
group = id)) +
geom_line() +
labs(
x = "Age (years)",
y = "Math Score"
) +
theme_classic()
```
#### `lme4`
```{r}
linearMixedModel <- lmer(
math ~ female + ageYearsCentered + female:ageYearsCentered + (1 + ageYearsCentered | id), # random intercepts and slopes; sex as a fixed-effect predictor of the intercepts and slopes
data = mydata,
REML = FALSE, #for ML
na.action = na.exclude,
control = lmerControl(optimizer = "bobyqa"))
summary(linearMixedModel)
print(effectsize::standardize_parameters(
linearMixedModel,
method = "refit"),
digits = 2)
```
##### Prototypical Growth Curve
```{r}
newData <- expand.grid(
female = c(0, 1),
ageYears = c(
min(mydata$ageYears, na.rm = TRUE),
max(mydata$ageYears, na.rm = TRUE))
)
newData$ageYearsCentered <- newData$ageYears - min(newData$ageYears)
newData$sex <- NA
newData$sex[which(newData$female == 0)] <- "male"
newData$sex[which(newData$female == 1)] <- "female"
newData$sex <- as.factor(newData$sex)
newData$predictedValue <- predict( # predict.merMod
linearMixedModel,
newdata = newData,
re.form = NA
)
ggplot(
data = newData,
mapping = aes(
x = ageYears,
y = predictedValue,
color = sex)) +
geom_line() +
labs(
x = "Age (years)",
y = "Math Score",
color = "Sex"
) +
theme_classic()
```
##### Individuals' Growth Curves
```{r}
mydata$predictedValue <- predict(
linearMixedModel,
newdata = mydata,
re.form = NULL
)
ggplot(
data = mydata,
mapping = aes(
x = ageYears,
y = predictedValue,
group = id,
color = sex)) +
geom_line(
alpha = 0.4
) +
labs(
x = "Age (years)",
y = "Math Score",
color = "Sex"
) +
theme_classic()
```
##### Individuals' Trajectories Overlaid with Prototypical Trajectory
```{r}
ggplot(
data = mydata,
mapping = aes(
x = ageYears,
y = predictedValue,
group = id)) +
geom_line(
color = "gray",
alpha = 0.5
) +
geom_line(
data = newData,
mapping = aes(
x = ageYears,
y = predictedValue,
group = sex,
color = sex),
linewidth = 2) +
labs(
x = "Age (years)",
y = "Math Score",
color = "Sex"
) +
theme_classic()
```
##### Extract Random Effects {#sec-extractRandomEffects}
```{r}
#| output: false
ranef(linearMixedModel)
```
```{r}
#| echo: false
linearMixedModel_re <- ranef(linearMixedModel)
linearMixedModel_re$id <- head(linearMixedModel_re$id)
linearMixedModel_re
```
#### `nlme`
```{r}
linearMixedModel_nlme <- lme(
math ~ female + ageYearsCentered + female:ageYearsCentered, # sex as a fixed-effect predictor of the intercepts and slopes
random = ~ 1 + ageYearsCentered|id, # random intercepts and slopes
data = mydata,
method = "ML",
na.action = na.exclude)
summary(linearMixedModel_nlme)
print(effectsize::standardize_parameters(
linearMixedModel_nlme,
method = "refit"),
digits = 2)
```
#### Intraclass Correlation Coefficent {#sec-icc}
```{r}
icc(linearMixedModel)
icc(linearMixedModel_nlme)
```
### Growth Curve Model with Timepoint-Specific Errors {#sec-timepointSpecificErrorsGCM}
Adapted from Usami & Murayama (2018):
```{r}
timepointSpecificErrorsMixedModel <- lmer(
math ~ female + ageYearsCentered + female:ageYearsCentered + (1 | id) + (1 | ageYearsCentered), # timepoint-specific errors: observations are cross-classified with person and timepoint; sex as a fixed-effect predictor of the intercepts and slopes
data = mydata,
REML = FALSE, #for ML
na.action = na.exclude,
control = lmerControl(optimizer = "bobyqa"))
summary(timepointSpecificErrorsMixedModel)
print(effectsize::standardize_parameters(
timepointSpecificErrorsMixedModel,
method = "refit"),
digits = 2)
```
### Quadratic Growth Curve Model {#sec-quadraticGCM}
When using higher-order polynomials, we could specify contrast codes for time to reduce multicollinearity between the linear and quadratic growth factors: <https://tdjorgensen.github.io/SEM-in-Ed-compendium/ch27.html#saturated-growth-model>
```{r}
factorLoadings <- poly(
x = c(0,1,2,3), # times (can allow unequal spacing)
degree = 2)
factorLoadings
linearLoadings <- factorLoadings[,1]
quadraticLoadings <- factorLoadings[,2]
linearLoadings
quadraticLoadings
```
#### Fit Model
```{r}
quadraticGCM <- lmer(
math ~ female + ageYearsCentered + ageYearsCenteredSquared + female:ageYearsCentered + female:ageYearsCenteredSquared + (1 + ageYearsCentered | id), # random intercepts and linear slopes; fixed quadratic slopes; sex as a fixed-effect predictor of the intercepts and slopes
data = mydata,
REML = FALSE, #for ML
na.action = na.exclude,
control = lmerControl(optimizer = "bobyqa"))
summary(quadraticGCM)
print(effectsize::standardize_parameters(
quadraticGCM,
method = "refit"),
digits = 2)
```
This is equivalent to:
```{r}
quadraticGCM <- lmer(
math ~ female + ageYearsCentered + I(ageYearsCentered^2) + female:ageYearsCentered + female:I(ageYearsCentered^2) + (1 + ageYearsCentered | id), # random intercepts and slopes; sex as a fixed-effect predictor of the intercepts and slopes
data = mydata,
REML = FALSE, #for ML
na.action = na.exclude,
control = lmerControl(optimizer = "bobyqa"))
summary(quadraticGCM)
print(effectsize::standardize_parameters(
quadraticGCM,
method = "refit"),
digits = 2)
```
#### Prototypical Growth Curve
```{r}
newData <- expand.grid(
female = c(0, 1),
ageYears = seq(from = min(mydata$ageYears, na.rm = TRUE), to = max(mydata$ageYears, na.rm = TRUE), length.out = 10000))
newData$ageYearsCentered <- newData$ageYears - min(newData$ageYears)
newData$ageYearsCenteredSquared <- newData$ageYearsCentered ^ 2
newData$sex <- NA
newData$sex[which(newData$female == 0)] <- "male"
newData$sex[which(newData$female == 1)] <- "female"
newData$sex <- as.factor(newData$sex)
newData$predictedValue <- predict( # predict.merMod
quadraticGCM,
newdata = newData,
re.form = NA
)
ggplot(
data = newData,
mapping = aes(
x = ageYears,
y = predictedValue,
color = sex)) +
geom_line() +
labs(
x = "Age (years)",
y = "Math Score",
color = "Sex"
) +
theme_classic()
```
#### Individuals' Growth Curves
```{r}
mydata$predictedValue <- predict(
quadraticGCM,
newdata = mydata,
re.form = NULL
)
ggplot(
data = mydata,
mapping = aes(
x = ageYears,
y = predictedValue,
group = id,
color = sex)) +
geom_line(
stat = "smooth",
method = "lm",
formula = y ~ x + I(x^2),
se = FALSE,
linewidth = 0.5,
alpha = 0.4
) +
labs(
x = "Age (years)",
y = "Math Score",
color = "Sex"
) +
theme_classic()
```
#### Individuals' Trajectories Overlaid with Prototypical Trajectory
```{r}
ggplot(
data = mydata,
mapping = aes(
x = ageYears,
y = predictedValue,
group = id)) +
geom_line(
stat = "smooth",
method = "lm",
formula = y ~ x + I(x^2),
se = FALSE,
linewidth = 0.5,
color = "gray",
alpha = 0.4
) +
geom_line(
data = newData,
mapping = aes(
x = ageYears,
y = predictedValue,
group = sex,
color = sex),
linewidth = 2) +
labs(
x = "Age (years)",
y = "Math Score",
color = "Sex"
) +
theme_classic()
```
#### Extract Random Effects
```{r}
#| output: false
ranef(quadraticGCM)
```
```{r}
#| echo: false
quadraticGCM_re <- ranef(quadraticGCM)
quadraticGCM_re$id <- head(quadraticGCM_re$id)
quadraticGCM_re
```
### Spline Growth Curve Model {#sec-splineGCM}
#### Create Knot
```{r}
mydata$knot <- NA
mydata$knot[which(mydata$ageYears <= 11)] <- 0
mydata$knot[which(mydata$ageYears > 11)] <- 1
```
#### Fit Model
```{r}
splineGCM <- lmer(
math ~ female + ageYearsCentered + female:ageYearsCentered + knot + knot:ageYearsCentered + female:knot + female:knot:ageYearsCentered + (1 + ageYearsCentered | id), # random intercepts and linear slopes; fixed quadratic slopes; sex as a fixed-effect predictor of the intercepts and slopes
data = mydata,
REML = FALSE, #for ML
na.action = na.exclude,
control = lmerControl(optimizer = "bobyqa"))
summary(splineGCM)
print(effectsize::standardize_parameters(
splineGCM,
method = "refit"),
digits = 2)
```
#### Prototypical Growth Curve
```{r}
newData <- expand.grid(
female = c(0, 1),
ageYears = seq(from = min(mydata$ageYears, na.rm = TRUE), to = max(mydata$ageYears, na.rm = TRUE), length.out = 10000))
newData$ageYearsCentered <- newData$ageYears - min(newData$ageYears)
newData$knot <- NA
newData$knot[which(newData$ageYears <= 11)] <- 0
newData$knot[which(newData$ageYears > 11)] <- 1
newData$sex <- NA
newData$sex[which(newData$female == 0)] <- "male"
newData$sex[which(newData$female == 1)] <- "female"
newData$sex <- as.factor(newData$sex)
newData$predictedValue <- predict( # predict.merMod
splineGCM,
newdata = newData,
re.form = NA
)
ggplot(
data = newData,
mapping = aes(
x = ageYears,
y = predictedValue,
color = sex)) +
geom_line() +
labs(
x = "Age (years)",
y = "Math Score",
color = "Sex"
) +
theme_classic()
```
#### Individuals' Growth Curves
```{r}
mydata$predictedValue <- predict(
splineGCM,
newdata = mydata,
re.form = NULL
)
ggplot(
data = mydata,
mapping = aes(
x = ageYears,
y = predictedValue,
group = id,
color = sex)) +
geom_line(
alpha = 0.4
) +
labs(
x = "Age (years)",
y = "Math Score",
color = "Sex"
) +
theme_classic()
```
#### Individuals' Trajectories Overlaid with Prototypical Trajectory
```{r}
ggplot(
data = mydata,
mapping = aes(
x = ageYears,
y = predictedValue,
group = id)) +
geom_line(
color = "gray",
alpha = 0.5
) +
geom_line(
data = newData,
mapping = aes(
x = ageYears,
y = predictedValue,
group = sex,
color = sex),
linewidth = 2) +
labs(
x = "Age (years)",
y = "Math Score",
color = "Sex"
) +
theme_classic()
```
#### Extract Random Effects
```{r}
#| output: false
ranef(splineGCM)
```
```{r}
#| echo: false
splineGCM_re <- ranef(splineGCM)
splineGCM_re$id <- head(splineGCM_re$id)
splineGCM_re
```
# Generalized Linear Mixed Models {#sec-generalized}
<https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html> (archived at <https://perma.cc/9RFS-BCE7>; source code: <https://github.com/bbolker/mixedmodels-misc/blob/master/glmmFAQ.rmd>)
## `lmer`
```{r}
generalizedLinearMixedModel <- glmer(
outcome ~ female + ageYearsCentered + (ageYearsCentered | id),
family = poisson(link = "log"),
data = mydata,
na.action = na.exclude)
summary(generalizedLinearMixedModel)
print(effectsize::standardize_parameters(
generalizedLinearMixedModel,
method = "refit"),
digits = 2)
```
## `MASS`
```{r}
glmmPQLmodel <- glmmPQL(
outcome ~ female + ageYearsCentered,
random = ~ 1 + ageYearsCentered|id,
family = poisson(link = "log"),
data = mydata)
summary(glmmPQLmodel)
print(effectsize::standardize_parameters(
glmmPQLmodel,
method = "refit"),
digits = 2)
```
## `MCMCglmm`
```{r}
MCMCglmmModel <- MCMCglmm(
outcome ~ female + ageYearsCentered,
random = ~ us(ageYearsCentered):id,
family = "poisson",
data = na.omit(mydata[,c("id","outcome","female","ageYearsCentered")]))
summary(MCMCglmmModel)
```
# Nonlinear Mixed Models {#sec-nonlinear}
## Generalized Additive Mixed Model {#sec-gamm}
### Fit Model
```{r}
#| label: parallel-cores-mixed-models
#| include: false
num_cores <- parallelly::availableCores()
num_true_cores <- parallelly::availableCores(logical = FALSE)
```
```{r}
#| eval: false
num_cores <- parallelly::availableCores() - 1
num_true_cores <- parallelly::availableCores(logical = FALSE) - 1
```
```{r}
gamModel <- mgcv::gam(
math ~ s(ageYearsCentered, by = sex) + s(id, ageYearsCentered, bs = "re"),
data = mydata,
nthreads = num_cores
)
summary(gamModel)
print(effectsize::standardize_parameters(
gamModel,
method = "refit"),
digits = 2)
```
### Prototypical Growth Curve
```{r}
newData <- expand.grid(
female = c(0, 1),
ageYears = seq(from = min(mydata$ageYears, na.rm = TRUE), to = max(mydata$ageYears, na.rm = TRUE), length.out = 10000))
newData$ageYearsCentered <- newData$ageYears - min(newData$ageYears)
newData$ageYearsCenteredSquared <- newData$ageYearsCentered ^ 2
newData$sex <- NA
newData$sex[which(newData$female == 0)] <- "male"
newData$sex[which(newData$female == 1)] <- "female"
newData$sex <- as.factor(newData$sex)
# add a dummy id
newData$id <- mydata$id[1]
newData$predictedValue <- predict(
gamModel,
newdata = newData,
exclude = "s(ageYearsCentered,id)"
)
ggplot(
data = newData,
mapping = aes(
x = ageYears,
y = predictedValue,
color = sex)) +
geom_line() +
labs(
x = "Age (years)",
y = "Math Score",
color = "Sex"
) +
theme_classic()
```
### Individuals' Growth Curves
```{r}
mydata$predictedValue <- predict(
gamModel,
newdata = mydata
)
gamMethod <- function(formula, data, weights, ...) { # from here: https://stackoverflow.com/a/76762792/2029527
if(nrow(data) < 4) return(lm(y ~ x, data = data, ...))
else mgcv::gam(y ~ s(x, bs = "cs", k = 3), data = data, ...)
}
ggplot(
data = mydata,
mapping = aes(
x = ageYears,
y = predictedValue,
group = id,
color = sex)) +
geom_line(
stat = "smooth",
method = gamMethod,
se = FALSE,
linewidth = 0.5,
alpha = 0.4
) +
labs(
x = "Age (years)",
y = "Math Score",
color = "Sex"
) +
theme_classic()
```
### Individuals' Trajectories Overlaid with Prototypical Trajectory
```{r}
ggplot(
data = mydata,
mapping = aes(
x = ageYears,
y = predictedValue,
group = id)) +
geom_line(
stat = "smooth",
method = gamMethod,
se = FALSE,
linewidth = 0.5,
color = "gray",
alpha = 0.4
) +
geom_line(
data = newData,
mapping = aes(
x = ageYears,
y = predictedValue,
group = sex,
color = sex),
linewidth = 2) +
labs(
x = "Age (years)",
y = "Math Score",
color = "Sex"
) +
theme_classic()
```
## Asymptotic Growth {#sec-nonlinearAsymptotic}
```{r}
nonlinearModel <- nlme(
height ~ SSasymp(age, Asym, R0, lrc),
data = Loblolly,
fixed = Asym + R0 + lrc ~ 1,
random = Asym ~ 1)
summary(nonlinearModel)
```
# Robust Mixed Models
To evaluate the extent to which a finding could driven by outliers, this could be done in a number of different ways, such as:
- identifying and excluding influential observations based on DFBETAS or Cook’s distance (Nieuwenhuis, Grotenhuis, & Pelzer, 2012)
- fitting mixed models using rank-based estimation (Bilgic & Susmann, 2013; Finch, 2017) or robust estimating equations (Koller, 2016)
- estimating robust standard errors using a sandwich estimator (Wang & Merkle, 2018)
# Assumptions
The within-group errors:
1. are independent
2. are identically normally distributed
3. have mean zero and variance sigma-squared
4. are independent of the random effects
The random effects:
5. are normally distributed
6. have mean zero and covariance matrix Psi (not depending on the group)
7. are independent for different groups
# Examining Model Assumptions
## Resources
Pinheiro and Bates (2000) book (p. 174, section 4.3.1)
<https://stats.stackexchange.com/questions/77891/checking-assumptions-lmer-lme-mixed-models-in-r> (archived at <https://perma.cc/J5GC-PCUT>)
## QQ Plots
Make QQ plots for each level of the random effects.
Vary the level from 0, 1, to 2 so that you can check the between- and within-subject residuals.
```{r}
qqnorm(
linearMixedModel_nlme,
~ ranef(., level = 1))
```
## PP Plots
```{r}
ppPlot(linearMixedModel)
```
## QQ Plot of residuals
```{r}
qqnorm(resid(linearMixedModel))
qqline(resid(linearMixedModel))
```
## Plot residuals
```{r}
plot(linearMixedModel)
```
## Plot residuals by group (in the example below, level 2 represents the individual)
```{r}
plot(
linearMixedModel,
as.factor(id) ~ resid(.),
abline = 0,
xlab = "Residuals")
```
## Plot residuals by levels of a predictor
```{r}
plot(
linearMixedModel_nlme,
resid(., type = "p") ~ fitted(.) | female) #type = "p" specifies standardized residuals
```
## Can model heteroscedasticity of the within-group error with the weights argument
```{r}
linearMixedModel_nlmeVarStructure <- lme(
math ~ female + ageYearsCentered,
random = ~ 1 + ageYearsCentered|id,
weights = varIdent(form = ~ 1 | female),
method = "ML",
data = mydata,
na.action = na.exclude)
summary(linearMixedModel_nlmeVarStructure)
```
## Plot observed and fitted values
```{r}
plot(
linearMixedModel,
math ~ fitted(.))
```
## Plot QQ plot of residuals by levels of a predictor
```{r}
qqnorm(linearMixedModel_nlme, ~ resid(.) | female)
qqnorm(linearMixedModel_nlme, ~ resid(.))
```
## QQ plot of random effects
Make QQ plots for each level of the random effects.
Vary the level from 0, 1, to 2 so that you can check the between- and within-subject residuals.
```{r}
qqnorm(
linearMixedModel_nlme,
~ ranef(., level = 0))
qqnorm(
linearMixedModel_nlme,
~ ranef(., level = 1))
qqnorm(
linearMixedModel_nlme,
~ ranef(., level = 2))
```
## QQ plot of random effects by levels of a predictor
```{r}
qqnorm(
linearMixedModel_nlme,
~ ranef(., level = 1) | female)
```
## Pairs plot
```{r}
pairs(linearMixedModel_nlme)
pairs(
linearMixedModel_nlme,
~ ranef(., level = 1) | female)
```
## Variance functions for modeling heteroscedasticity
- `varFixed`: fixed variance
- `varIdent`: different variances per stratum
- `varPower`: power of covariate
- `varExp`: exponential of covariate
- `varConstPower`: constant plus power of covariate
- `varComb`: combination of variance functions
## Correlation structures for modeling dependence
- `corCompSymm`: compound symmetry
- `corSymm`: general
- `corAR1`: autoregressive of order 1
- `corCAR1`: continuous-time AR(1)
- `corARMA`: autoregressive-moving average
- `corExp`: exponential
- `corGaus`: Gaussian
- `corLin`: linear
- `corRatio`: rational quadratic
- `corSpher`: spherical
# Power Analysis {#sec-powerAnalysis}
- `mlmpower`: <https://github.com/bkeller2/mlmpower> ([Enders et al., 2025](https://doi.org/10.1037/met0000614))
- <https://aguinis.shinyapps.io/ml_power/>
- <https://www.causalevaluation.org/power-analysis.html>
- <https://powerupr.shinyapps.io/index/>
- <https://koumurayama.shinyapps.io/tmethod_mlm/>
- <https://webpower.psychstat.org/wiki/models/index>
# Session Info
```{r}
#| code-fold: true
sessionInfo()
```