Adaptive Random Effects/Coefficients Modeling

George J. Knafl

doi:10.4236/ojs.2024.142009

Open Journal of Statistics > Vol.14 No.2, April 2024

Adaptive Random Effects/Coefficients Modeling

George J. Knafl
School of Nursing, University of North Carolina at Chapel Hill, Chapel Hill, USA.
DOI: 10.4236/ojs.2024.142009 PDF HTML XML 196 Downloads 689 Views

Abstract

Adaptive fractional polynomial modeling of general correlated outcomes is formulated to address nonlinearity in means, variances/dispersions, and correlations. Means and variances/dispersions are modeled using generalized linear models in fixed effects/coefficients. Correlations are modeled using random effects/coefficients. Nonlinearity is addressed using power transforms of primary (untransformed) predictors. Parameter estimation is based on extended linear mixed modeling generalizing both generalized estimating equations and linear mixed modeling. Models are evaluated using likelihood cross-validation (LCV) scores and are generated adaptively using a heuristic search controlled by LCV scores. Cases covered include linear, Poisson, logistic, exponential, and discrete regression of correlated continuous, count/rate, dichotomous, positive continuous, and discrete numeric outcomes treated as normally, Poisson, Bernoulli, exponentially, and discrete numerically distributed, respectively. Example analyses are also generated for these five cases to compare adaptive random effects/coefficients modeling of correlated outcomes to previously developed adaptive modeling based on directly specified covariance structures. Adaptive random effects/coefficients modeling substantially outperforms direct covariance modeling in the linear, exponential, and discrete regression example analyses. It generates equivalent results in the logistic regression example analyses and it is substantially outperformed in the Poisson regression case. Random effects/coefficients modeling of correlated outcomes can provide substantial improvements in model selection compared to directly specified covariance modeling. However, directly specified covariance modeling can generate competitive or substantially better results in some cases while usually requiring less computation time.

Keywords

Adaptive Regression, Correlated Outcomes, Extended Linear Mixed Modeling, Fractional Polynomials, Likelihood Cross-Validation, Random Effects/Coefficients

Share and Cite:

Knafl, G. (2024) Adaptive Random Effects/Coefficients Modeling. Open Journal of Statistics, 14, 179-206. doi: 10.4236/ojs.2024.142009.

1. Introduction

Covariance structures considered in linear mixed modeling (LMM) of correlated continuous outcomes can be directly specified using either unstructured covariance or covariance pattern matrices [1] . In the unstructured case, correlations are treated as all different from each other. In the covariance pattern case, correlations have well known patterns, for example, exchangeable (i.e., constant) correlations as used in repeated measures analyses and autoregressive correlations as used in time series analyses. Variances are usually treated as all the same or all different, but can be modeled more generally using generalized linear modeling. An alternative for modeling covariances is based on what is called both random effects [2] and random coefficients [3] . In this case, covariances are modeled using polynomials in random effects/coefficients combined with directly specified covariance matrices. Since these models also have means based on polynomials in fixed effects/coefficients, they are called mixed effects models. In what follows, “effects/coefficients” will be referred to as “coefficients” for brevity.

Mixed effects models for correlated continuous outcomes can be generalized to address correlated categorical outcomes. The generalized estimating equations (GEE) approach [4] combines generalized linear models for the means [5] with directly specified covariance structures. The generalized linear mixed modeling (GLMM) approach [1] is based on random coefficients. In the GEE case, variances are treated as a specific function of the means possibly multiplied by a constant dispersion parameter. The correlations are estimated using residuals. In the GLMM case, parameters can be estimated using a pseudo-likelihood approach [6] . An alternative approach used here is extended linear mixed modeling (ELMM) [7] , extending both GEE and LMM, with parameter estimation based on maximizing the multivariate normal density evaluated using residuals for general correlated outcomes, not just normally distributed correlated outcomes. A thorough assessment of the usefulness of ELMM for modeling correlated outcomes is provided in [7] including bootstrapping assessments of its effectiveness. However, ELMM analyses have so far only considered directly specified covariance structures.

Models for the means of general correlated outcomes are often theoretically based. Models for the variances are usually based on specific functions of the means. For example, count outcomes assumed to be Poisson distributed have variances equal to the means. These specific variance functions can be multiplied by dispersions, which are usually treated as constant. However, more general adaptive modeling [7] [8] is possible both for the means and the dispersions in terms of polynomials in power transforms of primary (untransformed) predictors. These are called fractional polynomials [9] . For example, the dependence of the means on time could be modeled using a polynomial based on one or more power transforms of time (e.g., time^−0.15 and time^2.4) with or without an intercept. The power transforms are chosen using a heuristic search (an overview is given in Section 2.8). Since random coefficients models are based on polynomials, these polynomials can also be fractional. For example, covariances for a longitudinal outcome might be a nonlinear function of time. The heuristic search needs only to be extended to handle an extra set of power transforms. The use of fractional polynomials is especially important to consider since it provides for nonlinearity in continuous predictors of means, variances/dispersions, and random coefficients. The consideration of nonlinearity for random coefficients is especially novel with the potential for unique insights into dependence of covariance on predictors that might also influence dependence of means on predictors.

The purpose of this paper is to extend ELMM to account for random coefficients in terms of fractional polynomials and to compare the impact of such models to directly specified covariance models using analyses of a variety of correlated outcome data sets. Cases considered include linear, Poisson, logistic, exponential, and discrete regression models for correlated continuous, count/rate, dichotomous, positive continuous, and discrete numeric outcomes, respectively. Means and variances/dispersions are based on generalized linear models using appropriate link functions. Random coefficients are treated as independent.

2. Methods

Let $y_{i, j}$ denote observed outcome measurements for matched sets i of conditions j (e.g., quality of life measurements for study participants observed at different times) for $1 \leq i \leq I$ and $j \in J (i)$ , subsets of size $J_{i}$ of the full set of possible conditions $1 \leq j \leq J$ . Let $μ_{i, j} = E y_{i, j}$ denote the expected values or means of $y_{i, j}$ satisfying $y_{i, j} = μ_{i, j} + e_{i, j}$ for errors $e_{i, j}$ . Combine $y_{i, j}$ , $μ_{i, j}$ , and $e_{i, j}$ for $j \in J (i)$ into the $J_{i} \times 1$ vectors $y_{i}$ , $μ_{i}$ , and $e_{i}$ satisfying

$y_{i} = μ_{i} + e_{i} .$

Also let $Σ_{i} = COV (y_{i})$ denote the covariance matrices for $y_{i}$ and

$n = \sum_{i = 1}^{I} J_{i}$

be the number of outcome measurements.

2.1. Generalized Linear Models for the Means

The means $μ_{i, j}$ can be modeled in terms of predictors $x_{i, j, m}$ for $1 \leq m \leq M$ . Specifically,

$g (μ_{i, j}) = x_{i, j}^{T} \cdot β$

where $x_{i, j}^{T}$ denotes the transpose of the M × 1 vector $x_{i, j}$ with entries $x_{i, j, m}$ , $β$ is a M × 1 vector of fixed coefficient parameters $β_{m}$ for $1 \leq m \leq M$ , and g is called the link function. Let $μ_{i}$ be the M × 1 vector with entries $μ_{i, j}$ . When $x_{i, j, 1} = 1$ for all i and j, $β_{1}$ is an intercept parameter. This is a generalized linear model for the means, but it allows for nonlinearity of the link transformed means in terms of primary predictors $u_{i, j}$ in the sense that model predictors can equal power transforms of $u_{i, j}$ , that is,

$x_{i, j, m} = u_{i, j}^{p}$

for powers p and for some m. A description of alternative regression models for the means follows.

2.1.1. Linear Regression

The outcome measurements $y_{i, j}$ are continuous, real-valued, and treated as normally distributed with identity link function

$g (μ_{i, j}) = μ_{i, j}$

so that

$μ_{i, j} = x_{i, j}^{T} \cdot β .$

The identity link function is the canonical choice for this case.

2.1.2. Poisson Regression

The outcome measurements $y_{i, j}$ are discrete, count-valued, and treated as Poisson distributed with natural log link function

$g (μ_{i, j}) = \log (μ_{i, j})$

so that

$μ_{i, j} = \exp (x_{i, j}^{T} \cdot β) .$

The natural log link function is the canonical choice for this case.

The model for the means of the count outcome measurements $y_{i, j}$ are sometimes converted from a model for counts to a model for rates $r_{i, j} = y_{i, j} / t_{i, j}$ relative to associated totals $t_{i, j}$ using offsets $o_{i, j} = \log (t_{i, j})$ . Formally, let

$g (μ_{i, j}) = \log (μ_{i, j}) + o_{i, j}$

so that

$μ_{i, j} = \exp (x_{i, j}^{T} \cdot β) \cdot \exp (o_{i, j}) = \exp (x_{i, j}^{T} \cdot β) \cdot t_{i, j}$

and then the means $E r_{i, j}$ of $r_{i, j}$ satisfy

$E r_{i, j} = \frac{μ_{i, j}}{t_{i, j}} = \exp (x_{i, j}^{T} \cdot β) .$

2.1.3. Logistic Regression

The outcome measurements $y_{i, j}$ are dichotomous with values 0 and 1 and treated as Bernoulli distributed with logit link function

$g (μ_{i, j}) = \log (\frac{μ_{i, j}}{1 - μ_{i, j}})$

so that

$μ_{i, j} = \frac{\exp (x_{i, j}^{T} \cdot β)}{1 + \exp (x_{i, j}^{T} \cdot β)} .$

The means satisfy

$μ_{i, j} = P (y_{i, j} = 1) .$

The logit link function is the canonical choice for this case.

2.1.4. Exponential Regression

The outcome measurements $y_{i, j}$ are continuous, positive real-valued, and treated as exponentially distributed with natural log link function

$g (μ_{i, j}) = \log (μ_{i, j})$

so that

$μ_{i, j} = \exp (x_{i, j}^{T} \cdot β) .$

The canonical choice for the link function for this case is the inverse function

$g (μ_{i, j}) = \frac{1}{μ_{i, j}}$

so that

$μ_{i, j} = \frac{1}{x_{i, j}^{T} \cdot β}$

with the shortcoming of requiring $x_{i, j}^{T} \cdot β > 0$ . The use of the natural log link function resolves this problem.

2.2. Discrete Regression Using Censored Poisson Probabilities

The outcome measurements $y_{i, j}$ are discrete and numeric-valued with increasing values $d_{u}$ for $1 \leq u \leq U$ so that the means satisfy

$μ_{i, j} = \sum_{u = 1}^{U} (d_{u} \cdot p_{i, j, u})$

where $p_{i, j, u} = P (y_{i, j} = d_{u})$ denote the probabilities of $y_{i, j}$ having values $d_{u}$ . Note that this is not a generalized linear model for the means; the probabilities are modeled instead of the means. The probabilities can be modeled in several ways [10] . The approach considered here is based on censored Poisson probabilities satisfying

$p_{i, j, u} = \exp (- λ_{i, j}) \cdot \frac{λ_{i, j}^{u}}{u!}, 0 \leq u < U,$

$p_{i, j, U} = 1 - \sum_{u = 0}^{U - 1} p_{i, j, u},$

$\log λ_{i, j} = x_{i, j}^{T} \cdot β$

2.3. Directly Specified Correlation Structures

The covariance matrix can be modeled in terms of variances combined with directly specified correlation structures. Specifically, with $Σ_{i, j, j^{'}}$ denoting the entries of $Σ_{i}$ for $j, j^{'} \in J (i)$ , let

$Σ_{i} = D_{i} \cdot R_{i} \cdot D_{i}$

where $D_{i}$ are the $J_{i} \times J_{i}$ diagonal matrices with diagonal entries the standard deviations $Σ_{i, j . j}^{1 / 2}$ and the correlation matrices $R_{i}$ are $J_{i} \times J_{i}$ submatrices of a directly specified $J \times J$ working correlation matrix $R$ with entries $ρ_{j, j^{'}}$ for $1 \leq j, j^{'} \leq J$ and diagonal entries $ρ_{j, j} = 1$ .

The most commonly considered directly specified correlation structures are the independent (IND), exchangeable (EXCH), autoregressive of order 1 (AR1), and unstructured (UN) correlation structures. The IND correlation structure has zero correlations, that is, $ρ_{j, j^{'}} = 0$ for $1 \leq j \neq j^{'} \leq J$ . The EXCH correlation structure has constant correlations, that is, $ρ_{j, j^{'}} = ρ$ for $1 \leq j \neq j^{'} \leq J$ . The AR1 correlation structure has correlations weakening for outcome measurements further apart. Specifically, for the more general spatial AR1 case, $ρ_{j, j^{'}} = ρ^{| d (j) - d (j^{'}) |}$ for $1 \leq j \neq j^{'} \leq J$ where $d (j)$ is a strictly increasing function of the index j and $| d (j) - d (j^{'}) |$ is the absolute value of the difference $d (j) - d (j^{'})$ . The more commonly used non-spatial AR1 case sets $d (j) = j$ , and so treats outcome measurements as equally-spaced even when they are not. The UN correlation structure has the $J \cdot (J - 1) / 2$ distinct correlations $ρ_{j, j^{'}} = ρ_{j^{'}, j}$ for $1 \leq j < j^{'} \leq J$ .

2.4. Variance Modeling

For the generalized linear models of Section 2.1, the variances $Σ_{i, j, j} = VAR (y_{i, j})$ can be modeled in terms of dispersions $φ_{i, j}$ combined with specific functions $V (μ_{i, j})$ of the means $μ_{i, j}$ , that is

$Σ_{i, j, j} = φ_{i, j} \cdot V (μ_{i, j}) .$

For the linear regression case of Section 2.1.1., $V (μ_{i, j}) = 1$ so that the dispersions $φ_{i, j}$ are the same as the variances. For the Poisson regression case of Section 2.1.2, $V (μ_{i, j}) = μ_{i, j}$ . For the logistic regression case of Section 2.1.3, $V (μ_{i, j}) = μ_{i, j} \cdot (1 - μ_{i, j})$ . For the exponential regression case of Section 2.1.4, $V (μ_{i, j}) = μ_{i, j}^{2}$ .

In the discrete regression case of Section 2.2, the variances formally satisfy

$V_{i, j} = \sum_{u = 1}^{U} ({(d_{u} - μ_{i, j})}^{2} \cdot p_{i, j, u}) .$

These can be combined with dispersions to model the variances $Σ_{i, j, j}$ as

$Σ_{i, j, j} = φ_{i, j} \cdot V_{i, j} .$

The dispersions $φ_{i, j}$ of $y_{i, j}$ can be modeled in terms of a second set of predictors ${x^{'}}_{i, j, m^{'}}$ for $1 \leq m^{'} \leq M^{'}$ . Combine these into the $M^{'} \times 1$ vectors ${x^{'}}_{i, j}$ . Also, let $β^{'}$ be a $M^{'} \times 1$ vector of fixed coefficient parameters ${β^{'}}_{m^{'}}$ for $1 \leq m^{'} \leq M^{'}$ and set

$\log φ_{i, j} = {x^{'}}_{i, j}^{T} \cdot β^{'} .$

When ${x^{'}}_{i, j, 1} = 1$ for all i and j, ${β^{'}}_{1}$ is an intercept parameter. This is a generalized linear model for the dispersions using the natural log link function, but it allows for nonlinearity of the log of the dispersions in terms of primary predictors ${u^{'}}_{i, j}$ in the sense that model predictors can equal power transforms of ${u^{'}}_{i, j}$ , that is,

${x^{'}}_{i, j, m^{'}} = {u^{'}}_{i, j}^{p^{'}}$

for powers $p^{'}$ and for some $m^{'}$ .

For the Poisson regression case with offsets $o_{i, j} = \log t_{i, j}$ included to convert the model for the counts $y_{i, j}$ to one for rates $r_{i, j} = y_{i, j} / t_{i, j}$ , the offsets can be added to the dispersions as well as to the means. Specifically, let the dispersions satisfy

$\log φ_{i, j} = {x^{'}}_{i, j}^{T} \cdot β^{'} + o_{i, j}$

so that

$\begin{matrix} Σ_{i, j, j} = φ_{i, j} \cdot μ_{i, j} = \exp ({x^{'}}_{i, j}^{T} \cdot β^{'}) \cdot \exp (x_{s c}^{T} \cdot β) \cdot \exp^{2} (o_{i, j}) \\ = \exp ({x^{'}}_{i, j}^{T} \cdot β^{'}) \cdot \exp (x_{s c}^{T} \cdot β) \cdot t_{i, j}^{2} \end{matrix}$

and then the variances $VAR (r_{i, j})$ of $r_{i, j}$ satisfy

$VAR (r_{i, j}) = \frac{Σ_{i, j, j}}{t_{i, j}^{2}} = \exp ({x^{'}}_{i, j}^{T} \cdot β^{'}) \cdot \exp (x_{s c}^{T} \cdot β) .$

2.5. Random Coefficients Modeling of Covariance

Let ${x^{″}}_{i, j, m^{″}}$ for $1 \leq m^{″} \leq M^{″}$ be a third set of predictors. Combine these into the $M^{″} \times 1$ vectors ${x^{″}}_{i, j}$ and let ${X^{″}}_{i}$ be the $J_{i} \times M^{″}$ matrices with rows equal to ${x^{″}}_{i, j}^{T}$ for $j \in J (i)$ . As for the means and the variances, nonlinearity in terms of primary predictors ${u^{″}}_{i, j}$ can be addressed using model predictors equal to power transforms of ${u^{″}}_{i, j}$ , that is,

${x^{″}}_{i, j, m^{″}} = {u^{″}}_{i, j}^{p^{″}}$

for powers $p^{″}$ and for some $m^{″}$ . Let $γ$ denote a $M^{″} \times 1$ vector of random coefficients $γ_{m^{″}}$ . Assume that

$y_{i} = μ_{i} + {X^{″}}_{i} \cdot γ + {e^{'}}_{i}$

so that

$e_{i} = {X^{″}}_{i} \cdot γ + {e^{'}}_{i} .$

When ${x^{″}}_{i, j, 1} = 1$ for all i and j, $γ_{1}$ is a random intercept.

Assume that $γ$ and ${e^{'}}_{i}$ are independent of each other so that the covariance matrices $Σ_{i}$ for the outcome vectors $y_{i}$ satisfy

$Σ_{i} = {X^{″}}_{i} \cdot Σ^{″} \cdot {X^{″}}_{i}^{T} + {Σ^{'}}_{i}$

where ${Σ^{'}}_{i}$ is the $J_{i} \times J_{i}$ covariance matrix for the error vectors ${e^{'}}_{i}$ and $Σ^{″}$ is the $M^{″} \times M^{″}$ covariance matrix for the random coefficients vector $γ$ . Assume as well that ${Σ^{'}}_{i}$ is a diagonal matrix with diagonal entries ${Σ^{'}}_{i}_{i, j}$ so that the entries ${e^{'}}_{i, j}$ of ${e^{'}}_{i}$ are independent of each other and that $Σ^{″}$ is a diagonal matrix with diagonal entries ${Σ^{″}}_{m^{″}}$ so that the entries $γ_{m^{″}}$ of $γ$ are independent of each other. Note that the matrices ${Σ^{'}}_{i}$ and $Σ^{″}$ can have more complex, non-diagonal structures, but such more general cases are not considered to limit model complexity.

As for dispersions in Section 2.4, the variances ${Σ^{'}}_{i, j}$ can be modeled as

$\log {Σ^{'}}_{i, j} = {x^{'}}_{i, j}^{T} \cdot β^{'} .$

To model the random coefficients, let $β^{″}$ be a $M^{″} \times 1$ vector of parameters ${β^{″}}_{m^{″}}$ for $1 \leq m^{″} \leq M^{″}$ and define the variances ${Σ^{″}}_{m^{″}}$ of $γ_{m^{″}}$ as satisfying

$\log {Σ^{″}}_{m^{″}} = {β^{″}}_{m^{″}} .$

As in Section 2.3, let $Σ_{i, j, j^{'}}$ denote the entries of $Σ_{i}$ for $j, j^{'} \in J (i)$ . The variances $Σ_{i, j, j}$ satisfy

$Σ_{i, j, j} = \sum_{m^{″} = 1}^{M^{″}} ({x^{″}}_{i, j, m^{″}}^{2} \cdot {Σ^{″}}_{m^{″}}) + {Σ^{'}}_{i, j}$

while the covariances $Σ_{i, j, j^{'}}$ for $j \neq j^{'}$ satisfy

$Σ_{i, j, j^{'}} = \sum_{m^{″} = 1}^{M^{″}} ({x^{″}}_{i, j, m^{″}} \cdot {Σ^{″}}_{m^{″}} \cdot {x^{″}}_{i, j^{'}, m^{″}}) .$

Also as in Section 2.3, the covariance matrix $Σ_{i}$ can be expressed as

$Σ_{i} = D_{i} \cdot R_{i} \cdot D_{i}$

where $D_{i}$ are the $J_{i} \times J_{i}$ diagonal matrices with diagonal entries the standard deviations $Σ_{i, j . j}^{1 / 2}$ for $j \in J (i)$ and the correlation matrices $R_{i}$ are $J_{i} \times J_{i}$ matrices with entries

$ρ_{i, j, j^{'}} = \frac{Σ_{i, j, j^{'}}}{Σ_{i, j . j}^{1 / 2} \cdot Σ_{i, j^{'} . j^{'}}^{1 / 2}}$

for $j, j^{'} \in J (i)$ so that the diagonal values satisfy $ρ_{i, j, j} = 1$ .

The special case of a constant random coefficients model with ${x^{″}}_{i, j, 1} = 1$ for all i and j as well as $M^{″} = 1$ combined with a constant variances model with ${x^{'}}_{i, j, 1} = 1$ for all i and j as well as $M^{'} = 1$ has constant variances

$Σ_{i, j, j} = {Σ^{″}}_{1} + \exp ({β^{'}}_{1})$

for $j \in J (i)$ and constant covariances

$Σ_{i, j, j^{'}} = {Σ^{″}}_{1}$

for $j, j^{'} \in J (i)$ , $j \neq j^{'}$ . Consequently, it also has constant correlations with value

$ρ_{i, j, j^{'}} = ρ = \frac{{Σ^{″}}_{1}}{{Σ^{″}}_{1} + \exp ({β^{'}}_{1})}$

for $j, j^{'} \in J (i)$ , $j \neq j^{'}$ . This is the same model as the model based on the directly specified EXCH correlation structure restricted to have constant variances.

For the Poisson regression case with offsets $o_{i, j} = \log t_{i, j}$ included to convert the model for the counts $y_{i, j}$ to one for rates $r_{i, j} = y_{i, j} / t_{i, j}$ , offsets can be added to the covariances as well as to the means. Specifically, let $T_{i}$ denote the $J_{i} \times J_{i}$ diagonal matrices with diagonal entries $t_{i, j} = \exp (o_{i, j})$ and redefine the covariance matrices for the counts $y_{i, j}$ as

$Σ_{i} = T_{i} \cdot ({X^{″}}_{i} \cdot Σ^{″} \cdot {X^{″}}_{i}^{T} + {Σ^{'}}_{i}) \cdot T_{i}$

so that the covariance matrices for the $J_{i} \times 1$ rate vectors $r_{i}$ with entries $r_{i, j}$ satisfy

$COV (r_{i}) = {X^{″}}_{i} \cdot Σ^{″} \cdot {X^{″}}_{i}^{T} + {Σ^{'}}_{i} .$

2.6. Extended Linear Mixed Modeling (ELMM)

Let $S = {i : 1 \leq i \leq I}$ denote the set of indexes for the full set of outcome vectors $y_{i}$ . Also, let

$θ = (\begin{matrix} β \\ β^{'} \\ β^{″} \end{matrix})$

denote the $(M + M^{'} + M^{″}) \times 1$ vector of model parameters with entries denoted as $θ_{v}$ for $1 \leq v \leq M + M^{'} + M^{″}$ . Define the likelihood function $L (S; θ)$ as the product over $i \in S$ of the terms $L_{i} (θ)$ satisfying

$l_{i} (θ) = \log L_{i} (θ) = - \frac{1}{2} \cdot e_{i}^{T} \cdot Σ_{i}^{- 1} \cdot e_{i} - \frac{1}{2} \cdot \log | Σ_{i} | - \frac{1}{2} \cdot J_{i} \cdot \log (2 \cdot π)$

where $| Σ_{i} |$ denotes the determinant of the matrix $Σ_{i}$ and $π$ is the usual constant, that is, using the multivariate normal density. Note that in general this is an extended likelihood and only a true likelihood when the outcomes are normally distributed. However, it is maximized like a likelihood to estimate parameters.

Denote the $(M + M^{'} + M^{″}) \times 1$ gradient vector as

$g (θ) = \sum_{i \in S} \frac{\partial l_{i} (θ)}{\partial θ}$

and the $(M + M^{'} + M^{″}) \times (M + M^{'} + M^{″})$ Hessian matrix as

$H (θ) = \frac{\partial g (θ)}{\partial θ} .$

The gradient vector and Hessian matrix can be computed using standard formulas. Derivatives of the residuals $e_{i}$ are straightforward. Derivatives of the quantities $Σ_{i}^{- 1}$ and $\log | Σ_{i} |$ satisfy

$\frac{\partial Σ_{i}^{- 1}}{\partial θ_{v}} = - Σ_{i}^{- 1} \cdot \frac{\partial Σ_{i}}{\partial θ_{v}} \cdot Σ_{i}^{- 1},$

$\frac{\partial \log | Σ_{i} |}{\partial θ_{v}} = trace (Σ_{i}^{- 1} \cdot \frac{\partial Σ_{i}}{\partial θ_{v}}),$

$\frac{\partial^{2} Σ_{i}^{- 1}}{\partial θ_{v^{'}} \partial θ_{v}} = 2 \cdot Σ_{i}^{- 1} \cdot \frac{\partial Σ_{i}}{\partial θ_{v^{'}}} \cdot Σ_{i}^{- 1} \cdot \frac{\partial Σ_{i}}{\partial θ_{v}} \cdot Σ_{i}^{- 1} - Σ_{i}^{- 1} \cdot \frac{\partial^{2} Σ_{i}}{\partial θ_{v^{'}} \partial θ_{v}} \cdot Σ_{i}^{- 1},$

$\frac{\partial^{2} \log | Σ_{i} |}{\partial θ_{v^{'}} \partial θ_{v}} = - trace (Σ_{i}^{- 1} \cdot \frac{\partial Σ_{i}}{\partial θ_{v^{'}}} \cdot Σ_{i}^{- 1} \cdot \frac{\partial Σ_{i}}{\partial θ_{v}}) + trace (Σ_{i}^{- 1} \cdot \frac{\partial^{2} Σ_{i}}{\partial θ_{v^{'}} \partial θ_{v}}) .$

for $1 \leq v, v^{'} \leq M + M^{'} + M^{″}$ [11] [12] . Let $θ (S)$ denote the solution to the estimating equations $g (θ) = 0$ computed from the measurements with indexes in the set S using an adjustment of Newton’s method (e.g., as provided in [7] ).

Inverses $Σ_{i}^{- 1}$ for covariance matrices based on random coefficients can be ill conditioned. This can be addressed with two adjustments to the estimation process. First, compute inverses $Σ_{i}^{- 1}$ from inverses $D_{i}^{- 1}$ and $R_{i}^{- 1}$ using

$Σ_{i}^{- 1} = D_{i}^{- 1} \cdot R_{i}^{- 1} \cdot D_{i}^{- 1}$

with the diagonal entries of $D_{i}^{- 1}$ bounded if necessary to avoid overflow. Second, restrict the amount of change $Δ θ_{w}$ in the estimated parameter vector from $θ_{w}$ at iteration w to $θ_{w + 1} = θ_{w} + Δ θ_{w}$ at iteration $w + 1$ . For example, let $θ_{w, v}$ denote the entries of $θ_{w}$ and $Δ θ_{w, v}$ the entries of $Δ θ_{w}$ and define

$b_{w} = \max {\frac{| Δ θ_{w, v} |}{\max {| θ_{w, v} |, 1}} : 1 \leq v \leq M + M^{'} + M^{″}},$

that is, the maximum ratio of absolute values of changes in parameter estimates $| Δ θ_{w, v} |$ relative to absolute values of those parameter estimates $| θ_{w, v} |$ (with small absolute parameter estimates increased to the value 1). If $b_{w} < 1$ , use $Δ θ_{w}$ . Otherwise, replace it by $Δ θ_{w} / (2 \cdot b_{w})$ , that is, adjust the change in the parameter estimates proportionally.

ELMM provides for the incorporation of covariance in parameter estimation for general correlated outcomes. Models generated using ELMM can be used to compute LCV scores as addressed in Section 2.7, which can then be used to control the adaptive modeling process as addressed in Section 2.8 to identify nonlinear relationships in predictors for means, variances/dispersions, and random coefficients.

2.7. Likelihood Cross-Validation

Randomly partition the full set S of matched set indexes into K disjoint subsets $S (k)$ , called folds [13] , for $1 \leq k \leq K$ . Define the K-fold likelihood-cross validation (LCV) score as

$LCV = \prod_{k = 1}^{K} L^{1 / n} (S (k); θ (S \ S (k))) .$

Informally, the LCV score equals the geometric average of likelihoods for the folds evaluated with parameter estimates computed using outcome measurements with indexes in the complement of those folds. Larger scores indicate improved models. However, models with smaller LCV scores can be preferable if the decrease in the LCV score is not substantial, especially if they are also simpler. This issue is addressed using LCV ratio tests, computed using the $χ^{2}$ distribution and expressed in terms of a cutoff for a substantial percent decrease (PD) in the LCV score. The cutoff changes with the sample size n. Specifically, if the PD in the LCV score is larger than the cutoff, the model with the larger LCV score provides a substantial improvement over the model with the smaller LCV score or, equivalently, the model with the smaller LCV score is substantially inferior. Otherwise, the model with the smaller LCV score is a competitive alternative to the model with the larger LCV score. A more complete assessment of LCV is provided in [7] .

2.8. Adaptive Regression Modeling

The adaptive regression modeling process for means and variances in the context of directly specified correlation structures of Section 2.3 is covered in detail in [7] and [8] . A brief overview of that process and its extension to handle random coefficients models of Section 2.5 is addressed here. A base model is first expanded by systematically adding in power transforms of primary predictors to either the mean, variance/dispersion, or random coefficient component of the model, whichever generates an expanded model with the best LCV score. Then the final expanded model is contracted by systematically removing extraneous transforms from either the mean, variance/dispersion, or random coefficient component, whichever generates a contracted model with the best LCV score after adjusting the powers of the remaining transforms to improve the LCV score. The expansion process can also consider the addition of geometric combinations, that is, products of power transforms of primary predictors, and power transforms of these products. Geometric combinations generalize standard interactions and so provide for the nonlinear generalization of the concept of moderation [14] . More details on the adaptive modeling process are provided in [7] and [8] . While those references only consider adaptive models for means and variances/dispersions, the modeling process is readily adjusted to also handle random coefficients. Adaptive modeling also applies to hazard regression analyses [15] [16] and to interrupted time series analyses [17] .

3. Example Analyses

Analyses reported in this section are computed using SAS version 9.4 (SAS Institute, Inc., Cary, NC) with a SAS macro available upon request from the author. LCV scores are computed using 10 folds.

The purpose of these analyses is to compare results for adaptive modeling of correlated outcomes using random coefficients to adaptive modeling using directly specified covariance structures. Generated models for these two cases are evaluated using LCV scores and tested for substantial differences using LCV ratio tests.

3.1. Analyses of the Dental Measurement Data

The analyses of this section use data provided by Potthoff and Roy [18] on 108 dental measurements in mm for 11 girls and 17 boys at ages 8, 10, 12, and 14 years old. The cutoff for a substantial percent decrease (PD) for these data is 1.76%. The more general spatial AR1 correlation structure is considered, but this has no effect since the dental measurements are equally spaced. Analyses of these data using directly specified correlation structures are also provided in [7] and [8] .

3.1.1. Adaptive Models Based on Child Age

Table 1 contains a comparison of adaptive models in child age for the IND, EXCH, AR1, and UN directly specified correlation structures and the associated random coefficients model. The random coefficients model generates the best overall 10-fold LCV score but the EXCH correlation structure generates a competitive LCV score with insubstantial PD 0.84%. The estimated constant EXCH correlation is 0.67. In contrast, estimated correlations for the random coefficients model given in Table 2 vary from 0.64 and 0.72. Interestingly, these correlations are stronger the further apart outcome measurements are in time rather than weaker as for AR1 correlations. These analyses require a total of 19.8 minutes.

3.1.2. Adaptive Additive Models Based on Child Age and Child Gender

Table 3 contains a comparison of adaptive additive models in child age and child gender (i.e., without interactions or geometric combinations in child age and child gender) for the EXCH directly specified correlation structure and the associated random coefficients model. The IND, AR1, and UN directly specified correlation structures have not been considered since they generate worse LCV scores than EXCH in Table 1. The random coefficients model generates the better 10-fold LCV score while the EXCH correlation structure generates an LCV score with substantial PD 5.28%. The estimated constant EXCH correlation is 0.77. In contrast, estimated correlations for the random coefficients model given

Table 1. Adaptive models for dental measurements in terms of child age controlling for outcome covariance.

AR1 - autoregressive of order 1; EXCH - exchangeable; IND - independent; LCV - likelihood cross-validation; UN - unstructured. a. The transform 1 corresponds to an intercept.

Table 2. Estimated means, standard deviations, and correlations for the adaptive random coefficients model of dental measurements in terms of child age.

Table 3. Adaptive additive models for dental measurements in terms of child age and child gender controlling for outcome covariance.

EXCH - exchangeable; LCV - likelihood cross-validation. a. The transform 1 corresponds to an intercept; the transform male is the indicator for the child being a boy.

Table 4. Estimated means, standard deviations, and correlations for the adaptive additive random coefficients model of dental measurements in terms of child age and child gender.

in Table 4 are different for girls and boys and vary from 0.84 and 0.92 for girls and from 0.56 and 0.66 for boys. These analyses require a total of 18.0 minutes.

3.1.3. Adaptive Moderation Models Based on Child Age and Child Gender

Table 5 contains a comparison of adaptive moderation models in child age and child gender for the EXCH directly specified correlation structure and the associated random coefficients model. The random coefficients model generates the better 10-fold LCV score while the EXCH correlation structure generates an LCV score with substantial PD 5.01%. For both the EXCH and random coefficients cases, the additive models of Table 3 generate substantial PDs 4.11% and 3.84%, respectively, compared to associated moderation models of Table 5, and so both approaches support the conclusion of a substantial moderation effect. The estimated constant EXCH correlation is 0.72. In contrast, estimated correlations for the random coefficients model given in Table 6 are different for girls and boys with constant value 0.76 for girls and values varying from 0.48 and 0.59 for boys. Means for girls increase from 22.1 mm to 24.0 mm while means are larger for boys and increasing faster from 22.5 mm to 27.4 mm. These analyses require a total of 101.6 minutes. All the analyses of Sections 3.1.1-3.1.3 require a

Table 5. Adaptive moderation models for dental measurements in terms of child age and child gender controlling for outcome covariance correlation.

EXCH - exchangeable; LCV - likelihood cross-validation. a. The transform 1 corresponds to an intercept; the transform male is the indicator for the child being a boy.

Table 6. Estimated means, standard deviations, and correlations for the adaptive moderation random coefficients model of dental measurements in terms of child age and child gender.

total of 139.4 minutes or about 2.3 hours.

3.2. Analyses of the Epilepsy Seizure Data

The analyses of this section use data provided by Thall and Vail [19] on 295 seizure counts for 59 patients, 31 in an intervention group receiving the drug progabide and 28 in a control group receiving a placebo at baseline visit 0 and subsequent visits 1 - 4. The epilepsy counts are converted to epilepsy seizure rates per week using lengths of time in weeks for periods prior to visits (8 weeks prior to visit 0 and 2 weeks prior to visits 1 - 4). The cutoff for a substantial percent decrease (PD) for these data is 0.65%. The more general spatial AR1 correlation structure is considered, but this has no effect since the seizure counts measurements are equally spaced. Estimated variances for models based on directly specified correlation structures are computed using the combination of adaptively generated dispersions and the variance function $V (μ_{i, j}) = μ_{i, j}$ . Analyses of these data using directly specified correlation structures are also provided in [7] and [8] . Analyses of other correlated count/rate outcomes are provided in [20] .

3.2.1. Adaptive Models Based on Clinic Visit

Table 7 contains a comparison of adaptive models in clinic visit for the IND, EXCH, AR1, and UN directly specified correlation structures and the associated random coefficients model. The EXCH correlation structure generates the best overall 10-fold LCV score. The random coefficients model is substantially inferior with PD 7.11% in the LCV score. The estimated constant EXCH correlation is 0.87. These analyses require a total of 83.2 minutes.

3.2.2. Adaptive Additive Models Based on Clinic Visit and Treatment Group

Table 8 contains a comparison of adaptive additive models in clinic visit and treatment group for the EXCH directly specified correlation structure and the associated random coefficients model. The IND, AR1, and UN directly specified correlation structures have not been considered since they generate worse LCV scores than EXCH in Table 7. The same models are generated as in Section 3.2.1 indicating that treatment group does not have an additive effect on seizure rates. These analyses require a total of 25.5 minutes.

3.2.3. Adaptive Moderation Models Based on Clinic Visit and Treatment Group

Table 9 contains a comparison of adaptive moderation models in clinic visit and

Table 7. Adaptive models for epilepsy seizure rates in terms of clinic visit controlling for outcome covariance.

AR1 - autoregressive of order 1; EXCH - exchangeable; IND - independent; LCV - likelihood cross-validation; UN - unstructured. a. The transform 1 corresponds to an intercept.

Table 8. Adaptive additive models for epilepsy seizure rates in terms of clinic visit and treatment group controlling for outcome covariance.

EXCH - exchangeable; LCV - likelihood cross-validation. a. The transform 1 corresponds to an intercept; the transform int is the indicator for being in the intervention group (but it is not included in the models).

Table 9. Adaptive moderation models for epilepsy seizure rates in terms of clinic visit and treatment group controlling for outcome covariance.

EXCH - exchangeable; LCV - likelihood cross-validation. a. The transform 1 corresponds to an intercept; the transform int is the indicator for being in the intervention group.

treatment group for the EXCH directly specified correlation structure and the associated random coefficients model. The same EXCH model is generated as in Sections 3.2.1-3.2.2, but an improved random coefficients model is generated. However, this revised model is substantially inferior to the EXCH model with PD 2.91%. These results indicate that treatment group does not have a substantial moderation effect on seizure rates. These analyses require a total of 56.4 minutes. Under the selected EXCH model, the mean seizure rates are constant with estimated value 2.6 while the standard deviations have estimated values 4.6, 7.6, 8.1, 8.3, and 5.5 at visits 0 - 4, respectively. All the analyses of Sections 3.2.1-3.2.3 require a total of 165.1 minutes or about 2.8 hours.

3.3. Analyses of the Dichotomous Respiratory Status Data

The analyses of this section use data provided by Koch et al. [21] on 555 dichotomous respiratory status levels classified as 0 for poor and 1 for good for 54 patients on active treatment and 57 on a placebo at baseline visit 0 and subsequent visits 1 - 4. The cutoff for a substantial percent decrease (PD) for these data is 0.35%. The more general spatial AR1 correlation structure is considered, but this has no effect since the dichotomous respiratory levels are equally spaced. Estimated variances for models based on directly specified correlation structures are computed using the combination of adaptively generated dispersions and the variance function $V (μ_{i, j}) = μ_{i, j} \cdot (1 - μ_{i, j})$ . Analyses of these data using directly specified correlation structures are also provided in [7] and [8] .

3.3.1. Adaptive Models Based on Clinic Visit

Table 10 contains a comparison of adaptive models in clinic visit for the IND, EXCH, AR1, and UN directly specified correlation structures and the associated random coefficients model. The EXCH correlation structure generates the best overall 10-fold LCV score. The random coefficients model is about the same model. The estimated constant EXCH correlation is 0.48. These analyses require a total of 35.2 minutes.

Table 10. Adaptive models for dichotomous respiratory status levels in terms of clinic visit controlling for outcome covariance.

AR1 - autoregressive of order 1; EXCH - exchangeable; IND - independent; LCV - likelihood cross-validation; UN - unstructured. a. The transform 1 corresponds to an intercept.

Table 11. Adaptive additive models for dichotomous respiratory status levels in terms of clinic visit and treatment group controlling for outcome covariance.

EXCH - exchangeable; LCV - likelihood cross-validation. a. The transform 1 corresponds to an intercept; the transform int is the indicator for being in the intervention group.

3.3.2. Adaptive Additive Models Based on Clinic Visit and Treatment Group

Table 11 contains a comparison of adaptive additive models in clinic visit and treatment group for the EXCH directly specified correlation structure and the associated random coefficients model. The IND, AR1, and UN directly specified correlation structures have not been considered since they generate worse LCV scores than EXCH in Table 10. The random coefficients model has a better LCV score while the EXCH model generates a substantial PD 0.36, just a little over the cutoff of 0.35%. The estimated constant EXCH correlation is 0.47 while the estimated random coefficients model is also constant with almost the same value 0.46. This raises the question of whether there actually is a substantial additive effect to treatment group as indicated by the random coefficients model. Note that the dependence of the means on visit is the same for both models. Adding the indicator for treatment group to the means for the EXCH model generates a better LCV score 0.56748. An adaptive contraction of this EXCH model first removes the intercept from the means generating an LCV score 0.56570, and then removes the indicator for treatment group generating a LCV score 0.56524 as well as means depending of only visit^−0.2, dispersions depending on only an intercept, estimated correlation 0.48, and requiring 0.4 minutes. These results indicate that the random coefficients model of Table 11 can be reduced to a model based only on visit since it is a special case of an EXCH model. These analyses require a total of 113.3 minutes with almost all of it accounted for by the random coefficients model.

3.3.3. Adaptive Moderation Models Based on Clinic Visit and Treatment Group

Table 12 contains a comparison of adaptive moderation models in clinic visit and treatment group for the EXCH directly specified correlation structure and the associated random coefficients model. Essentially equivalent models are generated and both models provide substantial improvements over corresponding additive models of Table 11 with PDs 1.43% and 1.01%, respectively. Consequently, active treatment moderates the effect of visit on dichotomous respiratory status. These analyses require 220.8 minutes. The EXCH model generates the better LCV score in much less time, and so is preferable. Under this EXCH model, the estimated correlation is 0.47. For participants in the control group on a placebo, the probability is constant with estimated value 0.5 over all clinic visits. In contrast, for participants in the intervention group on active treatment, the probability has estimated value starting at 0.50 at baseline visit 0, increases to 0.71 by visit 1, and decreases a little to 0.69 by visit 4. All the analyses of Sections 3.3.1-3.3.3 require a total of 369.3 minutes or about 6.2 hours.

3.4. Analyses of the Blood Lead Level Data

The analyses of this section use data from the Treatment of Lead-exposed Children (TLC) study [22] on 400 blood lead levels in mg/dL for 50 children on treatment with the chelating agent succimer and 50 on a placebo at baseline week 0 and subsequent weeks 1, 4, and 6. The cutoff for a substantial percent decrease (PD) for these data is 0.48%. The more general spatial AR1 correlation structure is important to use since the blood lead levels are not equally spaced. Estimated variances for models based on directly specified correlation structures are computed using the combination of adaptively generated dispersions and the variance function $V (μ_{i, j}) = μ_{i, j}^{2}$ . Analyses of these data using directly specified

Table 12. Adaptive moderation models for dichotomous respiratory status levels in terms of clinic visit and treatment group controlling for outcome covariance.

EXCH - exchangeable; LCV - likelihood cross-validation. a. The transform 1 corresponds to an intercept; the transform int is the indicator for being in the intervention group.

correlation structures are also provided in [7] .

3.4.1. Adaptive Models Based on Time in Weeks

Table 13 contains a comparison of adaptive models in week for the IND, EXCH, AR1, and UN directly specified correlation structures and the associated random coefficients model. The random coefficients model generates the best overall 10-fold LCV score. The UN correlation structure generates the next best LCV score but with substantial PD 1.55%. Estimated correlations for the random coefficients model given in Table 14 vary from 0.52 and 0.84. These analyses require a total of 262.0 minutes.

3.4.2. Adaptive Additive Models Based on Time in Weeks and Treatment Group

Table 15 contains a comparison of adaptive additive models in week and treatment group for the UN directly specified correlation structure and the associated random coefficients model. The IND, EC, and AR1 directly specified correlation structures have not been considered since they generate worse LCV scores than UN in Table 13. The random coefficients model generates the better 10-fold LCV score while the UN model is substantially inferior with PD 2.35% in the

Table 13. Adaptive models for blood lead levels in terms of time in week controlling for outcome covariance.

AR1 - autoregressive of order 1; EXCH - exchangeable; IND - independent; LCV - likelihood cross-validation; UN - unstructured. a. The transform 1 corresponds to an intercept.

Table 14. Estimated means, standard deviations, and correlations for the adaptive random coefficients model of blood lead levels in terms of time in weeks.

Table 15. Adaptive additive models for blood lead levels in terms of time in week and treatment group controlling for outcome covariance.

LCV - likelihood cross-validation; UN - unstructured. a. The transform 1 corresponds to an intercept; the transform int corresponds to the indicator for being in the intervention group.

Table 16. Estimated means, standard deviations, and correlations for the adaptive additive random coefficients model of blood lead levels in terms of time in weeks and treatment group.

LCV score. Estimated correlations for the random coefficients model given in Table 16 differ by treatment group with values varying from 0.76 to 0.84 for participants on a placebo and values varying from 0.28 and 0.32 for participants under succimer treatment. These analyses require a total of 315.7 minutes.

3.4.3. Adaptive Moderation Models Based on Time in Weeks and Treatment Group

Table 17 contains a comparison of adaptive moderation models in week and treatment group for the UN directly specified correlation structure and the associated random coefficients model. The random coefficients model generates the better 10-fold LCV score while the UN model is substantially inferior with PD 2.29% in the LCV score. For both the UN and random coefficients cases, the additive models of Table 15 generate substantial PDs 11.29% and 11.23%, respectively, compared to associated moderation models of Table 17, and so both approaches support the conclusion of a substantial moderation effect. Estimated correlations for the random coefficients model given in Table 18 differ by treatment group with values varying from 0.75 to 0.83 for participants on a placebo and values varying from 0.33 and 0.55 for participants under succimer treatment. For participants in the control group on a placebo, the estimated mean starts at 26.1 mg/dL at baseline week 0 and decreases a little to 24.0 mg/dL by week 6. In contrast, for participants in the intervention group on succimer treatment, the estimated mean also starts at 26.1 mg/dL at baseline week 0, decreases to 13.5 mg/dL by week 1, and increases to 20.7 mg/dL by week 6. These analyses require a total of 522.6 minutes. All the analyses of Sections 3.4.1-3.4.3 require a total of 1100.3 minutes or about 18.3 hours.

3.5. Analyses of the Trichotomous Respiratory Status Data

The analyses of this section use data provided by Koch et al. [21] on 555

Table 17. Adaptive moderation models for blood lead levels in terms of time in week and treatment group controlling for outcome covariance.

LCV - likelihood cross-validation; UN - unstructured. a. The transform 1 corresponds to an intercept; the transform int corresponds to the indicator for being in the intervention group.

Table 18. Estimated means, standard deviations, and correlations for the adaptive moderation random coefficients model of blood lead levels in terms of time in weeks and treatment group.

trichotomous respiratory status levels classified as 0 for poor, 1 for good, and 2 for excellent for 54 patients on active treatment and 57 on a placebo at baseline visit 0 and subsequent visits 1-4. The dichotomous respiratory status outcome of Section 3.3 is related; it has the good and excellent categories combined into a good category. The cutoff for a substantial percent decrease (PD) for these data is 0.35%. The more general spatial AR1 correlation structure is considered, but this has no effect since the trichotomous respiratory levels are equally spaced. Estimated variances for models based on directly specified correlation structures are computed using the combination of adaptively generated dispersions and the standard variance terms $V_{i, j}$ . Analyses of these data using directly specified correlation structures are also provided in [7] . Analyses of other correlated discrete outcomes are provided in [10] .

3.5.1. Adaptive Models Based on Clinic Visit

Table 19 contains a comparison of adaptive models in clinic visit for the IND, EXCH, AR1, and UN directly specified correlation structures and the associated random coefficients model. The random coefficients model generates the best overall 10-fold LCV score. The UN directly specified correlation structure generates the next best LCV score, but with a substantial PD 0.59%. Estimated correlations for the random coefficients model given in Table 20 vary from 0.35 and 0.67. These analyses require a total of 43.9 minutes.

3.5.2. Adaptive Additive Models Based on Clinic Visit and Treatment Group

Table 21 contains a comparison of adaptive additive models in clinic visit and treatment group for the UN directly specified correlation structure and the associated random coefficients model. The IND, EC, and AR1 directly specified correlation structures have not been considered since they generate worse LCV scores than UN in Table 19. The same models are generated as in Table 19. These analyses require a total of 110.3 minutes.

Table 19. Adaptive models for trichotomous respiratory status levels in terms of clinic visit controlling for outcome covariance.

AR1 - autoregressive of order 1; EXCH - exchangeable; IND - independent; LCV - likelihood cross-validation; UN - unstructured. a. The transform 1 corresponds to an intercept.

3.5.3. Adaptive Moderation Models Based on Clinic Visit and Treatment Group

Table 22 contains a comparison of adaptive moderation models in clinic visit and treatment group for the UN directly specified correlation structure and the associated random coefficients model. The random coefficients model generates the better 10-fold LCV score. The UN model is substantially inferior with PD 0.66% in the LCV score. For both the UN and random coefficients cases, the

Table 20. Estimated means, standard deviations, and correlations for the adaptive random coefficients model of trichotomous respiratory status in terms of clinic visit.

Table 21. Adaptive additive models for trichotomous respiratory status levels in terms of clinic visit and treatment group controlling for outcome covariance.

LCV - likelihood cross-validation; UN - unstructured. a. The transform 1 corresponds to an intercept; the transform int corresponds to the indicator for being in the intervention group (but it is not included in the models)

Table 22. Adaptive moderation models for trichotomous respiratory status levels in terms of clinic visit and treatment group controlling for outcome covariance.

LCV - likelihood cross-validation; UN - unstructured. a. The transform 1 corresponds to an intercept; the transform int corresponds to the indicator for being in the intervention group.

additive models of Table 21 generate substantial PDs 1.21% and 1.28%, respectively, compared to associated moderation models of Table 22, and so both approaches support the conclusion of a substantial moderation effect. Estimated correlations for the random coefficients model given in Table 23 are the same for both treatment groups with values varying from 0.36 to 0.66. For participants in the control group on a placebo, the estimated mean starts at 0.90 at baseline visit 0, increases to 1.15 by visit 1 and then decreases a little to 1.12 by visit 4. In contrast, for participants in the intervention group on active treatment, the estimated mean also starts at 0.90 at baseline visit 0, increases more to 1.39 by week 2, and decreases somewhat to 1.27 by visit 4, and so participants on active treatment are more likely to have post-baseline excellent status with value 2. These analyses require a total of 1658.8 minutes. All analyses of Sections 3.5.1-3.5.3 require a total of 1813.0 minutes or about 30.2 hours.

4. Discussion

Methods are formulated for conducting adaptive analyses of general correlated outcomes using heuristic search controlled by likelihood cross-validation (LCV) scores using fractional polynomial models for means, variances/dispersions, and correlations for those outcomes. Means and variances/dispersions are modeled using generalized linear models based on fixed coefficients (or effects). Correlations are modeled using either directly specified correlations or random coefficients (or effects). Parameters are estimated using extended linear mixed modeling based on maximizing the multivariate normal density evaluated using residuals and covariances for general correlated outcomes, not just normally

Table 23. Estimated means, standard deviations, and correlations for the adaptive moderation random coefficients model of trichotomous respiratory status in terms of clinic visit and treatment group.

distributed outcomes. Associated estimating equations are solved using an adjusted version of Newton’s method. Cases considered include linear, Poisson, logistic, exponential, and discrete regression for modeling continuous, count/rate, dichotomous, positive continuous, and discrete numeric outcomes treated as normally, Poisson, Bernoulli, exponentially, and discrete numerically distributed, respectively. These methods extend to any type of correlated outcomes for which variances can be specified, for example, the inverse Gaussian case [5] . However, the case of multiple event times needs to account for censoring, and so correlation in this case is more readily addressed using conditional modeling [16] .

Example analyses are provided for each of these five outcome cases comparing results for the directly specified independent (IND), exchangeable (EXCH), autoregressive of order 1 (AR1), and unstructured (UN) correlation structures to results for random coefficients. LCV ratio tests are used to identify substantial differences in LCV scores for models. Each set of example analyses considers a correlated outcome observed over a continuous time variable for two groups of individuals. Analyses are first conducted using the continuous time variable for all four directly specified correlation structures and for random coefficients. Only the directly specified correlation structure generating the best LCV score for this first analysis is considered in subsequent analyses. Results for this directly specified correlation structure are compared to those for random coefficients in terms of both the continuous time variable and an indicator for being in one of the two groups. These latter analyses allow for additive effects of time and group as well as for moderation of time effects by group considering geometric combinations generalizing standard interactions.

Random coefficients models considered so far assume independent random coefficients combined with independent errors. Further work is needed to investigate more complex situations with more general covariance structures for both random coefficients and errors.

4.1. Summary of Example Analysis Results

In analyses of dental measurements for girls and boys at ages 8, 10, 12, and 14 years old, the EXCH correlation structure is the best directly specified correlation structure. Random coefficients outperform this directly specified correlation structure in all three analyses. The improvement is not substantial for the model in age only, but is substantial for both the additive and moderation models. Child gender substantially moderates the effect of age on dental measurements.

In analyses of epilepsy seizure rates per week for individuals taking a placebo and the drug progabide at baseline visit 0 and subsequent visits 1 - 4, the EXCH correlation structure is the best directly specified correlation structure. It substantially outperforms random coefficients in all three analyses. Epilepsy seizure rates depend on visit, but treatment group does not have either an additive or moderation effect on epilepsy seizure rates.

In analyses of dichotomous respiratory status coded as 0 for poor and 1 for good for individuals taking a placebo and on active treatment at baseline visit 0 and subsequent visits 1 - 4, the EXCH correlation structure is the best directly specified correlation structure. It generates comparable results to random coefficients in all three analyses although the additive analysis results are not straightforward. Being on active treatment substantially moderates the effect of visit on dichotomous respiratory status.

In analyses of blood lead levels for children taking a placebo and the chelating agent succimer at baseline week 0 and subsequent weeks 1, 4, and 6, the UN correlation structure is the best directly specified correlation structure. Random coefficients substantially outperform this directly specified correlation structure in all three analyses. Being on succimer substantially moderates the effect of week on blood lead levels.

In analyses of trichotomous respiratory status coded as 0 for poor, 1 for good, and 2 for excellent for individuals taking a placebo and on active treatment at baseline visit 0 and subsequent visits 1 - 4, the UN correlation structure is the best directly specified correlation structure. Random coefficients substantially outperform this directly specified correlation structure in all three analyses. Being on active treatment substantially moderates the effect of visit on trichotomous respiratory status.

Clock times are reported for all these example analyses. Not surprisingly, analyses considering random coefficients can require more time and sometime much more time than analyses considering directly specified correlation since they involve heuristic search through three sets of fractional polynomials rather than just two.

4.2. Conclusions

Random coefficients models for correlation are important to consider since they can provide substantial improvements for modeling of correlated outcomes. However, directly specified correlation structures are also important to consider since they can provide competitive or even substantially better results than random coefficients and usually require less time and sometimes much less time.

The results of the example analyses indicate that adaptive random coefficients modeling can generate effective models and so can provide substantive benefits for statistical practice in the context of correlated outcomes. Researchers often have to analyze correlated outcomes. This is typically addressed by arbitrarily choosing one of exchangeable correlations as used in standard repeated measures, autoregressive correlations as used in time series modeling, or unstructured correlations as used in multivariate analysis of variance and multivariate regression. ELMM combined with LCV can be used to identify which of these directly specified correlation structures is most appropriate for the data under analysis. However, it is highly important to consider random coefficients as well. Such evaluations can be conducted using specific models for the means based on theoretical considerations as well as more general adaptive models.

Acknowledgements

Prior development of the SAS macro used to conduct reported example analyses was supported in part by grants R01 AI57043 from the National Institute of Allergy and Infectious Diseases and R03 MH086132 from the National Institute of Mental Health. Work reported on in this article was not supported by external funding.

Conflicts of Interest

The author declares no conflicts of interest regarding the publication of this paper.

References

[1]	Fitzmaurice, G.M., Laird, N.M. and Ware, J.H. (2004) Applied Longitudinal Analysis. John Wiley & Sons, Hoboken.
[2]	Laird, N.M. and Ware, J.H. (1982) Random-Effects Models for Longitudinal Data. Biometrics, 38, 963-974.[CrossRef] [PubMed]
[3]	Brown, H. and Prescott, R. (1999) Applied Mixed Models in Medicine. John Wiley & Sons Ltd., Chichester.
[4]	Liang, K.-Y. and Zeger, S.L. (1986) Longitudinal Data Analysis Using Generalized Linear Models. Biometrika, 73, 13-22.[CrossRef]
[5]	McCullagh, P. and Nelder, J.A. (1999) Generalized Linear Models. 2nd Edition, Chapman & Hall/CRC, Boca Raton.
[6]	Wolfinger, R. and O’connell, M. (1993) Generalized Linear Mixed Models a Pseudo-Likelihood Approach. Journal of Statistical Computation and Simulation, 48, 233-243.[CrossRef]
[7]	Knafl, G.J. (2023) Modeling Correlated Outcomes Using Extensions of Generalized Estimating Equations and Linear Mixed Modeling. Springer, Berlin.[CrossRef]
[8]	Knafl, G.J. and Ding, K. (2016) Adaptive Regression for Modeling Nonlinear Relationships. Springer, Berlin.[CrossRef]
[9]	Knafl, G.J. (2018) Adaptive Fractional Polynomial Modeling. Open Journal of Statistics, 8, 159-186.[CrossRef]
[10]	Knafl, G.J. and Meghani, S.H. (2022) Regression Modeling of Individual-Patient Correlated Discrete Outcomes with Applications to Cancer Pain Ratings. Open Journal of Statistics, 12, 456-485.[CrossRef] [PubMed]
[11]	Schott, J.R. (2005) Matrix Analysis for Statistics. 2nd Edition, John Wiley & Sons, Hoboken.
[12]	Wolfinger, R., Tobias, R. and Sall, J. (1994) Computing Gaussian Likelihoods and Their Derivatives for General Linear Mixed Models. SIAM Journal on Scientific Computing, 6, 1294-1310.[CrossRef]
[13]	Burman, P. (1989) A Comparative Study of Ordinary Cross-Validation, ν-Fold Cross-Validation and the Repeated Learning-Testing Methods. Biometrika, 76, 503-514.[CrossRef]
[14]	Baron, R.M. and Kenny, D.A. (1986) The Moderator-Mediator Variable Distinction in Social Psychology Research: Conceptual, Strategic, and Statistical Considerations. Journal of Personality & Social Psychology, 51, 1173-1182.[CrossRef]
[15]	Knafl, G.J. (2023) An Adaptive Approach for Hazard Regression Modeling. Open Journal of Statistics, 13, 300-315.[CrossRef]
[16]	Knafl, G.J. (2023) Adaptive Conditional Hazard Regression Modeling of Multiple Event Times. Open Journal of Statistics, 13, 492-513.[CrossRef]
[17]	Knafl, G.J. (2022) Adaptive Regression for Nonlinear Interrupted Time Series Analyses with Application to Birth Defects in Children of Vietnam War Veterans. Open Journal of Statistics, 12, 789-809.[CrossRef]
[18]	Potthoff, R. and Roy, S.N. (1964) A Generalized Multivariate Analysis of Variance Model Useful Especially for Growth Curve Problems. Biometrika, 51, 313-326.[CrossRef]
[19]	Thall, P.F. and Vail, S.C. (1990) Some Covariance Models for Longitudinal Count Data with Overdispersion. Biometrics, 46, 657-671.[CrossRef] [PubMed]
[20]	Knafl, G.J. and Meghani, S.H. (2021) Modeling Individual Patient Count/Rate Data over Time with Applications to Cancer Pain Flares and Cancer Pain Medication Usage. Open Journal of Statistics, 11, 633-654.[CrossRef] [PubMed]
[21]	Koch, G.G., Carr, C.F., Amara, I.A., Stokes, M.E. and Uryniak, T.J. (1989) Categorical Data Analysis. In: Berry, D.A., Ed., Statistical Methodology in the Pharmaceutical Sciences, Marcel Dekker, New York, 391-475.
[22]	Treatment of Lead-Exposed Children (TLC) Trial Group (2000) Safety and Efficacy of Succimer in Toddlers with Blood Lead Levels of 20-44 µg/dL. Pediatric Research, 48, 593-599.[CrossRef] [PubMed]

Journals Menu

Follow SCIRP

	customer@scirp.org
	+86 18163351462(WhatsApp)
	1655362766

	Paper Publishing WeChat

Journals Menu

Home

About SCIRP

Service

Policies