Empirical Likelihood Based Longitudinal Data Analysis

Tharshanna Nadarajah; Asokan Mulayath Variyath; J Concepción Loredo-Osti

doi:10.4236/ojs.2020.104037

Open Journal of Statistics > Vol.10 No.4, August 2020

Tharshanna Nadarajah¹, Asokan Mulayath Variyath², J Concepción Loredo-Osti²
¹Department of Mathematics and Statistics, St. Francis Xavier University, Antigonish, Canada.
²Department of Mathematics and Statistics, Memorial University of Newfoundland, St. John’s, Canada.

Abstract

In longitudinal data analysis, our primary interest is in the estimation of regression parameters for the marginal expectations of the longitudinal responses, and the longitudinal correlation parameters are of secondary interest. The joint likelihood function for longitudinal data is challenging, particularly due to correlated responses. Marginal models, such as generalized estimating equations (GEEs), have received much attention based on the assumption of the first two moments of the data and a working correlation structure. The confidence regions and hypothesis tests are constructed based on the asymptotic normality. This approach is sensitive to the misspecification of the variance function and the working correlation structure which may yield inefficient and inconsistent estimates leading to wrong conclusions. To overcome this problem, we propose an empirical likelihood (EL) procedure based on a set of estimating equations for the parameter of interest and discuss its characteristics and asymptotic properties. We also provide an algorithm based on EL principles for the estimation of the regression parameters and the construction of its confidence region. We have applied the proposed method in two case examples.

Keywords

Longitudinal Data, Generalized Estimating Equations, Empirical Likelihood, Adjusted Empirical Likelihood, Extended Empirical Likelihood

Share and Cite:

Nadarajah, T. , Variyath, A. and Loredo-Osti, J. (2020) Empirical Likelihood Based Longitudinal Data Analysis. Open Journal of Statistics, 10, 611-639. doi: 10.4236/ojs.2020.104037.

1. Introduction

Longitudinal studies are common in areas such as epidemiology, clinical trials, economics, agriculture, and survey sampling. In longitudinal studies, we are interested in the changes in the variables over time as a function of the covariates, generally under the assumption that observations from different individuals are independent. For example, longitudinal studies are used to characterize growth and ageing, to assess the effect of risk factors on human health, and to evaluate the effectiveness of treatments. To obtain an unbiased, efficient, and reliable estimate, we must properly model the correlation between the repeated responses for each individual. However, the modelling of correlation, especially when the responses are discrete, is a challenging task even if the responses are collected over equispaced time points.

The approaches used for the analysis of longitudinal data can be classified as mixed effects models, transitional models, and marginal regression models. A potential disadvantage of mixed effects models is that they rely on parametric assumptions, which may lead to biased parameter estimates when a model is misspecified. Moreover, the estimation of the parameters is challenging when the random effects have a high dimension; it typically involves integrals that do not have an explicit form. Transitional models are more difficult to apply when there are missing data and the repeated measurements are not equally spaced in time. In addition, the interpretation of the regression parameters varies with the order of the serial correlation, and the regression parameter estimates are sensitive to the assumption of time dependence. Because of the aforementioned difficulties in modelling and performing inference, we focus on marginal models in this paper.

We start with a brief review of existing methods for longitudinal data under the framework of generalized linear models (GLMs). The longitudinal observations consist of an outcome random variable $y_{i t}$ and a p-dimensional vector of covariates $x_{i t}$ , observed for subjects $i = 1, \dots, k$ at a time point t, $t = 1, \dots, m_{i}$ . For the ith subject, let $y_{i} = {(y_{i 1}, \dots, y_{i m_{i}})}^{T}$ be the response vector, and let $X_{i} = {(x_{i 1}, x_{i 2}, \dots, x_{i t}, \dots, x_{i m_{i}})}^{T}$ be the $m_{i} \times p$ matrix of covariates. Marginal models for longitudinal data can be extended to the GLM framework. The marginal density of $y_{i t}$ is assumed to follow an exponential family [1] of the form

$f (y_{i t}) = \exp [(y_{i t} θ_{i t} - a (θ_{i t})) ϕ + b (y_{i t}, ϕ)],$ (1)

where $θ_{i t} = h (η_{i t})$ , h is a known injective function with $η_{i t} = x_{i t} β$ , $β$ is a $p \times 1$ vector of regression effects of $x_{i t}$ on $y_{i t}$ , and $a (\cdot)$ and $b (\cdot)$ are functions that are assumed to be known. The mean and variance of $y_{i t}$ can be written

$E (y_{i t} | x_{i t}) = a^{'} (θ_{i t}) = μ_{i t} \text{ and } \mathrm{Var} (y_{i t}) = a^{″} (θ_{i t}) = v (μ_{i t}) ϕ,$

where $ϕ$ is the unknown over-dispersion parameter and $v (\cdot)$ is a known variance function. For simplicity, we set the nuisance scale parameter $ϕ$ to 1 for the rest of this paper.
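
As a brief worked example (an illustration added here, with $ϕ = 1$ ), consider a Poisson response with mean $μ_{i t}$ . Its probability function can be written in the form (1) as

$f (y_{i t}) = \exp [y_{i t} θ_{i t} - e^{θ_{i t}} - \log (y_{i t} !)], \quad θ_{i t} = \log μ_{i t},$

so that $a (θ_{i t}) = e^{θ_{i t}}$ and $b (y_{i t}, ϕ) = - \log (y_{i t} !)$ . Differentiating gives $a^{'} (θ_{i t}) = e^{θ_{i t}} = μ_{i t}$ and $a^{″} (θ_{i t}) = e^{θ_{i t}} = μ_{i t}$ , that is, $v (μ_{i t}) = μ_{i t}$ . Under the log link, $θ_{i t} = η_{i t}$ and h is the identity function.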

The generalized estimating equation (GEE) approach [2] is a semiparametric method in which the estimating equations are derived without a full specification of the joint distribution of the observed data. This approach allows the user to specify any structure for the correlation matrix of the outcomes $y_{i}$ when estimating the regression parameters. Specifically, [2] introduced a “working” correlation structure to obtain consistent and efficient estimators for the regression parameter $β$ . The estimates of the parameters are obtained by solving

$g (β, \hat{α} (β)) = \sum_{i = 1}^{k} X_{i}^{T} A_{i}^{1 / 2} R_{i}^{- 1} (\hat{α}) A_{i}^{- 1 / 2} (y_{i} - μ_{i}) = 0,$ (2)

where $A_{i}$ is an $m_{i} \times m_{i}$ diagonal matrix with $v (μ_{i t})$ as the tth diagonal element and $R_{i} (\hat{α})$ is the $m_{i} \times m_{i}$ working correlation matrix of the $m_{i}$ repeated measurements. For $j = 1, \dots, m_{i}$ and $j^{'} = 1, \dots, m_{i}$ , the ${(j, j^{'})}^{th}$ element of $R_{i}$ is the known, hypothesized, or estimated correlation. The working correlation may depend on an unknown $s \times 1$ correlation parameter vector $α$ . The observation times and correlation matrix may differ from subject to subject, but the correlation matrix $R_{i} (α)$ for the ith subject is fully specified by $α$ . Some common working correlation structures are independent, autoregressive of order 1 (AR(1)), equally correlated (EQC), moving average of order 1 (MA(1)), or unstructured.
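
The working correlation structures listed above, and one subject's contribution to (2), can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes a Poisson marginal model with log link, so that the tth diagonal element of $A_{i}$ is $μ_{i t} = \exp (x_{i t} β)$ , and the function names `working_correlation` and `gee_contribution` are hypothetical.

```python
import numpy as np

def working_correlation(structure, m, alpha=0.0):
    """Return an m x m working correlation matrix R(alpha) (illustrative helper)."""
    # lag[j, j'] = |j - j'| for all pairs of time points
    lag = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
    if structure == "independent":
        return np.eye(m)
    if structure == "AR1":            # corr = alpha^|j - j'|
        return float(alpha) ** lag
    if structure == "EQC":            # corr = alpha for all j != j'
        return np.where(lag == 0, 1.0, alpha)
    if structure == "MA1":            # corr = alpha at lag 1, zero beyond
        return np.where(lag == 0, 1.0, np.where(lag == 1, alpha, 0.0))
    raise ValueError("unknown structure: " + str(structure))

def gee_contribution(X, y, beta, R):
    """One subject's term X^T A^{1/2} R^{-1} A^{-1/2} (y - mu) in equation (2).

    Assumes a Poisson model with log link, so A = diag(mu), mu = exp(X beta).
    """
    mu = np.exp(X @ beta)
    s = np.sqrt(mu)                              # diagonal of A^{1/2}
    return X.T @ (s * np.linalg.solve(R, (y - mu) / s))

# Usage: three time points, intercept plus one covariate
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 2.0, 4.0])
g_indep = gee_contribution(X, y, np.zeros(2), working_correlation("independent", 3))
g_ar1 = gee_contribution(X, y, np.zeros(2), working_correlation("AR1", 3, 0.5))
```

Summing `gee_contribution` over subjects and setting the sum to zero gives the estimating equation (2) for this special case.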

It has been demonstrated that in some situations the use of an arbitrary working correlation structure may lead to no solution for $\hat{α}$ , which may break down the entire GEE methodology (see [3]). In another study, [4] showed that the GEE approach may yield an estimator of $β$ that, although consistent, is less efficient than that of the independence estimating equation approach under an arbitrary working correlation structure. To overcome this difficulty, [5] proposed using a stationary lag correlation structure instead of the working correlation matrix.

The estimate for $β$ is obtained by solving the following estimating equations:

$g (β, ρ) = \sum_{i = 1}^{k} X_{i}^{T} A_{i} Σ_{i}^{- 1} (\hat{ρ}) (y_{i} - μ_{i}) = 0,$ (3)

where $Σ_{i} (\hat{ρ}) = A_{i}^{1 / 2} C_{i}^{*} (ρ) A_{i}^{1 / 2}$ , with $C_{i}^{*} (ρ)$ the stationary lag correlation structure for the AR(1), MA(1), or EQC models. The stationary lag correlations can be estimated via the method of moments introduced by [6], and the stationary lag correlation approach has been shown to produce regression estimates that are consistent and more efficient than those obtained from the independence-assumption-based estimating equation approach (see [4]). Using simulation studies, [7] showed that there is a loss in the efficiency of the GEE estimators when the correlation structures are misspecified. Note, however, that the correlation structure is unknown in practice, and it is better to use a stationary lag-correlation structure, which can accommodate the AR(1), EQC, and MA(1) structures. We, therefore, recommend defining a lag correlation structure for the longitudinal responses. When the number of estimating equations (r) is larger than the number of parameters (p) (i.e. $r > p$ ), we have extra information about the parameter for improved efficiency, but it may not be possible to solve the estimating equations directly. To overcome this problem, [8] proposed an adaptive quadratic inference function based on moment assumptions and the estimated variance; this does not involve direct estimation of the correlation parameter. Moreover, if the covariates are time-dependent, the assumption $\lim_{k \to \infty} E [g (β_{0}, \hat{α} (β_{0}))] = 0$ might not hold for an arbitrary working correlation structure, and so the GEE estimate of $β$ is not necessarily consistent; see [9] [10] [11] [12] and [13]. The GEE estimator of $β$ with the independent working correlation is always consistent, so [10] recommended using this correlation as a safe choice. However, ignoring the correlation of observations may lead to an inefficient estimate of the regression coefficients and an underestimate of the standard errors.
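
The moment estimation of stationary lag correlations mentioned above can be sketched as follows. The exact estimator of [6] is not reproduced in this paper, so the formula in the code is an assumption: a commonly used moment estimator based on standardized residuals $(y_{i t} - μ_{i t}) / \sqrt{v (μ_{i t})}$ for a balanced design ( $m_{i} = m$ for all subjects). The names `lag_correlations` and `lag_correlation_matrix` are illustrative.

```python
import numpy as np

def lag_correlations(resid):
    """Moment estimates of the lag-1, ..., lag-(m-1) correlations.

    resid: (k, m) array of standardized residuals, one row per subject.
    Assumed estimator (may differ in detail from [6]):
      rho_l = [sum_i sum_t r_it r_i,t+l / (k (m - l))] / [sum_i sum_t r_it^2 / (k m)]
    """
    resid = np.asarray(resid, dtype=float)
    k, m = resid.shape
    denom = (resid ** 2).sum() / (k * m)
    rho = np.empty(m - 1)
    for lag in range(1, m):
        num = (resid[:, :-lag] * resid[:, lag:]).sum() / (k * (m - lag))
        rho[lag - 1] = num / denom
    return rho

def lag_correlation_matrix(rho):
    """Build the m x m stationary lag correlation matrix C*(rho) (Toeplitz form)."""
    full = np.concatenate(([1.0], np.asarray(rho, dtype=float)))
    m = full.size
    lag = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
    return full[lag]                  # entry (j, j') is rho_{|j - j'|}

# Usage on a toy residual matrix with one subject and four time points
rho_hat = lag_correlations(np.array([[1.0, -1.0, 1.0, -1.0]]))
C_star = lag_correlation_matrix(rho_hat)
```

Because the matrix depends only on the lag, it covers the AR(1), MA(1), and EQC patterns without choosing among them in advance.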

The GEE approach requires only the assumption of the existence of the first two marginal moments and a correlation structure. GEE estimators are consistent and asymptotically normal as long as the mean, variance, and correlation structure are correctly specified. Marginal models have satisfactory performance when the assumptions are satisfied. Misspecification can cause estimates based on marginal models to be inefficient and inconsistent, and inference in this situation can be completely inappropriate. Confidence regions and hypothesis tests are based on asymptotic normality, which may not hold since the finite-sample distribution may not be symmetric. These problems motivate us to investigate the applicability of empirical likelihood (EL), a nonparametric likelihood method, based on a set of GEEs for the parameter of interest.

The EL introduced by [14] has properties similar to the parametric likelihood. The EL combines the reliability of nonparametric methods with the flexibility and effectiveness of the likelihood approach. The EL has many nice properties parallel to those of parametric likelihood, including the ability to carry out hypothesis tests and construct confidence intervals without estimating the variance. The shape of EL confidence regions automatically reflects the emphasis of the observed data set. The EL method also offers advantages in parameter estimation and the formulation of goodness-of-fit tests. The EL has been successfully applied in areas such as linear models, GLMs, survey sampling, variable selection, survival analysis, and time series. We investigate the use of a nonparametric EL in subject-wise longitudinal data analysis, simultaneously estimating within-subject correlations using the method of moments by [6]. We use the adjusted EL to avoid computational issues. We explore the asymptotic properties of the proposed method and assess the method’s performance based on a large number of simulations. Our approach provides consistent estimators and has comparable performance to marginal models when the model assumptions are correct. It is superior to marginal models even when the variance function and correlation structure are misspecified.

The remainder of the paper is organized as follows. In Section 2, we develop the subject-wise EL via a set of GEEs for the parameter of interest and discuss its characteristics. We then introduce adjusted EL (AEL) inference for longitudinal data. We discuss its characteristics and asymptotic properties in Section 3. In Section 4, we develop an algorithm based on EL principles for the estimation of the regression parameters and the construction of its confidence region. In Section 5, the performance of the proposed method is assessed based on Monte Carlo simulations. The implementation of the proposed method in two real case examples is discussed in Section 6 and the conclusions are given in Section 7.

2. Empirical-Likelihood-Based Longitudinal Modelling

EL is a nonparametric-likelihood-based approach, introduced by [14], which is an alternative to parametric likelihood and bootstrap methods. This method enables us to fully employ the information available from the data for making asymptotically efficient inference about the population parameters. In this section, we introduce the EL-based longitudinal modelling.

In a seminal paper, [15] introduced the EL for linear models. EL confidence regions for regression coefficients in linear models were studied by [16]. The EL method can also be used to estimate the parameters defined by a set of estimating equations [17]. A comprehensive overview of the EL and its properties can be obtained from [18]. EL methods have attracted increasing attention over the last two decades, and the literature is extensive.

In the longitudinal data analysis framework, [19] applied the EL approach using a subject-wise working independence model. This method ignores the within-subject correlation structure. Also note that [20] proposed a subject-wise EL by entering the longitudinal data and obtained asymptotic normality of the maximum EL estimator (MELE) of the regression coefficients. They did not consider the within-subject correlation structure. It is well known that the working-independence assumption may lead to a loss of efficiency in estimation when a within-subject correlation is present. To estimate the within-subject covariance matrices, [21] used the nonparametric sample covariance matrix obtained from the residuals of a GEE using the working-independence assumption. In this work, we show how to incorporate the within-subject correlation structure of the repeated measurements into the EL.

Following [15] and [17], we can extend the EL inference to longitudinal data based on a set of estimating functions $g (β, ρ)$ given in (3). We incorporate the within-subject correlation structure of the repeated measurements into the EL using the well-known method of moments estimators by [6] for a given value of $β$ . The profile empirical log-likelihood function of $β$ is defined by

$l (β) = \sup \left\{\sum_{i = 1}^{k} \log (p_{i}) : p_{i} \geq 0, i = 1, 2, \dots, k; \sum_{i = 1}^{k} p_{i} = 1, \sum_{i = 1}^{k} p_{i} g_{i} (β, ρ) = 0\right\} .$

The EL is maximized when

${\hat{p}}_{i} = \frac{1}{k \{1 + {\hat{λ}}^{T} g_{i} (β, ρ)\}}, i = 1,2, \dots, k,$ (4)

where the Lagrange multiplier $\hat{λ} = \hat{λ} (β)$ is the solution of

$\sum_{i = 1}^{k} \frac{g_{i} (β, ρ)}{1 + λ^{T} g_{i} (β, ρ)} = 0.$ (5)

This result leads to the profile empirical log-likelihood function

$l (β) = - k \log (k) - \sum_{i = 1}^{k} \log (1 + {\hat{λ}}^{T} (β) g_{i} (β, ρ))$

and the profile empirical log-likelihood ratio function

$W_{l} (β) = - \sum_{i = 1}^{k} \log (k {\hat{p}}_{i}) = \sum_{i = 1}^{k} \log [1 + {\hat{λ}}^{T} (β) g_{i} (β, ρ)] .$ (6)

Under some regularity conditions, we have $2 W_{l} (β_{0}) \overset{D}{\to} χ_{p}^{2}$ as $k \to \infty$ if

$E [g (β_{0}, \hat{ρ} (β_{0})) g^{T} (β_{0}, \hat{ρ} (β_{0}))]$

is full rank where $β_{0}$ is the true parameter value. This conclusion is similar to that for the parametric likelihood ratio function. The vector $β$ can be estimated by minimizing

$W_{l} (β) = \sum_{i = 1}^{k} \log (1 + {\hat{λ}}^{T} (β) g_{i} (β, ρ))$ (7)

with respect to $β$ . Note that the profile log-likelihood ratio function can be minimized with respect to $β$ when $ρ$ is known. In practice, $ρ$ is unknown, but can be consistently estimated using the method of moments by [6].

The computation of the profile EL function is a key step in EL applications, and it involves constrained maximization. In some situations, the algorithm may fail because of poor initial values of the parameters. Moreover, the poor accuracy of EL confidence regions has been reported by several authors, including [17] [18] [22] [23] [24] and [25]. In the next subsection we will discuss how to address these problems in the context of longitudinal data.

Adjusted Empirical Likelihood

The computation of the profile EL ratio function $W_{l} (β)$ given in (7) is a key step in EL applications. The solution for $λ$ must satisfy $1 + {\hat{λ}}^{T} (β) g_{i} (β, \hat{ρ} (β)) > 0$ for all $i = 1, \dots, k$ . A necessary and sufficient condition for its existence is that the vector $0$ is an interior point of the convex hull of $\{g_{i} (β, \hat{ρ} (β)), i = 1, \dots, k\}$ . Under some moment conditions on $g (β, \hat{ρ} (β))$ [18], the convex hull contains $0$ as an interior point with probability 1 as $k \to \infty$ . However, when $β$ is not close to the true parameter value $β_{0}$ or when k is small, it is possible that the solution of (5) does not exist. To avoid this problem, [25] introduced the adjusted EL (AEL). The AEL is obtained by adding a pseudo-observation to the data set. It overcomes the difficulties arising when the estimating equations for $λ$ have no solution.

Let $g_{i} (β) = g_{i} (β, \hat{ρ} (β))$ and ${\bar{g}}_{k} (β) = \frac{1}{k} \sum_{i = 1}^{k} g_{i} (β)$ for any given $β$ . For some positive constant $b_{k}$ , we add the artificial observation

$g_{k + 1} (β) = - \frac{b_{k}}{k} \sum_{i = 1}^{k} g_{i} (β) = - b_{k} {\bar{g}}_{k} ( β ),$

with $b_{k} = \log (k) / 2$ . The adjusted profile empirical log-likelihood ratio function is

$\begin{array}{l} W_{l}^{*} (β) \\ = \inf \left\{- \sum_{i = 1}^{k + 1} \log [(k + 1) p_{i}] : p_{i} \geq 0, i = 1, 2, \dots, k + 1; \sum_{i = 1}^{k + 1} p_{i} = 1, \sum_{i = 1}^{k + 1} p_{i} g_{i} (β) = 0\right\} \\ = \sum_{i = 1}^{k + 1} \log [1 + {\hat{λ}}^{T} (β) g_{i} (β)] \end{array}$

with $\hat{λ} = \hat{λ} (β)$ being the solution of $\sum_{i = 1}^{k + 1} \frac{g_{i} (β)}{1 + λ^{T} g_{i} (β)} = 0$ . Note that $0$ always lies inside the convex hull of $\{g_{i} (β, \hat{ρ} (β)), i = 1, \dots, k + 1\}$ . The adjusted profile empirical log-likelihood ratio function is well defined after adding a pseudo value $g_{k + 1} (β)$ . For a wide range of $b_{k}$ , following [25], we can show that the adjusted profile EL ratio function $W_{l}^{*} (β)$ has the same asymptotic properties as the unadjusted profile EL ratio function $W_{l} (β)$ . We define the adjusted profile EL estimator of $β$ to be the minimizer of

$W_{l}^{*} (β) = \sum_{i = 1}^{k + 1} [\log (1 + {\hat{λ}}^{T} (β) g_{i} (β, \hat{ρ} (β)))]$ (8)

with respect to $β$ .

The adjustment is particularly useful because, even for some undesirable values of $β$ , the algorithm guarantees a solution. The confidence regions constructed via the AEL are found to have better coverage probabilities than those for the regular EL and the algorithm provides a promising solution for $λ$ , particularly when the sample size is small. The improved coverage probability is achieved without resorting to more complex procedures such as Bartlett correction or bootstrap calibration.
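
The effect of the pseudo-observation can be seen in a small sketch. The code below appends $g_{k + 1} = - b_{k} {\bar{g}}_{k}$ with the default $b_{k} = \log (k) / 2$ from the text; the function name `add_pseudo_observation` and the toy data are illustrative assumptions.

```python
import numpy as np

def add_pseudo_observation(g, b_k=None):
    """Append the AEL pseudo-observation g_{k+1} = -b_k * mean(g) to a (k, r) array."""
    g = np.asarray(g, dtype=float)
    k = g.shape[0]
    if b_k is None:
        b_k = np.log(k) / 2.0          # choice of b_k given in the text
    pseudo = -b_k * g.mean(axis=0)
    return np.vstack([g, pseudo])

# Toy scalar example: all g_i are positive, so 0 is outside their convex hull
# and the unadjusted equation (5) for lambda has no solution.
g = np.array([[1.0], [2.0], [3.0]])
g_adj = add_pseudo_observation(g)
# After the adjustment the values straddle zero, so 0 is inside the hull.
```

In this toy case the added value is $- (\log 3 / 2) \times 2 = - \log 3$ , which lies on the opposite side of zero from the original values.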

In the next section, following [17], we state the results on the distributional properties of the adjusted profile EL estimates $\hat{β}$ of $β$ . We construct these theorems based on the GEE with lag correlation given in (3), since the GEE estimate of $β$ under an arbitrary working-correlation structure is not necessarily consistent.

3. Main Results

In this section, we present the first-order asymptotic properties of $\hat{β}$ and the adjusted profile empirical log-likelihood ratio statistics. We first introduce some notation and regularity conditions that are used in the theorems and lemma.

Regularity Conditions:

A1. $E \{g (β_{0}, \hat{ρ} (β_{0}))\} = 0$ , where $β_{0}$ is the true value of $β$ , $g (β, \hat{ρ} (β)) = \sum_{i = 1}^{k} D_{i}^{T} Σ_{i}^{- 1} (\hat{ρ}) (y_{i} - μ_{i})$ is the estimating function for $β \in R^{p}$ (defined in (3)), $D_{i} = \partial \{{a^{'}}_{i} (θ)\} / \partial β$ , $Σ_{i} (\hat{ρ}) = A_{i}^{1 / 2} C_{i}^{*} (\hat{ρ}) A_{i}^{1 / 2}$ , and $A_{i} = \mathrm{diag} \{{a^{″}}_{i} (θ)\}$ for $i = 1,2, \dots, k$ . Let ${\bar{g}}_{k} (β, \hat{ρ} (β)) = \frac{1}{k} \sum_{i = 1}^{k} g_{i} (β, \hat{ρ} (β))$ and $g_{k + 1} (β, \hat{ρ} (β)) = - b_{k} {\bar{g}}_{k} (β, \hat{ρ} (β))$ , where $b_{k}$ is a positive constant.

A2. $a^{'} (θ)$ is three times continuously differentiable and $a^{″} (θ) > 0$ in $Θ^{\circ}$ , where $Θ$ is the natural parameter space of the exponential family distributions presented in (1) and $Θ^{\circ}$ is the interior of $Θ$ . Also, $h (η)$ is three times continuously differentiable and $h^{'} (η) > 0$ .

A3. $E_{β_{0}} {\frac{\partial g_{k} (β, ρ)}{\partial β}}$ and $V_{k} (β_{0}, \hat{ρ} (β_{0})) = E_{β_{0}} {g_{k} (β, \hat{ρ} (β)) g_{k}^{T} (β, \hat{ρ} (β))}$ are positive definite.

A4. The rank of $E {\frac{\partial g_{k} (β, ρ)}{\partial β}}$ is p in a neighbourhood of $β_{0}$ .

A5. There exists a function $G (y, X)$ such that, in a neighbourhood of $β_{0}$ ,

$| \frac{\partial g_{k} (β, ρ)}{\partial β} | < G (y, X), {‖ g_{k} (y, X, β, \hat{ρ} (β)) ‖}^{3} < G (y, X)$

with $E [G (y, X)] < \infty$ .

Theorem 3.1. Under regularity conditions A1-A5, suppose $(y_{i}, X_{i}), i = 1,2, \dots, k$ is a set of independent and identically distributed random vectors. Let

$2 W_{l}^{*} (β) = 2 \sum_{i = 1}^{k + 1} \log [1 + {\hat{λ}}^{T} (β) g_{i} (β, \hat{ρ} (β))]$ (9)

be the adjusted profile empirical log-likelihood ratio function. Then, as $k \to \infty$ , $\hat{ρ} (β)$ is a consistent estimator in the neighbourhood of $β$ ; the correlation matrix of $y_{i}$ is $C_{i}^{*} (ρ)$ , defined in (3) and $W_{l}^{*} (β)$ attains its minimum value at some point $\hat{β}$ in the interior of $‖ \hat{β} - β_{0} ‖ < k^{- 1 / 3}$ in probability.

This result corresponds to Lemma 1 in [17], which concerns the consistency of maximum empirical likelihood estimates (MELE) for independent and identically distributed data. Following [17], under regularity conditions A1-A5 we can obtain a subject-wise MELE: as $k \to \infty$ , with probability tending to 1, $W_{l}^{*} (β)$ attains its minimum within the open ball $‖ \hat{β} - β_{0} ‖ < k^{- 1 / 3}$ . The proof is similar to the proof of Lemma 1 in [19], and the details are omitted here.

Theorem 3.2. In addition to the regularity conditions A1-A5, suppose that $\frac{\partial^{2} g (β, ρ)}{\partial β \partial β^{T}}$ is bounded by some integrable function $G (y, X)$ in the neighbourhood. Then, there exists a sequence of adjusted profile EL estimates $\hat{β}$ of $β$ such that

$\sqrt{k} (\hat{β} - β_{0}) \overset{D}{\to} N (0, Δ),$

where

$Δ = {\left[E_{β_{0}} {\left\{\frac{\partial g (β, \hat{ρ} (β))}{\partial β}\right\}}^{T} {\left[E_{β_{0}} \left\{g (β, \hat{ρ} (β)) g^{T} (β, \hat{ρ} (β))\right\}\right]}^{- 1} E_{β_{0}} \left\{\frac{\partial g (β, \hat{ρ} (β))}{\partial β}\right\}\right]}^{- 1} .$

It is noted that the proof of Theorem 3.2 is similar to the proof of Theorem 1 in [17]. The details are thus omitted here.

Theorem 3.3. Under regularity conditions A1-A5, the adjusted profile empirical log-likelihood ratio function $2 W_{l}^{*} (β_{0})$ , where $β_{0}$ is the true value of $β$ , is asymptotically chi-squared distributed with degrees of freedom p.

The proof of Theorem 3.3 can be achieved by using similar arguments as those used in the proof of Theorem 2 in [17]. The details are thus omitted here.
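
As a direct consequence of Theorem 3.3, an approximate $100 (1 - α) \%$ confidence region for $β$ can be written as

$\{β : 2 W_{l}^{*} (β) \leq χ_{p, 1 - α}^{2}\},$

where $χ_{p, 1 - α}^{2}$ is the $(1 - α)$ th quantile of the chi-squared distribution with p degrees of freedom. No variance estimate is needed to form this region, since the calibration comes from the limiting distribution of the ratio statistic itself.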

4. Algorithm

To implement our method, we need an efficient algorithm. We minimize the profile EL ratio function $W_{l} (β)$ with respect to $β$ using a Newton-Raphson algorithm. At each Newton-Raphson iteration, we compute the Lagrange multiplier for the updated values of $β$ and $\hat{ρ} (β)$ , using the numerically stable modified Newton-Raphson algorithm proposed by [26]. The algorithms given in Sections 4.1, 4.2, and 4.3 can easily be extended to the AEL by the addition of a pseudo-value $g_{k + 1} (β) = - b_{k} {\bar{g}}_{k} (β, \hat{ρ} (β))$ , where $b_{k}$ is a positive constant.

4.1. Computation of Lagrange Multiplier

The Lagrange multiplier $λ$ is estimated by solving the equation

$\sum_{i = 1}^{k} \frac{g_{i} (β, \hat{ρ} (β))}{1 + λ^{T} g_{i} (β, \hat{ρ} (β))} = 0$

for a given set of vectors $g_{i} (β, \hat{ρ} (β))$ , $i = 1, 2, \dots, k$ . Note that the above equation is the derivative of R with respect to $λ$ for a given $β$ , where

$R = \sum_{i = 1}^{k} \log \{1 + λ^{T} g_{i} (β, \hat{ρ} (β))\} .$ (10)

In the EL problem, the solution must satisfy

$1 + λ^{T} g_{i} (β, \hat{ρ} (β)) > 0, i = 1, 2, \dots, k .$

The modified Newton-Raphson algorithm for estimating $λ$ for a given value of $β$ and $\hat{ρ} (β)$ is as follows:

1. Set $λ^{c} = 0$ , $c = 0$ , $γ^{c} = 1$ , $ϵ = 10^{- 8}$ , $ρ = ρ^{0}$ , and $β = β^{0}$ .

2. Let $R^{λ}$ and $R^{λ λ}$ be the first and second partial derivatives of R (given in (10)) with respect to $λ$ :

$R^{λ} = \sum_{i = 1}^{k} [\frac{g_{i} (β, \hat{ρ} (β))}{{1 + λ^{T} g_{i} (β, \hat{ρ} (β))}}],$

$R^{λ λ} = - \sum_{i = 1}^{k} [\frac{g_{i} (β, \hat{ρ} (β)) g_{i}^{T} (β, \hat{ρ} (β))}{\{1 + λ^{T} g_{i} (β, \hat{ρ} (β))\}^{2}}] .$

Compute $R^{λ}$ and $R^{λ λ}$ for $λ = λ^{c}$ and let $Δ (λ^{c}) = {[R^{λ λ}]}^{- 1} R^{λ}$ , so that $λ^{c} - Δ (λ^{c})$ is the full Newton-Raphson update.

If $‖ Δ (λ^{c}) ‖ < ϵ$ stop the algorithm and report $λ^{c}$ ; otherwise continue.

3. Calculate $δ^{c} = γ^{c} Δ (λ^{c})$ . If $1 + {(λ^{c} - δ^{c})}^{T} g_{i} (β, \hat{ρ} (β)) \leq 0$ for some i, set $γ^{c} = \frac{γ^{c}}{2}$ and go to Step 2.

4. Set $λ^{c + 1} = λ^{c} - δ^{c}, c = c + 1$ , and $γ^{c + 1} = {(c + 1)}^{- \frac{1}{2}}$ and go to Step 2. The step-halving in Step 3 guarantees that $p_{i} > 0$ and that the optimization is carried out in the right direction.
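
The four steps above can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function name `el_lagrange_multiplier`, the iteration cap `max_iter`, and the toy data are assumptions. The Newton direction is taken as ${[R^{λ λ}]}^{- 1} R^{λ}$ , so that subtracting it from $λ^{c}$ gives the usual Newton-Raphson update.

```python
import numpy as np

def el_lagrange_multiplier(g, eps=1e-8, max_iter=1000):
    """Solve sum_i g_i / (1 + lambda^T g_i) = 0 by the damped Newton scheme above.

    g: (k, r) array whose rows are the estimating-function values g_i.
    """
    g = np.asarray(g, dtype=float)
    lam = np.zeros(g.shape[1])                     # Step 1: lambda^0 = 0
    c, gamma = 0, 1.0
    for _ in range(max_iter):
        denom = 1.0 + g @ lam                      # 1 + lambda^T g_i
        r1 = (g / denom[:, None]).sum(axis=0)      # R^lambda
        r2 = -(g / denom[:, None] ** 2).T @ g      # R^{lambda lambda}
        direction = np.linalg.solve(r2, r1)        # Step 2: Newton direction
        if np.linalg.norm(direction) < eps:
            return lam
        delta = gamma * direction                  # Step 3: damped step
        while np.any(1.0 + g @ (lam - delta) <= 0.0):
            gamma /= 2.0                           # halve until all p_i > 0
            delta = gamma * direction
        lam = lam - delta                          # Step 4: update
        c += 1
        gamma = (c + 1) ** -0.5
    raise RuntimeError("Lagrange multiplier iteration did not converge")

# Toy data (assumed): the origin lies inside the convex hull of the rows.
g = np.array([[1.0, 0.0], [-1.0, 0.5], [0.0, -1.0], [0.5, 1.0], [-0.5, -0.25]])
lam_hat = el_lagrange_multiplier(g)
p_hat = 1.0 / (g.shape[0] * (1.0 + g @ lam_hat))   # weights from equation (4)
```

At the solution the weights `p_hat` are positive and sum to one, which follows from equation (5). For the AEL the same function can be applied after appending the pseudo-value to `g`.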

4.2. Algorithm for Optimizing Profile Empirical Likelihood Ratio Function

Let $\hat{λ} (β)$ be the estimated value of $λ$ for a given $β$ . We minimize the profile EL ratio function defined in (7) over $β$ . The Newton-Raphson algorithm is as follows:

1. Set $β = β^{0}$ , $h = 0$ , and $ϵ = 10^{- 8}$ .

2. Let $\hat{λ} = λ (β)$ and $\hat{ρ} (β)$ be the estimated values of $λ$ and $ρ$ .

3. Compute the new estimate of $β$ via

$β^{(h + 1)} = β^{(h)} - {\{W_{l}^{β β} (β^{(h)})\}}^{- 1} \{W_{l}^{β} (β^{(h)})\}$ (11)

where $W_{l} (β)$ is the profile empirical log-likelihood ratio function defined in (7), with

$W_{l}^{β} = \frac{\partial W_{l} (β)}{\partial β}$ , $W_{l}^{β β} = \frac{\partial^{2} W_{l} (β)}{\partial β \partial β^{T}}$ .

Note that to compute $W_{l}^{β}$ and $W_{l}^{β β}$ , we need to estimate the Lagrange multiplier $\hat{λ} (β)$ as in Section 4.1. In practice, $ρ$ is unknown, and the correlations can be consistently estimated using the method of moments by [6].

4. If $\min | β^{(h + 1)} - β^{(h)} | < ϵ$ stop the algorithm and report $β^{(h + 1)}$ ; otherwise set $h = h + 1$ and go to Step 3.

The simplified expressions for $W_{l}^{β}$ and $W_{l}^{β β}$ are as follows. Let $R^{β}$ , $R^{β β}$ , and $R^{β λ}$ be the first and second partial derivatives of (10) with respect to $β$ and $λ$

$R^{β} = \sum_{i = 1}^{k} [\frac{{g^{'}}_{i} (β, \hat{ρ} (β)) λ}{{1 + λ^{T} g_{i} (β, \hat{ρ} (β))}}],$

$R^{β β} = \sum_{i = 1}^{k} \left\{[\frac{{g^{″}}_{i} (β, \hat{ρ} (β)) λ}{1 + λ^{T} g_{i} (β)}] - [\frac{{g^{'}}_{i} (β, \hat{ρ} (β)) λ λ^{T} {[{g^{'}}_{i} (β, \hat{ρ} (β))]}^{T}}{\{1 + λ^{T} g_{i} (β, \hat{ρ} (β))\}^{2}}]\right\},$

and

$R^{β λ} = \sum_{i = 1}^{k} [\frac{\{1 + λ^{T} g_{i} (β)\} {g^{'}}_{i} (β, \hat{ρ} (β)) - {g^{'}}_{i} (β, \hat{ρ} (β)) λ {[g_{i} (β)]}^{T}}{\{1 + λ^{T} g_{i} (β)\}^{2}}] .$

The first derivative of $W_{l} (β)$ with respect to $β$ is

$\begin{matrix} W_{l}^{β} = \sum_{i = 1}^{k} [\frac{{[\frac{\partial λ (β)}{\partial β}]}^{T} g_{i} (β, \hat{ρ} (β)) + {g^{'}}_{i} (β, \hat{ρ} (β)) λ (β)}{{1 + λ^{T} (β) g_{i} (β, \hat{ρ} (β))}}] \\ = {[\frac{\partial λ (β)}{\partial β}]}^{T} R^{λ} + R^{β} . \end{matrix}$

Note that for $λ = \hat{λ} (β)$ , $R^{λ} = 0$ . Therefore,

$W_{l}^{β} = R^{β} .$ (12)

Similarly, the second derivative of $W_{l} (β)$ with respect to $β$ is

$\begin{array}{l} W_{l}^{β β} = \sum_{i = 1}^{k} \left[\frac{\{1 + λ^{T} (β) g_{i} (β, \hat{ρ} (β))\} \left\{[\frac{\partial^{2} λ (β)}{\partial β \partial β^{T}}] g_{i} (β, \hat{ρ} (β)) + 2 {g^{'}}_{i} (β) {[\frac{\partial λ (β)}{\partial β}]}^{T} + {g^{″}}_{i} (β, \hat{ρ} (β)) λ (β)\right\}}{\{1 + λ^{T} (β) g_{i} (β, \hat{ρ} (β))\}^{2}}\right] \\ - \sum_{i = 1}^{k} \left[\frac{\left\{{[\frac{\partial λ (β)}{\partial β}]}^{T} g_{i} (β, \hat{ρ} (β)) + {g^{'}}_{i} (β, \hat{ρ} (β)) λ (β)\right\} {\left\{{[\frac{\partial λ (β)}{\partial β}]}^{T} g_{i} (β, \hat{ρ} (β)) + {g^{'}}_{i} (β, \hat{ρ} (β)) λ (β)\right\}}^{T}}{\{1 + λ^{T} (β) g_{i} (β, \hat{ρ} (β))\}^{2}}\right] \\ = {[\frac{\partial λ (β)}{\partial β}]}^{T} R^{λ λ} [\frac{\partial λ (β)}{\partial β}] + 2 {[\frac{\partial λ (β)}{\partial β}]}^{T} R^{β λ} + R^{β β} . \end{array}$

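Following [18], a local quadratic approximation to $R$ , together with the identity $R^{λ} = 0$ at $λ = \hat{λ} (β)$ (so that $R^{λ λ} [\frac{\partial λ (β)}{\partial β}] + R^{λ β} = 0$ ), leads to

$[\frac{\partial λ (β)}{\partial β}] = - {(R^{λ λ})}^{- 1} R^{λ β},$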

$W_{l}^{β β} = R^{β β} - R^{β λ} {(R^{λ λ})}^{- 1} R^{λ β} .$ (13)

4.3. Construction of Confidence Interval

We use the bisection method to construct the lower and upper confidence limits based on the profile EL ratio for $β$ . Let $\hat{β} = {({\hat{β}}_{1}, {\hat{β}}_{2})}^{T}$ be the estimated value of $β$ from Section 4.2, where ${\hat{β}}_{1}$ is a scalar and ${\hat{β}}_{2}$ is the $1 \times (p - 1)$ vector of the remaining parameters, and suppose we are interested in constructing a confidence interval for $β_{1}$ .

1. To compute the lower confidence limit $β_{1, L}$ for $β_{1}$ , set $L_{1} = {\hat{β}}_{1}$ , $L_{2} = {\hat{β}}_{1} - a \times SE ({\hat{β}}_{1})$ , and $ϵ = 10^{- 5}$ , where $SE ({\hat{β}}_{1})$ is the standard error of ${\hat{β}}_{1}$ obtained using any existing method. We can choose $a$ such that $W_{l} (L_{2}, {\hat{β}}_{2}) > [χ_{1,1 - α}^{2}] / 2 > W_{l} (L_{1}, {\hat{β}}_{2})$ , where $χ_{1,1 - α}^{2}$ is the $(1 - α)$ th quantile of a $χ^{2}$ distribution with one degree of freedom.

2. Compute the profile empirical log-likelihood ratio values $W_{1} = 2 W_{l} (L_{1}, {\hat{β}}_{2})$ and $W_{2} = 2 W_{l} (L_{2}, {\hat{β}}_{2})$ .

3. Minimize the profile EL ratio function defined in (7) over $β_{2}$ for a given $L_{n e w} = (L_{1} + L_{2}) / 2$ . Let ${\hat{β}}_{2 n e w}$ be the new estimate of $β_{2}$ and $W_{n e w} = 2 W_{l} (L_{n e w}, {\hat{β}}_{2 n e w})$ .

4. If $W_{n e w} < χ_{1, 1 - α}^{2}$ , set $L_{1} = L_{n e w}$ and $W_{1} = W_{n e w}$ ; else set $L_{2} = L_{n e w}$ and $W_{2} = W_{n e w}$ .

5. If $| W_{1} - W_{2} | < ϵ$ stop the algorithm and report $β_{1, L}$ ; otherwise go to Step 3.

We can use this approach to construct the upper confidence limit by setting $U_{1} = {\hat{β}}_{1}$ and $U_{2} = {\hat{β}}_{1} + a \times SE ({\hat{β}}_{1})$ .
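
The bisection steps can be illustrated on the simplest possible case. The sketch below makes several simplifying assumptions that are not in the paper: the parameter is a scalar mean with $g_{i} (μ) = y_{i} - μ$ , so there is no nuisance component $β_{2}$ and the minimization in Step 3 is not needed; the scalar Lagrange multiplier is found by bisection on its feasible interval instead of the modified Newton-Raphson algorithm; and the 0.95 quantile of the $χ_{1}^{2}$ distribution is hard-coded. The function names and data are illustrative.

```python
import numpy as np

CHI2_1_095 = 3.841459   # 0.95 quantile of the chi-squared distribution, 1 df

def el_log_ratio_mean(y, mu, n_bisect=200):
    """W_l(mu) = sum_i log(1 + lambda * g_i) with g_i = y_i - mu (scalar case)."""
    g = np.asarray(y, dtype=float) - mu
    if g.min() >= 0.0 or g.max() <= 0.0:
        return np.inf                          # 0 is outside the convex hull
    lo, hi = -1.0 / g.max(), -1.0 / g.min()    # keeps every 1 + lambda * g_i > 0
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        # sum g_i / (1 + lambda g_i) is strictly decreasing in lambda
        if np.sum(g / (1.0 + mid * g)) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return float(np.sum(np.log1p(lam * g)))

def el_lower_limit(y, a=5.0, eps=1e-5, crit=CHI2_1_095, max_iter=200):
    """Lower confidence limit for the mean by bisection (Steps 1 to 5)."""
    y = np.asarray(y, dtype=float)
    est = y.mean()
    se = y.std(ddof=1) / np.sqrt(y.size)
    L1, L2 = est, est - a * se                          # Step 1
    W1 = 2.0 * el_log_ratio_mean(y, L1)                 # Step 2
    W2 = 2.0 * el_log_ratio_mean(y, L2)
    if not (W2 > crit > W1):
        raise ValueError("bracket does not contain the limit; increase a")
    for _ in range(max_iter):
        if abs(W1 - W2) < eps:                          # Step 5
            break
        L_new = 0.5 * (L1 + L2)                         # Step 3
        W_new = 2.0 * el_log_ratio_mean(y, L_new)
        if W_new < crit:                                # Step 4
            L1, W1 = L_new, W_new
        else:
            L2, W2 = L_new, W_new
    return L1

y = np.array([2.1, 3.4, 1.9, 4.2, 2.8, 3.9, 3.1, 2.5, 3.6, 2.2])
lower = el_lower_limit(y)
```

The upper limit follows by mirroring the bracket to `est + a * se`. With regression parameters, `el_log_ratio_mean` would be replaced by the profile ratio minimized over $β_{2}$ at each trial value, as in Step 3.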

5. Performance Analysis

In this section, we conduct simulation studies to investigate the performance of our EL-based approach. We compute the coverage probabilities based on the ordinary EL and the AEL and compare them with those of the GEE approach, which is based on a normal approximation. We also compute the coverage probabilities based on the extended empirical likelihood (EEL) by [27] and [28], which expands the EL domain geometrically and improves the coverage probabilities. We generate count and continuous responses with different correlation structures and compare the methods under different working correlation structures.

5.1. Correlation Models for Stationary Count Data

We consider the stationary correlation models for count data discussed by [29] and [30]. The three models used to generate the data are

(i) Poisson Autoregressive Order 1 (AR(1)) Model

Let $y_{i 1} \sim Poi ({\tilde{μ}}_{i})$ , where ${\tilde{μ}}_{i} = \exp ({\tilde{x}}_{i} β)$ . The repeated responses follow the AR lag 1 dynamic model given by

$y_{i t} = ρ * y_{i, t - 1} + d_{i t}, t = 2, \dots, m_{i} .$ (14)

Given $y_{i, t - 1}$ , $ρ * y_{i, t - 1}$ is the binomial thinning operation. That is,

$ρ * y_{i, t - 1} = \sum_{j = 1}^{y_{i, t - 1}} b_{j} (ρ) = z_{i, t - 1},$

where the $b_{j} (ρ)$ are independent and identically distributed $Bernoulli (ρ)$ random variables. We assume that $d_{i t} \sim Poi ({\tilde{μ}}_{i} (1 - ρ))$ and that it is independent of $z_{i, t - 1}$ . Here ${\tilde{x}}_{i} = ({\tilde{x}}_{i 1}, \dots, {\tilde{x}}_{i p})$ is the time-independent covariate vector for the ith individual.
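The thinning recursion in (14) can be simulated directly. The sketch below is a minimal illustration that uses only the Python standard library; the helper names `rpois`, `thin`, and `poisson_ar1` and the seed are hypothetical, and the Poisson sampler is Knuth's multiplication method, which is adequate only for the small means used here.

```python
import math
import random


def rpois(lam: float, rng: random.Random) -> int:
    """Poisson draw via Knuth's multiplication method (fine for small means)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def thin(count: int, rho: float, rng: random.Random) -> int:
    """Binomial thinning rho * count: a sum of `count` Bernoulli(rho) draws."""
    return sum(rng.random() < rho for _ in range(count))


def poisson_ar1(mu: float, rho: float, m: int, rng: random.Random) -> list[int]:
    """One subject's counts y_1, ..., y_m from the Poisson AR(1) model (14)."""
    y = [rpois(mu, rng)]                       # y_1 ~ Poi(mu)
    for _ in range(1, m):
        d = rpois(mu * (1.0 - rho), rng)       # innovation d_t ~ Poi(mu (1 - rho))
        y.append(thin(y[-1], rho, rng) + d)    # y_t = rho * y_{t-1} + d_t
    return y


rng = random.Random(2020)                       # arbitrary seed
beta = (0.3, 0.2)
x = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))  # time-independent covariates
mu_i = math.exp(x[0] * beta[0] + x[1] * beta[1])
y_i = poisson_ar1(mu_i, rho=0.5, m=4, rng=rng)
```

With NumPy available, `numpy.random.Generator.poisson` and `numpy.random.Generator.binomial` could replace the two helper functions.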

(ii) Poisson Moving Average Order 1 (MA(1)) Model

The repeated responses follow the MA lag 1 dynamic model given by

$y_{i t} = ρ * d_{i, t - 1} + d_{i t}, t = 2, \dots, m_{i},$ (15)

where $ρ * d_{i, t - 1} = \sum_{j = 1}^{d_{i, t - 1}} b_{j} (ρ)$ is a binomial thinning operation and $d_{i t} \sim Poi [\frac{{\tilde{μ}}_{i}}{1 + ρ}]$ , $t = 0, \dots, m_{i}$ , with ${\tilde{μ}}_{i} = \exp ({\tilde{x}}_{i} β)$ . Here $t = 0$ is the initial time.

(iii) Poisson Equally Correlated Model

Let $y_{i 0} \sim Poi ({\tilde{μ}}_{i})$ and $d_{i t} \sim Poi [{\tilde{μ}}_{i} (1 - ρ)]$ for all $t = 1, \dots, m_{i}$ . The repeated responses follow the dynamic equicorrelation model given by

$y_{i t} = ρ * y_{i 0} + d_{i t}, \text{ for } t = 1, \dots, m_{i} .$ (16)
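To see what the parameter $ρ$ means in each of the three models, the implied lag correlations can be worked out from the thinning properties $E (ρ * y \mid y) = ρ y$ and the fact that a thinned Poisson variable is again Poisson. The following is a sketch derived here from (14) to (16), assuming that thinning operations at different time points are independent given the count being thinned; it is not quoted from [29] or [30]. In all three models the marginal distribution of $y_{i t}$ is $Poi ({\tilde{μ}}_{i})$ , so the marginal variance is ${\tilde{μ}}_{i}$ .

```latex
\begin{aligned}
\text{AR(1):}\quad & \mathrm{Cov}(y_{it}, y_{i,t-1}) = \rho\,\mathrm{Var}(y_{i,t-1}) = \rho\,\tilde{\mu}_i
  &&\Rightarrow\ \mathrm{corr}(y_{it}, y_{i,t-\ell}) = \rho^{\ell},\\
\text{MA(1):}\quad & \mathrm{Cov}(y_{it}, y_{i,t-1}) = \rho\,\mathrm{Var}(d_{i,t-1}) = \frac{\rho\,\tilde{\mu}_i}{1+\rho}
  &&\Rightarrow\ \mathrm{corr}(y_{it}, y_{i,t-1}) = \frac{\rho}{1+\rho},\\
\text{EQC:}\quad & \mathrm{Cov}(y_{it}, y_{is}) = \rho^{2}\,\mathrm{Var}(y_{i0}) = \rho^{2}\,\tilde{\mu}_i
  &&\Rightarrow\ \mathrm{corr}(y_{it}, y_{is}) = \rho^{2}, \quad t \neq s .
\end{aligned}
```

For example, $ρ = 0.5$ gives lag-1 correlations of 0.5, 1/3, and 0.25 for the AR(1), MA(1), and equicorrelation models respectively; under the MA(1) model the correlation is zero beyond lag 1.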

We simulated 1000 data sets from each of these models, so that the data follow the AR(1), EQC, or MA(1) structure, and used the EL-based methods to estimate the parameters under different working correlations: AR(1), EQC, and MA(1), as well as the lag correlation. In each simulation we use the parameters $β = {(β_{1}, β_{2})}^{T} = {(0.3,0.2)}^{T}$ and $ρ = 0.5$ . We consider $k = 100$ subjects and $m = 4$ time points. For the ith subject, we generate the covariates ${\tilde{x}}_{i} = ({\tilde{x}}_{i 1}, {\tilde{x}}_{i 2})$ from a normal distribution with mean 0 and standard deviation 1. For the analysis, we take the working correlation to be either the true correlation structure or the lag correlation. We did not consider other values of $ρ$ , since the working correlation structure may lead to no solution for $\hat{α}$ in some situations.

Table 1 gives the average estimated values of the regression coefficients with the corresponding simulated standard errors in parentheses for the independent, AR(1), EQC, and MA(1) models. We also give the coverage probabilities for $β_{1}$ and $β_{2}$ for the 0.95 and 0.99 confidence levels with the average width of the CI in parentheses. The results in Table 1 show that the estimates ${\hat{β}}_{1}$ and ${\hat{β}}_{2}$ are close to the true values, and that the widths and coverage probabilities of the intervals based on the EL, EEL, and AEL are similar to those of the GEE. For instance, in the AR(1)/AR(1) case (true model/working correlation structure) the coverage probabilities of ${\hat{β}}_{1}$ based on the GEE, EL, EEL, and AEL are 0.947, 0.928, 0.937, and 0.937 respectively for the nominal level of 0.95. For ${\hat{β}}_{2}$ , these probabilities are 0.954, 0.934, 0.940, and 0.942 for the same nominal level. Note that the intervals based on the EL have a slight undercoverage compared with those based on the GEE. The EEL and AEL improve on the EL coverage (0.937 versus 0.928 for ${\hat{β}}_{1}$ in this case) and are consistently more accurate than the EL. The results for lag correlations have similar patterns.
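The coverage probabilities and average widths reported in Table 1 are Monte Carlo summaries over the simulated data sets. A minimal sketch of how such summaries can be computed is given below; the function name and the toy intervals are hypothetical, and the Monte Carlo standard error uses the usual binomial formula, which is an addition here and is not reported in the paper.

```python
import math


def coverage_summary(intervals, truth):
    """Return (coverage, mean width, Monte Carlo SE of the coverage)."""
    n = len(intervals)
    hits = sum(lo <= truth <= hi for lo, hi in intervals)
    coverage = hits / n
    mean_width = sum(hi - lo for lo, hi in intervals) / n
    mc_se = math.sqrt(coverage * (1.0 - coverage) / n)
    return coverage, mean_width, mc_se


# Toy input: four hypothetical intervals for a true value of 0.3.
toy = [(0.1, 0.5), (0.2, 0.6), (0.35, 0.75), (0.0, 0.4)]
cov, width, se = coverage_summary(toy, truth=0.3)

# Binomial SE of an estimated coverage at the nominal 0.95 level
# when 1000 replicates are used, as in the simulations above.
nominal_se = math.sqrt(0.95 * 0.05 / 1000)
```

With 1000 replicates this standard error is roughly 0.007, which gives a rough scale for judging how far a reported coverage lies from its nominal level.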

5.2. Misspecified Working Correlation Structure

In the simulation studies discussed in Section 5.1 we considered the correlation structure used to generate the data as the working correlation in the GEE-based modelling. However, in practice, we do not know the correlation structure of the data. As discussed before, if the working correlation is misspecified, we may lose the efficiency of the parameter estimates [3] [4].

We conducted a simulation study to assess the loss of efficiency. We generated repeated counts with the AR(1) correlation structure given in Section 5.1(i) with $ρ = 0.49$ and 0.70 and $m = 5$ time points. We used three working correlation structures: EQC, MA(1), and lag correlation. Table 2 gives the results for the GEE, EL, EEL, and AEL.

Table 2 shows that the EL, EEL, and AEL are superior to the GEE when the correlation structure is misspecified. Note that, in this EL-based approach, we could construct CIs without estimating the variance of the parameter of interest. For example, in the AR(1)/EQC case the coverage probabilities of ${\hat{β}}_{1}$ based on the GEE, EL, EEL, and AEL are 0.917, 0.928, 0.934, and 0.935 respectively for the nominal 0.95 level. For ${\hat{β}}_{2}$ , these probabilities are 0.916, 0.929, 0.937, and 0.937 for the same nominal level. In this situation, the GEE with stationary lag correlation performs better than the GEE with a misspecified working correlation. However, the EL, EEL, and AEL perform as well as the former method, despite being nonparametric methods based on a data-driven likelihood ratio function. We did not consider all possible cases, for instance, a true EQC or MA(1) correlation model, since under different working correlation structures the correlation parameter $\hat{α}$ does not exist (see [5]).

5.3. Over-Dispersed Stationary Count Data

In this section, we consider the performance of our approach when the variance function is misspecified, in the context of stationary count data. We generate over-dispersed stationary count data $y_{i t}$ using ${\tilde{μ}}_{i} = u_{i} \exp ({\tilde{x}}_{i} β)$ for the three models discussed in Section 5.1, where $u_{i}$ is a random sample such that $E (u_{i}) = 1$ and $Var (u_{i}) = ω$ . Marginally, we have $E (y_{i t}) = {\tilde{μ}}_{i}$ and $Var (y_{i t}) = {\tilde{μ}}_{i} (1 + {\tilde{μ}}_{i} ω)$ . The distribution of $u_{i}$ is chosen to be gamma with shape parameter $1 / ω$ and scale parameter $ω$ , which gives the stated mean 1 and variance $ω$ , where $ω$ is the over-dispersion parameter. We choose $ω = 1 / 4$ . However, the GEE, EL, EEL, and AEL CIs are constructed under the assumption that there is no over-dispersion.
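The marginal moments stated above follow from the laws of total expectation and total variance. In the sketch below, $μ_{i}^{*} = \exp ({\tilde{x}}_{i} β)$ is notation introduced here for the mean without the random effect, and $y_{i t}$ is assumed to be conditionally Poisson with mean $u_{i} μ_{i}^{*}$ given $u_{i}$ .

```latex
\begin{aligned}
E(y_{it}) &= E\{E(y_{it} \mid u_i)\} = \mu_i^{*}\,E(u_i) = \mu_i^{*},\\
\mathrm{Var}(y_{it}) &= E\{\mathrm{Var}(y_{it} \mid u_i)\} + \mathrm{Var}\{E(y_{it} \mid u_i)\}
  = \mu_i^{*} + \mu_i^{*2}\,\omega = \mu_i^{*}\,(1 + \mu_i^{*}\,\omega).
\end{aligned}
```

For instance, with $ω = 1 / 4$ a subject with $μ_{i}^{*} = 2$ has marginal variance $2 (1 + 0.5) = 3$ , compared with 2 under the Poisson variance function.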

Table 3 gives the average estimated values of the regression coefficients, the corresponding simulated standard errors in parentheses, the coverage probabilities for $β_{1}$ and $β_{2}$ for the 0.95 and 0.99 confidence levels, and the average width of the CI in parentheses for the independent, AR(1), EQC, and MA(1) models. Table 3 shows that when there is over-dispersion, the EL, EEL, and AEL outperform the GEE. In the AR(1)/AR(1) case the coverage probabilities of ${\hat{β}}_{1}$ based on the GEE, EL, EEL, and AEL are 0.876, 0.916, 0.926, and 0.931 respectively for the nominal 0.95 level. For ${\hat{β}}_{2}$ , these probabilities are 0.891, 0.920, 0.929, and 0.931 for the same nominal level. This indicates that the EL, EEL, and AEL are fairly robust to model misspecification. Note that the construction of the CI based on the EL, EEL, and AEL does not require the estimation of the scale parameter.

5.4. Correlation Models for Continuous Data

In this section, we investigate the performance of our EL approach on a class of stationary and nonstationary correlation models for longitudinal continuous data. The random errors ${(ϵ_{1}, ϵ_{2}, ϵ_{3}, ϵ_{4})}^{T}$ are generated from the multivariate normal distribution with marginal mean 0, marginal variance 1, and an auto-correlation coefficient $ρ = 0.5$ . In this performance analysis, we consider three correlation models: exchangeable, AR(1), and MA(1).

(i) AR(1) Structure

For $t = 1, \dots, m_{i}$ , for the ith individual

$y_{i t} = x_{i t} β + ϵ_{i t},$ (17)

and we assume that

$ϵ_{i t} = ρ ϵ_{i, t - 1} + a_{i t},$

with $| ρ | < 1$ and $a_{i t} \sim N (0,1)$ .

(ii) MA(1) Structure

The $ϵ_{i t}$ in (17) follow the model

$ϵ_{i t} = ρ a_{i, t - 1} + a_{i t},$

where $ρ$ is a suitable scale parameter that does not necessarily satisfy $| ρ | < 1$ , and $a_{i t} \sim N (0,1)$ .

(iii) Equicorrelation (EQC) Structure

The $ϵ_{i t}$ in (17) follow the model

$ϵ_{i t} = ρ a_{i 0} + a_{i t},$

where $a_{i 0}$ is an error value at the initial time, and $ρ$ is a suitable correlation parameter. We assume that

$a_{i t} \sim N (0,1) \text{ and } a_{i 0} \sim N (0,1),$

and $a_{i t}$ and $a_{i 0}$ are independent for all t.
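The three error models imply different correlation matrices for $(ϵ_{i 1}, \dots, ϵ_{i m})$ even when the same value of $ρ$ is used. The Python sketch below computes these implied correlations from the equations as written, assuming unit-variance innovations and, for AR(1), a stationary start; the function name `implied_corr` and the model labels are hypothetical.

```python
def implied_corr(model: str, rho: float, m: int) -> list[list[float]]:
    """Correlation matrix of (eps_1, ..., eps_m) implied by each error model."""
    if model == "AR1":      # stationary AR(1): correlation rho**l at lag l
        lag = [rho ** l for l in range(m)]
    elif model == "MA1":    # MA(1): rho / (1 + rho**2) at lag 1, zero beyond
        lag = [1.0, rho / (1.0 + rho ** 2)] + [0.0] * (m - 2)
    elif model == "EQC":    # shared a_i0: rho**2 / (1 + rho**2) at every lag
        lag = [1.0] + [rho ** 2 / (1.0 + rho ** 2)] * (m - 1)
    else:
        raise ValueError(f"unknown model: {model}")
    # Toeplitz layout: entry (s, t) depends only on |s - t|.
    return [[lag[abs(s - t)] for t in range(m)] for s in range(m)]


# First row of each matrix for rho = 0.5 and m = 4.
for name in ("AR1", "MA1", "EQC"):
    print(name, [round(c, 3) for c in implied_corr(name, rho=0.5, m=4)[0]])
```

For $ρ = 0.5$ the lag-1 correlations are 0.5 for AR(1), 0.4 for MA(1), and 0.2 for EQC, so $ρ$ coincides with the lag-1 correlation only in the AR(1) case.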

We simulated 1000 data sets from the above models under stationary and nonstationary covariates, using the parameters $β = {(β_{1}, β_{2})}^{T} = {(0.4, 0.5)}^{T}$ , $ρ = 0.5$ , and $m = 4$ . For the ith subject, we generate the covariates ${\tilde{x}}_{i} = ({\tilde{x}}_{i 1}, {\tilde{x}}_{i 2})$ from a normal distribution with mean 0 and standard deviation 1. Table 4 gives the mean estimated values of the regression coefficients, the corresponding simulated standard errors in parentheses, the simulated coverage probabilities for $β_{1}$ and $β_{2}$ for the 0.95 and 0.99 confidence levels, and the average width of the CI in parentheses for the independent, AR(1), EQC, and MA(1) models with stationary covariates. Table 5 gives the results for nonstationary covariates.

The coverage probabilities of the intervals based on the EL, EEL, and AEL are similar to those of the GEE. For instance, in the MA(1)/MA(1) case in Table 4 the coverage probabilities of ${\hat{β}}_{1}$ based on the GEE, EL, EEL, and AEL are 0.955, 0.945, 0.955, and 0.954 respectively for the nominal 0.95 level. For ${\hat{β}}_{2}$ , these probabilities are 0.958, 0.944, 0.948, and 0.951 for the same nominal level. Note that the intervals based on the EL have a slight undercoverage compared with those for the GEE. Also, the EEL and AEL are consistently more accurate than the EL. The lag-correlation-based coverage probabilities have similar patterns.

5.5. Correlation Models for Misspecified Continuous Data

Table 1. Coverage probabilities of regression estimates for count data with stationary covariates for the independent, AR(1), EQC, and MA(1) models.

Table 2. Coverage probabilities of regression estimates for count data with stationary covariates when the working correlation is misspecified for an AR(1) model.

Table 3. Coverage probabilities of regression estimates for over-dispersion of count data with stationary covariates for the independent, AR(1), EQC, and MA(1) models.

Table 4. Coverage probabilities of regression estimates for continuous data with stationary covariates for the independent, AR(1), EQC, and MA(1) models.

In this section, we compare the performances of the methods when the correlation model for continuous data is misspecified. The stationary and nonstationary correlation models for longitudinal continuous data are generated from (17) for the parameter set in Section 5.4, and the correlated random errors ${(ϵ_{1}, ϵ_{2}, ϵ_{3}, ϵ_{4})}^{T}$ are generated from the $χ^{2} (1) - 1$ distribution instead of the normal distribution for the three correlation models:

· AR(1): $ϵ_{i t} = ρ ϵ_{i, t - 1} + a_{i t}, t = 1, 2, 3, 4$ ,

· EQC: $ϵ_{i t} = ρ a_{i,0} + a_{i t}, t = 1, 2, 3, 4$ ,

· MA(1): $ϵ_{i t} = ρ a_{i, t - 1} + a_{i t}, t = 1, 2, 3, 4$ .

However, the confidence regions for the GEE are constructed under the normality assumption.
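A centred $χ^{2} (1)$ variable can be drawn as the square of a standard normal variable minus one, which has mean 0, variance 2, and a right-skewed shape. The sketch below illustrates the AR(1) case under two assumptions that the text does not spell out: that it is the innovations $a_{i t}$ that follow the $χ^{2} (1) - 1$ distribution, and that the recursion is started from a single innovation. The function names and the seed are hypothetical.

```python
import random


def centred_chisq1(rng: random.Random) -> float:
    """One draw from chi-square(1) - 1: mean 0, variance 2, right-skewed."""
    z = rng.gauss(0.0, 1.0)
    return z * z - 1.0


def ar1_errors(rho: float, m: int, rng: random.Random) -> list[float]:
    """eps_t = rho * eps_{t-1} + a_t for t = 1..m, with eps_0 = a_0 (assumption)."""
    eps_prev = centred_chisq1(rng)   # starting value; not specified in the text
    out = []
    for _ in range(m):
        eps_prev = rho * eps_prev + centred_chisq1(rng)
        out.append(eps_prev)
    return out


rng = random.Random(7)               # arbitrary seed
errors = ar1_errors(rho=0.5, m=4, rng=rng)
```

The EQC and MA(1) errors can be generated in the same way by replacing the recursion with the corresponding equation from the list above.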

Table 6 gives the mean estimated values of the coefficients and the corresponding simulated standard errors in parentheses. It also includes the coverage probability for $β_{1}$ and $β_{2}$ for the 0.95 and 0.99 confidence levels and the average width of the CI in parentheses for samples of sizes $k = 50$ and $k = 100$ for the independent, AR(1), EQC, and MA(1) models with stationary covariates. Table 7 gives the results for nonstationary covariates.

When the model is misspecified, the EL, EEL, and AEL outperform the GEE. For example, in the AR(1)/Lag case in Table 6 the coverage probabilities of ${\hat{β}}_{1}$ based on the GEE, EL, EEL, and AEL are 0.790, 0.918, 0.931, and 0.932 respectively for the nominal 0.95 level. For ${\hat{β}}_{2}$ , these probabilities are 0.801, 0.924, 0.937, and 0.937 for the same nominal level. Note that the EL setup requires neither the estimation of a scale parameter in the construction of the CI nor a model for the over-dispersion. Table 7 shows that when the covariates are time-dependent the GEE has substantial undercoverage compared with the results for time-independent covariates, as discussed by [9].

6. Applications

In this section, we illustrate the applicability of our proposed method to two real-world examples.

6.1. Health Care Utilization Study

We consider longitudinal health care utilization data [5] that was collected by Eastern Health, St. John’s, Newfoundland, Canada. These longitudinal count data contain complete records for $k = 144$ individuals for the $m = 4$ years from 1985 to 1988. The response of interest was the number of visits to a physician by each individual during a given year. Information on four covariates, namely, gender, number of chronic conditions, education level, and age, was recorded for each individual. Background information allows us to assume that the response variable, marginally, follows the Poisson distribution, and the repeated counts over the four years will be longitudinally correlated. Since the data indicate over-dispersion, we consider a negative binomial model with two variance functions

$var (y) = μ + α μ$

and

$var (y) = μ + α μ^{2} .$

Thus, the variance function is different from that of the Poisson model, $var (y) = μ$ . To confirm the over-dispersion, we test $H_{0} : α = 0$ against $H_{a} : α > 0$ using the likelihood ratio test. The result confirms the presence of over-dispersion in both variance function models.

Table 5. Coverage probabilities of regression estimates for continuous data with nonstationary covariates for the independent, AR(1), EQC, and MA(1) models.

Table 6. Coverage probabilities of regression estimates for misspecified data with stationary covariates for the independent, AR(1), EQC, and MA(1) models (k = 50).

Table 7. Coverage probabilities of regression estimates for misspecified data with nonstationary covariates for the independent, AR(1), EQC, and MA(1) models (k = 100).

Our analysis used the GEE with a working correlation matrix (AR(1), EQC, MA(1), or lag correlation) and our EL approach. Table 8 gives the regression parameter estimates and 95% CIs. The gender covariate was coded as 1 for male and 0 for female. Under the AR(1) structure, the estimate of its regression coefficient is ${\hat{β}}_{1} = - 0.1929$ , suggesting that females make more visits to physicians. The GEE CI indicates that this variable is significant, but the EL CI does not. The estimated values ${\hat{β}}_{2} = 0.1668$ and ${\hat{β}}_{4} = 0.0308$ suggest that individuals with chronic diseases and older individuals pay more visits to physicians, as expected. The corresponding CIs show that both variables are significant. The education covariate was coded as 1 for less than high school and 0 for higher education. The value ${\hat{β}}_{3} = - 0.4738$ indicates that more educated individuals pay more visits to physicians, which may reflect greater concern about their health or a greater ability to afford care. The corresponding CIs show that this variable is significant. Table 8 shows that different working correlations lead to slightly different parameter estimates, but the overall conclusion remains the same. Since the data indicate over-dispersion, the GEE-based approach may be inefficient, as shown in our performance analysis. We conclude that the EL approach is more appropriate for this data set, and the significant variables identified by this approach are more reliable.

Table 8. Regression estimates for health care utilization count data.

6.2. Longitudinal CD4 Cell Counts of HIV Seroconverters

This data set contains 2376 observations of the CD4 cell counts of $k = 369$ men infected with the HIV virus [31]. The goal of our analysis is to estimate the average evolution over time of the CD4 counts by considering the effects of AGE, SMOKE (smoking status measured by packs of cigarettes per day), DRUG (yes = 1; no = 0), SEXP (number of sex partners), DEPRESSION (measured by the CESD scale) and YEAR (time since seroconversion). To examine whether there are any interaction effects between the covariates, we included all the two-factor interactions in our model.

This data set has the subject-specific evolution over time of the CD4 cell counts with and without drug use. The cell counts are right-skewed, so the analysis was conducted on square-root transformed CD4 cell counts, whose distribution is more nearly Gaussian. Tables 9-11 summarize the analysis for the AR(1), EQC, and lag working correlations. The GEE indicates that SMOKE, DRUG, SEXP, AGE × SEXP, SMOKE × DRUG, SMOKE × SEXP, and DRUG × SEXP are significant. Under EQC, AGE × SMOKE and AGE × DRUG are also significant. The EL selects SMOKE, DRUG, SEXP, and DRUG × SEXP. Under EQC and lag, AGE × SEXP is also significant. The GEE approach is sensitive to the choice of correlation structure. In this real data set, the true correlation structure is unknown, so the lag correlation approach is appropriate since it can accommodate all three correlation structures. The Shapiro-Wilk test shows that the square-root transformed CD4 cell counts are not normally distributed. The GEE-based method is, therefore, not appropriate. We, therefore, conclude that the EL is a better choice.

Table 9. Estimated coefficients for CD4 data set using AR(1) working correlation.

Table 10. Estimated coefficients for CD4 data set using EQC working correlation.

Table 11. Estimated coefficients for CD4 data set using lag working correlation.

7. Conclusions

Longitudinal data modelling using the GEE approach assumes a working correlation model for the within-subject correlation of the responses. When the working correlation is incorrectly specified, the GEE-based estimates are not necessarily consistent and may lose efficiency. Any such misspecification can cause estimates based on marginal models to be inefficient and can lead to misleading conclusions. Also, the construction of a confidence region and hypothesis testing are based on asymptotic normality, which may not hold since the finite-sample distribution may not be symmetric.

Taking these issues into account, we have proposed EL-based longitudinal modelling, a data-driven likelihood ratio approach that shares many of the properties of the parametric likelihood. We do not need to specify the complete parametric distribution to perform the inference. We can, therefore, use likelihood methods without assuming that the data come from a known family of distributions. We defined the subject-wise profile EL based on a set of GEEs. We proposed estimation and confidence region construction based on the EL approach, which has advantages over other methods such as those based on normal approximations. We introduced the adjusted EL to avoid computational issues; it also improves the coverage probabilities. A major advantage of the EL is that it involves no prior assumptions about the shape of the confidence region, which is data-driven. The construction of the confidence region based on the EL method does not involve any variance estimation.

The proposed approach yields more efficient estimators than the conventional GEE approach and achieves the same asymptotic properties as those in [17]. Our performance analysis showed that our method for longitudinal count and continuous responses is comparable to the GEE when the model assumptions are satisfied. For instance, when the working correlation is correctly specified, the coverage probabilities of the intervals based on the EL, EEL, and AEL are similar to those of the GEE. CIs based on the regular EL have slight undercoverage compared with those of the GEE; the coverage probabilities are improved with the EEL and AEL, which are consistently more accurate than the regular EL. When the working correlation is misspecified, the intervals based on the EL, EEL, and AEL perform as well as those from the GEE with the stationary lag correlation structure, and all four outperform the GEE with an incorrect working correlation structure. When the model is misspecified, for example through the marginal variance function, our method outperforms the GEE. This result shows that EL methods are robust to misspecification. Moreover, the EL-based CI has a data-driven shape, whereas the GEE-based CI is always symmetric due to the normal approximation.

Acknowledgements

The authors’ research was supported by grants from the Natural Sciences and Engineering Research Council of Canada and the Canadian Institutes of Health Research.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1]	McCullagh, P. and Nelder, J. (1989) Generalized Linear Models. 2nd Edition, CRC Press, London. https://doi.org/10.1007/978-1-4899-3242-6
[2]	Liang, K.Y. and Zeger, S.L. (1986) Longitudinal Data Analysis Using Generalized Linear Models. Biometrika, 73, 13-22. https://doi.org/10.1093/biomet/73.1.13
[3]	Crowder, M.J. (1995) On Use of a Working Correlation Matrix in Using Generalized Linear Models for Repeated Measures. Biometrika, 82, 407-410. https://doi.org/10.1093/biomet/82.2.407
[4]	Sutradhar, B.C. and Das, K. (1999) On the Efficiency of Regression Estimators in Generalized Linear Models for Longitudinal Data. Biometrika, 86, 459-465. https://doi.org/10.1093/biomet/86.2.459
[5]	Sutradhar, B.C. (2003) An Overview on Regression Models for Discrete Longitudinal Responses. Statistical Science, 18, 377-393. https://doi.org/10.1214/ss/1076102426
[6]	Sutradhar, B.C. and Kovacevic, M. (2000) Analysing Ordinal Longitudinal Survey Data: Generalized Estimating Equations Approach. Biometrika, 87, 837-848. https://doi.org/10.1093/biomet/87.4.837
[7]	Nadarajah, T., Variyath, A.M. and Loredo-Osti, J.C. (2016) Penalized Generalized Quasi-Likelihood Based Variable Selection for Longitudinal Data. In: ISS-2015 Proceedings Volume on Advances in Parametric and Semiparametric Analysis of Multivariate, Time Series, Spatial-Temporal, and Familial-Longitudinal Data, Springer Lecture Notes in Statistics, Springer, Berlin, 233-250. https://doi.org/10.1007/978-3-319-31260-6_8
[8]	Qu, A., Lindsay, B.G. and Li, B. (2000) Improving Estimating Equations Using Quadratic Inference Functions. Biometrika, 87, 823-836. https://doi.org/10.1093/biomet/87.4.823
[9]	Hu, F.-C. (1993) A Statistical Methodology for Analyzing the Causal Health Effect of a Time Dependent Exposure from Longitudinal Data. Dissertation, Harvard School of Public Health, Department of Biostatistics, Boston.
[10]	Pepe, M.S. and Anderson, G.L. (1994) A Cautionary Note on Inference for Marginal Regression Models with Longitudinal Data and General Correlated Response Data. Communications in Statistics, Part B Simulation and Computation, 23, 939-951. https://doi.org/10.1080/03610919408813210
[11]	Emond, M.J., Ritz, J. and Oakes, D. (1997) Bias in GEE Estimates from Misspecified Models for Longitudinal Data. Communications in Statistics, 26, 15-32. https://doi.org/10.1080/03610929708831899
[12]	Pan, W., Louis, T.A. and Connett, J.E. (2000) A Note on Marginal Linear Regression with Correlated Response Data. American Statistician, 54, 191-195. https://doi.org/10.1080/00031305.2000.10474544
[13]	Diggle, P.J., Heagerty, P.J., Liang, K.-Y. and Zeger, S.L. (2002) The Analysis of Longitudinal Data. 2nd Edition, Oxford Statistical Science, Oxford University Press, Oxford.
[14]	Owen, A.B. (1988) Empirical Likelihood Ratio Confidence Interval for a Single Functional. Biometrika, 75, 237-249. https://doi.org/10.1093/biomet/75.2.237
[15]	Owen, A.B. (1991) Empirical Likelihood for Linear Models. The Annals of Statistics, 19, 1725-1747. https://doi.org/10.1214/aos/1176348368
[16]	Chen, S.X. (1994) Empirical Likelihood Confidence Intervals for Linear Regression Coefficients. Journal of Multivariate Analysis, 49, 24-40. https://doi.org/10.1006/jmva.1994.1011
[17]	Qin, J. and Lawless, J. (1994) Empirical Likelihood and General Estimating Equations. Annals of Statistics, 22, 300-325. https://doi.org/10.1214/aos/1176325370
[18]	Owen, A.B. (2001) Empirical Likelihood. 2nd Edition, CRC Press, London. https://doi.org/10.1201/9781420036152
[19]	You, J., Chen, G. and Zhou, Y. (2006) Block Empirical Likelihood for Longitudinal Partially Linear Regression Models. Canadian Journal of Statistics, 34, 79-96. https://doi.org/10.1002/cjs.5550340107
[20]	Xue, L.G. and Zhu, L.X. (2007) Empirical Likelihood Semiparametric Regression Analysis for Longitudinal Data. Biometrika, 94, 921-937. https://doi.org/10.1093/biomet/asm066
[21]	Wang, S., Qian, L. and Carroll, R.J. (2010) Generalized Empirical Likelihood Methods for Analyzing Longitudinal Data. Biometrika, 97, 79-93. https://doi.org/10.1093/biomet/asp073
[22]	Hall, P. and La Scala, B. (1990) Methodology and Algorithm of Empirical Likelihood. International Statistical Review, 58, 109-127. https://doi.org/10.2307/1403462
[23]	Corcoran, S.A., Davison, A.C. and Spady, R.H. (1995) Reliable Inference from Empirical Likelihood. Technical Report, Nuffield College, University of Oxford, Oxford.
[24]	Tsao, M. (2004) Bounds on Coverage Probabilities of the Empirical Likelihood Ratio Confidence Regions. The Annals of Statistics, 32, 1215-1221. https://doi.org/10.1214/009053604000000337
[25]	Chen, J., Variyath, A.M. and Abraham, B. (2008) Adjusted Empirical Likelihood and Its Properties. Journal of Computational and Graphical Statistics, 17, 426-443. https://doi.org/10.1198/106186008X321068
[26]	Chen, J., Sitter, R.R. and Wu, C. (2002) Using Empirical Likelihood Methods to Obtain Range Restricted Weights in Regression Estimators for Surveys. Biometrika, 89, 230-237. https://doi.org/10.1093/biomet/89.1.230
[27]	Tsao, M. (2013) Extending the Empirical Likelihood by Domain Expansion. Canadian Journal of Statistics, 41, 257-274. https://doi.org/10.1002/cjs.11175
[28]	Tsao, M. and Wu, F. (2013) Empirical Likelihood on the Full Parameter Space. The Annals of Statistics, 41, 2176-2196. https://doi.org/10.1214/13-AOS1143
[29]	McKenzie, E. (1988) Some ARMA Models for Dependent Sequences of Poisson Counts. Advances in Applied Probability, 20, 822-835. https://doi.org/10.1017/S0001867800018395
[30]	Sutradhar, B.C. (2011) Dynamic Mixed Models for Familial Longitudinal Data. Springer Series in Statistics, Springer, Berlin. https://doi.org/10.1007/978-1-4419-8342-8
[31]	Zeger, S.L. and Diggle, P.J. (1994) Semiparametric Models for Longitudinal Data with Application to CD4 Cell Numbers in HIV Seroconverters. Biometrics, 50, 689-699. https://doi.org/10.2307/2532783
