1. Introduction
There has been massive growth in the number of randomized clinical trials (RCTs) since the first RCT, the well-known streptomycin trial, was introduced in 1946 [1] . To make this information more readily available, researchers pull together the existing evidence in a form that can be used by researchers or statisticians; this is called a systematic review. The aim of a systematic review is to identify and summarize the findings of all available studies addressing the clinical question of the review, reducing a large quantity of information to a manageable size. An important aspect of most reviews is the quantitative synthesis of results; meta-analysis is the statistical part of a systematic review. The main purpose of meta-analysis is to increase the precision of the conclusions of a review. From a statistical perspective, a meta-analysis can detect treatment effects with greater power and estimate those effects with greater precision than any single study. A meta-analysis from systematic reviews of the Antiplatelet Trialists' Collaboration [2] is used as the application in this paper.
Most meta-analyses have focused on summarizing treatment effect measures based on the comparison of two treatments, or arms, sometimes also called interventions or exposures. In this comparison, two groups of individuals in each study are exposed to two different treatments. Standard two-arm RCTs are frequently used in clinical research due in part to their relative simplicity of design and interpretation. At its most basic, one power, one significance level and one magnitude of difference are analyzed for a two-arm comparison, and conclusions are straightforward: either the two arms are shown to be different or they are not. The implementation of the model is not complicated. When more than two arms are included in a meta-analysis, complexity ensues. These types of datasets are called multi-arm trials [3] [4] [5] , although some authors call them mixed treatment comparisons (MTC) [6] [7] or multiple treatment comparisons [4] [8] . Additionally, some authors use the term network meta-analysis for indirect head-to-head evidence combined across multiple RCTs [9] - [14] .
For multi-arm trials data, Chootrakool and Shi [3] introduced a normal approximation model using an empirical logistic transform, which requires a large number of individual observations and a probability of being a case that is not too near zero or one. If the number of individual observations is small, the normal approximation model is not suitable; the number of samples in a single study should usually be larger than 20 [4] . In this paper, an exact binomial model is introduced to fit binary multi-arm trials data. There are two alternative maximum likelihood approaches that can be used to make inferences for the unknown parameters in the logistic regression model: the unconditional method and the conditional method. The logistic regression model has become increasingly popular with the easy availability of appropriate computer routines, and many authors have described maximum likelihood estimation procedures, which turn out to be iterative [15] . There have been a large number of studies of the unconditional and conditional methods, for example Tritchler [16] , Sartori [17] , Lee et al. [18] and Caterina et al. [19] .
Most existing methods for meta-analysis of multi-arm trials use the logistic regression model with the unconditional approach. Lu and Ades [20] introduced a Bayesian hierarchical model for multi-arm trials using the unconditional method to estimate the unknown parameters. Weber et al. [21] used a random-effects model with the unconditional maximum likelihood method for meta-analysis; they conducted a simulation study comparing two zero-cell corrections under the ordinary random-effects model. In this paper, the logistic regression model is introduced using the conditional approach.
Both direct and indirect comparisons of RCTs can be made in a meta-analysis [9] [20] ; this has led to network meta-analysis. Seide et al. [12] performed a simulation study for sparse networks of trials including multi-arm trials, comparing Bayesian and frequentist methods in random-effects network meta-analysis. Jenkins et al. [13] introduced methods for the inclusion of evidence in network meta-analysis. More applications of network meta-analysis can be found in Greco et al. [22] , Wang [23] and Zhao et al. [14] .
The structure of the paper is as follows. The data structure of multi-arm trials is introduced in Section 2. Fitting the logistic regression model is proposed in the next section. The unconditional maximum likelihood approach for the model, including the standard errors of the maximum likelihood estimators (MLEs), is described in Section 4. Similarly, the conditional maximum likelihood approach for the model is presented in Section 5. In Section 6, the logistic regression model is illustrated with the unconditional and conditional approaches applied to the data. The final section is the discussion and conclusion, including the advantages and limitations of the two approaches.
2. The Data Structure of Multi-Arm Trials
Let the indices i = 1, ..., M and j = 0, 1, ..., K stand for the studies and the treatments respectively, where the index j = 0 stands for the control group. To make multi-arm comparisons, suppose that there are M RCTs comparing K + 1 treatments. For the ith study, let y_ij represent the number of cases on treatment j and let n_ij be the total number in the corresponding group. Let p_ij be the probability of being a case for a patient receiving treatment j in the ith study. Then y_ij has a binomial distribution, y_ij ~ Bin(n_ij, p_ij).
A data structure of multi-arm trials shall be defined by introducing an index set T_i comprising the treatments involved in the ith study. The data structure is shown as

{(y_ij, n_ij) : j ∈ T_i}, i = 1, ..., M.
Some studies in a meta-analysis might not have all the treatments available. For example, some studies might compare fewer than K + 1 treatments, or some baseline treatments may be different, or both cases could occur simultaneously. The data structure is analogous to an incomplete-blocks design. For example, in the Antiplatelet data, the 8th - 17th studies involve two-treatment comparisons, unlike the 1st - 7th studies, which compare three treatments.
Let b(i) denote the baseline treatment, also called the study-specific reference treatment [4] , of the ith study; it can be the control group or any other treatment. As mentioned earlier about indirect comparison, when the treatments in some studies cannot be compared directly to the control group, we need to use evidence from external studies. To make this clear, if b(i) = 0 then a direct comparison is involved in the study; conversely, if b(i) ≠ 0 then the study makes an indirect comparison. Let T_i* represent the set of treatments involved in the ith study but excluding the baseline treatment b(i), and let K_i and K_i* denote the numbers of treatments in the sets T_i and T_i* respectively. The y_ib(i) and y_ij are binomially distributed, respectively, as

y_ib(i) ~ Bin(n_ib(i), p_ib(i)) and y_ij ~ Bin(n_ij, p_ij),

for j ∈ T_i* and i = 1, ..., M. Let D be the set of studies that make direct comparisons of treatments, and let I be the set of studies that make indirect comparisons.
The multi-arm trials data consist of 27 RCTs in total from systematic reviews of the Antiplatelet Trialists' Collaboration [2] , as shown in Table 1. The studies compare three treatments: aspirin plus dipyridamole (A), aspirin alone (B) and a control group (C). Seven trials compare aspirin plus dipyridamole, aspirin alone and the control group (i.e. comparing all of A, B and C), ten trials compare aspirin plus dipyridamole and the control group (i.e. comparing A and C), and ten trials compare aspirin alone and the control group (i.e. comparing B and C). The "event" in Table 1 represents the number of patients in whom deep venous thrombosis was detected by systematic fibrinogen scans or venography, or both, after general and orthopedic surgery and in high-risk medical patients. The "total" represents the number of patients in each group.
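The index-set data structure above maps directly to code. The following Python sketch stores each study as a map from treatment index to an (event, total) pair, so the set of treatments compared in a study is simply the keys present; the counts are illustrative only, not the actual Table 1 values.

```python
from dataclasses import dataclass

# Treatment coding: 0 = control (C), 1 = aspirin plus dipyridamole (A),
# 2 = aspirin alone (B).
@dataclass
class Study:
    arms: dict  # {treatment index j: (events y_ij, total n_ij)}

    @property
    def T(self):
        # index set of treatments compared in this study
        return sorted(self.arms)

# Illustrative counts only -- not the actual Table 1 values.
studies = [
    Study({0: (20, 50), 1: (8, 48), 2: (12, 52)}),  # three-arm: A, B, C
    Study({0: (15, 40), 1: (5, 41)}),               # two-arm: A vs C
    Study({0: (18, 45), 2: (9, 44)}),               # two-arm: B vs C
]

three_arm = [s for s in studies if len(s.T) == 3]
print(len(three_arm), three_arm[0].T)  # -> 1 [0, 1, 2]
```

The incomplete-blocks character of the data shows up naturally: two-arm studies simply have smaller index sets.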
3. Fitting the Logistic Regression Model
This section illustrates how to fit the logistic regression model to binary data from multi-arm trials, including direct and indirect comparisons. Logistic regression is a regression model for a binomially distributed response (dependent) variable. It is useful for modelling the probability of an event occurring as a function of other factors. Logistic regression belongs to the class of generalized linear models and uses the logit as its link function. It can be used with two types of dependent variables: a categorical dependent variable that has exactly two categories (i.e. a binary or dichotomous variable), or a continuous dependent variable with values in the range 0 to 1 representing probabilities or proportions. In other application areas, logistic regression is also called the logistic model or logit model. Logistic regression is similar to linear regression: in both, the ultimate objective may be estimation of the coefficient values or prediction of the response value. One significant difference is that the linear model has a continuous response variable whereas the logistic model uses a binary or dichotomous response. Logistic regression models for the ith study can be defined by
logit(p_ib(i)) = α_i, (1)

logit(p_ij) = α_i + δ_ij, j ∈ T_i*, (2)

where α_i and δ_ij are the trial effect and the treatment effect; they will be detailed next.
Table 1. RCTs of Antiplatelet data.
3.1. Trial Effect
Two assumptions are usually made about the trial effect α_i. The first is that the trial effects are study-level effects: the α_i's are different parameters and are treated as nuisance parameters in the model, so M different unknown parameters need to be included. The second is that a model may be assumed for the α_i's. A special case is to assume that the trial effect is a fixed effect, α_i = α for all i. Alternatively, it may be assumed to be a random effect, α_i ~ N(α, σ²), where α is the overall mean of the trial effect and σ² measures the magnitude of the variation between studies. Most existing methods use the first assumption. However, under this assumption the number of unknown parameters is the same as the number of studies, which leads to some theoretical and computational problems: the accuracy of the estimation depends on the sample size of each study, not the overall pooled sample size of the meta-analysis. The α_i's are nevertheless assumed to be different in this paper.
3.2. Treatment Effect
The treatment effect δ_ij can be a direct treatment effect if b(i) = 0 or an indirect treatment effect if b(i) ≠ 0. They are defined as follows:

δ_ij ~ N(μ_j, τ_j²) if b(i) = 0,

δ_ij ~ N(μ_j − μ_b(i), τ_j² + τ_b(i)² − 2ρ τ_j τ_b(i)) if b(i) ≠ 0,

where ρ is the correlation coefficient between δ_ij and δ_ib(i). For example, suppose the data include three treatments A, B and C, where C stands for the control group. If the baseline treatment of a study is not the control group C (say it is B), then the indirect treatment effect can be written as

δ_iA ~ N(μ_A − μ_B, τ_A² + τ_B² − 2ρ τ_A τ_B).
Next, we shall consider the treatment effect in matrix form. Let δ and μ represent the vectors (δ_1, ..., δ_K)^t and (μ_1, ..., μ_K)^t respectively, where the superscript t stands for matrix transposition, and let Σ represent the K × K covariance matrix. The model for the treatment effects can be written as

δ ~ N(μ, Σ).

This is called the basic model of the random treatment effect. Let x_ij be the index vector of length K consisting of elements 0 and 1 corresponding to treatment j in the ith study. Now the random effect δ_ij can be written in the form of x_ij and δ:

δ_ij = x_ij^t δ.

As before, the treatment effects δ_ij and δ_ij' for j, j' ∈ T_i* may be dependent. For the ith study, let X_i be the following K_i* × K matrix,

X_i = (x_ij : j ∈ T_i*)^t.

Let δ_i* denote the vector (δ_ij : j ∈ T_i*)^t; then we have

δ_i* ~ N(μ_i*, Σ_i*),

where μ_i* = X_i μ and Σ_i* = X_i Σ X_i^t.
The α_i's are assumed to be different and the δ_ij's are assumed to be random effects as above. The models in Equation (1) and Equation (2) can be used for both treatment comparisons. In the model in Equation (2), log(p_ij/(1 − p_ij)) is called the logistic transform of the probability p_ij, or alternatively the log odds, or logit(p_ij). Having considered the properties of the logit, the term p_ij/(1 − p_ij) is the odds of an unsuccessful outcome for a patient treated with treatment j, and so logit(p_ij) is the log odds of being a case. It is easily seen that a value of p_ij in the range (0, 1) corresponds to a value of logit(p_ij) in (−∞, ∞). As p_ij approaches 0, logit(p_ij) approaches −∞; as p_ij approaches 1, logit(p_ij) approaches ∞; and for p_ij = 1/2, logit(p_ij) = 0. After some rearrangement, the logistic regression models in Equation (1) and Equation (2) have the respective equivalent formulations

p_ib(i) = exp(α_i) / (1 + exp(α_i)) and p_ij = exp(α_i + δ_ij) / (1 + exp(α_i + δ_ij)). (3)
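The logit and its inverse in Equation (3) are easy to check numerically; a minimal Python sketch:

```python
import math

def logit(p):
    # log odds: maps p in (0, 1) onto the whole real line
    return math.log(p / (1.0 - p))

def inv_logit(x):
    # the rearranged form in Equation (3): exp(x) / (1 + exp(x))
    return math.exp(x) / (1.0 + math.exp(x))

print(logit(0.5))                           # -> 0.0
print(round(inv_logit(logit(0.25)), 10))    # round trip -> 0.25
print(logit(0.001) < -6, logit(0.999) > 6)  # tails head to -inf / +inf -> True True
```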
There are two alternative maximum likelihood (ML) approaches, the unconditional and the conditional, that can be used to estimate the unknown parameters in a logistic regression model.
4. Unconditional Maximum Likelihood Approach
Generally, unconditional ML estimation is preferred if the number of parameters in the model is small relative to the number of studies in a meta-analysis [25] .
4.1. Probability Functions
To demonstrate the unconditional ML estimation, let f(y_ib(i)) and f(y_ij) denote the probability functions associated with the distributions of y_ib(i) and y_ij respectively, for j ∈ T_i* and i = 1, ..., M, defined as follows.

For the baseline treatment,

f(y_ib(i)) = C(n_ib(i), y_ib(i)) p_ib(i)^y_ib(i) (1 − p_ib(i))^(n_ib(i) − y_ib(i)). (4)

For the treatments j,

f(y_ij) = C(n_ij, y_ij) p_ij^y_ij (1 − p_ij)^(n_ij − y_ij). (5)

The combination C(n_ib(i), y_ib(i)) in Equation (4) represents the number of possible combinations of n_ib(i) items taken y_ib(i) at a time. The p_ib(i) in the middle term of Equation (4) is substituted from Equation (3), so that f(y_ib(i)) becomes a function of α_i. The combination in Equation (5) can be analyzed in the same way.
4.2. The Unconditional Likelihood
In the probability functions f(y_ib(i)) and f(y_ij), the trial effects α_i are study-level effects: they are assumed to be different and are included in both probability functions. The δ_ij, however, is a random effect, so f(y_ij) involves the vector of random effects δ_i*. The standard method of handling a probability function that involves random variables with a fully specified distribution is to integrate the probability function with respect to the distribution of those variables. To deal with the random effects, let δ_i* be the vector (δ_ij : j ∈ T_i*)^t. The product of the probability functions f(y_ij) over j ∈ T_i* is integrated with respect to δ_i*. The integral contains K_i* integrations, one for each treatment in the set T_i*, and is given by

f̃(y_i) = ∫ ∏_{j ∈ T_i*} f(y_ij | δ_i*) φ(δ_i*) dδ_i*, (6)

where φ(δ_i*) is the probability density function of the normal distribution with mean μ_i* and covariance Σ_i*, given by

φ(δ_i*) = (2π)^(−K_i*/2) |Σ_i*|^(−1/2) exp{−(δ_i* − μ_i*)^t Σ_i*^(−1) (δ_i* − μ_i*)/2}. (7)
The integral in Equation (6) can be calculated numerically; one way to do this is the Gauss-Hermite method. Applying the Gauss-Hermite approximation with Q quadrature points, the probability function for the ith study can be estimated by

f̃(y_i) ≈ π^(−K_i*/2) Σ_q w_q ∏_{j ∈ T_i*} f(y_ij | δ_q), (8)

where the sampling nodes are at δ_q = μ_i* + √2 Σ_i*^(1/2) z_q and the w_q are the corresponding quadrature weights. The vector z_q of nodes depends on the number K_i*, which is the number of treatments compared in the ith study. The resulting function f̃(y_i) does not depend on the random effects δ_i*. For most practical purposes, Q need not be greater than 20, although some authors suggest using even smaller values [26] .
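As a concrete illustration of Equation (8) in the simplest case of a single random effect, the Python sketch below (with hypothetical inputs; the paper itself uses an R quadrature routine) approximates the marginal probability of a binomial count when the effect is δ ~ N(μ, τ²), using the node rescaling described above:

```python
import math
import numpy as np

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def binom_pmf(y, n, p):
    return math.comb(n, y) * p**y * (1.0 - p)**(n - y)

def marginal_prob(y, n, alpha, mu, tau, Q=20):
    """Gauss-Hermite approximation to the integral of
       Bin(y; n, expit(alpha + delta)) * N(delta; mu, tau^2) d delta,
    with nodes rescaled as delta_q = mu + sqrt(2) * tau * z_q."""
    z, w = np.polynomial.hermite.hermgauss(Q)
    s = sum(wq * binom_pmf(y, n, expit(alpha + mu + math.sqrt(2.0) * tau * zq))
            for zq, wq in zip(z, w))
    return s / math.sqrt(math.pi)

# Q = 20 is usually enough: doubling Q barely changes the answer.
p20 = marginal_prob(y=7, n=40, alpha=-1.0, mu=-0.5, tau=0.4, Q=20)
p40 = marginal_prob(y=7, n=40, alpha=-1.0, mu=-0.5, tau=0.4, Q=40)
print(0.0 < p20 < 1.0, abs(p20 - p40) < 1e-6)  # -> True True
```

The stability of the answer between Q = 20 and Q = 40 illustrates why a modest number of quadrature points suffices for smooth, low-dimensional integrands.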
As before, let θ be the collection of all unknown parameters of the meta-analysis, including all trial effects α_1, ..., α_M, the overall means μ and the variance parameters in Σ, and let y_i be the vector of observations (y_ij : j ∈ T_i) for the ith study. The likelihood function for the ith study can be written as

L_i(θ) = f(y_ib(i)) f̃(y_i), (9)

where f(y_ib(i)) and f̃(y_i) are given in Equation (4) and Equation (8) respectively. Let ℓ_i(θ) = log L_i(θ) stand for the unconditional log-likelihood function of the logistic regression model for the ith study. The log-likelihood function of θ for the logistic regression models is given by

ℓ(θ) = Σ_{i=1}^{M} ℓ_i(θ). (10)

The number of α_i's is the same as the number of studies. The computation of MLEs may be quite unstable if the number of studies is large while the sample size of each study is small. As discussed earlier, this may also result in biased or misleading estimates. A conditional approach is suggested to eliminate all nuisance parameters.
4.3. Asymptotic Variance-Covariance Matrix
This section shows how to calculate the standard errors of the MLEs of the logistic regression model using the unconditional approach. Since there are random effects in the model, some integrals are involved in the likelihood function. The unconditional log-likelihood function ℓ(θ) in Equation (10) can be written as

ℓ(θ) = Σ_{i=1}^{M} log f(y_ib(i)) + Σ_{i=1}^{M} log f̃(y_i). (11)

Let ℓ1 and ℓ2 stand for the first and second terms of the above log-likelihood function. Three types of unknown parameters are involved in ℓ(θ): the trial effects α_i, the overall mean effects μ_j (for j = 1, ..., K), and the variances τ and correlation coefficients ρ in the covariance matrix Σ. In what follows, let τ represent a generic parameter (either a τ or a ρ) involved in Σ. There is no random effect involved in ℓ1.

First, the second-order partial derivative ∂²ℓ1/∂α_i² can be calculated in the usual way, while the other terms vanish: ∂²ℓ1/∂α_i∂μ_j = 0, ∂²ℓ1/∂μ_j∂μ_j' = 0 and ∂²ℓ1/∂τ² = 0, since ℓ1 depends on the α_i's only.
Next, consider the second term of Equation (11). For notational convenience, let g_i(δ_i*) represent the integrand ∏_{j ∈ T_i*} f(y_ij | δ_i*) φ(δ_i*) in f̃(y_i). The term ℓ2 then takes the form

ℓ2 = Σ_{i=1}^{M} log ∫ g_i(δ_i*) dδ_i*,

where φ(δ_i*) is the density of the multivariate normal distribution with mean μ_i* and variance matrix Σ_i*, and each log ∫ g_i dδ_i* is a summand of the log-likelihood involving the integrals. The first-order partial derivative of ℓ2 with respect to a parameter θ_r is

∂ℓ2/∂θ_r = Σ_i (∫ (∂g_i/∂θ_r) dδ_i*) / (∫ g_i dδ_i*).

Similarly, the second-order partial derivatives take the form

∂²ℓ2/∂θ_r∂θ_s = Σ_i [ (∫ (∂²g_i/∂θ_r∂θ_s) dδ_i*)(∫ g_i dδ_i*) − (∫ (∂g_i/∂θ_r) dδ_i*)(∫ (∂g_i/∂θ_s) dδ_i*) ] / (∫ g_i dδ_i*)², (12)

with Equations (13), (14) and (15) obtained by taking (θ_r, θ_s) to be the remaining pairs of parameters among the α's, μ's and τ. Note that the second-order partial derivative ∂²ℓ2/∂α_i∂α_i' for i ≠ i' is equal to zero, since the ith summand involves α_i only. The second-order partial derivatives for the other parameter pairs can be expressed in equations similar to Equation (13) and Equation (14) respectively. The integrals in the first-order and second-order partial derivatives can be approximated by Gaussian quadrature.
From the log-likelihood ℓ(θ) in Equation (11), the second-order partial derivatives needed for the observed Fisher information matrix are

∂²ℓ/∂α_i², ∂²ℓ/∂α_i∂μ_j, ∂²ℓ/∂μ_j², ∂²ℓ/∂α_i∂τ, ∂²ℓ/∂μ_j∂τ and ∂²ℓ/∂τ².

As set out earlier, the second partial derivatives involving ρ can be calculated by similar equations to those involving τ. Notice that only the second-order derivative with respect to α_i is related to ℓ1. The matrix of second partial derivatives can be partitioned into a block matrix with null matrices in the off-diagonals,

H(θ) = diag(H_α, H_{μ,τ,ρ}),

where H_α and H_{μ,τ,ρ} are the blocks of second-order partial derivatives with respect to the α_i's, and to μ, τ and ρ, respectively. By multiplying H(θ̂) by −1, the observed Fisher information matrix I(θ̂) is obtained. The inverse of I(θ̂) is the asymptotic variance-covariance matrix of the MLEs, and their standard errors are the square roots of the diagonal elements of I(θ̂)⁻¹.
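The final step above — invert the negative Hessian and take square roots of the diagonal — can be checked on a toy one-parameter case where the observed information is known in closed form. A minimal Python sketch (a single binomial arm with logit(p) = α, not the full multi-arm model):

```python
import math

def loglik(alpha, y=30, n=100):
    # binomial log-likelihood with logit(p) = alpha (binomial coefficient dropped)
    p = 1.0 / (1.0 + math.exp(-alpha))
    return y * math.log(p) + (n - y) * math.log(1.0 - p)

def observed_info(f, x, h=1e-5):
    # minus the second derivative, by central finite differences
    return -(f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

alpha_hat = math.log(0.3 / 0.7)          # the MLE is logit(y/n)
info = observed_info(loglik, alpha_hat)  # analytic value: n * p * (1 - p) = 21
se = 1.0 / math.sqrt(info)               # analytic value: 1 / sqrt(21), about 0.2182
print(abs(info - 21.0) < 1e-2, abs(se - 0.2182) < 1e-3)  # -> True True
```

For the full model, the same recipe applies with a vector of parameters and a matrix of second derivatives, whose integral terms are approximated by quadrature as in the text.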
5. Conditional Maximum Likelihood Approach
Conditional likelihood is widely used in logistic regression models with binary data. In particular, this leads to accurate inferences for the parameters of interest and eliminates all nuisance parameters [25] . The conditional likelihood will be defined and the maximum likelihood estimation will be described in this section.
5.1. Conditional Likelihood
From the logistic regression models in Equation (1) and Equation (2), the conditional likelihood for the ith study, conditioning on the total number of cases t_i = Σ_{j ∈ T_i} y_ij, is given by

L_i^c = P(y_i | Σ_{j ∈ T_i} y_ij = t_i) = ∏_{j ∈ T_i} f(y_ij) / Σ_{u ∈ U(t_i)} ∏_{j ∈ T_i} f(u_j), (16)

where U(t_i) is the set of all configurations u = (u_j : j ∈ T_i) with 0 ≤ u_j ≤ n_ij and Σ_j u_j = t_i. The conditional likelihood reflects the probability of the observed data configuration relative to the probability of all possible configurations of the given data. The numerator is exactly the same as the unconditional likelihood obtained from Equation (4) and Equation (5). The denominator is what makes the conditional likelihood different from the unconditional likelihood: it sums the joint probability over all possible configurations. Substituting Equation (3) into Equation (16), the trial effect α_i cancels between numerator and denominator, and the conditional likelihood simplifies to

L_i^c(δ_i*) = [∏_{j ∈ T_i} C(n_ij, y_ij)] exp(Σ_{j ∈ T_i*} y_ij δ_ij) / Σ_{u ∈ U(t_i)} [∏_{j ∈ T_i} C(n_ij, u_j)] exp(Σ_{j ∈ T_i*} u_j δ_ij), (17)

where C(n, y) denotes the binomial coefficient. Notice that this likelihood function does not involve any of the nuisance parameters α_i and is a function of the treatment effects alone. The removal of the trial effects from the conditional likelihood is important because, when the conditional likelihood is used, estimates are obtained only for the parameters of interest in the model and not for the α_i's.
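The key property of Equation (16) — that the trial effect cancels — can be verified numerically for a hypothetical two-arm study: computing the conditional probability from the joint binomial model with two very different values of α gives the same answer.

```python
import math

def pmf(y, n, eta):
    # binomial pmf with logit(p) = eta
    p = 1.0 / (1.0 + math.exp(-eta))
    return math.comb(n, y) * p**y * (1.0 - p)**(n - y)

def cond_prob(y0, n0, y1, n1, alpha, delta):
    """P(y0, y1 | y0 + y1 = t) for a two-arm study:
    the baseline arm has logit(p) = alpha, the treated arm alpha + delta."""
    joint = lambda u0, u1: pmf(u0, n0, alpha) * pmf(u1, n1, alpha + delta)
    t = y0 + y1
    # all configurations with the same total t
    den = sum(joint(t - u, u) for u in range(max(0, t - n0), min(t, n1) + 1))
    return joint(y0, y1) / den

a = cond_prob(4, 20, 2, 25, alpha=-1.0, delta=-0.6)
b = cond_prob(4, 20, 2, 25, alpha=+3.0, delta=-0.6)
print(abs(a - b) < 1e-9)  # -> True: the trial effect has cancelled
```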
5.2. Estimation
The conditional likelihood in Equation (17) involves K_i* random effects, so the likelihood L_i^c(θ) involves K_i* integrations:

L_i^c(θ) = ∫ L_i^c(δ_i*) φ(δ_i*) dδ_i*, (18)

where φ(δ_i*) is the probability density function of the multivariate normal distribution with mean μ_i* and covariance Σ_i* given in Equation (7). The Gauss-Hermite approximation is applied to Equation (18) to obtain

L̃_i^c(θ) ≈ π^(−K_i*/2) Σ_q w_q L_i^c(δ_q), (19)

where L_i^c(δ_q) is obtained from Equation (17) and the sampling nodes are δ_q = μ_i* + √2 Σ_i*^(1/2) z_q with weights w_q. The log-likelihood function of the logistic regression models using the conditional approach is

ℓ^c(θ) = Σ_{i=1}^{M} log L̃_i^c(θ). (20)

By maximizing the conditional likelihood function over θ, an exact parameter estimate is obtained for θ, called the conditional maximum likelihood estimate. To calculate the standard errors of these MLEs, the log-likelihood function in Equation (20) can be written in the same integral form as Equation (11):

ℓ^c(θ) = Σ_{i=1}^{M} log ∫ L_i^c(δ_i*) φ(δ_i*) dδ_i*. (21)

The second-order partial derivatives of ℓ^c with respect to μ, τ and ρ are similar to Equation (13) - Equation (15) respectively. In a similar way to the previous section, the standard errors of the MLEs are obtained.
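Putting Equations (17)-(20) together for the simplest case (two-arm studies, a single random effect δ ~ N(μ, τ²)), the conditional log-likelihood can be maximized by a crude grid search. The Python sketch below uses hypothetical study counts, not the Antiplatelet data, and holds τ fixed rather than estimating it:

```python
import math
import numpy as np

# Hypothetical (y0, n0, y1, n1) per study: baseline arm, then treated arm.
data = [(12, 50, 6, 48), (20, 60, 11, 59), (9, 40, 5, 42)]

def cond_lik(y0, n0, y1, n1, delta):
    # Equation (17) for a two-arm study: the trial effect has been eliminated
    t = y0 + y1
    num = math.comb(n0, y0) * math.comb(n1, y1) * math.exp(y1 * delta)
    den = sum(math.comb(n0, t - u) * math.comb(n1, u) * math.exp(u * delta)
              for u in range(max(0, t - n0), min(t, n1) + 1))
    return num / den

def cond_loglik(mu, tau=0.3, Q=20):
    # Equations (18)-(20): integrate each study's conditional likelihood
    # over delta ~ N(mu, tau^2) by Gauss-Hermite, then sum the logs
    z, w = np.polynomial.hermite.hermgauss(Q)
    ll = 0.0
    for y0, n0, y1, n1 in data:
        lik = sum(wq * cond_lik(y0, n0, y1, n1, mu + math.sqrt(2.0) * tau * zq)
                  for zq, wq in zip(z, w)) / math.sqrt(math.pi)
        ll += math.log(lik)
    return ll

# crude grid search for the conditional MLE of mu
grid = np.linspace(-2.0, 1.0, 301)
mu_hat = float(grid[np.argmax([cond_loglik(m) for m in grid])])
print(mu_hat)  # close to the study-level log odds ratios (about -0.8)
```

In practice, a proper optimizer would replace the grid search and τ would be profiled or estimated jointly, but the structure of the computation is the same.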
6. Application to Antiplatelet Therapy Data
For the data given in Table 1, the total number of individual observations per study is small, so the empirical log-odds (normal approximation) model is not appropriate. The logistic regression model is therefore applied to the data using both the unconditional and conditional approaches.
6.1. Unconditional Inference
From the data, there are 27 studies investigating the use of aspirin plus dipyridamole or aspirin alone in comparison with the control group. The studies compare three treatments: aspirin plus dipyridamole (A), aspirin alone (B) and the control treatment (C). Seven studies compare A, B and C; ten studies compare A and C; and ten studies compare B and C. There is no indirect comparison in this dataset, so the set D contains all 27 studies and the set I is empty. The baseline treatment for all studies is the control group (b(i) = 0).

The indices i = 1, ..., 27 and j = 0, 1, 2 stand for the studies and the treatments C, A and B respectively. The data are partitioned into three groups: the three-arm studies, the studies comparing A and C, and the studies comparing B and C. The sets T_i and T_i* are given by

T_i = {0, 1, 2}, T_i* = {1, 2} for i = 1, ..., 7,
T_i = {0, 1}, T_i* = {1} for i = 8, ..., 17,
T_i = {0, 2}, T_i* = {2} for i = 18, ..., 27. (22)
Let y_i0, y_i1 and y_i2 be the numbers of patients who suffered reocclusions on treatments C, A and B respectively, where the ith study belongs to the corresponding group, and let n_i0, n_i1 and n_i2 be the total numbers of patients. Let p_i0, p_i1 and p_i2 be the probabilities that patients have reocclusions on treatments C, A and B respectively in the ith study. The y_i0, y_i1 and y_i2 are binomially distributed as

y_ij ~ Bin(n_ij, p_ij), j ∈ T_i.

The treatment effect δ_ij for j = 1, 2 is defined as

δ_i1 ~ N(μ_1, τ_1²) and δ_i2 ~ N(μ_2, τ_2²). (23)
Logistic regression models for the data can be fitted using Equation (1) and Equation (2), where b(i) = 0 and T_i* is given in Equation (22). Note that the trial effects are assumed to be different in each study. To define the unconditional likelihood function, let y_i represent the vector (y_ij : j ∈ T_i). The probability functions f(y_i0) and f̃(y_i) are formulated from Equation (4) and Equation (8) respectively.

For the three-arm studies, the correlation coefficient ρ between δ_i1 and δ_i2 is of the form ρ = cov(δ_i1, δ_i2)/(τ_1 τ_2), obtained from the covariance matrix Σ. Note that the individual heterogeneity parameters are not estimable unless some other information is used, so the assumption of homogeneous variance is made here. Suppose that all heterogeneity parameters are the same, τ_1 = τ_2 = τ, and that the correlation coefficient takes the value 1/2. The unknown parameter vector for the models is then θ = (α_1, ..., α_27, μ_1, μ_2, τ). The log-likelihood function ℓ(θ) is obtained from Equation (10). By maximizing the log-likelihood function, the MLEs are estimated, and their standard errors are given by the observed Fisher information matrix.
The results for the treatment effects μ_1 and μ_2 are given in Table 2. SD and CI stand for standard deviation and confidence interval respectively. The trial effects are presented in Table 3.

Table 2. The results of the treatment effects for the model using the unconditional method.

Table 3. The trial effects of the model using the unconditional method.

The overall means on the log odds ratio (LOR) scale for the two treatments are −1.17849 (SD 0.08499) and −0.63700 (SD 0.03728), and the heterogeneity parameter is 0.0372 (SD 0.04752). On the odds ratio (OR) scale, the means are 0.30774 and 0.52800 respectively. Their confidence intervals (CI) can be calculated from the related CIs on the LOR scale. In conclusion, aspirin plus dipyridamole and aspirin alone in antiplatelet therapy reduce deep venous thrombosis by over 70% and 45% respectively. The average of both treatments reduces deep venous thrombosis by over 55%.
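The back-transformation from the LOR scale to the OR scale, with a 95% confidence interval, is a one-line calculation. The sketch below reproduces the reported OR for the first treatment; the CI limits shown are illustrative, computed as LOR ± 1.96 SD, which the paper does not state explicitly.

```python
import math

def lor_to_or(lor, sd, z=1.96):
    """Odds ratio and 95% CI from a log odds ratio and its standard error."""
    return math.exp(lor), math.exp(lor - z * sd), math.exp(lor + z * sd)

# Aspirin plus dipyridamole, unconditional fit: LOR -1.17849 (SD 0.08499)
or_a, lo_a, hi_a = lor_to_or(-1.17849, 0.08499)
print(round(or_a, 5))                  # -> 0.30774, the OR reported in the text
print(round(lo_a, 3), round(hi_a, 3))  # illustrative 95% CI on the OR scale
```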
6.2. Conditional Inference
The models and the other parameters are similar to those defined for the unconditional method. The conditional likelihood function for the data can be defined from Equation (17) with b(i) = 0 and T_i* given in Equation (22). (24)

Let y_i denote the vector (y_ij : j ∈ T_i). The conditional likelihood for the ith study is given in Equation (18). To handle the random treatment effects δ_i*, the likelihood function is approximated by the Gauss-Hermite approximation as defined in Equation (19). The unknown parameter vector for the models is θ = (μ_1, μ_2, τ). Using the log-likelihood function in Equation (20), the results of the models are given in Table 4. On the LOR scale, the overall mean effects for the two treatments are −0.87516 (SD 0.04340) and −0.39000 (SD 0.31160), while the variation between studies is 0.37000 (SD 0.03900). The corresponding means on the OR scale are 0.41679 and 0.67434. As before, their confidence intervals are obtained from the related CIs on the LOR scale. The results indicate that aspirin plus dipyridamole and aspirin alone produce reductions in deep venous thrombosis of over 55% and 30% respectively. The average of both treatments in antiplatelet therapy reduces deep venous thrombosis by over 40%.
As seen from Table 2 and Table 4, the results from the unconditional likelihood (on the LOR scale) are smaller than those from the conditional likelihood. Note that these results are negative; that is to say, estimation with the unconditional likelihood may cause underestimation, or bias. The Antiplatelet Trialists' Collaboration [2] concluded that antiplatelet therapy produced a highly significant (2p ≤ 0.00001) reduction in deep venous thrombosis of about 40%. The results from the model using the conditional likelihood support this.
7. Conclusions and Discussion
The logistic regression model was introduced for the exact binomial distribution, and two alternative approaches for making inferences were presented. The unconditional likelihood involves nuisance parameters (the trial effects); if the number of studies (M) is large, it may lead to inconsistent estimators. Cox and Snell [15] concluded for the unconditional likelihood that if the number of studies (M) is large and the number of individual observations in each study is small, then estimation becomes inaccurate and inconsistent. The conditional maximum likelihood approach was introduced for the model to eliminate all nuisance parameters. In choosing between the two approaches, one needs to consider the number of studies and the number of individual observations. The conditional method can be expensive in terms of computer running time, especially if the number of individual observations is large, since its denominator sums over all possible configurations. With the unconditional maximum likelihood approach, note that if the number of studies is large and the number of individual observations is small, the estimates may be biased or misleading.

Table 4. The results of the treatment effects for the model using the conditional method.
Some other methods can be used with the logistic regression model, for example a pseudo-log-likelihood (see Severini [27] ) or the modified profile likelihood (see Bellio and Sartori [28] , Hamza, Houwelingen and Stijnen [29] and Caterina et al. [19] ). Gauss-Hermite quadrature was used to calculate the integral forms of the probabilities, including the random effects, in the likelihood functions for both approaches [30] ; the approximation is reasonably effective for low-order integrations [31] . To implement the Gauss-Hermite approximation, we used the function "gauss.quad" in the software R to estimate the MLEs for the model. The number of integrands depends on the number of treatments involved in the studies; if this number is large, the dimensionality of the integral becomes large and it cannot be approximated accurately. Other approximations such as the Laplace approximation or Monte Carlo methods can be used; see Shi and Copas [32] and Caterina et al. [19] . The Laplace approximation could make the calculation of the second-order derivatives for the observed Fisher information matrix easier than Gaussian quadrature, since there is no weight term in the approximation [33] [34] . Additionally, simulation studies could be conducted in further work to compare the two approaches in different scenarios, including improving the approximation when multi-arm trials compare more than three treatments.
The established binomial tree method could also be applied in further work [24] .
Acknowledgements
Publication of this paper was partially supported by the Faculty of Public Health, Mahidol University, Thailand.