Optimal Weights in Nonparametric Analysis of Clustered ROC Curve Data
1. Introduction
In diagnostic trials, clustered data are obtained when several subunits of the same patient are observed. For example, in a study by Masaryk et al. (1991) [2], two radiologists evaluated 65 carotid arteries (left and right) in 36 patients using three-dimensional Magnetic Resonance Angiography (MRA), a potential screening tool for atherosclerosis of the carotid arteries. These patients also underwent intra-arterial digital subtraction angiography (DSA), which is considered the gold standard for characterizing the degree of stenosis. The goals of the study were to evaluate the performance of MRA for each reader and to compare the performance of the two radiologists.
In the above example, each patient (cluster) contributes a number of unaffected and affected units. Within the same cluster, correlation exists between the outcomes of two unaffected units, between the outcomes of two affected units, between the outcomes of an unaffected and an affected unit, and between the outcomes of the two diagnostic tests. All these correlations need to be taken into account when analyzing such clustered data.
An ROC curve is a plot of a diagnostic test’s sensitivity versus 1-specificity. The curve is constructed by changing the cutpoint that defines a positive diagnostic test result. The area under the ROC curve (AUC) summarizes the test’s overall diagnostic ability and is typically used as a global measure of the accuracy of the diagnostic test.
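The construction just described can be sketched in a few lines. The following is a minimal illustration for independent (non-clustered) data, assuming that larger test results indicate disease; the function names `empirical_roc` and `trapezoid_auc` are hypothetical and not taken from the paper.

```python
import numpy as np

def empirical_roc(unaffected, affected):
    """Trace the empirical ROC curve by sweeping the cutpoint (illustrative helper)."""
    x = np.asarray(unaffected, dtype=float)
    y = np.asarray(affected, dtype=float)
    # Candidate cutpoints: every observed value, plus -inf so the curve reaches (1, 1).
    cuts = np.concatenate(([-np.inf], np.unique(np.concatenate((x, y)))))[::-1]
    # A result is called positive when it exceeds the cutpoint.
    fpr = np.array([(x > c).mean() for c in cuts])  # 1 - specificity
    tpr = np.array([(y > c).mean() for c in cuts])  # sensitivity
    return fpr, tpr

def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

fpr, tpr = empirical_roc([1, 2, 3], [2, 3, 4])
auc = trapezoid_auc(fpr, tpr)
```

The trapezoidal area coincides with the proportion of unaffected-affected pairs that are correctly ordered, with ties counted as one half.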
In the clustered data case, Obuchowski (1997) [1] proposed a nonparametric AUC estimator and derived an asymptotic variance estimate for it that takes within-cluster correlations into account. However, Obuchowski’s AUC estimator gives equal weight to all pairwise rankings within and between clusters. Clusters can differ in cluster size, in the number of unaffected units, and in the number of affected units. In the presence of various within-cluster correlations, these differences affect the contribution of a cluster to the overall variance of the AUC estimator, and hence the weights should vary across clusters.
In this paper, we modify Obuchowski’s estimator by allowing the weight assigned to each pairwise ranking to vary across clusters, and derive the optimal weights that minimize the variance of the AUC estimator. Our results show that the optimal weights depend not only on the within-cluster correlation but also on the proportion of clusters that have both unaffected and affected units. More importantly, we show that the gain of efficiency in comparison with two simple weighting schemes can be doubled when there is a large within-cluster correlation and the proportion of clusters that have both unaffected and affected units is small.
The rest of this paper is organized as follows. In Section 2, the optimal weights for one AUC are derived and the estimators of the optimal weights are discussed. The relative asymptotic efficiencies in comparing our optimal estimator with two simple weighting schemes are studied. A data example is presented in Section 3 and conclusions are provided in Section 4.
2. Optimal Weights for Estimating One AUC
2.1. Optimal Weights Derivation
Assume that there are clusters, of which clusters contain only unaffected units, clusters contain both unaffected and affected units, and clusters contain only affected units. The total number of clusters with at least one unaffected unit is given by, and the total number of clusters with at least one affected unit is given by. Without loss of generality, we assume that clusters contain only unaffected units, clusters contain both unaffected and affected units, and clusters contain only affected units. Let denote the diagnostic test result of the kth unaffected unit in the jth cluster. Similarly, let denote the diagnostic test result of the kth affected unit in the jth cluster.
Let and be the distribution functions of and, respectively. Assume that if the value of or exceeds a predetermined cut-off point the diagnostic test will be considered positive. Then the area under the ROC curve of the diagnostic test is. Obuchowski (1997) [1] proposed a non-parametric estimate for, given by
(1)
where and. This estimate gives equal weight to all pairwise rankings.
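As a rough illustration of an equal-weight estimator of this kind, the sketch below pools all unaffected and all affected units across clusters and averages a pairwise kernel over every unaffected-affected pair. The kernel `psi` (1 if the affected result is larger, 1/2 for a tie, 0 otherwise), the function name `pooled_auc`, and the representation of each cluster as a pair of lists are assumptions made for illustration; they follow the usual Mann-Whitney form and are not copied from Equation (1).

```python
import numpy as np

def psi(x, y):
    """Pairwise ranking kernel (assumed form): 1 if y > x, 1/2 if tied, 0 otherwise."""
    return np.where(y > x, 1.0, np.where(y == x, 0.5, 0.0))

def pooled_auc(clusters):
    """Equal weight for every unaffected-affected pair, within and between clusters.

    `clusters` is a list of (unaffected_results, affected_results) pairs;
    either list in a pair may be empty.
    """
    x = np.concatenate([np.asarray(u, dtype=float) for u, _ in clusters])
    y = np.concatenate([np.asarray(a, dtype=float) for _, a in clusters])
    # Broadcasting builds the full table of pairwise rankings.
    return float(psi(x[:, None], y[None, :]).mean())

# Hypothetical data: one mixed cluster, one unaffected-only, one affected-only.
example = [([1.0, 2.0], [3.0]), ([2.5], []), ([], [2.0, 4.0])]
theta_hat = pooled_auc(example)
```

Because every pair receives the same weight, larger clusters contribute more pairs and therefore more influence, regardless of how strongly their units are correlated.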
Note that can be estimated by
(2)
where is a set of weights assigned to the clusters with at least one unaffected unit satisfying and. Similarly, can be estimated by
(3)
where is a set of weights assigned to the clusters with at least one affected unit satisfying and. Similar to Emir et al. (2000) [3], two simple weighting schemes can be considered: (1) assigning equal weights to observations, i.e., , when within-cluster correlation is low, and (2) assigning equal weights to clusters, i.e., , when within-cluster correlation is high.
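The two simple schemes can be sketched as follows, under the assumption that a cluster weight multiplies the within-cluster empirical distribution function, so that weighting observations equally corresponds to cluster weights proportional to cluster size and weighting clusters equally corresponds to a constant weight. The function names and the form of `weighted_ecdf` are hypothetical illustrations rather than the paper's Equations (2) and (3).

```python
import numpy as np

def observation_weights(sizes):
    """Scheme 1 (assumed form): cluster weight proportional to its number of units."""
    sizes = np.asarray(sizes, dtype=float)
    return sizes / sizes.sum()

def cluster_weights(sizes):
    """Scheme 2 (assumed form): the same weight for every cluster."""
    sizes = np.asarray(sizes, dtype=float)
    return np.full(sizes.shape, 1.0 / sizes.size)

def weighted_ecdf(cluster_values, weights, t):
    """Weighted average of within-cluster empirical distribution functions at t."""
    return float(sum(w * np.mean(np.asarray(v, dtype=float) <= t)
                     for v, w in zip(cluster_values, weights)))

values = [[0.0], [1.0, 2.0, 3.0]]           # hypothetical clusters of sizes 1 and 3
sizes = [len(v) for v in values]
f_obs = weighted_ecdf(values, observation_weights(sizes), 1.5)
f_clu = weighted_ecdf(values, cluster_weights(sizes), 1.5)
```

With size-proportional weights the result equals the pooled empirical distribution function, while equal cluster weights give the single-unit cluster as much influence as the three-unit cluster.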
We propose to estimate by
(4)
Notice that when and, our estimator is the same as that in Obuchowski (1997) [1].
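A numerical check of this reduction is sketched below. It assumes that the weighted estimator is a doubly weighted sum of cluster-pair averages of the ranking kernel, which is one natural reading of the construction above but is not copied from Equation (4); `weighted_auc`, `psi`, and the toy data are hypothetical.

```python
import numpy as np

def psi(x, y):
    """Pairwise ranking kernel (assumed form): 1 if y > x, 1/2 if tied, 0 otherwise."""
    return np.where(y > x, 1.0, np.where(y == x, 0.5, 0.0))

def weighted_auc(unaffected, affected, u, v):
    """Sum over cluster pairs of u_i * v_j * (mean pairwise ranking between them).

    `unaffected` lists the clusters with at least one unaffected unit and
    `affected` those with at least one affected unit; a mixed cluster
    appears in both lists.
    """
    total = 0.0
    for xi, ui in zip(unaffected, u):
        xi = np.asarray(xi, dtype=float)
        for yj, vj in zip(affected, v):
            yj = np.asarray(yj, dtype=float)
            total += ui * vj * psi(xi[:, None], yj[None, :]).mean()
    return float(total)

unaffected = [[1.0, 2.0], [2.5]]
affected = [[3.0], [2.0, 4.0]]
m = np.array([len(c) for c in unaffected], dtype=float)
n = np.array([len(c) for c in affected], dtype=float)

by_size = weighted_auc(unaffected, affected, m / m.sum(), n / n.sum())
by_cluster = weighted_auc(unaffected, affected, [0.5, 0.5], [0.5, 0.5])
```

With size-proportional weights, `by_size` matches the pooled average over all nine unaffected-affected pairs, whereas `by_cluster` differs because small clusters are weighted up.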
To derive our optimal weights, we utilize the following result, which can be found in the Appendix of Emir et al. (2000) [3]:
(5)
where
and if the jth cluster contains at least one unaffected unit and =0 otherwise and if the jth cluster contains at least one affected unit and =0 otherwise. Hence, the variance of is approximately
(6)
Note that
and
Defining the transformation
(7)
we can express the variance of in (6) in terms of and as
(8)
where
and
The optimal weights can be obtained by minimizing (8) with respect to and with constraints, , and. Applying the Lagrange multiplier method, we have
(9)
and
(10)
where,
and
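The Lagrange multiplier step can be illustrated on a simpler, generic problem: minimizing a quadratic variance w'Σw subject to the weights summing to one, whose solution is w = Σ⁻¹1 / (1'Σ⁻¹1). The sketch below is not a transcription of Equations (9) and (10); the covariance matrix `cov` and the function name `min_variance_weights` are hypothetical, and nonnegativity of the weights is not enforced.

```python
import numpy as np

def min_variance_weights(cov):
    """Minimize w' cov w subject to sum(w) = 1 via a Lagrange multiplier.

    Setting the gradient of w' cov w - 2 * lam * (sum(w) - 1) to zero gives
    cov @ w = lam * 1, so w is proportional to cov^{-1} 1; the constraint
    fixes the proportionality constant.
    """
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(cov.shape[0])
    z = np.linalg.solve(cov, ones)   # cov^{-1} 1 without forming the inverse
    return z / z.sum()

# Hypothetical example: two uncorrelated components with variances 1 and 4.
cov = np.diag([1.0, 4.0])
w_opt = min_variance_weights(cov)
var_opt = float(w_opt @ cov @ w_opt)
w_equal = np.array([0.5, 0.5])
var_equal = float(w_equal @ cov @ w_equal)
```

In this toy case the noisier component is down-weighted, and the ratio `var_equal / var_opt` plays the role of a relative efficiency of the kind compared in the next subsection.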
2.2. Asymptotic Variance Comparison
Let be the estimated optimal weight, be the estimator of using simple weighting Scheme 1:, and be the estimator of using simple weighting Scheme 2:.
Along the same lines as the proofs for (??), (??) and (??), we can show that is approximately normal, and is approximately normal, , with
(11)
(12)
and
(13)
where
and
Let be the asymptotic relative efficiency for comparing with, and be the asymptotic relative efficiency for comparing with. Similar to the case of a single AUC, for the special case where, and Corr, , we have that both and increase dramatically as increases and decreases, and increases slowly as decreases (Figure 1).
3. Conclusions
We have proposed an optimal nonparametric estimator for one AUC, which modifies Obuchowski’s estimate by allowing different weights for the pairwise rankings within and between clusters. Optimal weights have been derived by minimizing the variance of the estimate of one AUC (or of the difference between two AUCs). The asymptotic performance of the AUC estimate using our optimal weights has been studied in contrast with the two simple weighting schemes.
We have shown that when there is a moderate within-cluster correlation between unaffected and affected units and the proportion of clusters that contain both unaffected and affected units is small, using either of the two simple weighting schemes, corresponding to Obuchowski’s estimator and to the estimator with equal cluster weights, can lead to a dramatic loss of efficiency. In this situation, the optimal weights are recommended.