Statistical Diagnosis for General Transformation Model with Right Censored Data Based on Empirical Likelihood
1. Introduction
Statistical diagnosis emerged in the mid-1970s as a new branch of statistics. Over the past 40 years, diagnostics and influence analysis for linear regression models have been fully developed (R. D. Cook and S. Weisberg [1], Bocheng Wei, Guobin Lu & Jianqing Shi [2]). Influence diagnostics for the proportional hazards model have also been well developed (L. A. Weissfeld [3]); other survival models of interest include the proportional odds model, the heteroscedastic linear transformation model, the generalized linear transformation model and the general transformation model.
The empirical likelihood method originates from Thomas & Grunkemeier [4]. Owen [5] first formally defined empirical likelihood and developed its systematic theory. The empirical CDF of $X_1,\dots,X_n$ is defined as $F_n(x)=\frac{1}{n}\sum_{i=1}^{n}I(X_i\le x)$ for $-\infty<x<\infty$, and the empirical likelihood of a CDF $F$ is $L(F)=\prod_{i=1}^{n}\{F(X_i)-F(X_i-)\}$. Zhu et al. [6] applied this method to statistical diagnostics: they developed diagnostic measures for assessing the influence of individual observations when using empirical likelihood with general estimating equations, and used these measures to construct goodness-of-fit statistics for testing possible misspecification of the estimating equations. Liugen Xue and Lixing Zhu [7] summarized applications of the empirical likelihood method.
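As a small numerical illustration of these two definitions (not part of the original method), the Python sketch below evaluates $F_n$ and checks that, among distributions supported on distinct observed points, $L(F)$ is maximized by $F_n$, whose likelihood equals $n^{-n}$. The sample values and the helper names `ecdf` and `likelihood_of_masses` are hypothetical.

```python
import numpy as np

def ecdf(sample, x):
    """F_n(x) = (1/n) * #{i : X_i <= x}, evaluated at each point of x."""
    s = np.sort(np.asarray(sample, dtype=float))
    # side="right" counts the sample points that are <= x (ties included)
    return np.searchsorted(s, np.asarray(x, dtype=float), side="right") / s.size

def likelihood_of_masses(masses):
    """L(F) = prod_i {F(X_i) - F(X_i-)} for a discrete F putting mass masses[i]
    on the i-th (distinct) observation."""
    return float(np.prod(np.asarray(masses, dtype=float)))

x = np.array([2.1, 0.4, 3.3, 1.7])          # hypothetical sample, no ties
n = x.size
fn_values = ecdf(x, [0.0, 1.7, 5.0])

# F_n puts mass 1/n on every observation, so L(F_n) = n^{-n}
l_fn = likelihood_of_masses(np.full(n, 1.0 / n))

# Any other distribution on the same points has a smaller likelihood (AM-GM)
rng = np.random.default_rng(0)
l_others = [likelihood_of_masses(rng.dirichlet(np.ones(n))) for _ in range(1000)]
print(fn_values, l_fn, max(l_others) <= l_fn)
```

The comparison with random Dirichlet masses is only a spot check; the inequality itself follows from the arithmetic-geometric mean inequality.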
Many authors have successfully applied empirical likelihood to the analysis of survival data. For example, Qin and Jing [8] investigated empirical likelihood confidence intervals for Cox's regression models with right censored data; He [9] studied the goodness-of-fit of Cox's regression models with various types of censored data; Gu et al. [10] considered inference for Cox's regression models with time-dependent coefficients; Zhou [11], Zheng and Yu [12] and Zhou et al. [13] studied empirical likelihood for accelerated failure time models, multivariate accelerated failure time models and heteroscedastic accelerated failure time models, respectively. Li et al. [14] reviewed some applications of empirical likelihood in survival analysis; Lu and Liang [15] discussed an empirical likelihood procedure based on estimating equations for a class of flexible survival models, the linear transformation models, which include the popular proportional hazards and proportional odds regression models as special cases. Jianbo Li et al. [16] studied empirical likelihood inference for general transformation models with right censored data.
In this paper, we consider statistical diagnostics for a very general class of survival models, the general transformation models, with right censored data, of the form
$S(t\mid Z)=G\{S(t);Z,\theta\}, \qquad (1)$
where $S(t\mid Z)$ is the conditional survival function of the failure time variable $T$ given the covariate vector $Z$; $S(t)$ is a completely unspecified baseline survival function (the survival function when $Z=0$); $G(u;z,\theta)$ is a known function, monotonically increasing in $u$ and satisfying $G(0;z,\theta)=0$ and $G(1;z,\theta)=1$ for any $z$ and $\theta$; and $\theta$ is a parameter vector including the regression coefficients and possible transformation parameters in $G$. Model (1) includes many popular survival models, for example the heteroscedastic linear transformation models, as special cases. Note that when
$G(u;Z,\beta)=g\{g^{-1}(u)+\beta^\top Z\},$
where $g$ is a survival function, Model (1) reduces to the popular linear transformation models (Clayton and Cuzick [17]; Dabrowska and Doksum [18]; Bickel [19]; Cheng et al. [20]; Fine et al. [21]).
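The linear transformation special case can be checked numerically. The sketch below assumes the form $G(u;Z,\beta)=g\{g^{-1}(u)+\beta^\top Z\}$ and two illustrative choices of $g$, the extreme-value survival function $g(s)=\exp(-e^{s})$ and the logistic survival function $g(s)=1/(1+e^{s})$. It verifies that they give proportional hazards and proportional odds, respectively, and that $G$ is increasing in $u$. The covariate and coefficient values are hypothetical.

```python
import numpy as np

# Illustrative (assumed) choices of g for G(u; z, beta) = g(g^{-1}(u) + beta^T z)
def g_ph(s):                 # extreme-value survival function -> proportional hazards
    return np.exp(-np.exp(s))

def g_ph_inv(u):
    return np.log(-np.log(u))

def g_po(s):                 # logistic survival function -> proportional odds
    return 1.0 / (1.0 + np.exp(s))

def g_po_inv(u):
    return np.log((1.0 - u) / u)

def G(u, z, beta, g, g_inv):
    """Conditional survival value S(t|z) given the baseline value u = S(t)."""
    return g(g_inv(u) + np.dot(z, beta))

u = np.linspace(0.01, 0.99, 50)                         # baseline values, endpoints excluded
z, beta = np.array([1.0, 0.5]), np.array([0.8, -0.4])   # hypothetical values
eta = float(z @ beta)
s_ph = G(u, z, beta, g_ph, g_ph_inv)
s_po = G(u, z, beta, g_po, g_po_inv)

ph_ok = np.allclose(s_ph, u ** np.exp(eta))                         # S(t|z) = S(t)^{exp(eta)}
po_ok = np.allclose((1 - s_po) / s_po, (1 - u) / u * np.exp(eta))   # failure odds times exp(eta)
increasing = bool(np.all(np.diff(s_ph) > 0) and np.all(np.diff(s_po) > 0))
print(ph_ok, po_ok, increasing)
```

The endpoints $u=0$ and $u=1$ are excluded only to avoid evaluating $\log 0$; the limits $G(0)=0$ and $G(1)=1$ hold for both choices.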
So far, diagnostics for the general transformation model with random right censoring based on the empirical likelihood method have not appeared in the literature, and this paper attempts to fill that gap. One advantage of the proposed procedure is that it does not require estimating the baseline survival function or the censoring distribution. The class of models we investigate is also more general than those in previous studies of survival models.
The rest of the paper is organized as follows. The empirical likelihood and the estimating equations are presented in Section 2. The main results are given in Sections 3 and 4. Section 5 contains simulation studies and applications. Conclusions and discussion are given in Section 6.
2. Empirical Likelihood and Estimating Equations
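Let $C$ be the censoring variable, $X=\min(T,C)$ the censored event time variable and $\delta=I(T\le C)$ the censoring indicator. Suppose $(X_i,\delta_i,Z_i)$, $i=1,\dots,n$, are i.i.d. copies of $(X,\delta,Z)$. Denote by $K$ the total number of uncensored failure times, and consider the partial ranking among the uncensored failure times together with the censored observations between each neighboring pair of uncensored observations. Given the partial ranking and the covariate observations, Jianbo Li et al. [16] proposed an empirical log-likelihood ratio function for $\theta$, which can be written as
$l(\theta)=-2\max\Big\{\sum_{i=1}^{n}\log(np_i):\ p_i\ge 0,\ \sum_{i=1}^{n}p_i=1,\ \sum_{i=1}^{n}p_iW_i(\theta)=0\Big\},$
where $W_i(\theta)$ denotes the estimating function contributed by the $i$-th observation, constructed from the partial ranking as in [16].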
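By Qin and Lawless [22] and Owen [5], when $0$ lies inside the convex hull of $\{W_1(\theta),\dots,W_n(\theta)\}$, the maximum is attained at $p_i=[n\{1+\lambda^\top W_i(\theta)\}]^{-1}$, and the empirical log-likelihood ratio statistic equals
$l(\theta)=2\sum_{i=1}^{n}\log\{1+\lambda^\top W_i(\theta)\},$
where $\lambda=\lambda(\theta)$ is the Lagrange multiplier determined by $\frac{1}{n}\sum_{i=1}^{n}\frac{W_i(\theta)}{1+\lambda^\top W_i(\theta)}=0$.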
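The statistic $l(\theta)$ can be computed by solving for the Lagrange multiplier numerically. The following Python sketch uses a damped Newton iteration on the concave dual $\sum_i\log\{1+\lambda^\top W_i(\theta)\}$ and keeps every implied weight $p_i=[n\{1+\lambda^\top W_i\}]^{-1}$ below 1. It takes a generic $n\times p$ array of estimating-function values $W_i(\theta)$ and assumes that 0 lies inside their convex hull. The mean-type example $W_i(\theta)=X_i-\theta$ is a hypothetical stand-in, not the estimating function of [16].

```python
import numpy as np

def el_log_ratio(W, max_iter=100, tol=1e-10):
    """Return (l, lam, p) with l = 2 * sum_i log(1 + lam^T W_i), where lam solves
    (1/n) sum_i W_i / (1 + lam^T W_i) = 0 and p_i = 1 / (n (1 + lam^T W_i)).
    W holds the estimating-function values W_i(theta), shape (n,) or (n, p)."""
    W = np.asarray(W, dtype=float)
    if W.ndim == 1:
        W = W[:, None]
    n, p = W.shape
    lam = np.zeros(p)
    for _ in range(max_iter):
        d = 1.0 + W @ lam
        grad = W.T @ (1.0 / d)                   # sum_i W_i / (1 + lam^T W_i)
        if np.linalg.norm(grad) < tol:
            break
        hess = (W / d[:, None] ** 2).T @ W       # sum_i W_i W_i^T / (1 + lam^T W_i)^2
        step = np.linalg.solve(hess, grad)       # Newton step for the concave dual
        obj, t = np.sum(np.log(d)), 1.0
        while t > 1e-12:                         # backtracking line search
            cand = lam + t * step
            d_new = 1.0 + W @ cand
            if np.all(d_new > 1.0 / n) and np.sum(np.log(d_new)) >= obj - 1e-12:
                lam = cand
                break
            t /= 2.0
        else:
            break                                # no acceptable step: stop
    d = 1.0 + W @ lam
    return 2.0 * np.sum(np.log(d)), lam, 1.0 / (n * d)

# Hypothetical example: EL for a mean, W_i(theta) = X_i - theta
rng = np.random.default_rng(1)
x = rng.normal(size=60)
l_hat, _, _ = el_log_ratio(x - x.mean())         # l = 0 at the sample mean
l_off, lam_off, p_off = el_log_ratio(x - (x.mean() + 0.3))
print(l_hat, l_off, p_off.sum())
```

At the solution the weights $p_i$ sum to one and satisfy $\sum_ip_iW_i(\theta)=0$, which is a convenient check on convergence.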
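Regarding $\theta$ and $\lambda$ as independent variables, define
$Q_{1n}(\theta,\lambda)=\frac{1}{n}\sum_{i=1}^{n}\frac{W_i(\theta)}{1+\lambda^\top W_i(\theta)},\qquad Q_{2n}(\theta,\lambda)=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{1+\lambda^\top W_i(\theta)}\left(\frac{\partial W_i(\theta)}{\partial\theta^\top}\right)^{\top}\lambda.$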
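The maximum empirical likelihood estimators $\hat\theta$ and $\hat\lambda$ are the solutions of the equations
$Q_{1n}(\theta,\lambda)=0,\qquad Q_{2n}(\theta,\lambda)=0,$
because $\hat\theta$ minimizes $l(\theta)$ and, given $Q_{1n}=0$, the derivative of $l(\theta)$ with respect to $\theta$ reduces to $2nQ_{2n}(\theta,\lambda)$.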
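To illustrate these two equations, the sketch below computes a maximum empirical likelihood estimator by minimizing $l(\theta)$ over a scalar $\theta$ with a golden-section search, then checks that $Q_{1n}$ and $Q_{2n}$ vanish at $(\hat\theta,\hat\lambda)$. It assumes a hypothetical over-identified estimating function $W_i(\theta)=(X_i-\theta,(X_i-\theta)^2-1)^\top$ for the mean of a unit-variance sample, chosen only because its derivative is easy to write down; it is not the estimating function of Model (1).

```python
import numpy as np

def el_log_ratio(W, max_iter=100, tol=1e-10):
    """l = 2 sum log(1 + lam^T W_i), lam solving (1/n) sum W_i/(1 + lam^T W_i) = 0."""
    n, p = W.shape
    lam = np.zeros(p)
    for _ in range(max_iter):
        d = 1.0 + W @ lam
        grad = W.T @ (1.0 / d)
        if np.linalg.norm(grad) < tol:
            break
        step = np.linalg.solve((W / d[:, None] ** 2).T @ W, grad)   # Newton step
        obj, t = np.sum(np.log(d)), 1.0
        while t > 1e-12:                                            # backtracking
            d_new = 1.0 + W @ (lam + t * step)
            if np.all(d_new > 1.0 / n) and np.sum(np.log(d_new)) >= obj - 1e-12:
                lam = lam + t * step
                break
            t /= 2.0
        else:
            break
    return 2.0 * np.sum(np.log(1.0 + W @ lam)), lam

def W_fun(theta, x):
    """Hypothetical over-identified estimating function (X - theta, (X - theta)^2 - 1)."""
    r = x - theta
    return np.column_stack([r, r ** 2 - 1.0])

def dW_fun(theta, x):
    """Rows are dW_i/dtheta = (-1, -2 (X_i - theta))."""
    return np.column_stack([-np.ones_like(x), -2.0 * (x - theta)])

def golden_min(f, a, b, iters=200):
    """Golden-section search for a minimizer of a unimodal f on [a, b]."""
    r = (np.sqrt(5.0) - 1.0) / 2.0
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:                      # minimizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = f(c)
        else:                            # minimizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

rng = np.random.default_rng(2)
x = rng.normal(loc=1.0, scale=1.0, size=200)
theta_hat = golden_min(lambda th: el_log_ratio(W_fun(th, x))[0], x.mean() - 0.5, x.mean() + 0.5)
_, lam_hat = el_log_ratio(W_fun(theta_hat, x))
d = 1.0 + W_fun(theta_hat, x) @ lam_hat
Q1 = (W_fun(theta_hat, x) / d[:, None]).mean(axis=0)     # Q_1n at (theta_hat, lam_hat)
Q2 = np.mean((dW_fun(theta_hat, x) @ lam_hat) / d)       # Q_2n (theta is scalar here)
print(theta_hat, Q1, Q2)
```

A one-dimensional search is used only because $\theta$ is scalar in this toy example; for vector $\theta$ one would solve $Q_{1n}=Q_{2n}=0$ jointly or use a multivariate optimizer.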
3. Case-Deletion Influence Measures
Consider Model (1) with the $j$-th case deleted:
$S(t\mid Z_i)=G\{S(t);Z_i,\theta\},\qquad i=1,\dots,n,\ i\neq j. \qquad (2)$
This model is called the case-deletion model. Let $\hat\theta_{(j)}$ be the maximum empirical likelihood estimate of $\theta$ under model (2). To study the influence of the $j$-th case, we compare $\hat\theta_{(j)}$ with $\hat\theta$. The key result is as follows.
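By Zhu et al. [6], for model (2), the maximum empirical likelihood estimator of $\theta$ can be approximated by the one-step estimator
$\hat\theta_{(j)}^{1}=\hat\theta+\frac{1}{n}\big(\hat S^\top\hat V^{-1}\hat S\big)^{-1}\hat S^\top\hat V^{-1}W_j(\hat\theta), \qquad (3)$
where $\hat S=\frac{1}{n}\sum_{i=1}^{n}\partial W_i(\hat\theta)/\partial\theta^\top$ and $\hat V=\frac{1}{n}\sum_{i=1}^{n}W_i(\hat\theta)W_i(\hat\theta)^\top$.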
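A simple way to see how a one-step deletion formula behaves is to compare it with exact case deletion where both are available. The sketch below implements the generic first-order approximation $\hat\theta_{(j)}\approx\hat\theta+n^{-1}(\hat S^\top\hat V^{-1}\hat S)^{-1}\hat S^\top\hat V^{-1}W_j(\hat\theta)$ for all $j$ at once. It uses the hypothetical mean example $W_i(\theta)=X_i-\theta$, for which the exact answer is $\hat\theta_{(j)}=(n\bar X-X_j)/(n-1)$; the helper name `one_step_deletion` is illustrative.

```python
import numpy as np

def one_step_deletion(W, dW):
    """First-order approximation of theta_(j) - theta_hat for every case j.
    W:  (n, p) estimating-function values W_i(theta_hat)
    dW: (n, p, q) derivatives dW_i/dtheta^T at theta_hat
    Returns an (n, q) array whose j-th row approximates theta_(j) - theta_hat."""
    n = W.shape[0]
    S = dW.mean(axis=0)                            # S_hat, shape (p, q)
    V = W.T @ W / n                                # V_hat, shape (p, p)
    A = S.T @ np.linalg.solve(V, S)                # S^T V^{-1} S, shape (q, q)
    B = S.T @ np.linalg.solve(V, W.T)              # S^T V^{-1} W_j for all j, (q, n)
    return np.linalg.solve(A, B).T / n

rng = np.random.default_rng(3)
x = rng.exponential(size=40)                       # hypothetical sample
n = x.size
theta_hat = x.mean()                               # MELE for W_i = X_i - theta
W = (x - theta_hat)[:, None]
dW = -np.ones((n, 1, 1))                           # dW_i/dtheta = -1
approx = one_step_deletion(W, dW)[:, 0]
exact = (n * theta_hat - x) / (n - 1) - theta_hat  # exact leave-one-out change
print(np.max(np.abs(approx - exact)))
```

In this example the one-step change is exactly $(n-1)/n$ times the exact change, so the approximation error is of relative order $1/n$.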
3.1. Empirical Cook Distance
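Zhu et al. [6] proposed the empirical Cook distance. Let $M$ be a nonnegative definite matrix. The empirical Cook distance is defined as
$\mathrm{ECD}_j=(\hat\theta_{(j)}-\hat\theta)^\top M(\hat\theta_{(j)}-\hat\theta), \qquad (4)$
where $\hat\theta_{(j)}$ may be replaced by its one-step approximation in (3), and a common choice of $M$ is the inverse of an estimate of the asymptotic covariance matrix of $\hat\theta$.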
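Once the case-deleted estimates are available, (4) is a quadratic form for each $j$. The sketch below computes it for all cases at once. For illustration it uses a hypothetical mean example with exact deletion, the assumed choice $M=n/\hat\sigma^2$ (the inverse of the estimated variance of $\bar X$), and a planted outlier at 0-based index 7.

```python
import numpy as np

def empirical_cook_distance(theta_hat, theta_del, M):
    """ECD_j = (theta_(j) - theta_hat)^T M (theta_(j) - theta_hat) for all j.
    theta_hat: (q,) estimate; theta_del: (n, q) array whose j-th row is theta_(j);
    M: (q, q) nonnegative definite weight matrix."""
    D = np.asarray(theta_del, dtype=float) - np.asarray(theta_hat, dtype=float)
    return np.einsum("ij,jk,ik->i", D, M, D)

rng = np.random.default_rng(4)
x = rng.normal(size=50)
x[7] = 6.0                                             # planted outlier
n = x.size
theta_hat = np.array([x.mean()])
theta_del = ((n * x.mean() - x) / (n - 1))[:, None]    # exact leave-one-out means
M = np.array([[n / x.var()]])                          # assumed choice of M
ecd = empirical_cook_distance(theta_hat, theta_del, M)
print(int(np.argmax(ecd)), ecd.max())
```

In a real application `theta_del` would hold the case-deletion estimates under model (2), or their one-step approximations.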
3.2. Empirical Likelihood Distance
The empirical likelihood distance is motivated from the viewpoint of data fitting: it measures how much the fit changes when the $j$-th case is deleted. To eliminate the effect of scale, the change also needs to be standardized by the variance of the estimator. Since the focus is on the influence of deleting the $j$-th case, $\theta$ is replaced by $\hat\theta_{(j)}$ in the likelihood ratio (W-K) statistic, which gives
$\mathrm{ELD}_j=l(\hat\theta_{(j)})-l(\hat\theta). \qquad (5)$
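The empirical likelihood distance only needs $l(\cdot)$ computed on the full data and evaluated at $\hat\theta_{(j)}$. The self-contained sketch below solves for a scalar Lagrange multiplier by bisection (the function $\sum_iW_i/(1+\lambda W_i)$ is decreasing in $\lambda$) and assumes $\mathrm{ELD}_j=l(\hat\theta_{(j)})-l(\hat\theta)$ with $l$ the empirical log-likelihood ratio statistic. The data are a hypothetical mean example ($W_i(\theta)=X_i-\theta$, exact case deletion, planted outlier at 0-based index 7).

```python
import numpy as np

def el_log_ratio_1d(w, iters=200):
    """l = 2 sum log(1 + lam w_i) for scalar estimating values w_i, where lam solves
    sum w_i / (1 + lam w_i) = 0; requires min(w) < 0 < max(w)."""
    w = np.asarray(w, dtype=float)
    n = w.size
    # Bracket from Owen's bound 1 + lam w_i >= 1/n for all i
    lo, hi = (1.0 / n - 1.0) / w.max(), (1.0 / n - 1.0) / w.min()
    for _ in range(iters):               # the score is decreasing in lam: bisect
        mid = 0.5 * (lo + hi)
        if np.sum(w / (1.0 + mid * w)) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * np.sum(np.log1p(lam * w))

rng = np.random.default_rng(4)
x = rng.normal(size=50)
x[7] = 6.0                                       # planted outlier (0-based index 7)
n = x.size
theta_hat = x.mean()
theta_del = (n * theta_hat - x) / (n - 1)        # exact leave-one-out estimates
l_hat = el_log_ratio_1d(x - theta_hat)           # essentially 0 at the MELE
eld = np.array([el_log_ratio_1d(x - t) - l_hat for t in theta_del])
print(int(np.argmax(eld)), eld.max())
```

Because $\hat\theta$ minimizes $l(\theta)$, every $\mathrm{ELD}_j$ is nonnegative up to rounding.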
4. Local Influence Analysis of Model
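We consider the local influence method for a case-weight perturbation $\omega=(\omega_1,\dots,\omega_n)^\top$, under which the empirical log-likelihood function is denoted by $\ell(\theta\mid\omega)$. In this case $\omega_0=\mathbf{1}_n$, defined to be an $n\times1$ vector with all elements equal to 1, represents no perturbation to the empirical likelihood, because $\ell(\theta\mid\omega_0)=\ell(\theta)$. Thus, the empirical likelihood displacement is defined as
$LD(\omega)=2\{\ell(\hat\theta)-\ell(\hat\theta(\omega))\},$
where $\hat\theta(\omega)$ is the maximum empirical likelihood estimator of $\theta$ based on $\ell(\theta\mid\omega)$. Let $\omega(a)=\omega_0+ah$ with $a\in\mathbb{R}$ and $\|h\|=1$, where $h$ is a direction in $\mathbb{R}^n$. Thus, the normal curvature of the influence graph $\{\omega(a),LD(\omega(a))\}$ at $a=0$ is given by
$C_h=2\,|h^\top F h|,$
where $F=-\Delta^\top\ddot\ell(\hat\theta)^{-1}\Delta$ with $\ddot\ell(\hat\theta)=\partial^2\ell(\hat\theta)/\partial\theta\partial\theta^\top$, in which $\Delta$ is a $p\times n$ matrix ($p$ being the dimension of $\theta$) with $(k,i)$-th element given by $\partial^2\ell(\theta\mid\omega)/\partial\theta_k\partial\omega_i$, evaluated at $\theta=\hat\theta$ and $\omega=\omega_0$.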
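The curvature formula can be checked numerically on a case where everything is explicit. The sketch below uses a hypothetical toy perturbed log-likelihood, the unit-variance normal-mean model $\ell(\theta\mid\omega)=-\frac12\sum_i\omega_i(X_i-\theta)^2$, rather than the empirical likelihood of this paper. Here $\ddot\ell=-n$ and the $i$-th column of $\Delta$ is $X_i-\hat\theta$, so $C_h=2|h^\top Fh|$ with $F=-\Delta^\top\ddot\ell^{-1}\Delta$ should match a finite-difference second derivative of $LD(\omega_0+ah)$ at $a=0$.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=50)                       # hypothetical data
n = x.size
theta_hat = x.mean()                          # maximizer at omega_0 = 1_n

def theta_of(omega):
    """Maximizer of the toy perturbed log-likelihood -(1/2) sum omega_i (x_i - theta)^2."""
    return np.sum(omega * x) / np.sum(omega)

def likelihood_displacement(omega):
    """LD(omega) = 2 {l(theta_hat) - l(theta_hat(omega))}, l = unperturbed log-likelihood."""
    t = theta_of(omega)
    return np.sum((x - t) ** 2) - np.sum((x - theta_hat) ** 2)

Delta = (x - theta_hat)[None, :]              # 1 x n matrix of d^2 l / (d theta d omega_i)
l_ddot = np.array([[-float(n)]])              # d^2 l / d theta^2 at theta_hat
F = -Delta.T @ np.linalg.solve(l_ddot, Delta) # n x n, nonnegative definite

h = rng.normal(size=n)
h /= np.linalg.norm(h)                        # unit direction
C_h = 2.0 * abs(h @ F @ h)

a = 1e-3                                      # central second difference of LD along h
omega0 = np.ones(n)
fd = (likelihood_displacement(omega0 + a * h) + likelihood_displacement(omega0 - a * h)
      - 2.0 * likelihood_displacement(omega0)) / a ** 2
print(C_h, fd)
```

For the empirical likelihood, $\Delta$ and $\ddot\ell(\hat\theta)$ would instead come from the perturbed empirical log-likelihood; only the final formula for $C_h$ carries over.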
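We consider two local influence measures based on the normal curvature. Let $\mu_1\ge\mu_2\ge\cdots\ge\mu_n\ge0$ be the ordered eigenvalues of the $n\times n$ matrix $F$ in $C_h$, and let $\{e_1,\dots,e_n\}$ be the associated orthonormal basis, that is, $Fe_k=\mu_ke_k$. Thus, the spectral decomposition of $F$ is given by
$F=\sum_{k=1}^{n}\mu_k e_k e_k^\top.$
The most popular local influence measures include $h_{\max}=e_1$, which corresponds to the largest eigenvalue $\mu_1$, as well as $C_j=C_{d_j}=2F_{jj}=2\sum_{k=1}^{n}\mu_k e_{kj}^2$, where $d_j$ is an $n\times1$ vector with $j$-th component 1 and 0 otherwise, and $e_{kj}$ is the $j$-th component of $e_k$. The direction $h_{\max}$ represents the most influential perturbation to the empirical likelihood function, whereas an observation with a large $C_j$ can be regarded as influential.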
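Given $F$, both measures follow from one symmetric eigendecomposition. The sketch below assumes $F$ has already been formed; here it is, hypothetically, the rank-one matrix $F=(X-\bar X)(X-\bar X)^\top/n$ from a toy normal-mean likelihood with a planted outlier at 0-based index 12. It returns $h_{\max}$ and $C_j=2F_{jj}$, and checks the identity $F_{jj}=\sum_k\mu_ke_{kj}^2$ for eigenvalues $\mu_k$ and eigenvectors $e_k$. The helper name `local_influence` is illustrative.

```python
import numpy as np

def local_influence(F):
    """Return (mu, E, h_max, C) for a symmetric nonnegative definite F:
    eigenvalues in decreasing order, eigenvectors as the columns of E,
    the direction h_max of largest curvature, and C_j = 2 F_jj."""
    mu, E = np.linalg.eigh(F)            # eigh returns eigenvalues in ascending order
    order = np.argsort(mu)[::-1]
    mu, E = mu[order], E[:, order]
    h_max = E[:, 0]                      # eigenvector of the largest eigenvalue
    C = 2.0 * np.diag(F)
    return mu, E, h_max, C

rng = np.random.default_rng(6)
x = rng.normal(size=40)
x[12] = 5.0                              # planted outlier
r = x - x.mean()
F = np.outer(r, r) / x.size              # toy F (rank one)
mu, E, h_max, C = local_influence(F)
print(int(np.argmax(C)), int(np.argmax(np.abs(h_max))))
```

Eigenvectors are determined only up to sign, so the sketch compares absolute components of $h_{\max}$.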
Following the discussion of Zhu et al. [6], for the general transformation regression model with random right censoring we can deduce the explicit form of the quantities in the normal curvature:
(6)
5. Numerical Studies
In this section, we simulate data from a special case of Model (1), with covariates generated from the Bernoulli distribution $B$ and the uniform distribution $U$. For the simulation studies, we consider two choices of the transformation function: 1) the standard exponential survival function and 2) the choice that gives the proportional odds model. With these two choices, Model (1) corresponds to the proportional hazards (Cox) regression model and the proportional odds regression model, respectively. For both models, we generate censoring times from a distribution indexed by a constant; by choosing this constant properly, we obtain three censoring proportions for every case (Qian Jun et al. [23]). The survival data simulated with SAS are listed in Table 1.
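The data-generating step can be sketched as follows. All specific settings are hypothetical placeholders, since the values used in the paper are not reproduced here: baseline survival $S(t)=e^{-t}$, $Z_1\sim B(1,0.5)$, $Z_2\sim U(0,1)$, $\beta=(1,1)^\top$ and censoring times $C\sim U(0,c)$. Failure times are drawn by inverting $S(t\mid Z)=V$ with $V$ uniform, under the proportional hazards and proportional odds forms; the function name `simulate` is illustrative.

```python
import numpy as np

def simulate(n, c, model, beta=(1.0, 1.0), seed=0):
    """Hypothetical design: Z1 ~ Bernoulli(0.5), Z2 ~ U(0, 1), baseline S(t) = exp(-t),
    censoring C ~ U(0, c).  model = "ph" (proportional hazards) or "po" (proportional odds)."""
    rng = np.random.default_rng(seed)
    Z = np.column_stack([rng.binomial(1, 0.5, n), rng.uniform(0.0, 1.0, n)])
    eta = Z @ np.asarray(beta, dtype=float)
    V = 1.0 - rng.uniform(size=n)              # V = S(T | Z), uniform on (0, 1]
    if model == "ph":                          # S(t|Z) = exp(-t e^eta)
        T = -np.log(V) * np.exp(-eta)
    elif model == "po":                        # S(t|Z) = 1 / (1 + (e^t - 1) e^eta)
        T = np.log1p((1.0 - V) / V * np.exp(-eta))
    else:
        raise ValueError("model must be 'ph' or 'po'")
    C = c * rng.uniform(size=n)                # same uniforms for every c (same seed)
    X = np.minimum(T, C)                       # observed (censored) time
    delta = (T <= C).astype(int)               # 1 = failure observed, 0 = censored
    return X, delta, Z

for model in ("ph", "po"):
    rates = [1.0 - simulate(100, c, model)[1].mean() for c in (1.0, 3.0, 10.0)]
    print(model, np.round(rates, 2))           # censoring proportion falls as c grows
```

Tuning $c$ in this way is one simple means of reaching a target censoring proportion; the paper follows the approach of Qian Jun et al. [23].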
To check the validity of the proposed methodology, we change the response values of the 3rd, 20th, 54th, 80th and 99th observations.
For every case $j$, the case-deleted estimate $\hat\theta_{(j)}$ is easy to obtain. Using the simulated samples, we computed the maximum empirical likelihood estimators of the model parameters under both models.
Consequently, the diagnostic measures are straightforward to calculate for every case. The results are shown in the following figures.
From all the figures, we can see that in most cases the values of the diagnostic measures are reasonably close to a common value. By the definition and properties of these measures, observations whose values deviate seriously from the average can be diagnosed as strongly influential points. In Figures 1-3, the 3rd, 20th, 54th and 80th observations are identified as strongly influential points; in Figures 4-6, the 3rd, 20th, 54th, 80th and 99th observations are identified as strongly influential points. These results illustrate the proposed approaches.
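The visual rule used here, flagging values that lie far from the average, can be made explicit. The sketch below uses a hypothetical cut-off of $k=3$ standard deviations from the mean (the paper does not state a numerical threshold) and reports 1-based case numbers. The input values are synthetic, not the paper's results, and the function name `flag_influential` is illustrative.

```python
import numpy as np

def flag_influential(values, k=3.0):
    """Return 1-based indices j with |D_j - mean(D)| > k * sd(D) (hypothetical rule)."""
    d = np.asarray(values, dtype=float)
    dev = np.abs(d - d.mean())
    return (np.flatnonzero(dev > k * d.std()) + 1).tolist()

# Synthetic diagnostic values: 100 cases, three of them inflated
d = np.full(100, 0.1)
d[[4, 30, 61]] = [2.0, 1.7, 2.3]
print(flag_influential(d))
```

When many cases are strongly influential, they inflate the mean and standard deviation themselves (masking), so a robust centre and scale such as the median and median absolute deviation may be preferable.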
6. Discussion
In this paper, we considered statistical diagnostics for general transformation models with right censored data based on empirical likelihood. We also described in detail how to simulate survival data under three different censoring proportions. The simulation studies illustrate that the proposed method works fairly well.
Zhensheng Huang [24] analyzed empirical likelihood for the varying-coefficient single-index model with right censored data. In addition, Zhensheng Huang [25] studied profile empirical likelihood inference for the single-index-coefficient regression model. All of these will be topics of our further research.
Table 1. Survival data (Note: an asterisk in the upper right corner denotes a censored observation).
Figure 1. The influence of Model (1).
Figure 2. The influence of Model (1).
Figure 3. The influence of Model (1).
Figure 4. The influence of Model (2).
Figure 5. The influence of Model (2).
Figure 6. The influence of Model (2).