A Simulation Study on Comparing General Class of Semiparametric Transformation Models for Survival Outcome with Time-Varying Coefficients and Covariates
1. Introduction
In many experimental and observational studies, such as randomized clinical trials, agricultural experiments, and engineering and industrial production, we commonly obtain time-to-event outcomes, the so-called survival or failure times. In biomedical research the main interest is usually the survival time, which is the time from a defined origin until a defined endpoint or event [1]. Survival data are incomplete because of censoring: the exact event time is not observed for some subjects during the experimental or observational period, which makes the analysis considerably more complex.
Right censoring is the most common type of censoring in survival analysis. Time-varying covariates are another classical issue in modeling survival times. Semiparametric transformation models, which have attracted the attention of several authors, are an important tool for analyzing right-censored survival data. Another important consideration in analyzing survival data is the proportionality assumption. In practice there is no guarantee that this assumption holds, because the effect of a covariate may vary over time, which violates the proportional hazards assumption of the Cox model [2]. In this situation, time-varying coefficients need to be incorporated into the model. For this reason, time-dependent effects and time-dependent covariates have received considerable attention in recent years. More generally, one may need a model that incorporates both time-varying covariates and time-varying effects; combining the two yields a more general model.
A key advantage of semiparametric transformation models (STMs) is that they provide a framework for assessing the effects of time-varying covariates and time-varying coefficients on failure time. Because the class contains several models as special cases, such as the proportional hazards and proportional odds models, a violation of the proportionality assumption is less of a problem.
The remainder of this paper is organized as follows. Section 2 introduces the methods and the model framework used throughout the paper and proposes modified estimating equations for robust semiparametric transformation models. Section 3 presents the large-sample theory and the regularity conditions for the consistency and asymptotic properties of the proposed estimators. Section 4 presents simulation studies that assess the performance of the proposed techniques. Finally, Section 5 concludes.
2. Methods and Model Framework
We begin with some basic notation used throughout this paper. Let $T$ be the survival time of interest, and let $Z(t) = (Z_1(t), \ldots, Z_p(t))'$ be a $p$-dimensional vector of possibly time-varying covariates, where $p$ is the number of covariates included in the model.
When a covariate is allowed to vary over time, perhaps the simplest approach is to represent it as a step function:
$$Z(t) = \begin{cases} z_1, & t < t_0, \\ z_2, & t \ge t_0, \end{cases} \qquad (1)$$
where $t_0$ is the time at which the covariate changes.
The step function is used when the covariate changes only once, at a fixed time point, and remains constant afterwards. In other situations, covariates change frequently or even continuously over time. A simple way to code such time-dependent covariates is to split each subject's follow-up into time intervals and record each interval in two columns giving its start and stop times (also labelled time1 and time2, or entry and exit); the intervals for a subject need not be contiguous. The “tmerge()” function of the R “survival” package performs this rearrangement.
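The (start, stop] layout described above can be illustrated with a short Python sketch that splits one subject's follow-up at the times where a step-function covariate, as in (1), changes value. The helper `split_follow_up` and its tuple format are hypothetical illustrations, not the R `tmerge()` function; the change times are assumed to be sorted in increasing order.

```python
from typing import List, Tuple

Row = Tuple[float, float, int, float]  # (start, stop, event, covariate value)


def split_follow_up(stop_time: float, event: int,
                    change_times: List[float],
                    values: List[float]) -> List[Row]:
    """Split one subject's follow-up (0, stop_time] into (start, stop] rows.

    values[0] applies before change_times[0]; values[j] applies from
    change_times[j - 1] onward, so len(values) == len(change_times) + 1.
    Only the last row carries the subject's event indicator.
    """
    if len(values) != len(change_times) + 1:
        raise ValueError("need one more covariate value than change times")
    rows: List[Row] = []
    start = 0.0
    for cut, value in zip(change_times, values):
        if cut >= stop_time:
            break  # changes at or after the end of follow-up are ignored
        rows.append((start, cut, 0, value))
        start = cut
    # Final interval: current covariate value plus the event flag.
    rows.append((start, stop_time, event, values[len(rows)]))
    return rows
```

Stacking such rows for all subjects gives the counting-process layout that `Surv(start, stop, event)` in the R survival package works with.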
Let $C$ denote the censoring time. The observed (failure or censoring) time is the minimum of the failure time and the censoring time, $\tilde T = \min(T, C)$, and we write $\delta = I(T \le C)$ for the event indicator. The observed data then consist of $n$ independent replicates
$$\{\tilde T_i, \delta_i, Z_i(t), 0 \le t \le \tilde T_i\}, \qquad i = 1, \ldots, n. \qquad (2)$$
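As a minimal sketch of the observed-data construction, the Python code below forms the observed time $\min(T_i, C_i)$ and the event indicator $I(T_i \le C_i)$ from hypothetical exponential event times (rate 1) and independent exponential censoring times (rate 0.5). The rates and the seed are arbitrary assumptions; with these rates the expected censoring proportion is $P(C < T) = 0.5/(1 + 0.5) = 1/3$.

```python
import random


def observe(T, C):
    """Return observed times min(T, C) and event indicators I(T <= C)."""
    time = [min(t, c) for t, c in zip(T, C)]
    delta = [int(t <= c) for t, c in zip(T, C)]
    return time, delta


rng = random.Random(2024)
T = [rng.expovariate(1.0) for _ in range(1000)]   # hypothetical event times
C = [rng.expovariate(0.5) for _ in range(1000)]   # hypothetical censoring times
time, delta = observe(T, C)
censoring_rate = 1 - sum(delta) / len(delta)      # close to 1/3 for these rates
```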
2.1. The Semiparametric Transformation Models
A flexible, general class of semiparametric transformation models with time-varying coefficients can be formulated as
$$\log \Lambda(T) = -\beta(t)'X + \varepsilon, \qquad (3)$$
where $X$ is a set of covariates, $\beta(t)$ is the set of time-varying regression coefficients, $\log$ is the natural logarithm (base $e$), and $\Lambda(\cdot)$ is an unknown, unspecified, continuously differentiable and strictly increasing transformation function satisfying $\Lambda(0) = 0$. The random error term $\varepsilon$ follows a known parametric distribution $F$; see [3]. The function $\Lambda(\cdot)$ can be interpreted as the cumulative baseline hazard function. The conditional distribution of $T$ given $X$ is then
$$P(T \le t \mid X) = F\{\log \Lambda(t) + \beta(t)'X\}. \qquad (4)$$
However, model (3) does not accommodate time-varying covariates. With the extension to time-varying covariates given below, the special cases of the transformation models include the proportional hazards (PH) model and the proportional odds (PO) model, which correspond to $\varepsilon$ following the extreme value distribution and the standard logistic distribution, respectively [4] [5] [6].
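The correspondence between the error distribution and the PH or PO special case can be checked numerically. The Python sketch below assumes that, for a scalar time-fixed covariate $x$ and coefficient $\beta$, the model gives $P(T \le t \mid x) = F\{\log \Lambda(t) + \beta x\}$; the Weibull-type $\Lambda(t) = t^{1.5}$ and $\beta = 0.8$ are arbitrary choices. With the extreme-value CDF $F(s) = 1 - \exp(-e^s)$, the ratio of cumulative hazards for $x = 1$ versus $x = 0$ should equal $e^\beta$ at every $t$ (PH); with the standard logistic CDF, the odds ratio of failure should equal $e^\beta$ at every $t$ (PO).

```python
import math


def surv(t, x, beta, cum_base, F):
    """S(t | x) = 1 - F(log Lambda(t) + beta * x), time-fixed beta (assumed form)."""
    return 1.0 - F(math.log(cum_base(t)) + beta * x)


def F_extreme(s):
    """Extreme-value (Gumbel-minimum) CDF: leads to the PH model."""
    return 1.0 - math.exp(-math.exp(s))


def F_logistic(s):
    """Standard logistic CDF: leads to the PO model."""
    return 1.0 / (1.0 + math.exp(-s))


def cum_base(t):
    return t ** 1.5  # hypothetical baseline Lambda(t)


beta = 0.8
results = []
for t in (0.2, 1.0, 3.0):
    s1 = surv(t, 1, beta, cum_base, F_extreme)
    s0 = surv(t, 0, beta, cum_base, F_extreme)
    ch_ratio = math.log(s1) / math.log(s0)        # ratio of cumulative hazards
    p1 = surv(t, 1, beta, cum_base, F_logistic)
    p0 = surv(t, 0, beta, cum_base, F_logistic)
    odds_ratio = ((1 - p1) / p1) / ((1 - p0) / p0)  # odds of failure by time t
    results.append((t, ch_ratio, odds_ratio))
```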
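Let $N(t)$ be the counting process recording the number of events that have occurred by time $t$, and let $Z(t)$ be a set of predictors containing a vector of possibly time-varying covariates. We specify the cumulative intensity function for $N(t)$ conditional on $\{Z(s); s \le t\}$; an equivalent formulation of model (3) that accommodates time-varying covariates is
$$\Lambda(t \mid Z) = G\left\{\int_0^t e^{\beta'Z(s)}\, d\Lambda(s)\right\}, \qquad (5)$$
where $\Lambda(t \mid Z)$ is the subject-specific cumulative hazard function, $\Lambda(\cdot)$ is an unspecified increasing function, and $G$ is a completely specified, continuous, strictly increasing transformation function satisfying $G(0) = 0$ and $G(\infty) = \infty$. For time-fixed covariates, (5) reduces to (3) with $F(s) = 1 - \exp\{-G(e^s)\}$. When $G(x) = -\log \int_0^\infty e^{-x\xi} f(\xi)\, d\xi$, model (5) corresponds to a frailty model with conditional hazard $\xi e^{\beta'Z(t)} \lambda(t)$, in which the independent and identically distributed (i.i.d.) random variable $\xi$, with known density $f$, is an unobservable positive noise term associated with random biological features.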
For the strictly increasing transformation function $G$, we also consider the class of Box-Cox transformations, recently used by [7]. For the two special cases of the transformation model class, namely the proportional hazards (PH) and proportional odds (PO) models, we consider the Box-Cox transformations
$$G(x) = \frac{(1 + x)^\rho - 1}{\rho}, \qquad \rho \ge 0, \qquad (6)$$
and the class of logarithmic transformations
$$G(x) = \frac{\log(1 + r x)}{r}, \qquad r \ge 0, \qquad (7)$$
where $\rho = 0$ in (6) and $r = 0$ in (7) are interpreted as the limits $G(x) = \log(1 + x)$ and $G(x) = x$, respectively. Therefore, the choice $G(x) = x$, obtained when either $\rho = 1$ or $r = 0$, yields the PH model for survival data. Similarly, the choice $G(x) = \log(1 + x)$, obtained when either $\rho = 0$ or $r = 1$, yields the PO model for survival data.
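A small Python sketch of the transformations (6) and (7), written only to check the special cases stated above: $\rho = 1$ or $r = 0$ gives $G(x) = x$ (PH), and $\rho = 0$ or $r = 1$ gives $G(x) = \log(1 + x)$ (PO). The helper names are hypothetical. The function `survival` assumes time-fixed covariates, so that the argument of $G$ is $e^{\beta'Z}\Lambda(t)$ and $S(t \mid Z) = \exp[-G\{e^{\beta'Z}\Lambda(t)\}]$.

```python
import math


def G_boxcox(x, rho):
    """Box-Cox transformation (6); rho = 0 is taken as the limit log(1 + x)."""
    if rho == 0:
        return math.log1p(x)
    return ((1.0 + x) ** rho - 1.0) / rho


def G_log(x, r):
    """Logarithmic transformation (7); r = 0 is taken as the limit x."""
    if r == 0:
        return x
    return math.log1p(r * x) / r


def survival(cum_risk, G):
    """S = exp{-G(cum_risk)}, where cum_risk = exp(beta'Z) * Lambda(t)."""
    return math.exp(-G(cum_risk))
```

For example, `G_log(x, 0.5)` is the member of (7) with $r = 0.5$, one of the models compared in the simulations.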
Remark: Specifying the function $G$ while leaving the function $\Lambda$ unspecified is equivalent to specifying the distribution of $\varepsilon$ while leaving the function $\Lambda$ in (3) unspecified. Non-identifiability can arise if both $G$ and $\Lambda$ (or both the distribution of $\varepsilon$ and $\Lambda$) are left unspecified ([3], p. 169), as quoted by [8].
2.2. The Modified Estimating Equations
Before developing the estimating equations, we impose the following two assumptions.
Assumption 1: The parameter space $\mathcal{B}$ of $\beta$ is a bounded open subset of $\mathbb{R}^p$.
Assumption 2: The censoring time $C$ is independent of the failure time $T$ given the possibly time-varying covariates $Z(\cdot)$, i.e., $T \perp C \mid Z(\cdot)$.
To estimate the unknown parameter $\beta$ and the unknown strictly increasing function $H(t) = \log \Lambda(t)$, we modify the estimating equations of [5], which have since been used by several authors, for example [9] [10] [11] and [12], to accommodate time-varying coefficients and time-varying covariates.
Suppose that $t_1 < t_2 < \cdots < t_r$ are the $r$ distinct ordered failure times among the $n$ observations. Furthermore, let $\lambda_\varepsilon$ and $\Lambda_\varepsilon$ be the known hazard and cumulative hazard functions of $\varepsilon$, respectively, and denote the true values of $\beta$ and $H$ by $\beta_0$ and $H_0$. Following the usual counting process notation, let
$$N_i(t) = \delta_i I(\tilde T_i \le t), \qquad Y_i(t) = I(\tilde T_i \ge t) \qquad (8)$$
be the counting process and the at-risk indicator process, respectively, and let $N(t) = \sum_{i=1}^n N_i(t)$ and $Y(t) = \sum_{i=1}^n Y_i(t)$ denote the corresponding sample (aggregate) processes.
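The martingale decomposition then reduces the estimation problem to the tractable representation
$$N_i(t) = \int_0^t Y_i(s)\, d\Lambda_\varepsilon\{\beta_0'Z_i(s) + H_0(s)\} + M_i(t), \qquad (9)$$
where $M_i(t)$ is a martingale with respect to the filtration $\mathcal{F}_t$ of complete σ-fields, so that
$$E\{M_i(t)\} = 0, \qquad (10)$$
because the mean of a martingale process with respect to $\mathcal{F}_t$ is zero. The estimating equations below are obtained by replacing $\beta_0$ and $H_0$ with their estimators $\hat\beta$ and $\hat H$.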
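To make the counting-process quantities concrete, the sketch below treats the simplest case with no covariates, where the sample analogue of the cumulative hazard can be taken to be the Nelson-Aalen estimator $\hat\Lambda(t) = \sum_{t_k \le t} d_k / Y(t_k)$, with $d_k$ events among the $Y(t_k)$ subjects at risk at $t_k$. This choice is our illustration, not necessarily the paper's construction. The estimated martingale residuals $\hat M_i = \delta_i - \hat\Lambda(\tilde T_i)$ then sum exactly to zero, a sample counterpart of the zero-mean property, because each jump $d_k / Y(t_k)$ is counted once for each of the $Y(t_k)$ subjects still at risk.

```python
def nelson_aalen(time, delta):
    """Return a function t -> Nelson-Aalen estimate of the cumulative hazard."""
    fail_times = sorted(set(t for t, d in zip(time, delta) if d))
    jumps = []
    cum = 0.0
    for tk in fail_times:
        d_k = sum(1 for t, d in zip(time, delta) if d and t == tk)  # events at t_k
        n_k = sum(1 for t in time if t >= tk)                       # Y(t_k), at risk
        cum += d_k / n_k
        jumps.append((tk, cum))

    def cum_hazard(t):
        value = 0.0
        for tk, c in jumps:
            if tk > t:
                break
            value = c
        return value

    return cum_hazard


def martingale_residuals(time, delta):
    """M_i = N_i(T_i) - integral of Y_i dLambda-hat = delta_i - Lambda-hat(T_i)."""
    cum_hazard = nelson_aalen(time, delta)
    return [d - cum_hazard(t) for t, d in zip(time, delta)]
```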
Lemma 1: The increment of a martingale process has mean zero:
$$E\{dM_i(t)\} = 0. \qquad (11)$$
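Thus, taking possibly time-varying covariates into account, we propose slightly modified versions of the estimating equations of [5]:
$$\sum_{i=1}^n \int_0^\infty Z_i(t)\left[dN_i(t) - Y_i(t)\, d\Lambda_\varepsilon\{\beta'Z_i(t) + H(t)\}\right] = 0, \qquad (12)$$
$$\sum_{i=1}^n \left[dN_i(t) - Y_i(t)\, d\Lambda_\varepsilon\{\beta'Z_i(t) + H(t)\}\right] = 0, \qquad t \ge 0, \qquad (13)$$
where
$$\lambda_i(t) = Y_i(t)\, \lambda_\varepsilon\{\beta'Z_i(t) + H(t)\}\, \dot H(t) \qquad (14)$$
is the intensity function for $N_i(t)$, and $H$ is a nondecreasing function satisfying $H(0) = -\infty$. This requirement in turn ensures that $\Lambda_\varepsilon\{k + H(0)\} = 0$ for any finite number $k$.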
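For the special case of the Cox proportional hazards model [2], as in [5], we have $\Lambda_\varepsilon(t) = e^t$, $\lambda_\varepsilon(t) = e^t$, and $H(t) = \log \Lambda_0(t)$. Substituting these into (12) and eliminating $H$ by means of (13), we obtain the partial likelihood score equation
$$\sum_{i=1}^n \int_0^\infty \left\{Z_i(t) - \frac{\sum_{j=1}^n Y_j(t) Z_j(t) e^{\beta'Z_j(t)}}{\sum_{j=1}^n Y_j(t) e^{\beta'Z_j(t)}}\right\} dN_i(t) = 0. \qquad (15)$$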
Computationally simpler alternative versions of (12), first given in [5] and later used in [11], may also be used.
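For a single time-fixed covariate, the score equation (15) can be solved by Newton-Raphson. The Python sketch below is a plain illustration, not the authors' implementation: the risk set at each failure time contains every subject whose observed time is at least that failure time (the Breslow handling of ties), and with time-varying covariates each $z_j$ would instead be evaluated at the failure time.

```python
import math


def cox_score_info(beta, time, delta, z):
    """Score (15) and observed information for one time-fixed covariate."""
    score, info = 0.0, 0.0
    for ti, di, zi in zip(time, delta, z):
        if not di:
            continue
        # Risk set at failure time ti: everyone whose observed time is >= ti.
        risk = [(math.exp(beta * zj), zj) for tj, zj in zip(time, z) if tj >= ti]
        s0 = sum(w for w, _ in risk)
        s1 = sum(w * zj for w, zj in risk)
        s2 = sum(w * zj * zj for w, zj in risk)
        zbar = s1 / s0
        score += zi - zbar              # Z_i minus the risk-set weighted mean
        info += s2 / s0 - zbar ** 2     # risk-set weighted variance
    return score, info


def fit_cox(time, delta, z, beta=0.0, tol=1e-10, max_iter=50):
    """Solve score(beta) = 0 by Newton-Raphson."""
    for _ in range(max_iter):
        score, info = cox_score_info(beta, time, delta, z)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta
```

A call such as `fit_cox(time, delta, z)` returns the value of $\beta$ at which the score (15) is numerically zero.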
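Finally, the survival function of $T$ given the possibly time-varying covariates $Z(\cdot)$ follows directly from model (5):
$$S(t \mid Z) = \exp\left[-G\left\{\int_0^t e^{\beta'Z(s)}\, d\Lambda(s)\right\}\right]. \qquad (16)$$
The cumulative hazard function is therefore
$$\Lambda(t \mid Z) = -\log S(t \mid Z), \qquad (17)$$
that is,
$$\Lambda(t \mid Z) = G\left\{\int_0^t e^{\beta'Z(s)}\, d\Lambda(s)\right\}. \qquad (18)$$
Thus, the true induced intensity (hazard) function for the failure time $T$ given $Z(\cdot)$ is the derivative of the cumulative intensity function (18):
$$\lambda(t \mid Z) = \dot G\left\{\int_0^t e^{\beta'Z(s)}\, d\Lambda(s)\right\} e^{\beta'Z(t)}\, \lambda(t), \qquad (19)$$
where $\dot G$ is the derivative of $G$ and $\lambda(t) = d\Lambda(t)/dt$.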
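The step from the cumulative hazard to the hazard can be checked numerically. The Python sketch below assumes model (5) with $\lambda(t) = 1$ (so $\Lambda(t) = t$), the logarithmic transformation (7) with $r = 0.5$, $\beta = 0.6$, and a step-function covariate that switches from 0 to 1 at $t_0 = 1$ as in (1); all values are arbitrary. It compares a central-difference derivative of $G\{\int_0^t e^{\beta Z(s)}\, d\Lambda(s)\}$ with the closed form $\dot G\{\int_0^t e^{\beta Z(s)}\, d\Lambda(s)\}\, e^{\beta Z(t)} \lambda(t)$ at points away from the jump at $t_0$.

```python
import math

BETA, R, T0 = 0.6, 0.5, 1.0  # hypothetical parameter values


def z(t):
    """Step-function covariate: 0 before T0, 1 afterwards."""
    return 0.0 if t < T0 else 1.0


def phi(t):
    """Integral of exp(beta * z(s)) d Lambda(s) over (0, t], with Lambda(s) = s."""
    if t < T0:
        return t
    return T0 + math.exp(BETA) * (t - T0)


def G(x):
    """Logarithmic transformation (7) with r = R."""
    return math.log1p(R * x) / R


def G_dot(x):
    """Derivative of G."""
    return 1.0 / (1.0 + R * x)


def cum_hazard(t):
    """Lambda(t | Z) = G{phi(t)}."""
    return G(phi(t))


def hazard(t):
    """lambda(t | Z) = G_dot{phi(t)} * exp(beta * z(t)) * lambda(t), lambda(t) = 1."""
    return G_dot(phi(t)) * math.exp(BETA * z(t))


def numeric_derivative(f, t, h=1e-6):
    return (f(t + h) - f(t - h)) / (2 * h)
```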
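To simplify the notation without loss of generality, we write
$$\lambda_i(t) = \dot G\{\Phi_i(t)\}\, e^{\beta'Z_i(t)}\, \lambda(t), \qquad (20)$$
where
$$\Phi_i(t) = \int_0^t e^{\beta'Z_i(s)}\, d\Lambda(s), \qquad (21)$$
and $\dot G(x) = dG(x)/dx$.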
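We now define the zero-mean martingale process, with respect to the filtration $\mathcal{F}_t$ of complete σ-fields, as
$$M_i(t) = N_i(t) - \int_0^t Y_i(s)\, \dot G\{\Phi_i(s)\}\, e^{\beta'Z_i(s)}\, d\Lambda(s). \qquad (22)$$
Then, by Lemma 1, we modify the estimating Equations (12) and (13) as
$$\sum_{i=1}^n \int_0^\infty Z_i(t)\left[dN_i(t) - Y_i(t)\, \dot G\{\Phi_i(t)\}\, e^{\beta'Z_i(t)}\, d\Lambda(t)\right] = 0, \qquad (23)$$
$$\sum_{i=1}^n \left[dN_i(t) - Y_i(t)\, \dot G\{\Phi_i(t)\}\, e^{\beta'Z_i(t)}\, d\Lambda(t)\right] = 0, \qquad t \ge 0. \qquad (24)$$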
3. Large Sample Theory and Conditions
The following regularity conditions are imposed.
C1: The covariate vectors are bounded, in the sense that $\|Z_i(t)\| \le K$ for some constant $K$, and the possibly time-varying covariate $Z_i(t)$ has uniformly bounded variation on $[0, \tau]$, with a left limit at every $t$, where $\tau$ is the maximum follow-up time.
C2: The true value of $\beta$, denoted by $\beta_0$, lies in the interior of a known compact set in $\mathbb{R}^p$, and the true value of $\Lambda$, denoted by $\Lambda_0$, is continuously differentiable with a positive derivative on the closed interval $[0, \tau]$.
C3: The transformation $G$ is at least three times continuously differentiable on $[0, \infty)$, with $G(0) = 0$, $G^{(1)}(x) > 0$, and $G(\infty) = \infty$, where $G^{(k)}$ denotes the $k$th derivative of $G$.
C4:
is a strictly increasing positive function on interval
and
is continuously differentiable.
C5: For any given finite scalar k,
is strictly positive and
is bounded and continuously differentiable on interval
, where a superscript dot always denotes a derivative.
C6: The variance-covariance matrices $A$ and $B$ are both nonsingular.
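Theorem 1: Under the regularity conditions C1-C6, which ensure that the central limit theorem for counting process martingales holds, $\hat\beta$ is a consistent estimator of the true parameter $\beta_0$, and
$$n^{1/2}(\hat\beta - \beta_0) \xrightarrow{d} N(0, A^{-1} B A^{-1}). \qquad (25)$$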
Thus, as in [5] [9] [12] and others, the asymptotic variance of the estimator $\hat\beta$ can be estimated consistently by estimating $A$ and $B$ consistently.
Theorem 2: Under conditions C1-C6, $\hat H$ is consistent under the metric $d$ defined, for any two nondecreasing functions $H_1$ and $H_2$ on $[0, \tau]$ such that $H_1(0) = H_2(0) = -\infty$, by
$$d(H_1, H_2) = \sup_{t \in [s, \tau]} |H_1(t) - H_2(t)| \qquad (26)$$
for any fixed $s$, where $0 < s < \tau$. The following theorem establishes the asymptotic distribution of the estimator $\hat H$.
4. Simulation Study
The data were simulated from the Cox model in four cases: with fixed covariates, with fixed covariates and time-varying coefficients, with time-varying covariates, and with time-varying covariates and time-dependent effects simultaneously. The data were generated with sim.survdata() from the R package “coxed”, based on the flexible-hazard method described by [13]. Survival times with three continuous covariates were generated with sample size n = 200 and a maximum duration of 50 time units. By default, sim.survdata() draws the three covariates from the standard normal distribution; for the fixed-covariate case, covariates drawn from other distributions can be supplied instead.
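For readers without R, the Python sketch below produces data of the same shape (n = 200, three standard normal covariates, maximum duration 50) from a proportional hazards model. It does not reproduce the flexible-hazard method of [13] used by sim.survdata(): it assumes a Weibull baseline cumulative hazard $\Lambda_0(t) = 0.01\, t^{1.2}$, arbitrary coefficients, and exponential censoring with rate 0.01 truncated at the maximum duration, all chosen only for illustration. Event times are drawn by inverse transform, solving $\Lambda_0(T) e^{x'\beta} = -\log U$ for a uniform $U$.

```python
import math
import random


def simulate_ph(n=200, betas=(0.5, -0.3, 0.2), max_time=50.0, shape=1.2,
                scale=0.01, cens_rate=0.01, seed=1):
    """Simulate PH survival data with an assumed Weibull baseline.

    Baseline cumulative hazard: Lambda0(t) = scale * t**shape.
    Each event time T solves Lambda0(T) * exp(x'b) = -log(U), U ~ Uniform(0, 1].
    Follow-up ends at an exponential censoring time or at max_time.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in betas]          # standard normal covariates
        eta = sum(b * xi for b, xi in zip(betas, x))      # linear predictor
        u = 1.0 - rng.random()                            # in (0, 1], avoids log(0)
        t = (-math.log(u) / (scale * math.exp(eta))) ** (1.0 / shape)
        c = min(rng.expovariate(cens_rate), max_time)     # censoring time
        data.append({"time": min(t, c), "event": int(t <= c), "x": x})
    return data
```

Adjusting `cens_rate` changes the censoring proportion of the simulated data.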
The data structure required for time-dependent covariates differs technically from that of survival data with baseline covariates only. The response for a Cox model is created with the “Surv()” function of the survival package in R, which commonly takes two arguments: the survival time and the event (censoring) indicator. With time-varying covariates, however, each subject's follow-up is divided into discrete intervals with start and end times, which allows a covariate to take different values in different intervals for the same subject. Thus, for time-dependent covariates, we set type = “tvc” in “sim.survdata()” to generate survival data with time-varying covariates. The survival durations are then generated from a proportional hazards model and passed to the “permalgorithm()” function in the “PermAlgo” package to produce the time-varying data structure [14]. With type = “tvc”, sim.survdata() does not accept user-supplied covariate data, because a time-varying covariate is expressed over time frames that themselves convey part of the variation in the durations, and the times are generated afterwards [15].
The proportional hazards assumption of the Cox model fails when the effect of a covariate varies over time. Data with time-dependent coefficients can be generated similarly with the sim.survdata() function by setting type = “tvbeta”. With this option, the first coefficient, whether user-supplied or randomly generated, is interacted with the natural logarithm of the time counter, from 1 to the maximum time T [15]. The sim.survdata() function then generates survival times from the proportional hazards model and stores the coefficients in a matrix so that they can depend on time.
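The “tvbeta” mechanism described above, in which the first coefficient is multiplied by $\log t$ for $t = 1, \ldots, T$, can be mimicked in a few lines. The Python sketch below is an assumption-laden illustration rather than coxed's code: it builds the time-indexed coefficient matrix and then draws one subject's discrete event time with per-period event probability $1 - \exp\{-h_0 e^{x'\beta(t)}\}$, where the constant baseline $h_0 = 0.03$ is arbitrary. Because $\log 1 = 0$, the first covariate has no effect at $t = 1$, and its hazard ratio $e^{\beta_1 \log t} = t^{\beta_1}$ changes with $t$, which is a violation of proportional hazards.

```python
import math
import random


def tvbeta_coefficients(betas, max_time):
    """Row t - 1 holds beta(t) for t = 1..max_time; only the first
    coefficient is multiplied by log(t)."""
    return [[betas[0] * math.log(t)] + list(betas[1:])
            for t in range(1, max_time + 1)]


def simulate_discrete(x, coef_matrix, base_hazard=0.03, rng=None):
    """Discrete-time survival for one subject; returns (time, event).

    At each t the event probability is 1 - exp(-base_hazard * exp(x'beta(t))).
    Subjects without an event are censored at the last time point.
    """
    if rng is None:
        rng = random.Random(0)
    for t, beta_t in enumerate(coef_matrix, start=1):
        eta = sum(b * xi for b, xi in zip(beta_t, x))
        if rng.random() < 1.0 - math.exp(-base_hazard * math.exp(eta)):
            return t, 1
    return len(coef_matrix), 0
```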
Data from the more flexible and general Cox model with both time-dependent coefficients and time-dependent covariates can be generated similarly with the sim.survdata() function by setting type = c(“tv”, “tvbeta”). Finally, the semiparametric transformation models were applied to the simulated data, and the different models were compared in terms of precision.
4.1. Computational Algorithm
Because both $\beta$ and $H$ are unknown, an iterative algorithm is needed to compute the estimates. In this paper, an iterative algorithm is proposed to estimate the unknown true parameter $\beta$ and the nondecreasing monotone function $H$: one of them is held fixed while the other is estimated, and the roles are then reversed. As in [5], it is not difficult to show that, for every fixed value of $\beta$, Equation (13) has a unique solution $H$. Consequently, Equations (12) and (13) suggest the following iterative algorithm for computing $(\hat\beta, \hat H)$.
Step 0: Choose an initial value of $\beta$, denoted by $\hat\beta^{(0)}$.
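Step 1: For the current value $\hat\beta^{(m)}$, obtain $\hat H(t_1)$ by solving Equation (13) at $t = t_1$ with $H(0) = -\infty$, that is, $\sum_{i=1}^n Y_i(t_1)\, \Lambda_\varepsilon\{\hat\beta^{(m)\prime} Z_i(t_1) + H(t_1)\} = 1$. Then, assuming no tied failure times, obtain $\hat H(t_k)$, for $k = 2, \ldots, r$, one by one by solving the equation
$$\sum_{i=1}^n Y_i(t_k)\, \Lambda_\varepsilon\{\hat\beta^{(m)\prime} Z_i(t_k) + H(t_k)\} = 1 + \sum_{i=1}^n Y_i(t_k)\, \Lambda_\varepsilon\{\hat\beta^{(m)\prime} Z_i(t_k) + \hat H(t_{k-1})\}.$$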
Step 2: Obtain a new estimate $\hat\beta^{(m+1)}$ of $\beta$ by solving (12) with $H = \hat H(\cdot)$ as obtained in Step 1.
Step 3: Set $\hat\beta^{(m)}$ to the estimate obtained in Step 2, and repeat Steps 1 and 2 until prescribed convergence criteria based on Equations (12) and (13) are met.
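Steps 0-3 can be sketched in Python for a single time-fixed covariate. The sketch assumes no tied failure times, uses bisection (our choice) to solve each one-dimensional equation, and supplies $\Lambda_\varepsilon$ for the PH case ($e^s$) and the PO case ($\log(1 + e^s)$); it illustrates the alternating scheme of [5] and is not the authors' code. Step 1 uses (13) at each $t_k$. Step 2 uses (12) with $H$ held fixed, which for a step function $\hat H$ reduces to $\sum_i Z_i[\delta_i - \Lambda_\varepsilon\{\beta Z_i + \hat H(\tilde T_i)\}] = 0$, because the integral telescopes and $\Lambda_\varepsilon(-\infty) = 0$.

```python
import bisect
import math


def lam_eps_ph(s):
    """Cumulative hazard of the extreme-value error, e^s (PH model)."""
    return math.exp(s)


def lam_eps_po(s):
    """Cumulative hazard of the standard logistic error, log(1 + e^s) (PO model)."""
    # Numerically stable softplus; gives 0.0 at s = -inf.
    return s + math.log1p(math.exp(-s)) if s > 0 else math.log1p(math.exp(s))


def solve_increasing(f, lo, hi, tol=1e-10):
    """Root of an increasing function f: widen the bracket, then bisect."""
    while f(lo) > 0:
        lo -= 10.0
    while f(hi) < 0:
        hi += 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def estimate_H(beta, time, delta, z, lam_eps):
    """Step 1: H(t_1), ..., H(t_r) one by one from (13), starting at H(0) = -inf."""
    fail_times = sorted(t for t, d in zip(time, delta) if d)
    H, h_prev = [], -math.inf
    for tk in fail_times:
        risk = [zi for ti, zi in zip(time, z) if ti >= tk]  # subjects with Y_i(t_k) = 1
        target = 1.0 + sum(lam_eps(beta * zi + h_prev) for zi in risk)

        def f(h):
            return sum(lam_eps(beta * zi + h) for zi in risk) - target

        start = 0.0 if h_prev == -math.inf else h_prev
        h_prev = solve_increasing(f, start - 10.0, start + 10.0)
        H.append(h_prev)
    return fail_times, H


def beta_equation(beta, time, delta, z, fail_times, H, lam_eps):
    """Left side of (12) for a step-function H held fixed."""
    total = 0.0
    for ti, di, zi in zip(time, delta, z):
        k = bisect.bisect_right(fail_times, ti) - 1  # last failure time <= T_i
        h = H[k] if k >= 0 else -math.inf
        total += zi * (di - lam_eps(beta * zi + h))
    return total


def fit(time, delta, z, lam_eps, beta=0.0, tol=1e-8, max_iter=200):
    """Steps 0-3: alternate Step 1 (H given beta) and Step 2 (beta given H)."""
    for _ in range(max_iter):
        fail_times, H = estimate_H(beta, time, delta, z, lam_eps)
        # (12) is decreasing in beta, so its negative is increasing.
        new_beta = solve_increasing(
            lambda b: -beta_equation(b, time, delta, z, fail_times, H, lam_eps),
            beta - 5.0, beta + 5.0)
        converged = abs(new_beta - beta) < tol
        beta = new_beta
        if converged:
            break
    return beta, fail_times, H
```

With `lam_eps_ph`, Step 1 reproduces the Breslow estimator, so the fixed point coincides with the Cox partial likelihood estimate from (15); passing `lam_eps_po` fits the PO model instead.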
4.2. Numerical Results
This subsection presents the numerical results of the simulation studies, in figures and tables, to evaluate the performance of the models.
Figure 1 illustrates the baseline characteristics of the simulated survival data. The top row of each panel shows the probability density function, cumulative distribution function, hazard function, and cumulative hazard function of the failure time, and the bottom row shows histograms of the simulated durations, the linear predictor, and the exponentiated linear predictor, respectively. The left panel of Figure 1 corresponds to survival data with a 25% censoring rate, and the right panel to survival data with a 45% censoring rate.
Figure 1. Plots of baseline features of simulated survival data. (a) Plot with 25% censoring rate; (b) Plot with 45% censoring rate.
Table 1 reports the simulation results for the four cases under special cases of the semiparametric transformation models. The results show that model performance deteriorates as the censoring rate increases. The standard errors in brackets indicate the precision of the estimators; smaller standard errors indicate higher precision. In these simulations, including time-varying coefficients did not improve model performance, whereas including time-varying covariates did.
5. Conclusions
This study compares semiparametric transformation models with and without time effects on the covariates and coefficients. A brief review of related work was given, and simulation results were included to support the comparison. The data were generated in four different cases with the “sim.survdata()” function of the R package “coxed”, and the results of the semiparametric transformation models for the four simulation cases were compared in terms of precision. Three special cases of the semiparametric transformation models were considered: the PH model, the PO model, and the model with r = 0.5.
The results show that the semiparametric transformation models with time-dependent covariates performed relatively better, with smaller standard errors. However, including time-varying coefficients did not improve the performance of the semiparametric transformation models in our simulation studies. The last two cases, the semiparametric transformation models with time-varying covariates and with both time-varying covariates and time-varying coefficients, showed better performance. We therefore conclude, in general, that when the proportionality assumption fails, incorporating time-varying coefficient effects into the model is advisable. Considering only baseline covariates may not always be adequate, because covariates can change during follow-up; incorporating time-varying covariates in the model may therefore help to obtain reasonable results. When both the covariates and their effects change over time, incorporating both time-varying covariates and time-varying coefficients should give more reasonable results.
Table 1. Estimates of regression coefficients with their standard errors in brackets for semiparametric transformation models, n = 200. TVC and TVbeta refer to time-varying covariates and time-varying coefficients, respectively.