Marginal Distribution Plots for Proportional Hazards Models with Time-Dependent Covariates or Time-Varying Regression Coefficients

Given a sample of regression data from (Y, Z), a new diagnostic plotting method is proposed for checking the hypothesis H0: the data are from a given Cox model with the time-dependent covariates Z. It compares two estimates of the marginal distribution FY of Y. One is an estimate of the modified expression of FY under H0, based on a consistent estimate of the parameter under H0, and based on the baseline distribution of the data. The other is the Kaplan-Meier-estimator of FY, together with its confidence band. The new plot, called the marginal distribution plot, can be viewed as a test for testing H0. The main advantage of the test over the existing residual tests is in the case that the data do not satisfy any Cox model or the Cox model is mis-specified. Then the new test is still valid, but not the residual tests and the residual tests often make type II error with a very large probability.

1. Introduction

In this paper, we propose a new diagnostic plotting method for the proportional hazards (PH) model [1] with continuous survival time $Y$ , which may be right censored, and with possible time-dependent covariates $Z$ or time-varying re- gression coefficients $\beta$ . The new method has some advantages over the existing methods in the literature, especially when the data do not satisfy any PH model or when the PH model is mis-specified. It provides an alternative diagnostic plot for model checking.

Denote the conditional hazard function and survival function of $Y$ for given $Z$ by ${h}_{Y|Z}\left(\cdot |\cdot \right)$ and ${S}_{Y|Z}\left(\cdot |\cdot \right)$ , respectively, or simply $h\left(\cdot |\cdot \right)$ and $S\left(\cdot |\cdot \right)$ . Let $\left({Y}_{1},{Z}_{1}\right),\cdots ,\left({Y}_{n},{Z}_{n}\right)$ , be i.i.d. copies of $\left(Y,Z\right)$ with the distribution function ${F}_{Y,Z}$ . The PH model was first defined by $h\left(t|z\right)=\mathrm{exp}\left({\beta }^{\prime }z\right){h}_{o}\left(t\right)$ , where ${h}_{o}$ is the baseline hazard function, $z$ is a $p×1$ covariate vector, $\beta$ is a $p×1$ para- meter vector, ${h}_{o}$ and $\beta$ are unknown, and $p$ does not depend on $n$ . The model is referred as the time-independent covariate PH (TIPH) model. This model has been extended in two ways: 1) the covariate is time-dependent, i.e., $z=z\left(t\right)$ is a function of time $t$ ; 2) the regression coefficient is time-varying, i.e., $\beta =\beta \left(t\right)$ is a function of time $t$ . For the time-dependent covariates PH (TDPH) model, Kalbfleisch and Prentice [2] distinguish the external ones from the internal ones.

It is often that the external time-dependent covariate ${Z}_{i}$ can be written as ${Z}_{i}={Z}_{i}\left(t\right)=\mathcal{B}\left({U}_{i},{g}_{i}\left(t\right)\right)$ , where $\mathcal{B}\left(\cdot ,\cdot \right)$ is a function, ${U}_{1},\cdots ,{U}_{n}$ are i.i.d. co- pies from the time-independent random vector $U$ and ${g}_{i}\left(\cdot \right)$ is a function of time t. A simple example of $\mathcal{B}\left(U,g\left(t\right)\right)$ is $w\left(U\right)g\left(t\right)$ , where $w\left(\cdot \right)$ is a function not depending on time $t$ . Without loss of generality (WLOG), we can assume $w\left(U\right)=U$ . Two simple examples of ${g}_{i}\left(t\right)$ are (a) ${g}_{i}\left(t\right)=1\left(t\ge {a}_{i}\right)$ and (b) ${g}_{i}\left(t\right)=1\left(t\ge {a}_{i}\right)\left(t-{a}_{i}\right)$ , where ${a}_{i}$ is a constant but may depend on subject $i$ [1] . The PH model in case (b) is referred as the linearly-dependent PH model (LDPH model) hereafter. We also consider other forms of ${g}_{i}\left(\cdot \right)$ for the external case. For the time-varying regression coefficients PH model, in addition to the assumptions made on the TDPH model, one further assumes $h\left(t|z\left(t\right)\right)={e}^{\beta {\left(t\right)}^{\prime }z\left(t\right)}{h}_{o}\left(t\right)$ . A special case of this model is the piecewise PH (PWPH) model with $k$ cut-points ${a}_{1},\cdots ,{a}_{k}$ [3] .

An important step in the data analysis under the PH model is to check whether the model is indeed appropriate for the data. To this end, it is desirable to have some diagnostic plotting methods for the PH model. In the literature, some diagnostic plotting methods under the semi-parametric set-up are designed to inspect whether the data follow the TIPH model $h\left(t|z\right)={h}_{o}\left(t\right)\mathrm{exp}\left({\beta }^{\prime }z\right)$ , (i.e., $g\left(t\right)$ is constant). One well-known diagnostic method for the PH model is the log-minus-log plots (log-log plots).

Several other graphical methods using residuals to check the PH model assumption have been proposed in the literature [4] . By plotting the estimator of the cumulative hazard functions evaluated at each Cox-Snell residuals against the residuals, one expects a straight line pattern if the data set satisfies the PH model. However, as mentioned by Baltazar-Aban and Pena [5] this plotting method has serious defect. This method may fail to identify the poor fitting in many circumstances. Martingale residuals and its cumulative sums are in- troduced to examine the functional form of the covariate [4] [6] [7] . By plotting the Martingale residuals without a covariate, say ${Z}_{1}$ , against this missing covariate, one can observe the functional form of ${Z}_{1}$ . And by plotting the score process versus follow-up time, one can examine the violation of the proportional hazards assumption. Scheike and Martinussen [8] later extend this method to the time-varying model. Deviance residuals [9] are then introduced. These methods provide more symmetric plots and are useful in identifying the outliers. Moreover, one may use Schoenfeld residuals or the cumulative sums form [10] . The residual plotting methods also induce several residual tests for the PH models.

We provide a new diagnostic plotting method for the PH model. The main idea is to plot the Kaplan-Meier estimator (KME) of ${S}_{Y}$ against a proper estimator of the marginal distribution of $Y$ under the selected model. Thus it is called the marginal distribution (MD) plot. The MD plot can be described as a 5-step procedure: 1) Fit the Cox model you have in mind to obtain the regression coefficients. 2) Choose a reference value for the covariate $Z$ , say ${z}_{o}$ , such that there exist many observations in its neighborhood, say $\mathcal{N}\left({z}_{o}\right)$ . 3) Estimate the survival function of $Y$ for $Z={z}_{o}$ using $\mathcal{N}\left({z}_{o}\right)$ . 4) Use the estimator in 3) to estimate the marginal survival function of $Y$ . 5) Compare the estimator of the marginal survival function in 4) with the KME of ${S}_{Y}$ .

The paper is organized as follows. In Section 2, we propose the MD plot and other supplementary diagnostic plots. In Section 3, we present simulation results on the performance of the plot. We also compare the MD plot to the current residual plots. In Section 4, we apply the new diagnostic plot to the long-term breast cancer follow-up data analyzed in Wong et al. [11] . Section 5 is a concluding remark, where we explain that the MD plot can also be served as a naive hypothesis test for the PH models.

2. The Main Results

The assumption and notations are given in $§$ 2.1. The idea of the marginal approach is introduced in $§$ 2.2. The method is explained in $§$ 2.3 and $§$ 2.4.

2.1. Assumptions and Notations

Let ${\Theta }_{ph}$ be the collection of all PH models specified by

${h}_{Y|Z}\left(t|z\left(t\right)\right)={e}^{\beta {\left(t\right)}^{\prime }z\left(t\right)}{h}_{o}\left(t\right)$ (1)

where $\beta \left(t\right)$ and $z\left(t\right)$ are now possible vectors of functions of $t$ [12] . For model checking under the semi-parametric set-up, the null hypothesis is

${H}_{0}:\text{thedataarefromModel}\left(1\right)\text{with}Z\left(t\right)=\mathcal{B}\left(U,g\left(t\right)\right)\text{andgiven}\mathcal{B}\left(\cdot ,\cdot \right)\text{and}g\left(\cdot \right),$ (2)

where $U$ is a $p$ -dimensional random covariate vector and the baseline hazard function ${h}_{o}\left(\cdot \right)$ is unknown. Let $\Theta$ be the collection of all possible joint dis- tribution functions ${F}_{Y,U}$ of $\left(Y,U\right)$ . Notice that ${F}_{Y,U}$ does not need to belong to ${\Theta }_{ph}$ . Abusing notation, by $z\left(t\right)=ug\left(t\right)$ , we mean that $ug\left(t\right)=\left({u}_{1}{g}_{1}\left(t\right),\cdots ,{u}_{p}{g}_{p}\left(t\right)\right)=D\left({g}_{1}\left(t\right),\cdots ,{g}_{p}\left(t\right)\right){\left({u}_{1},\cdots ,{u}_{p}\right)}^{\prime }$ , where $D\left({g}_{1}\left(t\right),\cdots ,{g}_{p}\left(t\right)\right)$ is a $p×p$ diagonal matrix with diagonal elements ${g}_{i}\left(t\right)$ ’s.

Our method involves the mode, say a vector $c\in {ℝ}^{p}$ , of the distribution of the random vector U. That is, $\forall ϵ>0$ and $\forall \eta \in {ℝ}^{p}$ , $pr\left(‖U-c‖<ϵ\right)\ge pr\left(‖U-\eta ‖<ϵ\right)$ , where $‖\cdot ‖$ is a norm, e.g., $‖U‖={\mathrm{max}}_{i}|{U}_{i}|$ , where $U=\left({U}_{1},\cdots ,{U}_{p}\right)$ .

Proposition 1. If $\left(Y,U\right)$ satisfies Model (1), then for each $c\in {ℝ}^{p}$ , ${h}_{Y|W}\left(t|w\left(t\right)\right)={h}_{1}\left(t\right)\mathrm{exp}\left\{\beta {\left(t\right)}^{\prime }w\left(t\right)\right\}$ , where ${h}_{1}\left(t\right)={h}_{o}\left(t\right)\mathrm{exp}\left\{\beta {\left(t\right)}^{\prime }c\right\}$ and $W\left(t\right)=Z\left(t\right)-c$ .

In view of Proposition 1, hereafter, WLOG, we can assume that

AS1. The zero vector 0 is a mode of $U$ and it satisfies that $‖\mathcal{B}\left(0,g\left(t\right)\right)‖=0$ .

Otherwise, let ${U}_{o}$ be a mode of $U$ and define $W\left(t\right)=Z\left(t\right)-c$ , where $c=Z\left({U}_{o}\right)$ . Then by Proposition 1, Model (1) is equivalent to another PH model, where ${h}_{1}$ takes place of the role of the baseline hazard function ${h}_{o}$ and ${h}_{1}\left(t\right)={h}_{Y|W}\left(t|0\right)$ is also unknown.

Since ${S}_{Y|U}$ may not satisfy the PH model in the null hypothesis ${H}_{0}$ (see (2)), one can define a new conditional survival function of a new response variable, say ${Y}^{\ast }$ , for given Z such that ${S}_{{Y}^{\ast }|Z}$ satisfies the PH model in the null hypothesis ${H}_{0}$ . Correspondingly, one can define the new marginal survival function of ${Y}^{\ast }$ , say ${S}_{{Y}^{\ast }}$ . That is,

${S}_{{Y}^{\ast }}\left(t\right)=E\left({S}_{{Y}^{\ast }|U}\left(t|U\right)\right),\text{where}{S}_{{Y}^{\ast }|U}\left(t|u\right)=\mathrm{exp}\left\{-{\int }_{0}^{t}{e}^{\beta {\left(x\right)}^{\prime }\mathcal{B}\left(u,g\left(x\right)\right)}{h}_{o}\left(x\right)\text{d}x\right\}.$ (3)

Notice that ${h}_{o}$ and ${S}_{o}$ are equivalent if ${h}_{o}$ exists. Abusing notation, write ${S}_{{Y}^{\ast }|U}\left(t|u\right)=S\left(t,u,{S}_{o},\beta \right)$ . Notice that ${S}_{{Y}^{\ast }}$ is a function of the unknown para- meter $\beta$ . Given the distribution function ${F}_{Y,U}$ , if $\beta$ is not a function of $t$ , then $\beta$ is the almost sure limit of the maximum partial likelihood estimator (MPLE) of $\beta$ under ${H}_{0}$ (see Example 1 below), otherwise, it is conceivable that $\beta \left(t\right)$ is some limiting point of the estimator of $\beta \left(t\right)$ [12] .

Example 1. Assume that ${F}_{Y,Z}$ is a uniform distribution in the region ${A}_{1}\cup {A}_{2}$ , where ${A}_{1}$ is the set bounded by the four straight lines $y=0$ , $y=1$ , $x-y=0$ and $x-y=-1$ , and ${A}_{2}$ is the set bounded by $y=0$ , $y=1$ , $x=3$ and $x=4$ . Then the family of distributions $\left\{{S}_{Y|Z}\left(\cdot |z\right):z\in \left(-1,1\right)\cup \left(3,4\right)\right\}$ does not satisfy the PH model, and ${S}_{Y}\left(t\right)={S}_{o}\left(t\right)=1-t$ for $t\in \left[0,1\right]$ . If one fits the TIPH model ${H}_{0}:{h}_{Y|Z}\left(t|z\right)={h}_{o}\left(t\right)\mathrm{exp}\left(\beta z\right)$ , with data from ${F}_{Y,Z}$ without knowing ${F}_{Y,Z}$ , then ${S}_{{Y}^{\ast }}\left(t\right)=E\left({\left(1-t\right)}^{{e}^{\beta Z}}\right)$ by (3), where $\beta \approx -0.045$ , which is the limit of the MPLE based on the random sample from ${F}_{Y,Z}$ under ${H}_{0}$ . ${S}_{{Y}^{\ast }}$ and ${F}_{{Y}^{\ast },Z}$ are uniquely determined by ${F}_{Y,Z}$ and ${H}_{0}$ .

Lemma 1. If ${S}_{Y|U}$ does not follow the model defined in ${H}_{0}$ (see (2)), then (a) ${S}_{{Y}^{\ast }|U}\overline{)=}{S}_{Y|U}$ , and (b) ${S}_{o}\left(t\right)={S}_{{Y}^{\ast }|U}\left(t|0\right)={S}_{Y|U}\left(t|0\right)$ . Otherwise, (a) ${S}_{{Y}^{\ast }|U}={S}_{Y|U}$ , (b) ${S}_{o}\left(t\right)={S}_{{Y}^{\ast }|U}\left(t|0\right)={S}_{Y|U}\left(t|0\right)$ and (c) ${S}_{{Y}^{\ast }}={S}_{Y}$ .

The proof of Lemma 1 is trivial and is skipped.

2.2. The Marginal Distribution Plot

Motivated by Lemma 1, the new plotting method we propose here is to plot an estimator of ${S}_{{Y}^{\ast }}$ , say ${\stackrel{^}{S}}_{{Y}^{\ast }}$ , against ${\stackrel{^}{S}}_{Y}$ (the KME of ${S}_{Y}$ ) together with the $95%$ confidence band of ${\stackrel{^}{S}}_{Y}$ , and

$\begin{array}{l}\text{tocheckwhetherthegraphsof}y={\stackrel{^}{S}}_{Y}\left(x\right)\text{and}y={\stackrel{^}{S}}_{{Y}^{\ast }}\left(x\right)\text{areclose}\\ \left(\text{e}\text{.g}\text{.},\text{whether}y={\stackrel{^}{S}}_{{Y}^{\ast }}\left(x\right)\text{iswithinoroutsidethe95%confidencebandof}{\stackrel{^}{S}}_{Y}\right),\end{array}$

where ${\stackrel{^}{S}}_{{Y}^{\ast }}\left(t\right)={\sum }_{i=1}^{n}{\stackrel{^}{S}}_{{Y}^{\ast }|U}\left(t|{u}_{i}\right)/n$ , ${\stackrel{^}{S}}_{{Y}^{\ast }|U}\left(t|u\right)=S\left(t,u,{\stackrel{^}{S}}_{o},\stackrel{^}{\beta }\right)$ (see (3)), $\stackrel{^}{\beta }$ is a

consistent estimator of $\beta$ under ${H}_{0}$ , and ${\stackrel{^}{S}}_{o}$ is a consistent estimator of ${S}_{o}={S}_{Y|U}\left(\cdot |0\right)$ under the assumption that ${F}_{Y,U}\in \Theta$ , even if the data do not satisfy the pre-assumed PH model in ${H}_{0}$ (see Remark 1). In the latter case, it is conceivable that ${\stackrel{^}{S}}_{{Y}^{\ast }}$ gives a close image of ${S}_{{Y}^{\ast }}$ If the graphs of ${\stackrel{^}{S}}_{Y}$ and ${\stackrel{^}{S}}_{{Y}^{\ast }}$ are close, then it suggests that ${H}_{0}$ in (2) is true. We thus call this plot the marginal distribution plot.

Since the main issue of the MD plot is the estimator ${\stackrel{^}{S}}_{{Y}^{\ast }}$ and it is not trivial to find ${\stackrel{^}{S}}_{{Y}^{\ast }}$ due to ${h}_{o}\left(\cdot \right)$ in (3), the main focus of the paper is to introduce how to construct ${\stackrel{^}{S}}_{{Y}^{\ast }}$ . We shall explain in details how to obtain ${\stackrel{^}{S}}_{{Y}^{\ast }}$ through various $g\left(t\right)$ . We also give ${\stackrel{^}{S}}_{{Y}^{\ast }}$ for the general piecewise continuous $g\left(t\right)$ or $\mathcal{B}\left(\cdot ,g\left(t\right)\right)$ .

For simplicity, we shall first explain our method when $U$ or $Z$ is a univariate covariate and $z\left(t\right)=\mathcal{B}\left(u,g\left(t\right)\right)=ug\left(t\right)$ . We introduce the generalization to the case of a covariate vector (or matrix) in Remark 3 and to the time-dependent model for general $\mathcal{B}\left(u,g\left(t\right)\right)$ in $\text{§}$ 2.3.4.

Suppose that $\left({Y}_{1},{U}_{1},{C}_{1}\right),\cdots ,\left({Y}_{n},{U}_{n},{C}_{n}\right)$ are i.i.d. copies from a random vector $\left(Y,U,C\right)$ , where $Y$ may be subject to right censoring by censoring variable $C$ . The success of the MD plot relies on the proper estimators of ${S}_{o}$ , ${S}_{{Y}^{\ast }|U}\left(t|u\right)$ and ${S}_{{Y}^{\ast }}$ . Denote them by ${\stackrel{^}{S}}_{o}$ , ${\stackrel{^}{S}}_{{Y}^{\ast }|U}\left(t|u\right)$ and ${\stackrel{^}{S}}_{{Y}^{\ast }}$ , respectively. Since 0 is a mode of the covariate $U$ by assumption AS1, a consistent estimator of ${S}_{o}$ is the KME, denoted by ${\stackrel{^}{S}}_{o}$ , based on the data satisfying $|{U}_{i}|<{ϵ}_{n}$ , where ${ϵ}_{n}=r{n}^{-1/{k}_{o}}$ with ${k}_{o}>1$ (e.g., ${ϵ}_{n}=r{n}^{-1/\left(3p\right)}$ ), and $r$ is a given positive number (e.g., the inter-quartile-range or the standard deviation of ${U}_{i}$ ’s). WLOG, let the first ${n}_{o}$ observations be all the observations satisfying $|{U}_{i}|<{ϵ}_{n}$ . If ${Y}_{i}$ ’s are right censored, then the KME ${\stackrel{^}{S}}_{o}$ is based on $\left({U}_{i},{Y}_{i}\wedge {C}_{i},{\delta }_{i}\right)$ , $i=1,\cdots ,{n}_{o}$ , where ${\delta }_{i}=1\left({Y}_{i}\le {C}_{i}\right)$ . For ease of explanation, we only consider the case of complete data in this section. The extension to the right censored data is straightforward and we present simulation results with right-censored data in Section 3. For the complete data, the KME of ${S}_{o}$ based on the first ${n}_{o}$ ob- servations is

${\stackrel{^}{S}}_{o}\left(t\right)=\frac{1}{{n}_{o}}\underset{i=1}{\overset{{n}_{o}}{\sum }}1\left({Y}_{i}>t\right)=\frac{{\sum }_{i=1}^{n}1\left({Y}_{i}>t\right)1\left(|{U}_{i}| (4)

${\stackrel{^}{S}}_{{Y}^{\ast }|U}\left(t|u\right)$ will be introduced in details in $\text{§}$ 2.3 and

${\stackrel{^}{S}}_{{Y}^{\ast }}\left(t\right)={\sum }_{i=1}^{n}{\stackrel{^}{S}}_{{Y}^{\ast }|U}\left(t|{u}_{i}\right)/n$ .

Remark 1. Since ${\stackrel{^}{S}}_{o}\left(t\right)$ is a kernel estimator, it is well known that under certain regularity conditions, ${\stackrel{^}{S}}_{o}\left(t\right)$ converges to ${S}_{o}\left(t\right)$ in probability and

$\frac{1}{n}{\sum }_{i=1}^{n}S\left(t,{U}_{i},{\stackrel{^}{S}}_{o},\beta \right)$ converges to ${S}_{{Y}^{\ast }}\left(t\right)$ in probability for each given $\beta$ ,

and $\forall {F}_{Y,U}\in \Theta$ . If the given PH model holds, then ${S}_{Y}\left(t\right)={S}_{{Y}^{\ast }}\left(t\right)$ and we expect that the graphs of ${\stackrel{^}{S}}_{Y}\left(t\right)$ and ${\stackrel{^}{S}}_{{Y}^{\ast }}\left(t\right)$ are close, as ${\stackrel{^}{S}}_{Y}$ and ${\stackrel{^}{S}}_{{Y}^{\ast }}$ are consistent estimators of ${S}_{Y}$ and ${S}_{{Y}^{\ast }}$ , respectively. Otherwise it is likely that ${S}_{Y}\left(t\right)\ne {S}_{{Y}^{\ast }}\left(t\right)$ and two curves $y={\stackrel{^}{S}}_{Y}\left(x\right)$ and $y={\stackrel{^}{S}}_{{Y}^{\ast }}\left(x\right)$ are apart.

Remark 2. One may wonder whether ${\stackrel{^}{S}}_{o}$ in (4) can be replaced by the existing estimators of the baseline survival function under the PH model, denoted by ${\stackrel{˜}{S}}_{o}$ . For instance, under the TIPH model, several consistent estimators of ${S}_{o}$ , say ${\stackrel{˜}{S}}_{o}$ , can be obtained from the standard statistical packages. For example, the Breslow estimator of baseline survival function can be obtained by applying $\text{survfit}\text{.coxph}\left(\right)$ in R. However, if the given TIPH model does not satisfy the data, ${\stackrel{˜}{S}}_{o}$ is inconsistent, whereas ${\stackrel{^}{S}}_{o}$ given in (4) is still consistent. In fact, given a joint distribution of a random vector $\left(Y,U\right)$ , if it does not satisfy the TIPH model, there exists at least one pair of survival function ${S}_{1}$ and

vector $b$ satisfying $E\left[{\left\{{S}_{1}\left(t\right)\right\}}^{\mathrm{exp}\left(bU\right)}\right]={S}_{Y}\left(t\right)$ , e.g., $\left({S}_{1},b\right)=\left({S}_{Y},0\right)$ . It means

that the estimator ${\stackrel{˜}{S}}_{Y}$ using ${\stackrel{˜}{S}}_{0}$ will always suggest that the given TIPH model fits the regression data even though the data set may not satisfy the TIPH model, and hence ${\stackrel{˜}{S}}_{0}$ is not a proper choice. Simulation study in $\text{§}$ 3.1 suggests that under the assumption in Example 1, ${\stackrel{˜}{S}}_{Y}$ converges to ${S}_{Y}\left(t\right)$ in probability and using ${\stackrel{˜}{S}}_{Y}$ fails to identify the wrong model assumption.

2.3. Estimation of $S\left(t,z,{S}_{o},\beta \right)$

We shall first illustrate the main idea through three typical cases when $\beta$ is a constant: 1) $g\left(t\right)=1$ i.e., the TIPH model, 2) $g\left(t\right)=\left(1\left(t i.e., the PWPH model, 3) $g\left(t\right)=\left(t-a\right)1\left(t\ge a\right)$ i.e., the LDPH model. We also discuss the general case of $g\left(t\right)$ .

1) Case of $g\left(t\right)=1$ , i.e., the TIPH model. Since $S\left(t,u,{S}_{o},\beta \right)={\left\{{S}_{o}\left(t\right)\right\}}^{\mathrm{exp}\left(\beta u\right)}$ by (3), define

#Math_282# (5)

where $\stackrel{^}{\beta }$ is a consistent estimator of $\beta$ under the selected PH model, e.g, the MPLE.

Remark 3. Even though $U$ in (4) and (5) is a random variable, it is easy to extend to the case that $U$ is a vector. Assume $U=\left({U}_{1},\cdots ,{U}_{p}\right)$ is a $p$ - dimensional random vector, $n$ is much larger than $p$ and $U$ is bounded. We can define $‖U‖={\mathrm{max}}_{1\le i\le p}|{U}_{i}|$ or $\sqrt{{\sum }_{i=1}^{p}{U}_{i}^{2}}$ . In the simulation study, we know the mode of $U$ . In applications, the sample mode is not well defined. We choose a “sample mode” of ${U}_{i}$ ’s, say $c\in {ℝ}^{p}$ as follows: 1) Select a proper radius r and choose points $q$ such that in a neighborhood of $q$ with rasius r, ${\mathcal{N}}_{r}\left(q\right)$ , there are more than 20 (or ${n}^{1-\frac{1}{3p}}$ ) observations. Of course, we do not want $r$ to be too large. 2) Among these points $q$ , choose the one, say $c$ , with the largest number of observations in ${\mathcal{N}}_{r}\left(c\right)$ among all ${\mathcal{N}}_{r}\left(q\right)$ . 3) Then set ${U}_{i}^{\ast }={U}_{i}-c$ , $i=1,\cdots ,n$ , (in view of Proposition 1).

2) Case of $g\left(t\right)=\left(1\left(t , i.e., the PWPH model with one cut point. Now $\beta ={\left({\beta }_{1},{\beta }_{2}\right)}^{\prime }$ and

$S\left(t,u,v,{S}_{o},\beta \right)={\left({S}_{o}\left(a\right)\right)}^{\left\{\mathrm{exp}\left({\beta }_{1}u\right)-\mathrm{exp}\left({\beta }_{2}v\right)\right\}1\left(t\ge a\right)}{\left({S}_{o}\left(t\right)\right)}^{\mathrm{exp}\left\{{\beta }_{1}u1\left(t

The covariate $z\left(t\right)$ in (2) is $z\left(t\right)=D\left(1\left(t , where $D$ is a 2 ´ 2 diagonal matrix. Then ${\stackrel{^}{S}}_{Y|U,V}\left(t|u,v\right)=S\left(t,u,v,{\stackrel{^}{S}}_{o},\stackrel{^}{\beta }\right)$ and

#Math_315# (6)

where ${a}_{i}$ may depend on the $i$ -th observation. And $g\left(t\right)={\left(1\left(t corresponds to a special case of the PWPH model with one cut-point. For the PWPH model with more than one cut-point, the derivation of ${\stackrel{^}{S}}_{{Y}^{\ast }}$ is similar. Typically for the PWPH model with two cut-points,

$h\left(t|u,r,v\right)=\mathrm{exp}\left\{{\beta }^{\prime }z\left(t\right)\right\}{h}_{o}\left(t\right)=\mathrm{exp}\left\{{\beta }_{1}u1\left(t ,

where $\beta ={\left({\beta }_{1},{\beta }_{2},{\beta }_{3}\right)}^{\prime }$ , the covariate $z\left(t\right)=D\left(1\left(t , and $D$ is a 3 ´ 3 diagonal matrix. Then (3) yields

$S\left(t,u,r,v,{S}_{o},\beta \right)=\left(\begin{array}{ll}{\left({S}_{o}\left(t\right)\right)}^{\mathrm{exp}\left({\beta }_{1}u\right)}\hfill & \text{if}t\in \left(-\infty ,a\right)\hfill \\ {\left({S}_{o}\left(a\right)\right)}^{\mathrm{exp}\left({\beta }_{1}u\right)}{\left(\frac{{S}_{o}\left(t\right)}{{S}_{o}\left(a\right)}\right)}^{\mathrm{exp}\left({\beta }_{2}r\right)}\hfill & \text{if}t\in \left[a,b\right)\hfill \\ {\left({S}_{o}\left(a\right)\right)}^{\mathrm{exp}\left({\beta }_{1}u\right)}{\left(\frac{{S}_{o}\left(b\right)}{{S}_{o}\left(a\right)}\right)}^{\mathrm{exp}\left({\beta }_{2}r\right)}{\left(\frac{{S}_{o}\left(t\right)}{{S}_{o}\left(b\right)}\right)}^{\mathrm{exp}\left({\beta }_{3}v\right)}\hfill & \text{if}t\in \left[b,\infty \right),\hfill \end{array}$

${\stackrel{^}{S}}_{{Y}^{\ast }}\left(t\right)=\underset{i=1}{\overset{n}{\sum }}S\left(t,{U}_{i},{R}_{i},{V}_{i},{\stackrel{^}{S}}_{o},\stackrel{^}{\beta }\right)/n,$

where $\stackrel{^}{\beta }$ is an estimator of $\beta$ .

3) Case of $g\left(t\right)=1\left(t\ge a\right)\left(t-a\right)$ , i.e., the LDPH model. In this situation, $S\left(t,u,{S}_{o},\beta \right)$ is not a simple form in terms of ${S}_{o}\left(t\right)$ . Let ${\stackrel{^}{S}}_{o}$ be defined as in (4) and let $a={b}_{0}<{b}_{1}<\cdots <{b}_{k}$ be the discontinuous points of ${\stackrel{^}{S}}_{o}\left(t\right)$ for $t>a$ , we propose to estimate $S\left(t,u,{S}_{o},\beta \right)$ and ${S}_{{Y}^{\ast }}\left(t\right)$ by

${\stackrel{^}{S}}_{{Y}^{\ast }|U}\left(t|u\right)=\left(\begin{array}{ll}{\stackrel{^}{S}}_{o}\left(t\right)\hfill & \text{if}t<{b}_{1}\hfill \\ {\stackrel{^}{S}}_{o}\left(a\right){{\prod }_{i=1}^{j}\left(\frac{{\stackrel{^}{S}}_{o}\left({b}_{i}\right)}{{\stackrel{^}{S}}_{o}\left({b}_{i-1}\right)}\right)}^{{e}^{\left({b}_{i}-a\right)u\stackrel{^}{\beta }}}\hfill & \text{if}t\in \left[{b}_{j},{b}_{j+1}\right)\hfill \end{array}$ (7)

and ${\stackrel{^}{S}}_{{Y}^{\ast }}\left(t\right)={\sum }_{i=1}^{n}{\stackrel{^}{S}}_{{Y}^{\ast }|U}\left(t|{U}_{i}\right)/n$ . The reason is as follows. Notice that ${\stackrel{^}{S}}_{o}$

defined in (4) is a step function. It has the same consistency property as

$\begin{array}{l}{\stackrel{⌣}{S}}_{o}\left(t\right)=\left(\begin{array}{ll}{\stackrel{^}{S}}_{o}\left(t\right)\hfill & \text{if}t\le a\hfill \\ {\stackrel{^}{S}}_{o}\left(a\right)\mathrm{exp}\left(-{\int }_{0}^{t}{\stackrel{⌣}{h}}_{o}\left(x\right)\text{d}x\right)\hfill & \text{if}t>a,\hfill \end{array}\\ \text{where}{\stackrel{⌣}{h}}_{o}\left(t\right)=1\left(t\in \left({a}_{i},{b}_{i}\right]\right){h}_{i},\text{}{h}_{i}=\frac{1}{ϵ}\mathrm{ln}\frac{{\stackrel{^}{S}}_{o}\left({b}_{i-1}\right)}{{\stackrel{^}{S}}_{o}\left({b}_{i}\right)}\text{for}t\in \left({b}_{i-1},{b}_{i}\right],\end{array}$ (8)

${a}_{i}={b}_{i}-ϵ$ , $i=1,2,\cdots ,k$ , $ϵ=\mathrm{min}\left\{|{b}_{j}-{b}_{j-1}|:j=1,\cdots ,k\right\}/n$ , and ${\stackrel{⌣}{S}}_{o}\left(t\right)={\stackrel{^}{S}}_{o}\left(t\right)$ at the discontinuous points of ${\stackrel{^}{S}}_{o}\left(t\right)$ . Since

#Math_346# (9)

$\begin{array}{l}S\left(t,u,{\stackrel{⌣}{S}}_{o},\stackrel{^}{\beta }\right)\\ =\left(\begin{array}{l}\begin{array}{l}{\stackrel{^}{S}}_{o}\left(t\right)\\ \text{if}t\le {a}_{1}\text{or}u=0\end{array}\hfill \\ \begin{array}{l}{\stackrel{^}{S}}_{o}\left(a\right)\mathrm{exp}\left(-{\sum }_{i=1}^{j}{\stackrel{⌣}{h}}_{i}\frac{\mathrm{exp}\left\{\left({a}_{i}-a\right)u\stackrel{^}{\beta }\right\}}{u\stackrel{^}{\beta }}\left[\mathrm{exp}\left\{u\stackrel{^}{\beta }\left({b}_{i}-{a}_{i}\right)\right\}-1\right]\right)\\ \text{if}u\ne 0,t\in \left[{b}_{j},{a}_{j+1}\right),j\le k\end{array}\hfill \\ \begin{array}{l}{\stackrel{^}{S}}_{o}\left(a\right)\mathrm{exp}\left(-{\sum }_{i=1}^{j}{\stackrel{⌣}{h}}_{i}\frac{\mathrm{exp}\left\{\left({a}_{i}-a\right)u\stackrel{^}{\beta }\right\}}{u\stackrel{^}{\beta }}\left[\mathrm{exp}\left\{u\stackrel{^}{\beta }\left(t-{a}_{i}\right)\right\}-1\right]\right)\\ \text{if}u\ne 0,t\in \left[{a}_{j},{b}_{j}\right),j\le k.\end{array}\hfill \end{array}\end{array}$

Write ${\stackrel{⌣}{S}}_{{Y}^{\ast }|U}\left(t|u\right)=S\left(t,u,{\stackrel{⌣}{S}}_{o},\stackrel{^}{\beta }\right)$ . At the discontinuous points $t$ of ${\stackrel{^}{S}}_{o}$ ,

$\begin{array}{c}{\stackrel{⌣}{S}}_{{Y}^{\ast }|U}\left(t|u\right)={\stackrel{^}{S}}_{o}\left(a\right)\mathrm{exp}\left(-\underset{i=1}{\overset{j}{\sum }}{\stackrel{⌣}{h}}_{i}\frac{\mathrm{exp}\left\{\left({a}_{i}-a\right)u\stackrel{^}{\beta }\right\}}{u\stackrel{^}{\beta }}\left[\mathrm{exp}\left\{u\stackrel{^}{\beta }\left({b}_{i}-{a}_{i}\right)\right\}-1\right]\right)\\ \approx {\stackrel{^}{S}}_{o}\left(a\right)\mathrm{exp}\left(-\underset{i=1}{\overset{j}{\sum }}{\stackrel{⌣}{h}}_{i}\mathrm{exp}\left\{\left({b}_{i}-a\right)u\stackrel{^}{\beta }\right\}\left({b}_{i}-{a}_{i}\right)\right)\\ ={\stackrel{^}{S}}_{o}\left(a\right)\underset{i=1}{\overset{j}{\prod }}{\left(\frac{{\stackrel{^}{S}}_{o}\left({b}_{i}\right)}{{\stackrel{^}{S}}_{o}\left({b}_{i-1}\right)}\right)}^{\mathrm{exp}\left\{\left({b}_{i}-a\right)u\stackrel{^}{\beta }\right\}}\text{if}u\ne 0,t={b}_{j},j\le k\left(\text{by}\left(8\right)\right).\end{array}$

Define ${\stackrel{^}{S}}_{{Y}^{\ast }|U}\left(t|u\right)=\left(\begin{array}{ll}{\stackrel{^}{S}}_{o}\left(t\right)\hfill & \text{if}t<{b}_{1}\hfill \\ {\stackrel{^}{S}}_{o}\left(a\right){{\prod }_{i=1}^{j}\left(\frac{{\stackrel{⌣}{S}}_{o}\left({b}_{i}\right)}{{\stackrel{⌣}{S}}_{o}\left({b}_{i-1}\right)}\right)}^{\mathrm{exp}\left\{\left({b}_{i}-a\right)u\stackrel{^}{\beta }\right\}}\hfill & \text{if}t\in \left[{b}_{j},{b}_{j+1}\right).\hfill \end{array}$ It

leads to (7). ${\stackrel{^}{S}}_{{Y}^{\ast }|U}\left(t|u\right)$ is simpler than ${\stackrel{⌣}{S}}_{{Y}^{\ast }|U}\left(t|u\right)$ in implementation and they have the same asymptotic properties. Simulation results in section 3 suggest that ${\stackrel{^}{S}}_{{Y}^{\ast }}$ is consistent under the selected PH model.

4) The case of other forms of $g\left(t\right)$ or $\mathcal{B}\left(\cdot ,g\left(t\right)\right)$ . It can be shown that the estimator ${\stackrel{^}{S}}_{{Y}^{\ast }|U}\left(t|u\right)$ in the previous three cases of $g\left(t\right)$ are all in the form that it is a step function with discontinuous points ${b}_{1},\cdots ,{b}_{m}$ , which are as the same as those of ${\stackrel{^}{S}}_{o}$ , and

${\stackrel{^}{S}}_{{Y}^{\ast }|U}\left(t|u\right)=\left(\begin{array}{ll}{\stackrel{^}{S}}_{o}\left(t\right)\hfill & \text{if}t<{b}_{1}\hfill \\ {\stackrel{^}{S}}_{o}\left(a\right){{\prod }_{i=1}^{j}\left(\frac{{\stackrel{⌣}{S}}_{o}\left({b}_{i}\right)}{{\stackrel{⌣}{S}}_{o}\left({b}_{i-1}\right)}\right)}^{\mathrm{exp}\left\{\mathcal{B}\left(u,g\left({b}_{i}\right)\right)\stackrel{^}{\beta }\right\}}\hfill & \text{if}t\in \left[{b}_{j},{b}_{j+1}\right),\hfill \end{array}$ (10)

where ${b}_{j}$ ’s are defined in the case of $g\left(t\right)=\left(t-a\right)1\left(t\ge a\right)$ . It can be shown that if $\mathcal{B}\left(u,g\left(t\right)\right)$ is piece-wise continuous in $t$ , then it also leads to a con- sistent estimator of ${S}_{{Y}^{\ast }}\left(t\right)$ .

2.4. Supplementary Diagnostic Plots

The MD plot needs to know $g\left(t\right)$ . One possible way to conjecture the form of the function $g\left(t\right)$ for a time-dependent covariate is to extend the PWPH plotting method in Wong et al. [11] as follows:

#Math_370# (11)

$\text{where}{\stackrel{^}{S}}_{1}\left(t\right)=\frac{{\sum }_{i=1}^{n}1\left({Y}_{i}>t,|{Z}_{i}-{z}_{1}| (12)

$r$ is a positive constant and ${z}_{1}$ belongs to the support of $Z$ . For instance, for a PWPH model defined in (6),

$\mathrm{ln}{\stackrel{^}{S}}_{1}\left(t\right)=\left(\begin{array}{ll}\mathrm{exp}\left({\stackrel{^}{\beta }}_{1}u\right)\mathrm{ln}{\stackrel{^}{S}}_{o}\left(t\right)\hfill & \text{if}t

which corresponds to two lines: $y={b}_{1}x$ and $y={a}_{2}+{b}_{2}x$ , and the cut-points can be determined from the PWPH plot (see Figure 3 in Section 3).

Let ${b}_{1}<\cdots <{b}_{m}$ be all the distinct exact observations. In view of the ex- pression in (7) under the PH model with $g\left(t\right)=\left(t-a\right)1\left(t\ge a\right)$ , if ${b}_{i}\ge a$ , then

$\mathrm{ln}\frac{{S}_{{Y}^{\ast }|U}\left({b}_{i+1}|z\right)}{{S}_{{Y}^{\ast }|U}\left({b}_{i}|z\right)}\approx \mathrm{exp}\left\{\left({b}_{i+1}-a\right)z\beta \right\}\mathrm{ln}\frac{{S}_{o}\left({b}_{i+1}\right)}{{S}_{o}\left({b}_{i}\right)},$

$\mathrm{ln}\frac{{\stackrel{^}{S}}_{1}\left({b}_{i+1}\right)}{{\stackrel{^}{S}}_{1}\left({b}_{i}\right)}\approx \mathrm{exp}\left\{\left({b}_{i+1}-a\right){z}_{1}\stackrel{^}{\beta }\right\}\mathrm{ln}\frac{{\stackrel{^}{S}}_{o}\left({b}_{i+1}\right)}{{\stackrel{^}{S}}_{o}\left({b}_{i}\right)},$

${\stackrel{^}{G}}_{i}=\mathrm{ln}\left[\left\{\mathrm{ln}\frac{{\stackrel{^}{S}}_{1}\left({b}_{i+1}\right)}{{\stackrel{^}{S}}_{1}\left({b}_{i}\right)}\right\}/\left\{\mathrm{ln}\frac{{\stackrel{^}{S}}_{o}\left({b}_{i+1}\right)}{{\stackrel{^}{S}}_{o}\left({b}_{i}\right)}\right\}\right]\left(\begin{array}{ll}=0\hfill & \text{if}{b}_{i}

where ${\stackrel{^}{S}}_{1}$ and ${z}_{1}$ are given in (11) and (12). Thus another diagnostic plotting method is

$\text{toplot}\left({b}_{i},{G}_{i}\right),\text{for}i\in \left\{1,\cdots ,m\right\}$ (13)

and check whether it appears as a two-piecewise-linear curve: one is $y=0$ and another one is $y=\left(x-a\right)b$ . If so, it is likely that $g\left(t\right)=\left(t-a\right)1\left(t\ge a\right)$ , where $a$ is the intersection of the two line segments. We shall call the plotting method the LDPH plot, as $g\left(t\right)=\left(t-a\right)1\left(t\ge a\right)$ corresponding to the LDPH model. The advantage of the PWPH plot and the LDPH plot is that they provide clues on the cut-points, which are needed in the MD plot, unless the cut-point is given.

Remark 4. If the cut-points in the PWPH or TDPH model vary from observation to observation, then the PWPH plot as in (11) and LDPH plot as in (13) do not work. However the cut-points ${a}_{i}$ are also observations in such cases, in addition to $\left({Y}_{i},{U}_{i}\right)$ ’s (in the case of complete data). Thus we do not need to guess the cut-points, and one can replace ${\stackrel{^}{S}}_{j}$ in (12) by

${\stackrel{^}{S}}_{j}\left(t\right)=\frac{{\sum }_{i=1}^{n}1\left({Y}_{i}>t,|{U}_{i}-{z}_{j}|<{r}_{1}{n}^{-1/3},|{a}_{i}-a|<{r}_{2}{n}^{-1/\left(3p\right)}\right)}{{\sum }_{l=1}^{n}1\left(|{U}_{l}-{z}_{j}|<{r}_{1}{n}^{-1/3},|{a}_{l}-a|<{r}_{2}{n}^{-1/\left(3p\right)}\right)},$

where $a$ is a predetermined reference cut-point and ${r}_{1}$ and ${r}_{2}$ are positive constants.

3. Simulation Study

In order to compare the MD plot to the other plots, we present three sets of simulation results in $\text{§}$ 3.1, $\text{§}$ 3.2 and $\text{§}$ 3.3, respectively. The mode is 0 as assumed in AS1. ${S}_{o}$ can be uniform or exponential distributions. The covariate can be discrete or continuous. The data do not need to be from a PH model. There is no unit in time $t$ due to the simulation.

3.1. Simulation Study under the Assumptions in Example 1

Two random samples of $n=30$ and $n=300$ pseudo random numbers ${Y}_{i}$ are generated from U(0,1) distribution. For each ${Y}_{i}={y}_{i}$ , generate ${Z}_{i}$ from $U\left({y}_{i}-1,{y}_{i}\right)$ with probability 0.5 and from $U\left(3,4\right)$ with probability 0.5. These $\left({Y}_{i},{Z}_{i}\right)$ satisfy the assumptions given in Example 1. Let ${W}_{i}=1\left({Z}_{i}\ge 3\right)$ . Then the family of distributions $\left\{{S}_{{Y}_{1}|{Z}_{1}}\left(\cdot |z\right):z\in \left(-1,1\right)\cup \left(3,4\right)\right\}$ does not satisfy the null hypothesis ${H}_{0}:h\left(t|z\right)={h}_{o}\left(t\right)\mathrm{exp}\left(\beta z\right)$ , but it can be shown that $\left\{{S}_{{Y}_{1}|{W}_{1}}\left(\cdot |z\right):z\in \left\{0,1\right\}\right\}$ does.

The sample of size $n=300$ is only used for the MD plots in panels (1,3) and (3,3) of Figure 1 with data $\left({Y}_{i},{Z}_{i}\right)$ ’s and data $\left({Y}_{i},{W}_{i}\right)$ ’s respectively.

3.1.1. The MD Plots Successfully Identify the Violation of the TIPH Model Assumption When Data Are (Yi,Zi)’s

Since ${S}_{Y|Z}$ does not satisfy the PH model, a proper estimate of ${S}_{{Y}^{\ast }}$ is expected to deviate from ${\stackrel{^}{S}}_{Y}$ . It is seen from the two MD plots with data $\left({Y}_{i},{Z}_{i}\right)$ ’s in panels (1,2) and (1,3) of Figure 1 that (a) when $n=30$ , ${\stackrel{^}{S}}_{{Y}^{\ast }}$ lies most of the time outside or around the edge of the $95%$ confidence band of ${\stackrel{^}{S}}_{Y}$ ; (b) when $n=300$ , ${\stackrel{^}{S}}_{{Y}^{\ast }}$ is totally outside the confidence band of ${\stackrel{^}{S}}_{Y}$ . The MD plots suggest that our estimator ${\stackrel{^}{S}}_{{Y}^{\ast }}$ appears quite different from ${\stackrel{^}{S}}_{Y}$ even when $n=30$ and they suggest that the data $\left({Y}_{i},{Z}_{i}\right)$ , $i=1,2,\cdots ,n$ are not from the TIPH Model, as expected.

3.1.2. The MD Plots Correctly Support the TIPH Model for (Yi,Wi)’s

Since ${S}_{Y|W}$ satisfies the PH model, a proper estimate of ${S}_{{Y}^{\ast }}$ should be close to ${\stackrel{^}{S}}_{Y}$ . It is seen from both of the MD plot with data $\left({Y}_{i},{W}_{i}\right)$ ’s in panels (3,2) and (3,3) of Figure 1 that the curves of ${\stackrel{^}{S}}_{{Y}^{\ast }}$ corresponding to $n=30$ and $n=300$ both lie within the confidence bands of ${\stackrel{^}{S}}_{Y}$ . It is a strong evidence that the data $\left({Y}_{i},{W}_{i}\right)$ ’s are from the TIPH model, as expected.

3.1.3. About ${\stackrel{˜}{S}}_{{Y}^{\ast }}$ Defined in Remark 2

To show that the consistent estimator of ${S}_{o}$ is the key in the MD approach, we

Figure 1. Diagnostic plots based on Example 1.

present in panels (1,1) and (3,1) of Figure 1 the plots of the curves of “ph est” ( ${\stackrel{˜}{S}}_{{Y}^{\ast }}$ ) and the curves of ${\stackrel{^}{S}}_{Y}$ , together with its confidence band, with data $\left({Y}_{i},{Z}_{i}\right)$ ’s and $\left({Y}_{i},{W}_{i}\right)$ ’s, respectively. It is seen that the curves of “ph est” and ${\stackrel{^}{S}}_{Y}$ almost coincide for both data sets even when $n=300$ . These plots suggest that the curve of “ph est” is always close to the curve of the KME , regardless of whether the pre-assumed model fits the data. Thus ${\stackrel{˜}{S}}_{Y}$ is not a proper choice for the diagnostic plot.

In panels (2,1) and (4,1) of Figure 1, the residual plots using the score process by the cumulative martingale residuals method are plotted with data $\left({Y}_{i},{Z}_{i}\right)$ ’s and $\left({Y}_{i},{W}_{i}\right)$ ’s, respectively. There is no big difference in pattern between these two plots. Notice that ${S}_{Y|Z}$ does not satisfy the TIPH model but ${S}_{Y|W}$ does. Thus this residual plot can not tell whether the data satisfy the TIPH model in this example.

In panels (2,2) and (4,2) of Figure 1, the deviance residuals are plotted with data $\left({Y}_{i},{Z}_{i}\right)$ ’s and $\left({Y}_{i},{W}_{i}\right)$ ’s, respectively. In panels (2,3) and (4,3) of Figure 1, the scaled Schoenfeld residuals are plotted with data $\left({Y}_{i},{Z}_{i}\right)$ ’s and $\left({Y}_{i},{W}_{i}\right)$ ’s, respectively. There is no big difference in pattern between these two pairs of plots. Thus these two types of residual plots can not tell whether the data satisfy the TIPH model in this example.

We also carried out simulation study on the testing ${H}_{0}:h\left(t|z\right)={h}_{o}\left(t\right)\mathrm{exp}\left({\beta }^{\prime }z\right)$ with data $\left({Y}_{i},{Z}_{i}\right)$ ’s and using the residual test in the existing R package. Our simulation study suggests that for $n=50$ , 100 or 200 and with a replication of 5000, the residual test does not reject the incorrect ${H}_{0}$ for more than $93%$ of the time. Thus, it is not surprised that the residual plots do not work well.

3.2. Simulation Based on Data from the TIPH Model

A sample of complete data with $n=300$ is generated from the TIPH model: $h\left(t|z\right)=\mathrm{exp}\left(\beta {z}^{2}\right){h}_{o}\left(t\right)$ , where $\beta =1$ , ${h}_{o}\left(t\right)=1$ and $Z\sim \text{Norm}\left(0,1\right)$ . Panels (1,1), (1,2), (1,3) and (3,1) in Figure 2 presents four different plots by fitting the data set into the TIPH model $h\left(t|z\right)=\mathrm{exp}\left(\beta z\right){h}_{o}\left(t\right)$ . Thus the covariate is mis-specified.

The second sample of complete data with size $n=300$ is generated from the model , $h\left(t|z\right)=\mathrm{exp}\left(\beta z\right){h}_{o}\left(t\right)$ , where $\beta =1$ , $Z\sim \text{Norm}\left(0,1\right)$ and ${h}_{o}\left(t\right)=1$ , $t>0$ . Panels (2,1), (2,2), (2,3) and (4,1) in Figure 2 presents 4 plots by fitting the data set into the TIPH model $h\left(t|z\right)=\mathrm{exp}\left(\beta z\right){h}_{o}\left(t\right)$ . Thus the covariate is correctly specified.

3.2.1. Residuals Plots

Panels (1,1) and (2.1) in Figure 2 display the residual plots using the score process by the cumulative martingale residuals method by fitting data from the first and the second TIPH model respectively into $h\left(t|z\right)=\mathrm{exp}\left(\beta z\right){h}_{o}\left(t\right)$ . It is

Figure 2. Diagnostic plots based on data from Model: $h\left(t|z\right)={h}_{o}\left(t\right){e}^{\beta z}$ or $h\left(t|z\right)={h}_{o}\left(t\right){e}^{\beta {z}^{2}}$ .

not easy to distinguish the mis-specified model from the correct one by this pair of residual plots.

Similarly, in Figure 2, Panels (3,1) and (4,1) display the scaled Schoenfeld residual plots for fitting the first and second samples, respectively, into the TIPH model $h\left(t|z\right)=\mathrm{exp}\left(\beta z\right){h}_{o}\left(t\right)$ . Moreover, Panels (1,2) and (2,2) present the log-log plots in a similar pattern. Again, those methods do not provide enough information about the overall fitting of the models.

3.2.2. Residuals Tests

In fact, with the data from the first mis-specified model and with a moderate sample sizes $n\ge 50$ , our simulation study with a replication of 5000 suggests that the residual test in the existing R package (e.g., $\text{cox}\text{.zph}\left(\right)$ ) would not reject the mis-specified TIPH model for more than $70%$ of the time. Thus it is not surprised that the residual plots cannot detect the mis-specified TIPH model.

3.2.3. MD Plots

The MD plot in panel (1,3) fits the mis-specified TIPH model with data from the first sample. It successfully identifies that the functional form of the covariates $Z$ is mis-specified for the first data set, as ${\stackrel{^}{S}}_{{Y}^{\ast }}$ is almost totally outside the $95%$ confidence band of ${\stackrel{^}{S}}_{Y}$ . In other words, the MD approach suggests that the first data set does not follow the PH model $h\left(t|z\right)=\mathrm{exp}\left(\beta z\right){h}_{o}\left(t\right)$ . On the other hand, the MD plot in panel (2,3) successfully identifies that the functional form of the covariates $Z$ is correct for the second data set, as ${\stackrel{^}{S}}_{{Y}^{\ast }}$ is totally inside the $95%$ confidence band of ${\stackrel{^}{S}}_{Y}$ .

Based on the second sample, the modified PWPH plot and the LDPH plot are displayed in panels (3,2) and (4,2); the MD plots under the PWPH and LDPH Models are displayed in panels (3,3) and (4,3). The PWPH plot in panel (3,2) suggests that the data are either from a TIPH Model, or from a PWPH model with one cut-point at $a\approx 0.7$ . The MPLE of the regression coefficient under the TIPH Model is $\stackrel{^}{\beta }=1.26$ and the MPLE of the regression coefficients under the PWPH model with $k=1$ is $\left({\stackrel{^}{\beta }}_{1},{\stackrel{^}{\beta }}_{2}\right)=\left(1.30,1.20\right)$ with $SE=\left(0.170,0.136\right)$ . Both ${\stackrel{^}{\beta }}_{1}$ and ${\stackrel{^}{\beta }}_{2}$ are not significantly different from $\stackrel{^}{\beta }$ , as their differences from $\stackrel{^}{\beta }$ are within two SEs. Both MD plots in panels (2,3) and (3,3) suggest that the PWPH model with at most one cut-point fits the data, as expected, as both curves of ${\stackrel{^}{S}}_{{Y}^{\ast }}$ are totally inside the $95%$ confidence band of ${\stackrel{^}{S}}_{Y}$ .

The LDPH plot in panel (4,2) suggests that the LDPH Model may fit the data, but it is seen from panel (4,3) that even within the interval [0,1], only less than 30% of the curve of ${\stackrel{^}{S}}_{{Y}^{\ast }}$ lies inside the confidence band of ${\stackrel{^}{S}}_{Y}$ . Thus the MD plot suggests that the data are not from the LDPH Model, as expected. It is seen that the LDPH plot performs not as good as the MD plot.

3.3. Simulation on the LDPH Model

A sample of $n=300$ right-censored data is generated under the LDPH Model, where $h\left(t|z\right)=\mathrm{exp}\left(\beta ug\left(t\right)\right){h}_{o}\left(t\right)$ , $g\left(t\right)=\left(t-a\right)1\left(t\ge a\right)$ , $a=0.2$ , and ${h}_{o}\left(t\right)={e}^{-t}$ . $U$ has a Poisson distribution with mean 1, and $\beta =10$ . It is subject to right censoring and the right censoring variable $C\sim U\left(1,2\right)$ .

The modified PWPH plots and the LDPH plot are given in panels (1,1), (2,1) and (3,1) in Figure 3. The MD plots and the qqplots for the TIPH, PWPH and LDPH Models are given in the second and the third columns of Figure 3.

Both the PWPH plot and the MD plots with corresponding qqplot in panels (1,1), (1,2) and (1,3) suggest that the data are not from the TIPH Model. We

Figure 3. Diagnostic plots based on data from the LDPH Model.

need the information from the PWPH plot to decide the cut-point needed in the MD plots. The PWPH plot in panel (2,1) suggests that the data may be from a PWPH model with a cut-point $a$ satisfying $-\mathrm{ln}S\left(a|0\right)\approx 0.05$ , that is, $a\approx 0.1$ . However, the MD plot in panel (2,2) suggests that the data are not from the PWPH model. This again indicates that the MD plot performs better than the modified PWPH plot.

The LDPH plot in panel (3,1) in Figure 3 suggests that the data are from the LDPH Model with the cut-point $a$ in $\left(0.2,0.4\right)$ . The MD plot and the cor- responding qqplot in panels (3,2) and (3,3) suggest that the data are from the LDPH Model, as expected. Thus the LDPH plot has the advantage that it can suggest the cut-point $a$ in the case that $a$ is not given in the LDPH Model. Since $U$ is discrete in this case, the PWPH plot and the LDPH plot perform better than those in the previous simulation studies when $U$ is continuous.

4. Data Analysis

A common situation that will involve the use of the PH model is a long-term clinical follow-up study. In such a study, the impact of a prognostic variable may change at different time periods. This is the case in the breast cancer data analyzed in Wong et al. [11] . The data are obtained from an Institutional Review Board approved long-term clinical follow-up study on 371 women with stages I-III unilateral invasive breast cancer treated by surgery at Memorial Sloan- Kettering Cancer Center in New York City between 1985 and 2001. The median follow-up time of the study is 7.4 years (range is 1 month-180 months (14.8 years)), which is the longest among published studies on bone marrow micro- metastasis (BMM).

One objective of the study is to investigate whether tumor diameter is significant in predicting early or late relapse. Then the relapse time $Y$ is the response and the tumor diameter $Z$ is the covariate. Clinical consideration and survival plots suggest late failure can be considered at time greater than 5 years from initial breast cancer surgery. Data analysis based on a PWPH model with two cut-points at 2 years and 5 years with covariate $Z$ is carried out in Wong et al. [11] . We apply our diagnostic plotting methods to check whether such a PWPH model fits our data in the case that $Z$ is continuous and $Z$ is discretized as a dichotomized variable ( $>2$ cm or $\le 2$ cm), as is done in Wong et al. [11] . In the latter case, we try the log-log plots as well.

We apply our data to the TIPH model with covariate $Z$ (which is basically continuous), the regression coefficient is $\stackrel{^}{\beta }=0.35$ and is significant. We shall only present our new methods for the case that the tumor diameter $Z$ is continuous in panel (2,1) of Figure 4. The modified log-log plots in panel (1,1) seems to suggest that the TIPH Model fits the data, but the MD plot in panel (2,1) suggests that the TIPH Model does not fit our data, as the curve of ${\stackrel{^}{S}}_{{Y}^{\ast }}$ lies almost totally outside the $95%$ confidence band of ${\stackrel{^}{S}}_{Y}$ . Thus the partial like- lihood estimate $\stackrel{^}{\beta }=0.35$ is not valid.

Wong et al. [11] suggest to do data analysis by dichotomizing $Z$ according

Figure 4. Diagnostic plots for cancer data with the tumor diameter measurements.

to $Z>2\text{cm}$ or $Z\le 2\text{cm}$ . Then the covariate is discrete, and the standard log-log plots, as well as the PWPH plot proposed by Wong et al. [11] , is applicable. These two plots are given in panels (1,2) and (1,3) of Figure 4. In view of the graph, the log-log plots suggest that the TIPH Model does not fit the discretized data. Moreover, it does not present useful information on the other possible PH models. However, the PWPH plot in panel (1,3) does suggest that the PWPH model with two cut-points close to 1.5 years and 6 years fits the discretized data. Wong et al. [11] suggest to do data analysis using a PWPH model with the cut-points at 2 and 5 years (which are close to 1.5 and 6). Thus we further compute the partial likelihood MLE for model with $g\left(t\right)=\left(1\left(t<2\right),1\left(t\in \left[2,5\right)\right),1\left(t\ge 5\right)\right)$ . The estimates ${\stackrel{^}{\beta }}_{1}$ , ${\stackrel{^}{\beta }}_{2}$ and ${\stackrel{^}{\beta }}_{3}$ are −1.76, 2.02 and 0.34, with p-value 0.77, 0.001 and 0.52, respectively. In panels (2,2) and (2,3) of Figure 4, we provide the MD plots based on the discrete and continuous covariates, respectively, for fitting the PWPH model. In particular, since the curve of ${\stackrel{^}{S}}_{{Y}^{\ast }}$ lies totally inside the $95%$ confidence band of ${\stackrel{^}{S}}_{Y}$ in panel (2,2), the MD plot suggests that the PWPH model considered in Wong et al. [11] fits the discretized data. Thus their data analysis is valid. However, the MD plot in panel (2,3) indicates that the original data set with the continuous covariate $Z$ does not fit a PWPH model, as the curve of ${\stackrel{^}{S}}_{{Y}^{\ast }}$ lies totally on one edge of the $95%$ confidence band of ${\stackrel{^}{S}}_{Y}$ .

5. Concluding Remarks

Our simulation results and the data analysis suggest that the MD plot has certain advantages over the existing residual plots, especially when the null hypothesis ${H}_{0}$ in (2) is mis-specified or the data are not from any PH model. Our MD plot does not involve residuals studied in the literature, and this is the first difference between the residual approaches and the MD approach.

5.1. Drawback and Remedy of the MD Plot

The MD approach is closely related to ${H}_{0}$ in (2) with $Z\left(t\right)=\mathcal{B}\left(U,g\left(t\right)\right)$ , where the parameter $\beta$ is unknown but $\mathcal{B}\left(\cdot ,g\left(t\right)\right)$ is given. The MD plot is applicable to all PH models with $Z\left(t\right)=\mathcal{B}\left(U,g\left(t\right)\right)$ and with all types of covariates $U$ , provided that $\mathcal{B}\left(\cdot ,g\left(t\right)\right)$ is given. The assumption of a given $\mathcal{B}\left(\cdot ,g\left(t\right)\right)$ may be viewed as a drawback of the MD plot if one wants to find a certain PH model to fit the data. There are several ways to overcome this drawback.

1) For the case that $\mathcal{B}\left(u,g\left(t\right)\right)=ug\left(t\right)$ , we also propose a modified PWPH plot and LDPH plot in $\text{§}$ 2.4 for inferring the functional form of $g\left(t\right)$ .

2) One can apply the MD approach to several possible typical models. For instance, in $\text{§}$ 3.2 and $\text{§}$ 3.3, given a data set, the MD approach finds which of the 3 semi-parametric PH models fits.

3) Of course, one can also make use of the existing residual approaches in the literature for guessing $g\left(t\right)$ . There is no harm to inspect all diagnostic plots available based on the data.

On the other hand, the MD plot can further check the validity of the function forms suggested by the existing residual plots. As illustrated in the paper, with the help of the confidence band of the KME ${\stackrel{^}{S}}_{Y}$ , it is more reliable and more informative than the residual plots on whether the model suggested by the residual plots is appropriate for the data. Thus the MD plot is a nice complement to the existing diagnostic methods, not a replacement to them.

5.2. A Naive but Valid Model Test

As seen from the simulation results, when the data are not from any PH model (see Example 1 in $\text{§}$ 3.1) or ${H}_{0}$ is mis-specified (see simulation on the TIPH Model in $\text{§}$ 3.2), the MD plots can successfully reject ${H}_{0}$ , provided that $n$ is large enough $\left(n\ge 30\right)$ . Thus the MD plot can play a role of a model checking test ${\varphi }_{md}$ , that is,

$\left(\begin{array}{ll}\text{ }\text{reject}{H}_{0}\text{ }\hfill & \text{if}{\stackrel{^}{S}}_{{Y}^{\ast }}\text{liesinsidetheconfidenceband}\left(\text{case}\left(I\right)\right)\text{ }\hfill \\ \text{ }\text{donotreject}{H}_{0}\text{ }\hfill & \text{if}{\stackrel{^}{S}}_{{Y}^{\ast }}\text{liesoutsidetheconfidenceband}\left(\text{case}\left(II\right)\right)\text{ }\hfill \\ \text{ }\text{inconclusive}\text{ }\hfill & \text{ }\text{otherwise}\left(\text{case}\left(III\right)\right).\text{ }\hfill \end{array}$

On the contrary, in such case, our simulation results suggest that the residual plots often cannot detect that ${H}_{0}$ is incorrect and the residual tests often do not reject ${H}_{0}$ . There must be some reasons.

As summarized in Therneau and Grambsch [13] “Interestingly, nearly all of the tests for proportional hazards that have been proposed in the literature are $T\left(G\right)$ tests” “and differ only in their choice of the time transform g(t)”. The parameter space ${\Theta }_{r}$ for the residual approach contains all distributions that follow Model (1) with $\gamma z\left(t\right)=\beta \mathcal{B}\left(U,t\right)+\theta {\mathcal{B}}_{2}\left(U,t\right)$ , where $\gamma =\left(\beta ,\theta \right)$ . For instance, in the case of Example 1, let $\gamma z\left(t\right)=\beta u+\theta ut$ . So a residual test is to test ${H}_{0}^{r}:\theta =0$ v.s. ${H}_{1}^{r}:\theta \overline{)=}0$ , instead of testing ${H}_{0}$ in (2) v.s. ${H}_{1}:{H}_{0}$ is not true. The parameter space ${\Theta }_{r}$ for testing ${H}_{0}^{r}$ v.s. ${H}_{1}^{r}$ is a subset of the para- meter space $\Theta$ for testing ${H}_{0}$ v.s. ${H}_{1}$ , where $\Theta$ is the collection of all ${F}_{Y,U}$ , where $Y$ is continuous. When the data are not from any PH model or ${\Theta }_{r}$ is mis-specified, the residual tests are not valid. Moreover, in that case, it is some- what true that $\theta =0$ most of the time, and thus when ${H}_{0}$ is not rejected, it is likely to make type II error. In real applications, we do not know whether the data are indeed from some PH model, or whether we select a correct sub-model of the PH model.

Remark 5. In summary, if the existing residual tests reject ${H}_{0}$ , the decision is likely to be correct. Otherwise, we have no confidence to believe that the test is valid. In this regard, it is interesting to point out that the parameter space for the aforementioned “test” induced by the MD plot is actually $\Theta$ . Thus the MD approach is valid even if ${F}_{Y,U}\notin {\Theta }_{r}$ . It can also detect the incorrect model assumption ${\Theta }_{r}$ by testing the new null hypothesis ${H}_{0}:{F}_{Y,U}\in {\Theta }_{r}$ when $n$ is not too small, but the residual approach cannot accomplish this task. This is the $\underset{_}{\text{ }\text{second}}$ difference between the MD approach and the residual approach. In application, if ${H}_{0}$ is not rejected, the residual tests often make type II error, and we strongly suggest to consider the MD approach then.

Of course, very often, neither case (I) nor (II) happens, then we need some more rigorous model checking tests. The difference of ${\stackrel{^}{S}}_{Y}$ and ${\stackrel{^}{S}}_{{Y}^{\ast }}$ is a natural choice for a test statistic, and it is one of our on-going research. The other is to extend the MD approach to other common regression models, such as the linear regression model and the generalized additive model.

Acknowledgements

We thank the Editor and the referee for their comments.

Conflicts of Interest

The authors declare no conflicts of interest.

Cite this paper

Yu, Q. , Dong, J. and Wong, G. (2017) Marginal Distribution Plots for Proportional Hazards Models with Time-Dependent Covariates or Time-Varying Regression Coefficients. Open Journal of Statistics, 7, 92-111. doi: 10.4236/ojs.2017.71008.

 [1] Cox, D.R. and Oakes, D. (1984) Analysis of Survival Data. Chapman and Hall, New York. [2] Kalbfleisch, J.D. and Prentice, R.L. (1980) The Statistical Analysis of Failure Time Data. Wiley, New York. [3] Zhou, M. (2001) Understanding the Cox Regression Models with Time-Change Covariates. American Statistician, 55, 153-155. https://doi.org/10.1198/000313001750358491 [4] Therneau, T.M., Grambsch, P.M. and Fleming, T.R. (1990) Martingale-Based Residuals for Survival Models. Biometrika, 77, 147-160. https://doi.org/10.1093/biomet/77.1.147 [5] Baltazar-Aban, I. and Pena, E. (1995) Properties of Hazard-Based Residuals and Implications in Model Diagnostics. Journal of the American Statistical Association, 90, 185-197. https://doi.org/10.1080/01621459.1995.10476501 [6] Barlow, W.E. and Prentice, R.L. (1998) Residuals for Relative Risk Regression. Biometrika, 75, 65-74. https://doi.org/10.1093/biomet/75.1.65 [7] Lin, D.Y., Wei, L.J. and Ying, Z. (1993) Checking the Cox Model with Cumulative Sums of the Martingale-Based Residuals. Biometrika, 81, 557-572. https://doi.org/10.1093/biomet/80.3.557 [8] Scheike, T.H. and Martinussen, T. (2004) On Estimation and Tests of Time-Varying Effects in the Proportional Hazards Model. Scandinavian Journal of Statistics, 31, 51-62. https://doi.org/10.1111/j.1467-9469.2004.00372.x [9] Hastie, T.J. and Tibshirani, R.J. (1990) Generalized Additive Models. Chapman and Hall, New York. [10] Grambsch, P.M. and Therneau, T.M. (1994) Proportional Hazards Tests and Diagnostics Based on Weighted Residuals. Biometrika, 81, 515-526. https://doi.org/10.1093/biomet/81.3.515 [11] Wong, G.Y.C., Osborne, M.P., Diao, Q. and Yu, Q. (2016) Piecewise Cox Models With Right-Censored Data. Communication in Statistics: Simulation and Computation, Online. [12] Tian, L., Zucker, D. and Wei, L.J. (2005) On the Cox Model with Time-Varying Regression Coefficients. Journal of the American Statistical Association, 100, 172-183. https://doi.org/10.1198/016214504000000845 [13] Therneau, T.M. and Grambsch, P.M. (2000) Modeling Survival Data: Extending the Cox Model. Springer, Berlin. https://doi.org/10.1007/978-1-4757-3294-8