Adaptive Conditional Hazard Regression Modeling of Multiple Event Times

Abstract

Recurrent event time data and more general multiple event time data are commonly analyzed using extensions of Cox regression, or proportional hazards regression, as used with single event time data. These methods treat covariates, either time-invariant or time-varying, as having multiplicative effects while general dependence on time is left un-estimated. An adaptive approach is formulated for analyzing multiple event time data. Conditional hazard rates are modeled in terms of dependence on both time and covariates using fractional polynomials restricted so that the conditional hazard rates are positive-valued and so that excess time probability functions (generalizing survival functions for single event times) are decreasing. Maximum likelihood is used to estimate parameters adjusting for right censored event times. Likelihood cross-validation (LCV) scores are used to compare models. Adaptive searches through alternate conditional hazard rate models are controlled by LCV scores combined with tolerance parameters. These searches identify effective models for the underlying multiple event time data. Conditional hazard regression is demonstrated using data on times between tumor recurrence for bladder cancer patients. Analyses of theory-based models for these data using extensions of Cox regression provide conflicting results on effects to treatment group and the initial number of tumors. On the other hand, fractional polynomial analyses of these theory-based models provide consistent results identifying significant effects to treatment group and initial number of tumors using both model-based and robust empirical tests. Adaptive analyses further identify distinct moderation by group of the effect of tumor order and an additive effect to group after controlling for nonlinear effects to initial number of tumors and tumor order. Results of example analyses indicate that adaptive conditional hazard rate modeling can generate useful insights into multiple event time data.

Share and Cite:

Knafl, G. (2023) Adaptive Conditional Hazard Regression Modeling of Multiple Event Times. Open Journal of Statistics, 13, 492-513. doi: 10.4236/ojs.2023.134025.

1. Introduction

Cox regression, also called proportional hazard regression, is the commonly used approach for modeling single survival time, failure time, or time-to-event data (see [1] for an extensive overview). The hazard function is assumed to equal a product of an unspecified function of time and the exponent of a linear function of available covariates with estimation addressing only the slope parameters for the covariates while ignoring the underlying dependence of the hazard function on time. Extensions of Cox regression have been developed to handle multiple event times. For the case of recurrent events, Andersen and Gill [2] model the hazard rate (or intensity function) in terms of possibly time-varying covariates while Lin et al. [3] model proportional mean and rate functions for the underlying counting process in terms of possibly time-varying covariates. For the more general case of multiple event times, Prentice et al. [4] use marginal Cox regression models, one for each possible event, to model possibly time-varying covariates. Wei et al. [5] propose an alternative marginal Cox regression approach. What these approaches have in common is that covariates have multiplicative effects while general dependence on time is left un-estimated. The purpose of this article is to formulate and demonstrate an adaptive approach for conditional hazard regression modeling of multiple event times data based on fractional polynomials [6] [7] using an extension of the approach of Knafl [8] for modeling single event time data. This approach estimates the dependence of the conditional hazard rate on current time values, prior time values, and event time order as well as on possibly time-varying covariates.

2. Methods

2.1. Multiple Event Times

Let T m > 0 for 1 m M denote M continuous, possibly correlated random event time variables. Also, let C m for 1 m M denote associated random indicators for whether T m is right censored, that is, only a lower bound for T m is observable ( C m = 1 ), or whether T m is not censored, that is, an actual value for T m is observable ( C m = 0 ). Furthermore, let Z m for 1 m M be random r × 1 covariate vectors consisting of random covariates Z i , m for 1 i r and 1 m M . This formulation allows for time-varying covariates. In practice, many covariates will be time-invariant with Z i , m = Z i for 1 m M . One natural time-varying covariate is event order with values equal to the event time index m.

For 1 m M , let

W m = ( T 1 T m C 1 C m Z 1 T Z m T ) T

be the ( m ( 2 + r ) ) × 1 random vector of measurements up to time m where Z T denotes the transpose of Z . Also, let t m , c m , and z m denote values associated with the random variables/vectors T m , C m , and Z m and use these to form the associated vector values

w m = ( t 1 t m c 1 c m z 1 T z m T ) T

for W m . Note that t m is an actual value for T m when c m = 0 , that is, T m = t m , and a lower bound on T m when c m = 1 , that is, T m > t m .

For 1 m M , denote the conditional distribution function

F ( t m ; w m 1 , z m ) = P ( T m t m | W m 1 = w m 1 , Z m = z m )

For t m > 0 where W 0 = 0 and w 0 = 0 are ( m ( 2 + r ) ) × 1 zero-valued vectors so that

F ( t 1 ; w 0 , z 1 ) = P ( T 1 t 1 | Z 1 = z 1 ) .

for time-invariant covariates with Z i , m = Z i , the dependence of T m on W m 1 = w m 1 and Z m = z m is simpler than in general since it depends only on the single value z i . When 1 m M , the associated excess time probability function is

S ( t m ; w m 1 , z m ) = 1 F ( t m ; w m 1 , z m ) = P ( T m > t m | W m 1 = w m 1 , Z m = z m )

(generalizing the survival function for single time event data) and the associated conditional hazard rate function is

λ ( t m ; w m 1 , z m ) = lim Δ t 0 P ( t m < T m t m + Δ t | T m > t m , W m 1 = w m 1 , Z m = z m ) Δ t = f ( t m ; w m 1 , z m ) S ( t m ; w m 1 , z m )

where f ( t m ; w m 1 , z m ) is the conditional density function for T m | ( W m 1 = w m 1 , Z m = z m ) . Integration gives

Λ ( t m ; w m 1 , z m ) = 0 t m λ ( t ; w m 1 , z m ) d t = log ( S ( t m ; w m 1 , z m ) )

so that

S ( t m ; w m 1 , z m ) = exp ( Λ ( t m ; w m 1 , z m ) )

and

f ( t m ; w m 1 , z m ) = λ ( t m ; w m 1 , z m ) exp ( Λ ( t m ; w m 1 , z m ) ) .

A common density function f is assumed for T m | ( W m 1 = w m 1 , Z m = z m ) for all m (and so also a common distribution function F, a common excess time probability function S, a common conditional hazard rate function λ, and a common integrated conditional hazard rate function Λ), but different functions for different m may be more appropriate. The above formulation assumes that such differences are accounted for using the event order covariate with values equal to the event time index m. Also, the formulation assumes that the range of possible values for T m is not constrained by the values for T 1 , T 2 , , T m 1 . For the special case of increasing event times T m cum (as for recurrent events [9] ), values for T m cum are constrained to be larger than the value for T m 1 cum , but then consecutive differences T m = T m cum T m 1 cum with T 0 cum = 0 can be modeled instead using the above formulation.

2.2. Fractional Polynomial Conditional Hazard Rate Modeling

Fractional polynomials [10] [11] are based on power transforms of primary predictors u. For modeling the conditional hazard rate λ ( t m ; w m 1 , z m ) , the primary predictors u are functions of t m and the entries of the vectors w m 1 and z m . The values for t 1 , t 2 , , t m are positive while the values for c 1 , c 2 , , c m 1 are nonnegative. For most i, the entries z i , 1 , z i , 2 , , z i , m will be nonnegative so that it is reasonable to restrict z i , m to be nonnegative for 1 i r and 1 m M . Under this assumption, power transforms of primary predictors u can be defined as

g ( u ; q ) = { 0 , u = 0 u q u > 0

for real-valued powers q.

The fractional polynomial conditional hazard rate is defined as

λ ( t m ; w m 1 , z m ) = x T ( t m ; w m 1 , z m ) β

where x ( t m ; w m 1 , z m ) is a p × 1 vector of individual terms x j ( t m ; w m 1 , z m ) . Each term is a power transform of a single primary predictor or of a product of multiple power-transformed primary predictors (called a geometric combination, generalizing the standard interaction). The primary predictors for the fractional polynomial are a subset of t m and the entries of w m 1 and z m while β is a p × 1 vector of slope parameters β j for 1 j p . An intercept parameter can be included in the model using the unit transform u 1 as a primary predictor. This formulation requires that λ ( t m ; w m 1 , z m ) be constrained to take on positive values as addressed later. This restriction could be resolved by modeling instead the natural log of λ ( t m ; w m 1 , z m ) as a linear function in β (as in [2] [3] [4] [5] ), but then the integral

Λ ( t m ; w m 1 , z m ) = 0 t m λ ( t ; w m 1 , z m ) d t

cannot always be expressed in closed form. The advantage of the above definition is that power transforms g ( u j ; q j ) are simply integrated with respect to t m , which can also speed up computation of estimates for the parameter vector β . The possible forms for x j ( t m ; w m 1 , z m ) are a power transform of t m including the unit transform g ( t m ; 0 ) 1 , a power transform of any one entry of either w m 1 or z m , a power-transformed geometric combination in t m combined with one or more entries of w m 1 and/or z m , and a power-transformed geometric combination in one or more entries of w m 1 and/or z m but not in t m . Consequently, x j ( t m ; w m 1 , z m ) can be expressed as the product of a function of t m and a function of w m 1 and z m , that is,

x j ( t m ; w m 1 , z m ) = x j , 1 ( t m ) x j , 2 ( w m 1 , z m ) = g ( t m ; q j ) x j , 2 ( w m 1 , z m )

where x j , 2 ( w m 1 , z m ) is either the unit transform or depends on some subset of the entries of w m 1 and z m but not on t m . When q j = 0 for all j, none of the x j ( t m ; w m 1 , z m ) = x j ( w m 1 , z m ) depend on t m , and so the conditional hazard rate is constant in t m and satisfies

λ ( t m ; w m 1 , z m ) = λ ( w m 1 , z m ) = x T ( w m 1 , z m ) β .

Also, x j , 2 ( w m 1 , z m ) equals the unit transform when x j ( t m ; w m 1 , z m ) = x j , 1 ( t m ) depends only on t m . The power q j is either the power for a single power transform of t m or equals a product of a power q of t m and a power q transforming a geometric combination containing the power q of t m . In general, x j , 2 ( w m 1 , z m ) can contain power transforms of the entries of w m 1 and z m .

Integration gives

Λ ( t m ; w m 1 , z m ) = h T ( t m ; w m 1 , z m ) β

where h ( t m ; w m 1 , z m ) is the p × 1 vector with entries

h j ( t m ; w m 1 , z m ) = 0 t m x j ( t ; w m 1 , z m ) d t = ( 0 t m g ( t ; q j ) d t ) x j , 2 ( w m 1 , z m ) .

Assume that q j > 1 , so that

h j ( t m ; w m 1 , z m ) = g ( t m ; q j + 1 ) q j + 1 x j , 2 ( w m 1 , z m )

with g ( t m ; q j + 1 ) q j + 1 a positive function of time t m . The second term x j , 2 ( w m 1 , z m ) is nonnegative, but not identically 0, for example, when it equals a single indicator variable z. By definition, S ( t m ; w m 1 , z m ) is a decreasing function of t m for each unique possible value of the entries of w m 1 and z m , and so Λ ( t m ; w m 1 , z m ) must be an increasing function of t m for functions of w m 1 and z m determined by the fractional polynomial model.

Let J = { j : 1 j p } , J 1 = { j : 1 j p , q j 0 } , and J 2 = J \ J 1 . Partition x ( t m ; w m 1 , z m ) into x 1 ( t m ; w m 1 , z m ) with entries x j ( t m ; w m 1 , z m ) for j J 1 and associated slope parameter vector β 1 as well as x 2 ( w m 1 , z m ) with entries x j ( w m 1 , z m ) for j J 2 depending only on w m 1 and z m but not on t m with associated slope parameter vector β 2 . The conditional hazard function λ ( t m ; w m 1 , z m ) then satisfies

λ ( t m ; w m 1 , z m ) = x T ( t m ; w m 1 , z m ) β = x 1 T ( t m ; w m 1 , z m ) β 1 + x 2 T ( w m 1 , z m ) β 2

so that

λ ( t m ; w m 1 , z m ) = j J 1 g ( t m ; q j ) x j , 2 ( w m 1 , z m ) β j + j J 2 x j , 2 ( w m 1 , z m ) β j .

Assume for now that J 2 is empty. For models based on a single transform of t m ,

λ ( t m ; w m 1 , z m ) = λ ( t m ) = g ( t m ; q 1 ) β 1

can be guaranteed to be positive by restricting the associated slope β 1 to be positive-valued. For models based on multiple transforms of t m but not on w m 1 and z m ,

λ ( t m ; w m 1 , z m ) = λ ( t m ) = j J 1 g ( t m ; q j ) β j

can generate complicated integrals Λ ( t m ; w m 1 , z m ) = Λ ( t m ) that are not increasing in t m in general. Restricting all the associated slopes β j for j J 1 to be positive-valued is a straightforward way to guarantee that Λ ( t m ) is increasing in t m . More general models with

λ ( t m ; w m 1 , z m ) = j J 1 g ( t m ; q j ) x j , 2 ( w m 1 , z m ) β j

generate integrals Λ ( t m ; w m 1 , z m ) that need to be increasing in each unique choice for the terms x j , 2 ( w m 1 , z m ) . This can be guaranteed in a straightforward way once again by restricting all the associated slopes β j for j J 1 to be positive-valued.

On the other hand, when J 2 is nonempty, the assumption of all positive-valued slope parameters would be too restrictive. For example, consider a model based on two primary predictors: t m and a time-invariant indicator variable z for membership in one of two groups of study participants with additive effects to t m and z. The term

x 1 T ( t m ; w m 1 , z m ) β 1 = x 1 T ( t m ) β 1

models the dependence of the conditional hazard function on t m in terms of p 1 parameters while

x 2 T ( w m 1 , z m ) β 2 = z β p

provides for a shift in the conditional hazard function for the group coded as z = 1 compared to the conditional hazard function for the group coded as z = 0 . In general, this shift can be in the negative or the positive direction, and so the slope of β 2 needs to be allowed to be negative. However, allowing for negative entries for β 2 means that x T ( t m ; w m 1 , z m ) β and h T ( t m ; w m 1 , z m ) β can sometimes take on negative or zero values. See Section 2.3 for how to handle such cases.

Hazard ratios are naturally generated for models based on proportional hazard rates. Hazard ratios are not needed to assess a fractional polynomial model but can be generated if desired. For example, consider the simple fractional polynomial model λ ( t m ; t m 1 , z ) based on the current time value t m , the previous time value t m 1 , and a single covariate z with values 1 and 2. The associated hazard ratio is given by λ ( t m ; t m 1 , 2 ) / λ ( t m ; t m 1 , 1 ) , that is, the conditional hazard rate for z = 2 divided by the conditional hazard rate for z = 1 . Note that in general this is a function of the current time t m and the previous time t m 1 .

2.3. Maximum Likelihood Estimation

Let s S = { s : 1 s n } denote indexes for n study participants who provide multiple event time data consisting of observations

w m , s = ( t 1 , s t m , s c 1 , s c m , s z 1 , s T z m , s T ) T

for 1 m M ( s ) M . This allows for missing data, but just for large time indexes M ( s ) < m M . This is the kind of missing data occurring in the example data set of Section 3. More complex missing data can be readily handled if desired (see [11] ). The total number of measurements is thus

M ( S ) = s S M ( s ) .

All times t m , s for all participants s can be uncensored, but not all of them should be censored.

Let x ( t m , s ; w m 1 , s , z m , s ) be the observed vectors determined by some fractional polynomial model. For each s S and 1 m M ( s ) , the conditional likelihood terms L m , s ( β ) satisfy

L m , s ( β ) = { f ( t m , s ; w m 1 , s , z m , s ) , c m , s = 0 S ( t m , s ; w m 1 , s , z m , s ) , c m , s = 1

so that the associated log-likelihood term l m , s ( β ) = log ( L m , s ( β ) ) satisfies

l m , s ( β ) = ( 1 c m , s ) log ( λ ( t m , s ; w m 1 , s , z m , s ) ) Λ ( t m , s ; w m 1 , s , z m , s ) .

The likelihood L ( S ; β ) equals the product over s S and 1 m M ( s ) of the likelihood terms L m , s ( β ) with associated log-likelihood

l ( S ; β ) = log ( L ( S ; β ) ) = s S m = 1 M ( s ) l m , s ( β ) .

The maximum likelihood estimate β ( S ) of β maximizes the log-likelihood l ( S ; β ) over β , which is achieved by solving the estimating equations

l ( S ; β ) β = 0.

For simplicity of notation, parameter estimates β ( S ) are denoted as functions of the index set S for the observed data used in their computation without hat (^) symbols.

The gradient vector l ( S ; β ) β has entries satisfying

l ( S ; β ) β j = s S m = 1 M ( s ) l m , s ( β ) β j = s S m = 1 M ( s ) ( ( 1 c s m , ) x j ( t m , s ; w m 1 , s , z m , s ) x T ( t m , s ; w m 1 , s , z m , s ) β h j ( t m , s ; w m 1 , s , z m , s ) )

for j J . The Hessian matrix H ( S ; β ) has entries satisfying

2 l ( S ; β ) β j β j = s S m = 1 M ( s ) 2 l m , s ( β ) β j β j = s S m = 1 M ( s ) ( ( 1 c m , s ) x j ( t m , s ; w m 1 , s , z m , s ) x j ( t m , s ; w m 1 , s , z m , s ) ( x T ( t m , s ; w m 1 , s , z m , s ) β ) 2 )

for j , j J . The gradient vector and Hessian matrix can be used in a Newton-Raphson algorithm to compute the maximum likelihood estimate β ( S ) .

When x T ( t m , s ; w m 1 , s , z m , s ) β 0 or h T ( t m , s ; w m 1 , s , z m , s ) β 0 for some s, set

λ ( t m , s ; w m 1 , s , z m , s ) = δ

for some small positive value δ like 0.00001 and also set

Λ ( t m , s ; w m 1 , s , z m , s ) = δ t m , s .

However, leave the gradient vector and Hessian matrix unchanged so that maximum likelihood estimation uses actual derivatives with respect to the parameter vector β .

Models based on a single transform with p = 1 can be computed directly. In this case, β = ( β 1 ) and the gradient is a scalar value satisfying

l ( S ; β ) β 1 = s S m = 1 M ( s ) ( 1 c m , s β 1 h 1 ( t m , s ; w m 1 , s , z m , s ) )

so that

β 1 ( S ) = n s S m = 1 M ( s ) c m , s s S m = 1 M ( s ) h 1 ( t m , s ; w m 1 , s , z m , s ) .

The covariance matrix for the parameter estimate vector β ( S ) can be estimated in two ways: the model-based estimate treats the assumed model as the true model while the robust empirical (or sandwich) estimate allows for the true model to be different from the assumed model. The model-based estimate of the covariance matrix for β ( S ) equals the J × J matrix satisfying

Σ MB ( β ( S ) ) = H 1 ( S ; β ( S ) ) ,

that is, the negative of the inverse of the Hessian matrix evaluated at the estimated parameter vector β ( S ) . The robust empirical estimate equals the J × J matrix satisfying

Σ RE ( β ( S C ) ) = H 1 ( S ; β ( S ) ) G ( β ( S ) ) G T ( β ( S ) ) H 1 ( S ; β ( S ) )

where G ( β ( S ) ) is the J × m ( S ) matrix with entries G j , m , s ( β ( S ) ) for 1 j J , 1 m M ( s ) , and s S satisfying

G j , m , s ( β ( S ) ) = l m , s ( β ) β j .

Note that the gradient vector l ( S ; β ( S ) ) β equals the J × 1 vector generated

by summing the rows of G ( β ( S ) ) . The diagonal entries of Σ MB ( β ( S ) ) and Σ RE ( β ( S ) ) are estimates of variances for the parameter estimates corresponding to the entries of β ( S ) , and so associated standard errors equal the square roots of these diagonal entries. These standard errors can be used to compute tests for zero individual model parameters to use in assessing specific theoretically important models. However, tests for parameters of adaptively generated models are typically significant as a consequence of the adaptive modeling process (as summarized in Section 2.5). Consequently, results for such tests are not reported for adaptive models generated in example analyses, but are reported for theory-based models.

2.4. Likelihood Cross-Validation

Partition the index set S into k > 1 disjoint sets S ( h ) , called folds, for h H = { h : 1 h k } . Note that all M ( s ) measurements w m , s for a given index s are allocated to the same fold. The LCV score for some model is defined as

LCV = h H ( L ( S ( h ) ; β ( S \ S ( h ) ) ) ) 1 / M ( S ) .

In other words, evaluate the likelihood for each fold S ( h ) using the parameter vector β ( S \ S ( h ) ) computed by maximizing the likelihood for the complement S \ S ( h ) of the fold, normalize it by the total number of measurements M ( S ) , and multiply these normalized deleted fold likelihoods together to get the LCV score.

When one or more slopes β j 0 for some j J 1 are generated as part of the estimation computations, reset l ( S ; β ) = Δ for some large value Δ such as 700 unless l ( S ; β ) is already smaller than Δ . For a model with parameter vector estimate β ( S ) having one or more slopes β j ( S ) 0 for some j J 1 , reset the LCV score to a very small value δ such as 10−12. These adjustments guarantee that the adaptive modeling process of Section 2.5 generates models with acceptable parameter vector estimates β ( S ) .

A larger LCV score indicates a better model, but not necessarily a distinctly (substantially, significantly) better model. A χ 2 -based LCV ratio test, analogous to a likelihood ratio test, can be used to decide if there is a distinct improvement or not in the LCV score. Following [7] , these tests are expressed in terms of the percent decrease (PD) in the LCV score for the model with the smaller score compared to the model with the larger score and a cutoff for a distinct PD, changing with the sample size. A PD larger than the cutoff indicates that the model with the larger score distinctly (substantially, significantly) improves on the model with the smaller score. Otherwise, the model with the smaller score is a competitive alternative and, if also simpler, it is a parsimonious, competitive alternative and so a preferable choice. Examples of LCV ratio tests are provided in Section 3.

2.5. Adaptive Model Selection

An effective choice of a fractional polynomial conditional hazard rate model based on a subset of the primary predictors t m , s and the coordinates of w m 1 , s and z m , s is identified adaptively using a heuristic search process controlled by tolerance parameters indicating how much of a change in the LCV score is allowable at each stage of the process. Starting from a base conditional hazard rate model, which is usually the constant model, power transforms and also geometric combinations if requested are systematically added to the model. Then, the expanded model is contracted, removing extraneous transforms if any and adjusting the powers of the remaining transforms. LCV ratio tests are used to decide whether to stop the contraction or continue removing transforms. Only a brief overview of the adaptive modeling process is provided here; details are provided in [11] . Note that geometric combinations generalize standard interactions and so provide for a nonlinear assessment of the concept of moderation [12] . Adaptive modeling applies to both independent and correlated outcomes in a variety of other regression contexts such as linear regression [7] , logistic regression [13] , Poisson regression [14] [15] , and discrete regression [16] .

Power transforms of time values t m , s can cause computational problems for the adaptive modeling process when some of those time values are large. For this reason, fractional polynomial conditional hazard rate models are computed using the normalized event time values t m , s = t m , s / t * and associated vectors w m , s computed using these normalized times where

t * max { t m , s : 1 m M ( s ) , s S }

is an upper bound on the observed time values. This generates estimates λ ( t m , s ; w m 1 , s , z m , s ) and S ( t m , s ; w m 1 , s , z m , s ) of the conditional hazard rate and the excess time probability function, respectively. Then, define the estimated conditional hazard rate and excess time probability function for the original time values t m , s as

λ ( t m , s ; w m 1 , s , z m , s ) = λ ( t m , s / t * ; w m 1 , s , z m , s )

and

S ( t m , s ; w m 1 , s , z m , s ) = S ( t m , s / t * ; w m 1 , s , z m , s )

for any t m , s , w m 1 , s , and z m , s . This adjustment guarantees that power transforms are computed using bounded time values within the interval (0, 1]. It is not needed when t * 1 to start with, but that is unlikely to hold except in rare cases.

2.6. Modeling Constraints

A variety of constraints are needed so that a fractional polynomial provides an appropriate conditional hazard rate model. Specifically, covariate values z i , m for 1 i r , 1 m M are restricted to be nonnegative (Section 2.2). Power transforms g ( t ; q ) have powers restricted to satisfy q > 1 (Section 2.2). Conditional hazard rates λ ( t m , s ; w m 1 , s , z m , s ) are adjusted to be positive-valued and their integrals Λ ( t m , s ; w m 1 , s , z m , s ) to be increasing in t m , s (as addressed in Section 2.3). Slope parameters β j for j J 1 corresponding to power transforms depending on time t are restricted to be positive-valued (as addressed in Section 2.4). Also, fractional polynomial conditional hazard rate models are computed using normalized event times (as addressed in Section 2.5).

3. Example Analyses

Example analyses are presented in this section demonstrating the use and applicability of conditional hazard regression modeling based on fractional polynomials. These analyses use data on recurrence of bladder cancer tumors as described in Section 3.1. Examples are provided of both theory-based modeling and adaptive modeling of these data. Analyses are conducted using SAS® Version 9.4. A SAS macro for conducting adaptive modeling in a variety of regression contexts including conditional hazard regression is available on request from the author.

3.1. Multiple Tumor Event Times for Bladder Cancer Patients

Data are provided in [5] on the recurrence of up to M = 4 tumors for n = 86 patients initially diagnosed with bladder cancer collected by [17] . Patients have from M ( s ) = 1 to M ( s ) = 4 possibly censored recurrence times for a total of M ( S ) = 179 time measurements. LCV scores are computed using k = 5 folds with fold sizes ranging from 15 (17.4%) to 20 (23.3%) patients and from 31 (17.3%) to 42 (23.5%) time measurements so that both fold sizes and fold complement sizes are not proportionately sparse. The cutoff for a distinct percent decrease (PD) in the LCV score for data with m ( S ) = 179 observations is 1.07%.

Patients have follow-up times t s max ranging from 0 to 64 months for s S . Observed tumor recurrence event times t s , m cum t s max have values increasing in When a patient has a total of 0 m 3 tumor recurrences with t m cum < t s max , a censored time t s , m + 1 cum = t s max is added to the measurements for this patient, and so then m ( s ) = m + 1 4 for that patient and the only possible censored event time for all patients is the last one t s , M ( s ) cum . In cases where patients have m = 4 total tumor recurrences and t s , m cum < t s max , a fifth censored time could be added but has not been to restrict to at most 4 event times (as in [5] ). One patient has follow-up time t s max = 0 with no observed tumor recurrences. Technically, this introduces a single censored event time t s , 1 cum = 0 , which is trivial (i.e., the associated likelihood has value 1), and so the follow-up time for this patient has been changed to t s max = 1 so that t s , 1 cum = 1 is non-trivial. Example analyses use the consecutive times between tumors given by the differences t s , m = t s , m cum t s , m 1 cum with t s , 0 cum = 0 . Censoring is unchanged.

Available covariates include treatment group, the initial number of tumors, and the initial tumor size. Patients are categorized into two treatment groups: 48 (55.8%) receiving a placebo treatment (group = 1) and 38 (44.2%) receiving thiotepa treatment (group = 2). Initial numbers of tumors range from 1 - 8 with 51 (59.3%), 11 (12.8%), 10 (11.6%), and 14 (16.3%) patients having 1, 2, 3, and 4-8, respectively. Initial tumor sizes range from 1 - 7 with 49 (57.0%), 10 (11.6%), 16 (18.6%), and 11 (12.8%) patients having 1, 2, 3, and 4-7, respectively. These covariates are time-invariant. They are denoted as group, number, and size in example analyses for brevity. The primary theory-based model for the data is the one based on these three covariates considered jointly.

Time-varying covariates are also considered in example analyses. These address possible dependence of the conditional hazard rate on event order and time. The dependence predictors include tumor order m with possible values 1 - 4, the current observed time t m , the prior observed time t m 1 with t 0 = 0 , and the prior cumulative time t m 1 cum (or, equivalently, the sum of the prior time values t 0 + + t m 1 ) with t 0 cum = 0 . Note that the last two of these predictors induce dependence of an order 1 autoregressive type. Dependence predictors of higher order autoregressive types may have value especially for data with relatively large maximum numbers of event times M, but these are not considered in example analyses for brevity.

Of the 179 tumor recurrence times, 86 (48.0%) occur at tumor order 1, 46 (25.7%) at tumor order 2, 27 (15.1%) at tumor order 3, and 20 (11.2%) at tumor order 4. Also, 67 (37.4%) of these recurrence times are censored.

The example analyses are actually conducted using times t m cum / t * normalized by t * = 60 and their differences. However, the unadjusted symbols t m cum and t m are used to denote times in the example analyses to reduce the complexity of the notation. The patient index s is dropped for the same reason.

3.2. Modeling the Conditional Hazard Rate

Existing methods [2] [3] [4] [5] for modeling multiple event times are extensions of Cox regression methods, and so ignore the general dependence on current and prior times except for how those times affect covariates. Thus, they restrict to estimating only such covariate effects. However, fractional polynomial conditional hazard rate models need to estimate dependence effects prior to conducting theory-based or adaptive modeling based on other covariates. This section provides example analyses addressing dependence for the bladder cancer event times using fractional polynomials.

To address the dependence of conditional hazard rates over time, consider models based on the following three time-varying predictors: tumor order m, the current time value t m , and the prior time value t m 1 . The adaptive additive model in these three dependence predictors is based on untransformed tumor order m (i.e., transformed by the power 1) without an intercept and with LCV score 1.07006. The associated moderation model in these three dependence predictors (i.e., allowing for geometric combinations) is based on the single power transform m 0. 99 without an intercept, almost the same model as the additive model.

An alternate approach for accounting for dependence would be the adaptive model based on the following three time-varying predictors: tumor order m, the current time value t m , and the prior cumulative time value t m 1 cum , that is, with the prior time value replaced by the prior cumulative time value. The adaptive additive model in these three dependence predictors is based on untransformed event order m without an intercept, the same model as generated using the prior time value. The associated moderation model in these three dependence predictors is based on

the single power transformed geometric combination ( m 1.5 ( t m 1 cum ) 0.2 t m 0.08 ) 1.2 with

an intercept and LCV score 1.07036. The associated additive model with LCV score 1.07006 has non-distinct PD 0.03% (i.e., less than the cutoff of 1.07% for a distinct PD), and so is preferable as a parsimonious, competitive alternative.

This preferable dependence model generates increasing conditional hazard rate estimates of 1.73, 3.47, 5.20, and 6.93 as tumor order m varies from 1 - 4. Estimated excess time probability curves are plotted in Figure 1. The excess time probability decreases with increased times between tumors and more quickly for larger tumor orders m. Also, observed ranges for times between tumors tend to be smaller for larger tumor orders, contributing to higher estimated conditional hazard rates and more quickly decreasing estimated excess time probability curves. This dependence model is used next to address theory-based effects to non-dependence covariates using fractional polynomial conditional hazard rate models.

3.3. Modeling Theory-Based Effects

Analyses of multiple event times using the methods of [2] [3] [4] are supported by SAS PROC PHREG (for proportional hazards regression). Results generated using SAS for these methods applied to the bladder cancer event times are reported in [18] and described in what follows. Results for the method of [5] applied

Figure 1. The estimated excess time probability as a function of time between bladder cancer tumors by tumor order.

to the bladder cancer event times are reported in that article.

The model of [2] is called the intensity model since it models the hazard rate. The joint intensity model in group, number, and size has associated p-values 0.022, <0.001, and 0.538, respectively, generated using model-based standard errors. These results support the conclusion of significant joint effects on the hazard rate due to group and number but not due to size. The estimate for group is negative indicating that thiotepa treatment reduces the hazard rate for times between bladder cancer tumors. The estimate for number is positive indicating that a larger number of initial tumors increases the hazard rate for times between bladder cancer tumors. The model of [3] is called the proportional means model. It has the same slope estimates as the intensity model but uses the robust empirical standard errors to test these slope estimates. The joint proportional means model in group, number, and size has associated p-values 0.075, 0.005, and 0.572, respectively, supporting a significant joint effect due to number (and also increasing in number as before), but not due to group and size.

The model of [4] is called a marginal Cox model since it treats the effects of group, number, and size as changing with tumor order m. It can be applied to times between tumors, and is then called a gap model, or alternately to cumulative times. It is more complex having different slopes for group, number, and size for each of the 4 values for tumor order m. Of these 12 slopes, only two estimates for the gap model are significant: the estimates for number for tumor orders 1 and 4 ( p = 0.002 and p = 0.034 , both with positive slope estimates). Results reported in [5] for their alternate marginal Cox model applied to the bladder cancer data are described as "providing some evidence, though not very strong, of thiotepa slowing down tumor recurrence" (they focus on only the effects to thiotepa over the 4 tumor orders controlling for number and size rather than on the effects for all three covariates in combination).

Results for the theory-based conditional hazard rate model based on fractional polynomials are reported in Table 1 including model-based and robust empirical tests for group, number, and size. Robust empirical test results are more conservative in the sense that standard errors are larger and so p-values are larger. However, for both of these two types of tests, estimates for group and number

Table 1. Assessing the theory-based model for times between bladder cancer tumors using conditional hazard rate modeling based on fractional polynomials.

SE—standard error; 1Group has values 1 (placebo) and 2 (thiotepa), number has values 1 - 8, and size has values 1 - 7. 2Tests based on the model controlling for all three covariates as well as for tumor order to address dependence over time.

are significant at p < 0.05 while the estimate for size is not significant. This holds while controlling jointly for all three of these covariates as well as for the dependence predictor untransformed tumor order m as identified in Section 3.1. As for the intensity model of [2] , the estimate for group is negative indicating that thiotepa treatment reduces the hazard rate for times between bladder cancer tumors and the estimate for number is positive indicating that a larger number of initial tumors increases the hazard rate for times between bladder cancer tumors.

The impact of these dependence and non-dependence covariates can be further assessed adaptively by contracting the full model in all 4 covariates to a parsimonious competitive alternative while restricting the contraction to leave the covariates untransformed. The generated model is based on untransformed effects to tumor order, group, and number with the estimate for group negative as before and the estimate for number positive as before. The inclusion of group and number in this adaptively reduced model means that the removal of each of these from the model generates a distinct PD in the LCV score, providing further support for conclusions about the significance of their joint effects on time between bladder cancer tumors.

The above result indicates that there is a distinct effect to group controlling for number. This raises the question of what effect group has by itself, not controlling for number and size, but still allowing for dependence on untransformed tumor order. The estimated slope for group in this model is negative as before, but it is non-significant for both the model-based ( p = 0.349 ) and the robust empirical ( p = 0.436 ) tests. This indicates that the conclusion of significance of the effect of group based on theory-based models requires controlling for number.

3.4. Adaptive Modeling Results

This section addresses adaptive modeling of times between bladder cancer tumors. These models assume that dependence is a function of only tumor order m as supported by the results of Section 3.1, but allows for its possible transformation rather than keeping it untransformed as in Sections 3.1-3.2. Section 3.3.1 addresses the impact of group by itself while Section 3.3.2 addresses the combined impact of group and number. Effects to size are not addressed for brevity. This seems reasonable given that size has weak effects for the theory-based models of Section 3.2.

3.4.1. Adaptive Modeling in Terms of Group

The adaptive additive model in tumor order m (to account for dependence) and group is the model based on untransformed tumor order by itself and does not depend on group. This is the same model reported in Section 3.1 with LCV score 1.07006. The associated adaptive moderation model in these two predictors is the model based on the two transformed geometric combinations ( group m 1.22 ) 0.4 and ( group m 7 ) 5.31 without an intercept and with LCV score 1.08477. The LCV score for the additive model generates a distinct PD of 1.36%, indicating that group distinctly moderates the effect of tumor order on time between bladder cancer tumors.

Estimated conditional hazard rates for this moderation model are depicted in Figure 2. For all 4 tumor orders, the conditional hazard rate for patients given thiotepa is lower than for patients given placebo treatment with the effect strongest for tumor order 2.

3.4.2. Adaptive Modeling in Terms of Group and Number

In order to generate models in number that are more readily interpretable, analyses in this section use min ( number , 4 ) , that is, number bounded to be no more than 4, combining the smaller categories 4 - 8 into one. The additive model in tumor order m, group, and min ( number , 4 ) is the model based on m 0.6 , ( min ( number , 4 ) ) 12 , and group with LCV score 1.11380. The associated moderation model in these three predictors is the model based on the two transformed geometric combinations

( group m 2 ( min ( number , 4 ) ) 0.6 ) 0.5

and

( group ( min ( number , 4 ) ) 19 m 7 ) 1

without an intercept and with LCV score 1.12023. The LCV score for the additive model generates a non-distinct PD of 0.57%, indicating that the additive model is preferable as a competitive alternative. It is also simpler since the effect of group is the same for all values of tumor order and of min ( number , 4 ) .

Estimated conditional hazard rates for patient on thiotepa treatment based on the above additive model are depicted in Figure 3. Estimated conditional hazard rates are increasing in tumor order with, for each tumor order, essentially the

Figure 2. The estimated conditional hazard rate as a function of tumor order and treatment group (placebo or thiotepa) for times between bladder cancer tumors.

same values for numbers 1 - 3 and much larger values for numbers 4 - 8 combined. In all cases, estimated conditional hazard rates are larger by 1.07 units for patients on placebo treatment (not plotted in Figure 3).

Figure 4 provides plots for excess time probability curves for number 1 compared to numbers 4 - 8, each by group for tumor order 1. For each group, the curves for numbers 4 - 8 decrease more quickly than the curves for number 1 (which is about the same as for number 2 and for number 3, but these are not plotted). For each of the number 1 and 4 - 8 cases, the curves for the placebo

Figure 3. The estimated conditional hazard rate as a function of tumor order and initial number for times between bladder cancer tumors for patients on thiotepa treatment; estimates for placebo treatment are all 1.07 units larger.

Figure 4. The estimated excess time probability as a function of time between bladder cancer tumors by initial number 1 and numbers 4-8, each by group for tumor order 1.

group decrease more quickly than the curves for the thiotepa group with a stronger thiotepa benefit for number 1 than for numbers 4 - 8.

Figure 5 provides plots for excess time probability curves for the same cases as in Figure 4 but for tumor order 2. The patterns are similar to those of Figure 4, but with curves decreasing more quickly than associated curves of Figure 4. Plots are not provided for tumor order 3 - 4, but in general curves tend to decrease more quickly for larger tumor orders, but observed data are also relatively sparse for tumor order 3 - 4.

4. Discussion

An adaptive approach is formulated for analyzing multiple event time data. Conditional hazard rates are modeled in terms of dependence on both time and covariates using fractional polynomials, that is, weighted sums of products of powers of time and other available predictors. Models are restricted so that conditional hazard rates have positive values and so that estimated excess time probability functions decrease with time for functions of the other predictors determined by the model. Maximum likelihood is used to estimate parameters adjusting for right censored event times.

Likelihood cross-validation (LCV) scores are used to compare models. LCV scores are normalized products of deleted likelihoods for k = 5 subsets, called folds, of the data computed using parameters estimated using the complements of those folds. A model with a larger LCV score is a better model, but may not be a distinctly better model. LCV ratio tests generalizing standard likelihood ratio tests are used to identify distinct differences in LCV scores. These tests are based on a cutoff for a distinct percent decrease in the LCV score.

Figure 5. The estimated excess time probability as a function of time between bladder cancer tumors by initial number 1 and numbers 4-8, each by group for tumor order 2.

Adaptive searches through alternative models are used to identify effective models for a given data set in the sense that the removal of any of the model’s terms generates a distinct decrease in the LCV score. This is accomplished in two stages: an expansion to grow the model followed by a contraction to prune ineffective terms if any from the expanded model. Geometric combinations equal to products of powers of multiple predictors can optionally be generated. These generalize standard interactions and so provide for a nonlinear assessment of moderation.

Conditional hazards regression is demonstrated using data consisting of up to four possibly censored tumor recurrence times for patients diagnosed with bladder cancer. Dependence is modeled for times between tumor recurrence in terms of tumor order, the current time to recurrence, the prior time to recurrence, and the prior cumulative time. Adaptive analyses indicate that dependence for these times between tumor recurrence is reasonably modeled in terms of tumor order by itself, indicating that adaptive modeling of the tumor recurrence data can reasonably base dependence solely on tumor order. Figure 1 provides plots for associated estimated excess time probability curves by tumor order. This dependence model is used to conduct theory-based modeling of the times between tumor recurrence.

The example analyses start with an assessment of theory-based modeling. The primary theory-based model for times between tumor recurrence is based on the joint covariate effects of treatment group (thiotepa versus placebo), initial number of tumor (1 - 8), and initial size of the tumors (1 - 7). Existing methods for analyzing these data provide conflicting conclusions for this theory-based model. The intensity model of [2] identifies significant effects to group and initial number using model-based tests for zero slopes. The proportional means model of [3] generates the same slope estimates as the intensity model but uses robust empirical tests which identify only a significant effect to initial number. The marginal Cox model of [4] assumes that the slopes for the three covariates change with the four tumor orders, thereby increasing the number of parameters four-fold and so possibly overfitting the data. This model only identifies significant effects to initial number for two of the four tumor orders. The alternate marginal Cox model of [5] focuses on the group effect controlling for the other effects and concludes that there is “some evidence, though not very strong” of a group effect. It is not clear which of these models should be used to reach an appropriate theory-based conclusion.

On the other hand, fractional polynomial modeling indicates that there are significant effects to group and initial number using both model-based and robust empirical tests (Table 1). This approach differs from the approaches of [2] and [3] by estimating the dependence as well as the covariate effects, which results in consistent conclusions about the group effect for both model-based and robust empirical tests. It differs from the marginal Cox approaches of [4] and [5] in using tumor order to model dependence rather than having all covariate effects change with tumor order, which apparently is more effective due to generating a more parsimonious model. All five approaches indicate that initial tumor size has no effect.

Further assessment of the theory-based model using fractional polynomial modeling indicates that there is a distinct effect to group, but only after controlling for the effect to initial number and not when considered by itself. Further adaptive modeling indicates that group considered by itself distinctly moderates the effect of tumor order on times between tumor recurrence. Figure 2 provides a comparison of estimated conditional hazard rates by tumor order and group for this moderation model.

Adaptive modeling also identifies combined effects to tumor order, group, and initial number of tumors, but with the latter covariate adjusted to combine the small categories of 4 - 8 numbers in order to provide for a more readily interpretable effect to initial number of tumors. The additive model in these predictors is preferable over the associated moderation model. Figure 3 displays the estimated conditional hazard rate over tumor order and initial number for the thiotepa group. The conditional hazard rate for each tumor order is almost the same for each of initial numbers 1 - 3 and then increases for initial numbers 4 - 8 combined and this pattern increases with increasing tumor order. In all cases, the conditional hazard rate for patients on placebo treatment is 1.07 units higher. Figure 4 & Figure 5 provide plots of estimated excess time probability curves generated by this model for tumor order 1 and 2, respectively. Plots are not provided for tumor order 3 and 4, but in general these curves tend to decrease more quickly for larger tumor order. However, observed data are relatively sparse for tumor order 3 and 4.

The example analyses are limited in only addressing recurrent event time data and not more general multiple event time data, but the methods have been formulated to handle more general cases. Censoring for these analyses can only occur for the last event for each participant. However, conditional hazards regression has been formulated in terms of previous time values whether they are censored or not and so generalizes to more complex censoring situations. Conditional hazards regression can also be applied to model clustered multiple event time data (as in [19] ) as long as the clusters can be assigned appropriate event orders indexes. Future work is needed to investigate these more complex multiple event time data issues.

In summary, conditional hazard regression modeling has been formulated to model dependence across multiple event times as well as the effects of covariates. It has also been demonstrated using both theory-based and adaptive analyses of times between tumor recurrence for bladder cancer patients. Results of example analyses indicate that adaptive conditional hazard rate modeling can generate useful insights into multiple event time data, but future work is needed to apply this method to more general sets of data.

Acknowledgements

Prior development of the SAS macro used to conduct reported example analyses was supported in part by grants R01 AI57043 from the National Institute of Allergy and Infectious Diseases and R03 MH086132 from the National Institute of Mental Health. Work reported on in this article was not supported by external funding.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Kalbfleisch, J.D. and Schaubel, D.E. (2023) Fifty Years of the Cox Model. Annual Review of Statistics and Its Application, 10, 1-23.
https://doi.org/10.1146/annurev-statistics-033021-014043
[2] Andersen, P.K. and Gill, R.D. (1982) Cox’s Regression Model for Counting Processes: A Large Sample Study. The Annals of Statistics, 10, 1100-1120.
https://doi.org/10.1214/aos/1176345976
[3] Lin, D.Y., Wei, L.J., Yang, I. and Ying, Z. (2000) Semiparametric Regression for the Mean and Rate Functions of Recurrent Events. Journal of the Royal Statistical Society Series B, 62, 711-730.
https://doi.org/10.1111/1467-9868.00259
[4] Prentice, R.L., Williams, B.J. and Peterson, A.V. (1981) On the Regression Analysis of Multivariate Failure Time Data. Biometrika, 68, 373-379.
https://doi.org/10.1093/biomet/68.2.373
[5] Wei, L.J., Lin, D.Y. and Weissfeld, L. (1989) Regression Analysis of Multivariate Incomplete Failure Time Data by Modeling Marginal Distributions. Journal of the American Statistical Association, 84, 1065-1073.
https://doi.org/10.1080/01621459.1989.10478873
[6] Royston, P. and Altman, D.G. (1994) Regression Using Fractional Polynomials of Continuous Covariates: Parsimonious Parametric Modeling. Applied Statistics, 43, 429-467.
https://doi.org/10.2307/2986270
[7] Knafl, G.J. (2018) Adaptive Fractional Polynomial Modeling. Open Journal of Statistics, 8, 159-186.
https://doi.org/10.4236/ojs.2018.81011
[8] Knafl, G.J. (2023) An Adaptive Approach for Hazard Regression Modeling. Open Journal of Statistics, 13, 300-315.
https://doi.org/10.4236/ojs.2023.133016
[9] Amorim, L. and Cai, J. (2014) Modelling Recurrent Events: A Tutorial for Analysis in Epidemiology. International Journal of Epidemiology, 44, 324-333.
https://doi.org/10.1093/ije/dyu222
[10] Royston, P. and Sauerbrei, W. (2008) Multivariable Model-Building: A Practical Approach to Regression Analysis Based on Fractional Polynomials for Modelling Continuous Variables. John Wiley & Sons, Hoboken.
https://doi.org/10.1002/9780470770771
[11] Knafl, G.J. and Ding, K. (2016) Adaptive Regression for Modeling Nonlinear Relationships. Springer International Publishing, Berlin.
[12] Baron, R.M. and Kenny, D.A. (1986) The Moderator-Mediator Variable Distinction in Social Psychology Research: Conceptual, Strategic, and Statistical Considerations. Journal of Personality & Social Psychology, 51, 1173-1182.
https://doi.org/10.1037/0022-3514.51.6.1173
[13] Knafl, G.J. (2018) A Reassessment of Birth Defects for Children of Participants of the Air Force Health Study. Open Journal of Epidemiology, 8, 187-200.
https://doi.org/10.4236/ojepi.2018.84015
[14] Knafl, G.J. and Meghani, S.H. (2021) Modeling Individual Patient Count/Rate Data over Time with Applications to Cancer Pain Flares and Cancer Pain Medication Usage. Open Journal of Statistics, 11, 633-654.
https://doi.org/10.4236/ojs.2021.115038
[15] Knafl, G.J. (2022) Adaptive Regression for Nonlinear Interrupted Time Series Analyses with Application to Birth Defects in Children of Vietnam War Veterans. Open Journal of Statistics, 12, 789-809.
https://doi.org/10.4236/ojs.2022.126045
[16] Knafl, G.J. and Meghani, S.H. (2021) Regression Modeling of Individual-Patient Correlated Discrete Outcomes with Applications to Cancer Pain Ratings. Open Journal of Statistics, 12, 456-485.
https://doi.org/10.4236/ojs.2022.124029
[17] Byar, D.P. (1980) The Veterans Administration Study of Chemoprophylaxis for Recurrent Stage I Bladder Tumors: Comparisons of Placebo, Pyridoxine, and Topical Thiotepa. In: Pavone-Macaluso, M., Smith, P.H. and Edsmyn, F., Eds., Bladder Tumors and Other Topics in Urological Oncology, Plenum, New York, 363-370.
https://doi.org/10.1007/978-1-4613-3030-1_74
[18] SAS Institute Inc. (2004) SAS/STAT 9.1 User’s Guide: Volume 5. SAS Institute Inc., Cary.
[19] Lee, E., Wei, L. and Amato, D. (1992) Cox-Type Regression Analyses for Large Numbers of Small Groups of Correlated Failure Time Observations. In: Klein, J.P. and Goel, P.K., Eds., Survival Analysis: State of the Art, Nato Science, Vol. 211, Springer, Dordrecht, 237-247.
https://doi.org/10.1007/978-94-015-7983-4_14

Copyright © 2024 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.