An Adaptive Approach for Hazard Regression Modeling

Abstract

Regression models for survival time data involve estimation of the hazard rate as a function of predictor variables and associated slope parameters. An adaptive approach is formulated for such hazard regression modeling. The hazard rate is modeled using fractional polynomials, that is, linear combinations of products of power transforms of time together with other available predictors. These fractional polynomial models are restricted to generating positive-valued hazard rates and decreasing survival functions. Exponentially distributed survival times are a special case. Parameters are estimated using maximum likelihood estimation allowing for right censored survival times. Models are evaluated and compared using likelihood cross-validation (LCV) scores. LCV scores and tolerance parameters are used to control an adaptive search through alternative fractional polynomial hazard rate models to identify effective models for the underlying survival time data. These methods are demonstrated using two different survival time data sets, one of survival times for lung cancer patients and one for multiple myeloma patients. For the lung cancer data, the hazard rate depends distinctly on time. However, controlling for cell type provides a distinct improvement, after which the hazard rate depends only on cell type and no longer on time. Furthermore, Cox regression is unable to identify a cell type effect. For the multiple myeloma data, the hazard rate also depends distinctly on time. Moreover, consideration of hemoglobin at diagnosis provides a distinct improvement, the hazard rate still depends distinctly on time, and hemoglobin distinctly moderates the effect of time on the hazard rate. These results indicate that adaptive hazard rate modeling can provide unique insights into survival time data.

Share and Cite:

Knafl, G. (2023) An Adaptive Approach for Hazard Regression Modeling. Open Journal of Statistics, 13, 300-315. doi: 10.4236/ojs.2023.133016.

1. Introduction

Cox regression, also called proportional hazard regression, is the commonly used approach for modeling survival time or time-to-event data [1]. The hazard function is assumed to equal the product of an unspecified function of time and the exponential of a linear function of available predictors, with estimation addressing only the slope parameters for the predictors while ignoring the underlying dependence of the hazard function on time. An important extension would be an approach that estimates the dependence on time as well as on the predictors. Kooperberg, Stone, and Truong [2] provide such an extension that generalizes standard proportional hazard regression by using splines to estimate the dependence on time. More recently, Qian and Peng [3] have proposed using quantile regression as an alternative for survival time modeling while Bouaziz and Nuel [4] have proposed the use of models based on piecewise constant hazard rates. See [5] for an overview of hazard rate modeling.

The objectives of this article are to formulate an adaptive approach for hazard regression modeling based on fractional polynomials [6] [7] and then to demonstrate this approach using two survival time data sets, one for lung cancer patients and one for multiple myeloma patients. This is a novel, original approach accounting for the dependence of the hazard rate on time while allowing for non-proportionality of the hazard rate as well as for nonlinearity in time and in other available predictors.

2. Methods

2.1. Survival Times

Let T > 0 denote a continuous random survival time variable with conditional cumulative distribution function F(t; z) = P(T ≤ t | Z = z) for time values t > 0 where Z is a random r × 1 vector with vector value z consisting of predictor variables Z_i with values z_i for 1 ≤ i ≤ r. T has conditional density function

f(t; z) = dF(t; z)/dt,

conditional survival function

S(t; z) = 1 − F(t; z) = P(T > t | Z = z),

and conditional hazard function

λ(t; z) = lim_{Δt→0} P(t < T ≤ t + Δt | T > t, Z = z)/Δt = f(t; z)/S(t; z).

Integration gives

Λ(t; z) = ∫_0^t λ(w; z) dw = −log(S(t; z))

so that S(t; z) = exp(−Λ(t; z)) and f(t; z) = λ(t; z)·exp(−Λ(t; z)). The exponential distribution generates the special case with the hazard function constant in time t so that λ(t; z) = λ(z) and S(t; z) = exp(−λ(z)·t).
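
A minimal numerical illustration of these relations for the exponential special case (a sketch only; the function names and the rate value are illustrative):

```python
import numpy as np

# Numerical check of the survival-time relations above for the exponential
# special case lambda(t; z) = lambda(z); names and values are illustrative.
def hazard(t, lam=2.0):
    return np.full_like(t, lam, dtype=float)     # constant hazard rate

def cum_hazard(t, lam=2.0):
    return lam * t                               # Lambda(t) = integral of lambda

def survival(t, lam=2.0):
    return np.exp(-cum_hazard(t, lam))           # S(t) = exp(-Lambda(t))

def density(t, lam=2.0):
    return hazard(t, lam) * survival(t, lam)     # f(t) = lambda(t) exp(-Lambda(t))

t = np.linspace(0.01, 3.0, 300)
# f(t)/S(t) recovers the hazard rate, matching the definition of lambda(t; z).
assert np.allclose(density(t) / survival(t), hazard(t))
```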

2.2. Fractional Polynomial Hazard Rate Modeling

Let u be a primary predictor and q a real-valued power. As in [7] , define the associated general power transform g ( u ; q ) as

g(u; q) = cos(πq)·|u|^q for u < 0, g(u; q) = 0 for u = 0, and g(u; q) = u^q for u > 0

where π is the usual constant. Note that when u = t, only the case u > 0 can hold. Predictors z_i, for 1 ≤ i ≤ r, are typically nonnegative, and so assume this holds so that the case u < 0 is not needed and g(u; q) is always nonnegative. Indicator predictors u with values 0 or 1 are untransformed (i.e., q = 1).
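
A direct rendering of this power transform (a minimal sketch; the function name is illustrative and not taken from the SAS macro used for the reported analyses):

```python
import numpy as np

# General power transform g(u; q) as defined above.
def g(u, q):
    u = np.asarray(u, dtype=float)
    out = np.zeros_like(u)                        # g(0; q) = 0 by definition
    pos, neg = u > 0, u < 0
    out[pos] = u[pos] ** q
    out[neg] = np.cos(np.pi * q) * np.abs(u[neg]) ** q
    return out

# For survival times only the u > 0 branch applies, e.g. g(t; -0.5) = 1/sqrt(t).
print(g([0.25, 1.0, 4.0], -0.5))                  # [2.0, 1.0, 0.5]
```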

Let the hazard function λ(t; z) be a general fractional polynomial in the primary predictors t and the coordinates z_i of z, that is, a linear combination of products of power transforms of those primary predictors. Specifically, let x(t; z) be a p × 1 vector of power transforms x_j(t; z) = g(u_j; q_j) for 1 ≤ j ≤ p where u_j is either t, z_i for 1 ≤ i ≤ r, or a geometric combination, that is, a product of power transforms of some subset of t and z_i for 1 ≤ i ≤ r. Note that geometric combinations generalize standard interactions and so provide for a nonlinear assessment of the concept of moderation [8].

The fractional polynomial hazard rate is defined as

λ(t; z) = x^T(t; z)·β

where x^T denotes the transpose of x while β is a p × 1 vector of slope parameters β_j for 1 ≤ j ≤ p. An intercept parameter can be included in the model using the unit transform as a primary predictor. This definition requires that λ(t; z) be constrained to take on positive values as addressed later. This restriction could be resolved by modeling instead the natural log of λ(t; z) as a linear function in β (as in [2] and [9]), but then the integral Λ(t; z) cannot always be expressed in closed form. The advantage of the above definition is that power transforms g(u_j; q_j) are simply integrated with respect to t, which can also speed up computation of estimates for the parameter vector β. The possible forms for x_j(t; z) are a power transform of t including the unit transform g(t; 0) ≡ 1, a power transform of z_i for 1 ≤ i ≤ r, a power-transformed geometric combination in t together with one or more z_i for 1 ≤ i ≤ r, and a power-transformed geometric combination in two or more z_i for 1 ≤ i ≤ r not including t. Consequently, x_j(t; z) can be expressed as the product of a function of t and a function of z, that is,

x_j(t; z) = x_{j,1}(t)·x_{j,2}(z) = g(t; q_j)·x_{j,2}(z)

where x_{j,2}(z) is either the unit transform or depends on some subset of the z_i for 1 ≤ i ≤ r but not on t. When q_j = 0 for all j, none of the x_j(t; z) = x_j(z) depend on t, and so the fractional polynomial model is the same as the model generated by the exponential distribution with λ(t; z) = λ(z) = x^T(z)·β. Also, x_{j,2}(z) equals the unit transform when x_j(t; z) = x_{j,1}(t) depends only on t. The power q_j is either the power for a single power transform of t or the product of a power q for t and a second power q′ transforming a geometric combination containing t^q (so that q_j = q·q′). In general, x_{j,2}(z) can contain power transforms of the z_i for 1 ≤ i ≤ r.

Integration gives

Λ(t; z) = h^T(t; z)·β

where h(t; z) is the p × 1 vector with entries

h_j(t; z) = ∫_0^t x_j(w; z) dw = (∫_0^t g(w; q_j) dw)·x_{j,2}(z).

Assume that q_j > −1, so that

h_j(t; z) = (g(t; q_j + 1)/(q_j + 1))·x_{j,2}(z)

with g(t; q_j + 1)/(q_j + 1) a positive function of time t. The second term x_{j,2}(z) is nonnegative, but not identically 0, for example, when z consists of a single indicator predictor z. By definition, S(t; z) is a decreasing function of t for each unique possible value of the vector z, and so Λ(t; z) must be an increasing function of t for the functions of z determined by the fractional polynomial model.
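
A small sketch (with made-up powers and slopes) of a hazard rate built from two power transforms of normalized time, together with its closed-form cumulative hazard and the resulting survival function:

```python
import numpy as np

# Illustrative hazard rate lambda(t) = b1*t^q1 + b2*t^q2 on normalized time in (0, 1],
# with each power q_j > -1 so the cumulative hazard integrates in closed form.
q = np.array([-0.5, 2.0])            # powers q_j (assumed values)
beta = np.array([1.5, 3.0])          # positive slopes beta_j (assumed values)

def hazard(t):
    return sum(b * t**qj for b, qj in zip(beta, q))

def cum_hazard(t):
    # Lambda(t) = sum_j beta_j * t^(q_j + 1) / (q_j + 1)
    return sum(b * t**(qj + 1) / (qj + 1) for b, qj in zip(beta, q))

def survival(t):
    return np.exp(-cum_hazard(t))

t = np.linspace(0.01, 1.0, 100)
# With positive slopes, Lambda(t) is increasing and S(t) is decreasing in t.
assert np.all(np.diff(cum_hazard(t)) > 0) and np.all(np.diff(survival(t)) < 0)
```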

Let J = {j : 1 ≤ j ≤ p}, J_1 = {j : 1 ≤ j ≤ p, q_j ≠ 0}, and J_2 = J \ J_1. Partition x(t; z) into x_1(t; z) with entries x_j(t; z) for j ∈ J_1 and associated slope parameter vector β_1 as well as x_2(z) with entries x_j(z) for j ∈ J_2 depending only on z and not on t with associated slope parameter vector β_2. The hazard function λ(t; z) then satisfies

λ(t; z) = x^T(t; z)·β = x_1^T(t; z)·β_1 + x_2^T(z)·β_2

so that

λ(t; z) = Σ_{j∈J_1} g(t; q_j)·x_{j,2}(z)·β_j + Σ_{j∈J_2} x_{j,2}(z)·β_j.

Assume for now that J 2 is empty. For models based on a single transform of t,

λ(t; z) = λ(t) = g(t; q_1)·β_1

can be guaranteed to be positive by restricting the associated slope β_1 to be positive-valued. For models based on multiple transforms of t but not on z,

λ(t; z) = λ(t) = Σ_{j∈J_1} g(t; q_j)·β_j

can generate complicated integrals Λ(t; z) = Λ(t) that are not increasing in t in general. Restricting all the associated slopes β_j for j ∈ J_1 to be positive-valued is a straightforward way to guarantee that Λ(t) is increasing in t. More general models with

Σ_{j∈J_1} g(t; q_j)·x_{j,2}(z)·β_j

generate integrals Λ(t; z) that need to be increasing in t for each unique choice of the terms x_{j,2}(z). This can be guaranteed in a straightforward way once again by restricting all the associated slopes β_j for j ∈ J_1 to be positive-valued.

On the other hand, when J_2 is nonempty, the assumption of all positive-valued slope parameters would be too restrictive. For example, consider a model based on two primary predictors: t and an indicator variable z for membership in one of two groups of study participants, with additive effects in t and z. The term

x_1^T(t; z)·β_1 = x_1^T(t)·β_1

models the dependence of the hazard function on t in terms of p − 1 parameters while x_2^T(z)·β_2 = z·β_p provides for a shift in the hazard function for the group coded as 1 compared to the hazard function for the group coded as 0. In general, this shift can be in the negative or the positive direction, and so the slope β_p needs to be allowed to be negative. However, this means that x^T(t; z)·β and h^T(t; z)·β can sometimes take on negative or zero values. See Section 2.3 for how to handle such cases.

Hazard ratios are naturally generated for Cox regression models based on proportional hazard rates. Hazard ratios are not needed to assess a fractional polynomial model but can be generated if desired. For example, consider the simple fractional polynomial model λ(t; z) based on time t and a single indicator z for one of two groups. The associated hazard ratio is given by λ(t; 1)/λ(t; 0), that is, the hazard rate for z = 1 divided by the hazard rate for z = 0. Note that in general this is a function of time t.
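
A small sketch (with made-up powers and slopes) of such a time-varying hazard ratio for a model additive in a single time transform and a group indicator:

```python
import numpy as np

# Illustrative two-group model lambda(t; z) = b1*t^q1 + b2*z on normalized time;
# the slope b2 on the indicator may be negative, as discussed above.
b1, q1, b2 = 2.0, -0.3, -0.8          # assumed values

def hazard(t, z):
    return b1 * t**q1 + b2 * z

t = np.linspace(0.05, 1.0, 20)
hazard_ratio = hazard(t, 1) / hazard(t, 0)
print(np.round(hazard_ratio, 3))      # varies with t, unlike a proportional hazards model
```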

2.3. Maximum Likelihood Estimation

Observed survival times can be of two types: an actual survival time or a right censored survival time with the actual survival time only known to be larger than the observed right censored time. Thus, the observations have the form O_s = {t_s, c_s, z_s} for s ∈ S = {s : 1 ≤ s ≤ n} where t_s > 0 is a time value, c_s is an indicator for whether t_s is right censored (i.e., c_s = 1) or an actual, uncensored survival time (i.e., c_s = 0), and z_s is a possibly trivial r × 1 predictor vector with values z_{s,i} for 1 ≤ i ≤ r. Note that all observations can be uncensored, but not all observations should be censored.

Let x(t_s; z_s) be the associated observed vectors determined by some fractional polynomial model. For each s ∈ S, the likelihood term L(O_s; β) for observation O_s satisfies

L(O_s; β) = f(t_s; z_s) if c_s = 0, and L(O_s; β) = S(t_s; z_s) if c_s = 1,

so that the associated log-likelihood term satisfies

l(O_s; β) = log(L(O_s; β)) = (1 − c_s)·log(λ(t_s; z_s)) − Λ(t_s; z_s).

The likelihood L(S; β) equals the product over s ∈ S of the likelihood terms L(O_s; β) with associated log-likelihood

l(S; β) = log(L(S; β)) = Σ_{s∈S} l(O_s; β).

The maximum likelihood estimate β(S) of β maximizes the log-likelihood l(S; β) over β, which is achieved by solving the estimating equations

∂l(S; β)/∂β = 0.

For simplicity of notation, parameter estimates β ( S ) are denoted as functions of the index set S for the observed data used in their computation without hat (^) symbols.

The gradient vector has entries satisfying

∂l(S; β)/∂β_j = Σ_{s∈S} ∂l(O_s; β)/∂β_j = Σ_{s∈S} ((1 − c_s)·x_j(t_s; z_s)/(x^T(t_s; z_s)·β) − h_j(t_s; z_s))

for j ∈ J. The Hessian matrix has entries satisfying

∂²l(S; β)/(∂β_j ∂β_j′) = Σ_{s∈S} ∂²l(O_s; β)/(∂β_j ∂β_j′) = −Σ_{s∈S} (1 − c_s)·x_j(t_s; z_s)·x_j′(t_s; z_s)/(x^T(t_s; z_s)·β)²

for j, j′ ∈ J. The gradient vector and Hessian matrix can be used in a Newton-Raphson algorithm to compute the maximum likelihood estimate β(S).

When x^T(t_s; z_s)·β ≤ 0 or h^T(t_s; z_s)·β ≤ 0 for some s, set λ(t_s; z_s) = δ for some small positive value δ like 0.00001 and set Λ(t_s; z_s) = δ·t_s. However, leave the gradient vector and Hessian matrix unchanged so that maximum likelihood estimation uses actual derivatives with respect to the parameter vector β.
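
A minimal Newton-Raphson sketch based on the gradient and Hessian formulas above (illustrative only; it omits step-halving, convergence checks, and the slope constraints of Section 2.2, and for simplicity applies the small-δ floor inside the derivatives as well):

```python
import numpy as np

# x[s, j] = x_j(t_s; z_s), h[s, j] = h_j(t_s; z_s), c[s] = right-censoring indicator.
def newton_raphson(x, h, c, beta, n_iter=25, delta=1e-5):
    for _ in range(n_iter):
        lam = x @ beta                        # lambda(t_s; z_s) = x^T(t_s; z_s) beta
        lam = np.where(lam <= 0, delta, lam)  # floor nonpositive values at delta
        grad = ((1 - c) / lam) @ x - h.sum(axis=0)
        hess = -(x * ((1 - c) / lam**2)[:, None]).T @ x
        beta = beta - np.linalg.solve(hess, grad)
    return beta
```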

The estimated covariance matrix for the parameter estimate vector β(S) equals the inverse of the matrix with entries −∂²l(S; β(S))/(∂β_j ∂β_j′). Variances for the model parameters are given by the diagonal entries of this covariance matrix. These can be used to compute z tests of zero values for individual model parameters when assessing specific theoretically important models. However, tests for parameters of adaptively generated models are typically significant as a consequence of the adaptive modeling process (as summarized in Section 2.5). Consequently, results for such tests are not reported for models generated in example analyses.

Models based on a single transform with p = 1 can be computed directly. In these cases, β = (β_1) and the gradient is a scalar value satisfying

∂l(S; β)/∂β_1 = Σ_{s∈S} ((1 − c_s)/β_1 − h_1(t_s; z_s))

so that

β_1(S) = (n − Σ_{s∈S} c_s) / Σ_{s∈S} h_1(t_s; z_s).
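
A small sketch of this closed-form estimate for the constant hazard model x_1 = g(t; 0) ≡ 1, for which h_1(t_s; z_s) = t_s (the data values below are made up):

```python
import numpy as np

# Single-transform (p = 1) estimate: beta_1 = (number of uncensored times) / sum of h_1.
t = np.array([0.2, 0.5, 0.7, 0.9, 1.0])   # normalized survival times (assumed)
c = np.array([0, 0, 1, 0, 1])             # 1 = right censored

h1 = t                                     # integral of the unit transform g(t; 0)
beta1 = (len(t) - c.sum()) / h1.sum()
print(beta1)                               # estimated constant hazard rate
```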

2.4. Likelihood Cross-Validation

Partition the index set S into k > 1 disjoint sets S(h), called folds, for h ∈ H = {h : 1 ≤ h ≤ k}. The LCV score for a given model is defined as

LCV = Π_{h∈H} (L(S(h); β(S \ S(h))))^{1/n}.

In other words, evaluate the likelihood for each fold S(h) using the parameter vector β(S \ S(h)) computed by maximizing the likelihood for the complement S \ S(h) of the fold, raise it to the power 1/n to normalize for the sample size, and multiply these normalized deleted fold likelihoods together to get the LCV score.
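
A minimal sketch of this computation (illustrative only; fit and loglik stand in for the maximum likelihood estimation and hazard-model log-likelihood of Section 2.3):

```python
import numpy as np

# k-fold LCV score: product over folds of deleted-fold likelihoods, each raised to 1/n,
# computed here on the log scale for numerical stability.
def lcv_score(folds, fit, loglik, data):
    n = sum(len(fold) for fold in folds)
    total = 0.0
    for fold in folds:
        complement = [s for f in folds if f is not fold for s in f]
        beta = fit(data, complement)          # estimate on the fold complement
        total += loglik(data, fold, beta)     # deleted-fold log-likelihood
    return np.exp(total / n)
```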

When one or more slopes β_j ≤ 0 for some j ∈ J_1 are generated as part of the estimation computations, reset l(S; β) = −Δ for some large value Δ such as 700 unless l(S; β) is already smaller than −Δ. For a model with parameter vector estimate β(S) having one or more slopes β_j(S) ≤ 0 for some j ∈ J_1, reset the LCV score to a very small value δ such as 10^−12. These adjustments guarantee that the adaptive modeling process of Section 2.5 generates models with acceptable parameter vector estimates β(S).

A larger LCV score indicates a better model, but not necessarily a distinctly (substantially, significantly) better model. A χ²-based LCV ratio test, analogous to a likelihood ratio test, can be used to decide whether there is a distinct improvement in the LCV score. Following [7], these tests are expressed in terms of the percent decrease (PD) in the LCV score for the model with the smaller score compared to the model with the larger score and a cutoff for a distinct PD that changes with the sample size. A PD larger than the cutoff indicates that the model with the larger score distinctly (substantially, significantly) improves on the model with the smaller score. Otherwise, the model with the smaller score is a competitive alternative and, if also simpler, it is a parsimonious, competitive alternative and so a preferable choice. Examples of LCV ratio tests are provided in Section 3.
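
The exact cutoff definition follows [7] and is not restated in this article; as a hedged sketch, a χ²-based cutoff of the form 1 − exp(−χ²(1; 0.95)/(2·n)) reproduces the cutoffs reported in Section 3:

```python
import numpy as np
from scipy.stats import chi2

# Candidate cutoff for a distinct percent decrease (PD) in the LCV score,
# assumed here to follow [7]; the precise definition is not restated in this article.
def pd_cutoff(n):
    return 1 - np.exp(-chi2.ppf(0.95, df=1) / (2 * n))

print(round(100 * pd_cutoff(137), 2))   # 1.39 (%), as reported in Section 3.1
print(round(100 * pd_cutoff(65), 2))    # 2.91 (%), as reported in Section 3.2
```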

2.5. Adaptive Model Selection

An effective choice of a fractional polynomial hazard rate model based on a subset of the primary predictors t and the coordinates of z is identified adaptively using a heuristic search process controlled by tolerance parameters indicating how much of a change in the LCV score is allowable at each stage of the process. Starting from a base hazard rate model, which is usually the constant model, power transforms, and also geometric combinations if requested, are systematically added to the model. Then, the expanded model is contracted, removing extraneous transforms if any and adjusting the powers of the remaining transforms. LCV ratio tests are used to decide whether to stop the contraction or continue removing transforms. Only a brief overview of the adaptive modeling process is provided here; details are provided in [10]. Adaptive modeling applies to both independent and correlated outcomes in a variety of regression contexts such as linear regression [7], logistic regression [11], Poisson regression [12] [13], and discrete regression [14].
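
A highly simplified sketch of this kind of expansion/contraction search (illustrative only; the actual process of [10] also adjusts powers, handles geometric combinations, and uses tolerance parameters at each stage, all omitted here):

```python
# model is a list of transforms; fit_and_score returns an LCV score for a model.
def adaptive_search(base_model, candidates, fit_and_score, pd_cutoff):
    model, score = list(base_model), fit_and_score(base_model)
    # Expansion: repeatedly add the single transform that most improves the LCV score.
    while candidates:
        trials = [(fit_and_score(model + [tr]), tr) for tr in candidates]
        best_score, best_tr = max(trials, key=lambda pair: pair[0])
        if best_score <= score:
            break
        model, score = model + [best_tr], best_score
        candidates = [tr for tr in candidates if tr != best_tr]
    # Contraction: remove transforms while the percent decrease (PD) in the LCV score
    # stays below the cutoff; stop once a removal would distinctly worsen the model.
    while len(model) > 1:
        trials = [(fit_and_score([t for t in model if t != tr]), tr) for tr in model]
        best_score, drop_tr = max(trials, key=lambda pair: pair[0])
        if (score - best_score) / score > pd_cutoff:
            break
        model, score = [t for t in model if t != drop_tr], best_score
    return model, score
```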

Power transforms of general time variables T can cause computational problems for the adaptive modeling process when some time values are large. For this reason, fractional polynomial hazard rate models are computed using the normalized survival time variable T′ = T/t* with observed values t′_s = t_s/t* where

t* = max{t_s : s ∈ S}.

This generates estimates λ′(t′; z) and S′(t′; z) of the hazard rate and survival function, respectively, on the normalized time scale. Then, define the estimated hazard rate and survival function for the original survival time variable T as

λ(t; z) = λ′(t/t*; z) and S(t; z) = S′(t/t*; z)

for any t and z. This adjustment guarantees that power transforms are computed using bounded time values within the interval (0, 1]. It is not needed when t* ≤ 1 to start with, but that is unlikely to hold.
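
A small sketch of this normalization using survival time values mentioned in Section 3.1:

```python
import numpy as np

# Normalize survival times to (0, 1] before computing power transforms.
t = np.array([1.0, 587.0, 991.0, 999.0])   # selected times from the lung cancer data
t_star = t.max()
t_norm = t / t_star                         # values in (0, 1]
print(t_norm, (t_norm ** 4).max())          # compare with 999**4, about 1e12, on the raw scale
```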

2.6. Modeling Constraints

A variety of constraints are needed so that a fractional polynomial provides an appropriate hazard rate model. Specifically, predictors z_i for 1 ≤ i ≤ r are restricted to be nonnegative. Power transforms g(t; q) have powers restricted to satisfy q > −1. Estimates of hazard rates λ(t; z) and their integrals Λ(t; z) are adjusted to be positive-valued (as addressed in Section 2.3). Slope parameters β_j for j ∈ J_1 corresponding to predictors depending on time t are restricted to be positive-valued (as addressed in Section 2.4). Also, fractional polynomial hazard rate models are computed using the normalized survival times (as addressed in Section 2.5).

3. Example Analyses

Example analyses are presented in this section demonstrating the use and applicability of adaptive hazard regression modeling based on fractional polynomials. Two survival time data sets are used in these analyses: survival times for lung cancer patients in Section 3.1 and for multiple myeloma patients in Section 3.2.

All analyses are conducted using SAS® Version 9.4. A SAS macro for conducting adaptive modeling in a variety of regression contexts including hazard regression is available on request from the author.

3.1. Example Analyses of Survival Times for Lung Cancer Patients

Data are provided in [15] consisting of n = 137 survival times for lung cancer patients, ranging from 1 to 999 days with 9 (6.6%) right censored. The two largest survival times of 991 and 999 days are much larger than the next largest survival time of 587 days and so have the potential for being highly influential. Note that since t* = 999 is of the order of 10^3, even moderate-sized powers can generate very large transformed time values. For example, the power q = 4 generates some transformed time values of the order of 10^12, indicating the need for analyzing normalized time values. There are several available predictors other than time, but the example analyses only address cancer cell type with 27 (19.7%), 27 (19.7%), 48 (35.0%), and 35 (25.6%) patients having adeno, large, small, and squamous cell types, respectively.

The cutoff for a distinct percent decrease (PD) in the LCV score for data with n = 137 observations is 1.39%. All analyses use k = 5 folds with fold sizes ranging from 19 (13.9%) to 32 (23.4%) patients so that both fold sizes and fold complement sizes are not proportionately sparse.

The adaptively generated hazard rate model in time alone is based on the single time transform t^−0.16 without an intercept and with LCV score 2.68420. In contrast, the constant hazard rate model has LCV score 2.61568 with distinct PD 2.55% (i.e., greater than the cutoff of 1.39% for a distinct PD). Consequently, the hazard rate is distinctly nonconstant in time. Figure 1 provides the plot for the estimated hazard rate as a function of time. The hazard rate decreases nonlinearly from 15.1 at 1 day to 5.0 at 999 days.
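
As a quick numerical check of this single-transform fit (a hedged illustration using only the endpoint hazard rates reported above; the estimated slope itself is not reported here):

```python
# For a hazard rate proportional to (t/999)**(-0.16), the ratio of the hazard at
# day 1 to the hazard at day 999 depends only on the power.
print(999 ** 0.16)   # about 3.02
print(15.1 / 5.0)    # about 3.02, matching the reported endpoint hazard rates
```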

The impact of cell type on the hazard rate is assessed using the adaptive model based on time, indicators for each of the four cell types, and geometric combinations in these five primary predictors. The generated model is based on indicators for having the squamous and large cell types with an intercept and no transforms of time. The LCV score is 2.92430. In contrast, the LCV score of 2.68420 for the model based on time alone generates a distinct PD of 8.21%. Consequently, consideration of cell type provides a distinct improvement.

Figure 1. The estimated hazard rate as a function of time for lung cancer patients.

Furthermore, the hazard rate is reasonably considered to no longer depend on time after controlling for cell type, and so this supports the use of the exponential distribution for modeling lung cancer survival times. Under the cell type model, the estimated hazard rate is 13.7 for patients with adeno or small cell types, decreases to 5.8 for patients with the large cell type, and then decreases further to 4.4 for patients with the squamous cell type. Figure 2 provides the plot of the estimated survival probability as a function of time. The estimated probability of survival decreases over time with the value at any given time smallest (worst) for patients with adeno or small cell types, larger for patients with the large cell type, and largest for patients with the squamous cell type.

The standard Cox regression model for survival, with cell type as a one-way analysis of variance factor, has a non-significant F test with p = 0.57. Under the reduced model based on effects for the large and squamous cell types, as generated adaptively, both effects are non-significant (p = 0.17 for large and p = 0.71 for squamous cell types). These results indicate that Cox regression can in some cases be unable to identify effects on survival that are identifiable with adaptive hazard regression modeling, in this case using the special case based on the exponential distribution.
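
The Cox regression comparison above was conducted in SAS; as a rough analogue only (the lifelines package, the file name, and the column names below are assumptions, not the author's code), a similar check in Python might look like:

```python
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical file holding the [15] lung cancer data with columns time, censored, cell_type.
df = pd.read_csv("va_lung.csv")
df["event"] = 1 - df["censored"]                 # lifelines expects 1 = event occurred
X = pd.get_dummies(df["cell_type"], drop_first=True).astype(float)
X[["time", "event"]] = df[["time", "event"]]

cph = CoxPHFitter()
cph.fit(X, duration_col="time", event_col="event")
cph.print_summary()                              # Wald tests for the cell type effects
```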

3.2. Example Analyses of Survival Times for Multiple Myeloma Patients

Data are provided in [16] consisting of n = 65 survival times for multiple myeloma patients, ranging from 1.25 to 92 days with 17 (26.2%) right censored. There are several available predictors other than time, but the example analyses only address hemoglobin at diagnosis. Hemoglobin values range from 4.9 to 14.6 g/dL.

Figure 2. The estimated probability of survival as a function of time by cell type for lung cancer patients.

The cutoff for a distinct percent decrease (PD) in the LCV score for data with n = 65 observations is 2.91%. All analyses use k = 5 folds with fold sizes ranging from 8 (12.3%) to 16 (24.6%) patients so that both fold sizes and fold complement sizes are not proportionately sparse.

The adaptively generated hazard rate model in time alone is based on the single time transform t^26 with an intercept and with LCV score 1.07551. In contrast, the constant hazard rate model has LCV score 1.02430 with distinct PD 4.76% (i.e., greater than the cutoff of 2.91% for a distinct PD). Consequently, the hazard rate is distinctly nonconstant in time. Figure 3 provides the plot for the estimated hazard rate as a function of time. The hazard rate is essentially constant at 2.7 from 1.25 to 67 days and then increases very quickly to 44.2 at 92 days.

The adaptively generated model additive in time and hemoglobin (i.e., not considering geometric combinations) is based on t^26 and hemoglobin^−0.9 without an intercept. The LCV score is 1.11557. In contrast, the LCV score of 1.07551 for the time alone model generates a distinct PD 3.59%. Consequently, consideration of hemoglobin provides a distinct improvement. Furthermore, the adaptively generated model in time, hemoglobin, and geometric combinations is based on hemoglobin^−0.9 and

(t^26·hemoglobin^3)^3.4 = t^88.4·hemoglobin^10.2

without an intercept. The LCV score is 1.14286. In contrast, the LCV score for the additive model generates a distinct PD 2.39%. Consequently, hemoglobin distinctly moderates (or modifies) the effect of time on the hazard rate.

Figure 3. The estimated hazard rate as a function of time for multiple myeloma patients.

Interpretation of the model based on hemoglobin can be complicated. A more readily interpretable adaptive model is the one with the continuous hemoglobin predictor replaced by the ordinal hemoglobin quintile predictor (with values 1 - 5 corresponding to consecutive quintiles). The adaptively generated moderation model using hemoglobin quintile (HG_quintile) instead of hemoglobin is based on HG_quintile^2.5,

(t^26·HG_quintile^7)^3.2 = t^83.2·HG_quintile^22.4,

(t^11·HG_quintile)^6 = t^66·HG_quintile^6,

and an intercept. The LCV score is 1.21511. In contrast, the LCV score of 1.14286 for the moderation model in unadjusted hemoglobin generates a distinct PD 5.95%. Consequently, consideration of hemoglobin quintiles not only provides a more interpretable model but also provides a distinct improvement.

Table 1 contains ranges for exact quintiles, each consisting of 13 (20%) patients, as well as estimated hazard rates within these quintiles. The estimated hazard rate decreases for larger quintiles and is constant in time except at the largest time value within four of the five quintiles (all but quintile 2). Figure 4 contains the plot of the survival function by quintile. The estimated probability of survival decreases over time with the value at any given time increasing (better) for patients within quintiles 1 - 5, but more so for quintiles 4 - 5.

Figure 5 contains the plot of the hazard ratio for quintile 5 compared to quintile 1 as a function of time. The hazard ratio is constant at 0.31 up to about day 58 then decreases quickly to 0 by about day 68 and stays there to day 92. However, patients in quintile 1 have observed survival times of at most 66 days (Table 1), and so hazard rates based on quintile 1 after that day and associated hazard ratios may be of questionable value.

Figure 4. The estimated probability of survival as a function of time by quintiles 1 - 5 for hemoglobin at diagnosis for multiple myeloma patients.

Table 1. Estimated hazard rates over time by hemoglobin quintiles for multiple myeloma patients.

HG = hemoglobin. ¹All quintiles contain exactly 13 (20%) of the 65 patients.

Figure 5. The estimated hazard ratio as a function of time for the fifth quintile of hemoglobin at diagnosis compared to the first quintile for multiple myeloma patients.

4. Discussion

An adaptive approach is formulated for hazard regression modeling. Hazard rates are based on fractional polynomials, that is, linear combinations of products of powers of time and other available predictors. Models are restricted so that hazard rate estimates are positive-valued and so that estimated survival functions are decreasing in time for appropriate functions of the non-time predictors. Models are estimated using maximum likelihood estimation accounting for the possibility of right censored survival times.

Models are compared and evaluated using k-fold likelihood cross-validation (LCV) scores, that is, normalized products of deleted likelihoods for k subsets of the data, called folds, computed using parameters estimated using the complements of those folds. Larger LCV scores indicate better models, but not necessarily distinctly better models. Distinct differences in LCV scores are assessed using LCV ratio tests generalizing standard likelihood ratio tests expressed in terms of a cutoff for a distinct percent decrease in the LCV score.

Effective models of a specific type (e.g., time alone, additive, or moderation) for analyzing a given survival time data set are generated using adaptive searches through alternate models of that specific type. The search first expands the model in terms of power transforms of time and other available predictors. The expansion can optionally generate geometric combinations of available predictors, that is, products of powers of multiple predictors generalizing standard interactions and providing for a nonlinear extension to standard linear moderation. The expanded model is then contracted by removing extraneous transforms and adjusting the powers for the remaining transforms to improve the LCV score. An LCV ratio test is used to decide whether to stop the contraction. This guarantees that the generated model is effective in the sense that the removal of each of its transforms generates a distinct percent decrease in the LCV score.

Adaptive hazard rate modeling using fractional polynomials is demonstrated using two different survival time data sets. The first data set consists of survival times for patients with lung cancer of differing cell types (adeno, large, small, or squamous). The second data set consists of survival times for multiple myeloma patients with varying values for hemoglobin at diagnosis.

For the lung cancer patients, the hazard rate depends distinctly nonlinearly on time (Figure 1). However, cell type provides a distinct improvement. The hazard rate no longer depends on time after accounting for cell type, and so this supports the use of exponentially distributed survival times for lung cancer patients. The estimated hazard rate is constant in time and decreases across the following three sets of cell types: adeno or small, large, and squamous. The associated estimated survival function decreases over time but more slowly as cell type changes from adeno or small to large and then to squamous (Figure 2). Moreover, Cox regression is unable to identify a cell type effect.

For the multiple myeloma data, the hazard rate also depends distinctly on time (Figure 3). Consideration of hemoglobin at diagnosis provides a distinct improvement with the hazard rate still depending distinctly on time so that the exponential distribution is not an appropriate choice for modeling these data. Furthermore, hemoglobin at diagnosis distinctly moderates the effect of time on the hazard rate so that an assumption of proportional hazards is not applicable for these data. Consideration of quintiles for hemoglobin at diagnosis provides for a readily interpretable effect of hemoglobin on the hazard rate (Table 1) and on the associated survival function (Figure 4). While the hazard ratio need not be considered to understand adaptive hazard regression models, it can be generated if desired and can be a function of time. For example, Figure 5 displays the hazard ratio for the fifth hemoglobin quintile versus the first quintile as a function of time.

Adaptive hazard rate modeling is not restricted to proportional hazard rates as for Cox regression. It also accounts for the dependence of the hazard rate on time. In contrast to the approach of [2] , the dependence on time is based on fractional polynomials rather than on splines and so generates smoother nonlinear curves. Moreover, it models the hazard rate directly rather than modeling the log of the hazard rate so that the integral of the hazard rate can be computed in closed form resulting in reduced times to compute parameter estimates. In contrast to [3] , it addresses the full distribution of survival times as opposed to selected quantiles. In contrast to [4] , hazard rate models based on fractional polynomials are more extensive than models based on piecewise constant hazard rates. Also, consideration of geometric combinations accounts for varying dependence of the hazard rate on time for differing sets of predictor values.

In summary, adaptive hazard regression modeling has been formulated to generalize exponential distribution modeling to account for nonlinearity of the hazard rate as well as additive and moderation effects due to other available predictors. It has also been demonstrated using analyses of survival times for lung cancer patients and for multiple myeloma patients. Results of these analyses indicate that adaptive hazard rate modeling can provide unique insights into survival time data. For example, it is possible to assess whether there is a distinct dependence of the hazard rate on time (as held for both the lung cancer data and the multiple myeloma data), whether consideration of another available predictor can provide distinct additive effects (as for lung cancer cell types), and whether another available predictor can moderate the effect of time on the hazard rate (as for hemoglobin at diagnosis for multiple myeloma patients). However, adaptive hazard rate modeling is limited to modeling independent times to the occurrence of a single event, for example, times to death. Future research is needed to address modeling of correlated times to recurrent events, for example, times to multiple hospitalizations for cancer patients. An extension is also needed to address discrete survival times.

Acknowledgements

Prior development of the SAS macro used to conduct reported example analyses was supported in part by grants R01 AI57043 from the National Institute of Allergy and Infectious Diseases and R03 MH086132 from the National Institute of Mental Health. Work reported on in this article was not supported by external funding.

Conflicts of Interest

The author declares no conflicts of interest.

References

[1] Cox, D.R. (1975) Partial Likelihood. Biometrika, 62, 269-276.
https://doi.org/10.1093/biomet/62.2.269
[2] Kooperberg, C., Stone, C.J. and Truong, Y.K. (1995) Hazard Regression. Journal of the American Statistical Association, 90, 78-94.
https://doi.org/10.1080/01621459.1995.10476491
[3] Qian, J. and Peng, L. (2010) Censored Quantile Regression with Partially Functional Effects. Biometrika, 97, 839-850.
https://doi.org/10.1093/biomet/asq050
[4] Bouaziz, O. and Nuel, G. (2017) L0 Regularization for the Estimation of Piecewise Constant Hazard Rates in Survival Analysis. Applied Mathematics, 8, 377-394.
https://doi.org/10.4236/am.2017.83031
[5] Turkson, A.J. (2022) Perspectives on Hazard Rate Functions: Concepts; Properties; Theories; Methods; Computations; and Application to Real-Life Data. Open Access Library Journal, 9, e8275.
https://doi.org/10.4236/oalib.1108275
[6] Royston, P. and Altman, D.G. (1994) Regression Using Fractional Polynomials of Continuous Covariates: Parsimonious Parametric Modeling. Journal of the Royal Statistical Society. Series C: Applied Statistics, 43, 429-467.
https://doi.org/10.2307/2986270
[7] Knafl, G.J. (2018) Adaptive Fractional Polynomial Modeling. Open Journal of Statistics, 8, 159-186.
https://doi.org/10.4236/ojs.2018.81011
[8] Baron, R.M. and Kenny, D.A. (1986) The Moderator-Mediator Variable Distinction in Social Psychology Research: Conceptual, Strategic, and Statistical Considerations. Journal of Personality & Social Psychology, 51, 1173-1182.
https://doi.org/10.1037/0022-3514.51.6.1173
[9] Royston, P. and Sauerbrei, W. (2008) Multivariable Model-Building: A Practical Approach to Regression Analysis Based on Fractional Polynomials for Modelling Continuous Variables. John Wiley & Sons, Hoboken.
https://doi.org/10.1002/9780470770771
[10] Knafl, G.J. and Ding, K. (2016) Adaptive Regression for Modeling Nonlinear Relationships. Springer International Publishing, Switzerland.
[11] Knafl, G.J. (2018) A Reassessment of Birth Defects for Children of Participants of the Air Force Health Study. Open Journal of Epidemiology, 8, 187-200.
https://doi.org/10.4236/ojepi.2018.84015
[12] Knafl, G.J. and Meghani, S.H. (2021) Modeling Individual Patient Count/Rate Data over Time with Applications to Cancer Pain Flares and Cancer Pain Medication Usage. Open Journal of Statistics, 11, 633-654.
https://doi.org/10.4236/ojs.2021.115038
[13] Knafl, G.J. (2022) Adaptive Regression for Nonlinear Interrupted Time Series Analyses with Application to Birth Defects in Children of Vietnam War Veterans. Open Journal of Statistics, 12, 789-809.
https://doi.org/10.4236/ojs.2022.126045
[14] Knafl, G.J. and Meghani, S.H. (2021) Regression Modeling of Individual-Patient Correlated Discrete Outcomes with Applications to Cancer Pain Ratings. Open Journal of Statistics, 12, 456-485.
https://doi.org/10.4236/ojs.2022.124029
[15] Kalbfleisch, J.D. and Prentice, R.L. (1980) The Statistical Analysis of Failure Time Data. John Wiley & Sons, New York.
[16] Krall, J.M., Uthoff, V.A. and Harley, J.B. (1975) A Step-Up Procedure for Selecting Variables Associated with Survival. Biometrics, 31, 49-57.
https://doi.org/10.2307/2529709
