Risk Factors Associated with Hospital Mortality: Analysis of the Length ofStay Using Risk Prediction Cox Regression Non-Proportional Hazard Model ()
1. Introduction
The ability to evaluate hospital performance using patient outcome data depends upon many factors. In principle, the outcome needs to reflect features that are directly affected by the quality of hospital care, to name but a few: mortality, readmission rates of patients and employee satisfaction [1] [2]. One of the important factors that we shall include in this study is the Diagnostic Related Groups (DRGs). This concept was first developed at Yale University in 1975 [3]. The main objective was to group patients with similar treatments and conditions for comparative studies. DRGs were designed to be homogeneous units of hospital activity to which binding prices could be attached. A central theme in the advocacy of DRGs was that this reimbursement system would, by constraining the hospitals, oblige their administrators to alter the behavior of the physicians and surgeons comprising their medical staff. Hospitals were forced to leave the nearly risk-free world of cost reimbursement, and face the uncertain financial consequences associated with the provision of health care. DRGs were designed to provide practice pattern information that administrators could use to influence individual physician behavior. From a statistical viewpoint, DRG’s are considered artificial clusters of subjects.
Krumholz et al. [4] discussed several factors that should be considered when assessing hospital quality. These relate to differences in the chronic and clinical acuity of patients at hospital presentation, the number of patients treated at a hospital, the frequency of the outcome studied, the extent to which the outcome reflects a hospital quality signal, and the form of the performance metric used to assess hospital quality. However, issues related to DRG have not been considered as factors of importance. Since the outcome of interest is hospital mortality, any attempt to derive risk-adjusted mortality that does not take into account the relative importance of DRG will produce biased estimates. This issue will be highlighted within the statistical models that are considered in this paper.
2. Study Design and Study Variables
Hospital discharge status, available from the hospital medical records from January 2016 through December 2018, was extracted. For each subject, the age at admission, length of stay, presence of metabolic syndrome (at least one of its components) as we define in the next section, and DRG membership were included in this cross-sectional retrospective design. The study was reviewed and approved by the Institutional Review Board at the King Faisal Specialist Hospital and Research Center (KFSHRC).
2.1. Dependent Variable
Length of stay (LOS) is the dependent variable, measured on a continuous scale. It is considered a time-to-event variable, where the event is in hospital death. Patients who survive the observation period are considered censored.
2.2. Independent Variables
1) DRG
Because DRG is a categorical variable with an excessive number of levels, our modeling strategy used DRG as a categorical variable and is modeled as a fixed effect variable. We selected the five most prevalent DRGs in the database.
The evaluation process of hospital performance will then be based on an effective risk adjustment. Though one might wish to have additional information on patient attributes and clinical severity, even with currently available data, we should evaluate whether a more flexible risk adjustment model will improve performance. Table 1 shows the summary measures of age at admission for each of the selected DRG.
We were able to obtain such information for five DRG groups, as listed below in Table 1:
a) Acute Leukemia (R60B)
b) Lymphoma (R61B)
c) Endocrine metabolic diseases (K64B)
d) Kidney diseases (L04C)
e) Diseases of the respiratory systems (E62B)
Table 1. Summary measures of patients’ age at admission (AAA) for each DRG.
DRG |
N |
MEAN |
SD |
MEDIAN |
IQR |
Minimum |
Maximum |
Acute Leukemia |
1308 |
16.7 |
18.8 |
9.5 |
21 |
1 |
218 |
Endocrine Metabolic Disease |
1127 |
4.17 |
3.03 |
4 |
1 |
1 |
314 |
Kidney disease |
1173 |
5.71 |
4.2 |
5 |
2 |
1 |
48 |
Lymphoma |
1273 |
11 |
12 |
7 |
9 |
1 |
174 |
Respiratory System Diseases |
1013 |
9 |
10.14 |
6 |
7 |
1 |
107 |
1) Metabolic Syndrome as an independent covariate
Although Saudi Arabia reports one of the highest prevalence levels of obesity and diabetes, a very limited number of epidemiological studies have examined the prevalence of metabolic syndrome in Saudi Arabia [5]. The prevalence of metabolic syndrome in Saudi Arabia was found to be 39.8% (34.4% in men and 29.2% in women) and 31.6% (45.0% in men and 35.4% in women), according to the NCEP ATP III and IDF criteria, respectively. Metabolic syndrome was also observed to be more prevalent among men and older subjects. The most frequently observed component of metabolic syndrome was found to be low levels of high-density lipoprotein (HDL), followed by abdominal obesity. The World Health Organization’s widely used definition of metabolic syndrome is [6]. The components of each definition and criteria for making the diagnosis of the metabolic syndrome are summarized in Table 1. In addition, definitions were proposed by the European Group for the Study of Insulin Resistance [7] and the American Association of Clinical Endocrinologists. Essentially, these are modifications of the WHO and NCEP definitions, respectively.
Recently, the International Diabetes Federation has proposed a new definition (see Table 2) that it hopes will become the international standard. This definition is similar to the NCEP definition, being based on relatively simple measures applicable in a clinical or epidemiological setting, but differs in three important respects. Central obesity, as determined by waist circumference, is mandatory, and different waist cut points for different ethnic groups are given based on available data linking waist circumference to other components of the syndrome. Finally, the IDF definition uses a lower fasting glucose level than the original NCEP definition, using the American Diabetes Association 2003 cut point for impaired fasting glucose [8].
The abbreviations of the medical official bodies given in Table 2 are:
IDF = International Diabetes Federation
WHO = World Health Organization
EGIR = European Group for the Study of Insulin Resistance
NCEP = National Cholesterol Education Program
Table 2. Definition of the components of metabolic syndrome [Journal of the Royal Society of Medicine, Vol. 99, September 2006].
Component |
IDF |
WHO |
EGIR |
NCEP |
M |
F |
M |
F |
M |
F |
M |
F |
1. Central Obesity |
≥102 |
≥88 |
≥102 |
≥88 |
≥94 |
≥80 |
≥102 |
≥88 |
2. Raised TG |
≥1.7 |
≥1.7 |
≥1.7 |
≥1.7 |
≥2.0 |
≥2.0 |
≥1.7 |
≥1.7 |
3. Low HDL |
<1.03 |
<1.29 |
≤.9 |
≤1 |
≤1 |
≤1 |
≤1.03 |
≤1.29 |
4. Hypertension |
≥130/85 |
≥130/85 |
≥140/90 |
≥140/90 |
≥140/90 |
≥140/90 |
≥130/85 |
≥130/85 |
5. Fasting Glucose |
≥5.6 |
≥5.6 |
≥6.1 |
≥6.1 |
≥6.1 |
≥6.1 |
≥6.1 |
≥6.1 |
The main reason why the metabolic syndrome attracts the attention of clinicians and epidemiologists is that its components are associated with increased morbidity, long-term disability and eventually death. It is also believed that both cancer and heart disease are consequences of the syndrome if left untreated. However, some researchers question the importance of including the syndrome as a risk factor for heart diseases beyond what is expected in a multifactorial model [9]. The volume of studies on the distribution of the syndrome in different populations is quite large and spans countries from all over the world. In our study, a patient who has at least one component of the metabolic syndrome will be coded as (1); otherwise, it will be given the code (0).
In addition to the above two categorical variables, patients’ age at admission (AAA) will be recorded as a continuous variable in the constructed risk prediction model.
3. Statistical Analysis
3.1. Modeling Time-to-Event Data
In this section, we shall develop a Cox proportional hazard model. The Cox model is one of the most accurate methods belonging to the class of semiparametric statistical models. This model has the advantage that it can use different types of independent variables (continuous, ordered categorical, and nominal variables). In the second stage of the analysis, we develop a predictive model for the risk of overstaying. The regression coefficients, their standard errors, and the corresponding p-values are presented in Table 3.
Table 3. Results of fitting Cox proportional hazard model using R.
Covariate |
Coef |
Exp (coef) |
Se (coef) |
Z |
P-value |
log (AAA) |
0.038 |
1.038 |
0.002 |
21.384 |
<2e−16*** |
metabolic |
0.278 |
1.320 |
0.083 |
3.335 |
0.001*** |
Endocrine |
2.433 |
11.391 |
0.096 |
25.287 |
<2e−16*** |
Kidney |
1.431 |
4.183 |
0.101 |
14.209 |
<2e−16*** |
Lymph |
0.903 |
2.467 |
0.093 |
9.738 |
<2e−16*** |
Respiratory |
0.753 |
2.123 |
0.098 |
7.636 |
2.24e−14*** |
Note that: Acute Leukemia is the reference category.
The estimated survival curve is given in Figure 1.
In order to construct a plausible Cox regression model [10] to be able to reliably predict the risk of death of any patient with a particular covariate profile, we need to make sure that the assumptions of the Cox model are satisfied. There are three basic assumptions:
1) The proportional hazards assumption should be verified.
2) Examining influential observations (or outliers).
3) Detecting nonlinearity in the relationship between the log hazard and the covariates that are measured on the continuous scale.
In order to check these model assumptions, we use residuals. The common residuals for the Cox model include:
Figure 1. Survival probabilities.
Schoenfeld residuals to check the proportional hazards assumption [11]
Martingale residual to assess nonlinearity [12]
Deviance residual (symmetric transformation of the Martingale residuals), to examine influential observations [12]
The proportional hazards (PH) assumption can be checked using statistical tests and graphical diagnostics based on the scaled Schoenfeld residuals [13]. If the proportionality assumption is not satisfied, we use the weighted Cox regression [14]-[16].
In principle, the Schoenfeld residuals are independent of time. A plot that shows a non-random pattern against time is evidence of violation of the PH assumption.
The function cox.zph() in the survival package in R [17] provides a convenient solution to test the proportional hazards assumption for each covariate included in a Cox regression model fit.
For each covariate, the function cox.zph() correlates the corresponding set of scaled Schoenfeld residuals with time to test for independence between residuals and time. Additionally, it performs a global test for the model as a whole.
The proportional hazard assumption is supported by a non-significant relationship between residuals and time and refuted by a significant relationship. The testing for proportionality assumption can be achieved analytically and graphically. Using the R program, we found that,
DRG_CODE: p-value < 0.0000001, GLOBAL: p-value < 0.0000004
From the output above, the tests are statistically significant for the DRG covariate, and the global test is also statistically significant. This means there is not enough evidence in the data to support the proportionality assumption. The violation of the proportionality assumption can also be detected graphically using the survival function plot. From Figure 2, the crossing survival functions for each of the DRG indicate that the proportionality assumption is violated. Therefore, we cannot assume the proportional hazards.
One solution is the weighted Cox regression, which is also valid for non-proportional hazards.
Figure 2. Crossing survival curves is indicative of a violation of the proportional hazard assumption.
3.2. Weighted Cox Regression Model
In the standard unweighted partial likelihood, all patients contribute to the estimation of the regression coefficients to the same extent. This might not be desirable when the cohort is heterogeneous due to known subgroups that are associated with different prognoses. In this situation, it is reasonable to fit a separate Cox model for each subgroup. This can be done by using only the data from the subgroup of interest or by including information from the other subgroups [15] [16].
3.3. Testing Influential Observation
It’s also possible to check outliers by visualizing the deviance residuals. The deviance residual is a normalized transformation of the martingale residual [13]. These residuals should be roughly symmetrically distributed at about zero with a standard deviation of 1. From Figure 3, the pattern looks symmetric around 0. Therefore, we conclude that there are no influential observations.
3.4. Testing for Outliers
The below index plots in Figure 4 show that comparing the magnitudes of the largest dfbeta values to the regression coefficients suggests that none of the
Figure 3. Deviance residual plot to check for influential observations.
Figure 4. Testing for outliers using deviance residuals.
observations is terribly influential individually, even though some of the dfbeta values for age are large compared with the others. It’s also possible to check outliers by visualizing the deviance residuals. The deviance residual is a normalized transformation of the martingale residual. These residuals should be roughly symmetrically distributed about zero with a standard deviation of 1. Positive values correspond to individuals that “died too soon” compared to expected survival times. Negative values correspond to individuals that “lived too long”. Very large or small values are outliers, which are poorly predicted by the model.
3.5. Testing for the Functional Form of the Continuous Variable Age at Admission
Plotting the Martingale residuals against continuous covariates is a common approach used to detect nonlinearity or, in other words, to assess the functional form of a covariate.
For a given continuous covariate, patterns in the plot may suggest that the variable is not properly fit. Nonlinearity is not an issue for categorical variables, so we only examine plots of martingale residuals and partial residuals against a continuous variable, which is Age-At-Admission.
Martingale residuals may present any value in the range (−INF, +1). A value of martingale residuals near 1 represents individuals that “died too soon”, and large negative values correspond to individuals that “lived too long”.
To assess the functional form of a continuous variable in a Cox proportional hazards model, we’ll use the function ggcoxfunctional [in the survminer R package].
As shown in Figure 5, the function ggcoxfunctional displays graphs of continuous
Figure 5. Testing for the functional form of the continuous covariate (Age at Admission).
covariates against martingale residuals of null Cox proportional hazards model. It is evident from Figure 5 that the logarithmic transformation of age at admission seems quite appropriate [14].
Table 4. Weighted Cox regression. Model fitted by weighted estimation using R.
Variable |
Coef |
Se (coef) |
Exp (coef) |
P-value |
DRG_CODEEndocrin |
2.004 |
0.098 |
7.422 |
0.0000 |
DRG_CODEKidney_D |
1.144 |
0.099 |
3.141 |
0.00001 |
DRG_CODELymph |
0.604 |
0.107 |
1.829 |
0.00001 |
DRG_CODERespirat |
0.451 |
0.101 |
1.570 |
0.00001 |
Log(AAA) |
0.038 |
0.0023 |
1.039 |
0.00001 |
Metabolic |
0.242 |
0.102 |
1.273 |
0.0019 |
Comparing the coefficients in Table 3 (Cox proportional hazard model fit) to their values in Table 4 (weighted Cox regression), we notice differences in the values of the estimates of the regression parameters corresponding to the categorical variable DRG.
4. Risk Prediction Function
The estimated survival function and the Cox regression estimated parameters are very useful in building risk prediction functions. We shall use the following notations:
(X1 = Endocrine, β1 = 2.004), (X2 = Kidney β2 = 1.144), (X3 = Lymph β3 = 0.604)
(X4 = Respiratory β4 = 0.451), (X5 = Log(AAA) β5 = 0.038), (X6 = metabolic β6 = 0.242)
We define the linear predictor as the function
.
The risk prediction equation is given by [18]:
(1)
The components of the prediction analyses are:
(2)
x1 = 1 if patient belongs to Endocrine disease group, and 0 otherwise,
x2 = 1 if patient belongs to Kidney disease group, and 0 otherwise,
x3 = 1 if patient belongs to Lymph disease group, and 0 otherwise
x4 = 1 if patient belongs to the Respiratory disease group, and 0 otherwise
x5 = Log(AAA), and x6 = 1 if patient belongs to Metabolic disease, and 0 otherwise.
Hence,
(3)
and
is the survival function at time t.
Equations (1)-(3) complete the specifications of the Cox regression prediction model. To clarify the utility of the above approach, we will consider the example below.
The above coding of the DRG’s means that the “Acute Leukemia” is taken to be the reference group. Moreover, from the Cox regression model, we estimate the 24-hour survival probability to be:
Now suppose that we have two Respiratory disease patients (x4 = 1). The age of one patient is 20 years, who has at least one component of the metabolic syndrome (x6 = 1), and the age of the other patient is 60 years, without metabolic syndrome. That is
, for the first patient, and the risk of staying over 24 hours is
For the other patient we have
, and the risk of staying over 24 hours is
Therefore, the relative risk is R1/R2 = 01.66.
This means that a 20-year-old Respiratory patient, who has metabolic syndrome, has a 66% increase in the risk of overstaying relative to a 60-year-old Respiratory disease patient who does not have metabolic syndrome.
5. Discussion
The results of this hospital mortality data analysis revealed that the fundamental assumption in Cox regression of Proportional Hazard must be assessed before estimating the risk prediction. More than one-fifth of the studies were found to include a probable non-proportional Cox model without the authors mentioning or adjusting it. The problems associated with the misunderstanding and misuse of statistical methods in medical research are well documented in [13]-[15]. Adding data from other subgroups in a penalized weighted Cox model should increase the power through a larger sample size compared to the classical subgroup analysis that ignores the information from all other individuals. Weights are based on the probability of belonging to the subgroup of interest.
6. Conclusion
Reporting survival analysis and testing of hospital mortality data must be preceded by testing the assumption of proportionality of the Cox proportional hazard regression model. Neglecting background assumptions of most used statistical methods may lead to unreliable results and conclusions. Obviously, this has implications for patient care as biased results may alter the conclusions and the objectives of the study. Therefore, involving epidemiologists and biostatisticians in the planning stage of the study is highly recommended.
Acknowledgements
The authors acknowledge the constructive comments made by an anonymous reviewer.
Disclosure Statement
Dr. Mawahib Ahmed conceived the idea; Dr. M. Shoukri provided the data and helped with the analysis. All authors collaborated to write the final version of the paper.