Risk Factors Associated with Hospital Mortality: Analysis of the Length ofStay Using Risk Prediction Cox Regression Non-Proportional Hazard Model

Abstract

Background: In-hospital mortality is a key indicator of the quality of care. Studies so far have demonstrated the influence of patient and hospital-related factors on in-hospital mortality. Currently, new variables, such as components of metabolic syndrome as comorbid conditions, are being incorporated as independent risk factors. We aimed to identify which individual, clinical and hospital characteristics are related to hospital mortality. Objectives: Demonstrate that the Cox proportional hazard model is not appropriate for the analysis of hospital mortality data when diagnostic-related groups are incorporated in the covariate structure. Methods: A retrospective single-center observational study design was used. Sampling was conducted between January 2016 and December 2018. Patients over 10 years, admitted to the emergency department with a precited stay of at least 1 hour were included. Multivariate Cox regression for survival data analyses was employed to analyze the data. Results: The sample consisted of 5897 patients. The mean age of all patients was 32.21 ± 0.29 years old, and the mean length of stay (LOS) was 9.47 ± 0.16 hours. We also categorized patients according to five Diagnosis Related Groups (DGR). Among the patients,1308 suffered from acute leukemia, 1127 had endocrine diseases, 1173 with kidney diseases, and 1016 had respiratory problems. At least one component of metabolic syndrome was present in 27.5% of the patients. During the observation period, 2299 (39%) died in hospital, and 3598 (61%) were discharged alive. We used the multivariate Cox regression non-proportional hazard model to evaluate the joint effect of these factors on the “Length of Stay” or LOS (the dependent variable of Cox regression). Age at admission, the presence of metabolic syndrome, and the DRG were significantly associated with the LOS.

Share and Cite:

Ahmed, M. , Elfaham, A. , Asad, A. , Alsaeidi, A. and Shoukri, M. (2024) Risk Factors Associated with Hospital Mortality: Analysis of the Length ofStay Using Risk Prediction Cox Regression Non-Proportional Hazard Model. Open Journal of Emergency Medicine, 12, 156-168. doi: 10.4236/ojem.2024.124018.

1. Introduction

The ability to evaluate hospital performance using patient outcome data depends upon many factors. In principle, the outcome needs to reflect features that are directly affected by the quality of hospital care, to name but a few: mortality, readmission rates of patients and employee satisfaction [1] [2]. One of the important factors that we shall include in this study is the Diagnostic Related Groups (DRGs). This concept was first developed at Yale University in 1975 [3]. The main objective was to group patients with similar treatments and conditions for comparative studies. DRGs were designed to be homogeneous units of hospital activity to which binding prices could be attached. A central theme in the advocacy of DRGs was that this reimbursement system would, by constraining the hospitals, oblige their administrators to alter the behavior of the physicians and surgeons comprising their medical staff. Hospitals were forced to leave the nearly risk-free world of cost reimbursement, and face the uncertain financial consequences associated with the provision of health care. DRGs were designed to provide practice pattern information that administrators could use to influence individual physician behavior. From a statistical viewpoint, DRG’s are considered artificial clusters of subjects.

Krumholz et al. [4] discussed several factors that should be considered when assessing hospital quality. These relate to differences in the chronic and clinical acuity of patients at hospital presentation, the number of patients treated at a hospital, the frequency of the outcome studied, the extent to which the outcome reflects a hospital quality signal, and the form of the performance metric used to assess hospital quality. However, issues related to DRG have not been considered as factors of importance. Since the outcome of interest is hospital mortality, any attempt to derive risk-adjusted mortality that does not take into account the relative importance of DRG will produce biased estimates. This issue will be highlighted within the statistical models that are considered in this paper.

2. Study Design and Study Variables

Hospital discharge status, available from the hospital medical records from January 2016 through December 2018, was extracted. For each subject, the age at admission, length of stay, presence of metabolic syndrome (at least one of its components) as we define in the next section, and DRG membership were included in this cross-sectional retrospective design. The study was reviewed and approved by the Institutional Review Board at the King Faisal Specialist Hospital and Research Center (KFSHRC).

2.1. Dependent Variable

Length of stay (LOS) is the dependent variable, measured on a continuous scale. It is considered a time-to-event variable, where the event is in hospital death. Patients who survive the observation period are considered censored.

2.2. Independent Variables

1) DRG

Because DRG is a categorical variable with an excessive number of levels, our modeling strategy used DRG as a categorical variable and is modeled as a fixed effect variable. We selected the five most prevalent DRGs in the database.

The evaluation process of hospital performance will then be based on an effective risk adjustment. Though one might wish to have additional information on patient attributes and clinical severity, even with currently available data, we should evaluate whether a more flexible risk adjustment model will improve performance. Table 1 shows the summary measures of age at admission for each of the selected DRG.

We were able to obtain such information for five DRG groups, as listed below in Table 1:

a) Acute Leukemia (R60B)

b) Lymphoma (R61B)

c) Endocrine metabolic diseases (K64B)

d) Kidney diseases (L04C)

e) Diseases of the respiratory systems (E62B)

Table 1. Summary measures of patients’ age at admission (AAA) for each DRG.

DRG

N

MEAN

SD

MEDIAN

IQR

Minimum

Maximum

Acute Leukemia

1308

16.7

18.8

9.5

21

1

218

Endocrine Metabolic Disease

1127

4.17

3.03

4

1

1

314

Kidney disease

1173

5.71

4.2

5

2

1

48

Lymphoma

1273

11

12

7

9

1

174

Respiratory System Diseases

1013

9

10.14

6

7

1

107

1) Metabolic Syndrome as an independent covariate

Although Saudi Arabia reports one of the highest prevalence levels of obesity and diabetes, a very limited number of epidemiological studies have examined the prevalence of metabolic syndrome in Saudi Arabia [5]. The prevalence of metabolic syndrome in Saudi Arabia was found to be 39.8% (34.4% in men and 29.2% in women) and 31.6% (45.0% in men and 35.4% in women), according to the NCEP ATP III and IDF criteria, respectively. Metabolic syndrome was also observed to be more prevalent among men and older subjects. The most frequently observed component of metabolic syndrome was found to be low levels of high-density lipoprotein (HDL), followed by abdominal obesity. The World Health Organization’s widely used definition of metabolic syndrome is [6]. The components of each definition and criteria for making the diagnosis of the metabolic syndrome are summarized in Table 1. In addition, definitions were proposed by the European Group for the Study of Insulin Resistance [7] and the American Association of Clinical Endocrinologists. Essentially, these are modifications of the WHO and NCEP definitions, respectively.

Recently, the International Diabetes Federation has proposed a new definition (see Table 2) that it hopes will become the international standard. This definition is similar to the NCEP definition, being based on relatively simple measures applicable in a clinical or epidemiological setting, but differs in three important respects. Central obesity, as determined by waist circumference, is mandatory, and different waist cut points for different ethnic groups are given based on available data linking waist circumference to other components of the syndrome. Finally, the IDF definition uses a lower fasting glucose level than the original NCEP definition, using the American Diabetes Association 2003 cut point for impaired fasting glucose [8].

The abbreviations of the medical official bodies given in Table 2 are:

IDF = International Diabetes Federation

WHO = World Health Organization

EGIR = European Group for the Study of Insulin Resistance

NCEP = National Cholesterol Education Program

Table 2. Definition of the components of metabolic syndrome [Journal of the Royal Society of Medicine, Vol. 99, September 2006].

Component

IDF

WHO

EGIR

NCEP

M

F

M

F

M

F

M

F

1. Central Obesity

≥102

≥88

≥102

≥88

≥94

≥80

≥102

≥88

2. Raised TG

≥1.7

≥1.7

≥1.7

≥1.7

≥2.0

≥2.0

≥1.7

≥1.7

3. Low HDL

<1.03

<1.29

≤.9

≤1

≤1

≤1

≤1.03

≤1.29

4. Hypertension

≥130/85

≥130/85

≥140/90

≥140/90

≥140/90

≥140/90

≥130/85

≥130/85

5. Fasting Glucose

≥5.6

≥5.6

≥6.1

≥6.1

≥6.1

≥6.1

≥6.1

≥6.1

The main reason why the metabolic syndrome attracts the attention of clinicians and epidemiologists is that its components are associated with increased morbidity, long-term disability and eventually death. It is also believed that both cancer and heart disease are consequences of the syndrome if left untreated. However, some researchers question the importance of including the syndrome as a risk factor for heart diseases beyond what is expected in a multifactorial model [9]. The volume of studies on the distribution of the syndrome in different populations is quite large and spans countries from all over the world. In our study, a patient who has at least one component of the metabolic syndrome will be coded as (1); otherwise, it will be given the code (0).

In addition to the above two categorical variables, patients’ age at admission (AAA) will be recorded as a continuous variable in the constructed risk prediction model.

3. Statistical Analysis

3.1. Modeling Time-to-Event Data

In this section, we shall develop a Cox proportional hazard model. The Cox model is one of the most accurate methods belonging to the class of semiparametric statistical models. This model has the advantage that it can use different types of independent variables (continuous, ordered categorical, and nominal variables). In the second stage of the analysis, we develop a predictive model for the risk of overstaying. The regression coefficients, their standard errors, and the corresponding p-values are presented in Table 3.

Table 3. Results of fitting Cox proportional hazard model using R.

Covariate

Coef

Exp (coef)

Se (coef)

Z

P-value

log (AAA)

0.038

1.038

0.002

21.384

<2e−16***

metabolic

0.278

1.320

0.083

3.335

0.001***

Endocrine

2.433

11.391

0.096

25.287

<2e−16***

Kidney

1.431

4.183

0.101

14.209

<2e−16***

Lymph

0.903

2.467

0.093

9.738

<2e−16***

Respiratory

0.753

2.123

0.098

7.636

2.24e−14***

Note that: Acute Leukemia is the reference category.

The estimated survival curve is given in Figure 1.

In order to construct a plausible Cox regression model [10] to be able to reliably predict the risk of death of any patient with a particular covariate profile, we need to make sure that the assumptions of the Cox model are satisfied. There are three basic assumptions:

1) The proportional hazards assumption should be verified.

2) Examining influential observations (or outliers).

3) Detecting nonlinearity in the relationship between the log hazard and the covariates that are measured on the continuous scale.

In order to check these model assumptions, we use residuals. The common residuals for the Cox model include:

Figure 1. Survival probabilities.

  • Schoenfeld residuals to check the proportional hazards assumption [11]

  • Martingale residual to assess nonlinearity [12]

  • Deviance residual (symmetric transformation of the Martingale residuals), to examine influential observations [12]

The proportional hazards (PH) assumption can be checked using statistical tests and graphical diagnostics based on the scaled Schoenfeld residuals [13]. If the proportionality assumption is not satisfied, we use the weighted Cox regression [14]-[16].

In principle, the Schoenfeld residuals are independent of time. A plot that shows a non-random pattern against time is evidence of violation of the PH assumption.

The function cox.zph() in the survival package in R [17] provides a convenient solution to test the proportional hazards assumption for each covariate included in a Cox regression model fit.

For each covariate, the function cox.zph() correlates the corresponding set of scaled Schoenfeld residuals with time to test for independence between residuals and time. Additionally, it performs a global test for the model as a whole.

The proportional hazard assumption is supported by a non-significant relationship between residuals and time and refuted by a significant relationship. The testing for proportionality assumption can be achieved analytically and graphically. Using the R program, we found that,

DRG_CODE: p-value < 0.0000001, GLOBAL: p-value < 0.0000004

From the output above, the tests are statistically significant for the DRG covariate, and the global test is also statistically significant. This means there is not enough evidence in the data to support the proportionality assumption. The violation of the proportionality assumption can also be detected graphically using the survival function plot. From Figure 2, the crossing survival functions for each of the DRG indicate that the proportionality assumption is violated. Therefore, we cannot assume the proportional hazards.

One solution is the weighted Cox regression, which is also valid for non-proportional hazards.

Figure 2. Crossing survival curves is indicative of a violation of the proportional hazard assumption.

3.2. Weighted Cox Regression Model

In the standard unweighted partial likelihood, all patients contribute to the estimation of the regression coefficients to the same extent. This might not be desirable when the cohort is heterogeneous due to known subgroups that are associated with different prognoses. In this situation, it is reasonable to fit a separate Cox model for each subgroup. This can be done by using only the data from the subgroup of interest or by including information from the other subgroups [15] [16].

3.3. Testing Influential Observation

It’s also possible to check outliers by visualizing the deviance residuals. The deviance residual is a normalized transformation of the martingale residual [13]. These residuals should be roughly symmetrically distributed at about zero with a standard deviation of 1. From Figure 3, the pattern looks symmetric around 0. Therefore, we conclude that there are no influential observations.

3.4. Testing for Outliers

The below index plots in Figure 4 show that comparing the magnitudes of the largest dfbeta values to the regression coefficients suggests that none of the

Figure 3. Deviance residual plot to check for influential observations.

Figure 4. Testing for outliers using deviance residuals.

observations is terribly influential individually, even though some of the dfbeta values for age are large compared with the others. It’s also possible to check outliers by visualizing the deviance residuals. The deviance residual is a normalized transformation of the martingale residual. These residuals should be roughly symmetrically distributed about zero with a standard deviation of 1. Positive values correspond to individuals that “died too soon” compared to expected survival times. Negative values correspond to individuals that “lived too long”. Very large or small values are outliers, which are poorly predicted by the model.

3.5. Testing for the Functional Form of the Continuous Variable Age at Admission

Plotting the Martingale residuals against continuous covariates is a common approach used to detect nonlinearity or, in other words, to assess the functional form of a covariate.

For a given continuous covariate, patterns in the plot may suggest that the variable is not properly fit. Nonlinearity is not an issue for categorical variables, so we only examine plots of martingale residuals and partial residuals against a continuous variable, which is Age-At-Admission.

Martingale residuals may present any value in the range (−INF, +1). A value of martingale residuals near 1 represents individuals that “died too soon”, and large negative values correspond to individuals that “lived too long”.

To assess the functional form of a continuous variable in a Cox proportional hazards model, we’ll use the function ggcoxfunctional [in the survminer R package].

As shown in Figure 5, the function ggcoxfunctional displays graphs of continuous

Figure 5. Testing for the functional form of the continuous covariate (Age at Admission).

covariates against martingale residuals of null Cox proportional hazards model. It is evident from Figure 5 that the logarithmic transformation of age at admission seems quite appropriate [14].

Table 4. Weighted Cox regression. Model fitted by weighted estimation using R.

Variable

Coef

Se (coef)

Exp (coef)

P-value

DRG_CODEEndocrin

2.004

0.098

7.422

0.0000

DRG_CODEKidney_D

1.144

0.099

3.141

0.00001

DRG_CODELymph

0.604

0.107

1.829

0.00001

DRG_CODERespirat

0.451

0.101

1.570

0.00001

Log(AAA)

0.038

0.0023

1.039

0.00001

Metabolic

0.242

0.102

1.273

0.0019

Comparing the coefficients in Table 3 (Cox proportional hazard model fit) to their values in Table 4 (weighted Cox regression), we notice differences in the values of the estimates of the regression parameters corresponding to the categorical variable DRG.

4. Risk Prediction Function

The estimated survival function and the Cox regression estimated parameters are very useful in building risk prediction functions. We shall use the following notations:

(X1 = Endocrine, β1 = 2.004), (X2 = Kidney β2 = 1.144), (X3 = Lymph β3 = 0.604)

(X4 = Respiratory β4 = 0.451), (X5 = Log(AAA) β5 = 0.038), (X6 = metabolic β6 = 0.242)

We define the linear predictor as the function h( x )=exp( j=1 6 β j x j ) .

The risk prediction equation is given by [18]:

Risk( t )=1S ( t ) h( x ) (1)

The components of the prediction analyses are:

h( x )=exp( j=1 5 β j x j ) (2)

x1 = 1 if patient belongs to Endocrine disease group, and 0 otherwise,

x2 = 1 if patient belongs to Kidney disease group, and 0 otherwise,

x3 = 1 if patient belongs to Lymph disease group, and 0 otherwise

x4 = 1 if patient belongs to the Respiratory disease group, and 0 otherwise

x5 = Log(AAA), and x6 = 1 if patient belongs to Metabolic disease, and 0 otherwise.

Hence,

h( x )=exp( 2.004 x 1 +1.144 x 2 +0.604 x 3 +0.451 x 4 +0.038 x 5 + 0.242 x 6 ) (3)

and S( t ) is the survival function at time t.

Equations (1)-(3) complete the specifications of the Cox regression prediction model. To clarify the utility of the above approach, we will consider the example below.

The above coding of the DRG’s means that the “Acute Leukemia” is taken to be the reference group. Moreover, from the Cox regression model, we estimate the 24-hour survival probability to be:

S( 24 )=0.701

Now suppose that we have two Respiratory disease patients (x4 = 1). The age of one patient is 20 years, who has at least one component of the metabolic syndrome (x6 = 1), and the age of the other patient is 60 years, without metabolic syndrome. That is h( x )=exp( 0.451+0.038log( 20 )+0.242 )=2.24 , for the first patient, and the risk of staying over 24 hours is

R 1 =1 ( 0.701 ) 2.24 =54.8%

For the other patient we have h( x )=exp( 0.451+0.038log( 60 ) )=1.834 , and the risk of staying over 24 hours is

R 2 =1 ( 0.802 ) 1.834 =33.3%

Therefore, the relative risk is R1/R2 = 01.66.

This means that a 20-year-old Respiratory patient, who has metabolic syndrome, has a 66% increase in the risk of overstaying relative to a 60-year-old Respiratory disease patient who does not have metabolic syndrome.

5. Discussion

The results of this hospital mortality data analysis revealed that the fundamental assumption in Cox regression of Proportional Hazard must be assessed before estimating the risk prediction. More than one-fifth of the studies were found to include a probable non-proportional Cox model without the authors mentioning or adjusting it. The problems associated with the misunderstanding and misuse of statistical methods in medical research are well documented in [13]-[15]. Adding data from other subgroups in a penalized weighted Cox model should increase the power through a larger sample size compared to the classical subgroup analysis that ignores the information from all other individuals. Weights are based on the probability of belonging to the subgroup of interest.

6. Conclusion

Reporting survival analysis and testing of hospital mortality data must be preceded by testing the assumption of proportionality of the Cox proportional hazard regression model. Neglecting background assumptions of most used statistical methods may lead to unreliable results and conclusions. Obviously, this has implications for patient care as biased results may alter the conclusions and the objectives of the study. Therefore, involving epidemiologists and biostatisticians in the planning stage of the study is highly recommended.

Acknowledgements

The authors acknowledge the constructive comments made by an anonymous reviewer.

Disclosure Statement

Dr. Mawahib Ahmed conceived the idea; Dr. M. Shoukri provided the data and helped with the analysis. All authors collaborated to write the final version of the paper.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Bueno, H. (2010) Trends in Length of Stay and Short-Term Outcomes among Medicare Patients Hospitalized for Heart Failure, 1993-2006. JAMA, 303, 2141-2147.
https://doi.org/10.1001/jama.2010.748
[2] Rotter, T., Kinsman, L., James, E.L., Machotta, A., Gothe, H., Willis, J., et al. (2010) Clinical Pathways: Effects on Professional Practice, Patient Outcomes, Length of Stay and Hospital Costs. Cochrane Database of Systematic Reviews, No. 3, CD006632.
https://doi.org/10.1002/14651858.cd006632.pub2
[3] Alford, R. (1975) Health Care Politics: Ideological and Interest Group Barriers to Reform. University of Chicago Press.
[4] Krumholz, H.M. (2008) Outcomes Research: Generating Evidence for Best Practice and Policies. Circulation, 118, 309-318.
https://doi.org/10.1161/circulationaha.107.690917
[5] Al-Rubeaan, K., Bawazeer, N., Al Farsi, Y., Youssef, A.M., Al-Yahya, A.A., AlQumaidi, H., et al. (2018) Prevalence of Metabolic Syndrome in Saudi Arabia—A Cross Sectional Study. BMC Endocrine Disorders, 18, Article No. 16.
https://doi.org/10.1186/s12902-018-0244-4
[6] World Health Organization (2023) Noncommunicable Diseases.
http://www.who.int/mediacentre/factsheets/fs355/en/
[7] World Health Organization (2024) Chronic Obstructive Pulmonary Disease (COPD).
http://www.who.int/mediacentre/factsheets/fs315/en/
[8] Abdulkareem, A.O., Farag, Y.M.K., Al Sayyari, A.A., Mousa, D., Alhejaili, F., Al-Harbi, A., et al. (2010) Epidemiology of Chronic Kidney Disease in the Kingdom of Saudi Arabia (SEEK-Saudi Investigators)—A Pilot Study. Saudi Journal of Kidney Diseases and Transplantation, 21, 1066-1072.
[9] Brasel, K.J. (2007) Length of Stay: An Appropriate Quality Measure? Archives of Surgery, 142, 461-466.
https://doi.org/10.1001/archsurg.142.5.461
[10] Collett, D. (2003) Modeling Survival Data in Medical Research. 2nd Edition, Chapman & Hall/CRC.
[11] Schoenfeld, D. (1982) Partial Residuals for the Proportional Hazards Regression Model. Biometrika, 69, 239-241.
https://doi.org/10.1093/biomet/69.1.239
[12] Therneau, T.M., Grambsch, P.M. and Fleming, T.R. (1990) Martingale-Based Residuals for Survival Models. Biometrika, 77, 147-160.
https://doi.org/10.1093/biomet/77.1.147
[13] Lin, D.Y. (1991) Goodness-of-Fit Analysis for the Cox Regression Model Based on a Class of Parameter Estimators. Journal of the American Statistical Association, 86, 725-728.
https://doi.org/10.1080/01621459.1991.10475101
[14] Royston, P. and Sauerbrei, W. (2008) Multivariable Model‐Building: A Pragmatic Approach to Regression Analysis Based on Fractional Polynomials for Modelling Continuous Variables. Wiley.
https://doi.org/10.1002/9780470770771
[15] Sasieni, P. (1993) Maximum Weighted Partial Likelihood Estimators for the Cox Model. Journal of the American Statistical Association, 88, 144-152.
https://doi.org/10.1080/01621459.1993.10594304
[16] Schemper, M., Wakounig, S. and Heinze, G. (2009) The Estimation of Average Hazard Ratios by Weighted Cox Regression. Statistics in Medicine, 28, 2473-2489.
https://doi.org/10.1002/sim.3623
[17] Dunkler, D., Ploner, M., Schemper, M. and Heinze, G. (2018) Weighted Cox Regression Using the R Package Coxphw. Journal of Statistical Software, 84, 1-26.
https://doi.org/10.18637/jss.v084.i02
[18] Klein, J., Rizzo, J., Zhang, M. and Keiding, N. (2001) Statistical Methods for the Analysis and Presentation of the Results of Bone Marrow Transplants. Part 2: Regression Modeling. Bone Marrow Transplantation, 28, 1001-1011.
https://doi.org/10.1038/sj.bmt.1703271

Copyright © 2025 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.