1. Introduction
One of the main indicators for assessing the overall health status of a population is the rate of all-cause mortality over time [1]-[4]. This indicator reflects the cumulative effects of health care facilities and expenditures, socioeconomic factors, and environmental factors, among others [5]-[8]. The impacts of these factors on mortality may vary by geographic location. Some studies have examined these factors at relatively high levels of aggregation (e.g., state, country) [9]-[11], but limited work has been done at small-area levels. To fill this gap, this paper aims to identify small areas (counties) with unusually high or low mortality and to assess the relationship between mortality rates and county-level risk factors. For this purpose, we examined spatiotemporal trends in mortality across the counties of Florida from 2016 to 2020 using Bayesian hierarchical models.
Bayesian hierarchical models are very flexible for incorporating the spatial and temporal structures of mortality rates [12] [13]. However, including such structures greatly increases the computational time of the Markov chain Monte Carlo (MCMC) algorithms [14] [15] traditionally used to estimate Bayesian models. A more efficient alternative is the integrated nested Laplace approximation (INLA), available in R [16], which is well suited to assessing spatiotemporal variations in mortality rates in small areas such as counties. To explain spatiotemporal variations in mortality rates, the modeling process incorporates risk factors for all-cause mortality, such as high-risk behaviors, sociocultural factors, accessibility to health care resources, and environmental factors of residential areas [6] [7] [17]-[19]. At the individual level, all-cause mortality rates are affected by several groups of factors:
- high-risk behaviors (e.g., smoking, drinking, a sedentary lifestyle, and poor diet);
- socioeconomic factors (e.g., income, education level, unemployment, and poverty);
- health resources (e.g., insurance, having a personal doctor, and availability of health facilities);
- environmental factors (e.g., infections, air and water pollution, and occupation).

When individual-level data are available, a unit-level model, which is often used in small area estimation, can also be used for disease mapping [20]. At the area level, county characteristics may also explain the variation in all-cause mortality rates. For this purpose, this study includes several county-level covariates that may affect mortality rates. These are the unemployment rate, the percentages of residents with less than a high school education and with health insurance, median income, the income inequality index, median age, and health expenditure. Identifying which of these risk factors contribute to geographic disparities in all-cause mortality is an important step toward developing small-area health policies and interventions.
2. Methods: Data and Modeling
2.1. Sources of Data
The data for this study were obtained from the Florida Department of Health, Bureau of Community and Health Assessment, Division of Public Health Statistics & Performance Management (DPHSM). All-cause death counts for the 2016-2020 period were aggregated from individual records to the county level. These were the most recent publicly available data. All-cause death counts are not affected by bias in determining the cause of death. They are therefore an appropriate outcome for assessing the heterogeneity of mortality rates across counties over time, which can help develop interventions and policies for those most at risk. Data on county-level characteristics (factors) for each of Florida's 67 counties were also obtained from the DPHSM. These factors (see Table 1) are used in the geospatial model defined below.
2.2. Statistical Analysis
For space-time mapping of mortality rates in small areas, total death counts are recorded along with the characteristics of the areas over time. The standardized mortality ratio (SMR) is commonly computed for each area and time period to determine which areas exhibit higher or lower mortality. The SMR is the ratio of observed to expected death counts [21] [22]. An SMR greater than 1 indicates more deaths than expected, and an SMR less than 1 indicates fewer. Although the SMR is simple to interpret, it is often unstable because population sizes vary widely across small areas, and SMRs based on small populations can fluctuate considerably. A better approach is a spatiotemporal model that borrows information from neighboring areas and time points.
To specify the spatiotemporal model, let $y_{it}$ be the observed death count for the $i$th area and $t$th time ($i = 1, \ldots, n$, with $n = 67$ counties; $t = 1, \ldots, T$, with $T = 5$ years). Let the $q$-dimensional vector $\mathbf{x}_{it}$ contain county-level risk factors with associated $q \times 1$ parameter vector $\boldsymbol{\beta}$. We assume that the outcome follows a Poisson distribution with mean $\mu_{it} = E_{it}\theta_{it}$, where:
- $E_{it}$ is the expected number of deaths in the $i$th area at the $t$th time, calculated as $E_{it} = N_{it} \left(\sum_{i}\sum_{t} y_{it}\right) / \left(\sum_{i}\sum_{t} N_{it}\right)$;
- $N_{it}$ is the number of individuals at risk;
- $\theta_{it}$ is the county-specific relative risk of death from all causes.

Specifically,
$$y_{it} \mid \theta_{it} \sim \text{Poisson}(\mu_{it}), \tag{1}$$
where
$$\log(\mu_{it}) = \log(E_{it}) + \log(\theta_{it}). \tag{2}$$
Furthermore, the log-risk is modeled as
$$\log(\theta_{it}) = \alpha + \mathbf{x}_{it}^{\top}\boldsymbol{\beta} + S_i + T_t + \delta_{it}, \tag{3}$$
where $\alpha$ is the overall log-risk. The covariate vector $\mathbf{x}_{it}$ contains the following county-level variables:
- median income;
- median age;
- expenditure on health;
- percentage with health insurance;
- percentage of people with less than a high school education;
- income inequality index;
- unemployment rate;
- percentages of Black and Hispanic residents.

In Model (3), the spatial term is $S_i = u_i + v_i$. Here $v_i \sim N(0, \sigma_v^2)$ represents spatially unstructured heterogeneity. The term $u_i$ represents spatially structured random effects, capturing the tendency of neighboring counties to have similar relative risks because they share common risk factors [23]-[25]. Based on the conditional autoregressive model [26], the distribution of $u_i$ is
$$u_i \mid \mathbf{u}_{-i} \sim N\left(\frac{1}{n_i}\sum_{j \neq i} w_{ij}\,u_j,\ \frac{\sigma_u^2}{n_i}\right), \tag{4}$$
where $\mathbf{u}_{-i} = (u_1, \ldots, u_{i-1}, u_{i+1}, \ldots, u_n)$, $n_i = \sum_{j} w_{ij}$ is the number of neighbors of area $i$, and $w_{ij} = 1$ if areas $i$ and $j$ are neighbors (otherwise, 0).
The temporal term is $T_t = \gamma_t + \phi_t$. The temporally structured effect $\gamma_t$ is modeled as a first-order random walk,
$$\gamma_t \mid \gamma_{t-1} \sim N(\gamma_{t-1}, \sigma_\gamma^2), \quad t = 2, \ldots, T,$$
and $\phi_t \sim N(0, \sigma_\phi^2)$ represents temporally unstructured heterogeneity. Finally, $\delta_{it} \sim N(0, \sigma_\delta^2)$ denotes the interaction between space and time [27].
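The conditional autoregressive prior in Equation (4) can be made concrete with a small example. Assume its common intrinsic (Besag) form: each spatial effect is centered on the average of its neighbors, with variance shrinking as the number of neighbors grows. The minimal Python sketch below computes that conditional mean and variance from a binary adjacency matrix. The four-area chain, the effect values, and the function name `icar_conditional` are hypothetical illustrations, not Florida data and not part of the INLA software.

```python
from typing import List, Tuple


def icar_conditional(u: List[float], W: List[List[int]], i: int,
                     sigma2_u: float) -> Tuple[float, float]:
    """Conditional mean and variance of u_i given all other u_j (ICAR form).

    W is a symmetric 0/1 adjacency matrix (w_ij = 1 if areas i and j are
    neighbors); sigma2_u is the spatially structured variance.
    """
    others = [j for j in range(len(u)) if j != i]
    n_i = sum(W[i][j] for j in others)  # number of neighbors of area i
    if n_i == 0:
        raise ValueError("area has no neighbors; the ICAR conditional is undefined")
    mean = sum(W[i][j] * u[j] for j in others) / n_i  # average of the neighbors
    return mean, sigma2_u / n_i  # more neighbors -> smaller conditional variance


# Hypothetical chain of four areas: 0 - 1 - 2 - 3
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
u = [0.2, -0.1, 0.4, 0.0]  # hypothetical spatial effects
for i in range(len(u)):
    m, v = icar_conditional(u, W, i, sigma2_u=0.5)
    print(f"area {i}: conditional mean {m:.3f}, variance {v:.3f}")
```

Interior areas (two neighbors) get half the conditional variance of the end areas (one neighbor). This is how the structured effect borrows more strength where more neighboring information is available.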
The parameters in Models (1)-(4) are estimated using a Bayesian approach, with noninformative (vague) prior distributions specified for all parameters. Each element of $\boldsymbol{\beta}$ has a normal prior with mean zero and a small precision (i.e., $N(0, 0.001)$, where 0.001 is the precision). The precisions $1/\sigma_v^2$, $1/\sigma_u^2$, $1/\sigma_\gamma^2$, $1/\sigma_\phi^2$ and $1/\sigma_\delta^2$ each follow a vague Gamma(1, 0.01) prior (shape 1, rate 0.01) specified in INLA. The posterior distribution of $\theta_{it}$ (the relative rate of all-cause mortality for the $i$th county in the $t$th year) can be used to compute and map the probability that the mortality risk in a county exceeds a given threshold [28]. Such an exceedance probability can then determine whether a county should be classified as having an excess risk (hotspot) of mortality [29]. In the Bayesian context, models can be compared using the deviance information criterion (DIC) [30] and the Watanabe-Akaike information criterion (WAIC) [31] [32]. Both criteria balance goodness of fit against model complexity, and the model with the smallest DIC and WAIC values is preferred as the "best" model. For estimation, we used the INLA software [16]. It is particularly efficient for fitting spatial mapping models of disease incidence and is readily accessible to researchers and practitioners [33].
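In R-INLA, exceedance probabilities are usually computed directly from posterior marginal densities (for example with `inla.pmarginal`). The Python sketch below illustrates the same idea with Monte Carlo draws instead:
- the exceedance probability is estimated as the fraction of posterior samples of θ above the threshold;
- a county is flagged as a hotspot when that probability exceeds a cutoff.

The 0.8 cutoff mirrors the value used in the Results. The simulated lognormal draws, county names, and function names are hypothetical.

```python
import random
from typing import Dict, List


def exceedance_probability(samples: List[float], threshold: float = 1.0) -> float:
    """Monte Carlo estimate of Pr(theta > threshold | data) from posterior draws."""
    if not samples:
        raise ValueError("need at least one posterior draw")
    return sum(s > threshold for s in samples) / len(samples)


def classify_hotspots(draws: Dict[str, List[float]], threshold: float = 1.0,
                      cutoff: float = 0.8) -> List[str]:
    """Return the counties whose exceedance probability is above `cutoff`."""
    return sorted(c for c, s in draws.items()
                  if exceedance_probability(s, threshold) > cutoff)


# Hypothetical posterior draws of theta for two made-up counties
random.seed(2016)
draws = {
    "County A": [random.lognormvariate(0.15, 0.05) for _ in range(5000)],
    "County B": [random.lognormvariate(-0.10, 0.05) for _ in range(5000)],
}
for county, s in draws.items():
    print(county, round(exceedance_probability(s), 3))
print("hotspots:", classify_hotspots(draws))
```

Using a probability cutoff rather than the posterior mean alone builds uncertainty into the hotspot decision. A county with a posterior mean RR just above 1 but a wide posterior is not flagged.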
3. Results
The descriptive statistics of the county-level characteristics are given in Table 1. The average number of all-cause deaths per county increased slightly from 2016 to 2018, decreased in 2019, and rose sharply in 2020. Median income, median age, and health insurance coverage increased over time. The percentage of adults with less than a high school education and the unemployment rate (except in 2020) declined. The means of the income inequality index and health expenditure remained roughly constant. The magnitude of the standard deviations of death counts in each year indicates considerable heterogeneity in mortality across counties. This variation is also visible in the spatial maps of the SMR over time (see Figure 1). According to the spatial distribution of the SMR, mortality was higher in the northern counties than in the southern part of Florida. To explain this variability, a Bayesian spatiotemporal model (GSTM), given in Equation (3), was used to spatially smooth the unadjusted SMR estimates, yielding more stable estimates of mortality rates.
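The raw SMRs mapped in Figure 1 are simple ratios of observed to expected deaths. The minimal Python sketch below assumes expected counts come from internal standardization: the pooled death rate over all counties and years, multiplied by each county-year population, without age adjustment. This choice, the toy counts, and the function names are assumptions for illustration.

```python
from typing import List

Matrix = List[List[float]]


def expected_counts(deaths: Matrix, pop: Matrix) -> Matrix:
    """Internally standardized expected counts: E_it = N_it * (total deaths / total population)."""
    overall_rate = sum(map(sum, deaths)) / sum(map(sum, pop))
    return [[n_it * overall_rate for n_it in row] for row in pop]


def smr(deaths: Matrix, expected: Matrix) -> Matrix:
    """Standardized mortality ratio SMR_it = y_it / E_it."""
    return [[y / e for y, e in zip(y_row, e_row)]
            for y_row, e_row in zip(deaths, expected)]


# Hypothetical data: rows = counties, columns = years
deaths = [[120, 130], [40, 38], [900, 950]]
pop = [[10_000, 10_100], [5_000, 5_000], [100_000, 101_000]]

E = expected_counts(deaths, pop)
for i, row in enumerate(smr(deaths, E)):
    print(f"county {i}:", [round(r, 2) for r in row])
```

With internal standardization, the expected counts sum to the observed total, so SMRs above 1 in some county-years are balanced by SMRs below 1 in others.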
Table 1. Descriptive statistics of county-level variables by year.
Variables | 2016 Mean | 2016 SD | 2017 Mean | 2017 SD | 2018 Mean | 2018 SD | 2019 Mean | 2019 SD | 2020 Mean | 2020 SD
All-cause death count | 787.2 | 146.25 | 795.4 | 149.00 | 797.9 | 154.86 | 774.2 | 147.13 | 877.6 | 180.80
Median income ($) | 45,205 | 8,382.3 | 47,144 | 9,125.6 | 49,046 | 9,807.5 | 51,290 | 10,300 | 53,012 | 11,145
Health insurance (%) | 84.16 | 3.87 | 85.39 | 3.63 | 86.57 | 3.46 | 87.15 | 3.55 | 87.35 | 3.36
Expenditure ($) | 60.74 | 37.73 | 59.44 | 34.53 | 60.83 | 36.37 | 60.15 | 37.41 | 60.22 | 38.10
Less than high school (%) | 15.81 | 6.71 | 15.17 | 6.48 | 14.64 | 6.17 | 14.39 | 6.22 | 14.07 | 6.09
Median age (years) | 43.09 | 6.26 | 43.30 | 6.34 | 43.57 | 6.42 | 43.76 | 6.53 | 44.00 | 6.58
Income inequality index | 0.461 | 0.05 | 0.461 | 0.049 | 0.463 | 0.049 | 0.463 | 0.049 | 0.466 | 0.048
Unemployment rate (%) | 5.08 | 0.85 | 4.34 | 0.71 | 3.78 | 0.62 | 3.67 | 0.68 | 6.64 | 1.53
Figure 1. Spatial distribution of standardized mortality ratios (SMRs) of all-cause deaths in Florida (2016-2020).
We fitted three competing models to our data (see Table 2 for model specifications): 1) a fixed-effects model without spatial or temporal random effects (Model 1); 2) a mixed-effects model with a parametric temporal trend (Model 2); and 3) a mixed-effects model with a nonlinear nonparametric trend (Model 3). According to the Bayesian model comparison criteria in Table 2, Model 1 has by far the largest DIC (8005.22) and WAIC (18820.35). This suggests that ignoring spatial and temporal variation leads to a poor fit. Model 2 and Model 3 both incorporate spatial and temporal variation. Model 3 fits the data better than Model 2, since its DIC (3536.84) and WAIC (3526.38) are smaller than those of Model 2 (3619.97 and 3991.69, respectively). Thus, Model 3 is the best model, and its results are presented in Table 3.
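R-INLA reports DIC and WAIC when they are requested through its `control.compute` argument. To show what WAIC measures, the Python sketch below computes it from a matrix of pointwise log-likelihoods evaluated at posterior draws. It follows the standard definition WAIC = -2(lppd - p_WAIC), where p_WAIC is the sum over observations of the posterior variance of the pointwise log-likelihood. The Poisson counts and draws are hypothetical, and INLA's internal computation, which works from approximate posterior marginals, may differ.

```python
import math
from statistics import variance
from typing import List


def waic(loglik: List[List[float]]) -> float:
    """WAIC on the deviance scale from pointwise log-likelihoods.

    loglik[s][k] is log p(y_k | parameters from posterior draw s);
    at least two draws are required for the variance term.
    """
    n_draws, n_obs = len(loglik), len(loglik[0])
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # effective number of parameters
    for k in range(n_obs):
        col = [loglik[s][k] for s in range(n_draws)]
        m = max(col)  # log-sum-exp trick for numerical stability
        lppd += m + math.log(sum(math.exp(v - m) for v in col) / n_draws)
        p_waic += variance(col)  # sample variance across posterior draws
    return -2.0 * (lppd - p_waic)


def poisson_loglik(y: int, mu: float) -> float:
    """Log of the Poisson probability mass function."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


# Hypothetical: 3 observed counts, 4 posterior draws of their means
y = [12, 7, 20]
mu_draws = [[11.0, 8.0, 19.0], [13.0, 6.5, 21.0],
            [12.5, 7.2, 18.5], [11.8, 7.9, 20.4]]
ll = [[poisson_loglik(yk, mk) for yk, mk in zip(y, row)] for row in mu_draws]
print(round(waic(ll), 3))
```

Because p_WAIC grows with posterior variability in the pointwise fit, a more flexible model must improve lppd enough to offset its penalty. This is why smaller WAIC values are preferred.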
Table 2. Model specifications and comparisons (DIC = deviance information criterion, WAIC = Watanabe–Akaike information criterion).
Model specification | DIC | WAIC
Model 1: Fixed effects (no spatial or temporal random effects) | 8005.22 | 18820.35
Model 2: Mixed effects with parametric trend (spatial and temporal random effects) | 3619.97 | 3991.69
Model 3: Mixed effects with nonparametric trend (spatial and temporal random effects) | 3536.84 | 3526.38
Table 3. Estimated posterior mean (PM) and standard deviation (SD) of the fixed-effects parameters of Model 3, along with the lower limit (LCL) and upper limit (UCL) of the 95% equal-tail credible interval (CI).
Predictor | PM | SD | LCL | UCL
Black percent | −0.0010 | 0.0017 | −0.0043 | 0.0023
Expenditure | 0.0002 | 0.0003 | −0.0005 | 0.0008
Hispanic percent | −0.0048 | 0.0018 | −0.0083 | −0.0011
Income inequality | 0.0155 | 0.0832 | −0.1480 | 0.1783
Insurance | −0.0036 | 0.0028 | −0.0091 | 0.0019
Less than high school | 0.0034 | 0.0025 | −0.0015 | 0.0083
Median income | −0.0055 | 0.0013 | −0.0081 | −0.0028
Median age | −0.0067 | 0.0028 | −0.0121 | −0.0012
Unemployment | 0.0008 | 0.0027 | −0.0045 | 0.0061
Figure 2. Spatial distribution of the posterior means of the RRs of all-cause deaths in Florida (2016-2020).
Figure 3. The county-specific posterior exceedance probability that a county’s relative rate of all-cause mortality is greater than 1 (2016-2020).
Table 3 displays the posterior means, standard deviations, and 95% credible intervals (CIs) for the fixed-effects parameters. After controlling for the other variables, median income has a clear inverse association with the all-cause mortality rate (posterior mean = −0.0055; 95% CI: (−0.0081, −0.0028), which does not include zero). That is, counties with higher median incomes had lower mortality rates. Counties with higher percentages of Hispanic residents also had lower mortality after adjusting for the other variables (posterior mean = −0.0048; 95% CI: (−0.0083, −0.0011)). In addition, the greater a county's median age, the lower its mortality rate (posterior mean = −0.0067; 95% CI: (−0.0121, −0.0012)).
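Because the covariates enter the model on the log scale, each coefficient translates into a multiplicative change in the relative rate: exp(β) per one-unit increase in the covariate. The Python sketch below applies this transformation to the three credible associations in Table 3. Two caveats apply:
- The units in which the covariates were entered (for example, whether median income was rescaled) are not stated, so a "unit" here means whatever scale was used in the model.
- Exponentiating the posterior mean gives a point summary, not the exact posterior mean of exp(β). The exponentiated interval limits, however, are exact limits for exp(β) because the transformation is monotone.

```python
import math
from typing import Dict, Tuple

# Posterior mean and 95% equal-tail credible limits from Table 3 (log scale)
TABLE3: Dict[str, Tuple[float, float, float]] = {
    "Median income": (-0.0055, -0.0081, -0.0028),
    "Hispanic percent": (-0.0048, -0.0083, -0.0011),
    "Median age": (-0.0067, -0.0121, -0.0012),
}


def rr_multiplier(beta: float, delta: float = 1.0) -> float:
    """Multiplicative change in the relative rate for a `delta`-unit increase."""
    return math.exp(beta * delta)


for name, (pm, lcl, ucl) in TABLE3.items():
    pct = 100 * (rr_multiplier(pm) - 1)  # percent change per one-unit increase
    print(f"{name}: RR x {rr_multiplier(pm):.4f} per unit ({pct:+.2f}%), "
          f"95% CI {rr_multiplier(lcl):.4f} to {rr_multiplier(ucl):.4f}")
```

For a larger change, such as ten units, the multiplier is exp(10β), obtained with `rr_multiplier(pm, delta=10)`.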
Adjusted relative rates (RRs) of mortality were also obtained from the fitted model for spatial mapping. Figure 2 displays the spatial and temporal distributions of the RRs. There was little temporal variation from 2016 to 2020, implying that the relative rates of mortality for Florida counties remained stable over that period. Figure 2 also shows spatial clustering of high relative rates of mortality (pink and red) in the northern part of Florida. Specifically, Union and Washington Counties consistently had the highest mortality rates across the 5-year period. In contrast, Collier County in South Florida had the lowest relative mortality rate (blue) throughout the study period.
Identifying hotspots for all-cause mortality is a useful guide for implementing prevention and health policies, and exceedance probabilities can be used for this purpose. The exceedance probability is the probability that the relative risk for a county is greater than a given threshold value, such as 1. Figure 3 maps the exceedance probabilities at a threshold value of 1. A probability close to 1 indicates that a county's RR very likely exceeds 1, whereas a probability close to 0 indicates that an RR greater than 1 is unlikely. As shown in Figure 3, the highest exceedance probabilities (Pr(RR > 1) > 0.8) were found in northern Florida and parts of central Florida. These counties account for approximately 73% of Florida's 67 counties. Further investigation is therefore warranted to identify other prominent risk factors (observable or unobservable) with the aim of reducing the burden of high mortality. Effective intervention programs may also be implemented in counties with high mortality rates.
4. Discussion
Our analysis showed little temporal heterogeneity in the mortality rate over the 2016-2020 period but substantial variability in mortality across small areas (counties). The geographical distributions of mortality rates (see Figures 1-3) revealed that higher mortality was clustered in the northern part of the state. Lower mortality was consistently found in the southern part throughout the study period. Specifically, 84% of the northern counties had RRs greater than 1, meaning more deaths than expected, which suggests the existence of health disparities. To explain this disparity, county-level covariates were included in Model 3. After controlling for the other covariates, county-level relative rates of mortality were greater in counties with:
- lower median incomes;
- lower median ages;
- lower percentages of Hispanic residents.

The other covariates were the percentage of Black residents, unemployment rate, health insurance, education, income inequality index, and health expenditure.
One notable result of this study was that the percentage of the Hispanic population at the county level had an inverse association with all-cause mortality. A plausible reason may be that a greater percentage of the Hispanic population lives in southern Florida and in larger cities with higher median incomes. The finding is also consistent with the so-called Hispanic paradox [34] [35]. Although many Hispanic subpopulations have lower socioeconomic status and less access to health care, their health and mortality outcomes are similar to or better than those of the white population [34]-[36]. Median age per county also had a clear negative association with the mortality rate. Higher mortality rates were observed in Florida counties with relatively younger populations (i.e., a lower median age) [37]. This may seem counterintuitive, but it reflects how deaths are distributed across counties. Approximately 72% of the rural counties had a median age of less than 45 years. Rural counties make up 48% of Florida's 67 counties yet carry 73% of the high death burden. Why higher mortality occurs mostly in rural areas warrants further investigation. In our study, the unit of analysis is the county rather than the individual, so our results should be interpreted as ecological associations. All variables were aggregated over individual residents, and the results therefore cannot be interpreted at the individual level.
There are some limitations to our ecological analyses. First, we lacked data on important risk factors for mortality. County-level data on unhealthy lifestyle behaviors, health-impairing environmental exposures, and access to resources could not be obtained. Second, the results require cautious interpretation, since our ecological model (in which the unit of analysis is the county) does not support inferences at the individual level. Third, where available, more granular data may provide further insight to support policy decision-making.
In conclusion, we used a Bayesian approach to investigate county-level characteristics that partly explained the spatiotemporal variations in all-cause deaths in Florida during the 2016-2020 period. After adjusting for other covariates, counties with a higher median income, a higher percentage of Hispanic residents, and a higher median age had lower mortality rates. Beyond the geographical variation in all-cause mortality rates across counties, we observed a high burden of death and a stagnation in the decline of mortality in rural counties. A better understanding of other risk factors would help practitioners and public officials direct interventions and allocate resources appropriately to high-risk areas.
Data Availability
The datasets were obtained from a public archive located at: https://www.flhealthcharts.gov/ChartsReports/rdPage.aspx?rdReport=NonVitalInd.Data.