Application of Multivariate Geostatistics in Environmental Epidemiology: Case Study from Houston, Texas ()
Received 24 December 2015; accepted 25 April 2016; published 28 April 2016
1. Introduction
Assessing causal inferences between exposure to air pollution and respiratory and cardiovascular diseases provides epidemiological evidence regarding the adverse health effects of air pollution. Most of relevant studies quantify the possible causality through association between the pollutants and the health outcomes due to exposure to pollutants either singularly or jointly. However, the effect of local air pollution versus neighboring levels is not fully explored yet [1] [2] , which points to the importance of studying community-related characteristics which could help in estimating the causal inference between air pollution and its respiratory and cardiovascular adverse health effects. Application of multivariate geostatistical techniques can assist in this regards because it has the advantage of making use of the spatial correlations between observations pertinent to neighborhoods to predict values at unsampled places [3] [4] .
This study capitalizes on previous studies which assess the effect of ozone and particulate matter on CVD and COPD ER visits in Harris County, Texas from 2004 to 2009 [5] [6] by studying the effect of joint exposure to air pollutants and other factors like temperature and relative humidity on both health outcomes instead of univariate modeling. The use of multivariate geostatistics has been useful in other applications like mining, but with the increased quality of spatial health data, their relevance is gradually increasing. This study presents a bivariate example with two dependent variables (CVD and COPD ER visits, which could be replicated to incorporate more dependent variables. This can provide immeasurable benefits to epidemiologists and policy makers by incorporating more outcomes that have been documented to be associated, thus, representing a more realistic situation than modeling the effects on one outcome at a time.
2. Data and Methods
Daily ozone, PM2.5, wind speed, wind direction, temperature, and relative humidity were obtained from EPA’s AirData website [7] . Harris County daily emergency room visits for COPD and CVD from 2004 to 2009 data comes from the Emergency Room Utilization Study, University of Texas School of Public Health [8] . COPD ER visits are those with a primary diagnosis of chronic obstructive pulmonary disease (COPD) including chronic bronchitis (ICD-9 codes 490 - 491), emphysema (ICD-9 code 492), asthma (ICD-9 code 493), bronchiectasis (ICD-9 code 494), and chronic airway obstruction (ICD-9 code 496) [9] . CVD emergency room visits are those with primary diagnosis of heart failure (ICD-9 code 428), heart rhythm disturbances (ICD-9 codes 426 - 427), cerebrovascular events (ICD-9 codes 430 - 438), ischemic heart disease (ICD-9 codes 410 - 414, 429), and peripheral vascular disease (ICD-9 codes 440 - 448) [10] . Harris County zip codes from Texas State Data Center [11] . R version 3.1.1 [12] was used to conduct the statistical analyses. O3 and PM2.5 measurements were interpolated in zip codes where no observations were sampled, using kriging. Statistical analyses were conducted using R version 3.1.1.
3. Results and Discussion
Table 1 presents descriptive statistics for the collected variables. Analyzed variables included O3 and PM2.5 lagged at two and one day as well. Data exhibited high shifts of skewness and kurtosis from 0 and 3, respectively, which are the characteristic values of normal distribution. Therefore, the variables were of non-symmetric distributions, with long tails and several outliers. The total number of CVD ER visits during the six years was 96,596. Those visits kept increasing with time, starting with 3309 visits in 2004 and ending with 30,765 in 2009,
Table 1. Summary statistics for the dependent and independent variables.
a greater than 900% increase (Figure 1). Modeling the association of each visit to temperature, relative humidity, wind speed, wind direction, age, ethnicity, gender, day of the week, PM2.5 of the same day, PM2.5 of the previous day, O3 of the same day, and O3 of the previous day showed that the following variables were significant: visits on Friday, Saturday, Sunday and Tuesday, being African American male, wind speed, O3 of the same day, PM2.5 of the same day and previous day [6] . The total number of COPD ER visits was 95,765. Similar to CVD ER visits. COPD ER visits had positive significant associations with the being African American male, relative humidity, O3 of the same day, and PM2.5 of the same day [5] . Pair-wise correlations between the variables are given in Table 2. The most notable associations include the strong positive association between CVD and COPD ER visits of 0.74 and the negative association between temperature and COPD ER visits of −0.27.
The number of CVD and COPD ER visits was then summed up by date and zip code. Fisher’s F test to test the null hypothesis of non-changing variances of each outcome between the years, gave p-values less than the significance level of 0.05. The same result was obtained for t test. Hence we reject the null hypothesis of equal variances and means respectively. That is, neither CVD nor COPD ER cases for Houston during those years can be combined as if they were coming from one statistical distribution. Consequent analyses were conducted on the 2009 data. Multivariate multiple regression results showed that the model is statistically significant and that lagged ozone and PM2.5 concentrations were not statistically significant to both outcomes. Interestingly, only PM2.5, relative humidity and temperature were significant to both CVD and COPD ER visits (Table 3). The effect
Figure 1. Daily Numbers of CVD and COPD ER visits from January 2004 to December 2009 in Houston, Texas.
Table 2. Pearson pair-wise associations.
Table 3. Summary of type of association investigated by dependent variable.
of these three factors on the mechanisms of toxicity for respiratory and cardiovascular disease can be attributed to the oxidative stress leading to inflammation, which generates the physiological processes evolving as respiratory and/or cardiovascular symptoms like narrowing of airways, shortness of breath, wheezing, cough, and the ability of particles to penetrate the lung wall accumulating in the pulmonary interstitium between the lung and the bloodstream [13] - [15] . As for heat’s effect on the health outcomes, research shows that hot-humid conditions are more challenging for patients with respiratory and cardiovascular diseases because heat and humidity affects the body's ability to preserve normothermia [16] [17] . The increase in humidity combined with particulate matter pollution allows for the delivery of the pollutant deeper into the lung [18] . Both heat and humidity exacerbate respiratory and cardiovascular conditions [19] .
As for the spatial association between CVD and COPD ER visits across the different zip codes of Houston, the codispersion coefficients were constant with a variance of 0.001 (Figure 2). These are the ratios of the cross versus the direct variograms which can capture spatial association that is not possible to calculate following conventional methods. Therefore, we can assume the presence of intrinsic correlation whereby the model of intrinsic correlation is entirely specified by its spatial structure, and by the variance-covariance matrix [3] [20] [21] . Tests of normality lead to rejection of the null hypothesis of normality for CVD and COPD ER visits. Therefore, they were transformed using the logarithmic function. In order to predict future numbers of ER visits for each of the zip codes variograms were estimated, which exhibited anisotropy. This indicates that the independent variables had strong influence on the spatial variability of the two health outcomes. Directional variograms were estimated to characterize the spatial variability in different directions. Fitted variogram parameters (nugget, range, sill, smoothness parameter, practical range) were used in ordinary kriging [22] . Cross validation proved reliability of the approach used. This approach predicts that CVD and COPD ER visits will increase by more than 34% and 24% respectively. These predictions can be compared to future ER visits from following years. Knowing that Houston is one of the top cities with unpaid ER visits [23] , this high percent carries multi-fold responsibilities: The need for more sophisticated interventions and resources in order to manage the continually increasing number of visits.
The persistent increase in CVD and COPD ER visits (Figure 3) in Houston, Texas could be attributed to variations in data coverage and completeness as well as socioeconomic factors including immigrations due to severe weather disasters where the majority of movers are usually those who belong to the traditionally under-served under-represented groups [5] [6] . The previous models were univariate, i.e.; the joint effects of the risk factors were investigated to one health outcome at a time. While these analyses provided good prediction because of inclusion of multipollutants (O3 of the same day, O3 of the previous day, PM2.5 of the same day, and PM2.5 of the previous day), they did not investigate the joint effect of the risk factors on both outcomes. This study allowed us to see how CVD and COPD ER visits are related once control variables have been included (through the examination of the correlation of the residuals), and that we can test the effect of ozone and PM2.5 both across CVD and COPD ER visits jointly [24] . Figure 4 gives a proposed decision tree for applying multivariate geostatistics in environmental epidemiology, which was conducted in this study. Grayed boxes were not followed.
Figure 2. Codispersion coefficients against distance in miles.
Figure 3. Total number of ER visits for CVD and COPD in Houston, Texas.
Figure 4. Proposed decision tree for applying multivariate geostatistics in environmental epidemiology.
4. Conclusion
This paper presents an approach to study the multivariate spatial nature between health outcomes and multipollutants using multivariate geostatistics. The example presented modeled the effect of ozone, PM2.5, relative humidity, wind speed, and temperature on both CVD and COPD ER visits at the same time. Now that we answered questions like the effect on both outcomes simultaneously, we can address more complex questions like the effect on more health outcomes that have been historically associated to each other. Further investigation is recommended in order to address potential issues like computational complexity and bias [22] .