Statistical Assessment of Neighborhood Socioeconomic Deprivation Environment in Spatial Epidemiologic Studies
Received 17 December 2015; accepted 11 June 2016; published 14 June 2016

1. Introduction
Health-related behaviors and outcomes display significant geographic variations. Neighborhood socioeconomic environment (SES) has been associated with health-related behaviors [1] - [4], incidence [5] - [7] and poor prognosis [8] of diseases, and premature mortality [5] [9] - [12]. Population-based data sources from local and federal governments (e.g. U.S. Census) provide a number of SES-related data elements and are commonly used to assess the role of neighborhood SES in health behaviors and outcomes. However, there is no consensus on which neighborhood measures, at which geographic level, should be used to examine socioeconomic disparities in health behaviors and outcomes. Neighborhood SES has been defined inconsistently across studies, which may contribute to inconsistent findings regarding the relationships between neighborhood SES and health behaviors and outcomes [13]. Various single SES indicators at different geographic levels (e.g. county, census tract, block group) have been used as neighborhood SES measures. It remains unclear which SES indicators are appropriate for a specific geographic region at a specific geographic level.
Neighborhood SES is a complex concept consisting of multiple aspects of socioeconomic resources. A variety of single-variable measures makes it possible to develop a composite index to comprehensively assess neighborhood SES environment. We propose that, compared with single-variable measures, a composite index can more accurately reflect neighborhood deprivation by capturing more dimensions of socioeconomic resources.
In this study, we apply 2000 U.S. Census data to identify individual socioeconomic variables that significantly reflect socioeconomic deprivation across four geographic areas at three geographic levels. We compare composite indexes with six socioeconomic indicators reflecting different aspects of socioeconomic deprivation environment.
2. Methods
2.1. Data Source
U.S. Census data have been widely applied to assess neighborhood socioeconomic context. For the 2000 census and before, the Census Bureau collected population and housing data from all households and socioeconomic data from about one in six households every ten years at a single point in time. Since 2006, this information has been collected continuously by the American Community Survey (ACS), which samples households each year; only the cumulative five-year ACS approximates the sample proportion achieved by the decennial census. Considering the ACS margins of error for small areas, we used 2000 U.S. Census data for the socioeconomic information of geographic areas. Ethical review was not needed for this study because only public-use area-level Census data were used.
2.2. Single SES Variables
To capture broad aspects of socioeconomic deprivation context, based on the literature [5] [10] [14] - [16], we selected 21 Census variables at three geographic levels (county, census tract, and block group) (Table 1). These variables, which reflect neighborhood socioeconomically deprived resources from six different domains, include 1) education (the percentage of population without high school education); 2) occupation (the percentage of population in working class, the percentage of civilian labor force unemployed); 3) housing conditions (the percentage of households that rent, the percentage of vacant households, the percentage of households with at least one person per room, the percentage of female-headed households with dependent children, the percentage of households with public assistance, the percentage of households with no car, the percentage of households with no phone, the percentage of occupied households with incomplete plumbing, the percentage of households with no kitchen); 4) income and poverty (income disparity, the percentage of households with low income, the percentage of households below federal poverty line, the percentage of population below federal poverty line); 5) racial composition (the percentage of non-Hispanic African Americans, the percentage of Hispanic population, the percentage of population foreign-born); and 6) residential stability (the percentage of residents aged 65 or older, the percentage of persons living in the same house for at least five years). To examine the influence of geographic size, we performed the analysis across the nation and three states that have different socioeconomic characteristics and are involved in the Surveillance, Epidemiology, and End Results Program of the National Cancer Institute.
Table 1. Variables selected to comprise deprivation index at three levels in four areas.
aI: the nation; bII: California; cIII: Georgia; dIV: Louisiana; eHH: household; *variables selected for constructing the composite index.
2.3. Statistical Analysis
2.3.1. Development of Neighborhood Socioeconomic Deprivation Index
Using a multivariate common factor analysis with the “varimax” rotation, we examined the internal structure of the Census variables and identified their importance. We selected the common factor that accounted for the largest share of the total variance of all variables. A variable was selected to construct a composite index if its factor loading on the selected common factor was: 1) no less than 0.5; 2) the largest among its factor loadings across all common factors; and 3) at least 0.1 larger than the second largest factor loading across all common factors. A composite index was constructed by summing all selected variables after they were standardized and weighted by their factor scoring coefficients. Cronbach’s alpha was used to evaluate the internal consistency of the selected variables, with larger values indicating greater internal consistency. A total of 12 composite index scores were developed independently, one for each of the four geographic areas at each of the three geographic levels.
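The selection rule, index construction, and internal-consistency check described above can be sketched in a few lines. The analysis in this study was run in SAS; the Python below is only an illustrative sketch, and the function names (`select_variables`, `composite_index`, `cronbach_alpha`) are hypothetical. It assumes that the rotated loadings and the factor scoring coefficients have already been produced by a factor analysis routine, and that the loading criteria are applied to raw (not absolute) loadings.

```python
import numpy as np

def select_variables(loadings, factor=0, min_loading=0.5, min_gap=0.1):
    """Indices of variables meeting the three loading criteria on `factor`.

    `loadings` is a (variables x factors) array of rotated loadings,
    assumed to come from a separate factor analysis step.
    """
    loadings = np.asarray(loadings, dtype=float)
    selected = []
    for i, row in enumerate(loadings):
        target = row[factor]
        others = np.delete(row, factor)
        runner_up = others.max() if others.size else -np.inf
        # Criterion 1: loading no less than 0.5.
        # Criteria 2 and 3: largest loading for this variable, and at
        # least 0.1 above the best loading on any other factor.
        if target >= min_loading and target - runner_up >= min_gap:
            selected.append(i)
    return selected

def composite_index(data, selected, scoring_coefs):
    """Sum of standardized selected variables weighted by scoring coefficients."""
    x = np.asarray(data, dtype=float)[:, selected]
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)  # z-scores per variable
    return z @ np.asarray(scoring_coefs, dtype=float)

def cronbach_alpha(data):
    """Cronbach's alpha for an (areas x variables) array."""
    x = np.asarray(data, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1).sum()
    total_variance = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)
```

In this sketch each row of `data` is one geographic unit (a county, census tract, or block group) and each column is one Census variable; running it separately for each area and level would correspond to the 12 independently developed indexes.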
2.3.2. Examination of the Agreements
To compare a composite index to single socioeconomic indicators, we selected six commonly used variables from the aforementioned six domains (one per domain). They included the percentage of population without high school education, the percentage of civilian labor force unemployed, the percentage of households with public assistance, the percentage of population below federal poverty line, the percentage of non-Hispanic African Americans, and the percentage of residents aged 65 or older. Because Census variables may have skewed distributions, we categorized the composite index and the six single indicators into quintiles (five categories) according to their distributions. This categorization is commonly and broadly applied to assess the effects of environmental exposures on health behaviors and outcomes in epidemiological studies. We examined the agreement among the seven variables by computing weighted Kappa coefficients for each pair of variables [17]. Based on previous literature [18], the degree of agreement was defined in six categories: 0 (no agreement, κ < 0), 1 (slight agreement, κ = 0.01 - 0.20), 2 (fair agreement, κ = 0.21 - 0.40), 3 (moderate agreement, κ = 0.41 - 0.60), 4 (substantial agreement, κ = 0.61 - 0.80), and 5 (perfect agreement, κ > 0.80). Data management and analysis were performed in SAS System (version 9.3, SAS Institute Inc., Cary, North Carolina).
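The agreement procedure above (quintile categorization, weighted Kappa, and mapping to agreement categories) can be illustrated with a short Python sketch. This is not the SAS code used in the study, and the function names are hypothetical. Two points are assumptions rather than statements from the text: the sketch uses linear agreement weights, since the weighting scheme is not specified here, and it assigns κ values of exactly 0 (and any value at or below 0) to category 0, since the listed ranges leave that boundary open.

```python
import numpy as np

def to_quintiles(values):
    """Assign each value to a quintile (1 = lowest fifth, 5 = highest fifth)."""
    values = np.asarray(values, dtype=float)
    cuts = np.quantile(values, [0.2, 0.4, 0.6, 0.8])
    return np.searchsorted(cuts, values, side="right") + 1

def weighted_kappa(a, b, n_cat=5):
    """Weighted kappa for two ratings coded 1..n_cat (linear weights assumed)."""
    a = np.asarray(a, dtype=int) - 1
    b = np.asarray(b, dtype=int) - 1
    observed = np.zeros((n_cat, n_cat))
    np.add.at(observed, (a, b), 1)           # cross-tabulate the two ratings
    observed /= observed.sum()
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    i, j = np.indices((n_cat, n_cat))
    weights = 1 - np.abs(i - j) / (n_cat - 1)  # 1 on the diagonal, 0 at the corners
    p_obs = (weights * observed).sum()
    p_exp = (weights * expected).sum()
    return (p_obs - p_exp) / (1 - p_exp)

def agreement_category(kappa):
    """Map kappa to the 0-5 agreement categories used in the text."""
    if kappa <= 0:  # assumption: kappa of exactly 0 is treated as no agreement
        return 0
    return 1 + int(sum(kappa > cut for cut in (0.20, 0.40, 0.60, 0.80)))
```

For a pair of indicators, one would pass the quintile codes of both through `weighted_kappa` and then through `agreement_category`; repeating this for every pair of the seven variables gives an agreement matrix of the kind summarized in Table 2.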
3. Results
Table 1 shows the component structure of the 12 geographic area- and level-specific composite SES indexes. The components of the composite index varied across the examined geographic areas. The component variables selected for each of the 12 composite indexes accounted for a large proportion of the overall variance of all Census variables (ranging from 31.6% to 47.8%) and had high internal consistency (Cronbach’s alpha ranging from 0.88 to 0.96). Within a given geographic region, the components of the composite index were similar at the census tract and block group levels but differed from those at the county level. The percentage of population below federal poverty line was consistently selected for the composite index, regardless of geographic area and level. In contrast, the residential stability domain did not significantly contribute to the composite index in any geographic area or at any level.
The percentage of population without high school education and the percentage of households with public assistance were components of the composite index for each of the three states, regardless of geographic level, but not for the nation. The percentage of non-Hispanic African Americans was a significant contributor to the composite index in Georgia and Louisiana, the states with a relatively high proportion of African American residents.
At the census tract level, the composite indexes had moderate-to-substantial agreement with their components and no-to-moderate agreement with non-component variables (Table 2). Across the nation, the composite index showed substantial similarity (κ category 4) to its component variable (the percentage of population below federal poverty line), and no-to-moderate similarity (κ categories ranging from 0 to 3) to non-component variables. This difference in agreement between component and non-component variables was also observed in the three states. The percentage of population below federal poverty line had no-to-substantial agreement with other socioeconomic indicators (κ categories ranging from 0 to 4).
4. Discussion
Neighborhood SES has been widely used to assess socioeconomic gradients and inequalities in a variety of health behaviors and outcomes [1] - [12] . However, there is no consensus on the definition of neighborhood SES, and thus various socioeconomic variables have been used across studies. This may explain, at least in part, the inconsistent results of the role of neighborhood SES in health behaviors and outcomes [13] .
Using a uniform set of U.S. Census variables, we compared a composite index to six commonly-used socioeconomic indicators from different socioeconomic deprivation domains. The result showed that substantial
Table 2. Weighted Kappa agreement between seven socioeconomic variables at census tract level.
aPNH: % Population with less than high school; bPNE: % Civilian labor force unemployed; cPPA: % Household on public assistance; dPPV: % Population below federal poverty line; ePAA: % Non-Hispanic African Americans; fPOD: % Residents aged 65 or older; gthe nation (1st row); hCalifornia (2nd row); iGeorgia (3rd row); jLouisiana (4th row). 0: No agreement; 1: Slight agreement; 2: Fair agreement; 3: Moderate agreement; 4: Substantial agreement; and 5: Perfect agreement.
Therefore, geographic area- and level-specific SES indicators should be used to define SES for the study area. In studies examining the role of general neighborhood SES in health behaviors and outcomes, a composite index is a better measure of neighborhood SES than single SES indicators. When assessing the role of a specific SES indicator, such as poverty, it is necessary to examine whether that indicator substantially reflects the overall SES environment of the studied geographic region at the given geographic level. Otherwise, the selected SES indicator may not be generalizable to the overall neighborhood SES environment. In this study, we compared the composite SES index only to six commonly used Census variables from different socioeconomic domains. Further research may be necessary to compare the neighborhood SES deprivation index to other variables or indexes of interest. Nevertheless, our findings suggest that more attention should be paid to how the neighborhood SES environment is assessed. Researchers should examine the specific characteristics of the SES environment in their own study regions to design an appropriate strategy for assessing neighborhood SES, rather than simply adopting SES variables used in previous literature.
Because of the margins of error of the ACS data, we used the 2000 Census data, which may not benefit recently initiated studies. However, historical data sources can be useful for prospective studies initiated at an earlier time point. The history of neighborhood exposures and their changes over time should be integrated into advanced statistical modeling to control for spatial uncertainty due to time-varying exposures and confounders and to obtain unbiased estimates of neighborhood effects on health behaviors and outcomes. In addition, the main purpose of this study was to address the strategy for assessing the small-area neighborhood socioeconomic environment by comparing different socioeconomic variables to a composite index and examining the degree of their agreement using a uniform and reliable data source. A previous study has indicated that selecting different socioeconomic indicators can lead to inconsistent findings [13]; therefore, researchers need to select an appropriate approach for accurately assessing the neighborhood SES environment.
In conclusion, geographic area- and unit-specific SES measures should be applied to identify and quantify socioeconomic inequalities in health behaviors and outcomes. A multivariate factor analysis with an appropriate rotation method is a useful approach to identify region- and geographic unit-specific SES indicators and construct a composite index. The SES resources of the specific geographic area, along with the research question, should be taken into account when selecting a composite index or single indicators as an SES measure.
Acknowledgements
This work was supported in part by a career development award (K07 CA178331) and a research award (R21 CA169807) from the National Cancer Institute at the National Institutes of Health, and a research award (R01 AA021492) from the National Institute on Alcohol Abuse and Alcoholism at the National Institutes of Health. In addition, Y. L. is supported by the Barnes-Jewish Hospital Foundation, St. Louis, Missouri and the Breast Cancer Research Foundation. We also acknowledge the use of the Health Behavior, Communication and Outreach Core, part of a cancer center grant (P30 CA091842) funded by the National Cancer Institute at the National Institutes of Health. No conflicts of interest were declared.
NOTES
*Corresponding author.