Use of a Land Use Regression Model Methodology for the Estimation of Individual Long-Term PM2.5 Exposure Profiles of Urban Residents in Jiujiang City, China
1. Introduction
Fine particulate matter (PM2.5) is a common pollutant and ubiquitous component of urban air pollution, which mainly originates from industrial emissions, construction sites and vehicle exhaust emissions. Epidemiological studies have comprehensively established the adverse effects of PM2.5 on health, with long-term exposure to PM2.5 shown to be associated with various diseases such as cancer and respiratory disorders, as well as increased mortality rates (Moon et al., 2024; Bhavsar et al., 2024; Chen et al., 2024).
When investigating the health impact of long-term exposure to PM2.5, measuring exposure with wearable portable detectors presents a significant challenge, as it requires the compliance of study subjects and incurs large equipment and organizational costs. Therefore, establishing methods for the efficient and accurate assessment of individual long-term PM2.5 exposure levels is a key focus of toxicological and epidemiological research. Land use regression (LUR) models are widely used in environmental science to describe the spatial and temporal distribution of urban air pollution. The LUR modeling technique was first proposed by Briggs et al. (1997). Its core concept is to construct a multiple regression model that uses multiple variables centered on land use data as predictor variables and air pollutant concentration data from existing monitoring stations as the dependent variable. This model enables the prediction of air pollutant concentrations in areas not covered by monitoring stations (Briggs et al., 1997). LUR models have been applied to epidemiological studies in recent years, offering high efficiency, high precision and wide practical applicability for the assessment of long-term air pollution exposure levels (Nguyen et al., 2025; Gayraud et al., 2024; Rhee et al., 2024). However, existing LUR studies have been conducted mainly in medium and large cities, where a large number of air pollution monitoring stations are available. It remains unclear whether LUR models can be effectively constructed for the prediction of individual long-term PM2.5 exposure levels in small cities with sparse monitoring station coverage.
Jiujiang City, located in Jiangxi Province in central China, is a small city with only 7 air pollution monitoring stations distributed within the urban area. To provide accurate pollutant determination techniques for environmental and epidemiological investigations, this study developed a method based on LUR modeling techniques that conveniently and accurately assesses the long-term PM2.5 exposure of individuals living in the urban area of Jiujiang City. Daily LUR models were constructed, and the PM2.5 exposure of invited volunteers was estimated from these models based on the time they spent at their work and home addresses. The volunteers were also asked to wear portable detectors during a 14-day period, and the results of the two methods were compared to evaluate the accuracy of the LUR modeling technique.
2. Methods
2.1. Study Area
The study area was the urban area of Jiujiang City (China), which comprises 2 administrative districts, Xunyang District and Lianxi District (29˚31'31'' - 29˚46'3''N, 115˚53'3'' - 116˚13'55''E), with a total area of 413.5 km2 and a population of 810,000 in 2022. The highest altitude in Jiujiang City is 791.0 meters, while the lowest altitude is −9.37 meters. Jiujiang City is located in a subtropical monsoon climate zone, with an average annual temperature of 16˚C - 17˚C and an annual rainfall of 1300 - 1600 mm. Jiujiang is largely an industrial city with numerous factories producing machinery, electronics, chemicals and textiles.
2.2. PM2.5 Monitoring Data
PM2.5 monitoring data was obtained from the Qingyue Open Environment Data Center (data.epmap.org), derived from the hourly PM2.5 concentrations for 7 state-controlled monitoring stations (Jiujiang Bureau of Ecology and Environment). The PM2.5 concentrations were measured by chemiluminescence. The locations of the 7 monitoring stations within Jiujiang City are shown in Figure 1.
Figure 1. The locations of the 7 monitoring stations in the urban area of Jiujiang, China.
2.3. Predictor Variables
Elevation, population density, land use and road length were selected as predictor variables for this study. Basic information on the predictor variables is provided in Table 1. To account for the diffusion characteristics of air pollutants, buffer zones of various sizes were established for the predictor variables. For the land use variables, buffer zones were created with radii of 100, 250, 500, 1000, 2000, 3000 and 5000 meters, and the area of each land use type around each monitoring station was calculated within these buffer zones. Similarly, for the road length variable, buffer zones were set with radii of 100, 250, 500, 750, 1000, 1500 and 2000 meters, and the total road length around each monitoring station was calculated within these buffer zones. In total, 44 predictor variables were included, and all data was processed using ArcGIS software v. 10.2 (ESRI China Information Technology Co., Ltd.).
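The buffer calculation described above can be illustrated with a minimal sketch. It is not the ArcGIS workflow used in the study: it assumes land use cells are given as hypothetical `(x, y, type)` tuples in projected metres and simply counts the cells whose centre falls inside each circular buffer (for 10 m cells, each count corresponds to 100 m2 of area). The function names and toy data are illustrative only.

```python
import math

# Buffer radii (m) listed in the text for the land use variables.
LAND_USE_RADII = [100, 250, 500, 1000, 2000, 3000, 5000]


def count_cells_in_buffer(cells, station_xy, radius_m, land_use):
    """Count cells of one land use type whose centre lies within radius_m of the station."""
    sx, sy = station_xy
    return sum(
        1
        for (x, y, kind) in cells
        if kind == land_use and math.hypot(x - sx, y - sy) <= radius_m
    )


def buffer_variables(cells, station_xy, land_use, radii=LAND_USE_RADII):
    """Build one predictor per radius, named like 'water_2000'."""
    return {
        f"{land_use}_{r}": count_cells_in_buffer(cells, station_xy, r, land_use)
        for r in radii
    }


# Hypothetical toy cells around a station placed at the origin.
toy_cells = [
    (50.0, 0.0, "water"),
    (0.0, 200.0, "water"),
    (900.0, 0.0, "vegetation"),
    (0.0, 2500.0, "water"),
    (6000.0, 0.0, "water"),
]
toy_vars = buffer_variables(toy_cells, (0.0, 0.0), "water")
```

Repeating this for 5 land use types and 7 radii gives 35 variables; together with 7 road length buffers, elevation and population density, this matches the 44 predictor variables reported.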
Table 1. Basic information on the predictor variables.
| Predictor variables | Data source | Description | Buffer radii (m) |
|---|---|---|---|
| Land use | The data FROM_GLC10_2017, obtained from the Star Cloud Data Service Platform of Pengcheng Laboratory (data.starcloud.pcl.ac.cn). | The spatial resolution was 10 m. There are 5 types of land use as follows: vegetation (forest, grass, shrub, wetland, tundra), impervious, bareland, cropland, and waterbody (water, snow, ice). | 100, 250, 500, 1000, 2000, 3000, 5000 |
| Population density | WorldPop (hub.worldpop.org). | Population counts in each cell (100 m × 100 m), measured in 2020. | NA |
| Elevation | The data ASTER GDEM V3, provided by Geospatial Data Cloud site, Computer Network Information Center, Chinese Academy of Sciences (www.gscloud.cn). | The spatial resolution was 30 m. Parsed through 2009 ASTER GDEM satellite remote sensing data, released by NASA and METI on August 5, 2019. | NA |
| Road length | Downloaded Google map was vectorized to get vector data of Jiujiang road network. | The total length of all motorways, including highways, urban expressways, urban arterial roads, urban secondary arterial roads, urban branch roads and rural roads. | 100, 250, 500, 750, 1000, 1500, 2000 |
2.4. Model Development
The LUR models were established as follows: 1) The daily average concentration of PM2.5 was calculated for each monitoring station as the dependent variable. 2) The 44 predictor variables were divided into 8 categories: elevation, population density, road length, vegetation, impervious, bare land, cropland and waterbody. Linear regression analyses were conducted between the dependent variable and each predictor variable, with the variable with the highest R2 value (named Ximax) selected for each category. 3) For each of the 8 categories, predictor variables that were highly correlated with Ximax were excluded (defined as a Pearson correlation coefficient greater than 0.6).
For the remaining variables, stepwise linear regression was performed based on the partial least squares method, to construct the LUR model. Variables were included in the model when they met a threshold of p < 0.05, while variables were removed from the model when they met a threshold of p > 0.10. Regression models were performed using R software v. 4.2.1.
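Steps 2) and 3) of the screening procedure can be sketched as follows. This is a minimal Python illustration (the study itself used R), with hypothetical function names and toy data. It relies on the fact that, for a simple linear regression, R2 equals the squared Pearson correlation. It also assumes that the 0.6 cut-off is applied to the absolute value of the correlation, which the text does not state explicitly. The subsequent stepwise regression is not shown.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def screen_category(candidates, pm25, r_threshold=0.6):
    """Pick the variable with the highest univariate R2, then drop variables
    in the same category that are highly correlated with it."""
    # Univariate R2 of a simple linear regression = squared Pearson r.
    best = max(candidates, key=lambda name: pearson_r(candidates[name], pm25) ** 2)
    kept = [best]
    for name, values in candidates.items():
        if name == best:
            continue
        # Assumption: the threshold applies to |r|.
        if abs(pearson_r(values, candidates[best])) <= r_threshold:
            kept.append(name)
    return best, kept


# Hypothetical values at 7 stations for one category (water bodies).
pm25 = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
water = {
    "water_100": [3, 1, 4, 1, 5, 2, 3],
    "water_1000": [1, 2, 3, 4, 5, 6, 8],
    "water_2000": [1, 2, 3, 4, 5, 6, 7],
}
best, kept = screen_category(water, pm25)
```

In this toy example `water_2000` has the highest R2, `water_1000` is excluded because it is almost collinear with it, and `water_100` is retained for the stepwise step.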
2.5. Model Performance
The LUR model constructed in this study was evaluated using methods recommended by the European Study of Cohorts for Air Pollution Effects (ESCAPE) (Rob et al., 2013). First, a larger adjusted R2 value for the final model corresponds to a better prediction effect. Second, leave-one-out cross-validation (LOOCV) was performed, with a smaller root mean squared error (RMSE) or a larger adjusted R2 value in the LOOCV considered to represent better prediction accuracy of the final LUR model (Wang et al., 2013).
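The LOOCV procedure can be sketched in a few lines. The example below is a simplified illustration, not the study's code: it assumes a single predictor fitted by ordinary least squares (the study's daily models contain one to three predictors), uses hypothetical station values, and defines the LOOCV R2 as 1 − SSE/SST, which is one common convention and an assumption here.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def loocv(xs, ys):
    """Leave each station out in turn, refit, and predict the held-out station."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        preds.append(intercept + slope * xs[i])
    sse = sum((p - y) ** 2 for p, y in zip(preds, ys))
    mean_y = sum(ys) / len(ys)
    sst = sum((y - mean_y) ** 2 for y in ys)
    rmse = math.sqrt(sse / len(ys))
    r2 = 1 - sse / sst  # assumed definition of the LOOCV R2
    return rmse, r2


# Hypothetical predictor and PM2.5 values at 7 stations.
station_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
station_pm25 = [1.0, 3.2, 4.9, 7.1, 9.0, 10.8, 13.1]
rmse, r2 = loocv(station_x, station_pm25)
```

With only 7 stations, each LOOCV fit is based on 6 observations, so the held-out errors can be noticeably larger than the in-sample errors.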
2.6. Assessment of Individual PM2.5 Exposures Based on LUR Model Results
Based on the constructed daily LUR models, individual PM2.5 exposure levels were estimated for specific time periods based on the exposure conditions at work and home addresses. Spatial and temporal variations in PM2.5 exposure were considered using the following process: 1) Work and home addresses were converted to World Geodetic System 1984 (WGS-84) coordinates; 2) Daily PM2.5 exposure levels were calculated for work and home addresses according to the constructed daily LUR models for Jiujiang City; 3) The daily PM2.5 exposures of each individual were estimated based on their hours spent at work and home; 4) Mean values were calculated as the average PM2.5 exposure level across all time periods.
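Steps 3) and 4) amount to a time-weighted average. A minimal sketch is given below. It assumes that all hours not spent at work are spent at home, which is a simplification the text does not state, and it uses hypothetical function names and input values.

```python
def daily_exposure(pm_home, pm_work, hours_work, hours_total=24.0):
    """Time-weighted daily PM2.5 exposure (ug/m3).

    Assumption: time not spent at work is spent at the home address.
    """
    hours_home = hours_total - hours_work
    return (pm_home * hours_home + pm_work * hours_work) / hours_total


def mean_exposure(days):
    """Average the daily exposures over the study period.

    days: list of (pm_home, pm_work, hours_work) tuples, one per day.
    """
    daily = [daily_exposure(pm_home, pm_work, hours) for pm_home, pm_work, hours in days]
    return sum(daily) / len(daily)


# Hypothetical LUR predictions (ug/m3) for two days, with 8 working hours per day.
example_days = [(20.0, 30.0, 8.0), (24.0, 24.0, 8.0)]
example_mean = mean_exposure(example_days)
```

On the first example day the exposure is (20 × 16 + 30 × 8) / 24 ≈ 23.3 μg/m3; on a non-working day `hours_work` can be set to 0 so that only the home prediction contributes.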
2.7. Assessment of Individual Exposures Based on Portable PM2.5 Detector Data
To assess the accuracy of estimating individual exposure levels to PM2.5 based on LUR modeling techniques, a validation test was conducted from September 1st to 14th, 2023. In total, 20 volunteers who reside in the urban area of Jiujiang City were invited to participate in the study, by wearing a portable PM2.5 detector around their waist for a 14-day period in order to obtain their average PM2.5 exposure levels for this period. The portable detectors (CP-15-B5, Airhug, Beijing Yishan Technology Co. Ltd., China) utilized the principle of light scattering for measurements, with the laser scattered by dust particles and a photoelectric converter used to change the optical signal into an electrical signal. Subsequently, an algorithm was used to calculate the number of different particle sizes present in the air and the concentration of each particle fraction.
2.8. Measurement Method Comparison
In this study, the PM2.5 exposure levels of all 20 volunteers were measured from September 1st to 14th, 2023, using the LUR model and portable PM2.5 detector techniques. Scatter plots were prepared to describe the consistency of results and the paired t-test was used to further compare the differences between the data generated by the two methods. All statistical analyses were performed using R software v. 4.4.1, with the threshold for statistical significance set at 0.05.
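For reference, the paired t statistic compares the two methods through the per-volunteer differences. The sketch below computes the statistic and its degrees of freedom with the Python standard library only. The study used R, where `t.test(x, y, paired = TRUE)` also returns the p-value. The values shown are hypothetical, not the study data.

```python
import math
import statistics


def paired_t(a, b):
    """Return the paired t statistic and degrees of freedom for two matched samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    t_stat = mean_d / (sd_d / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical exposures (ug/m3) for four volunteers.
lur_estimates = [22.0, 25.0, 19.0, 30.0]
detector_values = [20.0, 24.0, 20.0, 27.0]
t_stat, df = paired_t(lur_estimates, detector_values)
```

The statistic is then compared with the t distribution on n − 1 degrees of freedom (19 for the 20 volunteers in this study).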
3. Results
3.1. Daily LUR Models
Daily LUR models were constructed for the first 14 days of September 2023. All 14 models had adjusted R2 values ranging from 0.85 to 0.94, while the RMSE values for the LOOCV tests ranged from 3.53 to 4.88 μg/m3 and the LOOCV R2 values ranged from 0.61 to 0.82 (Table 2).
Table 2. The daily LUR model of PM2.5 concentration in Jiujiang urban area from September 1 to 14, 2023 (μg/m3).
| Date | Variable | Beta | p | Adj-R2 | LOOCV RMSE | LOOCV R2 |
|---|---|---|---|---|---|---|
| Sept. 1 | Constant | 21.84 | <0.001 | 0.90 | 3.58 | 0.75 |
| | Vege_2000 | −1.03 × 10⁻⁴ | 0.011 | | | |
| | Water_2000 | 1.27 × 10⁻⁴ | 0.012 | | | |
| Sept. 2 | Constant | 23.34 | <0.001 | 0.91 | 4.03 | 0.72 |
| | Vege_2000 | −1.11 × 10⁻⁴ | 0.007 | | | |
| | Water_2000 | 1.18 × 10⁻⁴ | 0.013 | | | |
| Sept. 3 | Constant | 19.89 | <0.001 | 0.89 | 4.11 | 0.68 |
| | Bare_5000 | −2.97 × 10⁻³ | 0.012 | | | |
| | Water_2000 | 1.47 × 10⁻⁴ | 0.005 | | | |
| Sept. 4 | Constant | 26.09 | <0.001 | 0.94 | 3.53 | 0.82 |
| | Vege_3000 | −0.54 × 10⁻⁴ | <0.001 | | | |
| | Water_2000 | 1.12 × 10⁻⁴ | <0.001 | | | |
| | Crop_3000 | −0.11 × 10⁻⁴ | 0.018 | | | |
| Sept. 5 | Constant | 27.12 | <0.001 | 0.85 | 4.88 | 0.61 |
| | Vege_3000 | −0.64 × 10⁻⁴ | 0.001 | | | |
| Sept. 6 | Constant | 21.13 | <0.001 | 0.88 | 4.15 | 0.66 |
| | Vege_2000 | −1.07 × 10⁻⁴ | 0.019 | | | |
| | Water_2000 | 1.31 × 10⁻⁴ | 0.022 | | | |
| Sept. 7 | Constant | 24.15 | <0.001 | 0.90 | 3.97 | 0.70 |
| | Vege_2000 | −1.20 × 10⁻⁴ | 0.006 | | | |
| | Water_2000 | 0.80 × 10⁻⁴ | 0.050 | | | |
| Sept. 8 | Constant | 25.81 | <0.001 | 0.89 | 4.12 | 0.67 |
| | Vege_2000 | −1.27 × 10⁻⁴ | 0.010 | | | |
| | Water_2000 | 1.12 × 10⁻⁴ | 0.033 | | | |
| Sept. 9 | Constant | 28.16 | <0.001 | 0.89 | 4.16 | 0.65 |
| | Vege_2000 | −1.43 × 10⁻⁴ | 0.008 | | | |
| | Water_2000 | 1.32 × 10⁻⁴ | 0.023 | | | |
| Sept. 10 | Constant | 24.92 | <0.001 | 0.87 | 4.49 | 0.61 |
| | Vege_2000 | −1.24 × 10⁻⁴ | 0.011 | | | |
| | Water_2000 | 1.14 × 10⁻⁴ | 0.032 | | | |
| Sept. 11 | Constant | 29.77 | <0.001 | 0.90 | 3.99 | 0.69 |
| | Vege_3000 | −0.64 × 10⁻⁴ | 0.003 | | | |
| | Water_2000 | 1.03 × 10⁻⁴ | 0.030 | | | |
| Sept. 12 | Constant | 30.06 | <0.001 | 0.87 | 4.51 | 0.66 |
| | Vege_2000 | −1.23 × 10⁻⁴ | 0.011 | | | |
| | Water_2000 | 1.09 × 10⁻⁴ | 0.035 | | | |
| Sept. 13 | Constant | 9.43 | <0.001 | 0.92 | 3.82 | 0.78 |
| | Bare_5000 | −2.22 × 10⁻³ | 0.012 | | | |
| | Water_2000 | 1.56 × 10⁻⁴ | 0.001 | | | |
| Sept. 14 | Constant | 18.24 | <0.001 | 0.89 | 4.04 | 0.68 |
| | Vege_2000 | −1.01 × 10⁻⁴ | 0.013 | | | |
| | Water_2000 | 1.17 × 10⁻⁴ | 0.018 | | | |
Adj-R2: adjusted R2. LOOCV: leave-one-out cross-validation. RMSE: root mean square error. Vege_2000: grid number of vegetated land within a radius of 2000 m. Vege_3000: grid number of vegetated land within a radius of 3000 m. Water_2000: grid number of water bodies within a radius of 2000 m. Bare_5000: grid number of bareland within a radius of 5000 m. Crop_3000: grid number of cropland within a radius of 3000 m.
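Each daily model in Table 2 is a linear equation that can be evaluated at any address once the buffer variables for that location are known. The sketch below applies the Sept. 1 coefficients from Table 2. The grid counts passed in are hypothetical values chosen for illustration, and the dictionary layout and function name are assumptions, not part of the study.

```python
# Sept. 1 coefficients as reported in Table 2.
SEPT_1_MODEL = {
    "Constant": 21.84,
    "Vege_2000": -1.03e-4,
    "Water_2000": 1.27e-4,
}


def predict_pm25(model, predictors):
    """Evaluate a daily LUR model: constant plus the sum of beta * predictor value."""
    value = model["Constant"]
    for name, beta in model.items():
        if name != "Constant":
            value += beta * predictors[name]
    return value


# Hypothetical grid counts within 2000 m of an address.
address_predictors = {"Vege_2000": 40000, "Water_2000": 20000}
pm25_sept_1 = predict_pm25(SEPT_1_MODEL, address_predictors)
```

For these hypothetical inputs the prediction is 21.84 − 4.12 + 2.54 = 20.26 μg/m3.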
3.2. Accuracy of the LUR Model for the Measurement of Long-Term PM2.5 Exposures
For the 20 study volunteers, the average PM2.5 level estimated using the LUR model technique was 23.47 ± 9.00 μg/m3, while the level measured using the portable detector was 22.64 ± 6.15 μg/m3. The paired t-test did not identify a statistically significant difference between the results of the two methods (t = 1.053, p = 0.306). In addition, the scatterplot in Figure 2 allows a visual comparison of the consistency of the results generated using the two methods.
Figure 2. Scatterplot comparing LUR modeling technique measurements to portable detector measurements.
4. Discussion
As the dependent variable for the construction of the LUR model, PM2.5 monitoring data plays an important role in the quality of modeled predictions. The sources of monitoring data mainly include experimental design monitoring, routine monitoring and vehicle-mounted mobile monitoring. Among these, routine monitoring data come from existing monitoring sites, usually managed and maintained by the governmental environmental protection department (Ma et al., 2024). Routine monitoring data is often based on continuous monitoring, and the low cost of data collection makes this the most commonly used approach in current global research (Alan et al., 2024; Ebrahimi et al., 2024; De et al., 2013). The PM2.5 monitoring data utilized in this study were based on routine monitoring station data for Jiujiang City.
When constructing a LUR model for predicting PM2.5 concentrations, any factor potentially associated with PM2.5 levels is typically utilized as a predictor variable. Commonly adopted predictor variables include land use, road information, population density, altitude, and the distribution of pollution emission sources. Land use serves as the core predictor variable in LUR models, with the accuracy and classification of land use data varying across different studies (Ma et al., 2024). The primary factor influencing the quality of land use data is data accessibility. Road information constitutes another critical predictor variable, primarily reflecting air pollutants emitted by transportation. The most direct indicator of traffic-related pollution is road traffic volume; however, accurate traffic volume data are often not readily available. Therefore, this study employed the total length of roads as a proxy for traffic pollution, which correlation analyses demonstrated to be strongly associated with PM2.5 concentrations at monitoring stations.
Since air pollutants disperse, it is usually necessary to set a number of buffers for each type of predictor variable in order to find the variable most strongly related to PM2.5 concentration. To set the buffer radii, a minimum radius is usually determined first and then increased stepwise according to a fixed rule until the maximum radius is reached. For example, in this study, we set buffers with radii of 100 m, 250 m, 500 m, 1000 m, 2000 m, 3000 m, and 5000 m for the vegetation land variables, and then explored the optimal buffer size among them. The minimum radius of the buffer is usually related to the accuracy of the data, while the maximum radius is related to the dispersion pattern of the pollutants (Azmi et al., 2023).
To date, no uniform standard has been established for the number of monitoring sites required for accurate LUR model performance. The ESCAPE study is one of the largest LUR model studies performed worldwide, involving multiple research institutions with 20 to 80 monitoring stations in each city, allowing its research method to be applied as a reference standard (Wang et al., 2013). However, it remains unclear whether LUR studies in small and medium-sized urban areas with few monitoring stations are able to provide accurate data. It has been reported that the accuracy of the LUR model is related not only to the number of monitoring points, but also to the quality of data for the predictor variables and land use classifications (Ma et al., 2024). A LUR study conducted in Sao Paulo, Brazil, used data from only 9 air quality monitoring sites, yet the R2 value of the LUR model used to fit PM10 concentrations reached 64% (Habermann & Gouveia, 2013). These results suggest that when the data quality of predictor variables is high and variables are classified in a logical and reasonable manner, LUR models may still achieve high prediction accuracies when based on a small number of monitoring sites.
It is generally accepted that larger adjusted R2 values for the prediction model and the LOOCV test indicate more accurate LUR model predictions. Ma et al. (2024) reviewed 155 LUR model studies conducted between 2011 and 2023 and showed that the prediction accuracy R2 value of LUR models worldwide generally ranged between 50% and 90%. In the present study, the prediction accuracy R2 value of the daily LUR models ranged from 85% to 94%, indicating a high overall prediction accuracy. In addition, most LUR models established to date have LOOCV test R2 values in the range of 40% to 80%; the daily models constructed in this study had LOOCV R2 values of 61% to 82%, which lie at or slightly above the upper part of this range.
In this study, based on the constructed daily LUR models, the long-term PM2.5 exposure levels of individuals were further assessed according to their work and home addresses. The LUR modeling method established in this study was found to achieve good accuracy based on a comparison with portable air pollution detector data. Considering that the LUR modeling method does not require the cooperation of study subjects, it is more convenient to apply than the portable detector method, making it highly suitable for practical application in large-sample epidemiological studies to assess the long-term air-pollution exposure level of individuals, especially in populations who have fixed working and living environments.
5. Conclusion
This study established that daily LUR models can be constructed effectively for the assessment of individual long-term PM2.5 exposure levels based on work and home addresses. These findings suggest that this is an effective method for the estimation of long-term PM2.5 exposure levels for epidemiological studies in urban regions such as Jiujiang City, providing a technical reference for the accurate assessment of long-term air pollution exposure levels in populations of regions with few monitoring stations.
Acknowledgements
We acknowledge all volunteers who participated in the present study. This work was supported by Jiangxi Provincial Natural Science Foundation (grant number 20232BAB206143), Guiding Science and Technology Projects of Ji’an city (grant number 20233-023456), and Social Science Planning Program of Ji’an City (grant number 23GHA665).