Use of a Land Use Regression Model Methodology for the Estimation of Individual Long-Term PM2.5 Exposure Profiles of Urban Residents in Jiujiang City, China

Abstract

The purpose of this study was to establish a method for accurately estimating individual long-term exposure to fine particulate matter (PM2.5) in Jiujiang City (China) by constructing land use regression (LUR) models, and to verify the accuracy of these models. Daily PM2.5 concentrations from seven monitoring stations were collected from September 1 to 14, 2023, and used to construct daily LUR models. The models used PM2.5 concentration as the dependent variable and land use, elevation, population density and road length as predictor variables. Twenty volunteers were then recruited; their daily PM2.5 exposure was estimated from the model predictions at their work and home addresses, and their average exposure levels were calculated. In addition, the volunteers wore portable PM2.5 detectors continuously for the same 14-day period, and the average measured PM2.5 level served as the comparison standard. The adjusted R2 values of the 14 daily models ranged from 0.85 to 0.94, and the R2 values from leave-one-out cross-validation (LOOCV) tests were all at least 0.61, indicating good prediction accuracy. No significant difference was observed between the average exposures estimated with the LUR method and those measured with the portable PM2.5 detectors (p > 0.05). This study aimed to develop a novel method for the convenient and accurate estimation of individual long-term PM2.5 exposure for epidemiological studies in urban environments comparable to Jiujiang City.

Share and Cite:

Wang, W. Y. and Hu, S. S. (2025) Use of a Land Use Regression Model Methodology for the Estimation of Individual Long-Term PM2.5 Exposure Profiles of Urban Residents in Jiujiang City, China. Journal of Geoscience and Environment Protection, 13, 233-243. doi: 10.4236/gep.2025.131012.

1. Introduction

Fine particulate matter (PM2.5) is a common pollutant and ubiquitous component of urban air pollution, which mainly originates from industrial emissions, construction sites and vehicle exhaust emissions. Epidemiological studies have comprehensively established the adverse effects of PM2.5 on health, with long-term exposure to PM2.5 shown to be associated with various diseases such as cancer and respiratory disorders, as well as increased mortality rates (Moon et al., 2024; Bhavsar et al., 2024; Chen et al., 2024).

When investigating the health effects of long-term exposure to fine particulate matter (PM2.5), measuring individual exposure with wearable portable detectors is challenging, because it requires the compliance of study subjects and incurs substantial equipment and organizational costs. Establishing methods for the efficient and accurate assessment of individual long-term PM2.5 exposure is therefore a key focus of toxicological and epidemiological research. Land use regression (LUR) models are widely used in environmental science to describe the spatial and temporal distribution of urban air pollution. The LUR technique was first proposed by Briggs et al. (1997). Its core concept is to construct a multiple regression model that uses air pollutant concentrations measured at existing monitoring stations as the dependent variable and multiple variables, centered on land use data, as predictor variables. The model then enables the prediction of air pollutant concentrations in areas not covered by monitoring stations (Briggs et al., 1997). In recent years, LUR models have been applied in epidemiological studies, offering high efficiency, high precision and wide practical applicability for the assessment of long-term air pollution exposure (Nguyen et al., 2025; Gayraud et al., 2024; Rhee et al., 2024). However, existing LUR studies have been conducted mainly in medium-sized and large cities with many air pollution monitoring stations, and it remains unclear whether LUR models can be effectively constructed to predict individual long-term PM2.5 exposure in small cities with sparse monitoring coverage.

Jiujiang City is located in Jiangxi Province in central China; it is a small city with only 7 air pollution monitoring stations in its urban area. To provide accurate exposure assessment techniques for environmental and epidemiological investigations, this study developed an LUR-based method to conveniently and accurately assess the long-term PM2.5 exposure of individuals living in the urban area of Jiujiang City. Daily LUR models were constructed, and the PM2.5 exposure of volunteers was estimated from the model predictions at their work and home addresses, according to the time spent at each. The volunteers also wore portable detectors during the same 14-day period, and the results of the two methods were compared to evaluate the accuracy of the LUR modeling technique.

2. Methods

2.1. Study Area

The study area was the urban area of Jiujiang City (China), which comprises 2 administrative districts, Xunyang District and Lianxi District (29˚31'31'' - 29˚46'3''N, 115˚53'3'' - 116˚13'55''E), with a total area of 413.5 km2 and a population of 810,000 in 2022. The highest altitude in Jiujiang City is 791.0 meters and the lowest is −9.37 meters. Jiujiang City lies in a subtropical monsoon climate zone, with an average annual temperature of 16˚C - 17˚C and annual rainfall of 1300 - 1600 mm. Jiujiang is largely an industrial city, with numerous factories producing machinery, electronics, chemicals and textiles.

2.2. PM2.5 Monitoring Data

PM2.5 monitoring data were obtained from the Qingyue Open Environment Data Center (data.epmap.org) and were derived from the hourly PM2.5 concentrations recorded at 7 state-controlled monitoring stations (Jiujiang Bureau of Ecology and Environment). The PM2.5 concentrations were measured by chemiluminescence. The locations of the 7 monitoring stations within Jiujiang City are shown in Figure 1.

Figure 1. The locations of the 7 monitoring stations in the urban area of Jiujiang, China.

2.3. Predictor Variables

Elevation, population density, land use and road length were selected as predictor variables for this study; basic information on these variables is provided in Table 1. To account for the dispersion of air pollutants, buffer zones of various sizes were established for the predictor variables. For the land use variables, buffer zones with radii of 100, 250, 500, 1000, 2000, 3000 and 5000 meters were created around each monitoring station, and the area of each land use type within each buffer was calculated. Similarly, for the road length variable, buffer zones with radii of 100, 250, 500, 750, 1000, 1500 and 2000 meters were created, and the total road length within each buffer was calculated. In total, 44 predictor variables were included (5 land use types × 7 radii, road length × 7 radii, elevation and population density), and all data were processed using ArcGIS software v. 10.2 (ESRI China Information Technology Co., Ltd.).
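The buffer variables described above can be sketched as follows. This is a minimal pure-Python illustration, not the ArcGIS 10.2 workflow used in the study: it assumes a land use raster held as a list of rows of class labels (a hypothetical layout), counts the cells of each class whose centres fall inside a circular buffer around a station, and enumerates the 44 candidate predictor names. Names such as `vegetation_2000` are illustrative only. With the 10 m FROM_GLC10 cells, a count multiplied by 100 m² gives an area.

```python
from typing import Dict, List, Tuple

# Land use classes and buffer radii (m) as listed in Table 1.
LAND_USE = ["vegetation", "impervious", "bareland", "cropland", "waterbody"]
LAND_RADII = [100, 250, 500, 1000, 2000, 3000, 5000]
ROAD_RADII = [100, 250, 500, 750, 1000, 1500, 2000]


def predictor_names() -> List[str]:
    """Enumerate candidates: 5 land use types x 7 radii, 7 road radii, 2 point values."""
    names = [f"{lu}_{r}" for lu in LAND_USE for r in LAND_RADII]
    names += [f"road_{r}" for r in ROAD_RADII]
    names += ["elevation", "population_density"]
    return names


def count_cells_in_buffer(
    grid: List[List[str]],
    cell_size: float,
    station_xy: Tuple[float, float],
    radius: float,
) -> Dict[str, int]:
    """Count raster cells of each class whose centre lies within `radius` of the station.

    grid[row][col] holds a class label; x and y are metres from the grid's
    top-left corner (a hypothetical layout chosen for this sketch).
    """
    sx, sy = station_xy
    counts: Dict[str, int] = {}
    for row, line in enumerate(grid):
        for col, label in enumerate(line):
            cx = (col + 0.5) * cell_size  # cell centre, x
            cy = (row + 0.5) * cell_size  # cell centre, y
            # Compare squared distances to avoid a square root per cell.
            if (cx - sx) ** 2 + (cy - sy) ** 2 <= radius ** 2:
                counts[label] = counts.get(label, 0) + 1
    return counts


if __name__ == "__main__":
    print(len(predictor_names()))  # 44
    toy = [["vegetation"] * 5 for _ in range(5)]
    toy[2][3] = "waterbody"
    print(count_cells_in_buffer(toy, 10.0, (25.0, 25.0), 10.0))
```

Counting cell centres is a simple approximation of a vector buffer overlay; cells straddling the buffer edge are either fully counted or excluded.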

Table 1. Basic information on the predictor variables.

| Predictor variable | Data source | Description | Buffer radii (m) |
|---|---|---|---|
| Land use | FROM_GLC10_2017 data, obtained from the Star Cloud Data Service Platform of Pengcheng Laboratory (data.starcloud.pcl.ac.cn) | Spatial resolution of 10 m, with 5 land use types: vegetation (forest, grass, shrub, wetland, tundra), impervious, bareland, cropland, and waterbody (water, snow, ice) | 100, 250, 500, 1000, 2000, 3000, 5000 |
| Population density | WorldPop (hub.worldpop.org) | Population counts in each cell (100 m × 100 m) for the year 2020 | NA |
| Elevation | ASTER GDEM V3 data, provided by the Geospatial Data Cloud site, Computer Network Information Center, Chinese Academy of Sciences (www.gscloud.cn) | Spatial resolution of 30 m; derived from 2009 ASTER GDEM satellite remote sensing data; released by NASA and METI on August 5, 2019 | NA |
| Road length | A downloaded Google Map was vectorized to obtain vector data for the Jiujiang road network | Total length of all roads, including highways, urban expressways, urban arterial roads, urban secondary arterial roads, urban branch roads and rural roads | 100, 250, 500, 750, 1000, 1500, 2000 |

2.4. Model Development

The LUR models were established as follows: 1) The daily average PM2.5 concentration at each monitoring station was calculated and used as the dependent variable. 2) The 44 predictor variables were divided into 8 categories: elevation, population density, road length, vegetation, impervious, bareland, cropland and waterbody. Univariate linear regressions were conducted between the dependent variable and each predictor variable, and the variable with the highest R2 value in each category (denoted Ximax) was selected. 3) For each of the 8 categories, predictor variables that were highly correlated with Ximax (Pearson correlation coefficient greater than 0.6) were excluded.
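Steps 2) and 3) amount to a supervised screening of the buffer variables. The sketch below assumes that "highly correlated with Ximax" is checked within the same category (our reading of the text) and relies on the fact that the R2 of a univariate linear regression equals the squared Pearson correlation. The function names and toy data are hypothetical. Variables with zero variance would need to be removed beforehand, because their correlation is undefined.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def screen_category(
    y: Sequence[float],
    variables: Dict[str, Sequence[float]],
    r_max: float = 0.6,
) -> List[str]:
    """For one category: keep X_imax plus variables with |r| <= r_max to it."""
    # Univariate R2 equals the squared Pearson correlation with y.
    r2 = {name: pearson(x, y) ** 2 for name, x in variables.items()}
    x_max = max(r2, key=r2.get)
    kept = [x_max]
    for name, x in variables.items():
        if name != x_max and abs(pearson(x, variables[x_max])) <= r_max:
            kept.append(name)
    return kept


def screen_all(
    y: Sequence[float],
    categories: Dict[str, Dict[str, Sequence[float]]],
) -> Dict[str, List[str]]:
    """Apply the screen to every category (8 in the text)."""
    return {cat: screen_category(y, vs) for cat, vs in categories.items()}


if __name__ == "__main__":
    y = [1, 2, 3, 4, 5]
    vegetation = {"a": [1, 2, 3, 4, 5], "b": [1, 2, 3, 5, 4], "c": [1, -1, 0, 1, -1]}
    print(screen_all(y, {"vegetation": vegetation}))
```

In the toy example, `b` is dropped because it correlates with the best variable `a` at r = 0.9, while the weakly correlated `c` survives into the stepwise stage.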

For the remaining variables, stepwise multiple linear regression based on the ordinary least squares method was performed to construct the LUR model. Variables entered the model when p < 0.05 and were removed from the model when p > 0.10. Regression models were fitted using R software v. 4.2.1.
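The entry and removal thresholds can be illustrated with a bidirectional stepwise procedure built on ordinary least squares fits. This is a hedged, dependency-free sketch, not the R code used by the authors: each forward step adds the candidate whose coefficient has the smallest p-value below 0.05, each backward step drops the selected variable with the largest p-value above 0.10, and p-values come from the closed-form Student t distribution for integer degrees of freedom (Abramowitz and Stegun, formulas 26.7.3 and 26.7.4). It does not handle perfectly collinear candidates, and the data in the usage block are synthetic.

```python
import math
from typing import Dict, List, Sequence, Tuple


def t_two_sided_p(t: float, df: int) -> float:
    """Two-sided p-value of Student's t for integer df (closed-form series)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 0:
        term = total = 1.0
        for j in range(1, (df - 2) // 2 + 1):
            term *= (2 * j - 1) / (2 * j) * c * c
            total += term
        a = s * total
    else:
        total = 0.0
        if df >= 3:
            term = total = c
            for j in range(1, (df - 3) // 2 + 1):
                term *= (2 * j) / (2 * j + 1) * c * c
                total += term
        a = (2 / math.pi) * (theta + s * total)
    return min(1.0, max(0.0, 1.0 - a))


def invert(m: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inverse with partial pivoting (adequate for tiny matrices)."""
    n = len(m)
    a = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def ols(y: Sequence[float], cols: List[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """OLS of y on an intercept plus `cols`; returns (coefficients, p-values)."""
    n = len(y)
    X = [[1.0] + [float(c[i]) for c in cols] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]
    df = n - k
    sigma2 = sum(r * r for r in resid) / df
    pvals = []
    for a in range(k):
        se = math.sqrt(sigma2 * inv[a][a])
        pvals.append(0.0 if se == 0 else t_two_sided_p(beta[a] / se, df))
    return beta, pvals


def stepwise(y: Sequence[float], candidates: Dict[str, Sequence[float]],
             p_enter: float = 0.05, p_remove: float = 0.10, max_iter: int = 50) -> List[str]:
    """Bidirectional stepwise selection driven by coefficient p-values."""
    selected: List[str] = []
    for _ in range(max_iter):
        changed = False
        best, best_p = None, p_enter
        # Forward step, keeping at least one residual degree of freedom.
        if len(y) - len(selected) - 2 >= 1:
            for name in candidates:
                if name not in selected:
                    _, p = ols(y, [candidates[v] for v in selected + [name]])
                    if p[-1] < best_p:
                        best, best_p = name, p[-1]
        if best is not None:
            selected.append(best)
            changed = True
        # Backward step: p[0] belongs to the intercept, so offset by one.
        if selected:
            _, p = ols(y, [candidates[v] for v in selected])
            worst = max(range(len(selected)), key=lambda i: p[i + 1])
            if p[worst + 1] > p_remove:
                selected.pop(worst)
                changed = True
        if not changed:
            break
    return selected


if __name__ == "__main__":
    x1 = [1, 2, 3, 4, 5, 6, 7]
    noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, -0.05]
    y = [2 + 3 * a + e for a, e in zip(x1, noise)]
    print(stepwise(y, {"x1": x1, "x2": [5, 3, 6, 2, 7, 1, 4]}))
```

With only 7 stations, the residual degrees of freedom shrink quickly, which is why the daily models in Table 2 retain at most three predictors.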

2.5. Model Performance

The LUR models constructed in this study were evaluated using the methods recommended by the European Study of Cohorts for Air Pollution Effects (ESCAPE) (Rob et al., 2013). First, a larger adjusted R2 of the final model was taken to indicate a better model fit. Second, leave-one-out cross-validation (LOOCV) was performed, and a smaller root mean squared error (RMSE) or a larger R2 in the LOOCV test was considered to represent better prediction accuracy of the final LUR model (Wang et al., 2013).
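The two evaluation criteria can be computed as below. In this sketch, adjusted R2 uses the usual correction 1 − (1 − R2)(n − 1)/(n − p − 1) for n stations and p predictors, and LOOCV refits the model without each station in turn and predicts the omitted station. We define the LOOCV R2 as 1 − PRESS/SST; this is an assumption, since some LUR studies instead report the squared correlation between observed and cross-validated predictions. The function names and example data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """OLS coefficients [b0, b1, ...] for y ~ 1 + predictors (normal equations)."""
    X = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R2 for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def loocv(rows: Sequence[Sequence[float]], y: Sequence[float]) -> Tuple[float, float]:
    """Leave-one-out CV: refit without each station, predict it; return (RMSE, R2)."""
    preds = []
    for i in range(len(y)):
        train_rows = [r for j, r in enumerate(rows) if j != i]
        train_y = [v for j, v in enumerate(y) if j != i]
        beta = fit_ols(train_rows, train_y)
        preds.append(beta[0] + sum(b * v for b, v in zip(beta[1:], rows[i])))
    press = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return math.sqrt(press / len(y)), 1 - press / sst


if __name__ == "__main__":
    # Hypothetical data: 7 stations, 2 predictors each.
    rows = [[1, 5], [2, 3], [3, 6], [4, 2], [5, 7], [6, 1], [7, 4]]
    y = [5.2, 7.9, 11.1, 13.8, 17.1, 19.9, 23.0]
    rmse, r2 = loocv(rows, y)
    print(f"LOOCV RMSE = {rmse:.2f}, LOOCV R2 = {r2:.2f}")
```

With 7 stations, each LOOCV fit uses only 6 observations, so LOOCV R2 is typically noticeably lower than the in-sample adjusted R2, as in Table 2.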

2.6. Assessment of Individual PM2.5 Exposures Based on LUR Model Results

Based on the constructed daily LUR models, individual PM2.5 exposure levels over a specific period were estimated from the predicted concentrations at each individual's work and home addresses. Spatial and temporal variations in PM2.5 exposure were accounted for using the following process: 1) Work and home addresses were converted to World Geodetic System 1984 (WGS-84) coordinates; 2) Daily PM2.5 concentrations at the work and home addresses were predicted using the constructed daily LUR models for Jiujiang City; 3) The daily PM2.5 exposure of each individual was estimated based on the hours spent at work and at home; 4) The mean of the daily exposures across the whole period was calculated as the individual's average PM2.5 exposure level.
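Steps 3) and 4) can be sketched as a time-weighted average. The function names and the weighting scheme are assumptions for illustration: the text does not state how time spent elsewhere was treated, so here each daily exposure is weighted only by the hours at home and at work, and the period average is the plain mean of the daily values.

```python
from typing import Sequence


def daily_exposure(c_home: float, c_work: float, h_home: float, h_work: float) -> float:
    """Time-weighted daily PM2.5 exposure (μg/m3) from home and work predictions."""
    return (c_home * h_home + c_work * h_work) / (h_home + h_work)


def mean_exposure(days: Sequence[Sequence[float]]) -> float:
    """Mean of daily exposures; each day is (c_home, c_work, h_home, h_work)."""
    daily = [daily_exposure(*d) for d in days]
    return sum(daily) / len(daily)


if __name__ == "__main__":
    # Hypothetical two-day record for one volunteer.
    record = [(20.0, 30.0, 14.0, 8.0), (25.0, 25.0, 16.0, 8.0)]
    print(round(mean_exposure(record), 2))
```

For example, a day with 20 μg/m3 at home (14 h) and 30 μg/m3 at work (8 h) gives (20 × 14 + 30 × 8) / 22 ≈ 23.6 μg/m3.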

2.7. Assessment of Individual Exposures Based on Portable PM2.5 Detector Data

To assess the accuracy of LUR-based estimates of individual PM2.5 exposure, a validation test was conducted from September 1st to 14th, 2023. In total, 20 volunteers residing in the urban area of Jiujiang City were invited to participate and wore a portable PM2.5 detector around their waist for the 14-day period in order to obtain their average PM2.5 exposure levels for this period. The portable detectors (CP-15-B5, Airhug, Beijing Yishan Technology Co. Ltd., China) operate on the light-scattering principle: laser light is scattered by airborne particles, and a photoelectric converter changes the scattered optical signal into an electrical signal. An algorithm then calculates the number of particles in different size ranges and the concentration of each size fraction.
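The final step of the light-scattering principle is converting size-binned particle counts into a mass concentration. The CP-15-B5's internal algorithm is not described in the source, so the sketch below shows only a generic conversion under stated assumptions: spherical particles with each bin's representative diameter, a single assumed particle density, and summation over bins with diameters up to 2.5 μm. Commercial sensors also apply proprietary calibration factors, which are not modelled here, and the example bins are hypothetical.

```python
import math
from typing import Sequence


def mass_concentration(
    counts_per_cm3: Sequence[float],
    diameters_um: Sequence[float],
    density_g_cm3: float = 1.0,
    cutoff_um: float = 2.5,
) -> float:
    """Approximate mass concentration (μg/m3) from size-binned number counts.

    Assumes spheres of the bin's representative diameter and one particle
    density. With N in #/cm3, d in μm and rho in g/cm3 the unit factors cancel
    (1e6 cm3/m3 * 1e-12 cm3/μm3 * 1e6 μg/g = 1), so
    C = sum(N * rho * pi / 6 * d**3) over bins with d <= cutoff_um.
    """
    return sum(
        n * density_g_cm3 * math.pi / 6 * d ** 3
        for n, d in zip(counts_per_cm3, diameters_um)
        if d <= cutoff_um
    )


if __name__ == "__main__":
    # Hypothetical bins: counts (#/cm3) at representative diameters (μm).
    print(mass_concentration([150.0, 40.0, 8.0, 1.0], [0.3, 0.5, 1.0, 2.5]))
```

Because mass scales with d³, the few large particles near 2.5 μm can dominate the result, which is one reason optical sensors need careful calibration.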

2.8. Measurement Method Comparison

In this study, the PM2.5 exposure levels of all 20 volunteers from September 1st to 14th, 2023, were both estimated with the LUR models and measured with the portable PM2.5 detectors. Scatter plots were prepared to describe the consistency of the results, and the paired t-test was used to compare the data generated by the two methods. All statistical analyses were performed using R software v. 4.4.1, with the threshold for statistical significance set at 0.05.
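The paired t-test compares the per-volunteer differences between the two methods against zero, with n − 1 degrees of freedom. The sketch below is a dependency-free illustration (the study itself used R): the t statistic uses the sample standard deviation of the differences, and the two-sided p-value comes from the closed-form Student t distribution for integer degrees of freedom. The example lists are hypothetical, not the study data.

```python
import math
import statistics
from typing import Sequence, Tuple


def t_two_sided_p(t: float, df: int) -> float:
    """Two-sided p-value of Student's t for integer df (closed-form series)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 0:
        term = total = 1.0
        for j in range(1, (df - 2) // 2 + 1):
            term *= (2 * j - 1) / (2 * j) * c * c
            total += term
        a = s * total
    else:
        total = 0.0
        if df >= 3:
            term = total = c
            for j in range(1, (df - 3) // 2 + 1):
                term *= (2 * j) / (2 * j + 1) * c * c
                total += term
        a = (2 / math.pi) * (theta + s * total)
    return min(1.0, max(0.0, 1.0 - a))


def paired_t_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int, float]:
    """Paired t-test of mean(a - b) = 0; returns (t, df, two-sided p)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    # statistics.stdev is the sample SD (n - 1 denominator); it must be non-zero.
    t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))
    return t, n - 1, t_two_sided_p(t, n - 1)


if __name__ == "__main__":
    lur = [21.0, 25.5, 19.8, 30.2, 18.4]        # hypothetical LUR estimates
    detector = [20.1, 24.0, 21.3, 27.9, 19.0]   # hypothetical detector means
    print(paired_t_test(lur, detector))
```

With the study's n = 20 volunteers, df = 19 and the two-sided 5% critical value is about 2.093, so the reported t = 1.053 lies well below it.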

3. Results

3.1. Daily LUR Models

Daily LUR models were constructed for the first 14 days of September 2023. The adjusted R2 values of the 14 models ranged from 0.85 to 0.94, the LOOCV RMSE values ranged from 3.53 to 4.88 μg/m3, and the LOOCV R2 values ranged from 0.61 to 0.82 (Table 2).

Table 2. The daily LUR model of PM2.5 concentration in Jiujiang urban area from September 1 to 14, 2023 (μg/m3).

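| Date | Variable | Beta | p | Adj-R2 | LOOCV RMSE | LOOCV R2 |
|---|---|---|---|---|---|---|
| Sept. 1 | Constant | 21.84 | <0.001 | 0.90 | 3.58 | 0.75 |
| | Vege_2000 | −1.03 × 10⁻⁴ | 0.011 | | | |
| | Water_2000 | 1.27 × 10⁻⁴ | 0.012 | | | |
| Sept. 2 | Constant | 23.34 | <0.001 | 0.91 | 4.03 | 0.72 |
| | Vege_2000 | −1.11 × 10⁻⁴ | 0.007 | | | |
| | Water_2000 | 1.18 × 10⁻⁴ | 0.013 | | | |
| Sept. 3 | Constant | 19.89 | <0.001 | 0.89 | 4.11 | 0.68 |
| | Bare_5000 | −2.97 × 10⁻³ | 0.012 | | | |
| | Water_2000 | 1.47 × 10⁻⁴ | 0.005 | | | |
| Sept. 4 | Constant | 26.09 | <0.001 | 0.94 | 3.53 | 0.82 |
| | Vege_3000 | −0.54 × 10⁻⁴ | <0.001 | | | |
| | Water_2000 | 1.12 × 10⁻⁴ | <0.001 | | | |
| | Crop_3000 | −0.11 × 10⁻⁴ | 0.018 | | | |
| Sept. 5 | Constant | 27.12 | <0.001 | 0.85 | 4.88 | 0.61 |
| | Vege_3000 | −0.64 × 10⁻⁴ | 0.001 | | | |
| Sept. 6 | Constant | 21.13 | <0.001 | 0.88 | 4.15 | 0.66 |
| | Vege_2000 | −1.07 × 10⁻⁴ | 0.019 | | | |
| | Water_2000 | 1.31 × 10⁻⁴ | 0.022 | | | |
| Sept. 7 | Constant | 24.15 | <0.001 | 0.90 | 3.97 | 0.70 |
| | Vege_2000 | −1.20 × 10⁻⁴ | 0.006 | | | |
| | Water_2000 | 0.80 × 10⁻⁴ | 0.050 | | | |
| Sept. 8 | Constant | 25.81 | <0.001 | 0.89 | 4.12 | 0.67 |
| | Vege_2000 | −1.27 × 10⁻⁴ | 0.010 | | | |
| | Water_2000 | 1.12 × 10⁻⁴ | 0.033 | | | |
| Sept. 9 | Constant | 28.16 | <0.001 | 0.89 | 4.16 | 0.65 |
| | Vege_2000 | −1.43 × 10⁻⁴ | 0.008 | | | |
| | Water_2000 | 1.32 × 10⁻⁴ | 0.023 | | | |
| Sept. 10 | Constant | 24.92 | <0.001 | 0.87 | 4.49 | 0.61 |
| | Vege_2000 | −1.24 × 10⁻⁴ | 0.011 | | | |
| | Water_2000 | 1.14 × 10⁻⁴ | 0.032 | | | |
| Sept. 11 | Constant | 29.77 | <0.001 | 0.90 | 3.99 | 0.69 |
| | Vege_3000 | −0.64 × 10⁻⁴ | 0.003 | | | |
| | Water_2000 | 1.03 × 10⁻⁴ | 0.030 | | | |
| Sept. 12 | Constant | 30.06 | <0.001 | 0.87 | 4.51 | 0.66 |
| | Vege_2000 | −1.23 × 10⁻⁴ | 0.011 | | | |
| | Water_2000 | 1.09 × 10⁻⁴ | 0.035 | | | |
| Sept. 13 | Constant | 9.43 | <0.001 | 0.92 | 3.82 | 0.78 |
| | Bare_5000 | −2.22 × 10⁻³ | 0.012 | | | |
| | Water_2000 | 1.56 × 10⁻⁴ | 0.001 | | | |
| Sept. 14 | Constant | 18.24 | <0.001 | 0.89 | 4.04 | 0.68 |
| | Vege_2000 | −1.01 × 10⁻⁴ | 0.013 | | | |
| | Water_2000 | 1.17 × 10⁻⁴ | 0.018 | | | |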

Adj-R2: adjusted R2. LOOCV: leave-one-out cross-validation. RMSE: root mean square error (μg/m3). Beta: regression coefficient. Vege_2000: number of vegetation grid cells within a radius of 2000 m. Vege_3000: number of vegetation grid cells within a radius of 3000 m. Water_2000: number of waterbody grid cells within a radius of 2000 m. Bare_5000: number of bareland grid cells within a radius of 5000 m. Crop_3000: number of cropland grid cells within a radius of 3000 m.
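Each daily model in Table 2 is a linear equation in grid-cell counts, so predicting PM2.5 at an address is a single weighted sum. The sketch below encodes two of the daily models. It reads the coefficient exponents as 10⁻⁴ and 10⁻³; in the extracted Sept. 1 to 6 rows these appeared as "104" and "103", which we assume lost their minus signs, consistent with the Sept. 7 to 14 rows. The predictor values used in the example are hypothetical.

```python
from typing import Dict

# Two daily LUR models from Table 2 (exponents read as negative, see lead-in).
MODELS: Dict[str, Dict[str, float]] = {
    "Sept. 1": {"Constant": 21.84, "Vege_2000": -1.03e-4, "Water_2000": 1.27e-4},
    "Sept. 5": {"Constant": 27.12, "Vege_3000": -0.64e-4},
}


def predict(model: Dict[str, float], predictors: Dict[str, float]) -> float:
    """Predicted PM2.5 (μg/m3) = constant + sum(beta_k * count_k)."""
    return model["Constant"] + sum(
        beta * predictors[name] for name, beta in model.items() if name != "Constant"
    )


if __name__ == "__main__":
    # Hypothetical cell counts at one address.
    address = {"Vege_2000": 40000, "Water_2000": 10000, "Vege_3000": 50000}
    for day, model in MODELS.items():
        print(day, round(predict(model, address), 2))
```

For example, an address with 40,000 vegetation cells and 10,000 waterbody cells within 2000 m gives 21.84 − 4.12 + 1.27 = 18.99 μg/m3 under the Sept. 1 model. For scale, a 2000 m buffer contains roughly π × 2000² / 100 ≈ 125,664 cells of 10 m, so these counts are within the plausible range.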

3.2. Accuracy of the LUR Model for the Measurement of Long-Term PM2.5 Exposures

For the 20 volunteers, the average PM2.5 exposure estimated with the LUR models was 23.47 ± 9.00 μg/m3, and the level measured with the portable detectors was 22.64 ± 6.15 μg/m3. The paired t-test identified no statistically significant difference between the results of the two methods (t = 1.053, p = 0.306). The scatterplot in Figure 2 provides a visual comparison of the consistency of the results from the two methods.

Figure 2. Scatterplot of LUR model estimates versus portable detector measurements of individual PM2.5 exposure.

4. Discussion

As the dependent variable of an LUR model, PM2.5 monitoring data strongly influence the quality of the model predictions. Monitoring data mainly come from purpose-designed monitoring campaigns, routine monitoring and vehicle-mounted mobile monitoring. Routine monitoring data come from existing monitoring sites that are usually managed and maintained by government environmental protection departments (Ma et al., 2024). Because these data are collected continuously and at low cost, routine monitoring is the most commonly used data source in current research worldwide (Alan et al., 2024; Ebrahimi et al., 2024; De et al., 2013). The PM2.5 monitoring data used in this study were obtained from routine monitoring stations in Jiujiang City.

When constructing an LUR model for predicting PM2.5 concentrations, any factor potentially associated with PM2.5 levels is typically considered as a predictor variable. Commonly adopted predictor variables include land use, road information, population density, altitude, and the distribution of pollution emission sources. Land use serves as the core predictor variable in LUR models, although the accuracy and classification of land use data vary across studies (Ma et al., 2024). The primary factor influencing the quality of land use data is data accessibility. Road information is another critical predictor variable, primarily reflecting air pollutants emitted by transportation. The most direct indicator of traffic-related pollution is road traffic volume; however, accurate traffic volume data are often not readily available. This study therefore used total road length as a proxy for traffic pollution, which correlation analyses showed to be strongly associated with PM2.5 concentrations at the monitoring stations.

Because air pollutants disperse, several buffers are usually set for each type of predictor variable in order to identify the variable most relevant to PM2.5 concentrations. Buffer radii are usually chosen by first determining a minimum radius and then increasing it according to a fixed scheme up to a maximum radius. For example, in this study, buffers with radii of 100 m, 250 m, 500 m, 1000 m, 2000 m, 3000 m and 5000 m were set for the vegetation variables, and the optimal buffer size was then selected from among them. The minimum radius is usually related to the accuracy of the data, while the maximum radius is related to the dispersion pattern of the pollutants (Azmi et al., 2023).

To date, no uniform standard has been established for the number of monitoring sites required for accurate LUR model performance. The ESCAPE study, one of the largest LUR studies performed worldwide, involved multiple research institutions and 20 to 80 monitoring stations in each city, and its methods are used as a reference standard (Wang et al., 2013). However, it remains unclear whether LUR models built for small and medium-sized urban areas with few monitoring stations can provide accurate predictions. It has been reported that the accuracy of an LUR model depends not only on the number of monitoring sites but also on the quality of the predictor variable data and the land use classification (Ma et al., 2024). An LUR study conducted in São Paulo, Brazil, used data from only 9 air quality monitoring sites, yet the R2 of its LUR model for PM10 concentrations reached 64% (Habermann & Gouveia, 2013). These results suggest that when the predictor variable data are of high quality and the variables are classified logically, LUR models may still achieve high prediction accuracy with a small number of monitoring sites.

It is generally accepted that larger adjusted R2 values in the final model and larger R2 values in LOOCV tests indicate more accurate LUR model predictions. Ma et al. (2024) reviewed 155 LUR studies conducted between 2011 and 2023 and found that the R2 values of LUR models worldwide generally ranged between 50% and 90%. In the present study, the adjusted R2 values of the daily LUR models ranged from 85% to 94%, indicating high overall prediction accuracy. In addition, most LUR models established to date have LOOCV R2 values in the range of 40% to 80%; the LOOCV R2 values of the models constructed in this study (61% to 82%) fall largely within this range, toward its upper end.

In this study, the long-term PM2.5 exposure levels of individuals were assessed from the constructed daily LUR models according to their work and home addresses. Compared with portable detector measurements, the LUR method established here showed good accuracy. Because the LUR method does not require study subjects to wear monitoring devices, it is more convenient to apply than the portable detector method and is well suited to large-sample epidemiological studies assessing individual long-term air pollution exposure, especially in populations with fixed working and living environments.

5. Conclusion

This study established that daily LUR models can be constructed effectively for the assessment of individual long-term PM2.5 exposure levels based on work and home addresses. These findings suggest that this is an effective method for the estimation of long-term PM2.5 exposure levels for epidemiological studies in urban regions such as Jiujiang City, providing a technical reference for the accurate assessment of long-term air pollution exposure levels in populations of regions with few monitoring stations.

Acknowledgements

We acknowledge all volunteers who participated in the present study. This work was supported by Jiangxi Provincial Natural Science Foundation (grant number 20232BAB206143), Guiding Science and Technology Projects of Ji’an city (grant number 20233-023456), and Social Science Planning Program of Ji’an City (grant number 23GHA665).

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Alan, D., Payam, D., Marta, C., Gustavo, A., Lluis, B., Maria, F. et al. (2024). Development of Land Use Regression, Dispersion, and Hybrid Models for Prediction of Outdoor Air Pollution Exposure in Barcelona. Science of the Total Environment, 954, Article 176632.
https://doi.org/10.1016/j.scitotenv.2024.176632
[2] Azmi, W. N. F. W., Pillai, T. R., Latif, M. T., Koshy, S., & Shaharudin, R. (2023). Application of Land Use Regression Model to Assess Outdoor Air Pollution Exposure: A Review. Environmental Advances, 11, Article 100353.
https://doi.org/10.1016/j.envadv.2023.100353
[3] Bhavsar, N. A., Jowers, K., Yang, L. Z., Guha, S., Lin, X., Peskoe, S. et al. (2024). The Association between Long-Term PM2.5 Exposure and Risk for Pancreatic Cancer: An Application of Social Informatics. American Journal of Epidemiology, 2024, kwae271.
https://doi.org/10.1093/aje/kwae271
[4] Briggs, D. J., Collins, S., Elliott, P., Fischer, P., Kingham, S., Lebret, E. et al. (1997). Mapping Urban Air Pollution Using GIS: A Regression-Based Approach. International Journal of Geographical Information Science, 11, 699-718.
https://doi.org/10.1080/136588197242158
[5] Chen, C., Huang, K., Chen, C., Chang, Y., Li, H., Wang, T. et al. (2024). The Role of PM2.5 Exposure in Lung Cancer: Mechanisms, Genetic Factors, and Clinical Implications. EMBO Molecular Medicine, 17, 31-40.
https://doi.org/10.1038/s44321-024-00175-2
[6] De, H. K., Wang, M., Adam, M., Badaloni, C., Beelen, R., Birk, M. et al. (2013). Development of Land Use Regression Models for Particle Composition in Twenty Study Areas in Europe. Environmental Science & Technology, 47, 5778-5786.
https://doi.org/10.1021/es400156t
[7] Ebrahimi, A. A., Baziar, M., & Zakeri, H. R. (2024). Investigating the Impact of Urban-Environmental Factors on Air Pollutants: A Land Use Regression Model Approach and Health Risk Assessment. Environmental Geochemistry and Health, 46, Article 313.
https://doi.org/10.1007/s10653-024-02103-2
[8] Gayraud, L., Mortamais, M., Schweitzer, C., de Hoogh, K., Cougnard-Grégoire, A., Korobelnik, J. et al. (2024). Ambient Air Pollution Exposure and Incidence of Cataract Surgery: The Prospective 3 City-Alienor Study. Acta Ophthalmologica, 2024, 1-8.
https://doi.org/10.1111/aos.16790
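[9] Habermann, M., & Gouveia, N. (2013). Aplicação de regressão baseada no uso do solo para predizer a concentração de material particulado inalável no município de São Paulo, Brasil [Application of Land Use Regression to Predict the Concentration of Inhalable Particulate Matter in the Municipality of São Paulo, Brazil]. Engenharia Sanitaria e Ambiental, 17, 155-162.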
https://doi.org/10.1590/s1413-41522012000200004
[10] Ma, X., Zou, B., Deng, J., Gao, J., Longley, I., Xiao, S. et al. (2024). A Comprehensive Review of the Development of Land Use Regression Approaches for Modeling Spatiotemporal Variations of Ambient Air Pollution: A Perspective from 2011 to 2023. Environment International, 183, Article 108430.
https://doi.org/10.1016/j.envint.2024.108430
[11] Moon, J., Kim, E., Jang, H., Song, I., Kwon, D., Kang, C. et al. (2024). Long-Term Exposure to PM2.5 and Mortality: A National Health Insurance Cohort Study. International Journal of Epidemiology, 53, dyae140.
https://doi.org/10.1093/ije/dyae140
[12] Nguyen Thi Khanh, H., Rigau-Sabadell, M., Khomenko, S., Pereira Barboza, E., Cirach, M., Duarte-Salles, T. et al. (2025). Ambient Air Pollution, Urban Green Space and Childhood Overweight and Obesity: A Health Impact Assessment for Barcelona, Spain. Environmental Research, 264, Article 120306.
https://doi.org/10.1016/j.envres.2024.120306
[13] Rhee, T., Ji, Y., Yang, S., Lee, H., Park, J., Kim, H. et al. (2024). Combined Effect of Air Pollution and Genetic Risk on Incident Cardiovascular Diseases. Journal of the American Heart Association, 13, e033497.
https://doi.org/10.1161/jaha.123.033497
[14] Rob, B., Gerard, H., Danielle, V., Marloes, E., Konstantina, D., Xanthi, P. et al. (2013). Development of NO2 and NOx Land Use Regression Models for Estimating Air Pollution Exposure in 36 Study Areas in Europe – The ESCAPE Project. Atmospheric Environment, 72, 10-23.
https://doi.org/10.1016/j.atmosenv.2013.02.037
[15] Wang, M., Beelen, R., Basagana, X., Becker, T., Cesaroni, G., de Hoogh, K. et al. (2013). Evaluation of Land Use Regression Models for NO2 and Particulate Matter in 20 European Study Areas: The ESCAPE Project. Environmental Science & Technology, 47, 4357-4364.
https://doi.org/10.1021/es305129t

Copyright © 2025 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.