Spatiotemporal Analysis of Groundwater Recharge under Urbanization and Climate Variability in Milan Province, Italy ()
1. Introduction
Groundwater recharge (GR) is the water that infiltrates through the unsaturated zone to the water table and thus increases the volume of water in the saturated zone, playing a pivotal role in ensuring the sustainability of groundwater resources [1] [2]. The factors influencing GR are numerous and interconnected, including land usage, geological conditions, soil composition, and climatic conditions [3]. Among these, anthropogenic activities of land use and land cover change are the most dynamic and influential factors [4]-[7].
Land use is a primary anthropogenic transformation that has altered the Earth’s surface, thereby impacting all of the planet’s ecological functions [8]. Land use change stands as a pivotal factor propelling shifts in both human activity and the natural environment, and must therefore be accurately quantified to understand the impacts of such changes [9]. Land use and land cover changes have a significant influence on global and regional climate and water balance [10]. It disrupts the processes of interception, percolation, evapotranspiration, rainfall, infiltration, and surface runoff generation by directly altering the physiological and structural characteristics of the land environment. Additionally, it indirectly influences the atmospheric and soil boundary layers [11] [12].
Land usage is changing rapidly in many emerging nations across the world, primarily due to population growth and artificial surface development [13]. The rapid expansion of artificial surfaces poses a significant challenge to water resources [14], as it increases surface runoff over impermeable areas and accelerates flow volume toward receiving water bodies via urban drainage networks [15] [16]. Furthermore, considering more flow in artificial surfaces not only results in a reduction in groundwater recharge but also contributes to an increase in flood risk due to additional surface runoff [17] [18]. Urban infrastructure also introduces microclimatic variations that influence evapotranspiration and soil moisture retention [19].
In addition to land use, climate change plays a critical role in GR dynamics. Studies by Guo et al. [20], and Fu et al. [21] illustrate that climatic factors such as temperature rise and altered precipitation patterns can significantly impact recharge processes. The combined effects of urbanization and climate change are complex, requiring integrated analysis to understand and manage GR effectively.
In the last 20 years, climate and land use change have been causing noticeable shifts in hydrological and meteorological patterns, with increasing or decreasing trends. To identify these trends, researchers often use methods like the Mann-Kendall (MK) test [22] [23], Sen’s slope [24], and the linear regression trend test to find valuable trends and forecast the situation for the future [25]. Linear Regression (LR) is a fundamental statistical method used to quantify the relationship between one or more independent variables and a dependent variable through a linear equation. To better understand and forecast recharge dynamics under various environmental situations, LR has been used more and more in GR research in recent years. For example, Mogaji et al. [26] used LR to forecast and estimate GR rates in southern Perak, Malaysia, by examining the correlation between rainfall and geophysical factors. The relationship between groundwater and regional air temperatures was also utilized by Figura et al. [27] to forecast groundwater temperatures in several Swiss aquifers using LR. Fu et al. [21] identified which factors should be included when investigating the impact of climate change on groundwater recharge.
This study examines the long-term dynamics of GR in Milan Province, Italy, from 1951 to 2023, considering climate, hydrological, and land use variability. Temporal and spatial trends were detected using the Mann-Kendall test and Sen’s slope, while Spearman correlation identified the relationships among all variables. Stepwise linear regression highlighted the most influential factors, offering insight into how urbanization and climate change have shaped GR patterns over time.
2. Materials
2.1. Study Area
The study area is the Milan Province, located in the Lombardy region, in northern Italy. The area falls within the Po River Plain, and it encompasses 189 municipalities. The area is still dominated by agricultural land use, but with city expansion and urbanization, it is decreasing over time. The rapid urban expansion and land use changes in Milan Province pose, in fact, challenges to GR [28]. Figure 1 shows the location of the Milan province in Italy.
Figure 1. Milan province, Italy.
2.2. Data Collection
2.2.1. Hydrological Data
The BIGBANG model, developed by the Italian Institute for Environmental Protection and Research (ISPRA), offers a comprehensive dataset of key hydrological variables across Italy from 1951 to 2023. The data is organized on a 1 km2 grid aligned with the European Environment Agency’s Reference Grid, using the ETRS89 datum and the Lambert Azimuthal Equal Area (LAEA) projection to ensure spatial consistency [29]. In this study, the analysis is based on key hydroclimatic indicators, including total precipitation (TP), actual evapotranspiration (AET), groundwater recharge (GR), surface runoff (R), soil water content at a depth of 1 meter (SW), and mean temperature (T). For compatibility with the study region, the data set was geographically aggregated to the administrative boundaries of the Milan province. To ensure consistency and avoid boundary-related distortions, only those grid cells that fall entirely within the administrative boundaries of Milan Province were included in the analysis. This decision was made to maintain equality in the spatial representation of each cell and to eliminate the potential bias that may arise from partially overlapping cells at the province edges. The final research area, as shown in Figure 2, is made up of 1787 grid cells totaling 1787 km2.
![]()
Figure 2. Gridded map of Milan province into cells with 1 km2 of area.
2.2.2. Land Use Data
Land use information was extracted from the Regione Lombardia geoportal based on the CORINE Land Cover (CLC) terminology [30] for 1954, 1980, 2000, 2007, 2012, 2015, 2018, and 2023. To ensure spatial coherence with the hydroclimatic variables provided by the ISPRA dataset, land use maps were reprojected to the same Coordinate Reference System. In this study, Urbanization was quantified using the Percentage of Artificial Surfaces (hereinafter PAS) as a proxy indicator, representing the proportion of land occupied by Artificial structures within each grid cell. Figure 3 illustrates the distribution of PAS across the study area, which was also divided into four principal artificial land cover subcategories: 1) urban fabric, 2) industrial, commercial, and transport units, 3) mine, dump, and construction sites, and 4) artificial, non-agricultural vegetated surfaces.
The study area has shown an increasing trend in PAS with respect to time over the years considered. It highlights not only the overall rise in PAS but also the shifting composition of its subcategories. While industrial, commercial, and transport units showed a steep increase between 1954 and 2012, urban fabric became the dominant category from 2015 onwards, reflecting a transition toward more densely built residential and mixed-use areas. The relative contribution of mine, dump, and construction sites, as well as artificial vegetated areas, remained relatively low but stable.
Figure 3. Percent of land uses contributing to the overall PAS across the time series.
3. Methodology
3.1. Estimation of Missing PAS Values Using Linear Interpolation
To ensure a continuous dataset for the PAS, for each grid cell, values for missing years were estimated based on the assumption of a linear rate of change between two known data points. It was chosen due to its simplicity, efficiency, and suitability for modeling gradual urban expansion trends. This method may not accurately reflect sudden urban expansion or redevelopment events, potentially introducing uncertainty into trend and regression analyses in areas with rapidly or non-linearly changing land cover. Despite this constraint, it was considered a viable method in the absence of annual PAS data [31]. The interpolation followed the standard linear equation:
(1)
where PAS(t) is the interpolated PAS value for year t, PAS(t1) and PAS(t2) are the known PAS values in the closest preceding and succeeding years, respectively, t1 and t2 represent the corresponding years with available PAS data. To ensure consistency with real-world conditions, since PAS cannot be negative, any extrapolated negative values were set to 0. Similarly, for grid cells where PAS values exceeded 100%, the value was fixed at 1.
3.2. Mann-Kendall
This non-parametric test, which considers outliers and accepts independent data, detects whether a variable follows a consistent increasing or decreasing pattern over time. Unlike linear regression, this method does not assume normality, making it particularly suitable for environmental datasets, where fluctuations and non-linearity are common [32] [33]. The test statistic S is computed as:
(2)
where:
(3)
where Xi and Xj are observations at time steps i and j, respectively, and n is the number of years in the time series. Since in this study n is more than 10, the test statistic S is approximately normally distributed with:
(4)
where tp refers to the number of ties in each tied group. If the S value is positive, it suggests an upward trend, meaning the observations are generally increasing over time. If it’s negative, it indicates a downward trend, showing that the observations are decreasing over time [34].
The p-value is calculated using the standard normal distribution. The formula for the p-value is:
(5)
where Φ is the cumulative distribution function of the standard normal distribution. The factor of 2 accounts for the two-tailed nature of the test, as the trend could be either increasing or decreasing. If Z is negative, the same formula is used because the test is symmetrical. The p-value is then compared with a selected significance level of 0.05 to determine whether the trend experienced at 95% confidence is significant or not. If the p-value is less than the selected significance level, the null hypothesis is rejected and indicates that the data has a significant trend.
3.3. Sen’s Slope
To investigate temporal trends in the data, Sen’s slope method was applied for all cells to estimate the magnitude and direction of the trend of each variable during the study period. Sen’s slope is a robust non-parametric estimator of a rate of change that is not subject to normality assumptions and can detect monotonic trends, and is therefore ideally designed for the analysis of environmental and climate data [34]. For each pair of data points (Xi, Yi)and (Xj, Yj), where i<j, the slope Sij is computed using the formula:
(6)
where Xi and Xj are the years of the two observations, and Yi and Yj are the corresponding values. This slope quantifies the rate of change between each pair of observations. A positive Sen’s slope indicates an increasing trend, while a negative slope suggests a decreasing trend. A slope near zero implies no significant trend, indicating stability over the study period.
The calculation of Mann-Kendall and Sen’s slope was performed using SPSS software version 29.0.1 [35], for each cell separately for the study period, and then reported as a map by using GIS software.
3.4. Spearman Correlation
Spearman’s correlation is a non-parametric method that assesses the monotonic relationship between two variables, meaning it captures trends even if they are non-linear. Instead of using raw values, Spearman ranks the data before applying Pearson’s formula [36]:
(7)
where di is the difference between the ranks of corresponding values in two datasets, and n is the number of observations. Like Pearson, Spearman’s correlation also ranges from −1 to +1, but it is more robust to non-linear relationships and outliers because it ranks the data before calculating the correlation [37].
The analysis was conducted using SPSS software version 29.0.1 [35], with a two-tailed test to detect both positive and negative correlations of variables, considering all the cells in the study area.
3.5. Stepwise Linear Regression (SLR)
SLR is a statistical method that helps identify the most important predictors in a linear relationship. It starts with the most significant variable and gradually adds or removes others based on their impact. This approach is useful because it simplifies the model while keeping only the most relevant factors.
A general multiple linear regression equation is:
(8)
where A0 is the intercept, A1, A2, … are standardized regression coefficients, and ϵ is the error term. Through a forward stepwise selection, all the available explanatory variables were tested for regression by keeping only the most significant predictors. In each step, a variable was considered for addition from the set of explanatory variables based on F significance with criterion of p-value < 0.05.
Unstandardized coefficient (B) is the actual rate of change in GR for a unit rise in each predictor, and standardized coefficient (Beta) allows comparison of relative predictors’ importance. The R2 score was calculated to measure model performance for each year.
(9)
where
and
are the ith observed and predicted values, n is the total number of observations,
and
are the ith mean observed and predicted values.
SLR was calculated using SPSS software version 29.0.1 [35].
4. Results and Discussion
4.1. Spatiotemporal Analysis
The Mann-Kendall and Sen’s slope tests were applied to each cell over the study period. Figure 4 presents the range of Sen’s slope values, incorporating the Mann-Kendall p-value for each cell. Cells with p-values greater than 0.05, indicating a lack of statistically significant monotonic trend at the 95% confidence level, are displayed in gray.
PAS showed a slightly decreasing trend in the central urban area of Milan (Sen’s slope −0.08 to 0 PAS/year), possibly due to urban redevelopment, green infrastructure implementation, or conversion of built-up areas into more permeable or natural spaces, and in parts of the western region that are mainly forested. While Areas surrounding Milan city exhibited an increasing trend (Sen’s slope 0.57 to 1.3 PAS/year). The analysis of TP showed that, although there was a slight tendency for a decrease in precipitation over time, in most of the cells the overall trend was not statistically significant, due to high fluctuations of TP. For other cells, Sen’s slope (−4.27 to −1.74 mm/year) outlined a very gradual decrease in precipitation without significant change compared to the high value range of precipitation. GR (Sen’s slope −3.17 to −1.86 mm/year) showed a significant decline over time. This decrease was statistically significant in most of the cells, indicating that GR is diminishing in certain regions, especially in urbanized areas. Urbanized areas with high PAS, particularly in the central and northern regions, show significant decreases in AET (Sen’s slope −5.5 to −2.5 mm/year) and SW (Sen’s slope −0.7 to −0.25 mm/year). R exhibits mixed trends, with increases (Sen’s slope 3 to 6.9 mm/year) more common in urban and suburban zones. T showed a clear and consistent rise (Sen’s slope 0.027 to 0.035 C/year) across the entire study area.
4.2. Correlation Analysis
Spearman’s correlation coefficient was used to give a fuller picture of how variables are related, considering all cells in the study period. Figure 5 shows the heatmap of Spearman correlation coefficients for each set of variables with p-values less than 0.05. Warmer colors indicate stronger positive correlations, while cooler tones reflect stronger negative associations.
TP remains the dominant driver of GR, as indicated by its strong positive correlation with GR. This aligns with previous studies demonstrating that precipitation serves as the primary recharge source in unconfined aquifers [38].
PAS has an inverse impact on GR due to the negative correlation. This suggests that urban expansion reduces natural infiltration capacity, likely due to increased impervious surface coverage, such as roads and buildings, which limit GR potential [39]. The strong positive correlation between PAS and R indicates that increasing urbanization leads to higher surface runoff volumes. Fletcher et al. [40] have shown that urbanization alters watershed-scale hydrological processes by shifting infiltration-dominated recharge towards a more runoff-dominated regime, increasing the risk of urban flooding, and altering the natural timing and
![]()
Figure 4. Mann Kendall p-values and Sen’s slope for each cell during the study period for (a) Total precipitation (TP (mm/year)), (b) Groundwater recharge (GR (mm/year)), (c) Actual evapotranspiration (AET (mm/year)), (d) Soil water content at a depth of 1 meter (SW (mm/year)), (e) Surface runoff (R (mm/year)), (f) Percentage of Artificial Surface (PAS/year), and (g) Mean temperature (T(C/year)).
Figure 5. Spearman’s correlation matrix among the studied variables. See Figure 4 for variable descriptions and units.
magnitude of runoff generation. The strong negative correlation between PAS and AET indicates that urbanization substantially reduces evapotranspiration. Mazrooei et al. [41] have demonstrated that urban development results in lower watershed evapotranspiration rates, leading to increased surface runoff and reduced water retention within the system.
4.3. Stepwise Linear Regression
The stepwise regression method was applied to determine the most significant predictors of GR. The process involved an iterative procedure that began with the most significant predictor and progressively added variables based on their statistical significance. Table 1 shows the model summary for stepwise linear correlation considering GR as the dependent.
In the first step, the model with only TP explained 54.9% of the variation in GR, highlighting that precipitation plays a key role in groundwater recharge. When PAS was added in Model 2, R2 became 70.5%, showing that PAS significantly improved the model’s ability to explain GR. However, after that, the model’s R2 didn’t increase much, suggesting that key contributors to the model were TP and PAS, with the other variables (T, SW, and AET) having a much smaller impact or being compensated. The ANOVA confirms the statistical significance of all the models, with p-values less than 0.05 for each, suggesting that the relationships between the predictors and GR are meaningful. Table 2 shows the coefficients for the stepwise linear regression considering GR as the dependent variable.
Table 1. Model summary for the stepwise linear regression considering Groundwater recharge (GR) as dependent.
Model |
Correlation Coefficient (R) |
Coefficient of Determination (R2) |
Adjusted R2 |
Standard Error of the Estimate |
1 |
0.741 |
0.549 |
0.549 |
90.36 |
2 |
0.840 |
0.705 |
0.705 |
73.07 |
3 |
0.841 |
0.708 |
0.708 |
72.69 |
4 |
0.842 |
0.709 |
0.709 |
72.56 |
5 |
0.842 |
0.709 |
0.709 |
72.51 |
1. Predictors: (Constant), TP (mm/year) 2. Predictors: (Constant), TP (mm/year), PAS (PAS/year) 3. Predictors: (Constant), TP (mm/year), PAS (PAS/year), T (C/year) 4. Predictors: (Constant), TP (mm/year), PAS (PAS/year), T (C/year), SW (mm/year) 5. Predictors: (Constant), TP (mm/year), PAS (PAS/year), T (C/year), SW (mm/year), AET (mm/year) |
Table 2. Unstandardized and standardized coefficients for the stepwise linear regression considering Groundwater recharge (GR) as dependent.
Model |
Unstandardized Coefficients |
Standardized Coefficients Beta |
t-statistics |
p-value |
B |
Standard Error |
1 |
(Constant) |
−162.364 |
0.989 |
|
−164.222 |
<0.05 |
TP (mm/year) |
0.377 |
0.001 |
0.741 |
396.876 |
<0.05 |
2 |
(Constant) |
−125.204 |
0.812 |
|
−154.182 |
<0.05 |
TP (mm/year) |
0.388 |
0.001 |
0.763 |
504.723 |
<0.05 |
PAS (PAS/year) |
−173.051 |
0.661 |
−0.396 |
−261.754 |
<0.05 |
3 |
(Constant) |
7.074 |
3.665 |
|
1.930 |
<0.05 |
TP (mm/year) |
0.380 |
0.001 |
0.747 |
477.944 |
<0.05 |
PAS (PAS/year) |
−166.792 |
0.679 |
−0.381 |
−245.623 |
<0.05 |
T (C/year) |
−9.500 |
0.257 |
−0.059 |
−37.006 |
<0.05 |
4 |
(Constant) |
47.713 |
4.120 |
|
11.580 |
<0.05 |
TP (mm/year) |
0.388 |
0.001 |
0.762 |
448.516 |
<0.05 |
PAS (PAS/year) |
−169.882 |
0.693 |
−0.389 |
−245.132 |
<0.05 |
T (C/year) |
−11.636 |
0.275 |
−0.073 |
−42.319 |
<0.05 |
SW (mm/year) |
−0.377 |
0.018 |
−0.040 |
−21.432 |
<0.05 |
5 |
(Constant) |
53.732 |
4.140 |
|
12.977 |
<0.05 |
TP (mm/year) |
0.393 |
0.001 |
0.772 |
414.039 |
<0.05 |
|
PAS (PAS/year) |
−181.740 |
1.105 |
−0.416 |
−164.479 |
<0.05 |
T (C/year) |
−11.047 |
0.278 |
−0.069 |
−39.724 |
<0.05 |
SW (mm/year) |
−0.339 |
0.018 |
−0.036 |
−19.067 |
<0.05 |
AET (mm/year) |
−0.037 |
0.003 |
−0.036 |
−13.772 |
<0.05 |
Considering the second model as the preferred one, the coefficient of TP was 0.388, meaning that for a one millimeter per year increase in TP, the GR increased 0.388 millimeters per year holding PAS constant. For PAS coefficients showed −173.051, meaning that for one percent per year increase in PAS, the GR is expected to decrease by −173.051 millimeters per year, holding TP constant. Moreover, the importance of TP by having a Standardized beta value of 0.763 demonstrated a positive high effect, and PAS −0.396 indicated a moderate negative effect on GR.
In both cases, the t-statistics are notably high, indicating that the coefficients are highly statistically significant and unlikely to have occurred by chance. A p-value less than 0.05 confirms significance at the 95% level.
4.4. Application of Linear Regression to Evaluate GR
Since TP and PAS had the highest effect on GR according to stepwise regression, the linear regression was used for each year, considering only those 2 variables. Figure 6 shows the standardized beta coefficient for each year for TP on the blue line and PAS in the orange line to model GR over the study period.
Figure 6. Standardized beta coefficient for TP and PAS to model GR over the study period.
From the graph, TP, despite fluctuations, generally remained positive, indicating that rainfall continued to play an important role in positively affecting GR. PAS had a negative effect on GR, and its impact increased over time, reaching more negative values, particularly in recent years. These two trends showed how challenging it is to balance urban growth with the need to take care of water resources. Figure 7 shows the coefficient of determination (R2) for linear regression over the study period, indicating the explanatory power of TP and PAS.
Figure 7. The coefficient of determination for linear regression over the study period.
The findings are consistent with a large body of research that has examined how urbanization affects hydrological systems and how it affects the significance of features in predictive models. Several studies have looked at how urbanization affects GR processes, hydrological cycles, and water balance. Wang et al. [42] discovered that the increase in impervious surfaces brought about by urbanization causes a considerable increase in surface runoff, which in turn reduces groundwater recharge. The water cycle is altered when the number of impermeable surfaces increases, since it lowers the soil’s ability for infiltration. This is in line with our findings, which show that the PAS grows more significant as cities grow, indicating that urban land use has a greater impact on the model’s capacity to forecast groundwater dynamics.
McGrane et al. [19] more clearly demonstrated how land use, urbanization, and climate interact, pointing out that microclimates and land cover variations in metropolitan areas impact water cycling. Their outcome is supported by the influence of Artificial Surface on our model, showing how urbanization adds complexity that changes water balance and impacts hydrologic processes.
5. Conclusions
This study clarifies how hydroclimatic and land use variables have changed over the past seven decades, and how they are correlated in Milan Province. Moreover, it demonstrated how GR has changed due to both land use and climate change by using stepwise linear regression and finding the most effective variables. Urbanization, considered as PAS, has had a significant influence on GR, even though precipitation remains the primary driver. As artificial surfaces expand, these impervious surfaces prevent water infiltration, reducing GR and increasing surface runoff, especially in the metropolitan city of Milan.
The relationship between urbanization and GR is not the same everywhere, and it changes over time. However, the overall trend shows that as urban areas expand, GR becomes more challenging, which can create long-term difficulties for water resources. While this study did not focus on policy implications, it highlights the importance of limiting soil sealing, enhancing water-saving measures, and addressing water stress related to climate change.
Our findings emphasize the need for integrated urban planning and water resource strategies that consider both land use change and climate change. As cities further develop, the incorporation of green infrastructure, sustainable drainage, and policies safeguarding permeable surfaces will be key to long-term hydrological balance and water security.
It should also be noted that this study did not account for possible confounding factors, such as changes in irrigation practices, groundwater abstraction rates, or urban water management infrastructure (e.g., SUDS or green roofs). These factors could significantly affect GR patterns and should be integrated into future models as data becomes available. Future research could dive deeper into how factors like population and building density influence GR, since here the Area of artificial surface as a percentage was considered only. Furthermore, each grid cell was analyzed separately, and the effect on surrounding grid cells was not included. By understanding these factors more clearly, we can better predict and manage water resources, ensuring that cities can thrive sustainably in the years to come.