Evaluation and Prediction of Groundwater Quality in the Municipality of Za-Kpota (South Benin) Using Machine Learning and Remote Sensing
1. Introduction
Groundwater is generally considered the most potable water resource. However, population pressure, the increased use of agricultural fertilizers, rapid urbanization and industrialization have degraded its quality [1]. Za-Kpota is one of the municipalities of Benin where the predominantly rural population draws groundwater from wells and boreholes for food and other purposes. Activities such as agriculture, fishing and livestock breeding could have a considerable impact on water quality because the activities carried out by urban and rural populations do not comply with the required environmental standards [2]. Given these activities, the groundwater of the municipality is not protected from chemical contamination of the water tables. Systematic prevention is far preferable to treating water for drinking water supply once its quality has deteriorated. However, large gaps in monitoring data limit our knowledge of the physico-chemical quality of Za-Kpota waters. Remote sensing and artificial intelligence are two essential tools for better understanding the impact of human activities on the physico-chemical quality of groundwater resources. For example, Mahadi et al. used remote sensing and a GIS approach to identify land use and land cover (LULC) changes at Bhaluka in Mymensingh, Bangladesh [3]. Artificial intelligence algorithms have also been found to increase predictive performance for a wide range of environmental processes, and various studies have predicted nitrate distribution patterns in groundwater using the Random Forest (RF) algorithm [4]. Our study therefore aims to assess and predict the physico-chemical quality of groundwater in the Za-Kpota municipality using a combination of remote sensing and machine learning.
2. Material and Methods
2.1. Study Area
Za-Kpota is bordered to the north by the municipality of Djidja, to the west by the municipality of Bohicon, to the east by the municipality of Covè and to the south by the municipality of Zogbodomey. It is one of the nine (9) municipalities of the Zou department and is located between latitudes 7˚08' and 7˚20' north and longitudes 2˚05' and 2˚20' east. With a population of 132,818, it covers an area of 600 km² divided into eight (08) arrondissements [5]: Allahe, Assalin, Houngomey, Kpakpame, Kpozoun, Za-Kpota, Za-Tanta and Zeko. Its sub-equatorial climate is characterized by two rainy seasons, a long one from mid-March to mid-July and a short one from September to November, and two dry seasons, a long one from December to March and a short one covering the second half of July and the month of August. The geology is mainly characterized by formations from the Turonian-Coniacian boundary (91 - 89 Ma) [6], which consist of quartz sands with subordinate kaolin gravels and clays and/or ferruginous sandstones. Maastrichtian formations (around 70.6 to 65.5 Ma) cover a third of the municipality and are made up of sand, clay, marl and limestone with carbonaceous levels at the base. In addition, around 18% of the municipality is covered by gneiss. Finally, recent alluvial deposits consisting of sandy clay with subordinate gravel and carbonaceous levels are found along the Zou river. The municipality of Za-Kpota thus extends over four different geological regions. Figure 1 shows the administrative map of the municipality of Za-Kpota.
Figure 1. Municipality of Za-Kpota.
2.2. Methods
2.2.1. Satellite Image Processing
Thematic processing produces new information from the downloaded images. The software used for this processing is the free and open-source software QGIS 3.16, and more precisely its Semi-Automatic Classification Plugin (SCP). The processing was carried out as follows:
The first step consisted of importing the images, configuring the software and building the color composite of the bands; this step is essential for the rest of the processing. A color composite is a combination of spectral bands in which image bands are assigned to the three display planes corresponding to the three primary colors: red, green and blue. Of the seven bands in our Landsat images, only bands 4, 3 and 2 were assigned to the red, green and blue channels respectively, giving a standard false-color composite. This combination was chosen because it offers the best discrimination of land use types.
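The band-to-channel assignment described above can be sketched in a few lines of Python. This is only an illustration, not the SCP implementation: the function names, the tiny 2 × 2 bands and the linear min-max display stretch are all assumptions made for the example.

```python
def stretch_to_byte(band):
    """Linearly rescale a 2-D band (list of rows) to the 0-255 display range.

    The linear min-max stretch is an assumption of this sketch; the paper
    does not state which contrast stretch was applied.
    """
    values = [v for row in band for v in row]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero on a constant band
    return [[round(255 * (v - lo) / span) for v in row] for row in band]


def colour_composite(band4, band3, band2):
    """Assign bands 4, 3 and 2 to the red, green and blue display channels."""
    red, green, blue = (stretch_to_byte(b) for b in (band4, band3, band2))
    return [
        list(zip(r_row, g_row, b_row))
        for r_row, g_row, b_row in zip(red, green, blue)
    ]


# Hypothetical 2 x 2 digital numbers, for illustration only.
band4 = [[10, 20], [40, 50]]
band3 = [[5, 5], [5, 5]]
band2 = [[0, 100], [40, 25]]

composite = colour_composite(band4, band3, band2)
print(composite[0][1])  # (R, G, B) triple of the top-right pixel
```

Each pixel of the result is an (R, G, B) triple, so a pixel that is bright in band 4 appears red in the composite.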
The processing step itself consists of identifying the different land use units and classifying the image. Supervised classification was chosen for our study. In this type of classification, the analyst identifies fairly homogeneous samples of the image that are representative of the different surface types (information classes). These samples form the training data. Their selection is based on the analyst's knowledge of the geographic region and of the surface types present in the image. Supervised classification therefore begins with the identification of the information classes, which are then used to define the spectral classes that represent them.
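To make the link between information classes and spectral classes concrete, the sketch below uses a minimum-distance-to-mean classifier. This technique is swapped in for illustration only: the text does not state which classification algorithm was run in SCP, and the class names and reflectance values are hypothetical.

```python
from math import dist
from statistics import fmean


def class_signatures(training_samples):
    """Mean spectral signature (one value per band) of each information class."""
    return {
        label: tuple(fmean(band_values) for band_values in zip(*pixels))
        for label, pixels in training_samples.items()
    }


def classify_pixel(pixel, signatures):
    """Assign the pixel to the class whose mean signature is closest (Euclidean)."""
    return min(signatures, key=lambda label: dist(pixel, signatures[label]))


# Hypothetical training pixels (band 2, band 3, band 4 reflectances).
training_samples = {
    "vegetation": [(0.05, 0.04, 0.45), (0.06, 0.05, 0.50)],
    "built-up": [(0.20, 0.22, 0.25), (0.22, 0.24, 0.27)],
}

signatures = class_signatures(training_samples)
print(classify_pixel((0.05, 0.05, 0.48), signatures))
```

The analyst's samples define the signatures (spectral classes), and every other pixel is then labelled with the nearest one.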
The final step consists of evaluating and validating the classification and laying out the resulting map. To evaluate the reliability of our classification, we chose a practical method, namely field validation. During our field investigations, we recorded the coordinates of a number of land use units and projected them onto the classified images to check whether the images reflected the reality on the ground. Once this phase was completed successfully, we converted the classified images from raster to vector format. Cartographic production consisted of preparing the land use maps by adding new layers of information, the geographic north, the scale and the legend. Figure 2 shows the satellite image processing stages.
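Field validation of this kind is often summarized with an overall accuracy and a kappa coefficient. The text does not report such figures, so the sketch below is only an assumption about how the comparison between field points and classified pixels could be quantified; the labels are hypothetical.

```python
from collections import Counter


def overall_accuracy(reference, classified):
    """Share of field points whose classified label matches the field label."""
    matches = sum(r == c for r, c in zip(reference, classified))
    return matches / len(reference)


def cohen_kappa(reference, classified):
    """Agreement corrected for the agreement expected by chance."""
    n = len(reference)
    observed = overall_accuracy(reference, classified)
    ref_counts = Counter(reference)
    cls_counts = Counter(classified)
    expected = sum(ref_counts[k] * cls_counts[k] for k in ref_counts) / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical labels for ten field points (not data from the study).
reference = ["crop", "crop", "urban", "urban", "veg",
             "veg", "swamp", "swamp", "crop", "urban"]
classified = ["crop", "crop", "urban", "crop", "veg",
              "veg", "swamp", "veg", "crop", "urban"]

accuracy = overall_accuracy(reference, classified)
kappa = cohen_kappa(reference, classified)
print(f"overall accuracy = {accuracy:.2f}, kappa = {kappa:.2f}")
```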
2.2.2. Water Sampling in Wells and Boreholes
A total of 30 wells and boreholes were sampled. The sampling sites were selected on the basis of previous physico-chemical data from 2008. The water samples were taken in 1.5-litre polyethylene bottles that had been washed beforehand with distilled water. These bottles were rinsed three times with the water to be sampled before the actual samples were taken. At the wells, the water was drawn and then bottled in such a way as to avoid trapping air bubbles. At the boreholes, on the other hand, water samples were taken directly from the hand pump. Samples were kept in a cool box at 4˚C until analysis in the laboratory. In the field, in-situ physico-chemical parameters such as temperature, pH, electrical conductivity and dissolved oxygen were measured using the AQUAREAD AP-700 multi-parameter probe. Figure 3 shows the sampling map.
Figure 2. Satellite image processing stages.
Figure 3. Location of water sampling points.
2.2.3. Physico-Chemical Analyses
The physico-chemical analyses were carried out at the Applied Hydrology Laboratory (LHA) under appropriate conditions. Physico-chemical parameters were determined using two devices: the DR890 colorimeter for measuring suspended solids (SS) and total dissolved solids (TDS), and the HACH LANGE DR 5000 spectrophotometer for measuring concentrations (chloride, sulfate, calcium, magnesium, carbonates, bicarbonates, fluoride, sodium, nitrates, nitrites, manganese, ammonium, ortho-phosphates, potassium, total iron and free CO2) in the laboratory. The ions were analyzed following well-defined methods such as the sodium salicylate method for the determination of nitrates and the Zambelli reagent method for the determination of nitrites [7].
2.2.4. Water Quality Index Calculation (WQI)
The Water Quality Index (WQI) is a water classification technique based on the comparison of water quality parameters with the respective international standards. In other words, the WQI summarizes large amounts of water quality data in simple terms (e.g. excellent, good, poor), generating a score that describes the quality status of the water for domestic use. This method was initially proposed by Horton [8] and Brown et al. [9]. In this approach, a numerical value called weight, between 2 and 5, is assigned to each parameter, reflecting its degree of influence on water quality. Table 1 shows the weights of the various physico-chemical parameters.
Table 1. Weight of physico-chemical parameters.
| Parameters | Units | WHO Standards (1998) | Weight (wi) | Relative Weights (Wi) |
|---|---|---|---|---|
| pH | - | 6.5 - 8.5 | 3 | 0.115 |
| TDS | mg/L | 500 | 3 | 0.115 |
|  | mg/L | 500 | 3 | 0.115 |
| Ca2+ | mg/L | 100 | 3 | 0.115 |
| Mg2+ | mg/L | 50 | 3 | 0.115 |
|  | mg/L | 45 | 5 | 0.192 |
| Cl− | mg/L | 250 | 3 | 0.115 |
|  | mg/L | 200 | 3 | 0.115 |
| Total |  |  | 26 | 1 |
The relative weight (Wi) is calculated by the following equation:
$$W_i = \frac{w_i}{\sum_{j=1}^{n} w_j} \tag{1}$$
where Wi is the relative weight, wi is the weight of each parameter and n is the number of parameters.
The quality rating scale (qi) of each parameter is calculated by dividing the concentration of the parameter by the respective WHO standard and expressing the result as a percentage:
$$q_i = \frac{C_i}{S_i} \times 100 \tag{2}$$
qi: quality rating scale
Ci: the concentration of each parameter in mg/l
Si: the WHO standard for each parameter in mg/l.
To calculate the Water Quality Index, the Sub-Index (SIi) of each parameter is determined first. The WQI of each sample is then the sum of the Sub-Indexes of all parameters:
$$SI_i = W_i \times q_i \tag{3}$$
$$WQI = \sum_{i=1}^{n} SI_i \tag{4}$$
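The calculation chain (relative weight, quality rating, sub-index, WQI) can be sketched as follows. The function names are illustrative, the percent scale of the quality rating is an assumption that matches the class limits of Table 2, and the example deliberately uses only the four Table 1 parameters whose symbols and single-value standards are legible (TDS, Ca2+, Mg2+, Cl−), fed with the average concentrations of Table 4. Because the relative weights are recomputed over this subset, the result is not a WQI of the study.

```python
def relative_weights(weights):
    """Relative weight of each parameter: its weight over the sum of all weights."""
    total = sum(weights.values())
    return {param: w / total for param, w in weights.items()}


def water_quality_index(concentrations, standards, weights):
    """Sum of the sub-indexes SI_i = W_i * q_i, with q_i = 100 * C_i / S_i."""
    rel = relative_weights(weights)
    wqi = 0.0
    for param, concentration in concentrations.items():
        quality_rating = 100 * concentration / standards[param]  # assumed % scale
        wqi += rel[param] * quality_rating
    return wqi


# Illustrative subset of Table 1 (standards in mg/L and weights).
standards = {"TDS": 500, "Ca2+": 100, "Mg2+": 50, "Cl-": 250}
weights = {"TDS": 3, "Ca2+": 3, "Mg2+": 3, "Cl-": 3}
# Average concentrations taken from Table 4, used here as a single "sample".
sample = {"TDS": 141.57, "Ca2+": 22.16, "Mg2+": 16.79, "Cl-": 9.89}

wqi = water_quality_index(sample, standards, weights)
print(round(wqi, 2))
```

pH is left out of the sketch because its standard is a range (6.5 - 8.5) and the text does not say how the rating is computed in that case.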
Five quality classes can be identified according to the value of the water quality index (Table 2).
Table 2. Water quality classification according to [10].
| Water quality | WQI |
|---|---|
| Excellent | <50 |
| Good | 50 - 100 |
| Poor | 100 - 200 |
| Very poor | 200 - 300 |
| Unfit for human consumption | >300 |
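The classification of Table 2 amounts to a simple threshold lookup, sketched below. Table 2 does not say to which class a value lying exactly on a limit belongs; the sketch assumes it falls into the worse class. The three test values are the minimum, average and maximum WQI reported in Table 4.

```python
from bisect import bisect_right

# Upper limits of the first four classes of Table 2.
CLASS_LIMITS = (50, 100, 200, 300)
CLASS_NAMES = (
    "Excellent",
    "Good",
    "Poor",
    "Very poor",
    "Unfit for human consumption",
)


def classify_wqi(wqi):
    """Return the Table 2 quality class of a WQI value.

    Assumption: a value exactly equal to a class limit is assigned to the
    worse of the two neighbouring classes.
    """
    return CLASS_NAMES[bisect_right(CLASS_LIMITS, wqi)]


for value in (14.55, 27.51, 86.48):
    print(value, classify_wqi(value))
```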
2.2.5. Data Processing and Groundwater Quality Prediction Models
Python was used for data processing and for developing the groundwater quality prediction models for the municipality of Za-Kpota. The reference values used are the Beninese standards and those of the World Health Organization for drinking water. The following steps were followed.
Preprocessing consists, first, of choosing the models to use. This choice takes into account the size and characteristics of our data and the objective, which is to model the groundwater quality of Za-Kpota. Because our data are temporal, neural networks are appropriate models for our prediction. Neural network models are a category of machine learning models inspired by how the human brain works. The specific models used are the Artificial Neural Network (ANN), Random Forest (RF) and Linear Regression (LR). An Artificial Neural Network (ANN) is a model composed of layers of interconnected neurons that is used for tasks such as classification, regression and pattern recognition. Random Forest (RF) is a machine learning algorithm that can be used for classification and regression tasks. It is an ensemble learning technique that builds a number of decision trees during training and combines them to obtain a more robust and generalizable prediction. Linear regression is a statistical model that seeks to establish a linear relationship between a dependent variable (the variable we wish to predict) and one or more independent variables (the characteristics or covariates). It is used to model and estimate the relationship between variables, and thereby to make predictions or analyze the nature of the relationship.
Preprocessing continues with importing packages and data, computing descriptive statistics and checking for missing values. This is followed by the identification of the model input data and the normalization of the data using the min-max method [11]. The model input data are the eight parameters weighted in Table 1 (pH, TDS, Ca2+, Mg2+, Cl− and three further ions) and the calculated WQI. The output is the predicted WQI.
$$X_{normalized} = \frac{X - \min(X)}{\max(X) - \min(X)}$$
X_normalized: the normalized value of the variable
X: the original value of the variable
min(X): the minimum value of the variable
max(X): the maximum value of the variable
This step ends with splitting the dataset into two parts, the training set (75%) and the testing set (25%), and with the model setup.
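A minimal sketch of the two preprocessing operations, min-max normalization and the 75% / 25% split, is given below using only the Python standard library. The function names, the random seed and the rounding of the test-set size upwards are assumptions of the sketch; the three TDS values are the minimum, average and maximum of Table 4.

```python
import math
import random


def min_max_normalise(values):
    """Rescale values to [0, 1] with (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant variable: nothing to rescale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def split_indices(n_samples, test_fraction=0.25, seed=0):
    """Shuffle sample indices and return (train_indices, test_indices)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n_samples * test_fraction)  # assumed rounding rule
    return indices[n_test:], indices[:n_test]


tds = [32.00, 141.57, 672.00]  # mg/l, from Table 4
normalised = min_max_normalise(tds)
train_idx, test_idx = split_indices(30)  # 30 sampled wells and boreholes

print(normalised)
print(len(train_idx), len(test_idx))
```

With 30 samples this rule gives 22 training and 8 testing samples.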
The data processing phase consists of defining the model parameters, training (calibrating) the models and finally validating them. The models were calibrated on the training data and validated on the test data. Figure 4 shows the neural network diagram.
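The calibration and validation workflow could look like the following scikit-learn sketch. It is not the authors' code: the library choice, every hyperparameter (hidden layer size, solver, number of trees, random seeds) and the synthetic data are assumptions, and the target is a WQI-like linear combination built with the relative weights of Table 1. The sketch uses only eight parameter columns as inputs and fits the scaler on the training set.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for 30 samples x 8 parameters (NOT the study data).
X = rng.uniform(0.0, 1.0, size=(30, 8))
rel_weights = np.array([3, 3, 3, 3, 3, 5, 3, 3]) / 26  # Table 1 weights
y = 100 * X @ rel_weights  # WQI-like target, linear in the inputs

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Min-max scaling, fitted on the training set only (a choice of this sketch).
scaler = MinMaxScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Hypothetical model settings; the paper does not list its hyperparameters.
models = {
    "ANN": MLPRegressor(hidden_layer_sizes=(8,), solver="lbfgs",
                        max_iter=2000, random_state=0),
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
    "LR": LinearRegression(),
}

results = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)          # calibration
    predicted = model.predict(X_test_s)    # validation
    results[name] = {
        "rmse": float(np.sqrt(mean_squared_error(y_test, predicted))),
        "r2": float(r2_score(y_test, predicted)),
    }

for name, scores in results.items():
    print(name, scores)
```

Because the synthetic target is an exact linear combination of the inputs, linear regression reproduces it almost perfectly here, which is worth keeping in mind when a model is scored on an index that is itself computed from its inputs.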
Figure 4. Neural network diagram.
The final step consists of evaluating the different models. The criteria for assessing the performance of the models are the root mean square error (RMSE), the mean square error (MSE), the mean absolute error (MAE), the Nash-Sutcliffe Efficiency (Nash) [12] and the RMSE-observations standard deviation ratio (RSR) [13].
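$$Nash = 1 - \frac{\sum_{i=1}^{n} (O_i - P_i)^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2}$$
and
$$RSR = \frac{\sqrt{\sum_{i=1}^{n} (O_i - P_i)^2}}{\sqrt{\sum_{i=1}^{n} (O_i - \bar{O})^2}}$$
where Oi is the observed value, Pi the predicted value, Ō the mean of the observed values and n the number of observations.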
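For reference, these criteria can be computed with a few standard-library functions. The sketch uses the usual textbook definitions of the metrics, which are assumed here to be those applied in the study; the observed and predicted values are hypothetical.

```python
from math import sqrt
from statistics import fmean


def mse(observed, predicted):
    """Mean square error."""
    return fmean((o - p) ** 2 for o, p in zip(observed, predicted))


def rmse(observed, predicted):
    """Root mean square error."""
    return sqrt(mse(observed, predicted))


def mae(observed, predicted):
    """Mean absolute error."""
    return fmean(abs(o - p) for o, p in zip(observed, predicted))


def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, values <= 0 are poor."""
    mean_obs = fmean(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - residual / spread


def rsr(observed, predicted):
    """RMSE divided by the standard deviation of the observations."""
    mean_obs = fmean(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return sqrt(residual) / sqrt(spread)


# Hypothetical observed and predicted WQI values.
observed = [20, 30, 40, 50]
predicted = [22, 28, 41, 49]

print(mse(observed, predicted), mae(observed, predicted))
print(nash_sutcliffe(observed, predicted), rsr(observed, predicted))
```

With these definitions RSR equals the square root of (1 − Nash), so a strongly negative Nash value necessarily comes with an RSR well above 1.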
3. Results and Discussion
3.1. Land Use Classification and Analysis
Figure 5 shows the land use maps for 2008, 2013, 2018 and 2022. This figure shows that the land use units in Za-Kpota include: crop fields and fallow land, comprising areas of perennial and annual crops; built-up areas and bare ground, that is, residential areas and areas without plant cover; and vegetation made up of dense forest, gallery forest and open forest.
Figure 5. Land use map of Za-Kpota for the years 2008, 2013, 2018 and 2022.
3.2. Evaluation of the Surface Areas of the Land Occupation Units
Figure 6 shows the spatiotemporal evolution of the different land use units (2008-2022).
Table 3 presents the results of the statistical analysis of the different land use units (2008-2022).
Analysis of Figure 6 and Table 3 reveals that between 2008 and 2022 vegetation and swamp formations regressed by about 10.8 and 25.9 percentage points respectively, in favor of urban areas, bare soil, crops and fallows.
Table 3. Statistical analysis of land use unit areas (2008-2022).
| Land use units | 2008 (% of area) | 2022 (% of area) | Change (percentage points) |
|---|---|---|---|
| Urban areas | 7.6 | 31.31 | +23.71 |
| Crops and fallows | 45.6 | 58.59 | +12.99 |
| Vegetation | 14.7 | 3.88 | −10.82 |
| Swamp formations | 32.1 | 6.21 | −25.89 |
Figure 6. Land use variation between 2008 and 2022
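The changes reported in Table 3 are differences between the 2022 and 2008 shares. The short sketch below recomputes them from the table values and, as an additional illustration not given in the paper, also expresses each change relative to the 2008 share.

```python
# Shares of each land use unit in 2008 and 2022, copied from Table 3.
land_use_share = {
    "Urban areas": (7.6, 31.31),
    "Crops and fallows": (45.6, 58.59),
    "Vegetation": (14.7, 3.88),
    "Swamp formations": (32.1, 6.21),
}


def change_in_points(shares):
    """Difference between the final and initial shares, per land use unit."""
    return {unit: round(end - start, 2) for unit, (start, end) in shares.items()}


def relative_change(shares):
    """Change expressed as a percentage of the initial share."""
    return {
        unit: round(100 * (end - start) / start, 1)
        for unit, (start, end) in shares.items()
    }


points = change_in_points(land_use_share)
relative = relative_change(land_use_share)
for unit in land_use_share:
    print(f"{unit}: {points[unit]:+.2f} points, {relative[unit]:+.1f}% of 2008 share")
```

The distinction matters when reading the table: vegetation lost about 11 points of the municipal area, which is roughly three quarters of the share it occupied in 2008.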
3.3. Descriptive Statistics on Physico-Chemical Data
Table 4 presents the basic statistics of the physico-chemical parameters of groundwater in the municipality of Za-Kpota.
The results show that the groundwater of the municipality of Za-Kpota has a temperature that varies from 28.90˚C to 34.18˚C, with an average of 30.59˚C. The waters of Za-Kpota are generally slightly acidic, with pH values between 5.59 and 7.83 and an average of 6.57 (Table 4). These results are consistent with those of Azokpota et al. [14]. All of the sampled waters appear moderately mineralized, with conductivities below 1500 µS/cm: a minimum of 59 µS/cm, a maximum of 1344 µS/cm and an average of 228.80 µS/cm. The average concentrations of ammonium, nitrites and nitrates are 0.51 mg/l, 0.70 mg/l and 19.13 mg/l respectively. Nitrate concentrations are locally high, reaching 108.12 mg/l. The presence of nitrates in drinking water is mainly attributable to human activities [15]. The underlying causes are synthetic fertilizers, pesticides used in agriculture and toxic discharges from industrial activities [16]. Nitrogen pollution comes from domestic wastewater, industrial effluents (agro-food, paper mills, etc.) and mainly from the leaching of fertilizers and livestock effluents in agricultural areas.
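Summary statistics of the kind reported in Table 4 can be obtained with Python's `statistics` module, as in the sketch below. The five pH values are hypothetical (only the minimum and maximum match Table 4), and the use of the sample standard deviation is an assumption, since the text does not say which form was used.

```python
from statistics import fmean, stdev


def describe(values):
    """Minimum, average, maximum and sample standard deviation of a series."""
    return {
        "min": min(values),
        "mean": fmean(values),
        "max": max(values),
        "std": stdev(values),  # sample form (n - 1), an assumption
    }


def share_outside(values, low, high):
    """Fraction of values falling outside the [low, high] guideline range."""
    outside = sum(not (low <= v <= high) for v in values)
    return outside / len(values)


ph_values = [5.59, 6.2, 6.5, 6.8, 7.83]  # hypothetical measurements

summary = describe(ph_values)
acidic_share = share_outside(ph_values, 6.5, 8.5)  # WHO range from Table 1

print(summary)
print(f"{acidic_share:.0%} of samples outside 6.5 - 8.5")
```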
3.4. Temporal Evolution of the Physicochemical Quality of Groundwater in the Municipality of Za-Kpota
Figures 7 to 10 present the temporal evolution of the physico-chemical quality of groundwater in the municipality of Za-Kpota for fluorides, nitrates, phosphates and the WQI.
Table 4. Basic statistics of physico-chemical parameters of groundwater in Za-Kpota municipality.
| Parameters | Units | Minimum | Average | Maximum | Standard Deviation |
|---|---|---|---|---|---|
| T | ˚C | 28.90 | 30.59 | 34.18 | 1.44 |
| pH | - | 5.59 | 6.57 | 7.83 | 0.46 |
| EC | µS/cm | 59.00 | 228.80 | 1344.00 | 248.48 |
| TDS | mg/l | 32.00 | 141.57 | 672.00 | 132.56 |
| Salinity | PSU | 0.02 | 0.08 | 0.58 | 0.11 |
| DO | mg/l | 6.74 | 7.28 | 7.50 | 0.22 |
| SS | mg/l | 0.01 | 32.11 | 81.00 | 25.93 |
| NH4+ | mg/l | 0.01 | 0.51 | 3.31 | 0.68 |
| NO2− | mg/l | 0.03 | 0.70 | 5.34 | 1.02 |
| NO3− | mg/l | 0.01 | 19.13 | 108.12 | 24.51 |
| Ca2+ | mg/l | 1.43 | 22.16 | 114.97 | 22.55 |
| Mg2+ | mg/l | 3.92 | 16.79 | 52.28 | 14.62 |
| Na+ | mg/l | 0.37 | 4.05 | 13.50 | 3.68 |
| K+ | mg/l | 0.14 | 1.88 | 7.80 | 2.00 |
| Cl− | mg/l | 0.40 | 9.89 | 42.60 | 8.56 |
|  | mg/l | 0.01 | 3.31 | 45.64 | 10.96 |
|  | mg/l | 6.15 | 38.74 | 252.15 | 59.32 |
| Fe2+ | mg/l | 0.00 | 0.13 | 1.14 | 0.21 |
| F− | mg/l | 0.03 | 0.36 | 2.05 | 0.38 |
|  | mg/l | 0.01 | 0.52 | 2.08 | 0.40 |
|  | mg/l | 0.00 | 0.00 | 0.00 | 0.00 |
| Free CO2 | mg/l | 0.00 | 0.00 | 0.00 | 0.00 |
| WQI | - | 14.55 | 27.51 | 86.48 | 15.54 |
Figure 7. Temporal evolution of fluoride concentrations in Za-Kpota groundwater.
Figure 8. Temporal evolution of nitrate concentrations in Za-Kpota groundwater.
Figure 9. Temporal evolution of phosphate concentrations in Za-Kpota groundwater.
Figure 10. Temporal evolution of WQI in Za-Kpota groundwater.
The maps of the temporal evolution of groundwater quality in the municipality reveal that the concentrations of fluorides, nitrates and phosphates and the water quality index have generally increased in recent years. This increase in concentrations is believed to be due to the use of fertilizers by farmers to increase agricultural production. The increase in the groundwater quality index of the municipality would therefore be due to the increase in the concentrations of certain minerals in recent years. The higher the water quality index, the poorer the quality of the water. At this stage, the waters in this area are still good because the WQI is below 100 [9]. In some places, however, the concentrations of fluorides and other parameters decrease over time. The increase in the concentrations of nitrates and other parameters would be linked to the use of fertilizers, pesticides and herbicides by farmers to increase production in this municipality [17]. It is therefore important to establish a relationship between the disappearance of land cover units and the increase in concentrations observed in the groundwater of Za-Kpota. For comparison, the municipality of Djougou experienced a 2.78% extension of the area occupied by settlements to the detriment of forests between 2005 and 2015, and the groundwater of its densest districts deteriorated, as shown by an increase in electrical conductivity and in chloride and nitrate concentrations [18]. Similarly, in the work carried out by Arsène et al. in the Ouémé basin in 2014, the analysis of land use dynamics in the basin in 1978, 1998 and 2010 showed a regression of the tree and shrub savannah, the light forest and wooded savannah, and the gallery forest and dense forest on the one hand, and a progression of cultivated and fallow areas, plantations, urban areas and bare soils on the other.
The significant regression of these formations is due to the extension of cropping and fallow areas in the basin. Furthermore, the increased development of crop areas and urban areas leads to a progressive reduction in the extent of wooded areas and to the destabilization of the soil structure. This degradation of the physical environment has an impact on water runoff and infiltration [19]. Urbanization, the growth of industry and intensive agriculture have also progressively increased the pollution of water bodies. The presence of nitrates would be due to the excessive use by the population of nitrogen-rich phytosanitary products and inputs in cotton and food crop production [20]. Nitrates can also be released from urban wastewater treatment plants [21]. Note also that the concentrations of nitrates, fluorides and phosphates have decreased in certain areas, and this reduction would generally be due to the changes observed in land use. Indeed, these places were occupied by crops and fallows in 2008 but have since been replaced by towns or roads. These lands therefore no longer receive the large quantities of fertilizer they received in 2008. Given the progressive deterioration of groundwater quality in recent years, it is important to establish a prediction model of groundwater quality in the municipality of Za-Kpota that can serve as an awareness tool against the various forms of groundwater pollution.
3.5. Water Quality Prediction Models
Figure 11 presents the results of groundwater quality modeling with 3 models: ANN, RF and LR.
Figure 11. Observed and predicted WQI values (ANN, RF and LR).
Figure 11 shows an increase in the water quality index after prediction. The ANN and LR models are the best models for predicting the water quality of the municipality: the points are aligned along the trend line, which suggests good agreement between the observed and predicted values. This is also confirmed by the RMSE, R2 and Nash values obtained for the ANN and LR models. Table 5 shows the performance of the models after calibration and validation.
Table 5. Model performance evaluation.
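| Models | Calibration (training) RMSE | Calibration (training) R2 | Calibration (training) Nash | Validation (testing) RMSE | Validation (testing) R2 | Validation (testing) Nash |
|---|---|---|---|---|---|---|
| ANN | 7.51E−05 | 1 | 1 | 0 | 0.98 | 0.98 |
| RF | 7.51E−05 | 1 | 1 | 0.01 | 0.73 | −19.28 |
| LR | 0.42 | 1 | 1 | 0 | 1 | 1 |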
The ANN and LR water quality prediction models show that, over time, there will be an increase in the water quality index and therefore a progressive deterioration in the quality of groundwater in Za-Kpota. The LR model performed suspiciously well in validation (R2 = 1, RMSE = 0 and Nash = 1, Table 5). This is plausibly explained by the fact that the WQI is by construction a weighted linear combination of the input parameters, so the model reproduces the index calculation rather than demonstrating predictive skill. The model that can be used as an awareness tool is therefore the ANN model. These results are consistent with those of Cao et al. [22] who, based on existing land use data in Canada, performed short-term forecasts of land use change using recurrent neural network (RNN) models. This is also the case for Dawood et al. [9] in Iraq, who assessed and predicted groundwater quality using the water quality index, multivariate statistical techniques and a geographic information system. They applied multi-layer perceptron (MLP) models to the water quality index and found that the MLP network accurately predicted the output. Many researchers have simulated and predicted changes in the LULC and land surface temperature (LST) of cities using machine learning algorithms such as artificial neural networks (ANN): Wuhan in China [23], Ikom in Nigeria [24] and Faisalabad in Pakistan [25]. Yatoo et al. [26] used cellular automata (CA) simulation and ANN to monitor land use change and its future prospects in Ahmedabad, India.
4. Conclusion
This study evaluated the physico-chemical quality of groundwater in the municipality of Za-Kpota using remote sensing and machine learning. The analyses of the sampled waters indicate that these waters are generally acidic (pH between 5.59 and 7.83) with variable mineralization (59 µS/cm to 1344 µS/cm) depending on the geology of the environment. These waters have experienced a deterioration in their physico-chemical quality. The diachronic analysis of land use in Za-Kpota also reveals that in recent years vegetation and swamp formations have regressed in favor of urban areas, bare soils, crops and fallows. Taken together, the diachronic analysis of groundwater quality, the increased use of herbicides and pesticides to raise production and the land use dynamics in Za-Kpota in recent years confirm that the deterioration of the municipality's groundwater quality is due to anthropogenic activities. It is therefore necessary to find an effective way to raise awareness among the population about the practices that pollute groundwater. Based on the results of the groundwater quality prediction models developed (ANN, RF and LR), it was concluded that the model based on Artificial Neural Networks provides the better prediction (ANN: R2 = 0.97 and RMSE = 0) of changes in groundwater quality in the municipality of Za-Kpota.
Acknowledgments
This paper benefited from the financial support of Data Science Africa (DSA) within the framework of its support for young researcher projects in Africa. The authors are grateful to the Centre d’Excellence d’Afrique pour l’Eau et l’Assainissement (C2EA) of the University of Abomey-Calavi (Benin), through the Laboratoire d’Hydrologie Appliquée (LHA), for hosting the project.