Groundwater Level Prediction Using Artificial Neural Networks: A Case Study in Tra Noc Industrial Zone, Can Tho City, Vietnam
1. Introduction
Groundwater resources (GWR) play an important role in the provision of water for domestic use and production for millions of people in the Mekong Delta [1] [2]. In the context of contaminated surface water and fluctuating downstream water levels caused by the construction of hydroelectric projects and the expansion of cultivated area in the upper Mekong, the role of GWR has become increasingly important since the 1990s [3]. In addition, urbanization, population growth, land use changes and climate change will degrade GWR in terms of quantity, quality and dynamics [4].
Many studies have examined GWR dynamics using hydrogeological or statistical models. For instance, Radu Goru et al. (2001) [5] designed a geological geographic information system (GIS) database for the Walloon region in Belgium that offers facilities for groundwater-vulnerability analysis and hydrogeological modelling with the Groundwater Modelling System (GMS). Rakesh et al. (2009) [6] applied a conceptual groundwater modelling approach to the northern part of the Mendha sub-basin in the semi-arid region of northeastern Rajasthan, using GMS, which supports the MODFLOW-2000 code. More recently, Pandey and Kazama (2012) [7] analyzed spatial variations in the hydrogeological characteristics of shallow and deep groundwater aquifers in Kathmandu Valley, Nepal.
In the Mekong River basin, So Kazama et al. (2007) [8] determined the variation of GWR caused by flooding over inundated areas in the lower part of the basin using numerical modeling and field observations. The research concluded that flood control, which reduced the area of inundation, resulted in a reduction of GWR in the area. Thus, while flood control activities were vital to reduce negative flood impacts in the Mekong River basin, they also negatively affected GWR in the area. Babel et al. (2006) [9] studied the various negative impacts on the environment and society caused by land subsidence, which has been a problem in Bangkok, Thailand, since the 1970s. Intensive groundwater extraction for industrial and domestic purposes since the 1950s, which led to a decline of groundwater levels (GWLs), was the primary cause of land subsidence.
Nguyen Tieng Vang and Tran Van Ty (2017) [10] conducted research in the Tra Noc Industrial Zone, Can Tho city, to assess the current status of exploitation, GWL changes and management of GWR. From this assessment, the relationship between groundwater extraction, the water level in the Bassac River (CTH-039803 station) and GWLs at monitoring stations/wells was established. The results showed that groundwater extraction in the Tra Noc Industrial Zone was very large; over-exploitation of GWR might be a major cause of the decrease in GWLs of the Pleistocene and Holocene aquifers of 4 m and 1 m, respectively, from 2000 to 2015. Rainfall and the Bassac River were found to be the major sources of recharge to the Holocene aquifer. In addition, management of GWR was not effective because of a lack of close coordination between enterprises and local GWR management agencies/departments.
Artificial Neural Networks (ANN) are a widely used tool for groundwater prediction, and many studies have applied them to predicting GWLs. Suja and Sindhu (2016) [11] used factor analysis to identify the factors that have maximum influence on GWLs and time series analysis to predict the influencing factors prior to applying an ANN. Hung et al. (2009) [12] introduced a new approach using an ANN technique to improve rainfall forecast performance. For a real-world case study in Bangkok, 4 years of hourly data from 75 rain gauge stations in the area were used to develop the ANN model, which was then applied to real-time rainfall forecasting and flood management in Bangkok, Thailand.
The objective of this study is to predict GWLs under different impact factors using ANN for a case study in Tra Noc Industrial Zone, Can Tho city. This is achieved by evaluating the current state of GWR exploitation, use and dynamics; setting up, calibrating and validating the ANN for GWLs; and then predicting GWLs at different lead times.
2. Study Area and Data
Can Tho city is the youngest and largest urban area in the Mekong Delta, with 8 industrial zones covering a total area of over 2366 ha. These industrial zones are located along the national highways and the Bassac River, which is one of the two branches of the Mekong River after it enters Vietnam. Industrial activities have caused serious environmental problems such as pollution of water sources, microbial contamination and subsidence. Tra Noc Industrial Zone was established and has been developed since the 1990s; it comprises Tra Noc 1 Industrial Zone (Tra Noc Ward, Binh Thuy District) and Tra Noc 2 Industrial Zone (Phuoc Thoi Ward, O Mon District), with a total planned area of 300 hectares (Figure 1).
Figure 1. Location of Tra Noc Industrial Zone in Can Tho city. (a) Administrative map of Can Tho city; (b) Tra Noc Industrial Zone.
Currently, there are 16 groundwater resources (GWR) monitoring stations/wells in Can Tho city, of which two stations (QT08 and QT16) are located in the study area. At each station, there are 3 monitoring wells at different depths in 3 aquifers: Middle-Upper Pleistocene (qp2-3), Upper Pleistocene (qp3) and Holocene floor (qh). From 2000 to 2015, the groundwater levels (GWLs) of the Pleistocene aquifers (qp3 and qp2-3) in the Tra Noc Industrial Zone declined rapidly. However, in the Holocene aquifer, GWLs remained relatively stable.
Rainfall data at Can Tho station, river water levels at two stations, average withdrawal discharge for industrial use and observed GWLs in the Pleistocene aquifer (qp2-3 and qp3 layers) at different monitoring wells were collected. The data and their sources are presented in Table 1.
3. Methodology
An Artificial Neural Network (ANN) consists of input, hidden and output layers, and each layer includes an array of processing elements (nodes). An ANN is characterized by its structure, which represents the pattern of connections between nodes, by its connection weights and by its activation function. ANN models were developed using different combinations of the input parameters, and the best combination was selected based on the performance statistics.
Observed groundwater levels (GWLs) at a given time were first used to initialize the ANN model, which reproduces water level variations using the input variables (rainfall, river water levels and withdrawal discharge from pumping). The ANN structures selected via trial and error were first calibrated on a training dataset to perform 1-, 2- and 3-month-ahead predictions of future GWLs using past observed GWLs and the input variables. Simulations were then produced on another dataset by iteratively feeding back the predicted GWLs, along with real data.
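The iterative feedback of predicted GWLs can be illustrated with a minimal sketch. The function `recursive_forecast`, the stand-in `toy_model` and all numbers are hypothetical and serve only to show the mechanism; the trained ANN of this study would take the place of `toy_model`.

```python
def recursive_forecast(model, history, drivers, steps):
    """Predict `steps` months ahead, feeding each prediction back as an input.

    model   : callable(previous_gwl, driver_value) -> next_gwl (hypothetical)
    history : list of observed GWLs; the last value seeds the forecast
    drivers : list of real driver values (e.g. pumping), one per forecast step
    """
    previous = history[-1]
    predictions = []
    for step in range(steps):
        # The newest prediction replaces the observation at the next step.
        previous = model(previous, drivers[step])
        predictions.append(previous)
    return predictions


def toy_model(previous_gwl, pumping):
    """Stand-in for a trained network: GWL drops in proportion to pumping."""
    return previous_gwl - 0.001 * pumping


# Illustrative values only: last observed GWL (m) and monthly pumping (m3/day).
forecast = recursive_forecast(toy_model, [-9.0], [100.0, 100.0, 100.0], 3)
```

Only the first step uses an observed GWL; later steps depend on earlier predictions, so errors can accumulate as the lead time grows.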
3.1. ANN Model Setting-Up
To develop the ANN models, the neural network toolbox of Visual Gene Developer (http://www.visualgenedeveloper.net/) [13] was used. This toolbox provides the capability to design many different kinds of neural systems for various applications.
3.2. Data Pre- and Post-Processing
Data pre-processing was carried out for analyzing and transforming the input and output variables to minimize noise, and to highlight important relationships. The raw data were normalized between zero and one (unitless).
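Pre-processing:
$$y'_t = \frac{y_t - a}{b - a} \qquad (1)$$
Post-processing:
$$y_t = y'_t\,(b - a) + a \qquad (2)$$
where $y_t$ is the observed data; $a$ and $b$ are the minimum and maximum values of the observed data, respectively; and $y'_t$ is the normalized value of the observed data.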
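As a minimal sketch of the pre- and post-processing steps, assuming plain min-max scaling to the range zero to one with a and b taken as the minimum and maximum of the observed series, the transformation and its inverse could be written as follows. The function names and the sample values are illustrative assumptions, not data from this study.

```python
def normalize(values, a, b):
    """Pre-processing: scale observations to [0, 1] using minimum a and maximum b."""
    return [(y - a) / (b - a) for y in values]


def denormalize(scaled, a, b):
    """Post-processing: map normalized values back to the original units."""
    return [s * (b - a) + a for s in scaled]


# Hypothetical monthly groundwater levels in metres (illustrative only).
gwl = [-8.2, -8.6, -9.1, -9.4, -8.9]
a, b = min(gwl), max(gwl)

scaled = normalize(gwl, a, b)          # values fed to the network
restored = denormalize(scaled, a, b)   # network output converted back to metres
```

The same a and b obtained from the calibration data would be reused when converting predictions back to metres, so that both steps remain consistent.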
3.3. ANN Structures
The structure of the ANN was determined by trial and error. The number of nodes in the hidden layers and the stopping criteria were optimized to obtain precise and accurate output. The activation function of the hidden and output layers was set to a sigmoid function, as this proved by trial and error to be the best of the options tested in depicting the non-linearity of the modeled natural system. There is no well-established direct method for selecting the number of hidden nodes for an ANN model for a given problem; thus, the common trial-and-error approach remains the most widely used method [14]. Variables in the input vector to the ANN models are presented in Table 2.
There are many kinds of neural networks, which differ in their structures, functions and training methods. A typical feedforward neural network trained with a back propagation learning algorithm was used. A typical neural network is presented below:
$$O = f(N), \qquad N = \sum_{i} w_i x_i \qquad (3)$$
where $x_i$ is the input vector, $O$ is the output vector, $w_i$ is a weight factor between two nodes and $f(N)$ is an activation function. Among the different kinds of activation functions, the sigmoid was used in this study. The back propagation learning algorithm is based on a generalized delta-rule accelerated by a momentum term [15].
To improve the performance of the network, the weight factors were adjusted using the following equations:
$$w^{\mathrm{new}} = w^{\mathrm{old}} + \Delta w^{p}, \qquad \Delta w^{p} = \eta\,\delta^{p}\,O^{p} + \alpha\,\Delta w^{p-1} \qquad (4)$$
where $\eta$ is the learning rate; $\alpha$ is the momentum coefficient; $\Delta w^{p-1}$ is the previous weight factor change; $O$ is the output; $\delta$ is the gradient-descent correction term; and $p$ stands for pattern. The learning rate ($\eta$) and the momentum coefficient ($\alpha$) were randomly generated from 0.01 to 1 and from 0 to 1, respectively.
Table 2. Variables in the input vector to ANN models.
Total input nodes: from 8 to 14; total output nodes: 1. ANN structures were tested with various numbers of hidden layers (from 1 to 5) and hidden nodes (from 5 to 15) to select the best ANN structure. The optimum structures for qp2-3 and qp3 are 14-15-1 and 12-15-1 (input, hidden and output nodes), respectively.
The back propagation algorithm is applied as follows:
1) Normalize the training data and initialize all weights (normally small random values between minus one and one);
2) Compute the output of neurons in the hidden layer and in the output layer;
3) Compute the error and update the weights;
4) Update all weights and repeat steps 2 and 3 for all training data;
5) Repeat steps 2 to 4 until the error converges to an acceptable level.
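The steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation of the toolbox used in this study: the single hidden layer, the constant bias inputs, the fixed values of the learning rate `eta` and momentum coefficient `alpha`, the function names and the toy dataset are all hypothetical.

```python
import math
import random


def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))


def forward(x, w_hid, w_out):
    """Return the hidden-layer outputs and the single network output."""
    xb = x + [1.0]                                   # constant bias input (assumption)
    hid = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hid]
    out = sigmoid(sum(w * v for w, v in zip(w_out, hid + [1.0])))
    return hid, out


def mse(data, w_hid, w_out):
    """Mean squared error of the network over all patterns."""
    return sum((t - forward(x, w_hid, w_out)[1]) ** 2 for x, t in data) / len(data)


def train(data, n_hid=3, eta=0.5, alpha=0.5, epochs=500, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Step 1: initialize all weights with small random values in [-1, 1].
    w_hid = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hid)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hid + 1)]
    dw_hid = [[0.0] * (n_in + 1) for _ in range(n_hid)]  # previous weight changes
    dw_out = [0.0] * (n_hid + 1)
    for _ in range(epochs):                          # Step 5: repeat until done
        for x, t in data:                            # Step 4: all training patterns
            hid, out = forward(x, w_hid, w_out)      # Step 2: node outputs
            # Step 3: error terms (delta) for the output and hidden nodes.
            d_out = (t - out) * out * (1.0 - out)
            d_hid = [d_out * w_out[j] * hid[j] * (1.0 - hid[j]) for j in range(n_hid)]
            # Weight change = eta * delta * upstream output + alpha * previous change.
            for j, o in enumerate(hid + [1.0]):
                dw_out[j] = eta * d_out * o + alpha * dw_out[j]
                w_out[j] += dw_out[j]
            for j in range(n_hid):
                for i, o in enumerate(x + [1.0]):
                    dw_hid[j][i] = eta * d_hid[j] * o + alpha * dw_hid[j][i]
                    w_hid[j][i] += dw_hid[j][i]
    return w_hid, w_out


# Toy normalized patterns (inputs, target); illustrative only.
data = [([0.0, 0.0], 0.2), ([0.0, 1.0], 0.4), ([1.0, 0.0], 0.6), ([1.0, 1.0], 0.8)]
w_hid, w_out = train(data)
```

Here the loop stops after a fixed number of epochs; a convergence test on the error, as in step 5, could replace the fixed count.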
The performance of the trained network was checked by determining the error between the predicted value and the observed one.
3.4. Calibration and Validation
The available data were divided into two distinct sets: the training/calibration set and the validation set. As the training set is used by the neural network to learn the patterns present in the data, 70% of the data was allocated to the calibration set (2004-2012) and 30% to the validation set (2013-2015). In this study, the networks were selected based on best performance on the training set, and a final check on the performance of the trained network was made using the validation set.
3.5. Criteria of Evaluation
Three different criteria were used in order to evaluate the suitable networks and their abilities to produce accurate predictions.
The Root Mean Square Error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - Y_i\right)^2} \qquad (5)$$
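Efficiency Index:
$$\mathrm{EI} = 1 - \frac{\sum_{i=1}^{n}\left(X_i - Y_i\right)^2}{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2} \qquad (6)$$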
The R efficiency criterion:
(7)
where $X_i$ is the observed data, $\bar{X}$ is the mean of the observed data, $Y_i$ is the calculated data and $n$ is the number of observations. RMSE indicates the difference between the observed and calculated (ANN) values: the lower the RMSE, the more accurate the prediction. A good fit between observed and calculated values is indicated by high values of EI and R2.
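For illustration, two of these criteria can be computed as in the sketch below. It assumes the standard definition of RMSE and takes the Efficiency Index to be the Nash-Sutcliffe form (one minus the ratio of the squared errors to the squared deviations of the observations from their mean); the function names and sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and calculated values."""
    n = len(observed)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(observed, predicted)) / n)


def efficiency_index(observed, predicted):
    """Efficiency index, assumed here to be the Nash-Sutcliffe form."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((x - y) ** 2 for x, y in zip(observed, predicted))
    sst = sum((x - mean_obs) ** 2 for x in observed)
    return 1.0 - sse / sst


# Hypothetical observed and predicted GWLs in metres (illustrative only).
observed = [-9.0, -9.2, -9.5, -9.3]
predicted = [-9.1, -9.2, -9.4, -9.3]

error = rmse(observed, predicted)
ei = efficiency_index(observed, predicted)
```

Under these definitions a perfect prediction gives an RMSE of zero and an efficiency index of one.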
4. Results and Discussion
4.1. Current State of GWR
The total exploitation rate of groundwater resources (GWR) in Tra Noc Industrial Zone from 2004 to 2016 is shown in Figure 2. As can be seen, from 2004 to 2010 the policy of encouraging investment in Tra Noc Industrial Zone led to increasing exploitation of GWR. The total GWR exploitation in 2004, 2009 and 2011 was 3568 m3/day, 18,876 m3/day and 20,210 m3/day, respectively; that is, total exploitation increased almost sixfold over 7 years. However, the enforcement of Official Letter No. 2946/UBND-KT dated 23/6/2010 of the People’s Committee of Can Tho city [16] on regulating the use of GWR reduced the exploitation in 2012, and the groundwater levels (GWLs) gradually stabilized.
In addition, the enterprises in Tra Noc Industrial Zone have used a combination of different water sources for production and daily use. Only 18.18% of enterprises used GWR alone; 63.64% used tap water together with GWR; and the remainder used other combined sources (data not shown). However, the exploitation of GWR for production showed an increasing trend again after 2012.
Figure 2. Total GWR exploitation rate in Tra Noc Industrial Zone (2004-2016).
It can be seen in Figure 3 and Figure 4 that GWLs in the Pleistocene aquifer declined from 2000 to 2015. During this period, almost all of the enterprises in the area exploited GWR for production, especially from the Middle-upper Pleistocene (qp2-3) and upper Pleistocene (qp3) layers. From 2010 onwards, exploitation has been reduced thanks to the enforcement of Official Letter No. 2946/UBND-KT of the People’s Committee of Can Tho City (2010) [16].
In addition, these two figures suggest possible GWR recharge from rainwater, as GWLs followed rainfall amounts with a short lag time. According to the DONRE of Can Tho city (2011) [17], the depth of the Pleistocene aquifer ranges from 35 m to 149 m (MSL); thus, this aquifer may receive some recharge from the Bassac River (river depth of 33 m MSL at Can Tho station).
4.2. Results of ANN Structure Selection, ANN Calibration and Validation
All training was carried out with the neural network toolbox of Visual Gene Developer. Different ANN structures were tested by trial and error: the input layer consisted of various input nodes with a 3-month time lag (time lags t, t-1, t-2 and t-3, where t is the present time step), and the optimum ANN structures were obtained. The output of the network is a prediction of the GWLs at three lead times (1, 2 and 3 months). The number of hidden neurons was also determined through trial and error.
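The construction of lagged input vectors can be sketched as follows. The helper `build_lagged_inputs`, the choice of two driver series and the numbers are assumptions for illustration; the actual combination of variables used for each model is the one listed in Table 2.

```python
def build_lagged_inputs(series, target, max_lag=3, lead=1):
    """Build (inputs, outputs) pairs from monthly series.

    series  : dict mapping a variable name to its list of monthly values
    target  : list of monthly GWLs to be predicted
    max_lag : largest time lag, so each variable contributes t, t-1, ..., t-max_lag
    lead    : lead time in months of the predicted GWL
    """
    n = len(target)
    rows, outputs = [], []
    for t in range(max_lag, n - lead):
        row = []
        for name in sorted(series):                  # fixed variable order
            row.extend(series[name][t - k] for k in range(max_lag + 1))
        rows.append(row)
        outputs.append(target[t + lead])             # GWL `lead` months ahead
    return rows, outputs


# Illustrative series of 8 months (not data from this study).
rainfall = list(range(8))            # 0, 1, ..., 7
river = list(range(10, 18))          # 10, 11, ..., 17
gwl = list(range(100, 108))          # 100, 101, ..., 107

X, y = build_lagged_inputs({"rainfall": rainfall, "river": river}, gwl)
```

With two variables and lags t to t-3, each input vector has 8 nodes; adding further variables or lagged GWLs increases the number of input nodes accordingly.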
Figure 3. GWLs at QT08 station vs. rainfall.
Figure 4. GWLs at QT16 station vs. rainfall.
The results of ANN structure selection show that the optimum structures for qp2-3 and qp3 are 14-15-1 and 12-15-1 (input, hidden and output nodes), respectively. The number of nodes in the hidden layer has only a slight impact on the accuracy of prediction. Therefore, these two structures were selected for 1-, 2- and 3-month GWL prediction at QT08 and QT16, respectively. Figure 5 shows examples of the ANN structures (14-15-3 and 12-15-3) and weights (in the figure, red corresponds to a high positive value and violet to a high negative value; line width is proportional to the absolute value of the weight factor or threshold value).
The comparisons between observed and 1-month predicted GWLs at QT08 for the qp2-3 and qp3 layers are presented in Figure 6 and Figure 7, respectively. Qualitatively, the shape and character of the predicted GWLs fit the observations quite well. Although the peak GWLs are under- or overestimated, this is not considered a serious problem, since the objective of this study is to assess the mean monthly GWLs, which agree well with the observed GWLs at the monitoring wells.
The correlations between GWLs and other impact factors such as rainfall, water levels in the Bassac River and GWR withdrawal for industrial uses were tested. The results show high negative correlations between GWLs and GWR withdrawal for industrial uses. In contrast, there are low correlations between GWLs and rainfall or water levels in the Bassac River (data not shown). Therefore, further study should consider the future projection of GWR pumping for different purposes.
Performance statistics are summarized in Table 3, and scatter plots of observed and predicted GWLs for 1-, 2- and 3-month lead times at QT08 and QT16 in the Pleistocene aquifer (qp2-3 and qp3) are depicted in Figure 8 and Figure 9. It is observed from Table 3 that the model performance is good, and the models predicted the GWLs with reasonable accuracy in terms of all the statistical indices (EI, RMSE and R2) during the calibration and validation periods. The good fit between observed and predicted values is shown by high values of the Efficiency Index (EI) and the R efficiency (R2), with all EI and R2 values greater than 90%. The RMSE statistic, which is a measure of residual variance that shows the global goodness of fit between the predicted and observed GWLs, is very good, as evidenced by low RMSE values during both calibration and validation periods. As can be seen, the RMSE lies between a minimum of 0.06 m and a maximum of 0.22 m.
Figure 5. Selected ANN structures (14-15-3 and 12-15-3) and connection weights (red corresponds to a high positive value and violet to a high negative value; line width is proportional to the absolute value of the weight factor or threshold value).
Figure 6. Comparison between observed vs. predicted 1-month GWLs at QT08 (qp2-3). (a) Calibration (2004-2012); (b) Validation (2013-2015).
Figure 7. Comparison between observed vs. predicted 1-month GWLs at QT08 (qp3). (a) Calibration (2004-2012); (b) Validation (2013-2015).
Table 3. Performance statistics of the 1-, 2-, 3-month GWLs prediction.
From Table 3, Figure 8 and Figure 9, it is clear that the calibrated and validated ANN predicted the GWLs with reasonable quality, so it can be used to evaluate the effects of different scenarios (rainfall, river water levels and GWR pumping) on GWLs in the study area. In general, the results indicate the potential of neural computing techniques (ANN) for predicting GWLs at observation wells at 1-, 2- and 3-month lead times.
5. Conclusions
Greater demand for groundwater resources (GWR) for domestic and industrial production purposes causes widespread exploitation of the resources. GWLs in the study area declined rapidly from 2000 to 2015, especially in the Middle-upper Pleistocene (qp2-3) and upper Pleistocene (qp3) layers, due to the over-withdrawal of GWR by almost all the enterprises in the area. As a result, Official Letter No. 2946/UBND-KT of the People’s Committee of Can Tho City was issued and came into enforcement in 2012 to monitor the exploitation.
The application of an Artificial Neural Network (ANN) has demonstrated that groundwater levels (GWLs) can be predicted by considering different impact factors. The predicted results will help to draw the attention of local and central government to devising and formulating a clear GWR management policy for the Mekong Delta, especially for industrial zones in urban areas such as Can Tho city.
There are high negative correlations between GWLs and GWR withdrawal for industrial uses; therefore, further study should consider scenarios of GWR pumping for different purposes.
Acknowledgements
The authors express their sincere thanks to the Ministry of Education and Training (MOET) for supporting this study.