Optimization of the Number and Location of Boreholes for Gassy Soil Site Investigation Considering the Statistical Uncertainty
1. Introduction
Gassy soils are widely distributed in the eastern coastal areas of China, particularly in the Hangzhou Bay area, Zhejiang Province, as shown in Figure 1. Gassy soils originate from the anaerobic decomposition of organic materials [1] and are predominantly methane-dominated: in the Hangzhou Bay area, CH4 accounts for more than 90% of the gas in every sample listed in Table 1 and Table 2. The spatial variability of gas pressure in these soils poses potential risks, such as fires and explosions, during underground construction [2]. To mitigate these risks, a site investigation scheme is required. The scheme specifies the number and locations of boreholes, at which gas pressure is measured using a modified Cone Penetration Test (CPT) device, as shown in Figure 2.
Figure 1. The enrichment area of gassy soils in the eastern coastal area of China.
Table 1. Gas composition of tunnel across the Qiantang River [3].
| Borehole number | Gas sample No. | CH4/% | N2/% | CO2/% | CO/10−6 |
|---|---|---|---|---|---|
| C-02 | 1 | 91.6 | 5.7 | 2.69 | 110 |
| C-04 | 2 | 95.2 | 2.2 | 2.58 | 85 |
| Additional 1 | 3 | 94.6 | 1.9 | 3.44 | 96 |
Figure 2. Site investigation of gassy soil with modified CPT device [3].
Table 2. Gas composition in wells at the south of the Qiantang River [3].
| Borehole number | Gas sample No. | CH4/% | N2/% | CO2/% | CO/10−6 |
|---|---|---|---|---|---|
| C26 | 1 | 90.4 | 7.67 | 1.92 | 230 |
| C31 | 2 | 92.8 | 5.31 | 1.88 | 125 |
| C35 | 3 | 91.5 | 6.96 | 1.53 | 180 |
However, because site investigations of gassy soils are costly and labor-intensive, the data obtained in engineering practice are limited. It therefore becomes essential to predict the state of gas pressure (safe or dangerous) at unknown points from the acquired data. Such predictions, relying on limited data, introduce uncertainty, particularly because the random field parameters (e.g., mean value, standard deviation, and scale of fluctuation) that characterize the spatial variability of gas pressure remain unknown during the site investigation stage, which results in statistical uncertainty. In light of these challenges, determining the optimal site investigation scheme, including the optimal number of boreholes and their locations, remains an open question. This optimization is crucial for identifying the risk state and reducing the associated uncertainty at unknown locations before underground projects are constructed.
As previously discussed, it is crucial to carefully determine the number and locations of boreholes to effectively identify the risk state and reduce corresponding uncertainty at unknown locations. This task is challenging, particularly considering the statistical uncertainty associated with random field parameters that quantify the distribution of gas pressure. While some studies have explored gassy soils in the Hangzhou Bay area, these predominantly focused on aspects such as the formation and composition of biogenic gas [2], features and distributions of gas pools [3], and exploration methods [4] [5]. Some researchers have discussed planning investigation schemes for gassy soil, primarily focusing on decreasing uncertainty in gas pressure distribution at unknown points, often overlooking the identification of the risk state at these locations and disregarding the statistical uncertainty of random field parameters [6]-[12].
This study introduces a probabilistic site investigation optimization method to determine the optimal scheme for investigating gassy soils. The method uses the expected state-identification probability to recognize the risk state of gas pressure and to quantify the corresponding uncertainty at unknown points; a larger value indicates a less uncertain identified state. A good scheme should therefore make this probability large at every unknown point, which amounts to maximizing its value at the point where it is smallest. The scheme that achieves this maximum is identified using Subset Simulation Optimization (SSO) within the space of candidate site investigation schemes generated through discretization [13]-[15]. The candidate scheme whose maximized minimum expected state-identification probability exceeds a given threshold probability is taken as the optimal scheme.
The paper is structured as follows. After this introduction, the proposed framework is presented. Subsequently, the generation of the space of candidate site investigation schemes, the quantification of the expected state-identification probability, and the identification of the optimal scheme using SSO are covered in detail. Finally, the implementation procedure of the proposed approach is illustrated through a case study in the Hangzhou Bay area.
2. Framework for Probabilistic Site Investigation Optimization for Gassy Soils
Accurately identifying the risk state (safe or dangerous) of gas pressure and quantifying the corresponding uncertainty before construction are crucial to prevent engineering disasters caused by gassy soils. Site investigations of gassy soils are typically conducted to estimate the risk state of gas pressure at unknown locations from a limited amount of investigation data. An effective investigation scheme should therefore not only identify the risk state at unknown locations accurately but also reduce the uncertainty associated with the identified state. This study introduces a probabilistic site investigation approach for gassy soils for this purpose. Note that this study considers only the one-dimensional spatial variability of gas pressure in the horizontal direction; vertical spatial variability is ignored and may be explored in future studies.
The proposed framework, illustrated in Figure 3, comprises three key steps: generation of the space of candidate site investigation schemes, quantification of the expected state-identification probability, and optimization of borehole locations using SSO. The approach begins by generating all possible candidate site investigation schemes through a discretization procedure based on the site investigation range of gassy soils and a given discretization interval. The discretization interval should be chosen to meet the specific requirements and accuracy standards of the site investigation. Once the space of candidate schemes is obtained, the expected state-identification probability is used to identify the risk state and quantify the corresponding uncertainty at unknown locations. Because real gas pressure data cannot be obtained at the scheme design stage, this probability is calculated using simulated data. To reduce the uncertainty of the risk state at every unknown point, the selected scheme should maximize the expected state-identification probability at the point where that probability is smallest. This optimization problem is solved using SSO. The candidate scheme whose expected state-identification probability at this minimum point exceeds a given probability threshold is determined as the optimal scheme.
Figure 3. The framework of proposed probabilistic site investigation approach for gassy soils.
3. Space of Candidate Site Investigation Schemes
The candidate site investigation schemes, defined by the number and placement of boreholes, are determined through a discretization process. Consider a site investigation field of length L. The points of interest, denoted as Lm (m = 1, 2, 3, …, N), are spaced at a given interval ∆L so that Lm = (m − 1)∆L. Here, N = INT[L/∆L] + 1, where INT[·] denotes the rounding function returning the integer part of L/∆L; the added 1 accounts for the starting point L1 = 0. All values of Lm (m = 1, 2, 3, …, N) can be collected in a vector LN = [L1, L2, …, LN], as shown in Figure 4, which contains N possible borehole locations.
Figure 4. Site investigation scheme Sn = [x1, x2, …, xk, …, xn].
Assume that an investigation scheme is denoted by a vector Sn = [x1, x2, …, xk, …, xn] representing the horizontal borehole locations, where xk is the location of the k-th borehole and n is the number of boreholes. Each xk must correspond to an element of LN (i.e., a feasible discretization point Lm, m = 1, 2, 3, …, N). Each combination of n distinct values for x1, …, xn therefore constitutes a candidate scheme Sn, and there are a total of

$$C_N^n = \frac{N!}{n!\,(N-n)!}$$

candidate site investigation schemes.
In practical engineering, engineers need to predict the risk state at unknown locations from the data of scheme Sn; this risk state is quantified by the expected state-identification probability. The unknown locations, where no boreholes are placed to measure gas pressure, are represented by the vector LN-n = [y1, y2, …, yj, …, yN-n]. Each yj must belong to the set Ωo of feasible values of Lm while differing from all of x1, x2, …, xk, …, xn. The number of unknown points yj is therefore N − n, the difference between the total number of points of interest Lm (m = 1, 2, 3, …, N) and the number of boreholes xk (k = 1, 2, …, n) in scheme Sn.
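The discretization and the scheme count can be illustrated with a short Python sketch. It assumes N = INT[L/∆L] + 1 so that the starting point L1 = 0 is included, which reproduces the 205 points of the example in Section 6.1; the helper names `discretize` and `unknown_points` are hypothetical and not part of the original method description.

```python
import math

def discretize(length, interval):
    """Points of interest L_m = (m - 1) * interval, m = 1..N, with N = INT[L / interval] + 1."""
    n_points = int(length // interval) + 1
    return [(m - 1) * interval for m in range(1, n_points + 1)]

def unknown_points(points, scheme):
    """Points of interest without a borehole, i.e. the vector L_{N-n}."""
    occupied = set(scheme)
    return [p for p in points if p not in occupied]

L_N = discretize(1023.0, 5.0)          # example of Section 6.1: 205 points, 0 to 1020 m
scheme = L_N[::8][:25]                 # an arbitrary 25-borehole scheme, for illustration only
L_unknown = unknown_points(L_N, scheme)
n_candidates = math.comb(len(L_N), len(scheme))   # C(N, n)
print(len(L_N), len(L_unknown))        # prints: 205 180
```

Because C(205, 25) exceeds 10^31, exhaustive enumeration of the candidate space is impractical, which motivates the stochastic search described in Section 5.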
In the context of engineering practice, the primary focus is determining the risk state and associated uncertainty of the unknown point, yj. Identification of the risk state at an unknown location and the effective reduction of uncertainty related to the identified gas pressure risk state are pivotal considerations in the site investigations from an engineering perspective. These objectives can be accomplished by maximizing the expected state-identification probability. The specifics regarding the quantification of the expected state-identification probability will be covered in the subsequent Section 4.
4. Definition of Expected State-Identification Probability
4.1. Simulated Data with Prior Knowledge of Gas Pressure
To assess the expected state-identification probability, $p_a(y_j)$, at the point yj, simulated data are employed. These data are generated from prior knowledge of the random field parameters of gas pressure: the mean value μ, the standard deviation σ, and the scale of fluctuation λ. Because real gas pressure data (i.e., Zbr) are unavailable at the scheme design stage, simulated data are essential. For instance, when μ, σ, and λ vary within their respective typical ranges [μmin, μmax], [σmin, σmax], and [λmin, λmax], these parameters can be treated as uniform random variables defined by those ranges. Prior knowledge of random field parameters can be derived from historical data in global databases as well as from data specific to the site under consideration. Where no such prior knowledge exists, the ranges of the random field parameters can be set from typical values reported in the literature. This provides relatively uninformative prior knowledge and allows parameter uncertainty to be incorporated into the analysis. Random samples of μ, σ, and λ, denoted as μs,i, σs,i, and λs,i (i = 1, 2, 3, …, Ne), are then generated, giving Ne sets of random samples. For each set of μs,i, σs,i, and λs,i, the simulated data at the discretization points Lm (m = 1, 2, 3, …, N) are expressed as Zs,i(LN) = [Zs,i(L1), …, Zs,i(Lm), …, Zs,i(LN)] (i = 1, 2, 3, …, Ne). In this study, Zs,i(LN) is simulated using the Karhunen-Loeve (K-L) expansion [16] [17] as follows:
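$$Z_{s,i}(L_m) = \mu_{s,i} + \sigma_{s,i}\sum_{j=1}^{\infty}\sqrt{v_j}\,f_j(L_m)\,\zeta_j(\theta) \qquad (5)$$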
where Zs,i(Lm) (i = 1, 2, 3, …, Ne) is the gas pressure simulated using the samples μs,i, σs,i, and λs,i; Lm are the discretization points determined by the length L of the site investigation field and the interval ΔL; ζj(θ) are independent standard normal random variables; and vj and fj(x) are the eigenvalues and eigenfunctions of the correlation function, which is taken as the squared exponential correlation function in this study:
$$\rho(\tau) = \exp\left[-\pi\left(\frac{\tau}{\lambda}\right)^2\right] \qquad (6)$$
where τ is the separation distance between two locations in the horizontal direction and ρ(τ) is the autocorrelation coefficient between the gas pressures at the two locations. For conciseness, details of random field simulation based on the K-L expansion are not provided here; interested readers may refer to [16] [17].
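A minimal NumPy sketch of the simulation step is given below. It assumes the squared exponential form ρ(τ) = exp[−π(τ/λ)²] and replaces the continuous eigenpairs vj, fj(x) with those of the correlation matrix at the discretization points (a discrete K-L expansion). The truncation rule (99.9% of the variance), the function names, and the reduced Ne are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def sq_exp_corr(tau, lam):
    """Squared exponential correlation, assumed form rho(tau) = exp[-pi * (tau / lam)^2]."""
    return np.exp(-np.pi * (np.asarray(tau, dtype=float) / lam) ** 2)

def simulate_kl(points, mu, sigma, lam, rng, energy=0.999):
    """One realization of a stationary Gaussian field by a discrete K-L expansion.

    The eigenpairs of the correlation matrix at the discretization points stand in
    for v_j and f_j(x); terms are kept until `energy` of the total variance is captured.
    """
    x = np.asarray(points, dtype=float)
    corr = sq_exp_corr(np.abs(x[:, None] - x[None, :]), lam)
    vals, vecs = np.linalg.eigh(corr)                  # eigenvalues in ascending order
    vals = np.clip(vals[::-1], 0.0, None)              # descending; round-off negatives set to 0
    vecs = vecs[:, ::-1]
    frac = np.cumsum(vals) / vals.sum()
    n_terms = min(int(np.searchsorted(frac, energy)) + 1, len(vals))
    zeta = rng.standard_normal(n_terms)                # independent N(0, 1) variables
    return mu + sigma * (vecs[:, :n_terms] @ (np.sqrt(vals[:n_terms]) * zeta))

# Prior ranges of Section 6.1 (kPa, kPa, m), sampled as uniform random variables
rng = np.random.default_rng(0)
points = np.arange(0.0, 1021.0, 5.0)                   # L_m = 5(m - 1), m = 1..205
n_e = 50                                               # the example uses Ne = 500
mu_s = rng.uniform(0.0, 300.0, n_e)
sigma_s = rng.uniform(0.0, 125.0, n_e)
lam_s = rng.uniform(0.0, 100.0, n_e)
Z_s = np.array([simulate_kl(points, m, s, l, rng)
                for m, s, l in zip(mu_s, sigma_s, lam_s)])   # shape (n_e, 205)
```

Negative eigenvalues caused by round-off are clipped to zero; this matters for large λ, where the correlation matrix at 5 m spacing is nearly singular.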
4.2. Prediction of the Gas Pressure Values with Gaussian Process
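The simulated gas pressures at the borehole locations (i.e., x1, x2, …, xk, …, xn) of scheme Sn are denoted as the vector Zbr,i(Sn) = [Zbr,i(x1), …, Zbr,i(xk), …, Zbr,i(xn)]. Using Zbr,i(Sn), a Gaussian Process (GP) is applied to predict the gas pressures at the unknown locations [y1, y2, …, yj, …, yN-n], denoted as Zc,i(LN-n) = [Zc,i(y1), …, Zc,i(yj), …, Zc,i(yN-n)]. Zc,i(LN-n) comprises random variables with a joint Gaussian distribution, expressed as

$$\mathbf{Z}_{c,i}(\mathbf{L}_{N-n}) \sim \mathcal{N}\left(\boldsymbol{\mu}_{c,i},\, \boldsymbol{\Sigma}_{c,i}\right)$$ [18],

where the mean vector and covariance matrix are

$$\boldsymbol{\mu}_{c,i} = \left[\mu_{c,i}(y_1), \ldots, \mu_{c,i}(y_j), \ldots, \mu_{c,i}(y_{N-n})\right]^{\mathrm{T}} \quad \text{and} \quad \boldsymbol{\Sigma}_{c,i} = \left[\Sigma_{c,i}(y_j, y_k)\right]_{(N-n)\times(N-n)},$$

respectively. Here, $\mu_{c,i}(y_j)$ (j = 1, 2, 3, …, N-n) is the expectation of the gas pressure Zc,i(yj) at location yj, and $\Sigma_{c,i}(y_j, y_k)$ (j = 1, 2, 3, …, N-n; k = 1, 2, 3, …, N-n) is the covariance between Zc,i(yj) and Zc,i(yk).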
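The conditional mean vector and covariance matrix can be computed with the standard noise-free GP (simple kriging) equations, μc = μ + K_ub K_bb⁻¹(Zbr − μ) and Σc = K_uu − K_ub K_bb⁻¹ K_bu, where b and u index borehole and unknown locations. The sketch below assumes that each simulated data set is predicted with a known constant mean and the squared exponential covariance of Equation (6), for example with the parameters (μs,i, σs,i, λs,i) that generated it; the small nugget `jitter` is an added numerical safeguard, and all names are hypothetical.

```python
import numpy as np

def gp_predict(x_b, z_b, x_u, mu, sigma, lam, jitter=1e-8):
    """Noise-free GP (simple-kriging) prediction at unknown points x_u.

    Assumes a known constant prior mean `mu` and covariance
    sigma^2 * exp[-pi * (tau / lam)^2]; returns the conditional mean vector
    and covariance matrix of the gas pressure at x_u given data z_b at x_b.
    """
    x_b = np.asarray(x_b, dtype=float)
    x_u = np.asarray(x_u, dtype=float)

    def cov(a, b):
        return sigma**2 * np.exp(-np.pi * ((a[:, None] - b[None, :]) / lam) ** 2)

    k_bb = cov(x_b, x_b) + jitter * sigma**2 * np.eye(len(x_b))  # small nugget for stability
    k_ub = cov(x_u, x_b)
    weights = np.linalg.solve(k_bb, k_ub.T).T            # K_ub K_bb^{-1}
    mean = mu + weights @ (np.asarray(z_b, dtype=float) - mu)
    covariance = cov(x_u, x_u) - weights @ k_ub.T
    return mean, covariance

# Illustration with the fixed literature values mu = 278 kPa, sigma = 97 kPa, lambda = 50 m
rng = np.random.default_rng(1)
x_b = np.arange(0.0, 1021.0, 40.0)                     # hypothetical borehole locations
z_b = 278.0 + 97.0 * rng.standard_normal(len(x_b))     # placeholder data, not measurements
x_u = np.array([20.0, 500.0, 1010.0])                  # hypothetical unknown points
mean_u, cov_u = gp_predict(x_b, z_b, x_u, 278.0, 97.0, 50.0)
sd_u = np.sqrt(np.clip(np.diag(cov_u), 0.0, None))     # sigma_c(y_j) from the diagonal
```

Predicting at a borehole location returns the measured value with (near) zero variance, which is the interpolating behaviour expected of a noise-free GP.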
4.3. Calculation of Expected State-Identification Probability with Simulated Data
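Since the multi-dimensional variable Zc,i(LN-n) follows a joint Gaussian distribution with expectation $\boldsymbol{\mu}_{c,i}$ and covariance $\boldsymbol{\Sigma}_{c,i}$ [18], the marginal distribution of each Zc,i(yj) (j = 1, 2, 3, …, N-n) is also Gaussian. With R denoting the threshold gas pressure that separates the safe event Es from the dangerous event Ed, the probabilities of Es and Ed are obtained from Equations (7) and (8), respectively:

$$p_s^i(y_j) = P\left[Z_{c,i}(y_j) < R \mid \mathbf{Z}_{br,i}(S_n)\right] = \Phi\left(\frac{R - \mu_{c,i}(y_j)}{\sigma_{c,i}(y_j)}\right) \qquad (7)$$

$$p_d^i(y_j) = P\left[Z_{c,i}(y_j) \ge R \mid \mathbf{Z}_{br,i}(S_n)\right] = 1 - \Phi\left(\frac{R - \mu_{c,i}(y_j)}{\sigma_{c,i}(y_j)}\right) \qquad (8)$$

where $p_s^i(y_j)$ and $p_d^i(y_j)$ are the probabilities of Es and Ed, respectively, given the data Zbr,i(Sn), and Φ(·) is the standard normal cumulative distribution function. Zc,i(yj) is the gas pressure at yj, a Gaussian random variable with expectation $\mu_{c,i}(y_j)$ and standard deviation $\sigma_{c,i}(y_j)$. Note that the variances $\sigma_{c,i}^2(y_j)$ are the diagonal elements of $\boldsymbol{\Sigma}_{c,i}$.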
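Equations (7) and (8) reduce to evaluating the standard normal CDF. A standard-library Python sketch follows; `state_probabilities` is a hypothetical helper, and counting the negative pressures that a Gaussian model permits as part of Es is an assumption of this illustration.

```python
import math

def state_probabilities(mean, sd, threshold=100.0):
    """Sketch of Eqs. (7)-(8): P(E_s) = P(Z < R), P(E_d) = P(Z >= R) for Z ~ N(mean, sd^2).

    `threshold` plays the role of R (100 kPa in Section 6.1).
    """
    if sd <= 0.0:                                        # degenerate case: value known exactly
        p_safe = 1.0 if mean < threshold else 0.0
    else:
        z = (threshold - mean) / sd
        p_safe = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # standard normal CDF
    return p_safe, 1.0 - p_safe

p_s, p_d = state_probabilities(50.0, 25.0)              # R lies two standard deviations above the mean
```

For a predicted mean of 50 kPa and standard deviation of 25 kPa, P(Es) = Φ(2) ≈ 0.977 and P(Ed) ≈ 0.023.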
To account for the uncertainty in the gas pressure distribution, Monte Carlo simulation is employed to repeat the GP prediction for each of the Ne simulated data sets Zbr,i(Sn) = [Zbr,i(x1), …, Zbr,i(xk), …, Zbr,i(xn)] (i = 1, 2, 3, …, Ne). This yields Ne sets of expected values of the predicted gas pressure, denoted as $\boldsymbol{\mu}_{c,i}$ (i = 1, 2, 3, …, Ne). For each set of simulated data Zbr,i(Sn) (i = 1, 2, …, Ne), the probabilities of Es and Ed are computed as $p_s^i(y_j)$ and $p_d^i(y_j)$ using Equations (7) and (8). The mean values of $p_s^i(y_j)$ and $p_d^i(y_j)$ over the Ne sets of simulated data Zbr,i (i = 1, 2, …, Ne) are then determined with Equations (9) and (10):
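$$p_s^e(y_j) = \frac{1}{N_e}\sum_{i=1}^{N_e} p_s^i(y_j) \qquad (9)$$

$$p_d^e(y_j) = \frac{1}{N_e}\sum_{i=1}^{N_e} p_d^i(y_j) \qquad (10)$$

where $p_s^e(y_j)$ and $p_d^e(y_j)$ are the mean values of $p_s^i(y_j)$ and $p_d^i(y_j)$ over the Ne sets of simulated data Zbr,i(Sn) (i = 1, 2, …, Ne), and Ne is the total number of simulated data sets.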
Substituting Equations (9) and (10) into Equation (4), $p_a(y_j)$ can be expressed as Equation (11):

$$p_a(y_j) = \max\left\{p_s^e(y_j),\; p_d^e(y_j)\right\} \qquad (11)$$
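Equations (9) to (11) amount to averaging over the Ne data sets and keeping the larger of the two averaged probabilities; the location ymin used in Section 5 is then the unknown point with the smallest pa. In this NumPy sketch, the max form of Equation (11) is an assumption inferred from how pa is used in Section 6, and the input probabilities are placeholders.

```python
import numpy as np

def expected_state_identification(p_safe_sets):
    """Sketch of Eqs. (9)-(11) for all unknown points at once.

    p_safe_sets: array of shape (Ne, N - n) holding p_s^i(y_j).
    Returns p_s^e, p_d^e and p_a = max(p_s^e, p_d^e) for every unknown point.
    """
    p_se = np.mean(np.asarray(p_safe_sets, dtype=float), axis=0)   # Eq. (9)
    p_de = 1.0 - p_se                    # Eq. (10), since p_d^i = 1 - p_s^i for every set
    return p_se, p_de, np.maximum(p_se, p_de)                       # Eq. (11)

# Placeholder probabilities: Ne = 2 data sets, 3 unknown points
p_s = np.array([[0.95, 0.40, 0.02],
                [0.99, 0.70, 0.04]])
p_se, p_de, p_a = expected_state_identification(p_s)
j_min = int(np.argmin(p_a))              # index of y_min, the weakest unknown point
```

Here the middle point has pa = 0.55, so it is y_min and drives the objective of the optimization in Section 5.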
The next section uses SSO to identify the optimal scheme $S_n^*$ in the space of candidate site investigation schemes.
5. Optimization of Borehole Location with SSO
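As discussed in Section 3 ("Space of Candidate Site Investigation Schemes"), a total of $C_N^n$ candidate schemes are generated by selecting n discretization points from Ωo. Identifying the scheme $S_n^*$ with the highest value of $p_a(y_{\min})$, where ymin is the unknown location at which $p_a(y_j)$ is smallest, can be expressed as the optimization problem in Equation (12):

$$S_n^* = \arg\max_{S_n}\; p_a(y_{\min} \mid S_n), \qquad p_a(y_{\min} \mid S_n) = \min_{j = 1, \ldots, N-n} p_a(y_j \mid S_n) \qquad (12)$$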
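For a very small candidate space, Equation (12) can be solved by exhaustive enumeration, which makes the max-min structure of the objective explicit. The Python sketch below does this; `toy_p_a` is a hypothetical stand-in for pa(yj) (it simply decays from 1 toward 0.5 with distance to the nearest borehole) and is not the GP-based quantity of Section 4.

```python
import itertools
import math

def maxmin_search(points, n, p_a_of_scheme):
    """Exhaustive version of Eq. (12): maximize min_j p_a(y_j) over all C(N, n) schemes."""
    best_scheme, best_value = None, -math.inf
    for scheme in itertools.combinations(points, n):
        value = min(p_a_of_scheme(scheme))          # p_a at y_min for this scheme
        if value > best_value:
            best_scheme, best_value = scheme, value
    return best_scheme, best_value

def toy_p_a(scheme, points=tuple(range(0, 55, 5)), lam=20.0):
    """HYPOTHETICAL stand-in for p_a(y_j) at every unknown point of a scheme."""
    unknown = [y for y in points if y not in scheme]
    return [0.5 + 0.5 * math.exp(-math.pi * (min(abs(y - x) for x in scheme) / lam) ** 2)
            for y in unknown]

points = tuple(range(0, 55, 5))                     # 11 toy points, C(11, 3) = 165 schemes
best, value = maxmin_search(points, 3, toy_p_a)
```

With 11 points and 3 boreholes only 165 schemes exist; the same loop is infeasible for the example in Section 6.1, which is why a stochastic search is needed.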
As Equation (12) shows, the borehole locations are optimized with the expected state-identification probability at the ymin location, $p_a(y_{\min})$, as the objective function. Solving Equation (12) for the scheme $S_n^*$ and its corresponding $p_a(y_{\min})$ can be challenging because the number of candidate schemes, $C_N^n$, is potentially very large. In this study, SSO, a well-established global optimization algorithm, is employed to solve Equation (12). Within the SSO framework, the optimal scheme $S_n^*$, characterized by the maximum $p_a(y_{\min})$, is identified by exploring the design space of candidate schemes in a stochastic manner. Theoretically, $S_n^*$ can be found among the candidate schemes by solving the following reliability analysis problem [13]:

$$P(F) = P\left[p_a(y_{\min} \mid S_n) \ge p_a(y_{\min} \mid S_n^*)\right] \qquad (13)$$

where $F = \{p_a(y_{\min} \mid S_n) \ge p_a(y_{\min} \mid S_n^*)\}$ is an auxiliary failure event and P(F) is the probability that F occurs when Sn is drawn at random from the candidate schemes. Since F is satisfied only by the optimal scheme(s), it is a rare event whose probability is close to zero, and finding $S_n^*$ is equivalent to generating samples of this rare event.
SSO generates conditional samples of a series of nested intermediate failure events $F_1 \supset F_2 \supset \cdots \supset F_{N_s} = F$, with which P(F) is expressed as Equation (14):

$$P(F) = P(F_1)\prod_{m=2}^{N_s} P(F_m \mid F_{m-1}) \qquad (14)$$

where $F_m = \{p_a(y_{\min} \mid S_n) \ge b_m\}$, m = 1, 2, 3, …, Ns, and $F_{N_s}$ is equal to F. The values $b_1 < b_2 < \cdots < b_{N_s}$ are an increasing sequence of Ns intermediate thresholds, determined adaptively from the simulated samples so that the sample estimates of P(F1) and P(Fm|Fm−1) always equal a prescribed conditional probability p0 (e.g., 0.1). For a given number of boreholes, each random sample of feasible locations constitutes a random candidate scheme. SSO begins with direct Monte Carlo simulation to generate NL random schemes. The expected state-identification probabilities $p_a(y_{\min})$ of these schemes are then calculated and ranked, and the p0NL schemes with the largest values are taken as seed schemes. These seed schemes define the first threshold b1, and thus F1, and another NL − p0NL schemes satisfying F1 are simulated from the seeds using Markov Chain Monte Carlo simulation (MCMCS). The same procedure is repeated level by level for m = 2, 3, …, Ns. Details of the SSO implementation, including the choice of parameters such as p0 and Ns, are given in [15].
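The level-by-level procedure can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the exact algorithm of [15]: the Markov chain move (swap one borehole for a randomly chosen unused point and accept only if the scheme stays in Fm), the use of the smallest seed value as the threshold bm, and the toy objective `toy_objective` are all hypothetical.

```python
import math
import random

def sso_maximize(points, n, objective, n_l=200, p0=0.1, n_levels=10, seed=0):
    """Simplified Subset Simulation optimization over n-borehole schemes (sketch)."""
    rng = random.Random(seed)
    # Level 0: direct Monte Carlo simulation of n_l random schemes
    samples = [tuple(sorted(rng.sample(points, n))) for _ in range(n_l)]
    values = [objective(s) for s in samples]
    n_seeds = int(p0 * n_l)
    chain_len = n_l // n_seeds
    thresholds = []
    for _ in range(n_levels):
        # Seeds: the p0 * n_l schemes with the largest objective; the weakest seed sets b_m
        seeds = sorted(range(len(values)), key=lambda k: values[k], reverse=True)[:n_seeds]
        b_m = values[seeds[-1]]
        thresholds.append(b_m)
        new_samples, new_values = [], []
        for k in seeds:
            current, current_val = samples[k], values[k]
            for _ in range(chain_len):
                # Symmetric proposal: replace one borehole with a random unused point
                free = [p for p in points if p not in current]
                proposal = list(current)
                proposal[rng.randrange(n)] = rng.choice(free)
                proposal = tuple(sorted(proposal))
                proposal_val = objective(proposal)
                if proposal_val >= b_m:             # accept only if still inside F_m
                    current, current_val = proposal, proposal_val
                new_samples.append(current)
                new_values.append(current_val)
        samples, values = new_samples, new_values
    best = max(range(len(values)), key=lambda k: values[k])
    return samples[best], values[best], thresholds

def toy_objective(scheme, points=tuple(range(0, 205, 5)), lam=20.0):
    """HYPOTHETICAL stand-in for p_a(y_min | S_n), used only to exercise the search."""
    return min(0.5 + 0.5 * math.exp(-math.pi * (min(abs(y - x) for x in scheme) / lam) ** 2)
               for y in points if y not in scheme)

points = tuple(range(0, 205, 5))                    # 41 toy points, C(41, 5) = 749,398 schemes
best, best_val, b = sso_maximize(points, 5, toy_objective)
```

Because the swap proposal is symmetric and the target is uniform over the schemes inside Fm, accepting exactly those proposals that remain in Fm is a valid Metropolis step, and the thresholds b1, b2, … can only increase from level to level.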
For each number of boreholes n in scheme Sn, the SSO approach described above is used to identify the corresponding $S_n^*$. If $p_a(y_{\min})$ associated with $S_n^*$ exceeds a predefined threshold probability p*, then $S_n^*$ qualifies as an optimal scheme; among the numbers of boreholes that satisfy this condition, the smallest n requires the least investigation effort. The specific procedure for determining $S_n^*$ using the proposed method is illustrated in the next section.
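The final selection rule can be written as a small Python function. The function name is hypothetical, and the dictionary of optimizer results in the usage line is a toy input, not output of this study.

```python
def select_optimal_scheme(results, p_star=0.9):
    """Return (n, scheme, value) for the smallest n whose optimized p_a(y_min) exceeds p_star.

    results: dict mapping a borehole number n to (scheme, p_a_at_y_min) from the optimizer.
    Returns None when no examined n reaches the threshold.
    """
    for n in sorted(results):
        scheme, value = results[n]
        if value > p_star:
            return n, scheme, value
    return None

# Toy optimizer output (placeholder schemes and values)
results = {3: ("toy-3", 0.60), 4: ("toy-4", 0.80), 5: ("toy-5", 0.93)}
choice = select_optimal_scheme(results)
```

Scanning n in increasing order encodes the preference for the least investigation effort among schemes that meet the threshold.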
6. Illustrative Example
6.1. Candidate Investigation Schemes
To illustrate the application of the proposed approach in this study, an example of site investigation for gassy soils from the literature is adopted [6], focusing solely on the horizontal spatial variability of gas pressure. The cross-sectional length, L, in this example is 1023 m, discretized at 5 m intervals, resulting in a total of 205 discretization points, Lm (m = 1, 2, 3, …, 205), where Lm = 5(m − 1) (m = 1, 2, 3, …, 205). For varying investigation schemes, the number, n, of boreholes ranges from 10 to 30 at intervals of 5, i.e., 10, 15, 20, 25, 30.
Consider the case of n = 25 and the corresponding scheme S25 = [x1, x2, …, xk, …, x25]. The space of candidate schemes contains $C_{205}^{25}$ instances of S25, each obtained by selecting 25 of the 205 discretization points (i.e., Lm, m = 1, 2, 3, …, 205). The set of unknown points is denoted as L180 = [y1, y2, …, yj, …, y180]. Each yj (j = 1, 2, 3, …, 180) must belong to the feasible value set of Lm without being equal to any of x1, x2, …, xk, …, x25.
The scheme $S_{25}^*$, which maximizes $p_a(y_{\min})$, can be determined from the space of candidate schemes using simulated data generated with prior knowledge of gas pressure. The prior knowledge given in the literature assumes μ = 0.278 MPa, σ = 0.097 MPa, and λ = 50 m. However, since such precise knowledge is unavailable during the site investigation stage, the mean value μ, standard deviation σ, and scale of fluctuation λ are treated in this study as uniform random variables within their typical ranges, specifically μ ∈ (0 kPa, 300 kPa], σ ∈ (0 kPa, 125 kPa], and λ ∈ (0 m, 100 m]. These ranges cover the values assumed in the literature [6] (i.e., μ = 278 kPa, σ = 97 kPa, and λ = 50 m).
The impact of gassy soils depends on the specific value of gas pressure. Generally, gassy soils are deemed hazardous if the gas pressure is greater than or equal to 100 kPa; otherwise, the associated risk is considered negligible. Consequently, the threshold R is set at 100 kPa in this study, and Es (safe event) and Ed (dangerous event) are defined as the event that the gas pressure is less than 100 kPa and its complementary event, respectively, as shown in Table 3.
Table 3. Definition of ignorable and risky events.
| Events | Gas pressure (kPa) | Notes |
|---|---|---|
| Es | (0, 100) | Safe event |
| Ed | [100, +∞) | Dangerous event |
As outlined in Section 4 ("Definition of Expected State-Identification Probability"), the proposed approach uses the value of $p_a(y_j)$ to determine whether gas pressure risk is present at a given location. If $p_a(y_j)$ is sufficiently large, the identified state (safe or dangerous) is very likely correct, and the uncertainty associated with it can be disregarded. Based on the verbal probability descriptors in Table 4, the threshold value p* is set at 0.9 ("very likely") in this example. When $p_a(y_j)$ reaches p* = 0.9, Es or Ed is very likely to occur, depending on whether $p_a(y_j) = p_s^e(y_j)$ or $p_a(y_j) = p_d^e(y_j)$, respectively. The optimal scheme chosen by the proposed approach must therefore ensure that the $p_a(y_j)$ values at all unknown locations are greater than 0.9, regardless of whether $p_a(y_j) = p_s^e(y_j)$ or $p_a(y_j) = p_d^e(y_j)$.
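The decision rule for a single unknown point can be expressed as a short Python function. The function name, the label strings, and the use of ≥ p* (rather than > p*) are assumptions for illustration.

```python
def identify_state(p_se, p_de, p_star=0.9):
    """Classify one unknown point from its expected probabilities p_s^e and p_d^e.

    Returns "safe" or "dangerous" when p_a = max(p_se, p_de) reaches p_star,
    otherwise "uncertain" (the state cannot be identified with enough confidence).
    """
    p_a = max(p_se, p_de)
    if p_a < p_star:
        return "uncertain"
    return "safe" if p_se >= p_de else "dangerous"

labels = [identify_state(0.96, 0.04), identify_state(0.07, 0.93), identify_state(0.60, 0.40)]
```

An optimal scheme in the sense of this section is one for which no unknown point ends up in the "uncertain" class.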
Table 4. Verbal descriptors and their probability equivalents [19].
| Verbal descriptor | Probability equivalent |
|---|---|
| Virtually impossible | 0.01 |
| Very unlikely | 0.10 |
| Equally likely | 0.50 |
| Very likely | 0.90 |
| Virtually certain | 0.99 |
6.2. Expected State-Identification Probability Given Different Investigation Schemes
Consider, for instance, S25 = [x1, x2, …, xk, …, x25] with n = 25 boreholes. To determine the optimal scheme $S_{25}^*$ from the $C_{205}^{25}$ candidate schemes, $p_a(y_{\min})$ must be calculated for each scheme examined. First, Ne = 500 sets of the random field parameters μ, σ, and λ were generated from the prior knowledge (i.e., uniform distributions within the typical ranges of μ, σ, and λ). Using each set of μ, σ, and λ samples, Zs,i is simulated with Equations (5) and (6), and the gas pressures at the borehole locations of S25 constitute a set of simulated data denoted as Zbr,i(S25). For each Zbr,i(S25), the mean $\boldsymbol{\mu}_{c,i}$ and covariance $\boldsymbol{\Sigma}_{c,i}$ of the predicted gas pressure at the unknown points L180 are obtained with the GP. The resulting $p_s^e(y_j)$ and $p_d^e(y_j)$ over the Ne sets of simulated data Zbr,i(S25) are then used to calculate $p_a(y_j)$ at each location in L180.
As discussed in Section 5 ("Optimization of Borehole Location with SSO"), SSO is employed to locate the optimal scheme $S_{25}^*$ that maximizes $p_a(y_{\min})$ in the space of candidate schemes, with p0 and Ns set to 0.1 and 30, respectively, and 1000 samples simulated at each level. Figure 5 shows the intermediate threshold value of $p_a(y_{\min})$ at different simulation levels m. As m increases, the threshold increases and reaches 0.910 at m = 4. In this example, SSO is executed up to the 20th level to ensure convergence of $p_a(y_{\min})$, and the value reached at m = 4 is taken as the estimate of the maximal $p_a(y_{\min})$.
Figure 5. Evolution of the intermediate threshold value of the state-identification probability during SSO of S25.
For different numbers of boreholes n, Figure 6 displays the maximal $p_a(y_{\min})$ values optimized using SSO for n = 10, 15, 20, 25, and 30. The results indicate that once n increases to 20 (i.e., S20), the maximal $p_a(y_{\min})$ exceeds 0.9. This means that the $p_a(y_j)$ values at all locations in L185 (i.e., y1, y2, …, y185) are greater than 0.9. For schemes with more boreholes (e.g., S25 and S30), the proposed approach also identifies optimal schemes, such as $S_{25}^*$ and $S_{30}^*$, for which all $p_a(y_j)$ values exceed 0.9, as shown in Figure 7. However, the optimal schemes $S_{25}^*$ and $S_{30}^*$ require more investigation effort than S20 because of their larger number of boreholes. Therefore, considering the investigation effort, n = 20 is determined to be the optimal number of boreholes, and the corresponding scheme $S_{20}^*$ is selected as the optimal scheme.
Figure 6. Evolution of the intermediate threshold value of the expected state-identification probability during SSO for different numbers of boreholes.
Figure 7. Expected state-identification of the optimal experimental schemes with different number of boreholes.
The horizontal coordinates of the optimal scheme $S_{20}^*$ are shown in Figure 8 as blue-filled circles, together with the expected state-identification probability at all unknown locations (i.e., LN-n) along the horizontal direction. Some locations in L185 correspond to $p_a(y_j) = p_s^e(y_j)$ (green triangles, indicating safe gas pressures), while others correspond to $p_a(y_j) = p_d^e(y_j)$ (red squares, indicating dangerous gas pressures). For locations where $p_a(y_j) = p_s^e(y_j)$ and $p_a(y_j) \ge 0.9$, the gas pressure is very likely not risky. Conversely, for locations where $p_a(y_j) = p_d^e(y_j)$ and $p_a(y_j) \ge 0.9$, the gas pressure is very likely risky.
Figure 8. Expected state-identification probability of locations LN-n along the horizontal direction corresponding to the optimal scheme $S_{20}^*$.
6.3. Comparison with Bayesian Compressive Sampling
To assess the effectiveness of the optimal scheme determined by the proposed method, it is compared with Guan et al.'s approach for planning site investigation schemes. Guan et al.'s approach uses Bayesian compressive sampling (BCS) and information entropy to automatically determine the sample size and optimal sampling locations for predicting the gas pressure distribution, given specific values of the random field parameters (i.e., μ = 278 kPa, σ = 97 kPa, and λ = 50 m). As presented in Section 6.2 ("Expected State-Identification Probability Given Different Investigation Schemes"), the optimal scheme determined by the proposed method is $S_{20}^*$, with an optimal number of 20 boreholes. Using the mean value and standard deviation of gas pressure predicted from the simulated data for $S_{20}^*$, the coefficient of variation (COV) at each unknown location is obtained, as shown in Figure 9.
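The COV profile is the ratio of the predicted standard deviation to the predicted mean at each unknown point. The minimal NumPy sketch below uses placeholder values and a hypothetical function name; how the mean and standard deviation are pooled over the Ne simulated data sets is not specified in this section and is left to the caller.

```python
import numpy as np

def cov_profile(mean, sd):
    """COV = sd / mean at each unknown location, and its maximum (assumes mean > 0)."""
    cov_values = np.asarray(sd, dtype=float) / np.asarray(mean, dtype=float)
    return cov_values, float(cov_values.max())

# Placeholder mean and standard deviation (kPa) at three hypothetical unknown points
cov_values, cov_max = cov_profile([250.0, 180.0, 300.0], [50.0, 80.0, 30.0])
```

The maximum of this profile is the quantity compared between the two approaches (44.37% versus 42.69% in the example).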
Figure 9. COV of locations LN-n along the horizontal direction corresponding to the optimal scheme $S_{20}^*$.
In Figure 9, it can be seen that the maximum COV among all unknown locations (i.e., LN-n) determined by the proposed method in this study is 44.37%. This value is close to the maximum COV (42.69%) obtained by Guan et al.’s approach when the number of boreholes is 20. It is important to note that, in this study, the mean values μ, standard deviations σ, and scale of fluctuation λ are defined as uniform random variables within their respective typical ranges (i.e., μ ∈ (0 kPa, 300 kPa], σ ∈ (0 kPa, 125 kPa], and λ ∈ (0 m, 100 m]) rather than specific values (i.e., μ = 278 kPa, σ = 97 kPa, and λ = 50 m) as in Guan et al.’s approach. This choice introduces larger uncertainty and relatively less informative prior knowledge on random field parameters. Therefore, the result that the maximum COV (44.37%) determined by the proposed method, given the same number of boreholes (n = 20), is relatively larger than that of Guan et al.’s approach is reasonable. This finding substantiates the effectiveness of the method proposed in this study.
6.4. Effect of the Range of Prior Knowledge
With the proposed approach, the optimal scheme depends on the prior knowledge of the random field parameters of gas pressure. In the previous discussion, the prior knowledge was defined as μ ∈ (0 kPa, 300 kPa], σ ∈ (0 kPa, 125 kPa], and λ ∈ (0 m, 100 m], referred to as Priori I in this research. To examine the impact of different prior knowledge, this subsection uses a new set of ranges (i.e., μ ∈ (0 kPa, 150 kPa], σ ∈ (0 kPa, 62.5 kPa], and λ ∈ (0 m, 50 m]), denoted as Priori II, to determine the optimal scheme with the proposed method.
Figure 10 shows the $p_a(y_{\min})$ values of the optimal schemes for different numbers (n) of measuring points, determined using Priori I and Priori II; the results are plotted as lines with squares and circles, respectively. For a given number of measuring points, $p_a(y_{\min})$ under Priori II is higher than under Priori I. This indicates that, with the same number of measuring points, the identified gas pressure state is less uncertain under Priori II than under Priori I, because the narrower ranges of Priori II are more informative and thus lead to a larger reduction in uncertainty for the same amount of measurement data. Although the proposed approach was developed using the squared exponential correlation function, it can be adapted to other correlation functions. During scheme optimization, the correlation function can be chosen based on existing knowledge of the gassy soil. If multiple correlation functions are considered, the uncertainty associated with model selection should be integrated into the optimization process, and Bayesian model selection methods can be used to quantify it. Extending the proposed method to address model selection uncertainty will be a focus of future research.
Figure 10. Comparison of the expected state-identification probability of optimal schemes obtained using different prior knowledge for different numbers of boreholes.
7. Summary and Conclusions
The study has devised a probabilistic method for optimizing site investigation, aiming to determine the most effective investigation scheme while considering the statistical uncertainty associated with random field parameters. This approach allows for the accurate identification of the risk state and simultaneous reduction of the corresponding uncertainty. The key findings are summarized as follows:
1) The space of potential site investigation schemes is established through discretization along the horizontal dimension of gassy soil areas. The expected state-identification probability, quantifying the risk and uncertainty of gassy soils, is computed using simulated data based on GP. An optimization process is employed to identify the site investigation scheme with the maximum expected state-identification probability at the minimum value location. This scheme is considered optimal if its probability value surpasses a predetermined threshold.
2) The proposed approach is applied and validated using a site investigation example for gassy soils from the literature. The results show that, for a given number of measuring points (n), the maximal expected state-identification probability at the minimum value location increases with the number of simulation levels and ultimately converges to a maximum. The optimal scheme identified by SSO corresponds to n = 20 measuring points, the smallest of the examined numbers of boreholes whose optimized probability exceeds the specified threshold (0.9).
3) The effectiveness of the proposed method is verified by comparison with an alternative approach for planning site investigation schemes. With the optimal number of measuring points determined by the proposed method (n = 20), the maximum COV of gas pressure among all unknown locations is 44.37%, slightly higher than the 42.69% obtained with the alternative approach, which is reasonable given the less informative prior knowledge adopted here. The advantage of the proposed method lies in treating the random field parameters as uniform random variables within typical ranges, which aligns more closely with actual engineering conditions in which these parameters are not precisely known. This leads to larger uncertainty but a more realistic representation of prior knowledge on the random field parameters. Notably, the method also identifies gas pressure risk states during the site investigation stage, a facet overlooked in previous studies.