Optimal Penalization and Nonlinear Solver Convergence for a DG-Based Richards’ Equation Model of Variably Saturated Flows

Camille Poussel; Mehmet Ersoy; Yolhan Mannes; Aimed Ajroud; Frédéric Golay

doi:10.4236/jamp.2025.1311227

Journal of Applied Mathematics and Physics > Vol.13 No.11, November 2025

Camille Poussel¹, Mehmet Ersoy¹, Yolhan Mannes¹, Aimed Ajroud², Frédéric Golay¹
¹Institut de Mathématiques de Toulon (IMATH), Université de Toulon, La Garde, France.
²École d’Ingénieurs SeaTech, Université de Toulon, La Garde, France.

Abstract

In this article, we study the convergence of an Incomplete Interior Penalty Galerkin (IIPG) Discontinuous Galerkin method for the Richards equation. The Richards equation is a degenerate parabolic nonlinear equation that models flows in porous media with variable saturation. Its numerical solution is known to be difficult to compute because of the abrupt displacement of the wetting front, which results mainly from highly nonlinear hydraulic properties. As time scales are slow, implicit numerical methods are required, and the convergence of nonlinear solvers is very sensitive. We propose an original method to ensure convergence of the numerical solution to the exact Richards solution, based on an auto-calibration of the penalty parameters of the Discontinuous Galerkin method. The method is constructed using general nonlinear elliptic problems in 1D and 2D. We show that the numerical solution converges toward the unique solution of the continuous problem under certain conditions on the penalty parameters. Then, we numerically demonstrate the efficiency and robustness of the method through test cases with analytical solutions, laboratory test cases, and large-scale simulations.

Keywords

Porous Media, Richards Equation, Discontinuous Galerkin, Backward Differentiation Method, Incomplete Interior Penalty Galerkin (IIPG), Broken Sobolev Space, Picard’s Fixed Point, Minimal Regularity Solution

Share and Cite:

Poussel, C., Ersoy, M., Mannes, Y., Ajroud, A. and Golay, F. (2025) Optimal Penalization and Nonlinear Solver Convergence for a DG-Based Richards’ Equation Model of Variably Saturated Flows. Journal of Applied Mathematics and Physics, 13, 4083-4127. doi: 10.4236/jamp.2025.1311227.

1. Introduction

The behavior of flows in variably saturated porous media can be modeled by the Richards’ Equation (RE). One of the key advantages of RE is its ability to represent the porous medium, incorporating both saturated and unsaturated zones. While it doesn’t consider the air phase, RE effectively incorporates the effects of gravity and capillarity, enabling the modeling of complex processes across various scales. Notably, RE is a nonlinear parabolic equation that can transform into an elliptic equation under complete saturation conditions.

The history of RE begins with Darcy’s law, which was formulated experimentally by Darcy in 1856 [1] for saturated porous media. This result was later extended to multiphase flows by Buckingham in 1907 [2], resulting in the Darcy-Buckingham law, which serves as the cornerstone for the derivation of RE. The equation was first established by Richardson in 1922 [3], although it was later attributed solely to Richards, who independently published the equation in 1931 [4]. Initial attempts to numerically solve the RE date back to the late 1960s with the works of Rubin [5] and Cooley [6]. From the 1980s, RE was extensively studied from both theoretical and numerical perspectives.

In this paper, RE is introduced through its expression and constitutive laws. As the main objective of this work is to solve RE using Discontinuous Galerkin (DG) methods, the weak problem associated with RE is given, together with its discretization using the Incomplete Interior Penalty Galerkin (IIPG) formulation. Additionally, an overview of the penalization method is provided. The fully discrete IIPG formulation is derived through time integration using the implicit Backward Differentiation Formula (BDF) method. Due to the nonlinear nature of RE, its fully discretized nonlinear formulation is linearized using Picard’s fixed-point method. Theoretical results for the solution of a stationary nonlinear elliptic problem are established, including existence, uniqueness, and convergence results. Furthermore, an automatic calibration method is obtained for the penalization parameters. The solution of RE using the previously mentioned IIPG formulation is implemented in an in-house numerical code named RIVAGE, which is then validated against numerical benchmarks.

2. Governing Equation

RE is a classical nonlinear parabolic equation used to describe flow in both unsaturated and saturated zones of an aquifer (for a detailed derivation of the equation, please refer to Clément’s 2021 thesis [7]).

The so-called mixed formulation of the RE, commonly used in hydrology, is

$\partial_{t} θ (h - z) - \nabla \cdot (K (h - z) \nabla h) = 0$ (1)

where $h := ψ + z$ is the hydraulic head with $ψ$ the pressure head, $z$ is the elevation, $θ$ is the water content and $K$ is the hydraulic conductivity tensor.

The tensor of hydraulic conductivity $K$ is split, in general, into two parts: the intrinsic or saturated hydraulic conductivity tensor $K_{s}$ and the relative hydraulic conductivity $K_{r}$ :

$K (ψ) = K_{s} K_{r} (ψ) .$ (2)

The intrinsic hydraulic conductivity tensor $K_{s}$ depends on the material of the porous medium.

The relative hydraulic conductivity is a function of the pressure head controlling the behavior of groundwater flow within the porous media and it is defined as

$K_{r} (ψ) = \begin{cases} 1 & \text{if } ψ \geq ψ_{e}, \\ K_{e, law} (ψ) & \text{otherwise,} \end{cases}$ (3)

where $K_{e, law}$ is given by empirical laws, see Table 1 and Figure 1. The quantity $ψ_{e}$, corresponding to the air-entry pressure, is the pressure head transition value between the saturated and unsaturated zones. The saturated zone corresponds to $ψ \geq ψ_{e}$ and the unsaturated zone to $ψ < ψ_{e}$. The water table corresponds to $ψ = ψ_{e}$ by definition.

Table 1. Hydraulic relations for hydraulic conductivity and effective saturation.

Name	Expression	Parameters
Gardner-Irmay relations (1954) [8]	$\begin{array}{l} S_{e} = e^{\frac{α ψ}{m}} \\ K_{r} = e^{α ψ} \end{array}$	$α$: pore-size distribution; $m$: tortuosity
Vachaud’s relations (1971) [9]	$\begin{array}{l} S_{e} = \frac{C}{C + {| ψ |}^{D}} \\ K_{r} = \frac{A}{A + {| ψ |}^{B}} \end{array}$	$A, B$: empirical shape parameters; $C, D$: empirical shape parameters
Van Genuchten-Mualem relations (1980) [10]	$\begin{array}{l} S_{e} = {(1 + {(α | ψ |)}^{n})}^{- m} \\ K_{r} = S_{e}^{l} {(1 - {(1 - S_{e}^{\frac{1}{m}})}^{m})}^{2} \end{array}$	$l = 0.5$: pore connectivity; $α$: linked to the inverse of the air-entry pressure; $n > 1$: pore-size distribution; $m = 1 - \frac{1}{n}$: pore-size distribution

Figure 1. Hydraulic laws for effective saturation and hydraulic conductivity.

The water content law is expressed in terms of the effective saturation $S_{e}$ :

$S_{e} (ψ) = \frac{θ (ψ) - θ_{r}}{θ_{s} - θ_{r}},$ (4)

where $θ_{r}$ is the residual water content and $θ_{s}$ is the saturated water content corresponding to the minimal and maximal saturation, respectively. The effective saturation is defined as follows

$S_{e} (ψ) = \begin{cases} 1 & \text{if } ψ \geq ψ_{e}, \\ S_{e, law} (ψ) & \text{otherwise,} \end{cases}$ (5)

where $S_{e, law}$ is given by empirical laws, see Table 1 and Figure 1.
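As an illustration, the Van Genuchten-Mualem relations of Table 1, combined with Equations (3)-(5), can be evaluated as in the following minimal Python sketch. It assumes NumPy; the function names, the default $ψ_{e} = 0$ and the parameter values in the usage lines are hypothetical and are not taken from the test cases of this paper.

```python
import numpy as np

def effective_saturation(psi, alpha, n, psi_e=0.0):
    """S_e(psi): equal to 1 for psi >= psi_e, Van Genuchten law otherwise (Eq. (5))."""
    psi = np.asarray(psi, dtype=float)
    m = 1.0 - 1.0 / n
    law = (1.0 + (alpha * np.abs(psi)) ** n) ** (-m)
    return np.where(psi >= psi_e, 1.0, law)

def relative_conductivity(psi, alpha, n, l=0.5, psi_e=0.0):
    """K_r(psi): equal to 1 for psi >= psi_e, Mualem law otherwise (Eq. (3))."""
    m = 1.0 - 1.0 / n
    se = effective_saturation(psi, alpha, n, psi_e)
    return se**l * (1.0 - (1.0 - se ** (1.0 / m)) ** m) ** 2

def water_content(psi, theta_r, theta_s, alpha, n, psi_e=0.0):
    """theta(psi) recovered from the effective saturation by inverting Eq. (4)."""
    se = effective_saturation(psi, alpha, n, psi_e)
    return theta_r + (theta_s - theta_r) * se

# Illustrative (hypothetical) soil parameters and pressure heads
psi = np.array([-10.0, -1.0, -0.1, 0.0, 1.0])
theta = water_content(psi, theta_r=0.1, theta_s=0.4, alpha=1.0, n=2.0)
k_rel = relative_conductivity(psi, alpha=1.0, n=2.0)
```

Both functions return 1 in the saturated zone, so the steep variation of $K_{r}$ as $ψ \to ψ_{e}^{-}$ can be inspected directly by refining `psi` near zero.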

Remark. The nonlinear behavior of the constitutive laws $S_{e, law}$ and $K_{r, law}$ (see Table 1 and Figure 1) is responsible for the convergence failures of numerical methods and requires particular attention. In particular:

In the saturated zone, hydraulic properties remain constant and RE becomes an elliptic equation characterized by fast diffusion.
In the unsaturated zone, hydraulic properties approach very close to zero, which halts diffusion and can cause numerical inconvenience.
For a specific set of parameters, when $ψ \to 0^{-}$ , constitutive laws may display extremely steep gradients.

To overcome these difficulties, regularization techniques can be employed, as in [11] for instance; they slightly modify the functions to avoid some types of degeneracy and improve convergence properties. In this paper, we show that, in the framework of DG, modifying the constitutive laws is not necessary whenever some numerical parameters are well chosen.

Equation (1) together with Equation (2) and Equation (4) can be completed with Dirichlet and/or Neumann boundary conditions, as done in this work. More realistic boundary conditions, such as the seepage boundary condition, can also be used for real-life simulations (we refer to [12] for details).

3. Numerical Methods

This section focusses on the presentation of the numerical solution of RE using DG methods. The solution is sought within a trial space due to the similarity of these methods to Finite Element (FE) methods, resulting in a weak problem.

Let $d \in \{1, 2, 3\}$ be the space dimension. The porous medium is represented by the computational domain $Ω \subset ℝ^{d}$ with boundary $\partial Ω = Γ_{D} \cup Γ_{N}$, where the subscripts $D$ and $N$ stand for Dirichlet and Neumann, respectively. Let $T \in ℝ_{+}^{*}$ be the final time.

The problem is:

Find $h (x, t) : Ω \times (0, T) \to ℝ$ such that:

$\begin{cases} \partial_{t} θ (h - z) - \nabla \cdot (K (h - z) \nabla h) = 0, & \text{in } Ω \times (0, T), \\ h = h_{0}, & \text{in } Ω \times \{0\}, \\ h = h_{D}, & \text{on } Γ_{D} \times (0, T), \\ - K (h - z) \nabla h \cdot n = q_{N}, & \text{on } Γ_{N} \times (0, T) \end{cases}$ ()

where $h \in L^{2} (Ω \times (0, T))$ represents the solution of RE. Additionally, $h_{0} \in L^{2} (Ω)$ , $h_{D} \in L^{2} (Γ_{D}; (0, T))$ , and $q_{N} \in L^{2} (Γ_{N}; (0, T))$ correspond to the initial condition, the Dirichlet boundary condition, and the Neumann boundary condition, respectively.

The matrix-valued function $K$ depends monotonically on $h$, is symmetric positive definite, and is uniformly bounded below and above (see Equation (2), Table 1 and Figure 1). Similarly, the function $θ$ depends monotonically on $h$ and is uniformly bounded below and above (see Equation (4), Table 1 and Figure 1). Both $K$ and $θ$ are continuous functions within a given porous medium but may be discontinuous at the interface of heterogeneous materials.

3.1. Settings

The time duration $(0, T)$ is subdivided into $N$ time intervals such that $0 = t^{0} < t^{1} < \dots < t^{N} = T$ . Let $n \in ℕ$ , $0 < n < N$ , if the time interval $T^{n} = [t^{n}, t^{n + 1}]$ is considered, the corresponding time step is $Δ t^{n} = t^{n + 1} - t^{n}$ .

Let us define $ℰ^{n}$ a partition of the computational domain Ω valid for all $t \in T^{n}$ . For the sake of simplicity, it is assumed that Ω is a polygonal domain in two space dimensions so that $ℰ^{n}$ covers Ω exactly. The mesh $ℰ^{n}$ is composed of quadrilateral and triangular elements not necessarily conformal.

For all elements $E \in ℰ^{n}$ , $d_{E}$ is its diameter defined as the ratio between its surface ( $s_{E}$ ) and perimeter ( $p_{E}$ ) and $d^{n} : = \max_{E \in ℰ^{n}} (d_{E})$ .

The set of all open faces of all elements $E \in ℰ^{n}$ is denoted by $ℱ$ . Moreover, one can define two subsets of $ℱ$ , $ℱ^{\partial}$ for the boundary faces and $ℱ^{in}$ for the interior faces:

$ℱ^{\partial} := \underset{F \in \partial Ω}{\cup} F \quad \text{and} \quad ℱ^{in} := ℱ \setminus ℱ^{\partial} .$ (6)

For a given element $E \in ℰ^{n}$, the set of faces $ℱ^{E} := \{F \in ℱ \mid F \in \partial E\}$ defines the boundary of $E$. Then, for all interior faces of $E$, i.e., $\forall F \in ℱ^{E} \cap ℱ^{in}$, there exists a neighboring element $E_{r}$ such that $E \cap E_{r} = F$. Consequently, the unit normal vector $n_{F} := {(n_{x}, n_{y})}^{T}$ pointing from $E$ to $E_{r}$ can be defined. An example of an interior face is given in Figure 2(a). Moreover, for all boundary faces of $E$, i.e., $\forall F \in ℱ^{E} \cap ℱ^{\partial}$, there exists a fictitious element $E_{\partial}$ such that $E \cap E_{\partial} = F$. Consequently, the unit normal vector $n_{F}$, always pointing from $E$ to $E_{\partial}$, can be defined.

Example 1. Figure 2(a) gives a graphical representation of an example mesh composed of triangles and quadrilaterals. In this example, the mesh is composed of 7 elements, i.e., $ℰ^{n} = \{E_{i}, i \in \{1, \dots, 7\}\}$. Thus, the set of faces $ℱ = \{F_{i}, i \in \{1, \dots, 19\}\}$ is defined. It can be split into two subsets. The first one, $ℱ^{\partial} = \{F_{i}, i \in \{1, \dots, 9\}\}$, contains the boundary faces of $ℱ$, depicted with dashed lines in Figure 2. The second one, $ℱ^{in} = \{F_{i}, i \in \{10, \dots, 19\}\}$, contains the interior faces of $ℱ$. Figure 2(b) gives a graphical representation of two elements, $E_{5}$ and $E_{7}$. Faces are also depicted with their normal vectors.

Figure 2. Example of a mesh.

Let $E_{l}$ and $E_{r}$ be two neighbouring elements sharing one face $F \in ℱ$. A function $v$ has two traces on $F$, one from $E_{l}$ ( $v_{l}$ ) and one from $E_{r}$ ( $v_{r}$ ):

$v_{l} (x) := \lim_{ε \to 0^{-}} v (x + ε n_{F}) \quad \text{and} \quad v_{r} (x) := \lim_{ε \to 0^{+}} v (x + ε n_{F}), \quad \forall x \in F .$ (7)

In addition, on any boundary face $F \in ℱ^{\partial}$, the trace of $v$ is only defined on the left side of the face:

$v_{l} (x) := \lim_{ε \to 0^{-}} v (x + ε n_{F}), \quad \forall x \in F .$ (8)

Using these trace definitions, one can define the jump and the average on any face of the mesh (as displayed in 1D on Figure 3). On an interior face $F \in ℱ^{in}$ , the jump and the average are respectively defined as:

$\forall x \in F, \quad 〚 v 〛 (x) := v_{r} (x) - v_{l} (x) \quad \text{and} \quad \{| v |\} (x) := \frac{1}{2} (v_{r} (x) + v_{l} (x)) .$ (9)

Moreover, on a boundary face $F \in ℱ^{\partial}$, the jump and the average are respectively defined as:

$\forall x \in F, \quad 〚 v 〛 (x) := v_{l} (x) \quad \text{and} \quad \{| v |\} (x) := v_{l} (x) .$ (10)

Figure 3. Definition of the mean and jump operators for two elements $E_{l}$ and $E_{r}$ in 1D.
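The jump and average operators of Equations (9) and (10) act on the two traces of a function at a face, which the following short Python sketch makes explicit. The function names and the trace values are hypothetical; the last lines check numerically the product identity of Equation (12), which is used to derive the weak formulation.

```python
def jump(v_l, v_r=None):
    """Jump on a face: v_r - v_l on an interior face, v_l on a boundary face."""
    return v_l if v_r is None else v_r - v_l

def average(v_l, v_r=None):
    """Average on a face: (v_r + v_l) / 2 on an interior face, v_l on a boundary face."""
    return v_l if v_r is None else 0.5 * (v_r + v_l)

# Left and right traces of two discontinuous functions at one interior face
u_l, u_r = 1.0, 1.5
v_l, v_r = -2.0, 0.25

# Product identity: [[u v]] = [[u]] {v} + {u} [[v]]
lhs = jump(u_l * v_l, u_r * v_r)
rhs = jump(u_l, u_r) * average(v_l, v_r) + average(u_l, u_r) * jump(v_l, v_r)
```

A continuous function has a zero jump on every interior face, which is the property that the interior penalty term enforces weakly.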

The solution of Problem () is sought in a subspace of the well-known broken Sobolev space, taken to be:

$V^{p} (ℰ^{n}) := \{v \in L^{2} (Ω) \mid {v |}_{E} \in ℙ^{p} (E), \forall E \in ℰ^{n}\}$ (11)

where $ℙ^{p} (E)$ stands for the set of polynomial functions of degree less than or equal to $p \in ℕ$ on $E$ . It is called the DG space. For more detailed and general definitions of this set, see [13].

3.2. Semi-Discrete Weak Formulation

Keeping in mind that

$\forall u, v \in V^{p} (ℰ^{n}), \quad 〚 u v 〛 = 〚 u 〛 \{| v |\} + \{| u |\} 〚 v 〛,$ (12)

assuming that the flux of RE is continuous at the interfaces of elements:

$\forall F \in ℱ, {〚 K (h - z) \nabla h \cdot n_{F} 〛 |}_{F} = 0,$ (13)

and noting that the Neumann boundary condition arises naturally in the weak formulation, we multiply Problem () by a test function $φ \in V^{p} (ℰ^{n})$ and integrate over each element of $ℰ$ to get

$\begin{cases} \sum_{E \in ℰ} \int_{E} \partial_{t} θ (h - z) φ \, d E + \sum_{E \in ℰ} \int_{E} (K (h - z) \nabla h) \cdot \nabla φ \, d E \\ \quad - \sum_{F \in ℱ^{in}} \int_{F} \{| (K (h - z) \nabla h) \cdot n_{F} |\} 〚 φ 〛 \, d F - \sum_{F \in ℱ^{D}} \int_{F} (K (h - z) \nabla h) \cdot n_{F} φ \, d F \\ \quad + \sum_{F \in ℱ^{N}} \int_{F} q_{N} φ \, d F = 0, & \text{for } t \in (0, T), \\ \sum_{E \in ℰ} \int_{E} h φ \, d E = \sum_{E \in ℰ} \int_{E} h_{0} φ \, d E, \\ h = h_{D}, & \text{on } Γ_{D} \times (0, T) . \end{cases}$ (14)

To enforce the continuity of the solution and the Dirichlet boundary condition, two penalty terms are added:

$J_{I} (h, φ) := \sum_{F \in ℱ^{in}} \frac{1}{2} \left( \frac{σ_{E}^{in}}{d_{E}} + \frac{σ_{E_{r}}^{in}}{d_{E_{r}}} \right) \int_{F} 〚 h 〛〚 φ 〛 \, d F$ (15)

$J_{D} (h, φ) := \sum_{F \in ℱ^{D}} \frac{σ_{E}^{\partial}}{d_{E}} \int_{F} (h - h_{D}) φ \, d F$ (16)

where $J_{I}$ is the penalization term that constrains the continuity of the solution in the interior of the domain and $J_{D}$ is the penalization term for the Dirichlet boundary conditions. $σ_{E}^{in}$ and $σ_{E}^{\partial}$ are the penalization parameters for the interior faces and for the Dirichlet boundary condition, respectively, and we recall that $d_{E}$ is the diameter of an element $E$.

Remark. This method is known as the IIPG method [12] [14]-[16]. The role of these parameters is essential to ensure the convergence of the method and, to our knowledge, is studied in Section 4 for the first time in the nonlinear case. The linear case was treated in [17].
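To make the scaling of the penalty terms concrete, the following Python sketch evaluates the face weights appearing in Equations (15) and (16), with the element diameter taken as the surface-to-perimeter ratio defined in Section 3.1. The function names, the penalization values and the two example elements are hypothetical; choosing $σ_{E}^{in}$ and $σ_{E}^{\partial}$ is precisely the subject of Section 4.

```python
import math

def element_diameter(surface, perimeter):
    """d_E defined as the ratio between the surface and the perimeter of E."""
    return surface / perimeter

def interior_penalty_weight(sigma_l, d_l, sigma_r, d_r):
    """Weight of the jump term on an interior face shared by two elements, Eq. (15)."""
    return 0.5 * (sigma_l / d_l + sigma_r / d_r)

def dirichlet_penalty_weight(sigma, d):
    """Weight of the Dirichlet penalty term on a boundary face, Eq. (16)."""
    return sigma / d

# Unit square next to a right triangle with unit legs (illustrative geometry)
d_square = element_diameter(1.0, 4.0)
d_triangle = element_diameter(0.5, 2.0 + math.sqrt(2.0))

# Hypothetical penalization parameters, identical on both elements
w_interior = interior_penalty_weight(10.0, d_square, 10.0, d_triangle)
w_dirichlet = dirichlet_penalty_weight(10.0, d_square)
```

Since the weights scale like $1 / d_{E}$, refining the mesh strengthens the penalization for fixed parameters.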

Using Equation (15) and Equation (16) in Equation (14), the semi-discrete nonlinear weak formulation of Problem () is, $\forall t \in T^{n}$ ,

$\begin{cases} \text{Find } h \in V^{p} (ℰ^{n}) \text{ such that:} \\ m_{n} (\partial_{t} θ (h - z), φ) + a_{n} (h, φ; h) = l_{n} (φ), \quad \forall φ \in V^{p} (ℰ^{n}), \end{cases}$ ()

where $m_{n}$ , $a_{n}$ , and $l_{n}$ are given by:

$m_{n} (q, φ) = \sum_{E \in ℰ^{n}} \int_{E} q φ d E$ (17)

$\begin{array}{rl} a_{n} (h, φ; h) = & \sum_{E \in ℰ^{n}} \int_{E} (K (h - z) \nabla h) \cdot \nabla φ \, d E \\ & - \sum_{F \in ℱ^{in}} \int_{F} \{| (K (h - z) \nabla h) \cdot n_{F} |\} 〚 φ 〛 \, d F \\ & + \sum_{F \in ℱ^{in}} \frac{1}{2} \left( \frac{σ_{E}^{in}}{d_{E}} + \frac{σ_{E_{r}}^{in}}{d_{E_{r}}} \right) \int_{F} 〚 h 〛〚 φ 〛 \, d F \\ & - \sum_{F \in ℱ^{D}} \int_{F} (K (h - z) \nabla h) \cdot n_{F} φ \, d F + \sum_{F \in ℱ^{D}} \frac{σ_{E}^{\partial}}{d_{E}} \int_{F} h φ \, d F \end{array}$ (18)

$l_{n} (φ) = \sum_{F \in ℱ^{D}} \frac{σ_{E}^{\partial}}{d_{E}} \int_{F} h_{D} φ d F - \sum_{F \in ℱ^{N}} \int_{F} q_{N} φ d F .$ (19)

3.3. Time Discretization

The aim of this section is to present the time discretization of Problem () through the implicit BDF method. In the following, we use the notation $u^{n} (x) := u (x, t^{n})$, $\forall n \in ℕ$, for any function $u \in L^{2} (Ω \times (0, T))$. Let us recall that the time step is defined by $Δ t^{n} = t^{n + 1} - t^{n}$ and the time interval by $T^{n} = [t^{n}, t^{n + 1}]$.

Due to their stability properties, BDF methods are commonly used to solve stiff differential equations such as Problem (). These linear multistep methods allow the construction of time approximations up to order $q \leq 6$. Their analysis can be found in [18]. The 1-step BDF method corresponds to the classical backward Euler scheme. BDF methods have been used up to 6th order in [19] [20]. They are well known to balance space and time errors and are particularly well suited to combination with DG methods. BDF methods can be constructed with either a constant time step [18] or a variable one [21]; the variable time step case is more pertinent for Problem (). The method of order $q$ is derived from the Newton interpolation polynomial of degree $q$, which interpolates $h^{j}$ at time $t^{j}$ for $j = n + 1, \dots, n + 1 - q$, using the method of divided differences.

The backward divided difference for a given function $y$ is defined by a recursive division process:

$\begin{cases} δ^{0} y^{n + 1} = [y^{n + 1}] = y^{n + 1}, \\ δ^{1} y^{n + 1} = [y^{n + 1}, y^{n}] = \frac{δ^{0} y^{n + 1} - δ^{0} y^{n}}{Δ t^{n}} = \frac{y^{n + 1} - y^{n}}{Δ t^{n}}, \\ δ^{2} y^{n + 1} = [y^{n + 1}, y^{n}, y^{n - 1}] = \frac{δ^{1} y^{n + 1} - δ^{1} y^{n}}{Δ t^{n} + Δ t^{n - 1}} = \frac{\frac{y^{n + 1} - y^{n}}{Δ t^{n}} - \frac{y^{n} - y^{n - 1}}{Δ t^{n - 1}}}{Δ t^{n} + Δ t^{n - 1}}, \\ \vdots \\ δ^{j} y^{n + 1} = [y^{n + 1}, y^{n}, \dots, y^{n + 1 - j}] = \frac{δ^{j - 1} y^{n + 1} - δ^{j - 1} y^{n}}{\sum_{k = 0}^{j - 1} Δ t^{n - k}} . \end{cases}$ (20)

For a given ODE, for instance $\frac{d u}{d t} = f (u, t)$ with an initial condition, the implicit BDF method of order $q$ is given by:

$\begin{array}{l} \sum_{j = 1}^{q} \left( \prod_{k = 1}^{j - 1} \left( \sum_{l = 0}^{k - 1} Δ t^{n - l} \right) \right) δ^{j} u^{n + 1} = \sum_{j = 0}^{q} α_{q, j} u^{n + 1 - j} = f (u^{n + 1}, t^{n + 1}), \\ \Leftrightarrow α_{q, 0} u^{n + 1} - f (u^{n + 1}, t^{n + 1}) = - \sum_{j = 0}^{q - 1} α_{q, j + 1} u^{n - j} \end{array}$

where $α_{q, j}$ are the linear combination coefficients obtained from the divided differences of $u$. For instance, for the second-order BDF method, the coefficients are:

$\begin{cases} α_{2, 0} = \frac{1}{Δ t^{n}} + \frac{1}{Δ t^{n} + Δ t^{n - 1}}, \\ α_{2, 1} = - \frac{1}{Δ t^{n}} - \frac{1}{Δ t^{n} + Δ t^{n - 1}} - \frac{Δ t^{n}}{Δ t^{n - 1} (Δ t^{n} + Δ t^{n - 1})}, \\ α_{2, 2} = \frac{Δ t^{n}}{Δ t^{n - 1} (Δ t^{n} + Δ t^{n - 1})} . \end{cases}$ (21)
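The coefficients $α_{q, j}$ can be cross-checked numerically. The Python sketch below is one possible way to do so and is not the RIVAGE implementation: instead of divided differences, it differentiates the Lagrange form of the same interpolation polynomial at $t^{n + 1}$, which yields the same coefficients, and compares the result with the closed form of Equation (21). Function names and time values are hypothetical.

```python
def bdf_coefficients(times):
    """Variable-step BDF coefficients alpha_{q,j}, j = 0..q.

    `times` lists t^{n+1}, t^n, ..., t^{n+1-q} (strictly decreasing).
    alpha_{q,j} is the derivative at t^{n+1} of the Lagrange basis
    polynomial attached to t^{n+1-j}.
    """
    q = len(times) - 1
    t0 = times[0]
    alphas = [sum(1.0 / (t0 - times[k]) for k in range(1, q + 1))]
    for j in range(1, q + 1):
        a = 1.0 / (times[j] - t0)
        for k in range(1, q + 1):
            if k != j:
                a *= (t0 - times[k]) / (times[j] - times[k])
        alphas.append(a)
    return alphas

def bdf2_coefficients(dt_n, dt_nm1):
    """Closed-form second-order coefficients of Eq. (21)."""
    s = dt_n + dt_nm1
    a0 = 1.0 / dt_n + 1.0 / s
    a2 = dt_n / (dt_nm1 * s)
    a1 = -a0 - a2
    return [a0, a1, a2]

# Two unequal steps: dt^n = 0.1 and dt^{n-1} = 0.15
general = bdf_coefficients([0.3, 0.2, 0.05])
closed_form = bdf2_coefficients(0.1, 0.15)
```

With a constant step $Δ t$, both functions reduce to the classical values $3 / (2 Δ t)$, $- 2 / Δ t$ and $1 / (2 Δ t)$.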

Remark. (Stability.) BDF methods of order 1 and 2 are $A$-stable and $L$-stable [22]. BDF methods of order 3 to 6 are $A (α)$-stable, where $α$ decreases as the order increases [23]. BDF methods of order $q > 6$ are unconditionally unstable. The use of variable time steps is recommended to enhance the stability of the method. In practical applications, variations in time step sizes are limited by an upper bound known as the swing factor to ensure stability and robustness (see Table 2 and [24]). In the following, swing factors are used.

Table 2. Maximum swing $Δ t^{n + 2} / Δ t^{n + 1}$ for BDF methods with variable time steps.

Order $q$	1	2	3	4	5	6
Maximum swing $Δ t^{n + 2} / Δ t^{n + 1}$	-	2.6	1.9	1.5	1.2	1.05

Applying the BDF method to Problem (), we get

$\begin{cases} \text{Find a sequence } {(h^{n})}_{0 \leq n \leq N} \in V^{p} (ℰ^{n}) \text{ such that:} \\ m_{n} \left( {\frac{\partial θ (ψ)}{\partial ψ} |}_{ψ^{n + 1}} \sum_{j = 0}^{q} α_{q, j} h^{n + 1 - j}, φ \right) + a_{n} (h^{n + 1}, φ; h^{n + 1}) = l_{n} (φ), \quad \forall φ \in V^{p} (ℰ^{n}) . \end{cases}$ ()

where $m_{n}$ , $a_{n}$ and $l_{n}$ are given, respectively, by Equation (17), Equation (18), Equation (19) with $ψ = h - z$ .

The time integration method needs an initialization step to compute the solution at later time steps. The initialization uses the prescribed initial condition to start the first time step. A direct and simple way is to write the corresponding discontinuous weak formulation:

$\text{Find } h^{0} \in V^{p} (ℰ^{0}) \text{ such that: } m_{0} (h^{0}, φ) = f_{0} (φ),$ (22)

where $m_{0}$ is defined by Equation (17) and $f_{0}$ is the linear form defined by:

$f_{0} (φ) = \sum_{E \in ℰ^{0}} \int_{E} h_{0} φ d E, \forall φ \in V^{p} (ℰ^{0}) .$ (23)

3.4. Nonlinear Iterative Process

Problem () being nonlinear, several iterative methods can be used, such as the Newton-Raphson method or Picard’s method, the classical first-order fixed-point method. Due to the strong nonlinearities of the constitutive laws Equation (2) and Equation (4) (see also Remark 1), the convergence of the iterative methods may fail [25] [26]. We will see in Section 4 that, in the case of IIPG methods, the convergence of the iterative methods can be enhanced, at least for Picard’s fixed-point method, whenever the penalization terms Equation (15) and Equation (16) are well chosen. Therefore, in what follows, we present Picard’s fixed-point method for Problem ().

Remark. (Choice of the Picard linearization.) Although Newton-Raphson iterations may offer quadratic convergence, we adopted a Picard fixed-point linearization for robustness in strongly nonlinear configurations such as the Vauclin infiltration case. In preliminary tests, Newton iterations often diverged without regularization of the hydraulic functions, whereas the Picard approach provided stable convergence at a moderate computational cost. Similar observations have been reported in [7] [27], highlighting that Picard iterations remain preferable when the Jacobian varies sharply near saturation thresholds.

Linearization of Problem () is done by Picard’s iterative procedure. For $k = 0, 1, \dots$ , the problem is:

$\begin{cases} \text{For a given } h^{n + 1, k} \in V^{p} (ℰ^{n}), \text{ find } h^{n + 1, k + 1} \in V^{p} (ℰ^{n}) \text{ such that, } \forall φ \in V^{p} (ℰ^{n}) : \\ m_{n} \left( {\frac{\partial θ (ψ)}{\partial ψ} |}_{ψ^{n + 1, k}} α_{q, 0} h^{n + 1, k + 1}, φ \right) + a_{n} (h^{n + 1, k + 1}, φ; h^{n + 1, k}) \\ \quad = l_{n} (φ) - m_{n} \left( {\frac{\partial θ (ψ)}{\partial ψ} |}_{ψ^{n + 1, k}} \sum_{j = 0}^{q - 1} α_{q, j + 1} h^{n - j}, φ \right) . \end{cases}$ ()

where $m_{n}$ , $a_{n}$ and $l_{n}$ are given, respectively, by Equation (17), Equation (18), Equation (19) with $ψ = h - z$ . $h^{n - j}$ stands for the solution at the rank $k$ of the iterative process (see Figure 4).

Figure 4. Scheme of the general proof.

The global algorithm of the Picard’s fixed-point iteration, for a positive $n$ , is:

1) Start with an initial guess $h^{n + 1, 0}$ ;

2) Compute the solution of Problem () with $h^{n + 1, 0}$ to get $h^{n + 1, 1}$ ;

3) Start again with $h^{n + 1, 1}$ ;

4) Compute the solution of Problem () with $h^{n + 1, k}$ to get $h^{n + 1, k + 1}$ ;

5) Start again with $h^{n + 1, k + 1}$ until the stopping criteria are satisfied;

6) Set $h^{n + 1} = h^{n + 1, k + 1}$ .

The stopping criterion is one important choice in determining accuracy for a nonlinear iterative process. For RE, the stopping criterion can be specified in terms of absolute error for pressure head or water content between two successive iterations [12]. For this study, we have used: $\frac{‖ r_{n} (h, φ) ‖}{‖ a_{n} (h, φ) ‖} < ε_{1}$ and $\frac{‖ δ_{k} ‖}{‖ h^{k} ‖} < ε_{2}$ , where $δ_{k} = h^{k} - h^{k - 1}$ and $r_{n} (h, φ) = m_{n} (h, φ; h) + a_{n} (h, φ; h) - l_{n} (φ)$ . $ε_{1}$ and $ε_{2}$ are user-defined tolerances. These two criteria are relative and independent of the characteristic quantities of the problem.
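The fixed-point loop and the two relative stopping criteria can be sketched in Python as follows. This is a minimal illustration on a small algebraic system, not the RIVAGE solver: the callback `assemble`, the tolerances and the $2 \times 2$ test problem are hypothetical, and NumPy is assumed. The residual is evaluated with the coefficients updated at the new iterate and normalized by the norm of the operator term, in the spirit of the first criterion.

```python
import numpy as np

def picard(assemble, u0, eps_1=1e-10, eps_2=1e-10, max_it=100):
    """Picard iterations for A(u) u = b.

    `assemble(u_frozen)` returns the matrix and right-hand side of the
    linearized problem, with coefficients frozen at `u_frozen`.
    Returns the converged iterate and the number of iterations.
    """
    u = np.asarray(u0, dtype=float)
    for k in range(1, max_it + 1):
        A, b = assemble(u)
        u_new = np.linalg.solve(A, b)
        # Relative increment between two successive iterations
        inc = np.linalg.norm(u_new - u) / max(np.linalg.norm(u_new), 1e-300)
        # Relative nonlinear residual at the new iterate
        A_new, b_new = assemble(u_new)
        au = A_new @ u_new
        res = np.linalg.norm(au - b_new) / max(np.linalg.norm(au), 1e-300)
        u = u_new
        if res < eps_1 and inc < eps_2:
            return u, k
    raise RuntimeError("Picard iterations did not converge")

def assemble(u):
    """Hypothetical 2x2 system with a mild solution-dependent diagonal."""
    A = np.array([[2.0 + 0.1 * u[0] ** 2, -1.0],
                  [-1.0, 2.0 + 0.1 * u[1] ** 2]])
    b = np.array([1.0, 1.0])
    return A, b

solution, n_it = picard(assemble, np.zeros(2))
```

The iteration count `n_it` returned here is the quantity that a heuristic time-step controller can monitor.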

3.5. Adaptive Time Stepping

Time adaptation is motivated by the convergence of the nonlinear solver. On the one hand, transient simulations have difficulty converging if the time step is too large; on the other hand, shorter time steps mean more time steps and therefore a longer computational time. This is why time adaptation is very attractive and common for Richards’ equation. Different strategies can be used to adjust the time step [28]-[30]: either heuristic ones, mainly based on the convergence performance of the nonlinear solver, or rational ones, based on error control. The latter are generally more efficient, but heuristic methods remain a relevant approach due to their simplicity.

In this study, the time step is adjusted heuristically based on the number of iterations $N_{i t}$ from the nonlinear solver, as discussed in [29] [31]. The size of the time step directly influences the convergence of the solver. The simulations start with a time step $Δ t^{0}$ , and subsequent time steps are calculated according to the following rule: the time step remains unchanged if convergence is achieved between $m_{i t}$ and $M_{i t}$ nonlinear iterations; it is increased by an amplification factor $λ_{a m p} > 1$ if convergence is achieved in fewer than $m_{i t}$ nonlinear iterations; and it is decreased by a reduction factor $λ_{r e d} < 1$ if convergence requires more than $M_{i t}$ nonlinear iterations. If convergence fails due to solver issues (poor initial guess, bad condition number) or exceeds a prescribed maximum bound $W_{i t}$ , the time step is recalculated using a reduced step size ( $λ_{r e d} < 1$ ). The calculation of the next time step $Δ t^{n + 1}$ from the previous one $Δ t^{n}$ follows this time-stepping scheme:

$\begin{cases} Δ t^{n + 1} = \begin{cases} λ_{amp} Δ t^{n} & \text{if } N_{it} \leq m_{it}, \\ Δ t^{n} & \text{if } m_{it} < N_{it} \leq M_{it}, \\ λ_{red} Δ t^{n} & \text{if } M_{it} < N_{it} \leq W_{it}, \end{cases} \\ Δ t^{n} = λ_{red} Δ t^{n} \quad \text{if } N_{it} > W_{it} \text{ or if the solver has failed (the time step is started again),} \end{cases}$ (24)

where $N_{i t}$ is the number of nonlinear iterations.
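The rule of Equation (24) translates directly into a small controller, sketched below in Python. The function name, the returned acceptance flag and the default thresholds and factors are hypothetical choices for illustration; the thresholds follow the inequalities of Equation (24).

```python
def next_time_step(dt, n_it, converged=True,
                   m_it=3, M_it=7, W_it=10, lam_amp=2.0, lam_red=0.5):
    """Heuristic time-step update of Eq. (24).

    Returns (new_dt, accepted). When `accepted` is False, the current
    step must be recomputed with the reduced step `new_dt`.
    """
    if not converged or n_it > W_it:
        return lam_red * dt, False   # restart the same time step
    if n_it <= m_it:
        return lam_amp * dt, True    # fast convergence: amplify
    if n_it <= M_it:
        return dt, True              # moderate convergence: keep
    return lam_red * dt, True        # slow convergence: reduce

# Example: a step that converged in 2 nonlinear iterations
dt_next, accepted = next_time_step(1.0, n_it=2)
```

In practice, the amplified step would additionally be capped by the swing factor of Table 2 for the BDF order in use.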

Remark. The time step could also be adjusted automatically by studying the full time-dependent problem, as is done in Section 4 for the steady problem; this work is in progress.

Remark. In the numerical code RIVAGE, Adaptive Mesh Refinement can be also employed. We refer to [7] [12] [32]-[34] for more details.

4. Theoretical Study and Estimation of the Optimal Penalization Parameters

In this section, we present the main result of this work, namely a way to obtain a convergent iterative scheme by constructing a robust method that automatically computes the penalization parameters (see Equation (15) and Equation (16)). This is achieved by studying the theoretical properties and the convergence of the solution of the discrete Problem () to that of the continuous Problem (). To this end, and for the sake of simplicity, we consider a toy model similar to the stationary RE, for which we study the following points, as outlined in this section:

1) The existence and uniqueness of the weak solution to the nonlinear problem in Section 4.2.

2) The existence and uniqueness of the weak solution to the discrete linearized problem in Section 4.3.

3) The method to compute optimal penalization parameters to ensure the convergence of the nonlinear solver at the discrete level in Section 4.4.

4) The convergence of the discrete linearized weak problem to the continuous linearized weak problem in Section 4.5.

The proofs of this section are given in the Appendix and can be easily extended to several space dimensions. However, since the computations leading to the optimal penalization parameters are rather technical in the two-dimensional case, the 2D case is also considered in Section 4.3, for the sake of completeness, for the existence and uniqueness of the weak solution to the discrete linearized problem. We will see that the construction of the optimal penalization parameters is essentially based on the constants appearing in the discrete continuity and the discrete coercivity of the operator.

4.1. Toy Model

Let us consider the following toy problem () on the interval $Ω = [a, b] \subset ℝ$ :

$\begin{array}{l} \text{For a given } f \in L^{2} (Ω), \text{ find } u (x) : Ω \to ℝ \text{ such that:} \\ \begin{cases} - {(A (x, u, u^{'}))}^{'} = f, & \text{in } Ω, \\ u = 0, & \text{on } \partial Ω \end{cases} \end{array}$ ()

with $A (x, s, ξ) = K (x, s) ξ$, where the real function $K$ is intended to mimic the properties of the hydraulic conductivity $K$ (Equation (2)). Following [35] and in view of the properties of $K$ (Equation (2)), assuming that

$\left\{ \begin{array}{lll} \exists K_{0}, K_{1} \in ℝ_{+}^{*}, & K_{0} \leq K (x, \bar{u}) \leq K_{1}, & \forall x \in Ω, \forall \bar{u} \in ℝ, \\ \exists K_{lip} \in ℝ_{+}, & | K (x, {\bar{u}}_{1}) - K (x, {\bar{u}}_{2}) | \leq K_{lip} | {\bar{u}}_{1} - {\bar{u}}_{2} |, & \forall x \in Ω, \forall ({\bar{u}}_{1}, {\bar{u}}_{2}) \in ℝ^{2}, \end{array} \right.$ ( $ℋ 1$ )

we deduce directly that $A$ is a Carathéodory function, whose properties we recall hereafter:

$\begin{array}{lll} (1) \ \exists α > 0 & \text{s.t.} & (A (x, s, ξ) - A (x, s, 0)) ξ \geq α {| ξ |}^{2}, \\ (2) \ \exists β > 0, \exists h \in L^{2} (Ω) & \text{s.t.} & | A (x, s, ξ) | \leq β (h (x) + | s | + | ξ |), \\ (3) \ \exists γ > 0 & \text{s.t.} & (A (x, s, ξ) - A (x, s, η)) (ξ - η) \geq γ {| ξ - η |}^{2}, \\ (4) \ \exists δ > 0, \exists h \in L^{2} (Ω) & \text{s.t.} & | A (x, s, ξ) - A (x, t, ξ) | \leq δ | s - t | (h (x) + | ξ | + | s | + | t |) . \end{array}$ ( $ℋ 2$ )

This problem can be cast into the weak formulation by multiplying by a test function $v \in H_{0}^{1} (Ω)$ and integrating over Ω:

$\text{Find } u \in H_{0}^{1} (Ω) \text{ such that: } a (u, v) = l (v), \quad \forall v \in H_{0}^{1} (Ω)$ ( $W$ )

where

$a (u, v) = \int_{Ω} K (x, u) u^{'} v^{'} d x, l (v) = \int_{Ω} f v d x .$

Problem () being nonlinear, we use the Picard’s iterations method as in Problem () to get

$\begin{cases} \text{For a given } \bar{u} \in L^{2} (Ω), \text{ find } u \in H_{0}^{1} (Ω) \text{ such that:} \\ \tilde{a} (u, v; \bar{u}) = l (v), \quad \forall v \in H_{0}^{1} (Ω) \end{cases}$ ( $\tilde{W}$ )

with

$\tilde{a} (u, v; \bar{u}) = \int_{Ω} K (x, \bar{u}) u^{'} v^{'} d x .$ (25)

Given ${\bar{u}}^{0}$, we solve Problem ( $\tilde{W}$ ) with $\bar{u} = {\bar{u}}^{0}$ to obtain $u^{1}$. Then, we solve Problem ( $\tilde{W}$ ) with $\bar{u} = u^{1}$ to obtain $u^{2}$, and so on. The sequence of solutions of the linearized problem is denoted by ${(u^{n})}_{n \in ℕ}$, and its limit as $n$ goes to infinity is expected to be the solution to the nonlinear Problem ( $W$ ). In the following, we denote by $T$ the fixed-point operator, so that $u^{n + 1} = T (u^{n})$.
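The fixed-point sequence $u^{n + 1} = T (u^{n})$ can be observed numerically on the toy problem. The Python sketch below is only an illustration under stated assumptions: it replaces the DG discretization studied in this paper by a conservative second-order finite-difference scheme on $Ω = [0, 1]$, and it uses a hypothetical conductivity $K (x, u) = 1 + 1 / (1 + u^{2})$, which satisfies ( $ℋ 1$ ) with $K_{0} = 1$ and $K_{1} = 2$, together with the hypothetical source $f = 5$. NumPy is assumed.

```python
import numpy as np

def K(x, u):
    """Hypothetical conductivity: 1 < K <= 2 and Lipschitz in u, as in (H1)."""
    return 1.0 + 1.0 / (1.0 + u ** 2)

def solve_toy(f, n_cells=50, tol=1e-10, max_it=200):
    """Picard iterations for -(K(x,u) u')' = f on [0,1], u(0) = u(1) = 0."""
    h = 1.0 / n_cells
    x = np.linspace(0.0, 1.0, n_cells + 1)
    x_mid = 0.5 * (x[:-1] + x[1:])
    rhs = f(x[1:-1])
    u = np.zeros(n_cells + 1)  # initial guess
    for it in range(1, max_it + 1):
        # Freeze K at the previous iterate, evaluated at cell midpoints
        k_mid = K(x_mid, 0.5 * (u[:-1] + u[1:]))
        # Tridiagonal matrix of the linearized operator on interior nodes
        main = (k_mid[:-1] + k_mid[1:]) / h ** 2
        off = -k_mid[1:-1] / h ** 2
        A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
        u_new = np.zeros_like(u)
        u_new[1:-1] = np.linalg.solve(A, rhs)
        inc = np.linalg.norm(u_new - u) / max(np.linalg.norm(u_new), 1e-300)
        u = u_new
        if inc < tol:
            return x, u, it
    raise RuntimeError("Picard iterations did not converge")

x, u, n_it = solve_toy(lambda s: 5.0 * np.ones_like(s))
```

Because $K$ stays between $K_{0}$ and $K_{1}$, each linearized system is uniformly elliptic, which is the property the theoretical study relies on.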

4.2. Existence and Uniqueness of the Weak Solution to the Nonlinear Problem ( $W$ )

The first step is to show that Problem ( $W$ ) has a unique solution in $H_{0}^{1} (Ω)$ . The existence of a solution to Problem ( $W$ ) can be established by applying the Schauder fixed-point theorem to the operator $T$ , while uniqueness can be obtained through the technique proposed in [35].

Thus, we have

Lemma 1. (Existence of a solution to Problem ( $W$ ).) Under Hypothesis ( $ℋ 1$ ), $\exists u \in H_{0}^{1} (Ω)$ ; $T (u) = u$ .

Then, one can obtain uniqueness through the following result

Lemma 2. (Uniqueness of the solution to Problem ( $W$ ).) Under Hypothesis ( $ℋ 1$ ), the solution $u \in H_{0}^{1} (Ω)$ of Problem ( $W$ ) is unique.

These results hold for the dimension $d \leq 3$ and the proofs are rather classical and left to the reader.

4.3. Existence and Uniqueness of the Weak Solution to the Discrete Linearized Problem ( $\tilde{W}$ )

One-dimensional case

To solve Problem ( $\tilde{W}$ ) numerically, we use DG methods as in Section 3. Let $0 = x_{0} < \dots < x_{N} = 1$ be a partition $ℰ_{h}$ of Ω (see Figure 5) and denote by $I_{n} = [x_{n}, x_{n + 1}]$ a sub-interval. The size of a sub-interval is defined as $| I_{n} | : = h = \frac{1}{N}$ , $\forall n \in \{0, \dots, N - 1\}$ , where $N$ is the number of elements in the partition. The solution is sought in the DG space $V_{0}^{p} (ℰ_{h})$ defined as:

$V_{0}^{p} (ℰ_{h}) = \{v \in L^{2} (Ω) \mid {v |}_{\partial Ω} = 0; {v |}_{I_{n}} \in ℙ^{p} (I_{n}), \forall I_{n} \in ℰ_{h}\} \subseteq L^{2} (Ω)$ (26)

Figure 5. Representation of $ℰ_{h}$ in the one-dimension case.

As in Section 3, we define

$v (x_{n}^{+}) = \lim_{\begin{matrix} ϵ \to 0 \\ ϵ > 0 \end{matrix}} v (x_{n} + ϵ), v (x_{n}^{-}) = \lim_{\begin{matrix} ϵ \to 0 \\ ϵ > 0 \end{matrix}} v (x_{n} - ϵ),$ (27)

${〚 v 〛}_{x_{n}} = v (x_{n}^{-}) - v (x_{n}^{+}), {| v |}_{x_{n}} = \frac{1}{2} (v (x_{n}^{-}) + v (x_{n}^{+})), \forall n \in {1, \dots, N - 1},$ (28)

and

${〚 v 〛}_{x_{0}} = - v (x_{0}^{+}), {| v |}_{x_{0}} = v (x_{0}^{+}), {〚 v 〛}_{x_{N}} = v (x_{N}^{-}), {| v |}_{x_{N}} = v (x_{N}^{-}) .$ (29)

The DG space $V_{0}^{p} (ℰ_{h})$ is associated with the norm:

${‖ v ‖}^{2} = \sum_{n = 0}^{N - 1} {‖ v^{'} ‖}_{I_{n}}^{2} + \sum_{n = 0}^{N} \frac{1}{h} {〚 v 〛}_{x_{n}}^{2} = \sum_{n = 0}^{N - 1} {‖ v^{'} ‖}_{I_{n}}^{2} + {| v |}_{J}^{2}$ (30)

where ${‖ \cdot ‖}_{I_{n}}$ is the usual $L^{2} (I_{n})$ norm and ${| v |}_{J}^{2} : = \sum_{n = 0}^{N} \frac{1}{h} {〚 v 〛}_{x_{n}}^{2}$ is the jump semi-norm. With this definition of the norm, jumps are controlled. One can observe that $‖ \cdot ‖$ is a norm on $V_{0}^{p} (ℰ_{h})$ and that $V_{0}^{p} (ℰ_{h})$ is a Banach space, i.e., a complete normed vector space, for $‖ \cdot ‖$ . Lastly, the concept of broken gradient is introduced to specify when only the regular part of the gradient is considered. The broken gradient $\nabla_{h} : V_{0}^{p} (ℰ_{h}) \to L^{2} (Ω)$ is defined such that, for all $v \in V_{0}^{p} (ℰ_{h})$ ,

$\forall E \in ℰ_{h}, {(\nabla_{h} v) |}_{E} : = \nabla ({v |}_{E}) .$ (31)
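As an illustration of the norm in Equation (30), the sketch below evaluates it for a piecewise-linear ( $p = 1$ ) DG function on a uniform partition of $(0, 1)$ . The representation of the function by its one-sided values `left[n]` $= v (x_{n}^{+})$ and `right[n]` $= v (x_{n + 1}^{-})$ , and the name `dg_norm_sq`, are assumptions made for this example.

```python
import numpy as np

def dg_norm_sq(left, right):
    """Squared DG norm of Equation (30) for a piecewise-linear function.

    left[n]  = v(x_n^+)      value at the left end of element I_n
    right[n] = v(x_{n+1}^-)  value at the right end of element I_n
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    N = left.size
    h = 1.0 / N
    # On I_n the derivative is constant: v' = (right - left) / h,
    # so ||v'||^2 over I_n equals (right - left)^2 / h.
    grad_term = np.sum((right - left) ** 2) / h
    # Jumps follow Equations (28)-(29): boundary nodes first and last.
    jumps = np.concatenate(([-left[0]], right[:-1] - left[1:], [right[-1]]))
    jump_term = np.sum(jumps**2) / h
    return grad_term + jump_term

# A continuous hat function on two elements: no jumps, only the gradient term.
print(dg_norm_sq([0.0, 1.0], [1.0, 0.0]))   # 4.0
# The constant 1 has zero broken gradient but jumps at both boundary nodes.
print(dg_norm_sq([1.0, 1.0], [1.0, 1.0]))   # 4.0
```

The second call shows why the jump semi-norm is needed: a nonzero piecewise-constant function has a vanishing broken gradient, and only the jump term makes the quantity a norm.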

The linearized weak formulation Problem ( $\tilde{W}$ ) can be discretized using the IIPG formulation as in Section 3 to get

${\begin{cases} For a given \bar{u} \in V_{0}^{p} (ℰ_{h}), find u_{h} \in V_{0}^{p} (ℰ_{h}) such that : \\ {\tilde{a}}_{h} (u_{h}, v_{h}; \bar{u}) = l_{h} (v_{h}), \forall v_{h} \in V_{0}^{p} (ℰ_{h}) \end{cases}$ ( ${\tilde{W}}_{h}$ )

with

$\begin{matrix} {\tilde{a}}_{h} (u_{h}, v_{h}; \bar{u}) = \sum_{n = 0}^{N - 1} \int_{I_{n}} K (x, \bar{u}) u_{h}^{'} v_{h}^{'} d x - \sum_{n = 0}^{N} {| K (x, \bar{u}) u_{h}^{'} |}_{x_{n}} {〚 v_{h} 〛}_{x_{n}} \\ + \frac{σ_{0}}{h} {〚 u_{h} 〛}_{x_{0}} {〚 v_{h} 〛}_{x_{0}} + \sum_{n = 1}^{N - 1} \frac{σ_{n - 1} + σ_{n}}{2 h} {〚 u_{h} 〛}_{x_{n}} {〚 v_{h} 〛}_{x_{n}} \\ + \frac{σ_{N}}{h} {〚 u_{h} 〛}_{x_{N}} {〚 v_{h} 〛}_{x_{N}}, \end{matrix}$ (32)

$l_{h} (v_{h}) = \int_{Ω} f v_{h} d x .$

At the discrete level, one can write Hypothesis ( $ℋ 1$ ) as follows: for all $n \in {0, \dots, N - 1}$ :

$\left\{\begin{array}{ll} \exists K_{0}^{(n)}, K_{1}^{(n)} \in ℝ_{+}^{*}, \forall x \in I_{n}, \forall \bar{u} \in ℝ, & K_{0}^{(n)} \leq K (x, \bar{u}) \leq K_{1}^{(n)}; \\ \exists K_{l i p}^{(n)} \in ℝ_{+}, \forall x \in I_{n}, \forall ({\bar{u}}_{1}, {\bar{u}}_{2}) \in ℝ^{2}, & | K (x, {\bar{u}}_{1}) - K (x, {\bar{u}}_{2}) | \leq K_{l i p}^{(n)} | {\bar{u}}_{1} - {\bar{u}}_{2} | \end{array}\right.$ ( $ℋ_{n}$ )

where

$K_{1} : = \max_{n = 0, \dots, N - 1} K_{1}^{(n)}, \quad K_{0} : = \min_{n = 0, \dots, N - 1} K_{0}^{(n)} \quad \text{and} \quad K_{l i p} = \max_{n = 0, \dots, N - 1} K_{l i p}^{(n)} .$ (33)

Existence and uniqueness of the solution to Problem ( ${\tilde{W}}_{h}$ ) are obtained using the Lax-Milgram theorem. We have the following result.

Theorem 3. (Existence and uniqueness of the weak solution to the discrete linearized Problem ( ${\tilde{W}}_{h}$ ).) Under Hypothesis ( $ℋ_{n}$ ) for all $n$ and for a given $\bar{u} \in V_{0}^{p} (ℰ_{h})$ , $\exists! u_{h} \in V_{0}^{p} (ℰ_{h})$ such that ${\tilde{a}}_{h} (u_{h}, v_{h}; \bar{u}) = l_{h} (v_{h})$ , $\forall v_{h} \in V_{0}^{p} (ℰ_{h})$ .

This existence and uniqueness result is obtained thanks to the following lemmas.

Lemma 4. (Discrete coercivity of ${\tilde{a}}_{h}$ .) Under Hypothesis ( $ℋ_{n}$ ) for all $n$ , for any vector of positive numbers $ϵ = {(ε^{(n)})}_{n = 0, \dots, N - 1}$ , there exists a constant $C^{*} (ϵ) > 0$ such that

$\forall u_{h} \in V_{0}^{p} (ℰ_{h}), {\tilde{a}}_{h} (u_{h}, u_{h}; \bar{u}) \geq C^{*} (ϵ) {‖ u_{h} ‖}^{2}$

provided that

$\left\{\begin{array}{ll} ε^{(n)} < 2, & \forall n \in \{0, \dots, N - 1\} \\ σ_{n} > σ_{n}^{*}, & \forall n \in \{1, \dots, N - 1\} \\ σ_{0} > σ_{0}^{*} \\ σ_{N} > σ_{N}^{*} \end{array}\right. \quad \text{with} \quad \left\{\begin{array}{l} σ_{n}^{*} = \frac{{(K_{1}^{(n)} C_{tr, p - 1}^{(n)})}^{2}}{2 ε^{(n)} K_{0}^{(n)}}, \forall n \in \{1, \dots, N - 1\} \\ σ_{0}^{*} = \frac{{(K_{1}^{(0)} C_{tr, p - 1}^{(0)})}^{2}}{ε^{(0)} K_{0}^{(0)}} \\ σ_{N}^{*} = \frac{{(K_{1}^{(N - 1)} C_{tr, p - 1}^{(N - 1)})}^{2}}{ε^{(N - 1)} K_{0}^{(N - 1)}} \end{array}\right.$ (34)

and

$C^{*} (ϵ) = \min \left\{ \min_{n = 0, \dots, N - 1} \left(K_{0}^{(n)} \left(1 - \frac{ε^{(n)}}{2}\right)\right), σ_{0} - σ_{0}^{*}, σ_{N} - σ_{N}^{*}, \min_{n = 1, \dots, N - 1} \left(\frac{σ_{n} - σ_{n}^{*}}{2}\right) \right\} .$ (35)

Lemma 5. (Discrete continuity of ${\tilde{a}}_{h}$ .) Under Hypothesis ( $ℋ_{n}$ ) for all $n$ , for any vector of positive numbers $ϵ = {(ε^{(n)})}_{n = 0, \dots, N - 1}$ , there exists a constant $\tilde{C} (ϵ) > 0$ such that

$\forall u_{h}, v_{h} \in V_{0}^{p} (ℰ_{h}), | {\tilde{a}}_{h} (u_{h}, v_{h}; \bar{u}) | \leq \tilde{C} (ϵ) ‖ u_{h} ‖ ‖ v_{h} ‖$

where

$\begin{matrix} \tilde{C} (ϵ) = max_{\begin{matrix} n = 0, \dots, N - 1 \end{matrix}} (K_{1}^{(n)}) + \sqrt{max_{\begin{matrix} n = 0, \dots, N - 1 \end{matrix}} (2 ε^{(n)} K_{1}^{(n)}) max (σ_{0}^{*}, σ_{N}^{*}, max_{\begin{matrix} n = 1, \dots, N - 1 \end{matrix}} (\frac{σ_{n}^{*}}{2}))} \\ + \max (σ_{0}, σ_{N}, max_{\begin{matrix} n = 1, \dots, N - 1 \end{matrix}} (\frac{σ_{n}}{2})) . \end{matrix}$ (36)

Lemma 6. (Discrete continuity of $l_{h}$ .) There exists a constant $B > 0$ such that $\forall v_{h} \in V_{0}^{p} (ℰ_{h})$ , $| l_{h} (v_{h}) | \leq B ‖ v_{h} ‖$ .

Remark. The trace constants involved in the bounds for the penalization parameters depend on the polynomial degree $p$ and on the type of polynomial basis used. In the one-dimensional case, with an orthonormal basis and for $u \in V_{0}^{p} (ℰ_{h})$ , the trace constant for $I_{n}$ is given by:

$C_{tr, p}^{(n)} : = p + 1.$ (37)

Proofs of Lemmas 4, 5, and 6 can be found in the Appendix. The proof of Theorem 3 is a straightforward application of the Lax-Milgram theorem and is left to the reader.

Two-dimensional case

We now extend the previous results to two dimensions. Let us consider the two-dimensional extension of Problem ( ${\tilde{W}}_{h}$ ):

${\begin{cases} For a given \bar{u} \in V_{0}^{p} (ℰ_{h}), find u_{h} \in V_{0}^{p} (ℰ_{h}) such that, \forall v_{h} \in V_{0}^{p} (ℰ_{h}) : \\ {\tilde{a}}_{h} (u_{h}, v_{h}; \bar{u}) = l_{h} (v_{h}) . \end{cases}$ ( ${\tilde{W}}_{h}^{2}$ )

where

$\begin{matrix} {\tilde{a}}_{h} (u_{h}, v_{h}; \bar{u}) = \sum_{E \in ℰ^{n}} \int_{E} (K (x, \bar{u}) \nabla u_{h}) \cdot \nabla v_{h} d E - \sum_{F \in ℱ} \int_{F} {| (K (x, \bar{u}) \nabla u_{h}) \cdot n_{F} |} 〚 v_{h} 〛 d F \\ + \sum_{F \in ℱ^{in}} \frac{1}{2} (\frac{σ_{E}^{i n}}{d_{E}} + \frac{σ_{E_{r}}^{i n}}{d_{E_{r}}}) \int_{F} 〚 u_{h} 〛〚 v_{h} 〛 d F + \sum_{F \in ℱ^{D}} \frac{σ_{E}^{\partial}}{d_{E}} \int_{F} u_{h} v_{h} d F \end{matrix}$

$l_{h} (v_{h}) = \int_{Ω} f v_{h} d x .$

The two-dimensional version of the discrete hypothesis on $K$ is given by: For all $E \in ℰ$ :

${\exists K_{0}^{E}, K_{1}^{E} \in ℝ_{+}^{*}, \forall x \in E, \forall \bar{u} \in ℝ, K_{0}^{E} \leq {‖ K (x, \bar{u}) ‖}_{2} \leq K_{1}^{E};$ ( $ℋ_{E}^{2}$ )

with ${‖ K ‖}_{2} = \max_{i = 1, 2} (K_{i i})$ . In addition, $K_{1} = \max_{E \in ℰ} K_{1}^{E}$ and $K_{0} = \min_{E \in ℰ} K_{0}^{E}$ denote the global bounds of $K$ .

The DG space is associated with the following norm:

${‖ v ‖}^{2} : = \sum_{E \in ℰ} {‖ v ‖}_{E}^{2} + \sum_{F \in ℱ^{in}} (\frac{1}{d_{E}} + \frac{1}{d_{E_{r}}}) {‖ 〚 v 〛 ‖}_{F}^{2} + \sum_{F \in ℱ^{\partial}} \frac{1}{d_{E}} {‖ 〚 v 〛 ‖}_{F}^{2} = \sum_{E \in ℰ} {‖ v ‖}_{E}^{2} + {| v |}_{J}^{2}$ (38)

where ${‖ v ‖}_{E}^{2}$ is the usual $L^{2}$ norm on $E$ , ${‖ v ‖}_{F}^{2}$ is the $L^{2}$ norm on $F$ and ${| v |}_{J}^{2}$ is the jump semi-norm. This norm has the same characteristics as in the one-dimensional case. We obtain the following result.

Theorem 7. (Existence and uniqueness of the weak solution to the discrete linearized Problem ( ${\tilde{W}}_{h}^{2}$ ).) If $K$ satisfies Hypothesis ( $ℋ_{E}^{2}$ ) for all $E \in ℰ$ and for a given $\bar{u} \in V_{0}^{p} (ℰ_{h})$ , then $\exists! u_{h} \in V_{0}^{p} (ℰ_{h})$ such that ${\tilde{a}}_{h} (u_{h}, v_{h}; \bar{u}) = l_{h} (v_{h})$ , $\forall v_{h} \in V_{0}^{p} (ℰ_{h})$ .

As before, this result is a consequence of the Lax-Milgram theorem through the following lemmas:

Lemma 8. (Discrete coercivity of ${\tilde{a}}_{h}$ .) If $K$ satisfies Hypothesis ( $ℋ_{E}^{2}$ ) for all $E \in ℰ$ and for any vector of positive numbers $ϵ = {(ε^{E})}_{E \in ℰ}$ , there exists a constant $C^{*} (ϵ) > 0$ such that

$\forall u_{h} \in V_{0}^{p} (ℰ_{h}), {\tilde{a}}_{h} (u_{h}, u_{h}; \bar{u}) \geq C^{*} (ϵ) {‖ u_{h} ‖}^{2}$

provided that

$\left\{\begin{array}{ll} ε^{E} < 2, & \forall E \in ℰ \\ σ_{E}^{i n} > σ_{E}^{i n, *} \text{ and } σ_{E_{r}}^{i n} > σ_{E_{r}}^{i n, *}, & \forall F \in ℱ^{in} \\ σ_{E}^{\partial} > σ_{E}^{\partial, *}, & \forall F \in ℱ^{\partial} \end{array}\right. \quad \text{with, } \forall E \in ℰ, \quad \begin{cases} σ_{E}^{i n, *} = \frac{D^{E} {(K_{1}^{E} C_{tr, p - 1}^{E})}^{2}}{4 ε^{E} K_{0}^{E}} \\ σ_{E}^{\partial, *} = \frac{D^{E} {(K_{1}^{E} C_{tr, p - 1}^{E})}^{2}}{2 ε^{E} K_{0}^{E}} \end{cases}$ (39)

and $D^{E}$ is the number of edges of the element $E$ . Moreover

$C^{*} (ϵ) = min {min_{E \in ℰ} (K_{0}^{E} (1 - \frac{ε^{E}}{2})), min_{E \in ℰ} (\frac{σ_{E}^{i n} - σ_{E}^{i n, *}}{2}), min_{F \in ℱ^{\partial}} (σ_{E}^{\partial} - σ_{E}^{\partial, *})} .$ (40)

Lemma 9. (Discrete continuity of ${\tilde{a}}_{h}$ .) If $K$ satisfies Hypothesis ( $ℋ_{E}^{2}$ ) for all $E \in ℰ$ and for any vector of positive numbers $ϵ = {(ε^{E})}_{E \in ℰ}$ , there exists a constant $\tilde{C} (ϵ) > 0$ such that

$\forall u_{h}, v_{h} \in V_{0}^{p} (ℰ_{h}), | {\tilde{a}}_{h} (u_{h}, v_{h}; \bar{u}) | \leq \tilde{C} (ϵ) ‖ u_{h} ‖ ‖ v_{h} ‖$

where

$\begin{matrix} \tilde{C} (ϵ) = max_{E \in ℰ} K_{1}^{E} + max {max_{E \in ℰ} (\frac{σ_{E}^{i n}}{2}), max_{F \in ℱ^{\partial}} (σ_{E}^{\partial})} \\ + \sqrt{2 max_{E \in ℰ} ε^{E} K_{1}^{E} max {max_{E \in ℰ} (\frac{σ_{E}^{i n, *}}{2}), max_{F \in ℱ^{\partial}} (σ_{E}^{\partial, *})}} . \end{matrix}$ (41)

Lemma 6 still holds in the two-dimensional case and its proof is left to the reader. Proofs of Theorem 7 and Lemmas 8 and 9 are similar to the proofs in the one-dimensional case. The main difference is in the expression of the trace constants. In two dimensions, they are linked to the element’s shape. For an orthonormal basis and for $u \in V_{0}^{p} (ℰ_{h})$ , the trace constant of $E \in ℰ$ is given by:

$C_{tr, p}^{E} = \begin{cases} \sqrt{\frac{(p + 1) (p + 2)}{2}}, & \text{if } E \text{ is a triangle}, \\ \frac{p + 1}{2}, & \text{if } E \text{ is a quadrilateral} . \end{cases}$ (42)
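The lower bounds of Equation (39) are straightforward to evaluate once the trace constant of Equation (42) is known. The sketch below is one possible way to code them; the function names and the string labels for the element shape are hypothetical, and the formulas are transcribed directly from Equations (39) and (42).

```python
import math

def trace_constant(p, shape):
    """Trace constant C_{tr,p}^E of Equation (42) for an orthonormal basis."""
    if shape == "triangle":
        return math.sqrt((p + 1) * (p + 2) / 2.0)
    if shape == "quadrilateral":
        return (p + 1) / 2.0
    raise ValueError(f"unsupported element shape: {shape}")

def penalty_thresholds(K0, K1, p, shape, eps):
    """Lower bounds (sigma_in_star, sigma_bd_star) of Equation (39).

    K0, K1 : local bounds of the conductivity on the element
    p      : polynomial degree (the bound uses the constant of degree p - 1)
    eps    : the parameter epsilon^E, required to lie in (0, 2)
    """
    if not 0.0 < eps < 2.0:
        raise ValueError("eps must lie in (0, 2)")
    n_edges = 3 if shape == "triangle" else 4        # D^E
    c_tr = trace_constant(p - 1, shape)
    base = n_edges * (K1 * c_tr) ** 2 / (eps * K0)
    return base / 4.0, base / 2.0

# Linear elements on a triangle with K0 = K1 = 1 and eps = 1.
print(penalty_thresholds(1.0, 1.0, 1, "triangle", 1.0))   # (0.75, 1.5)
```

Any penalty value strictly above these thresholds satisfies the coercivity condition of Lemma 8 on that element.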

4.4. Optimal Penalization Parameters

Thanks to the previous results on the discrete linearized Problem ( ${\tilde{W}}_{h}$ ), one can now construct a method to set the penalization parameters automatically. They must be chosen to ensure the coercivity and continuity of the linearized discrete problem, i.e., $C^{*} (ϵ) > 0$ and $\tilde{C} (ϵ) > 0$ . Moreover, using Céa’s lemma (Lemma 10), they are set to minimize the bound on the distance between the weak and discrete solutions.

Lemma 10. (Céa’s lemma.) Let $V$ be a real Hilbert space with the norm $‖ \cdot ‖$ . Let $a : V \times V \to ℝ$ be a bilinear form and $l : V \to ℝ$ a linear form satisfying the assumptions of the Lax-Milgram theorem. Let $V_{h}$ be a closed subspace of $V$ . Then, there exists a unique $u_{h} \in V_{h}$ such that

$\forall v_{h} \in V_{h}, a (u_{h}, v_{h}) = l (v_{h}) and ‖ u - u_{h} ‖ \leq \frac{\tilde{C}}{C^{*}} ‖ u - v ‖, \forall v \in V_{h}$ (43)

where $\tilde{C}$ is the continuity constant and $C^{*}$ the coercivity constant.

Firstly, as a reminder, positivity of the continuity and coercivity constants requires that $ε^{(n)} < 2$ for all $n \in \{0, \dots, N - 1\}$ and $σ_{n} > σ_{n}^{*}$ for all $n \in \{0, \dots, N\}$ . The constants are given by:

$C^{*} (ϵ) = min {min_{\begin{matrix} n = 0, \dots, N - 1 \end{matrix}} (K_{0}^{(n)} (1 - \frac{ε^{(n)}}{2})), σ_{0} - σ_{0}^{*}, σ_{N} - σ_{N}^{*}, min_{\begin{matrix} n = 1, \dots, N - 1 \end{matrix}} (\frac{σ_{n} - σ_{n}^{*}}{2})},$ (44)

and

$\begin{matrix} \tilde{C} (ϵ) = max_{\begin{matrix} n = 0, \dots, N - 1 \end{matrix}} (K_{1}^{(n)}) + \sqrt{max_{\begin{matrix} n = 0, \dots, N - 1 \end{matrix}} (2 ε^{(n)} K_{1}^{(n)})} \sqrt{max (σ_{0}^{*}, σ_{N}^{*}, max_{\begin{matrix} n = 1, \dots, N - 1 \end{matrix}} (\frac{σ_{n}^{*}}{2}))} \\ + max (σ_{0}, σ_{N}, max_{\begin{matrix} n = 1, \dots, N - 1 \end{matrix}} (\frac{σ_{n}}{2})) . \end{matrix}$ (45)

For the sake of simplicity, let us consider that the variable $ε$ is the same for every element: $\forall n \in \{0, \dots, N - 1\}$ , $ε^{(n)} = ε < 2$ . In addition, because the penalization parameters are bounded below, let us consider that they exceed their lower bound by a factor $α$ that is the same for every element:

$\forall α > 1, \forall n \in {1, \dots, N - 1}, σ_{n} = \frac{α}{2 ε} {\tilde{σ}}_{n}^{*}, σ_{0} = \frac{α}{ε} {\tilde{σ}}_{0}^{*}, σ_{N} = \frac{α}{ε} {\tilde{σ}}_{N}^{*} with {\tilde{σ}}_{n}^{*} = \frac{{(K_{1}^{(n)} C_{tr, p - 1}^{(n)})}^{2}}{K_{0}^{(n)}} .$ (46)

Using previous assumptions, it can be noticed that $C^{*}$ and $\tilde{C}$ are functions of $ε$ and $α$ and can be rewritten:

$C^{*} (α, ε) = min {K_{0} (1 - \frac{ε}{2}), \frac{α - 1}{ε} {\tilde{σ}}_{0}, \frac{α - 1}{ε} {\tilde{σ}}_{N}, min_{\begin{matrix} n = 1, \dots, N - 1 \end{matrix}} (\frac{α - 1}{ε} \frac{{\tilde{σ}}_{n}}{4})}$ (47)

and

$\begin{matrix} \tilde{C} (α, ε) = K_{1} + \sqrt{2 ε K_{1}} \sqrt{\frac{1}{ε} max ({\tilde{σ}}_{0}^{*}, {\tilde{σ}}_{N}^{*}, max_{\begin{matrix} n = 1, \dots, N - 1 \end{matrix}} (\frac{{\tilde{σ}}_{n}^{*}}{4}))} \\ + \frac{α}{ε} max ({\tilde{σ}}_{0}, {\tilde{σ}}_{N}, max_{\begin{matrix} n = 1, \dots, N - 1 \end{matrix}} (\frac{{\tilde{σ}}_{n}}{4})) . \end{matrix}$ (48)

One can see that two quantities are involved in the two previous definitions:

${\tilde{σ}}_{\min} = \min {{\tilde{σ}}_{0}, {\tilde{σ}}_{N}, \min_{\begin{matrix} n = 1, \dots, N - 1 \end{matrix}} (\frac{{\tilde{σ}}_{n}}{4})} and {\tilde{σ}}_{\max} = \max {{\tilde{σ}}_{0}, {\tilde{σ}}_{N}, \max_{\begin{matrix} n = 1, \dots, N - 1 \end{matrix}} (\frac{{\tilde{σ}}_{n}}{4})}$ (49)

which yields the final expressions:

$C^{*} (α, ε) = min {K_{0} (1 - \frac{ε}{2}), \frac{α - 1}{ε} {\tilde{σ}}_{\min}} and \tilde{C} (α, ε) = K_{1} + \sqrt{2 K_{1} {\tilde{σ}}_{\max}} + \frac{α}{ε} {\tilde{σ}}_{\max}$ (50)

These new expressions of $C^{*}$ and $\tilde{C}$ show that $C^{*}$ has two different regimes and that $\tilde{C}$ is continuous with respect to $α$ and $ε$ . The aim of this section can now be reformulated as: find $α$ and $ε$ such that $γ (α, ε) = \frac{\tilde{C} (α, ε)}{C^{*} (α, ε)}$ is minimal. First, $C^{*}$ and $\tilde{C}$ are studied separately, then $γ$ is examined. $C^{*}$ has two regimes and is continuous and well defined for all $(α, ε) \in (1, + \infty) \times (0, 2)$ . It can be rewritten as follows:

$\forall (α, ε) \in (1, + \infty) \times (0, 2),$ (51)

$C^{*} (α, ε) = \begin{cases} \frac{α - 1}{ε} {\tilde{σ}}_{\min}, & \text{if } α \leq α^{*} (ε) \\ K_{0} (1 - \frac{ε}{2}), & \text{otherwise} \end{cases} \quad \text{with} \quad α^{*} (ε) = \frac{K_{0}}{2 {\tilde{σ}}_{\min}} ε (2 - ε) + 1.$ (52)

Likewise, $\tilde{C}$ is continuous and well defined for all $(α, ε) \in (1, + \infty) \times (0, 2)$ . $C^{*}$ and $\tilde{C}$ being now explicitly characterized, $γ (α, ε) = \frac{\tilde{C} (α, ε)}{C^{*} (α, ε)}$ can be studied. We look for $(α_{o p t}, ε_{o p t})$ such that $γ$ is minimal, where $γ$ is given by:

$\forall (α, ε) \in (1, + \infty) \times (0, 2),$ (53)

$γ (α, ε) = {\begin{array}{l} \frac{ε}{{\tilde{σ}}_{\min} (α - 1)} (K_{1} + \sqrt{2 K_{1} {\tilde{σ}}_{\max}}) + \frac{α}{α - 1} \frac{{\tilde{σ}}_{\max}}{{\tilde{σ}}_{\min}}, & if α \leq α^{*} (ϵ) \\ \frac{2}{K_{0} (2 - ε)} (K_{1} + \sqrt{2 K_{1} {\tilde{σ}}_{\max}} + \frac{α}{ε} {\tilde{σ}}_{\max}), & otherwise \end{array}$ (54)

We study $γ$ on each of its open subdomains and on the boundary between them. On $D_{1}$ , i.e., for all $ε \in (0, 2)$ and $α \in (α^{*} (ε), + \infty)$ , we have:

$γ (α, ε) = a \frac{1}{2 - ε} + b \frac{α}{ε (2 - ε)} with a = 2 \frac{K_{1} + \sqrt{2 K_{1} {\tilde{σ}}_{\max}}}{K_{0}} and b = 2 \frac{{\tilde{σ}}_{\max}}{K_{0}} .$ (55)

Then, looking at its variations, it gives that:

$\partial_{ε} γ (α, ε) {\begin{array}{l} < 0, & if 0 < ε < ε^{*} \\ = 0, & if ε = ε^{*} \\ > 0, & if ε^{*} < ε < 2 \end{array} and \partial_{α} γ (α, ε) = \frac{b}{ε (2 - ε)} > 0$ (56)

with $ε^{*} = \frac{\sqrt{b (2 a + b)} - b}{a} > 0$ . Finally, noting that $γ \to + \infty$ when $α \to + \infty$ and when $ε \to 0$ or $ε \to 2$ , we deduce that $γ$ is minimal on $D_{1}$ for $ε = ε^{*}$ and $α \to α^{*} (ε^{*})$ .

On $D_{2}$ , i.e., for all $ε \in (0, 2)$ and $α \in (1, α^{*} (ε))$ , we have:

$γ (α, ε) = \frac{ε}{{\tilde{σ}}_{\min} (α - 1)} (K_{1} + \sqrt{2 K_{1} {\tilde{σ}}_{\max}}) + \frac{α}{α - 1} \frac{{\tilde{σ}}_{\max}}{{\tilde{σ}}_{\min}} .$ (57)

Then, looking at its variations, it gives that:

$\begin{array}{l} \partial_{ε} γ (α, ε) = \frac{K_{1} + \sqrt{2 K_{1} {\tilde{σ}}_{\max}}}{{\tilde{σ}}_{\min} (α - 1)} > 0 and \\ \partial_{α} γ (α, ε) = - \frac{1}{{(α - 1)}^{2}} (\frac{ε}{{\tilde{σ}}_{\min}} (K_{1} + \sqrt{2 K_{1} {\tilde{σ}}_{\max}}) + \frac{{\tilde{σ}}_{\max}}{{\tilde{σ}}_{\min}}) < 0. \end{array}$ (58)

Finally, noting that $γ \to + \infty$ when $α \to 1$ , we deduce that $γ$ is minimal on $D_{2}$ for $α \to α^{*} (ε)$ . On the boundary between $D_{1}$ and $D_{2}$ , i.e., for $α = α^{*} (ε)$ and $ε \in (0, 2)$ , we have:

$\begin{array}{l} γ (α^{*} (ε), ε) = a \frac{1}{2 - ε} + b \frac{1}{ε (2 - ε)} + c with a = 2 \frac{K_{1} + \sqrt{2 K_{1} {\tilde{σ}}_{\max}}}{K_{0}}, \\ b = 2 \frac{{\tilde{σ}}_{\max}}{K_{0}} and c = \frac{{\tilde{σ}}_{\max}}{{\tilde{σ}}_{\min}} . \end{array}$ (59)

Then, looking at its variations, it gives that:

$\partial_{ε} γ (α^{*} (ε), ε) {\begin{array}{l} < 0, & if 0 < ε < ε_{o p t} \\ = 0, & if ε = ε_{o p t} \\ > 0, & if ε_{o p t} < ε < 2 \end{array} with ε_{o p t} = \frac{\sqrt{b (2 a + b)} - b}{a} > 0.$ (60)
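The stationarity condition behind Equation (60) can be made explicit. The following short derivation uses only the constants $a$ and $b$ of Equation (59); the constant $c$ does not depend on $ε$ and drops out.

```latex
\partial_{\varepsilon}\,\gamma(\alpha^{*}(\varepsilon),\varepsilon)
  = \frac{a}{(2-\varepsilon)^{2}}
    - \frac{2b\,(1-\varepsilon)}{\varepsilon^{2}(2-\varepsilon)^{2}}
  = \frac{a\varepsilon^{2} + 2b\varepsilon - 2b}{\varepsilon^{2}(2-\varepsilon)^{2}},
\qquad
a\varepsilon^{2} + 2b\varepsilon - 2b = 0
\;\Longleftrightarrow\;
\varepsilon = \frac{-b \pm \sqrt{b\,(2a+b)}}{a}.
```

Only the root with the plus sign is positive, which gives $ε_{o p t}$ . Since $b (2 a + b) < {(a + b)}^{2}$ , this root also satisfies $ε_{o p t} < 1$ , so it always lies inside the admissible interval $(0, 2)$ . The numerator is negative to the left of this root and positive to the right, which is the sign pattern stated in Equation (60).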

The expression of $(α_{o p t}, ε_{o p t})$ can be summarized as follows:

$\begin{array}{l} ε_{o p t} = \frac{\sqrt{b (2 a + b)} - b}{a} with a = 2 \frac{K_{1} + \sqrt{2 K_{1} {\tilde{σ}}_{\max}}}{K_{0}} and b = 2 \frac{{\tilde{σ}}_{\max}}{K_{0}} and \\ α_{o p t} = \frac{K_{0}}{2 {\tilde{σ}}_{\min}} ε_{o p t} (2 - ε_{o p t}) + 1. \end{array}$ (61)

Finally, in one dimension, the auto-calibration of penalization parameters is given by:

$\begin{array}{l} \forall n \in {1, \dots, N - 1}, σ_{n} = \frac{α_{o p t}}{2 ε_{o p t}} {\tilde{σ}}_{n}^{*}, σ_{0} = \frac{α_{o p t}}{ε_{o p t}} {\tilde{σ}}_{0}^{*}, σ_{N} = \frac{α_{o p t}}{ε_{o p t}} {\tilde{σ}}_{N}^{*} \\ with {\tilde{σ}}_{n}^{*} = \frac{{(K_{1}^{(n)} C_{tr, p - 1}^{(n)})}^{2}}{K_{0}^{(n)}} . \end{array}$ (62)
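The one-dimensional auto-calibration of Equations (49), (61) and (62) can be condensed into a few lines. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the trace constant is taken from Equation (37) as $C_{tr, p - 1} = p$ , and the boundary node $x_{N}$ is attached to the last element $I_{N - 1}$ as in Equation (34).

```python
import numpy as np

def gamma(alpha, eps, K0, K1, s_min, s_max):
    """Ratio C~ / C* built from Equation (50)."""
    c_star = min(K0 * (1.0 - eps / 2.0), (alpha - 1.0) / eps * s_min)
    c_tilde = K1 + np.sqrt(2.0 * K1 * s_max) + alpha / eps * s_max
    return c_tilde / c_star

def auto_calibrate_1d(K0_el, K1_el, p):
    """Return (eps_opt, alpha_opt, sigma, constants) for N >= 2 elements.

    K0_el, K1_el : per-element bounds K_0^(n), K_1^(n) of the conductivity
    p            : polynomial degree, p >= 1
    sigma        : penalties sigma_0, ..., sigma_N of Equation (62)
    """
    K0_el = np.asarray(K0_el, dtype=float)
    K1_el = np.asarray(K1_el, dtype=float)
    if K0_el.size < 2 or p < 1:
        raise ValueError("need at least two elements and p >= 1")
    c_tr = float(p)                                  # C_{tr,p-1} = (p - 1) + 1
    s_el = (K1_el * c_tr) ** 2 / K0_el               # sigma-tilde per element
    s_node = np.append(s_el, s_el[-1])               # node N uses element N-1
    K0, K1 = K0_el.min(), K1_el.max()                # global bounds
    interior = s_node[1:-1] / 4.0
    s_min = min(s_node[0], s_node[-1], interior.min())   # Equation (49)
    s_max = max(s_node[0], s_node[-1], interior.max())
    a = 2.0 * (K1 + np.sqrt(2.0 * K1 * s_max)) / K0
    b = 2.0 * s_max / K0
    eps_opt = (np.sqrt(b * (2.0 * a + b)) - b) / a       # Equation (61)
    alpha_opt = K0 / (2.0 * s_min) * eps_opt * (2.0 - eps_opt) + 1.0
    sigma = alpha_opt / (2.0 * eps_opt) * s_node         # interior nodes
    sigma[0] *= 2.0                                      # boundary nodes
    sigma[-1] *= 2.0
    return eps_opt, alpha_opt, sigma, (K0, K1, s_min, s_max)

eps_opt, alpha_opt, sigma, consts = auto_calibrate_1d([1.0] * 4, [2.0] * 4, p=1)
print(eps_opt, alpha_opt, sigma)
print(gamma(alpha_opt, eps_opt, *consts))
```

For this homogeneous example with $K_{0} = 1$ , $K_{1} = 2$ and $p = 1$ , the formulas give $ε_{o p t} = 2 / 3$ and $α_{o p t} = 13 / 9$ . Evaluating `gamma` on a grid of admissible $(α, ε)$ is a simple way to check that no other pair gives a smaller ratio.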

In two dimensions, the auto-calibration of penalization parameters is given by:

$\begin{cases} \forall F \in ℱ^{in}, σ_{E}^{i n} = \frac{α_{o p t}}{2 ε_{o p t}} σ_{E}^{*}, σ_{E_{r}}^{i n} = \frac{α_{o p t}}{2 ε_{o p t}} σ_{E_{r}}^{*} \\ \forall F \in ℱ^{\partial}, σ_{E}^{\partial} = \frac{α_{o p t}}{ε_{o p t}} σ_{E}^{*} \end{cases} \quad \text{with} \quad σ_{E}^{*} = \frac{D^{E} {(K_{1}^{E} C_{tr, p - 1}^{E})}^{2}}{2 K_{0}^{E}}$ (63)

and $D^{E}$ is the number of edges of the element $E$ and $C_{tr, p - 1}^{E}$ is the trace constant defined in Equation (42).

Extension to multi-dimensional meshes. In multi-dimensional settings, the penalization term given in Equation (62) is extended by replacing the one-dimensional element length $h$ with a local characteristic size $d_{E}$ derived from the ratio between the element volume and its boundary surface area, i.e., $d_{E} = | E | / | \partial E |$ . For non-quadrilateral (triangular or polygonal) cells, this characteristic measure ensures that the penalty parameter scales consistently with the element geometry. Therefore, no additional geometric constant is required, and the same penalization formula applies naturally in two and three dimensions.
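For a planar polygonal cell, the characteristic size $d_{E} = | E | / | \partial E |$ reduces to the area divided by the perimeter. The following sketch computes it with the shoelace formula; the function name and the vertex-list input format are assumptions made for the example.

```python
import math

def characteristic_size(vertices):
    """Return d_E = |E| / |dE| for a simple polygon.

    vertices : list of (x, y) tuples, ordered along the boundary
    """
    n = len(vertices)
    if n < 3:
        raise ValueError("a polygon needs at least three vertices")
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]               # wrap around to close the cell
        twice_area += x0 * y1 - x1 * y0              # shoelace formula
        perimeter += math.dist((x0, y0), (x1, y1))
    return 0.5 * abs(twice_area) / perimeter

# Unit square: area 1, perimeter 4.
print(characteristic_size([(0, 0), (1, 0), (1, 1), (0, 1)]))   # 0.25
# Right triangle with unit legs: area 1/2, perimeter 2 + sqrt(2).
print(characteristic_size([(0, 0), (1, 0), (0, 1)]))
```

The same ratio carries over to three dimensions with the cell volume and the total face area, which is the sense in which no extra geometric constant is needed.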

4.5. Convergence of the Discrete Linearized Weak Problem to the Continuous Linearized Weak Problem

Previously, it has been proven that Problem ( ${\tilde{W}}_{h}$ ) has a unique solution. This problem is part of a fixed-point method, and it has been proven in Section 4.2 that the fixed-point problem also has a unique solution. To solve the nonlinear weak formulation Problem ( $W$ ), one more step is needed to prove the well-posedness of the problem. It is addressed in the following: the goal is to prove that the solution of Problem ( ${\tilde{W}}_{h}$ ) converges towards the solution of Problem ( $\tilde{W}$ ) and that the bilinear form ${\tilde{a}}_{h}$ of Problem ( ${\tilde{W}}_{h}$ ) is asymptotically consistent with the bilinear form $\tilde{a}$ of Problem ( $\tilde{W}$ ).

The work in this section is based on the book of Di Pietro and Ern published in 2012 [13]. They proved convergence in the case of a Symmetric Interior Penalty Galerkin method and sketched the proof in the case of an Incomplete Interior Penalty Galerkin method. The following study provides a detailed proof of the IIPG case.

The key idea is to revisit the concept of consistency and introduce a new point of view based on asymptotic consistency. This new form of consistency and the usual stability of the discrete bilinear form are the two main ingredients for asserting convergence to the minimal regularity solutions. The discrete bilinear form $a_{h}$ needs to be reformulated to consider only the contribution of $K$ on the mesh elements, not the interfaces; consequently, lifting operators are introduced. They map functions defined on mesh faces to functions defined on mesh elements. In the context of DG methods, liftings act on interface and boundary jumps. They were introduced by Bassi and Rebay [36] in the context of compressible flows and analyzed by Brezzi et al. [37] in the context of the Poisson problem. Liftings have many useful applications. They can be combined with the gradient to define discrete gradients. Discrete gradients play an essential role in the design and analysis of DG methods. Indeed, they can be used to formulate the discrete problem locally on each element using numerical fluxes.

Liftings: Definition

For any point $x_{n}$ , and for all $φ \in L^{2} ({x_{n}})$ the lifting operator $r_{n}^{p} : L^{2} ({x_{n}}) \to V_{0}^{p} (ℰ_{h})$ is defined as the solution of the following problem:

$\int_{Ω} r_{n}^{p} (φ) τ_{h} d x = {| τ_{h} |}_{x_{n}} φ (x_{n}), \forall τ_{h} \in V_{0}^{p} (ℰ_{h}) .$ (64)

For any $v$ in $V_{0}^{p} (ℰ_{h})$ , the global lifting of its interface and boundary jumps is defined as follows:

$R_{h}^{p} (〚 v 〛) : = \sum_{n = 0}^{N} r_{n}^{p} (〚 v 〛) \in V_{0}^{p} (ℰ_{h}) .$ (65)

Discrete gradients: Definition

The discrete gradient operator $G_{h}^{p} : V_{0}^{p} (ℰ_{h}) \to L^{2} (Ω)$ is defined as follows: for all $v$ in $V_{0}^{p} (ℰ_{h})$ ,

$G_{h}^{p} (v) : = \nabla_{h} v - R_{h}^{p} (〚 v 〛) .$ (66)

In addition, there exists a bound on the discrete gradient operator:

${‖ G_{h}^{p} (v) ‖}_{L^{2} (Ω)} \leq α ‖ v ‖$ (67)

where $‖ \cdot ‖$ is the norm associated with the IIPG formulation defined in Equation (30).

Theorem 11. (Regularity of the limit and weak asymptotic consistency of discrete gradients.) Let $p \geq 0$ . Let $v_{h}$ be a sequence in $V_{0}^{p} (ℰ_{h})$ bounded in the $‖ . ‖$ -norm. Then, there is a function $v \in H_{0}^{1} (Ω)$ such that as $h \to 0$ , up to a subsequence,

$v_{h} \to v strongly in L^{2} (Ω),$ (68)

and for all $p \geq 0$ , the discrete gradients defined by Equation (66) are such that

$G_{h}^{p} (v_{h}) ⇀ v^{'} weakly in L^{2} (Ω) .$ (69)

Proof of Theorem 11 is available in [13] (pp. 194-195).

Because of the form of the IIPG formulation, the modified discrete gradient operator ${\hat{G}}_{h}^{p} : V_{0}^{p} (ℰ_{h}) \to L^{2} (Ω)$ is defined as follows: for all $v$ in $V_{0}^{p} (ℰ_{h})$ ,

${\hat{G}}_{h}^{p} (v) : = \nabla_{h} v .$ (70)

Using liftings and discrete gradients, the surface contributions of the flux in Equation (32) are transformed into volume contributions, which makes the bilinear form ${\tilde{a}}_{h}$ easier to work with. For a given $\bar{u} \in V_{0}^{p} (ℰ_{h})$ , it can be rewritten as follows:

$\begin{matrix} \forall u_{h}, v_{h} \in V_{0}^{p} (ℰ_{h}), \\ {\tilde{a}}_{h} (u_{h}, v_{h}; \bar{u}) = \sum_{n = 0}^{N - 1} \int_{I_{n}} K (x, \bar{u}) \nabla_{h} u_{h} \nabla_{h} v_{h} d x - \sum_{n = 0}^{N} {| K (x, \bar{u}) \nabla_{h} u_{h} |}_{x_{n}} {〚 v_{h} 〛}_{x_{n}} + s_{h} (u_{h}, v_{h}) \\ = \sum_{n = 0}^{N - 1} \int_{I_{n}} K (x, \bar{u}) {\hat{G}}_{h}^{p} (u_{h}) \nabla_{h} v_{h} d x - \sum_{n = 0}^{N} \sum_{m = 0}^{N - 1} \int_{I_{m}} K (x, \bar{u}) r_{n}^{p} (〚 v_{h} 〛) {\hat{G}}_{h}^{p} (u_{h}) + s_{h} (u_{h}, v_{h}) \\ = \sum_{n = 0}^{N - 1} \int_{I_{n}} K (x, \bar{u}) {\hat{G}}_{h}^{p} (u_{h}) \nabla_{h} v_{h} d x - \sum_{n = 0}^{N - 1} \int_{I_{n}} K (x, \bar{u}) R_{h}^{p} (〚 v_{h} 〛) {\hat{G}}_{h}^{p} (u_{h}) + s_{h} (u_{h}, v_{h}) \\ = \sum_{n = 0}^{N - 1} \int_{I_{n}} K (x, \bar{u}) {\hat{G}}_{h}^{p} (u_{h}) G_{h}^{p} (v_{h}) d x + s_{h} (u_{h}, v_{h}) \end{matrix}$ (71)

with

$\begin{array}{l} \forall u_{h}, v_{h} \in V_{0}^{p} (ℰ_{h}), s_{h} (u_{h}, v_{h}) = \frac{σ_{0}}{h} {〚 u_{h} 〛}_{x_{0}} {〚 v_{h} 〛}_{x_{0}} + \sum_{n = 1}^{N - 1} \frac{σ_{n - 1} + σ_{n}}{2 h} {〚 u_{h} 〛}_{x_{n}} {〚 v_{h} 〛}_{x_{n}} \\ + \frac{σ_{N}}{h} {〚 u_{h} 〛}_{x_{N}} {〚 v_{h} 〛}_{x_{N}} . \end{array}$ (72)

Assume that ${(σ_{n})}_{n = 0, \dots, N}$ are chosen according to Lemma 4, which implies discrete coercivity in the $‖ . ‖$ -norm, and hence well-posedness of the discrete linearized Problem ( ${\tilde{W}}_{h}$ ).

Definition 1. (Asymptotic adjoint consistency.) The discrete bilinear form $a_{h}$ is asymptotically adjoint consistent with the exact bilinear form $a$ on $V_{0}^{p} (ℰ_{h})$ if for any sequence $v_{h}$ in $V_{0}^{p} (ℰ_{h})$ bounded in the $‖ . ‖$ -norm and for any smooth function $φ \in C_{0}^{\infty} (Ω)$ , there is a sequence $φ_{h}$ in $V_{0}^{p} (ℰ_{h})$ converging to $φ$ in the $‖ . ‖$ -norm and such that, up to a subsequence,

$\lim_{h \to 0} a_{h} (v_{h}, φ_{h}) = a (v, φ) = \int_{Ω} v^{'} φ^{'} d x$ (73)

where $v \in H_{0}^{1} (Ω)$ is the limit of the subsequence identified in Theorem 11.

Lemma 12. (Asymptotic adjoint consistency of ${\tilde{a}}_{h}$ .) The discrete bilinear form ${\tilde{a}}_{h}$ of Problem ( ${\tilde{W}}_{h}$ ) is asymptotically adjoint consistent with the exact bilinear form $\tilde{a}$ of Problem ( $\tilde{W}$ ) on $V_{0}^{p} (ℰ_{h})$ .

Finally, we deduce the following result.

Theorem 13. (Convergence to minimal regularity solutions.) Let $p \geq 1$ . Let $u_{h}$ be a sequence of approximate solutions generated by solving the discrete linearized Problem ( ${\tilde{W}}_{h}$ ) with ${\tilde{a}}_{h}$ defined by Equation (32) and with penalty parameters ensuring coercivity. Then, as $h \to 0$ ,

$u_{h} \to u strongly in L^{2} (Ω)$ (74)

$\nabla_{h} u_{h} \to u^{'} strongly in L^{2} (Ω)$ (75)

${| u_{h} |}_{J} \to 0$ (76)

where $u \in H_{0}^{1} (Ω)$ is the unique solution of the strong problem.

Proofs of Lemma 12 and Theorem 13 can be found in the Appendix.

4.6. Concluding Results

In this section, several results have been proven. First, there exists a unique solution to Problem ( $W$ ), by Lemma 1 and Lemma 2. Then, for a given $\bar{u}$ , there exists a unique solution to Problem ( ${\tilde{W}}_{h}$ ), by Theorem 3. Lastly, for a given $\bar{u}$ , the solution of Problem ( ${\tilde{W}}_{h}$ ) converges to the solution of Problem ( $\tilde{W}$ ). These results, proven in a general setting for a given $\bar{u}$ , can be used to solve the toy problem. Figure 6 gives a graphical representation of the whole resolution loop with its different paths.

Figure 6. Scheme of the whole loop of resolution with the different linearization methods.

The nonlinear Problem ( $W$ ) can be linearized directly at the continuous level by employing a fixed-point method. The continuous-level linearization

$\begin{array}{l} T : H_{0}^{1} (Ω) \to H_{0}^{1} (Ω) \\ \bar{u} \mapsto T (\bar{u}) = u \end{array}$ (77)

stands for: find $u$ solution of Problem ( $\tilde{W}$ ) for a given $\bar{u} \in H_{0}^{1} (Ω)$ . One can define the sequence given by an initial guess $u^{0} \in H_{0}^{1} (Ω)$ and $u^{n + 1} = T (u^{n})$ for $n \in ℕ$ . Lemma 1 and Lemma 2 ensure that taking $\lim_{n \to \infty} u^{n}$ gives the solution of Problem ( $W$ ).

A discretization step is needed to compute the solution of Problem ( $W$ ). Consequently, the projector $P_{h} : H_{0}^{1} (Ω) \to V_{0}^{p} (ℰ_{h})$ is introduced. It projects a function living in an infinite-dimensional space onto a finite-dimensional space, namely the DG space $V_{0}^{p} (ℰ_{h})$ . Then, at the discrete level, the linearization method

$\begin{array}{l} T_{h} : V_{0}^{p} (ℰ_{h}) \to V_{0}^{p} (ℰ_{h}) \\ \bar{u} \mapsto T_{h} (\bar{u}) = u_{h} \end{array}$ (78)

stands for: find $u_{h}$ discrete solution of Problem ( ${\tilde{W}}_{h}$ ) for a given $\bar{u} \in V_{0}^{p} (ℰ_{h})$ . One can notice that for a given $\bar{u} \in H_{0}^{1} (Ω)$ , it has been proven (Theorem 13) that $(T_{h} \circ P_{h}) (\bar{u}) = u_{h}$ converges to $u = T (\bar{u})$ as $h \to 0$ .

Lastly, the linearization method of Problem ( $W$ ) going through a discretization step is defined as

$\begin{array}{l} T_{D} : H_{0}^{1} (Ω) \to H_{0}^{1} (Ω) \\ \bar{u} \mapsto T_{D} (\bar{u}) = \lim_{h \to 0} (T_{h} \circ P_{h}) (\bar{u}) = u \end{array}$ (79)

Using $T_{D}$ , one can define a new sequence given by an initial guess $v^{0} \in H_{0}^{1} (Ω)$ and $v^{n + 1} = T_{D} (v^{n}) = \lim_{h \to 0} (T_{h} \circ P_{h}) (v^{n})$ for $n \in ℕ$ . Taking the limit when $n$ goes to infinity gives the solution of Problem ( $W$ ).

The method explained above uses two limits: $h$ goes to 0, then $n$ goes to infinity. One can also consider the limits in the opposite order. Using the proof of Lemma 1 applied to the nonlinear discrete problem and then using Theorem 13, one can prove that the solution of the nonlinear discrete problem converges to the solution of the nonlinear continuous problem.

5. Numerical Results

Following the numerical methods and theoretical results presented in the previous sections, the RIVAGE code is validated against numerical test cases. Two analytical test cases are used to compute convergence rates and validate the code. They are built by prescribing the target solution and computing the source term from this solution and the hydraulic conductivity function. Both are based on the nonlinear Poisson equation. The first case is a nonlinear one-dimensional problem in its stationary form. The second case is a nonlinear two-dimensional problem in its stationary form. These numerical experiments are inspired by the literature: Rivière [14] (2008) and Clément et al. [12] (2021) computed convergence rates for linear as well as nonlinear problems.

Stationary problems are considered because the theoretical results are established for this type of problem. Moreover, they are more difficult to solve since they correspond to the solution at infinite time: the nonlinear solver has to find the solution directly, without the help of intermediate time steps.

Experimental test cases are then solved with the RIVAGE code. These problems aim at confirming the performance of the adaptive strategy proposed in this work. They also make it possible to test RIVAGE on problems encountered in hydrology. These experiments are based on the work of Haverkamp et al. [38] and Vauclin et al. [39].

Table 3. Solver and time-integration settings used in all numerical experiments.

Parameter	Symbol/Setting	Description
Nonlinear tolerance	$ε_{NL} = 10^{- 6}$	Residual tolerance for Picard iterations
Maximum Picard iterations	$N_{Pic}^{max} = 40$	Upper bound before step rejection
Time-step size	adaptive	Automatically reduced if residual increases
Minimum time-step	$Δ t_{min} = 10^{- 4} s$	Stability safeguard
Linear solver	Conjugate Gradient (CG)	Preconditioned by ILU(0)
Spatial polynomial degree	$p = 1, 2$	Depending on the test case
Penalty parameters	Equation (63)	Auto-calibrated during iteration

These settings (see Table 3) were kept identical for all test cases unless explicitly stated otherwise.
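The interplay between the iteration cap and the time-step reduction in Table 3 can be sketched as follows. This is only one possible reading of the table: the reduction factor `shrink`, the callback `try_step` and the rule "reject the step when the Picard loop does not converge within the cap" are assumptions introduced for illustration and are not specified in the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SolverSettings:
    nonlinear_tol: float = 1e-6   # epsilon_NL in Table 3
    max_picard_iter: int = 40     # N_Pic^max in Table 3
    dt_min: float = 1e-4          # Delta t_min in Table 3, in seconds
    shrink: float = 0.5           # assumed reduction factor (not given in the text)

def advance_one_step(try_step, dt, settings):
    """Retry a time step with a smaller dt until the nonlinear solve is accepted.

    try_step(dt, tol, max_iter) is a hypothetical user-supplied callback that
    returns True when the Picard loop met the tolerance within max_iter.
    Returns the accepted dt; raises if dt would drop below dt_min.
    """
    while True:
        if try_step(dt, settings.nonlinear_tol, settings.max_picard_iter):
            return dt
        dt *= settings.shrink      # step rejected: reduce the time step and retry
        if dt < settings.dt_min:
            raise RuntimeError("time step fell below dt_min")
```

For stationary problems such as those of Sections 5.1 and 5.2 there is no time step to reduce, so only the tolerance and the iteration cap apply.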

5.1. One-Dimensional Analytical Test Case

For this first test case, the theoretical convergence rates of the IIPG method are checked, and numerical stability is evaluated with respect to penalty values and penalization methods. The following problem is considered:

$\begin{cases} - (K (u) u^{'})^{'} = f (x) & \text{in } Ω = [- 1, 1], \\ u (- 1) = 1, \\ u (1) = - 1, \end{cases}$ (80)

with $K (u) = \tanh (5 u) + 1.01$ and $f$ obtained by replacing $u$ by $u_{e x}$ in the problem. The chosen analytical solution is $u_{e x} (x) = - \sin (\frac{π}{2} x)$ . It is deliberately not polynomial and takes all values in $[- 1, 1]$ . The hydraulic conductivity is chosen so that the problem is nonlinear, with a shape similar to the laws given in Table 1: $\tanh$ is a smooth function, convenient for the computation of convergence rates, that resembles the constitutive laws used for RE. Moreover, there is a factor of about 200 between the maximum and the minimum value of $K$ , with $K_{0} = 0.01$ . The problem is solved with the IIPG method. Three types of penalization are used: in the first one, $σ^{E} = σ = 1$ for all $E \in ℰ$ ; in the second one, $σ^{E} = σ = 100$ for all $E \in ℰ$ ; in the third one, the $σ^{E}$ are auto-calibrated using the method presented in Section 4. For each type of penalization, the solution is approximated by a piecewise linear function ( $p = 1$ ), a piecewise quadratic function ( $p = 2$ ), and a piecewise cubic function ( $p = 3$ ). Lastly, four different mesh sizes are used, $N_{x} = 20, 40, 80, 160$ , where $N_{x}$ is the number of elements in the equally spaced partition of Ω.
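The source term of Problem (80) follows from the product rule, $f = - (K^{'} (u_{e x}) (u_{e x}^{'})^{2} + K (u_{e x}) u_{e x}^{''})$ . The sketch below evaluates it with the Python standard library only; the function names are illustrative and the derivatives were worked out by hand for this example, so they should be checked against any reference implementation.

```python
import math

def u_ex(x):
    return -math.sin(0.5 * math.pi * x)

def du_ex(x):
    return -0.5 * math.pi * math.cos(0.5 * math.pi * x)

def d2u_ex(x):
    return (0.5 * math.pi) ** 2 * math.sin(0.5 * math.pi * x)

def K(u):
    return math.tanh(5.0 * u) + 1.01

def dK(u):
    # d/du tanh(5u) = 5 (1 - tanh(5u)^2)
    return 5.0 * (1.0 - math.tanh(5.0 * u) ** 2)

def flux(x):
    """K(u_ex) u_ex', the quantity whose derivative defines the source term."""
    return K(u_ex(x)) * du_ex(x)

def source(x):
    """f = -(K(u) u')' = -(K'(u) (u')^2 + K(u) u'') evaluated at u = u_ex."""
    u = u_ex(x)
    return -(dK(u) * du_ex(x) ** 2 + K(u) * d2u_ex(x))

# Contrast of K over the range of the exact solution, u in [-1, 1].
contrast = K(1.0) / K(-1.0)
```

Comparing `source(x)` with a centered finite difference of `flux` is a quick consistency check, and `contrast` evaluates the ratio between the largest and smallest conductivity reached by the exact solution.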

Table 4. $L^{2}$ -error, convergence rates and computation times for the one-dimensional benchmark.

		$p = 1$			$p = 2$			$p = 3$
$σ$	$N_{x}$	$L^{2}$ -error	$r$	$t (s)$	$L^{2}$ -error	$r$	$t (s)$	$L^{2}$ -error	$r$	$t (s)$
1	20	3.21 × 10⁻¹		0.21	1.33 × 10⁻¹		1.94	2.29 × 10⁻⁴		6.21
-	40	1.29 × 10⁻¹	1.31	0.46	3.41 × 10⁻²	1.97	3.17	1.42 × 10⁻⁵	4.01	11.85
-	80	3.77 × 10⁻²	1.78	1.02	8.53 × 10⁻³	2.00	5.97	8.88 × 10⁻⁷	4.00	23.94
-	160	9.83 × 10⁻³	1.94	2.08	2.13 × 10⁻³	2.00	12.09	5.62 × 10⁻⁸	3.98	52.83
-	Fitted		1.69			1.99			4.00
100	20	8.33 × 10⁻³		0.21	1.36 × 10⁻³		1.48	2.33 × 10⁻⁶		5.89
-	40	2.10 × 10⁻³	1.99	0.51	3.41 × 10⁻⁴	2.00	2.97	1.44 × 10⁻⁷	4.02	11.87
-	80	5.27 × 10⁻⁴	2.00	1.03	8.53 × 10⁻⁵	2.00	5.94	9.74 × 10⁻⁹	3.89	24.03
-	160	1.31 × 10⁻⁴	2.00	2.08	2.13 × 10⁻⁵	2.00	12.16	1.37 × 10⁻⁹	2.83	53.20
-	Fitted		1.99			2.00			3.61
auto	20	3.53 × 10⁻²		0.24	1.69 × 10⁻²		1.43	3.46 × 10⁻⁶		5.88
-	40	8.88 × 10⁻³	1.99	0.51	4.40 × 10⁻³	1.94	2.86	1.61 × 10⁻⁷	4.42	11.92
-	80	2.17 × 10⁻³	2.03	1.05	1.13 × 10⁻³	1.95	5.95	9.45 × 10⁻⁹	4.10	24.07
-	160	5.32 × 10⁻⁴	2.03	2.55	2.90 × 10⁻⁴	1.97	12.13	1.33 × 10⁻⁹	2.82	53.25
-	Fitted		2.02			1.96			3.81

Figure 7. Penalization parameters for the one-dimensional test case in the case of auto-penalization.

Table 4 shows the $L^{2}$ -error and convergence rate for each computation. The computed convergence rates correspond to the theoretical ones found in the literature [14] and [40] (pp. 64-84): for the IIPG formulation with penalization, the order is $p + 1$ if $p$ is odd, which is optimal, and $p$ if $p$ is even, which is suboptimal. Moreover, for a penalization parameter set by the user to 1 (outside of the range specified by the theoretical results), errors are up to about 100 times greater than in the other computations: the fixed-point method converges to a less accurate solution. Computation times are also given. Auto-penalization is not significantly slower than user-defined penalization and can even be faster thanks to the quicker convergence of the iterative method.

Moreover, Figure 7 shows the penalization values in the case of auto-calibration. One can observe that they are not constant on Ω and vary according to the polynomial degree of approximation. Some parts of the domain need only a small amount of penalization, whereas others need a larger amount.
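The rates $r$ reported in Table 4 can be recomputed from the tabulated errors. The sketch below assumes that $r$ is the base-2 logarithm of the ratio of errors on two successive meshes, and that the "Fitted" row is a least-squares slope of $\log$ (error) against $\log h$ ; the text does not state the fitting procedure, so the second assumption may differ from what was actually used.

```python
import math

def observed_rates(errors):
    """Rates between successive meshes when h is halved each time."""
    return [math.log2(e0 / e1) for e0, e1 in zip(errors, errors[1:])]

def fitted_rate(n_elements, errors):
    """Least-squares slope of log(error) against log(h), with h proportional to 1/N."""
    xs = [-math.log(n) for n in n_elements]   # log h up to an additive constant
    ys = [math.log(e) for e in errors]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den

# L2 errors copied from Table 4 (sigma = 100, p = 1).
n_x = [20, 40, 80, 160]
err = [8.33e-3, 2.10e-3, 5.27e-4, 1.31e-4]
rates = observed_rates(err)
slope = fitted_rate(n_x, err)
```

Because the tabulated errors are rounded to three significant digits, values recomputed this way can differ from the table in the last digit.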

5.2. Two-Dimensional Analytical Test Case

This second experiment focuses on the ability of the IIPG method to solve RE in two dimensions. Its convergence rates are computed, and numerical stability is evaluated with respect to penalty values and penalization methods. The following problem is considered:

$\begin{cases} - \nabla \cdot (K (u) \nabla u) = f (x) & \text{in } Ω = [- 1, 1] \times [- 1, 1], \\ u = 0 & \text{on } \partial Ω, \end{cases}$ (81)

with $K (u) = \tanh (u) + 1.01$ and, similarly to the previous test case, $f$ is obtained by replacing $u$ by $u_{e x}$ in the problem. The chosen analytical solution is $u_{e x} (x, y) = \sin (\frac{π}{2} x) \sin (\frac{π}{2} y)$ . The problem is solved similarly to the one-dimensional test case. Three types of penalization are used: in the first one, $σ^{E} = σ = 1$ for all $E \in ℰ$ ; in the second one, $σ^{E} = σ = 100$ for all $E \in ℰ$ ; in the third one, the $σ^{E}$ are auto-calibrated. For each type of penalization, the solution is approximated by a piecewise linear function ( $p = 1$ ), a piecewise quadratic function ( $p = 2$ ), and a piecewise cubic function ( $p = 3$ ). Lastly, three different meshes are used. They are all composed of quadrilaterals of identical size, and each space direction is discretized with $N = 10, 20, 40$ elements, which gives meshes with $N_{E} = 100, 400, 1600$ elements.
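For reference, the source term can be written out explicitly for the stated $K$ and $u_{e x}$ . The derivation below is a sketch obtained by applying the product rule; it is not taken from the original text, and the shorthand $s_{x} = \sin (\frac{π}{2} x)$ , $c_{x} = \cos (\frac{π}{2} x)$ , $s_{y} = \sin (\frac{π}{2} y)$ , $c_{y} = \cos (\frac{π}{2} y)$ is introduced here only for readability, so that $u_{e x} = s_{x} s_{y}$ :

$\begin{array}{l} \nabla u_{e x} = \frac{π}{2} (c_{x} s_{y}, s_{x} c_{y}), \quad Δ u_{e x} = - \frac{π^{2}}{2} u_{e x}, \quad K^{'} (u) = 1 - \tanh^{2} (u), \\ f = - \nabla \cdot (K (u_{e x}) \nabla u_{e x}) = - K^{'} (u_{e x}) {| \nabla u_{e x} |}^{2} - K (u_{e x}) Δ u_{e x} \\ = - \frac{π^{2}}{4} (1 - \tanh^{2} (u_{e x})) (c_{x}^{2} s_{y}^{2} + s_{x}^{2} c_{y}^{2}) + \frac{π^{2}}{2} (\tanh (u_{e x}) + 1.01) u_{e x} . \end{array}$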

Table 5. $L^{2}$ -error, convergence rates and computation times for the two-dimensional benchmark.

		$p = 1$			$p = 2$			$p = 3$
$σ$	$N_{x}$	$L^{2}$ -error	$r$	$t (s)$	$L^{2}$ -error	$r$	$t (s)$	$L^{2}$ -error	$r$	$t (s)$
1	10	6.45 × 10⁻²		0.29	4.83 × 10⁻²		1.54	8.60 × 10⁻⁴		5.07
-	20	1.51 × 10⁻²	1.99	1.00	1.11 × 10⁻²	2.11	7.21	4.69 × 10⁻⁵	4.20	27.21
-	40	3.53 × 10⁻³	2.10	7.57	2.65 × 10⁻³	2.07	59.51	2.74 × 10⁻⁶	4.09	279.59
-	Fitted		2.10			2.09			4.15
100	10	3.80 × 10⁻²		0.25	2.02 × 10⁻³		1.14	7.32 × 10⁻⁵		4.83
-	20	9.53 × 10⁻³	1.99	0.99	2.72 × 10⁻⁴	2.90	6.78	4.59 × 10⁻⁶	4.00	30.23
-	40	2.38 × 10⁻³	2.00	8.37	4.08 × 10⁻⁵	2.74	61.62	2.87 × 10⁻⁷	4.00	290.86
-	Fitted		2.00			2.82			4.00
auto	10	3.37 × 10⁻²		0.25	2.52 × 10⁻³		1.15	7.41 × 10⁻⁵		4.93
-	20	8.11 × 10⁻³	2.06	1.03	5.90 × 10⁻⁴	2.09	6.88	4.71 × 10⁻⁶	3.98	30.15
-	40	2.02 × 10⁻³	2.00	8.49	1.51 × 10⁻⁴	1.96	60.85	2.97 × 10⁻⁷	3.99	288.50
-	Fitted		2.03			2.03			3.98

Table 5 shows the $L^{2}$ -error and convergence rate for each computation. The computed convergence rates correspond to the theoretical ones found in the literature [14] and [40] (pp. 64-84): for the IIPG formulation with penalization, the order is $p + 1$ if $p$ is odd, which is optimal, and $p$ if $p$ is even, which is suboptimal. Moreover, for a penalization parameter set by the user to 1 (outside of the range specified by the theoretical results), errors are greater than in the other computations, by a factor of up to about 65 (for $p = 2$ on the finest mesh): the fixed-point method converges to a less accurate solution. Computation times are also given. As in the one-dimensional case, auto-penalization is not significantly slower than user-defined penalization and can even be faster thanks to the quicker convergence of the iterative method.

Moreover, Figure 8 shows the penalization values in the case of auto-calibration. One can observe that they are not constant on Ω and vary according to the polynomial degree of approximation. Some parts of the domain need only a small amount of penalization, whereas others need a larger amount.

Figure 8. Penalization parameters for the two-dimensional test case in the case of auto-penalization.

5.3. Application to Groundwater Flows I: Haverkamp’s Test Case

The two problems considered here, one-dimensional and two-dimensional, aim to validate the numerical resolution of RE using DG methods and auto-calibration of penalization parameters. Numerical results are compared to numerical simulations in the literature and experimental data.

The first experimental validation of solving RE with DG methods is a one-dimensional test case. The numerical results are compared with data sourced from the literature. This particular numerical test case was initially presented by Celia et al. [41]. It is based on an experiment conducted by Haverkamp et al. [38], who referred to the availability of a quasi-analytical solution provided by Philip [42]. Subsequently, it was used by others such as [43] [44], and represents a set of well-established test cases, for instance, see [30]. Despite its simplicity, this case offers insights into the fundamental physics of a wetting front resulting from infiltration.

This scenario involves the one-dimensional infiltration into a soil column measuring 40 cm in height and 8 cm in width. The hydraulic head at the top and bottom is governed by Dirichlet boundary conditions: $h_{t o p} = 19.3 cm$ and $h_{b o t t o m} = - 61.5 cm$ , resulting in cumulative downward infiltration. The sides are impermeable. The initial condition is $h_{0} = - 61.5 + z cm$ . Although this case is one-dimensional, it is solved on a two-dimensional domain. Therefore, homogeneous Neumann boundary conditions are applied along the boundary in the infiltration direction. For a visual representation of this setup, refer to Figure 9.

Figure 9. Haverkamp’s test case configuration.

Hydraulic properties use Vachaud’s relations in Table 1 with $A = 1.175 \times 10^{6}$ , $B = 4.74$ , $C = 1.611 \times 10^{6}$ , $D = 3.96$ , $K_{s} = 0.0094 cm \cdot s^{- 1}$ , $θ_{s} = 0.287$ , and $θ_{r} = 0.075$ . The simulation is done on a mesh of 160 elements along the $z$ -axis. The solution is piecewise linear ( $p = 1$ ), and time integration is BDF of order 2. Penalization parameters are set automatically using results from Section 4. In addition, stopping criteria are set to 10⁻⁶ for this computation. The solution to this problem is computed at $T = 600 s$ .
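The constitutive laws can be sketched in a few lines. Table 1 is not reproduced in this section, so the sketch assumes that the relations take the form commonly used for this benchmark, namely $K (ψ) = K_{s} A / (A + {| ψ |}^{B})$ and $θ (ψ) = θ_{r} + C (θ_{s} - θ_{r}) / (C + {| ψ |}^{D})$ for $ψ < 0$ , with saturated values for $ψ \geq 0$ . If Table 1 defines them differently, the functions below should be adapted; the function names are illustrative.

```python
def conductivity(psi, Ks=0.0094, A=1.175e6, B=4.74):
    """Assumed form K(psi) = Ks * A / (A + |psi|^B) for psi < 0 (psi in cm, K in cm/s)."""
    if psi >= 0.0:
        return Ks                      # saturated zone
    return Ks * A / (A + abs(psi) ** B)

def water_content(psi, theta_s=0.287, theta_r=0.075, C=1.611e6, D=3.96):
    """Assumed form theta(psi) = theta_r + C (theta_s - theta_r) / (C + |psi|^D) for psi < 0."""
    if psi >= 0.0:
        return theta_s                 # saturated zone
    return theta_r + C * (theta_s - theta_r) / (C + abs(psi) ** D)

# Values at the initial pressure head of the column (psi = -61.5 cm).
K_init = conductivity(-61.5)
theta_init = water_content(-61.5)
```

Under these assumptions the conductivity drops by more than two orders of magnitude between saturation and the initial state, which is the source of the steep wetting front.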

Figure 10 displays the comparison of numerical results with results from Manzini et al. [44], the pressure head distribution at $t = 360 s$ and the penalization parameters distribution at $t = 360 s$ . Numerical results are in good agreement with the literature results for this test case. The pressure head distribution shows a vertical progression of the wetting front with a steep transition from the initial $ψ$ to $ψ$ imposed at the boundary condition. Moreover, the distribution of penalization parameters shows that the penalization parameters are not constant on the whole domain and are higher on the wetting front.

This test case validates the DG method on a realistic, time-dependent problem. Moreover, it gives a good insight into the behavior of automatic penalization: penalization parameters are recalibrated as the solution evolves. Automatic penalization also benefits the full nonlinear problem, because the nonlinear solver needs fewer iterations to converge to the solution.

Figure 10. Haverkamp’s test case, numerical solution for 160 elements, $p = 1$ and BDF-2 method.

5.4. Application to Groundwater Flows II: Vauclin’s Test Case

Vauclin, Vachaud, and Khanji conducted a series of laboratory experiments in the 1970s, the details of which can be found in [39]. These experiments explored water table recharge and drainage in a slab of sandy soil. The work by Vauclin et al. [39] specifically focuses on simulating water flow recharge through a soil slab and provides experimental details and results. The experiment involved a 6 m by 2 m box, with only one half simulated due to symmetry. The left, top (for $x > 50 cm$ ), and bottom sides were impervious, with a prescribed constant flux on the top for $x \leq 50 cm$ of $u_{g} \cdot n = - 14.8 cm \cdot h^{- 1}$ . The water level was maintained at a constant $h = 65 cm$ in the ditch on the right for $z \leq 65 cm$ , while the remaining boundary on the right for $z > 65 cm$ accounted for a seepage boundary condition. The initial state was at hydrostatic equilibrium with the water table at $z = 65 cm$ . For further reference, please see Figure 11 for a schematic representation of the setup. The complete simulation of water table recharge by Vauclin et al. [39] has been used by numerous studies to evaluate their methods (see, for instance, [45]-[47]). The MODFLOW code validation partially relies on this experimental dataset [31].

Hydraulic properties use Vachaud’s relations in Table 1 with $A = 2.99 \times 10^{6}$ , $B = 5.0$ , $C = 40000$ , $D = 2.9$ , $K_{s} = 35 cm \cdot h^{- 1}$ , $θ_{s} = 0.3$ , and $θ_{r} = 0.0$ . The simulation is carried out on an evolving mesh, which is adapted during the computation according to the gradient of $h$ . Mesh adaptation parameters are set to $β_{c} = 50$ and $β_{r} = 50$ . The solution is sought as a piecewise quadratic function ( $p = 2$ ) and time integration is BDF of order 3. Penalization parameters are set automatically using results from Section 4. In addition, stopping criteria are set to 10⁻⁶ for this computation. The solution of this problem is computed until $T = 10 h$ .

Figure 11. Vauclin’s test case configuration.

In the initial mesh displayed in Figure 12, the refinement below the water entry edge aims to assist in simulating the steep wetting front. Figure 13 compares the water table’s position at $t = 2, 3, 4, 8 h$ with data from Vauclin et al. [39]. The numerical results closely match the experimental profile, although there are small discrepancies in the middle of the water table, which may be explained by the fact that the sandy soil is not perfectly isotropic and homogeneous.

Figure 14 and Figure 15 illustrate the field distribution of hydraulic head, flux, and the positions of the water table and capillary fringe at $θ = 0.29$ . These figures also show the isolines of the hydraulic head. The numerical results are in agreement with the data from Vauclin et al. [39].

Additionally, in Figure 16, the evolution of penalization parameters during the computation is presented. At selected times, the evolution of the mesh reflects the capture of the steep front.

Finally, Figure 17 displays the evolution of time-steps and the number of elements over time. The adaptation of time steps and the number of elements is evident, with the time steps initially small due to the strong nonlinearity induced by the steep wetting front. As the front smoothens, the number of elements decreases, stabilizing at $N_{e l e} = 600$ after $t = 3 h$ .

This test case is a typical problem where auto-calibration of penalization parameters is essential. Since the problem is strongly nonlinear and evolving, with a basic penalization and user-defined parameters the nonlinear solver either fails to capture the solution or requires a combination of a fixed-point solver and the Newton-Raphson method, as in the work of [7].

Discussions on possible limitations. A potential limitation of the proposed approach arises when hydraulic parameters (e.g., $K_{s}$ , $α$ , $n$ ) exhibit abrupt spatial variations between adjacent elements. In such cases, the optimal scaling of the penalty parameter may deteriorate, leading to sub-optimal convergence rates. Similarly, the application of capillary-pressure regularization can alter the local nonlinearity of the constitutive laws, thereby reducing the effectiveness of the automatic penalization scheme. Future work will investigate adaptive penalty strategies that account for local parameter contrasts and regularized constitutive models.

Figure 12. Vauclin’s test case, initial mesh.

Figure 13. Vauclin’s test case, numerical water table position compared to experimental data from Vauclin et al. [39].

Figure 14. Vauclin’s test case, at $t = 3 h$ , spatial distribution of hydraulic head, water table position (white line), contour plot of hydraulic head (red lines) and flux (arrows).

Figure 15. Vauclin’s test case, at $t = 8 h$ , spatial distribution of hydraulic head, water table position (white line), contour plot of hydraulic head (red lines) and flux (arrows).

Figure 16. Vauclin’s test case, spatial distribution of penalization parameters and mesh at selected times.

Figure 17. Vauclin’s test case, evolution along time of time-steps (left) and number of elements (right).

Acknowledgements

This work has been supported by the ADEN-MED project (Adaptability to Extreme events and Natural risks-application to the Mediterranean and Djibouti), funded by the Région Sud Provence-Alpes-Côte d’Azur under the AAP MEDCLIMAT “Natural risks and food sovereignty”, and by France 2030 through the Priority Research Program and Equipment (PEPR) “Maths-Vives-Mathematics in Interactions”, targeted project HYDRAUMATH (ANR-23-EXMA-007), operated by ANR.

Appendix

Proofs on Theoretical Results

Proof of Lemma 4. For a given $\bar{u} \in V_{0}^{p} (ℰ_{h})$ , choosing $v_{h} = u_{h}$ in (32) yields

$\begin{array}{l} \forall u_{h} \in V_{0}^{p} (ℰ_{h}), {\tilde{a}}_{h} (u_{h}, u_{h}) = \sum_{n = 0}^{N - 1} \int_{I_{n}} K (x, \bar{u}) {({u^{'}}_{h})}^{2} d x - \sum_{n = 0}^{N} {| K (x, \bar{u}) {u^{'}}_{h} |}_{x_{n}} {〚 u_{h} 〛}_{x_{n}} \\ + \frac{σ_{0}}{h} {〚 u_{h} 〛}_{x_{0}}^{2} + \sum_{n = 1}^{N - 1} \frac{σ_{n - 1} + σ_{n}}{2 h} {〚 u_{h} 〛}_{x_{n}}^{2} + \frac{σ_{N}}{h} {〚 u_{h} 〛}_{x_{N}}^{2} \end{array}$ (82)

An upper bound to the term $\sum_{n = 0}^{N} {| K (x, \bar{u}) {u^{'}}_{h} |}_{x_{n}} {〚 u_{h} 〛}_{x_{n}}$ needs to be established to prove the coercivity of ${\tilde{a}}_{h}$ . Using Hypothesis ( $ℋ_{n}$ ) and definition of average:

$\begin{array}{l} \forall n \in \{ 1, \dots, N - 1 \} , \\ | {| K (x, \bar{u}) {u^{'}}_{h} |}_{x_{n}} | \leq \frac{1}{2} (| K (x_{n}^{-}, \bar{u} (x_{n}^{-})) {u^{'}}_{h} (x_{n}^{-}) | + | K (x_{n}^{+}, \bar{u} (x_{n}^{+})) {u^{'}}_{h} (x_{n}^{+}) |) \\ \leq \frac{K_{1}^{(n - 1)}}{2} | {u^{'}}_{h} (x_{n}^{-}) | + \frac{K_{1}^{(n)}}{2} | {u^{'}}_{h} (x_{n}^{+}) | \end{array}$ (83)

Recalling the trace inequality [48] in the case of an orthonormal polynomial basis: for an interval $I_{n}$ ,

$\forall u \in ℙ^{p} (I_{n}), | u (x_{n}^{+}) | \leq C_{tr, p} \frac{{‖ u ‖}_{L^{2} (I_{n})}}{\sqrt{h}}, | u (x_{n + 1}^{-}) | \leq C_{tr, p} \frac{{‖ u ‖}_{L^{2} (I_{n})}}{\sqrt{h}}$ (84)

we get, $\forall n \in \{ 1, \dots, N - 1 \}$ :

$\begin{matrix} | {| K (x, \bar{u}) {u^{'}}_{h} |}_{x_{n}} | | {〚 u_{h} 〛}_{x_{n}} | \leq (\frac{K_{1}^{(n - 1)}}{2} \frac{C_{tr, p - 1}^{(n - 1)}}{\sqrt{h}} {‖ {u^{'}}_{h} ‖}_{I_{n - 1}} + \frac{K_{1}^{(n)}}{2} \frac{C_{tr, p - 1}^{(n)}}{\sqrt{h}} {‖ {u^{'}}_{h} ‖}_{I_{n}}) | {〚 u_{h} 〛}_{x_{n}} | \\ \leq \sqrt{ε^{(n - 1)}} \sqrt{K_{0}^{(n - 1)}} {‖ {u^{'}}_{h} ‖}_{I_{n - 1}} \frac{K_{1}^{(n - 1)}}{2 \sqrt{ε^{(n - 1)}} \sqrt{K_{0}^{(n - 1)}}} \frac{C_{tr, p - 1}^{(n - 1)}}{\sqrt{h}} | {〚 u_{h} 〛}_{x_{n}} | \\ + \sqrt{ε^{(n)}} \sqrt{K_{0}^{(n)}} {‖ {u^{'}}_{h} ‖}_{I_{n}} \frac{K_{1}^{(n)}}{2 \sqrt{ε^{(n)}} \sqrt{K_{0}^{(n)}}} \frac{C_{tr, p - 1}^{(n)}}{\sqrt{h}} | {〚 u_{h} 〛}_{x_{n}} | \end{matrix}$ (85)

At the boundary nodes $x_{0}$ and $x_{N}$ , we have

$| {| K (x, \bar{u}) {u^{'}}_{h} |}_{x_{0}} | | {〚 u_{h} 〛}_{x_{0}} | \leq \sqrt{ε^{(0)}} \sqrt{K_{0}^{(0)}} {‖ {u^{'}}_{h} ‖}_{I_{0}} \frac{K_{1}^{(0)}}{\sqrt{ε^{(0)}} \sqrt{K_{0}^{(0)}}} \frac{C_{tr, p - 1}^{(0)}}{\sqrt{h}} | {〚 u_{h} 〛}_{x_{0}} |$ (86)

$| {| K (x, \bar{u}) {u^{'}}_{h} |}_{x_{N}} | | {〚 u_{h} 〛}_{x_{N}} | \leq \sqrt{ε^{(N - 1)}} \sqrt{K_{0}^{(N - 1)}} {‖ {u^{'}}_{h} ‖}_{I_{N - 1}} \frac{K_{1}^{(N - 1)}}{\sqrt{ε^{(N - 1)}} \sqrt{K_{0}^{(N - 1)}}} \frac{C_{tr, p - 1}^{(N - 1)}}{\sqrt{h}} | {〚 u_{h} 〛}_{x_{N}} |$ (87)

Gathering the bounds on the boundary and the interior nodes, we get

$\begin{array}{l} \sum_{n = 0}^{N} | {| K (x, \bar{u}) {u^{'}}_{h} |}_{x_{n}} | | {〚 u_{h} 〛}_{x_{n}} | \\ \leq \sqrt{ε^{(0)}} \sqrt{K_{0}^{(0)}} {‖ {u^{'}}_{h} ‖}_{I_{0}} \frac{K_{1}^{(0)}}{\sqrt{ε^{(0)}} \sqrt{K_{0}^{(0)}}} \frac{C_{tr, p - 1}^{(0)}}{\sqrt{h}} | {〚 u_{h} 〛}_{x_{0}} | \\ + \sum_{n = 1}^{N - 1} (\sqrt{ε^{(n - 1)}} \sqrt{K_{0}^{(n - 1)}} {‖ {u^{'}}_{h} ‖}_{I_{n - 1}} \frac{K_{1}^{(n - 1)}}{2 \sqrt{ε^{(n - 1)}} \sqrt{K_{0}^{(n - 1)}}} \frac{C_{tr, p - 1}^{(n - 1)}}{\sqrt{h}} | {〚 u_{h} 〛}_{x_{n}} | \\ + \sqrt{ε^{(n)}} \sqrt{K_{0}^{(n)}} {‖ {u^{'}}_{h} ‖}_{I_{n}} \frac{K_{1}^{(n)}}{2 \sqrt{ε^{(n)}} \sqrt{K_{0}^{(n)}}} \frac{C_{tr, p - 1}^{(n)}}{\sqrt{h}} | {〚 u_{h} 〛}_{x_{n}} |) \\ + \sqrt{ε^{(N - 1)}} \sqrt{K_{0}^{(N - 1)}} {‖ {u^{'}}_{h} ‖}_{I_{N - 1}} \frac{K_{1}^{(N - 1)}}{\sqrt{ε^{(N - 1)}} \sqrt{K_{0}^{(N - 1)}}} \frac{C_{tr, p - 1}^{(N - 1)}}{\sqrt{h}} | {〚 u_{h} 〛}_{x_{N}} | \end{array}$ (88)

Then, using Cauchy-Schwarz’s and Young’s inequality, we have:

$\begin{array}{l} \sum_{n = 0}^{N} {| K (x, \bar{u}) {u^{'}}_{h} |}_{x_{n}} {〚 u_{h} 〛}_{x_{n}} \\ \leq \sum_{n = 0}^{N - 1} \frac{ε^{(n)} K_{0}^{(n)}}{2} {‖ {u^{'}}_{h} ‖}_{I_{n}}^{2} + \frac{{(K_{1}^{(0)} C_{tr, p - 1}^{(0)})}^{2}}{ε^{(0)} K_{0}^{(0)}} \frac{{〚 u_{h} 〛}_{x_{0}}^{2}}{h} + \frac{{(K_{1}^{(N - 1)} C_{tr, p - 1}^{(N - 1)})}^{2}}{ε^{(N - 1)} K_{0}^{(N - 1)}} \frac{{〚 u_{h} 〛}_{x_{N}}^{2}}{h} \\ + \sum_{n = 1}^{N - 1} (\frac{{(K_{1}^{(n - 1)} C_{tr, p - 1}^{(n - 1)})}^{2}}{2 ε^{(n - 1)} K_{0}^{(n - 1)}} \frac{{〚 u_{h} 〛}_{x_{n}}^{2}}{2 h} + \frac{{(K_{1}^{(n)} C_{tr, p - 1}^{(n)})}^{2}}{2 ε^{(n)} K_{0}^{(n)}} \frac{{〚 u_{h} 〛}_{x_{n}}^{2}}{2 h}) \end{array}$ (89)

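From the above inequality, the expression (82) and the lower bound $K_{0}^{(n)}$ of $K$ on $I_{n}$ , we deduce a lower bound of ${\tilde{a}}_{h} (u_{h}, u_{h}; \bar{u}), \forall u_{h} \in V_{0}^{p} (ℰ_{h})$ :

$\begin{array}{l} {\tilde{a}}_{h} (u_{h}, u_{h}; \bar{u}) \geq \sum_{n = 0}^{N - 1} K_{0}^{(n)} (1 - \frac{ε^{(n)}}{2}) {‖ {u^{'}}_{h} ‖}_{I_{n}}^{2} + \frac{σ_{0} - σ_{0}^{*}}{h} {〚 u_{h} 〛}_{x_{0}}^{2} \\ + \sum_{n = 1}^{N - 1} \frac{(σ_{n - 1} - σ_{n - 1}^{*}) + (σ_{n} - σ_{n}^{*})}{2 h} {〚 u_{h} 〛}_{x_{n}}^{2} + \frac{σ_{N} - σ_{N}^{*}}{h} {〚 u_{h} 〛}_{x_{N}}^{2} \end{array}$ (90)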

where

$\begin{cases} σ_{n}^{*} = \frac{{(K_{1}^{(n)} C_{tr, p - 1}^{(n)})}^{2}}{2 ε^{(n)} K_{0}^{(n)}}, & \forall n \in \{ 1, \dots, N - 1 \} , \\ σ_{0}^{*} = \frac{{(K_{1}^{(0)} C_{tr, p - 1}^{(0)})}^{2}}{ε^{(0)} K_{0}^{(0)}}, \\ σ_{N}^{*} = \frac{{(K_{1}^{(N - 1)} C_{tr, p - 1}^{(N - 1)})}^{2}}{ε^{(N - 1)} K_{0}^{(N - 1)}} \end{cases}$ (91)

Finally, thanks to the inequality (90), ${\tilde{a}}_{h}$ (32) is coercive if

$\begin{cases} ε^{(n)} < 2, & \forall n \in \{ 0, \dots, N - 1 \} , \\ σ_{n} > σ_{n}^{*}, & \forall n \in \{ 1, \dots, N - 1 \} , \\ σ_{0} > σ_{0}^{*}, \\ σ_{N} > σ_{N}^{*} \end{cases}$ (92)

which ends the proof. □
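The thresholds of Equation (91) are straightforward to evaluate once element-wise bounds on $K$ are available. The sketch below is a direct transcription of (91) and (92) under stated assumptions: the trace constants `c_tr` are supplied by the caller, the function names are illustrative, and the value 1.0 used in the example is a placeholder rather than the constant used in the paper. It is not the auto-calibration procedure of Section 4, only the coercivity check derived in this proof.

```python
def penalty_thresholds(K0, K1, c_tr, eps):
    """Thresholds sigma*_n of Equation (91) on a mesh with N = len(K0) elements.

    K0[n], K1[n]: lower and upper bounds of K on element I_n.
    c_tr[n]: trace constant C_{tr,p-1}^{(n)} (supplied by the caller).
    eps[n]: Young parameter, required to lie in (0, 2) by Equation (92).
    Returns a list of N + 1 values, one per node x_0, ..., x_N.
    """
    N = len(K0)
    if not all(0.0 < e < 2.0 for e in eps):
        raise ValueError("each eps must lie in (0, 2), see Equation (92)")

    def base(n):
        return (K1[n] * c_tr[n]) ** 2 / (eps[n] * K0[n])

    sigma_star = [0.0] * (N + 1)
    sigma_star[0] = base(0)                 # boundary node x_0
    for n in range(1, N):
        sigma_star[n] = 0.5 * base(n)       # interior nodes, factor 1/2 as in (91)
    sigma_star[N] = base(N - 1)             # boundary node x_N
    return sigma_star

def satisfies_coercivity(sigma, sigma_star):
    """Check sigma_n > sigma*_n at every node, as required by Equation (92)."""
    return all(s > t for s, t in zip(sigma, sigma_star))

# Example with global bounds K0 = 0.01, K1 = 2.01 and placeholder c_tr = 1.
thresholds = penalty_thresholds([0.01] * 3, [2.01] * 3, [1.0] * 3, [1.0] * 3)
```

Because the thresholds scale like $K_{1}^{2} / K_{0}$ , tighter element-wise bounds on $K$ lead to smaller admissible penalties than a single global bound.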

Proof of Lemma 5. For a given $\bar{u} \in V_{0}^{p} (ℰ_{h})$ , an upper bound for $| {\tilde{a}}_{h} (u_{h}, v_{h}; \bar{u}) |$ , $\forall u_{h}, v_{h} \in V_{0}^{p} (ℰ_{h})$ , needs to be established in order to prove the continuity of ${\tilde{a}}_{h}$ . First, the volume contribution is bounded above using Hypothesis ( $ℋ_{n}$ ):

$\begin{matrix} | \sum_{n = 0}^{N - 1} \int_{I_{n}} K (x, \bar{u}) {u^{'}}_{h} {v^{'}}_{h} d x | \leq \sum_{n = 0}^{N - 1} K_{1}^{(n)} | \int_{I_{n}} {u^{'}}_{h} {v^{'}}_{h} d x | \leq \sum_{n = 0}^{N - 1} \sqrt{K_{1}^{(n)}} {‖ {u^{'}}_{h} ‖}_{I_{n}} \sqrt{K_{1}^{(n)}} {‖ {v^{'}}_{h} ‖}_{I_{n}} \\ \leq {(\sum_{n = 0}^{N - 1} K_{1}^{(n)} {‖ {u^{'}}_{h} ‖}_{I_{n}}^{2})}^{\frac{1}{2}} {(\sum_{n = 0}^{N - 1} K_{1}^{(n)} {‖ {v^{'}}_{h} ‖}_{I_{n}}^{2})}^{\frac{1}{2}} . \end{matrix}$ (93)

Then, penalization terms are bounded above

$\begin{array}{l} | \frac{σ_{0}}{h} {〚 u_{h} 〛}_{x_{0}} {〚 v_{h} 〛}_{x_{0}} + \sum_{n = 1}^{N - 1} \frac{σ_{n - 1} + σ_{n}}{2 h} {〚 u_{h} 〛}_{x_{n}} {〚 v_{h} 〛}_{x_{n}} + \frac{σ_{N}}{h} {〚 u_{h} 〛}_{x_{N}} {〚 v_{h} 〛}_{x_{N}} | \\ \leq {(\frac{σ_{0}}{h} {〚 u_{h} 〛}_{x_{0}}^{2} + \sum_{n = 1}^{N - 1} \frac{σ_{n - 1} + σ_{n}}{2 h} {〚 u_{h} 〛}_{x_{n}}^{2} + \frac{σ_{N}}{h} {〚 u_{h} 〛}_{x_{N}}^{2})}^{\frac{1}{2}} \end{array}$ (94)

$\begin{array}{l} {(\frac{σ_{0}}{h} {〚 v_{h} 〛}_{x_{0}}^{2} + \sum_{n = 1}^{N - 1} \frac{σ_{n - 1} + σ_{n}}{2 h} {〚 v_{h} 〛}_{x_{n}}^{2} + \frac{σ_{N}}{h} {〚 v_{h} 〛}_{x_{N}}^{2})}^{\frac{1}{2}} \\ \leq max (σ_{0}, σ_{N}, max_{\begin{matrix} n = 1, \dots, N - 1 \end{matrix}} (\frac{σ_{n}}{2})) ‖ u_{h} ‖ ‖ v_{h} ‖ \end{array}$ (95)

and one can write

$\begin{array}{l} \sum_{n = 0}^{N} | {| K (x, \bar{u}) {u^{'}}_{h} |}_{x_{n}} {〚 v_{h} 〛}_{x_{n}} | \\ \leq {(2 \sum_{n = 0}^{N - 1} ε^{(n)} K_{0}^{(n)} {‖ {u^{'}}_{h} ‖}_{I_{n}}^{2})}^{\frac{1}{2}} (\frac{{(K_{1}^{(0)} C_{tr, p - 1}^{(0)})}^{2}}{ε^{(0)} K_{0}^{(0)}} \frac{{〚 v_{h} 〛}_{x_{0}}^{2}}{h} + \frac{{(K_{1}^{(N - 1)} C_{tr, p - 1}^{(N - 1)})}^{2}}{ε^{(N - 1)} K_{0}^{(N - 1)}} \frac{{〚 v_{h} 〛}_{x_{N}}^{2}}{h} \\ + {\sum_{n = 1}^{N - 1} (\frac{{(K_{1}^{(n - 1)} C_{tr, p - 1}^{(n - 1)})}^{2}}{2 ε^{(n - 1)} K_{0}^{(n - 1)}} + \frac{{(K_{1}^{(n)} C_{tr, p - 1}^{(n)})}^{2}}{2 ε^{(n)} K_{0}^{(n)}}) \frac{{〚 v_{h} 〛}_{x_{n}}^{2}}{2 h})}^{\frac{1}{2}} . \end{array}$ (96)

From those inequalities, we obtain an upper bound $\forall u_{h}, v_{h} \in V_{0}^{p} (ℰ_{h})$ , as follows

$\begin{matrix} | {\tilde{a}}_{h} (u_{h}, v_{h}; \bar{u}) | \leq {(\sum_{n = 0}^{N - 1} K_{1}^{(n)} {‖ {u^{'}}_{h} ‖}_{I_{n}}^{2})}^{\frac{1}{2}} {(\sum_{n = 0}^{N - 1} K_{1}^{(n)} {‖ {v^{'}}_{h} ‖}_{I_{n}}^{2})}^{\frac{1}{2}} \\ + {(\sum_{n = 0}^{N - 1} 2 ε^{(n)} K_{1}^{(n)} {‖ {u^{'}}_{h} ‖}_{I_{n}}^{2})}^{\frac{1}{2}} {(\frac{σ_{0}^{*}}{h} {〚 v_{h} 〛}_{x_{0}}^{2} + \sum_{n = 1}^{N - 1} \frac{σ_{n - 1}^{*} + σ_{n}^{*}}{2 h} {〚 v_{h} 〛}_{x_{n}}^{2} + \frac{σ_{N}^{*}}{h} {〚 v_{h} 〛}_{x_{N}}^{2})}^{\frac{1}{2}} \\ + \max (σ_{0}, σ_{N}, \max_{\begin{matrix} n = 1, \dots, N - 1 \end{matrix}} (\frac{σ_{n}}{2})) ‖ u_{h} ‖ ‖ v_{h} ‖ \\ \leq \max_{\begin{matrix} n = 0, \dots, N - 1 \end{matrix}} (K_{1}^{(n)}) ‖ u_{h} ‖ ‖ v_{h} ‖ \\ + \sqrt{max_{\begin{matrix} n = 0, \dots, N - 1 \end{matrix}} (2 ε^{(n)} K_{1}^{(n)}) max (σ_{0}^{*}, σ_{N}^{*}, max_{\begin{matrix} n = 1, \dots, N - 1 \end{matrix}} (\frac{σ_{n}^{*}}{2}))} ‖ u_{h} ‖ ‖ v_{h} ‖ \\ + \max (σ_{0}, σ_{N}, \max_{\begin{matrix} n = 1, \dots, N - 1 \end{matrix}} (\frac{σ_{n}}{2})) ‖ u_{h} ‖ ‖ v_{h} ‖ \\ \leq \tilde{C} ‖ u_{h} ‖ ‖ v_{h} ‖ \end{matrix}$ (97)

where

$\begin{matrix} \tilde{C} (ϵ) = max_{\begin{matrix} n = 0, \dots, N - 1 \end{matrix}} (K_{1}^{(n)}) + \sqrt{max_{\begin{matrix} n = 0, \dots, N - 1 \end{matrix}} (2 ε^{(n)} K_{1}^{(n)}) max (σ_{0}^{*}, σ_{N}^{*}, max_{\begin{matrix} n = 1, \dots, N - 1 \end{matrix}} (\frac{σ_{n}^{*}}{2}))} \\ + \max (σ_{0}, σ_{N}, \max_{\begin{matrix} n = 1, \dots, N - 1 \end{matrix}} (\frac{σ_{n}}{2})) . \end{matrix}$ (98)

□

Proof of Lemma 6. An upper bound for $| l (v_{h}) |$ is established using the Poincaré and Cauchy-Schwarz inequalities: $\forall v_{h} \in V_{0}^{p} (ℰ_{h}),$

$\begin{matrix} | \sum_{n = 0}^{N - 1} \int_{I_{n}} f v_{h} d x | \leq \sum_{n = 0}^{N - 1} {‖ f ‖}_{I_{n}} {‖ v_{h} ‖}_{I_{n}} \leq \sum_{n = 0}^{N - 1} {‖ f ‖}_{I_{n}} β_{n} {‖ {v^{'}}_{h} ‖}_{I_{n}} \\ \leq {(\sum_{n = 0}^{N - 1} {‖ f ‖}_{I_{n}}^{2})}^{\frac{1}{2}} {(\sum_{n = 0}^{N - 1} β_{n}^{2} {‖ {v^{'}}_{h} ‖}_{I_{n}}^{2})}^{\frac{1}{2}} \leq B ‖ v_{h} ‖ \end{matrix}$ (99)

with $B = max_{\begin{matrix} n = 0, \dots, N - 1 \end{matrix}} (β_{n}) {(\sum_{n = 0}^{N - 1} {‖ f ‖}_{I_{n}}^{2})}^{\frac{1}{2}}$ . □

Proof of Lemma 12. For a given $\bar{u} \in V_{0}^{p} (ℰ_{h})$ , let $v_{h}$ be a sequence in $V_{0}^{p} (ℰ_{h})$ bounded in the $‖ . ‖$ -norm and let $φ \in C_{0}^{\infty} (Ω)$ . For all $h \in ℝ_{+}^{*}$ , set $φ_{h} = π_{h} φ$ where $π_{h}$ denotes the $L^{2}$ -orthogonal projection onto $V_{0}^{p} (ℰ_{h})$ . Since $p \geq 1$ , infer $‖ φ - π_{h} φ ‖ \underset{h \to 0}{\to} 0$ . Owing to Equation (67) and since $G_{h}^{p} (φ) = φ^{'}$ because $φ \in C_{0}^{\infty} (Ω)$ , obtain for all $p \geq 0$

$G_{h}^{p} (π_{h} φ) \to φ^{'} strongly in L^{2} (Ω)$ (100)

One can observe that

${\tilde{a}}_{h} (v_{h}, π_{h} φ; \bar{u}) = \int_{Ω} K (x, \bar{u}) {\hat{G}}_{h}^{p} (v_{h}) G_{h}^{p} (π_{h} φ) d x + s_{h} (v_{h}, π_{h} φ) : = T_{1} + T_{2}$ (101)

Clearly as $h \to 0$ , $T_{1} \to \int_{Ω} K (x, \bar{u}) v^{'} φ^{'} d x$ owing to the weak convergence of ${\hat{G}}_{h}^{p} (v_{h})$ to $v^{'}$ and to the strong convergence of $G_{h}^{p} (π_{h} φ)$ to $φ^{'}$ . Furthermore, using Cauchy-Schwarz inequality yields:

$\begin{matrix} | T_{2} | = | s_{h} (v_{h}, π_{h} φ) | \\ = | \frac{σ_{0}}{h} {〚 v_{h} 〛}_{x_{0}} {〚 π_{h} φ 〛}_{x_{0}} + \sum_{n = 1}^{N - 1} \frac{σ_{n - 1} + σ_{n}}{2 h} {〚 v_{h} 〛}_{x_{n}} {〚 π_{h} φ 〛}_{x_{n}} + \frac{σ_{N}}{h} {〚 v_{h} 〛}_{x_{N}} {〚 π_{h} φ 〛}_{x_{N}} | \\ \leq {(σ_{0}^{2} \frac{{〚 v_{h} 〛}_{x_{0}}^{2}}{h} + \sum_{n = 1}^{N - 1} \frac{{(σ_{n - 1} + σ_{n})}^{2}}{4} \frac{{〚 v_{h} 〛}_{x_{n}}^{2}}{h} + σ_{N}^{2} \frac{{〚 v_{h} 〛}_{x_{N}}^{2}}{h})}^{\frac{1}{2}} {(\sum_{n = 0}^{N} \frac{{〚 π_{h} φ 〛}_{x_{n}}^{2}}{h})}^{\frac{1}{2}} \\ \leq C {| v_{h} |}_{J} {| π_{h} φ |}_{J} \end{matrix}$ (102)

where

$C = \max \{ σ_{0}^{2}, \frac{{(σ_{n - 1} + σ_{n})}^{2}}{4}, σ_{N}^{2} \} .$ (103)

Since ${| v_{h} |}_{J}$ is bounded by assumption and since ${| π_{h} φ |}_{J} = {| φ - π_{h} φ |}_{J} \underset{h \to 0}{\to} 0$ , infer $T_{2} \underset{h \to 0}{\to} 0$ . □

Proof of Theorem 13. For a given $\bar{u} \in V_{0}^{p} (ℰ_{h})$ , owing to the discrete coercivity of ${\tilde{a}}_{h}$ , the sequence $u_{h}$ is bounded in the $‖ . ‖$ -norm. Theorem 11 implies that there is $v \in H_{0}^{1} (Ω)$ such that, up to a subsequence, $u_{h} \to v$ in $L^{2} (Ω)$ and, for all $p \geq 0$ , $G_{h}^{p} (u_{h}) ⇀ v^{'}$ weakly in $L^{2} (Ω)$ as $h \to 0$ . Let $φ \in C_{0}^{\infty} (Ω)$ . Owing to Lemma 12, ${\tilde{a}}_{h} (u_{h}, π_{h} φ; \bar{u}) \to \tilde{a} (v, φ)$ as $h \to 0$ . Since $u_{h}$ solves the discrete linearized problem $({\tilde{W}}_{h})$ , infer as $h \to 0$

$\begin{matrix} {\tilde{a}}_{h} (u_{h} - π_{h} φ, u_{h} - π_{h} φ; \bar{u}) = {\tilde{a}}_{h} (u_{h}, u_{h} - π_{h} φ; \bar{u}) - {\tilde{a}}_{h} (π_{h} φ, u_{h} - π_{h} φ; \bar{u}) \\ \to \tilde{a} (v, v - φ) - \tilde{a} (φ, v - φ) \\ \to \int_{Ω} (v - φ) f d x - \int_{Ω} K (x, \bar{u}) φ^{'} {(v - φ)}^{'} d x \end{matrix}$ (104)

Hence, using the coercivity ${\tilde{a}}_{h} (v_{h}, v_{h}; \bar{u}) \geq C^{*} {‖ v_{h} ‖}^{2}$ from Lemma 4,

$\begin{array}{l} C^{*} {‖ u_{h} - π_{h} φ ‖}^{2} \leq {\tilde{a}}_{h} (u_{h} - π_{h} φ, u_{h} - π_{h} φ; \bar{u}) \\ \Rightarrow \underset{h \to 0}{\lim \sup} C^{*} {‖ u_{h} - π_{h} φ ‖}^{2} \leq \underset{h \to 0}{\lim \sup} {\tilde{a}}_{h} (u_{h} - π_{h} φ, u_{h} - π_{h} φ; \bar{u}) \\ \leq | \int_{Ω} (v - φ) f d x - \int_{Ω} K (x, \bar{u}) φ^{'} {(v - φ)}^{'} d x | \\ \leq {‖ f ‖}_{L^{2} (Ω)} {‖ v - φ ‖}_{L^{2} (Ω)} + K_{1} {‖ φ^{'} ‖}_{L^{2} (Ω)} {‖ {(v - φ)}^{'} ‖}_{L^{2} (Ω)} \\ \leq C_{f, φ} {({‖ v - φ ‖}_{L^{2} (Ω)}^{2} + {‖ {(v - φ)}^{'} ‖}_{L^{2} (Ω)}^{2})}^{\frac{1}{2}} \\ = C_{f, φ} {‖ v - φ ‖}_{H^{1} (Ω)} \end{array}$ (105)

with $C_{f, φ} = {({‖ f ‖}_{L^{2} (Ω)}^{2} + K_{1}^{2} {‖ φ^{'} ‖}_{L^{2} (Ω)}^{2})}^{\frac{1}{2}}$ . As a consequence,

$\underset{h \to 0}{\lim \sup} ‖ u_{h} - π_{h} φ ‖ \leq {( \frac{1}{C^{*}} C_{f, φ} {‖ v - φ ‖}_{H^{1} (Ω)} )}^{\frac{1}{2}} .$ (106)

One can observe that the choice of ${\hat{G}}_{h}^{p}$ satisfies the stability property

$\forall v_{h} \in V_{0}^{p} (ℰ_{h}), {‖ {\hat{G}}_{h}^{p} (v_{h}) ‖}_{L^{2} (Ω)} \leq \hat{C} ‖ v_{h} ‖$ (107)

for $\hat{C}$ independent of $h$ . As a result,

$\underset{h \to 0}{\lim \sup} {‖ {\hat{G}}_{h}^{p} (u_{h}) - {\hat{G}}_{h}^{p} (π_{h} φ) ‖}_{L^{2} (Ω)} \leq \hat{C} {( \frac{1}{C^{*}} C_{f, φ} {‖ v - φ ‖}_{H^{1} (Ω)} )}^{\frac{1}{2}}$ (108)

because

$\begin{array}{l} {‖ {\hat{G}}_{h}^{p} (u_{h}) - {\hat{G}}_{h}^{p} (π_{h} φ) ‖}_{L^{2} (Ω)} \leq \hat{C} ‖ u_{h} - π_{h} φ ‖ \\ \Rightarrow \underset{h \to 0}{\lim \sup} {‖ {\hat{G}}_{h}^{p} (u_{h}) - {\hat{G}}_{h}^{p} (π_{h} φ) ‖}_{L^{2} (Ω)} \leq \hat{C} \underset{h \to 0}{\lim \sup} ‖ u_{h} - π_{h} φ ‖ \\ \leq \hat{C} {( \frac{1}{C^{*}} C_{f, φ} {‖ v - φ ‖}_{H^{1} (Ω)} )}^{\frac{1}{2}} \end{array}$ (109)

Moreover, since ${\hat{G}}_{h}^{p} (π_{h} φ)$ strongly converges to $φ^{'}$ in $L^{2} (Ω)$ , the triangle inequality yields

$\underset{h \to 0}{\lim \sup} {‖ {\hat{G}}_{h}^{p} (u_{h}) - φ^{'} ‖}_{L^{2} (Ω)} \leq \hat{C} {( \frac{1}{C^{*}} C_{f, φ} {‖ v - φ ‖}_{H^{1} (Ω)} )}^{\frac{1}{2}} .$ (110)

Since $φ$ is arbitrary in $C_{0}^{\infty} (Ω)$ , and since this space is dense in $H_{0}^{1} (Ω)$ , the right-hand side can be made as small as desired by letting $φ$ approach $v$ in $H^{1} (Ω)$ . We infer

${\hat{G}}_{h}^{p} (u_{h}) \underset{h \to 0}{\to} v^{'} \text{ strongly in } L^{2} (Ω) .$ (111)
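The passage from (110) to (111) can be spelled out with one more triangle inequality. In the sketch below, $R(φ)$ is a shorthand introduced here for the right-hand side of (110); it is not notation used elsewhere in the paper.

```latex
\begin{aligned}
\limsup_{h \to 0} \bigl\| \hat{G}_h^p(u_h) - v' \bigr\|_{L^2(\Omega)}
  &\le \limsup_{h \to 0} \bigl\| \hat{G}_h^p(u_h) - \varphi' \bigr\|_{L^2(\Omega)}
     + \bigl\| \varphi' - v' \bigr\|_{L^2(\Omega)} \\
  &\le R(\varphi) + \| v - \varphi \|_{H^1(\Omega)} .
\end{aligned}
```

Along any sequence $φ_{k} \in C_{0}^{\infty} (Ω)$ with $φ_{k} \to v$ in $H^{1} (Ω)$ , the constant $C_{f, φ_{k}}$ stays bounded because ${‖ φ_{k}^{'} ‖}_{L^{2} (Ω)}$ does, so both terms on the right tend to zero. The left-hand side does not depend on $φ$ , hence it vanishes.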

As a result, taking $φ$ arbitrary in $C_{0}^{\infty} (Ω)$ yields

$\int_{Ω} K (x, \bar{u}) v^{'} φ^{'} d x \underset{h \to 0}{\leftarrow} \int_{Ω} K (x, \bar{u}) {u^{'}}_{h} π_{h} φ^{'} d x = {\tilde{a}}_{h} (u_{h}, π_{h} φ; \bar{u}) = \int_{Ω} f π_{h} φ d x \underset{h \to 0}{\to} \int_{Ω} f φ d x$ (112)

using Lemma 12; that is, $v$ solves the Poisson problem, by density of $C_{0}^{\infty} (Ω)$ in $H_{0}^{1} (Ω)$ . Since the solution $u$ to the Poisson problem is unique, $v = u$ and the whole sequence $u_{h}$ strongly converges to $u$ in $L^{2} (Ω)$ ; moreover, for all $p \geq 0$ , the sequence ${(G_{h}^{p} (u_{h}))}_{h \in ℝ_{+}^{*}}$ weakly converges to $u^{'}$ in $L^{2} (Ω)$ . To control the jump seminorm, observe that, since $s_{h} (u_{h}, u_{h}) \geq 0$ ,

$\begin{matrix} {\tilde{a}}_{h} (u_{h}, u_{h}; \bar{u}) = \int_{Ω} K (x, \bar{u}) {\hat{G}}_{h}^{p} (u_{h}) G_{h}^{p} (u_{h}) d x + s_{h} (u_{h}, u_{h}) \\ \geq \int_{Ω} K (x, \bar{u}) {\hat{G}}_{h}^{p} (u_{h}) G_{h}^{p} (u_{h}) d x \end{matrix}$ (113)

Thus

$\underset{h \to 0}{\lim \inf} {\tilde{a}}_{h} (u_{h}, u_{h}; \bar{u}) \geq \underset{h \to 0}{\lim \inf} \int_{Ω} K (x, \bar{u}) {\hat{G}}_{h}^{p} (u_{h}) G_{h}^{p} (u_{h}) d x \geq \int_{Ω} K (x, \bar{u}) u^{'} u^{'} d x$ (114)

Furthermore

$\int_{Ω} K (x, \bar{u}) {\hat{G}}_{h}^{p} (u_{h}) G_{h}^{p} (u_{h}) d x \leq {\tilde{a}}_{h} (u_{h}, u_{h}; \bar{u}) = \int_{Ω} f u_{h} d x$ (115)

which, together with Equation (112) and the strong convergence of $u_{h}$ to $u$ in $L^{2} (Ω)$ , yields

$\begin{matrix} \underset{h \to 0}{\lim \sup} \int_{Ω} K (x, \bar{u}) {\hat{G}}_{h}^{p} (u_{h}) G_{h}^{p} (u_{h}) d x \leq \underset{h \to 0}{\lim \sup} {\tilde{a}}_{h} (u_{h}, u_{h}; \bar{u}) \\ = \underset{h \to 0}{\lim} \int_{Ω} f u_{h} d x = \int_{Ω} f u d x = \int_{Ω} K (x, \bar{u}) u^{'} u^{'} d x \end{matrix}$ (116)

Thus, combining (114) and (116), $\int_{Ω} K (x, \bar{u}) {\hat{G}}_{h}^{p} (u_{h}) G_{h}^{p} (u_{h}) d x \underset{h \to 0}{\to} \int_{Ω} K (x, \bar{u}) u^{'} u^{'} d x$ . Moreover, ${\tilde{a}}_{h} (u_{h}, u_{h}; \bar{u}) \underset{h \to 0}{\to} \int_{Ω} K (x, \bar{u}) u^{'} u^{'} d x$ . Since

$\begin{array}{l} {\tilde{a}}_{h} (u_{h}, u_{h}; \bar{u}) = \int_{Ω} K (x, \bar{u}) {\hat{G}}_{h}^{p} (u_{h}) G_{h}^{p} (u_{h}) d x + s_{h} (u_{h}, u_{h}) \\ \geq \int_{Ω} K (x, \bar{u}) {\hat{G}}_{h}^{p} (u_{h}) G_{h}^{p} (u_{h}) d x + \min_{n = 0, \dots, N} (σ_{n}) {| u_{h} |}_{J}^{2} \\ \Leftrightarrow \min_{n = 0, \dots, N} (σ_{n}) {| u_{h} |}_{J}^{2} \leq {\tilde{a}}_{h} (u_{h}, u_{h}; \bar{u}) - \int_{Ω} K (x, \bar{u}) {\hat{G}}_{h}^{p} (u_{h}) G_{h}^{p} (u_{h}) d x \end{array}$ (117)

and since $\min_{n = 0, \dots, N} (σ_{n}) > 0$ and the right-hand side tends to zero, we obtain ${| u_{h} |}_{J} \to 0$ . Finally, since $v = u$ , Equation (111) gives

${‖ {u^{'}}_{h} - u^{'} ‖}_{L^{2} (Ω)} = {‖ {\hat{G}}_{h}^{p} (u_{h}) - u^{'} ‖}_{L^{2} (Ω)} \to 0 .$ (118)

□

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1]	Darcy, H. (1856) Les Fontaines Publiques de La Ville de Dijon. Librairie des Corps Impériaux des Ponts et des Chaussées et des Mines.
[2]	Buckingham, E. (1907) Studies on the Movement of Soil Moisture, Volume 38. Bulletin Edition, Washington, Government Publishing Office.
[3]	Richardson, L.F. (1922) Weather Prediction by Numerical Process. Cambridge University Press.
[4]	Richards, L.A. (1931) Capillary Conduction of Liquids through Porous Mediums. Physics, 1, 318-333.[CrossRef]
[5]	Rubin, J. (1968) Theoretical Analysis of Two-Dimensional, Transient Flow of Water in Unsaturated and Partly Unsaturated Soils. Soil Science Society of America Journal, 32, 607-615.[CrossRef]
[6]	Cooley, R.L. (1971) A Finite Difference Method for Unsteady Flow in Variably Saturated Porous Media: Application to a Single Pumping Well. Water Resources Research, 7, 1607-1625.[CrossRef]
[7]	Clément, J.-B. (2021) Numerical Simulation of Flows in Unsaturated Porous Media by an Adaptive Discontinuous Galerkin Method: Application to Sandy Beaches. Ph.D. Thesis, Université de Toulon.
[8]	Irmay, S. (1954) On the Hydraulic Conductivity of Unsaturated Soils. Eos, Transactions American Geophysical Union, 35, 463-467.
[9]	Vachaud, G. and Thony, J. (1971) Hysteresis during Infiltration and Redistribution in a Soil Column at Different Initial Water Contents. Water Resources Research, 7, 111-127.[CrossRef]
[10]	van Genuchten, M.T. (1980) A Closed-Form Equation for Predicting the Hydraulic Conductivity of Unsaturated Soils. Soil Science Society of America Journal, 44, 892-898.[CrossRef]
[11]	Dolejší, V., Kuraz, M. and Solin, P. (2019) Adaptive Higher-Order Space-Time Discontinuous Galerkin Method for the Computer Simulation of Variably-Saturated Porous Media Flows. Applied Mathematical Modelling, 72, 276-305.[CrossRef]
[12]	Clément, J., Golay, F., Ersoy, M. and Sous, D. (2021) An Adaptive Strategy for Discontinuous Galerkin Simulations of Richards’ Equation: Application to Multi-Materials Dam Wetting. Advances in Water Resources, 151, Article ID: 103897.[CrossRef]
[13]	Pietro, D.A.D. and Ern, A. (2012) Mathematical Aspects of Discontinuous Galerkin Methods. Springer.
[14]	Rivière, B. (2008) Discontinuous Galerkin Methods for Solving Elliptic and Parabolic Equations. Society for Industrial and Applied Mathematics.[CrossRef]
[15]	Shin, H.-G. (2025) A Posteriori Error Estimates and Adaptivity for Solving the Richards Equation. Ph.D. Thesis, Charles University. https://dspace.cuni.cz/bitstream/handle/20.500.11956/198009/140128914.pdf?sequence=1
[16]	Congreve, S., Dolejší, V. and Sakić, S. (2024) Error Analysis for Local Discontinuous Galerkin Semidiscretization of Richards’ Equation. IMA Journal of Numerical Analysis, 45, 580-630.[CrossRef]
[17]	Epshteyn, Y. and Rivière, B. (2007) Estimation of Penalty Parameters for Symmetric Interior Penalty Galerkin Methods. Journal of Computational and Applied Mathematics, 206, 843-872.[CrossRef]
[18]	Süli, E. and Mayers, D.F. (2003) An Introduction to Numerical Analysis. Cambridge University Press.[CrossRef]
[19]	Li, H., Farthing, M.W., Dawson, C.N. and Miller, C.T. (2007) Local Discontinuous Galerkin Approximations to Richards’ Equation. Advances in Water Resources, 30, 555-575.[CrossRef]
[20]	Farthing, M.W., Kees, C.E. and Miller, C.T. (2003) Mixed Finite Element Methods and Higher Order Temporal Approximations for Variably Saturated Groundwater Flow. Advances in Water Resources, 26, 373-394.[CrossRef]
[21]	Hay, A., Etienne, S., Pelletier, D. and Garon, A. (2015) Hp-Adaptive Time Integration Based on the BDF for Viscous Flows. Journal of Computational Physics, 291, 151-176.[CrossRef]
[22]	Dahlquist, G.G. (1963) A Special Stability Problem for Linear Multistep Methods. BIT Numerical Mathematics, 3, 27-43.[CrossRef]
[23]	Hairer, E. and Wanner, G. (1996) Solving Ordinary Differential Equations II, Volume 14 of Springer Series in Computational Mathematics. Springer.
[24]	Skelboe, S. (1977) The Control of Order and Steplength for Backward Differentiation Methods. BIT, 17, 91-107.[CrossRef]
[25]	Lehmann, F. and Ackerer, P. (1998) Comparison of Iterative Methods for Improved Solutions of the Fluid Flow Equation in Partially Saturated Porous Media. Transport in Porous Media, 31, 275-292.[CrossRef]
[26]	List, F. and Radu, F.A. (2016) A Study on Iterative Methods for Solving Richards’ Equation. Computational Geosciences, 20, 341-353.[CrossRef]
[27]	Costa-Solé, A., Ruiz-Gironés, E. and Sarrate, J. (2021) High-Order Hybridizable Discontinuous Galerkin Formulation with Fully Implicit Temporal Schemes for the Simulation of Two-Phase Flow through Porous Media. International Journal for Numerical Methods in Engineering, 122, 3583-3612.[CrossRef]
[28]	Farthing, M.W. and Ogden, F.L. (2017) Numerical Solution of Richards’ Equation: A Review of Advances and Challenges. Soil Science Society of America Journal, 81, 1257-1269.[CrossRef]
[29]	Bergamaschi, L. and Putti, M. (1999) Mixed Finite Elements and Newton-Type Linearizations for the Solution of Richards’ Equation. International Journal for Numerical Methods in Engineering, 45, 1025-1046.[CrossRef]
[30]	Miller, C.T., Abhishek, C. and Farthing, M.W. (2006) A Spatially and Temporally Adaptive Solution of Richards’ Equation. Advances in Water Resources, 29, 525-545.[CrossRef]
[31]	Brad Thoms, R., Johnson, R.L. and Healy, R.W. (2006) User’s Guide to the Variably Saturated Flow (VSF) Process to MODFLOW.[CrossRef]
[32]	Ersoy, M., Golay, F. and Yushchenko, L. (2013) Adaptive Multiscale Scheme Based on Numerical Density of Entropy Production for Conservation Laws. Open Mathematics, 11, 1392-1415.[CrossRef]
[33]	Golay, F., Ersoy, M., Yushchenko, L. and Sous, D. (2015) Block-Based Adaptive Mesh Refinement Scheme Using Numerical Density of Entropy Production for Three-Dimensional Two-Fluid Flows. International Journal of Computational Fluid Dynamics, 29, 67-81.[CrossRef]
[34]	Altazin, T., Ersoy, M., Golay, F., Sous, D. and Yushchenko, L. (2016) Numerical Investigation of BB-AMR Scheme Using Entropy Production as Refinement Criterion. International Journal of Computational Fluid Dynamics, 30, 256-271.[CrossRef]
[35]	Boccardo, L., Thierry, G. and Murat, F. (1992) Unicité de la solution de certaines équations elliptiques non linéaires. Comptes rendus de l’Académie des Sciences Paris, 315, 1159-1164.
[36]	Bassi, F. and Rebay, S. (1997) A High-Order Accurate Discontinuous Finite Element Method for the Numerical Solution of the Compressible Navier-Stokes Equations. Journal of Computational Physics, 131, 267-279.[CrossRef]
[37]	Brezzi, F., Manzini, G., Marini, D., Pietra, P. and Russo, A. (2000) Discontinuous Galerkin Approximations for Elliptic Problems. Numerical Methods for Partial Differential Equations, 16, 365-378.[CrossRef]
[38]	Haverkamp, R., Vauclin, M., Touma, J., Wierenga, P.J. and Vachaud, G. (1977) A Comparison of Numerical Simulation Models for One-Dimensional Infiltration. Soil Science Society of America Journal, 41, 285-294.[CrossRef]
[39]	Vauclin, M., Khanji, D. and Vachaud, G. (1979) Experimental and Numerical Study of a Transient, Two-Dimensional Unsaturated-Saturated Water Table Recharge Problem. Water Resources Research, 15, 1089-1101.[CrossRef]
[40]	Dolejší, V. and Feistauer, M. (2015) Discontinuous Galerkin Method. Springer International Publishing.
[41]	Celia, M.A., Bouloutas, E.T. and Zarba, R.L. (1990) A General Mass-Conservative Numerical Solution for the Unsaturated Flow Equation. Water Resources Research, 26, 1483-1496.[CrossRef]
[42]	Philip, J.R. (2006) The Theory of Infiltration. 1. The Infiltration Equation and Its Solution. Soil Science, 171, S34-S46.[CrossRef]
[43]	Sochala, P. (2008) Méthodes Numériques Pour Les Écoulements Souterrains et Couplage Avec Le Ruissellement. Ph.D. Thesis, Paris Est.
[44]	Manzini, G. and Ferraris, S. (2004) Mass-Conservative Finite Volume Methods on 2-D Unstructured Grids for the Richards’ Equation. Advances in Water Resources, 27, 1199-1215.[CrossRef]
[45]	Dogan, A. and Motz, L.H. (2005) Saturated-Unsaturated 3D Groundwater Model. II: Verification and Application. Journal of Hydrologic Engineering, 10, 505-515.[CrossRef]
[46]	Twarakavi, N.K.C., Šimůnek, J. and Seo, S. (2008) Evaluating Interactions between Groundwater and Vadose Zone Using the Hydrus-Based Flow Package for Modflow. Vadose Zone Journal, 7, 757-768.[CrossRef]
[47]	Zha, Y., Shi, L., Ye, M. and Yang, J. (2013) A Generalized Ross Method for Two-and Three-Dimensional Variably Saturated Flow. Advances in Water Resources, 54, 67-77.[CrossRef]
[48]	Warburton, T. and Hesthaven, J.S. (2003) On the Constants in Hp-Finite Element Trace Inverse Inequalities. Computer Methods in Applied Mechanics and Engineering, 192, 2765-2773.[CrossRef]
