Solving Neumann Boundary Problem with Kernel-Regularized Learning Approach

Abstract

We provide a kernel-regularized method for obtaining theoretical solutions to the Neumann boundary value problem on the unit ball. We define a reproducing kernel Hilbert space built from spherical harmonics and an inner product defined on both the unit ball and the unit sphere, construct the kernel-regularized learning algorithm from the viewpoint of semi-supervised learning, and derive upper bounds for the learning rates. The theoretical analysis shows that the learning algorithm converges uniformly as the number of samples grows. The research can be regarded as an application of kernel-regularized semi-supervised learning.


Ran, X. and Sheng, B. (2024) Solving Neumann Boundary Problem with Kernel-Regularized Learning Approach. Journal of Applied Mathematics and Physics, 12, 1101-1125. doi: 10.4236/jamp.2024.124069.

1. Introduction

It is known that approximation theory and its techniques have been used to give analytic solutions for PDEs with boundary value problems, forming the method of fundamental solutions (see e.g. [1] - [6]; Appendices 1-3). Recently, the kernel-based collocation method for solving several PDEs with boundary problems has been developed (see e.g. [7] [8] [9] [10] [11]) from the viewpoint of minimal norm interpolation in reproducing kernel Hilbert spaces (RKHSs); the existence theorem and the representation theorem for the numerical solutions are shown qualitatively. It is suggested by [12] that kernel-regularized gradient learning may be used to give numerical solutions for PDEs. Indeed, some kernel-regularized learning algorithms have been used to study the PDE with Dirichlet boundary value problem quantitatively (see e.g. [13] [14] [15] [16]). For a given domain $D \subset \mathbb{R}^d$, the pairwise distinct collocation points chosen in the kernel-based collocation method are:

$$X_D = \{x_1,\dots,x_N\} \subset D, \qquad X_{\partial D} = \{x_{N+1},\dots,x_{N+M}\} \subset \partial D,$$

and the sample values from the PDE (where P and B are given differential operators):

$$\begin{cases} Pu = f^*, & \text{in } D \subset \mathbb{R}^d, \\ Bu = g, & \text{on } \partial D \end{cases} \tag{1}$$

at the given collocation points are (see [7] [11]):

$$y_j = f^*(x_j) + \eta_{x_j}, \quad j = 1,2,\dots,N,$$

$$y_k = g(x_{N+k}), \quad k = 1,2,\dots,M,$$

where the random variable $\eta := (\eta_{x_1},\dots,\eta_{x_N})^{\mathrm{T}} \sim N(0, \Psi_D)$.

There are two typical PDE problems (see Chapter 1 of [17]). When $P = \Delta = \sum_{j=1}^d \frac{\partial^2}{\partial x_j^2}$ is the Laplace operator and $B = I$ is the identity operator, we have the Dirichlet problem:

$$\begin{cases} \Delta u = f^*, & \text{in } D \subset \mathbb{R}^d, \\ u = g, & \text{on } \partial D. \end{cases} \tag{2}$$

When $P = I$ is the identity operator and $B = \frac{\partial}{\partial \mathbf{n}}$ is the directional derivative along the outward normal vector $\mathbf{n}$, i.e. $Bu = \frac{\partial u}{\partial \mathbf{n}} = \nabla u \cdot \mathbf{n}$, problem (1) becomes the Neumann boundary problem:

$$\begin{cases} u = f^*, & \text{in } D \subset \mathbb{R}^d, \\ \dfrac{\partial u}{\partial \mathbf{n}} = g, & \text{on } \partial D. \end{cases} \tag{3}$$

The observations $\{(x_j, y_j)\}_{j=1}^N$ can be regarded as the labeled observations in the setting of semi-supervised learning, and the boundary samples $y_k$ ($k = 1,2,\dots,M$) may be regarded as the unlabeled ones. This similarity encourages us to construct kernel-regularized learning algorithms that give numerical solutions for problem (3), referring to the kernel semi-supervised learning frameworks (see e.g. [18] [19] [20] [21]). Along this line, we constructed in [15] a kernel-regularized learning algorithm for solving problem (2) and showed its convergence rate. In the present paper, we construct a kernel-regularized regression algorithm to solve problem (3) in the case that $D$ is the unit ball and $\partial D$ is the unit sphere. To this aim, we restate problem (3) in the setting of Sobolev spaces.

Let $\Omega \subset \mathbb{R}^d$ be a given bounded closed domain with a smooth surface (i.e. the outward normal derivative is continuous) and $\rho_\Omega$ be a Borel measure on $\Omega$. Let $C^{(s)}(\Omega)$ denote the set of functions $f$ such that $\partial_x^\alpha f(x) \in C(\Omega)$ for all $|\alpha| \le s$, where for $\alpha = (\alpha_1,\dots,\alpha_d) \in \mathbb{Z}_+^d$ we define $|\alpha| = \sum_{i=1}^d \alpha_i$ and

$$\partial^\alpha f(x) := \partial_x^\alpha f(x) = \partial_1^{\alpha_1}\cdots\partial_d^{\alpha_d} f(x) = \frac{\partial^{|\alpha|} f(x)}{(\partial x_1)^{\alpha_1}\cdots(\partial x_d)^{\alpha_d}}.$$

For a kernel $K \in C^{(s)}(\Omega \times \Omega)$, we define:

$$\partial_x^\alpha \partial_y^\beta K(x,y) = \frac{\partial^{|\alpha|+|\beta|} K(x,y)}{(\partial x_1)^{\alpha_1}\cdots(\partial x_d)^{\alpha_d}(\partial y_1)^{\beta_1}\cdots(\partial y_d)^{\beta_d}}, \quad x = (x_1,\dots,x_d),\ y = (y_1,\dots,y_d).$$

Denote by $W^1(\rho_\Omega)$ the set of all functions whose first-order partial derivatives are all in $L^2(\rho_\Omega)$, i.e.

$$W^1(\rho_\Omega) = \Big\{ f : \|f\|_{W^1(\rho_\Omega)} = \Big( \sum_{|\alpha| \le 1} \|D^\alpha f\|_{2,\rho_\Omega}^2 \Big)^{1/2} < +\infty \Big\},$$

and $L^2(\rho_\Omega) = \{ f : \|f\|_{2,\rho_\Omega} = (\int_\Omega |f(x)|^2 \, d\rho_\Omega)^{1/2} < +\infty \}$. Denote by $\frac{\partial}{\partial \mathbf{n}}$ the outward normal derivative operator, i.e. $\frac{\partial f(x)}{\partial \mathbf{n}} := \frac{\partial_x f(x)}{\partial \mathbf{n}} = \nabla f(x) \cdot \mathbf{n}$, where $\nabla f(x) = \big(\frac{\partial f}{\partial x_1},\dots,\frac{\partial f}{\partial x_d}\big)$ and $\mathbf{n}$ is the outward normal vector at $x = (x_1,\dots,x_d) \in \partial\Omega$.

To borrow the setting of learning theory (see [22]), we rewrite problem (3). Let $f^*$ be an unknown function and $g$ be a given function. Then, the collocation points $z = \{x_1,\dots,x_m\} \subset \mathrm{int}\,\Omega$ and $\nu = \{x_{m+1},\dots,x_{m+u}\} \subset \partial\Omega$ for the Neumann boundary problem:

$$\begin{cases} f(x) = f^*(x), & x \in \Omega, \\ \dfrac{\partial_x f(x)}{\partial \mathbf{n}} = g(x), & x \in \partial\Omega \end{cases} \tag{4}$$

are scattered points with values $y_{x_i} = f^*(x_i) + \eta_{x_i}$, $i = 1,2,\dots,m$, and $y_{x_i} = g(x_i)$, $i = m+1, m+2,\dots,m+u$, where for a given $x_i = (x_i^1,\dots,x_i^d) \in \Omega$, $\eta_{x_i}$ is a random variable subject to a conditional distribution $\rho(y|x_i)$ supported on $[-B,B]$ ($B$ a given constant) satisfying $E_{\rho(\cdot|x)}(\eta_x) = \int_{[-B,B]} \eta_x(y)\, d\rho(y|x) = 0$, $\sigma = \big( \sum_{i=1}^m \sigma_{x_i}^2 \big)^{1/2} < +\infty$ and $\sigma_x^2 = E_{\rho(\cdot|x)}(\eta_x^2)$. The correspondence of (4) is:

$$\begin{cases} f(x_i) = y_{x_i}, & i = 1,2,\dots,m, \\ \dfrac{\partial_x f(x)}{\partial \mathbf{n}}\Big|_{x = x_i} = g(x_i), & i = m+1,\dots,m+u \end{cases} \tag{5}$$

We shall investigate the convergence analysis of problem (5) when $\Omega$ is the unit ball $B^d = \{x \in \mathbb{R}^d : \|x\| \le 1\}$ and $\partial\Omega$ is the unit sphere $S^{d-1} = \{x \in \mathbb{R}^d : \|x\| = 1\}$. The paper is organized as follows. In Section 2.1, we provide some notions on kernel-regularized regression learning, including the concept of reproducing kernel Hilbert spaces, the kernel-regularized regression learning model and the kernel-regularized semi-supervised regression learning model. In Section 2.2, we provide some notions and results on spherical analysis. In particular, we define an RKHS with spherical harmonics which has the reproducing property for the outward normal derivative operator, and state the Neumann boundary value problem with this RKHS as the hypothesis space. Based on these notions, we define in Section 2.3 a kernel-regularized learning algorithm for solving the Neumann boundary value problem, show the representation theorem and give the error decomposition, with which we give the learning rates (i.e. the main Theorem 2.1) in Section 2.4. In Section 3, we give some lemmas, which are used to prove Theorem 2.1 in Section 4. Section 5 contains the appendices, which collect some facts about convex functions, a kind of RKHS $H_K^n(\rho_\Omega)$ defined in a Sobolev space $H^n(\rho_\Omega)$ associated with a general domain $\Omega$ and its boundary $\partial\Omega$, and a probability inequality on a general RKHS.

Throughout the paper, we write $A = O(B)$ if there is a constant $C$, independent of $A$ and $B$, such that $A \le CB$. We write $A \sim B$ if both $A = O(B)$ and $B = O(A)$.

2. Kernel-Regularized Regression and Error Analysis

We first provide some notions and results about the kernel-regularized regression learning problem.

2.1. Notions and Results on Kernel-Regularized Regression

We follow all the definitions and notions in Section 1. Let $K_x(y) = K(x,y) : \Omega \times \Omega \to \mathbb{R}$ be a Mercer kernel (i.e. it is continuous, symmetric and positive semi-definite: for any integer $l \ge 1$ and any set $\{x_1, x_2, \dots, x_l\} \subset \Omega$, the matrix $(K(x_i,x_j))_{i,j=1,\dots,l}$ is positive semi-definite) satisfying $\big( \int_{\Omega\times\Omega} |K(x,y)|^2 \, d\rho_\Omega(x)\, d\rho_\Omega(y) \big)^{1/2} < +\infty$. The reproducing kernel Hilbert space $(H_K, \|\cdot\|_K)$ associated with $K(x,y)$ is a Hilbert space of real functions defined on $\Omega$ such that:

$$f(x) = \langle f, K_x\rangle_K, \quad x \in \Omega, \ f \in H_K.$$

Define an operator $L_K(f,x)$ as:

$$L_K(f,x) = \int_\Omega f(y)\, K_x(y)\, d\rho_\Omega, \quad x \in \Omega.$$

Then $L_K : L^2(\rho_\Omega) \to L^2(\rho_\Omega)$. Denote by $\lambda_k$ the $k$-th eigenvalue of $L_K$ with associated eigenfunction $\varphi_k$. Then, by the Mercer theorem:

$$K(x,y) = \sum_{l=0}^{+\infty} \lambda_l\, \varphi_l(x)\, \varphi_l(y), \quad x, y \in \Omega,$$

where we assume the convergence of the series on the right-hand side is absolute (for every $x, y \in \Omega$) and uniform on $\Omega \times \Omega$. If $\{\varphi_k\}_{k=0}^{+\infty}$ forms an orthonormal system in $L^2(\rho_\Omega)$, then:

$$H_K = L_K^{1/2}\big(L^2(\rho_\Omega)\big) = \Big\{ f(x) = \sum_{l=0}^{+\infty} a_l(f)\,\varphi_l(x) : \|f\|_K = \Big( \sum_{l=0}^{+\infty} \frac{|a_l(f)|^2}{\lambda_l} \Big)^{1/2} < +\infty \Big\},$$

where $L_K = L_K^{1/2} \circ L_K^{1/2}$.
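As a concrete (and purely illustrative) picture of the Mercer expansion and the induced RKHS norm, the following Python sketch builds a truncated kernel from an assumed orthonormal cosine basis on $[0,1]$ with hand-picked eigenvalues $\lambda_l$, computes the truncated norm $\|f\|_K$ from a coefficient sequence, and checks that the resulting Gram matrix is positive semi-definite. The basis, eigenvalues and truncation level are assumptions made for the example only.

```python
import numpy as np

# Illustrative orthonormal basis on [0, 1] (w.r.t. Lebesgue measure):
# phi_0(x) = 1, phi_l(x) = sqrt(2) cos(pi l x) for l >= 1.
def phi(l, x):
    return np.ones_like(x) if l == 0 else np.sqrt(2.0) * np.cos(np.pi * l * x)

L = 20                                            # truncation level (assumption)
lam = np.array([np.exp(-l) for l in range(L)])    # positive, summable eigenvalues

def K(x, y):
    """Truncated Mercer kernel K(x, y) = sum_l lam_l phi_l(x) phi_l(y)."""
    return sum(lam[l] * phi(l, x) * phi(l, y) for l in range(L))

# RKHS norm of f = sum_l a_l phi_l: ||f||_K^2 = sum_l a_l^2 / lam_l.
a = np.array([1.0 / (1 + l) ** 2 for l in range(L)])   # sample coefficients
rkhs_norm = np.sqrt(np.sum(a ** 2 / lam))

# Sanity check: the kernel Gram matrix on a grid is positive semi-definite.
xs = np.linspace(0.0, 1.0, 50)
G = K(xs[:, None], xs[None, :])
print(rkhs_norm, np.min(np.linalg.eigvalsh(G)) > -1e-10)
```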

Let $z = \{(x_k, y_k)\}_{k=1}^m$ be a set of observations of an unknown function $f^*$. Then, to obtain a good approximation of $f^*$, one usually uses the kernel-regularized regression learning model:

$$f_z = \arg\min_{f \in H_K} \big( \mathcal{E}_z(f) + \lambda \|f\|_K^2 \big), \tag{6}$$

where $\mathcal{E}_z(f) = \frac{1}{m}\sum_{k=1}^m (y_k - f(x_k))^2$ is the empirical error. In learning theory, the convergence analysis amounts to bounding the convergence rate of the error ([23] or [24]):

$$\|f_z - f^*\|_{L^2(\rho_\Omega)}. \tag{7}$$
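For intuition about model (6): with the square loss, the representer theorem reduces (6) to a linear system in the kernel Gram matrix. The sketch below solves that system for synthetic one-dimensional data with a Gaussian kernel; the kernel, the data-generating function and the value of $\lambda$ are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic observations z = {(x_k, y_k)} of an unknown f* (assumption).
m = 40
x = rng.uniform(-1.0, 1.0, size=m)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(m)

def gauss_kernel(a, b, width=0.3):
    """Gaussian Mercer kernel K(a, b) = exp(-|a - b|^2 / (2 width^2))."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * width ** 2))

lam = 1e-3                       # regularization parameter lambda in (6)
K = gauss_kernel(x, x)

# Representer theorem: f_z(.) = sum_k c_k K(x_k, .), and minimizing the
# empirical-mean objective in (6) gives (K + lam * m * I) c = y.
c = np.linalg.solve(K + lam * m * np.eye(m), y)

def f_z(t):
    return gauss_kernel(np.atleast_1d(t), x) @ c

print(f_z(np.array([0.0, 0.5])))   # predictions of the learned regressor
```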

When the observation set $z$ has the form $z = \{(x_k, y_k)\}_{k=1}^m \cup \{x_{m+l}\}_{l=1}^u$, i.e. the nodes $\{x_{m+l}\}_{l=1}^u$ have no observed labels $\{y_{m+l}\}_{l=1}^u$, we call this setting semi-supervised learning. In practical applications, many observations are of this semi-supervised type since labeled data are precious. Many mathematicians have paid attention to this field (see e.g. [18] [19] [21] [25] [26] [27] [28]). The main idea for dealing with this problem is to add a term that makes use of the unlabeled data, i.e. to modify (6) into the form:

$$f_{z,\lambda,\gamma} = \arg\min_{f \in H_K} \big( \mathcal{E}_z(f) + \gamma\, \Omega_{u,m}(f) + \lambda \|f\|_K^2 \big), \quad \gamma > 0. \tag{8}$$

For example, in [19] one chooses $\gamma\,\Omega_{u,m}(f) = \frac{\gamma}{(u+m)^2} \sum_{i,j=1}^{m+u} (f(x_i) - f(x_j))^2\, W_{i,j}$, where the $W_{i,j}$ are edge weights in the data adjacency graph. Also, in [21] one chooses:

$$\gamma\,\Omega_{u,m}(f) = \frac{\gamma}{(m+u)(m+u-1)} \sum_{\substack{i,j=1 \\ i \ne j}}^{m+u} w_{ij}(\sigma)\,(f(x_i) - f(x_j))^2, \qquad w_{ij}(\sigma) = \exp\Big\{-\frac{d_K(x_i, x_j)}{\sigma}\Big\},$$

where σ > 0 and

$$d_K(x,y) = \|K_x - K_y\|_K = \sqrt{K(x,x) + K(y,y) - 2K(x,y)}.$$

These choices encourage us to choose a suitable $\gamma\,\Omega_{u,m}(f)$ to obtain a good approximate solution for problem (5).
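To make the graph-weight construction of [21] concrete, the following sketch computes the kernel-induced distance $d_K$ and the weights $w_{ij}(\sigma)$ for a small sample set under an assumed Gaussian kernel; the kernel, the sample points and $\sigma$ are illustrative stand-ins, not taken from the cited works.

```python
import numpy as np

def gauss_kernel(a, b, width=0.5):
    """Assumed Mercer kernel on R^d: K(a, b) = exp(-|a - b|^2 / (2 width^2))."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * width ** 2))

def graph_weights(X, sigma=1.0):
    """Weights w_ij(sigma) = exp(-d_K(x_i, x_j) / sigma), with
    d_K(x, y) = sqrt(K(x, x) + K(y, y) - 2 K(x, y))."""
    K = gauss_kernel(X, X)
    diag = np.diag(K)
    d2 = np.clip(diag[:, None] + diag[None, :] - 2.0 * K, 0.0, None)
    W = np.exp(-np.sqrt(d2) / sigma)
    np.fill_diagonal(W, 0.0)          # the sum in [21] excludes i = j
    return W

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 2))       # m + u sample points (assumption)
print(graph_weights(X, sigma=0.7))
```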

2.2. Some Results on Spherical Analysis

In this subsection, we define a kind of RKHS with spherical harmonics, with which we define a kernel-regularized regression learning algorithm for solving problem (5) when $\Omega$ is the unit ball $B^d$, and show the learning rates.

Let $B^d = \{x \in \mathbb{R}^d : \|x\| \le 1\}$ denote the unit ball in the $d$-dimensional Euclidean space $\mathbb{R}^d$ with the usual inner product $\langle x, y\rangle$ and Euclidean norm $\|x\| = \sqrt{\langle x, x\rangle}$. For the weight $W_\mu(x) = (1 - \|x\|^2)^\mu$, $\mu > -1$, we denote by $L^{p,\mu}(B^d) \equiv L^p(B^d, W_\mu)$, $1 \le p < +\infty$, the space of measurable functions defined on $B^d$ with:

$$\|f\|_{L^{p,\mu}(B^d)} = \Big( \int_{B^d} |f(x)|^p\, W_\mu(x)\, dx \Big)^{1/p} < +\infty, \quad 1 \le p < +\infty,$$

and for $p = +\infty$, we let $L^{\infty,\mu}$ denote the space $C(B^d)$ of continuous functions on $B^d$ with the uniform norm.

We denote by $\Pi_n^d$ the space of all polynomials in $d$ variables of degree at most $n$, and by $\mathcal{V}_n^d(W_\mu)$ the space of all polynomials of degree $n$ which are orthogonal to polynomials of lower degree in $L^{2,\mu}(B^d)$. The spaces $\mathcal{V}_n^d(W_\mu)$ are mutually orthogonal in $L^{2,\mu}(B^d)$ and (see [29]):

$$L^{2,\mu}(B^d) = \bigoplus_{n=0}^\infty \mathcal{V}_n^d(W_\mu), \qquad \Pi_n^d = \bigoplus_{k=0}^n \mathcal{V}_k^d(W_\mu).$$

Let $d\sigma$ denote the Lebesgue measure on $S^{d-1} = \{x : \|x\| = 1\}$ and denote the area of $S^{d-1}$ by $\sigma_d = \int_{S^{d-1}} d\sigma = 2\pi^{d/2}/\Gamma(d/2)$. Let $\mathcal{H}_n^d$ denote the space of homogeneous harmonic polynomials of degree $n$, i.e. homogeneous polynomials $p$ of degree $n$ satisfying $\Delta p = \sum_{k=1}^d \frac{\partial^2 p}{(\partial x_k)^2} = 0$. Also, we denote by $\mathcal{P}_n^d$ the set of homogeneous polynomials of degree $n$. It is well known that:

$$a_n^d := \dim \mathcal{H}_n^d = \binom{n+d-1}{n} - \binom{n+d-3}{n-2}.$$

Let $W^1(B^d)_\mu$ denote the set of functions whose first-order derivatives are all in $L^{2,\mu}(B^d)$, i.e.

$$W^1(B^d)_\mu = \Big\{ f : \|f\|_{W^1(B^d)_\mu} = \Big( \sum_{|\alpha| \le 1} \|\partial^\alpha f\|_{L^{2,\mu}(B^d)}^2 \Big)^{1/2} < +\infty \Big\}.$$

In this case, $S^{d-1} = \{x : \|x\|^2 = \sum_{i=1}^d (x_i)^2 = 1\}$ and

$$\frac{\partial_x f}{\partial \mathbf{n}}(x) = \sum_{i=1}^d x_i\, \frac{\partial f}{\partial x_i}(x), \quad x = (x_1,\dots,x_d) \in S^{d-1}. \tag{9}$$
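Identity (9) is easy to check numerically. The sketch below estimates $\sum_i x_i\, \partial f/\partial x_i$ by central finite differences at a random point of the unit sphere for the hypothetical test function $f(x) = \|x\|^2$, whose outward normal derivative on $S^{d-1}$ equals $2$; the test function and step size are assumptions for the example.

```python
import numpy as np

def normal_derivative(f, x, h=1e-6):
    """Approximate (d/dn) f at x in S^{d-1} via (9): sum_i x_i * df/dx_i,
    using central finite differences for the partial derivatives."""
    d = len(x)
    grad = np.zeros(d)
    for i in range(d):
        e = np.zeros(d); e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return float(np.dot(x, grad))

# Test function (assumption): f(x) = |x|^2, so grad f = 2x and, on the
# unit sphere, x . grad f = 2 |x|^2 = 2.
f = lambda x: float(np.dot(x, x))

rng = np.random.default_rng(2)
x = rng.standard_normal(3)
x /= np.linalg.norm(x)            # project onto S^{d-1}
print(normal_derivative(f, x))    # approximately 2
```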

Define a subclass of $W^1(B^d)_\mu$ as:

$$H_\mu^n = \Big\{ f \in W^1(B^d)_\mu : \|f\|_{H_\mu^n} = \Big( \int_{B^d} |f(x)|^2\, W_\mu(x)\, dx + \int_{S^{d-1}} \Big| \frac{\partial_\xi f}{\partial \mathbf{n}}(\xi) \Big|^2 d\sigma(\xi) \Big)^{1/2} < +\infty \Big\}.$$

An inner product defined on $H_\mu^n$ is:

$$\langle f, g\rangle_{H_\mu^n} = \int_{B^d} f(x)\, g(x)\, W_\mu(x)\, dx + \int_{S^{d-1}} \frac{\partial f}{\partial \mathbf{n}}(\xi)\, \frac{\partial g}{\partial \mathbf{n}}(\xi)\, d\sigma(\xi).$$

Denote by $\mathcal{V}_n^d(W_\mu, S)$ the space of orthogonal polynomials with respect to $\langle \cdot, \cdot\rangle_{H_\mu^n}$. By Theorem 3 of [30], $\mathcal{V}_n^d(W_\mu, S)$ contains a mutually orthonormal basis $\{Q_{n,k} \equiv Q_{n,k}^d : k = 1,2,\dots,a_n^d\}$ with respect to $\langle\cdot,\cdot\rangle_{H_\mu^n}$. Then, there holds the expansion:

$$f(x) \sim \sum_{k=0}^\infty \sum_{l=1}^{a_k^d} a_{k,l}(f)\, Q_{k,l}(x), \quad x \in B^d, \ f \in H_\mu^n,$$

where $a_{k,l}(f) = \langle f, Q_{k,l}\rangle_{H_\mu^n}$, and by the Bessel inequality we have:

$$\Big( \sum_{k=0}^\infty \sum_{l=1}^{a_k^d} |a_{k,l}(f)|^2 \Big)^{1/2} \le \|f\|_{H_\mu^n} < +\infty.$$

Let $K^\mu(x,y) : B^d \times B^d \to \mathbb{R}$ be a Mercer kernel of the form:

$$K_x^\mu(y) = K^\mu(x,y) := \sum_{k=0}^\infty \lambda_k\, P_k(x,y), \quad x \in B^d,\ y \in B^d, \tag{10}$$

where $P_k(x,y) = \sum_{l=1}^{a_k^d} Q_{k,l}(x)\, Q_{k,l}(y)$ and $\sum_{k=0}^\infty \lambda_k c_k < +\infty$ with $\lambda_k > 0$, where $c_k := \sup_{x \in B^d} P_k(x,x)$ ($k = 0,1,2,\dots$).

Define

$$L_{K^\mu}(f)(x) = L_{K^\mu}(f,x) := \langle f, K_x^\mu(\cdot)\rangle_{H_\mu^n} = \sum_{k=0}^\infty \lambda_k \sum_{l=1}^{a_k^d} a_{k,l}(f)\, Q_{k,l}(x), \quad x \in B^d$$

and $H_{K^\mu}^n = L_{K^\mu}^{1/2}(H_\mu^n)$. Then,

$$H_{K^\mu}^n = \Big\{ f(x) = \sum_{k=0}^\infty \sum_{l=1}^{a_k^d} a_{k,l}(f)\, Q_{k,l}(x) : \|f\|_{K^\mu} = \Big( \sum_{k=0}^\infty \sum_{l=1}^{a_k^d} \frac{|a_{k,l}(f)|^2}{\lambda_k} \Big)^{1/2} < +\infty \Big\}.$$

For $f(x) = \sum_{k=0}^\infty \sum_{l=1}^{a_k^d} a_{k,l}(f)\, Q_{k,l}(x) \in H_{K^\mu}^n$ and $g(x) = \sum_{k=0}^\infty \sum_{l=1}^{a_k^d} a_{k,l}(g)\, Q_{k,l}(x) \in H_{K^\mu}^n$, we define:

$$\langle f, g\rangle_{K^\mu} = \sum_{k=0}^\infty \sum_{l=1}^{a_k^d} \frac{a_{k,l}(f)\, a_{k,l}(g)}{\lambda_k}.$$

We shall show in Proposition 2.1 that $H_{K^\mu}^n$ is an RKHS associated with the kernel (10).

To describe the kernel $K^\mu$ quantitatively, we make two assumptions.

Assumption A. Assume $K^\mu \in C^{(1)}(B^d \times B^d)$.

Assumption B. Assume $\sup_{x \in S^{d-1}} \big| \frac{\partial_x P_k(x,x)}{\partial \mathbf{n}} \big| = c_k'$ ($k = 0,1,2,\dots$) and

$$\sum_{k=0}^\infty \lambda_k\, c_k' < +\infty. \tag{11}$$

Since the $\{Q_{n,k} \equiv Q_{n,k}^d : k = 1,2,\dots,a_n^d\}$ are algebraic polynomials, the constants $c_k$ and $c_k'$ always exist. Real numbers $\lambda_k$ satisfying (11) also exist; for example, we may take $\lambda_k = e^{-(c_k + c_k')}$ ($k = 0,1,2,\dots$).

If (11) holds, then by Theorem 2.4 in [31], Theorem 4.2 in [32] or Proposition 6.2 in [33]:

$$\partial_x^\alpha \partial_y^\beta K_x^\mu(y) = \partial_x^\alpha \partial_y^\beta K^\mu(x,y) := \sum_{k=0}^\infty \lambda_k\, \partial_x^\alpha \partial_y^\beta P_k(x,y), \quad x \in S^{d-1},\ y \in S^{d-1}, \tag{12}$$

and the convergence of the series on the right-hand side of (12) is absolute and uniform on $S^{d-1} \times S^{d-1}$.

Proposition 2.1. Assume that Assumptions A and B hold. Then,

1) The reproducing property holds:

$$f(x) = \langle f, K_x^\mu(\cdot)\rangle_{K^\mu}, \quad f \in H_{K^\mu}^n, \ x \in B^d. \tag{13}$$

2) The reproducing property holds for the outward normal derivative operator, i.e.

$$\frac{\partial_x f(x)}{\partial \mathbf{n}} = \Big\langle f, \frac{\partial_x K_x^\mu(\cdot)}{\partial \mathbf{n}} \Big\rangle_{K^\mu}, \quad f \in H_{K^\mu}^n, \ x \in S^{d-1}. \tag{14}$$

3) Define

$$k = \sup_{x \in B^d} \|K_x^\mu(\cdot)\|_{K^\mu} + \sup_{x \in S^{d-1}} \Big\| \frac{\partial_x K_x^\mu(\cdot)}{\partial \mathbf{n}} \Big\|_{K^\mu}.$$

Then, for all $x \in B^d$ and $y \in S^{d-1}$, we have:

$$|f(x)| \le k\, \|f\|_{K^\mu}, \qquad \Big| \frac{\partial f(y)}{\partial \mathbf{n}} \Big| \le k\, \|f\|_{K^\mu}. \tag{15}$$

Proof. See the proofs in Section 4.

Let $\{x_i\}_{i=1}^{m}$ be samples drawn i.i.d. according to $\rho_{B^d}$ and $y_i = f^*(x_i) + \eta_{x_i}$, $i = 1,2,\dots,m$, where for a given $x_i \in B^d$, $\eta_{x_i}$ is a random variable subject to a conditional distribution $\rho_{B^d}(y|x_i)$ supported on $[-B,B]$ ($B$ a given constant) satisfying $E_{\rho(\cdot|x)}(\eta_x) = \int_{[-B,B]} \eta_x(y)\, d\rho_{B^d}(y|x) = 0$, $\sigma = \big( \int_{B^d} \sigma_x^2\, d\rho_{B^d} \big)^{1/2} < +\infty$ and $\sigma_x^2 = E_{\rho_{B^d}(\cdot|x)}(\eta_x^2)$. Then $z = \{(x_i, y_i)\}_{i=1}^m$ can be regarded as observations drawn i.i.d. according to $\rho(x,y) = \rho_{B^d}(x)\,\rho_{B^d}(y|x)$, and $\{x_i\}_{i=m+1}^{m+l}$ are samples drawn i.i.d. according to $\rho_{S^{d-1}}$. The correspondence of problem (5) is then:

$$\begin{cases} f(x_i) = y_{x_i}, & i = 1,2,\dots,m, \\ \dfrac{\partial f(x_i)}{\partial \mathbf{n}} = \sum_{k=1}^d x_i^k\, \dfrac{\partial f(x_i)}{\partial x_k} = g(x_i), & i = m+1,\dots,m+l \end{cases} \tag{16}$$

We shall investigate numerical solutions of problem (16) with kernel-regularized approaches. A kernel learning algorithm with $H_{K^\mu}^n$ as the hypothesis space is defined in Section 2.3. The representation theorem for it is provided and an error decomposition for its error analysis is given, from which a learning rate for Algorithm (17) is shown.

2.3. Kernel-Regularized Regression

With the above notions in hand, we now give the following kernel-regularized learning algorithm for solving problem (16):

$$f_{z,\lambda} = \arg\min_{f \in H_{K^\mu}^n} \Big\{ \mathcal{E}_z(f) + \frac{1}{l} \sum_{i=m+1}^{m+l} \Big( \frac{\partial_x f(x_i)}{\partial \mathbf{n}} - g(x_i) \Big)^2 + \lambda \|f\|_{K^\mu}^2 \Big\}, \tag{17}$$

where $l \sim m$, i.e. there exist $c_1 > 0$, $c_2 > 0$ such that $c_1 \le \frac{l}{m} \le c_2$, and

$$\mathcal{E}_z(f) = \frac{1}{m}\sum_{i=1}^m (f(x_i) - y_{x_i})^2.$$
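To illustrate how Algorithm (17) can be computed in practice, the following sketch uses the finite-dimensional expansion suggested by the representation theorem (Proposition 2.2): the candidate solution is written as a combination of kernel sections at the interior points and kernel normal derivatives at the boundary points, which turns (17) into a linear system. A Gaussian kernel on the unit disk and synthetic data serve as stand-ins for the paper's $K^\mu$ and for real observations; the kernel, its width, the data and $\lambda$ are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
w = 0.6                                      # Gaussian kernel width (assumption)

def K(x, y):
    return np.exp(-np.sum((x - y) ** 2) / (2 * w ** 2))

def D1(b, u):
    """Outward-normal derivative of K in its first argument at a boundary
    point b of the unit disk (outward normal n = b): n . grad_b K(b, u)."""
    return -(b @ (b - u)) / w ** 2 * K(b, u)

def D12(b, c):
    """Mixed normal derivative n_b . grad_b [ n_c . grad_c K(b, c) ]."""
    return K(b, c) * ((b @ c) / w ** 2 - (b @ (b - c)) * (c @ (b - c)) / w ** 4)

# Synthetic instance of problem (16) on the unit disk (d = 2, assumption):
# f*(x) = x1^2 + x2^2, hence g = df*/dn = 2 on the unit circle.
l, lam = 30, 1e-4
X = rng.uniform(-1, 1, (60, 2))
X = X[np.linalg.norm(X, axis=1) < 1][:20]
m = len(X)
theta = rng.uniform(0, 2 * np.pi, l)
Bd = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # boundary points
y = np.sum(X ** 2, axis=1) + 0.05 * rng.standard_normal(m)
g = 2.0 * np.ones(l)

# "Derivative Gram" matrix over the basis {K_{x_i}} U {dK_{b_k}/dn}.
pts = list(X) + list(Bd)
G = np.zeros((m + l, m + l))
for i, p in enumerate(pts):
    for j, q in enumerate(pts):
        if i < m and j < m:
            G[i, j] = K(p, q)
        elif i < m <= j:
            G[i, j] = D1(q, p)
        elif j < m <= i:
            G[i, j] = D1(p, q)
        else:
            G[i, j] = D12(p, q)

GI, GB = G[:m, :], G[m:, :]          # point evaluations / normal derivatives
A = GI.T @ GI / m + GB.T @ GB / l + lam * G
rhs = GI.T @ y / m + GB.T @ g / l
c = np.linalg.lstsq(A, rhs, rcond=None)[0]   # robust solve of the (possibly
                                             # ill-conditioned) normal equations

def f_hat(t):
    return sum(ci * (K(p, t) if i < m else D1(p, t))
               for i, (ci, p) in enumerate(zip(c, pts)))

print(f_hat(np.array([0.3, 0.4])))   # roughly f*(0.3, 0.4) = 0.25 if the fit is good
```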

Corresponding to (17), we define a model for the observations without the noise disturbances $\eta_{x_i}$ by:

$$f_{\bar X,\lambda} = \arg\min_{f \in H_{K^\mu}^n} \Big\{ \mathcal{E}_{\bar X}(f) + \frac{1}{l} \sum_{i=m+1}^{m+l} \Big( \frac{\partial_x f(x_i)}{\partial \mathbf{n}} - g(x_i) \Big)^2 + \lambda \|f\|_{K^\mu}^2 \Big\}, \tag{18}$$

where

$$\mathcal{E}_{\bar X}(f) = \frac{1}{m}\sum_{i=1}^m (f(x_i) - f^*(x_i))^2.$$

The integral model for (18) is defined as:

$$f_\lambda = \arg\min_{f \in H_{K^\mu}^n} \Big\{ \mathcal{E}(f) + \int_{S^{d-1}} \Big( \frac{\partial_x f(x)}{\partial \mathbf{n}} - g(x) \Big)^2 d\rho_{S^{d-1}} + \lambda \|f\|_{K^\mu}^2 \Big\}, \tag{19}$$

where

$$\mathcal{E}(f) = \int_{B^d} (f(x) - f^*(x))^2\, d\rho_{B^d}.$$

Proposition 2.2. (Representation theorem). Assume that Assumptions A and B hold. Then,

1) Algorithm (17) has a unique solution $f_{z,\lambda}$ and there are coefficients $\{a_i\}_{i=1}^{m+l}$, depending upon $\lambda, m, l, f_{z,\lambda}, z, f^*$ and $g$, such that:

$$f_{z,\lambda}(\cdot) = \sum_{i=1}^m a_i\, K_{x_i}^\mu(\cdot) + \sum_{k=1}^l a_{m+k}\, \frac{\partial_x K_{x_{m+k}}^\mu(\cdot)}{\partial \mathbf{n}}. \tag{20}$$

2) Algorithm (18) has a unique solution $f_{\bar X,\lambda}$ and there are coefficients $\{b_i\}_{i=1}^{m+l}$, depending upon $\lambda, m, l, f_{\bar X,\lambda}, \bar X, f^*$ and $g$, such that:

$$f_{\bar X,\lambda}(\cdot) = \sum_{i=1}^m b_i\, K_{x_i}^\mu(\cdot) + \sum_{k=1}^l b_{m+k}\, \frac{\partial_x K_{x_{m+k}}^\mu(\cdot)}{\partial \mathbf{n}}. \tag{21}$$

3) Algorithm (19) has a unique solution $f_\lambda$ and there are a function $G_{\lambda,f^*}(x)$ depending upon $\lambda$ and $f^*$, and a function $p_{\lambda,g}(x)$ depending upon $\lambda$ and $g$, such that:

$$f_\lambda(\cdot) = \int_{B^d} G_{\lambda,f^*}(x)\, K_x^\mu(\cdot)\, d\rho_{B^d} + \int_{S^{d-1}} p_{\lambda,g}(x)\, \frac{\partial_x K_x^\mu(\cdot)}{\partial \mathbf{n}}\, d\rho_{S^{d-1}}. \tag{22}$$

Proof. See the proof in Section 4.

(20) shows that Algorithm (17) can be replaced by certain coefficient-regularized models, which is a new topic; related research can be found in [34] [35].

We give the following error decomposition:

$$\|f_{z,\lambda} - f^*\|_{L^2(\rho_{B^d})} + \Big\| \frac{\partial f_{z,\lambda}}{\partial \mathbf{n}} - g \Big\|_{L^2(\rho_{S^{d-1}})} \le \|f_{z,\lambda} - f_{\bar X,\lambda}\|_{L^2(\rho_{B^d})} + \|f_{\bar X,\lambda} - f^*\|_{L^2(\rho_{B^d})} + \Big\| \frac{\partial f_{z,\lambda}}{\partial \mathbf{n}} - \frac{\partial f_{\bar X,\lambda}}{\partial \mathbf{n}} \Big\|_{L^2(\rho_{S^{d-1}})} + \Big\| \frac{\partial f_{\bar X,\lambda}}{\partial \mathbf{n}} - g \Big\|_{L^2(\rho_{S^{d-1}})}$$

$$\le \|f_{z,\lambda} - f_{\bar X,\lambda}\|_{L^2(\rho_{B^d})} + \|f_{\bar X,\lambda} - f_\lambda\|_{L^2(\rho_{B^d})} + \|f_\lambda - f^*\|_{L^2(\rho_{B^d})} + \Big\| \frac{\partial f_{z,\lambda}}{\partial \mathbf{n}} - \frac{\partial f_{\bar X,\lambda}}{\partial \mathbf{n}} \Big\|_{L^2(\rho_{S^{d-1}})} + \Big\| \frac{\partial f_{\bar X,\lambda}}{\partial \mathbf{n}} - \frac{\partial f_\lambda}{\partial \mathbf{n}} \Big\|_{L^2(\rho_{S^{d-1}})} + \Big\| \frac{\partial f_\lambda}{\partial \mathbf{n}} - g \Big\|_{L^2(\rho_{S^{d-1}})}$$

$$\le \Big( \|f_{z,\lambda} - f_{\bar X,\lambda}\|_{L^2(\rho_{B^d})} + \Big\| \frac{\partial f_{z,\lambda}}{\partial \mathbf{n}} - \frac{\partial f_{\bar X,\lambda}}{\partial \mathbf{n}} \Big\|_{L^2(\rho_{S^{d-1}})} \Big) + \Big( \|f_{\bar X,\lambda} - f_\lambda\|_{L^2(\rho_{B^d})} + \Big\| \frac{\partial f_{\bar X,\lambda}}{\partial \mathbf{n}} - \frac{\partial f_\lambda}{\partial \mathbf{n}} \Big\|_{L^2(\rho_{S^{d-1}})} \Big) + 2\sqrt{\mathcal K(f^*, g, \lambda)} \le k\big( \|f_{z,\lambda} - f_{\bar X,\lambda}\|_{K^\mu} + \|f_{\bar X,\lambda} - f_\lambda\|_{K^\mu} \big) + 2\sqrt{\mathcal K(f^*, g, \lambda)}, \tag{23}$$

where in the last derivation, we have used (15) and

$$\mathcal K(f^*, g, \lambda) = \inf_{f \in H_{K^\mu}^n} \Big( \|f - f^*\|_{L^2(\rho_{B^d})}^2 + \int_{S^{d-1}} \Big( \frac{\partial f(x)}{\partial \mathbf{n}} - g(x) \Big)^2 d\rho_{S^{d-1}} + \lambda \|f\|_{K^\mu}^2 \Big),$$

which controls the approximation errors. Then, to bound the error:

$$\mathcal E(f_{z,\lambda}, f^*, g) = \|f_{z,\lambda} - f^*\|_{L^2(\rho_{B^d})} + \Big\| \frac{\partial f_{z,\lambda}}{\partial \mathbf{n}} - g \Big\|_{L^2(\rho_{S^{d-1}})},$$

we only need to bound the sample errors $\|f_{z,\lambda} - f_{\bar X,\lambda}\|_{K^\mu}$ and $\|f_{\bar X,\lambda} - f_\lambda\|_{K^\mu}$, respectively.

2.4. Learning Rates

Theorem 2.1. Let Assumptions A and B hold, let $f_{z,\lambda}$ be the solution of (17), and let $f^* \in C(B^d)$ and $g \in L^2(\rho_{S^{d-1}})$. Then, for any $\delta \in (0,1)$, with confidence $1 - \delta$, it holds that:

$$\|f_{z,\lambda} - f^*\|_{L^2(\rho_{B^d})} + \Big\| \frac{\partial f_{z,\lambda}}{\partial \mathbf{n}} - g \Big\|_{L^2(\rho_{S^{d-1}})} = O\Big( \frac{\sigma}{\lambda\sqrt{m\delta}} + \Big( \frac{\sqrt{\mathcal K(f^*,g,\lambda)}}{\lambda^{3/2}\sqrt{m}} + \frac{\|f^*\|_{C(B^d)}}{\lambda\sqrt{m}} \Big)\log\frac{4}{\delta} + \sqrt{\frac{\mathcal K(f^*,g,\lambda)}{\lambda\, l\,\delta}} + \sqrt{\mathcal K(f^*,g,\lambda)} \Big). \tag{24}$$

If $g = \frac{\partial f^*(x)}{\partial \mathbf{n}}$ for $x \in S^{d-1}$, then

$$\mathcal E(f_{z,\lambda}, f^*, g) = \|f_{z,\lambda} - f^*\|_{H^n(\rho_{B^d})}$$

and in this case,

$$\mathcal K(f^*, g, \lambda) = \mathcal K(f^*, \lambda) = \inf_{f \in H_{K^\mu}^n} \big( \|f - f^*\|_{H^n(\rho_{B^d})}^2 + \lambda \|f\|_{K^\mu}^2 \big), \quad \lambda > 0,$$

where the norm $\|\cdot\|_{H^n(\rho_{B^d})}$ is defined in Section 2.2. The decay of $\mathcal K(f^*,\lambda)$ has recently been discussed in [36].

We then have the following Corollary 2.1.

Corollary 2.1. Let Assumptions A and B hold, let $f_{z,\lambda}$ be the solution of (17), let $f^* \in C(B^d)$, and let $g(x) = \frac{\partial f^*(x)}{\partial \mathbf{n}}$ for $x \in S^{d-1}$. Then, for any $\delta \in (0,1)$, with confidence $1 - \delta$, it holds that:

$$\|f_{z,\lambda} - f^*\|_{H^n(\rho_{B^d})} = O\Big( \frac{\sigma}{\lambda\sqrt{m\delta}} + \Big( \frac{\sqrt{\mathcal K(f^*,\lambda)}}{\lambda^{3/2}\sqrt{m}} + \frac{\|f^*\|_{C(B^d)}}{\lambda\sqrt{m}} \Big)\log\frac{4}{\delta} + \sqrt{\frac{\mathcal K(f^*,\lambda)}{\lambda\, l\,\delta}} + \sqrt{\mathcal K(f^*,\lambda)} \Big). \tag{25}$$

By (25), if $\lambda = \lambda(m) \to 0^+$ (as $m \to +\infty$) is chosen in such a way that $\lim_{\lambda \to 0^+} \mathcal K(f^*,\lambda) = 0$ and $\lambda^{3/2}\sqrt{m} \to +\infty$ as $m \to +\infty$, then, with confidence $1 - \delta$, it holds (since $m \sim l$) that:

$$\lim_{m \to +\infty} \|f_{z,\lambda} - f^*\|_{H^n(\rho_{B^d})} = 0. \tag{26}$$

2.5. Further Discussions

We now give some explanation on the results and the assumptions.

1) There are three reasons for choosing $D = B^d$ and $\partial D = S^{d-1}$ to demonstrate the kernel-regularized regression model for solving the Neumann boundary problem (5).

i) When $D = B^d$ and $\partial D = S^{d-1}$, we can easily give an explicit representation for the outward normal derivative (see (9)). Therefore, the method used in this paper can be extended to domains whose outward normal derivatives can be computed easily.

ii) By the statements in Section 2.1, a tool for constructing a reproducing kernel Hilbert space is an orthonormal basis. Orthogonal basis theory has been developed not only on the unit ball $B^d$ (see [37]) and the unit sphere $S^{d-1}$ (see [29]), but also on Sobolev spaces associated with both $B^d$ and $S^{d-1}$ (see e.g. [30]). These facts also encourage us to choose $D = B^d$ and $\partial D = S^{d-1}$ for problem (5).

iii) In learning theory, estimating the learning rate of a kernel-regularized learning algorithm amounts to bounding the error, which belongs to the scope of approximation theory. One can make use of the rich toolkit of spherical approximation theory to bound the learning rates (see [24]).

2) In the present paper, we make two assumptions, A and B. They are reasonable.

Since $P_k(x,y)$ is a bivariate polynomial on $B^d \times B^d$ for a given $k$, $\frac{\partial_x P_k(x,x)}{\partial \mathbf{n}}$ is still a polynomial whose supremum is attained on $S^{d-1}$; hence Assumption B is reasonable. In the same way, $d_k = \sup_{x,y \in B^d} |\partial^\alpha P_k(x,y)|$ is attained. If we choose $\lambda_k = e^{-k^\beta}$ ($\beta > 0$), then $K^\mu \in C^{(1)}(B^d \times B^d)$. So Assumption A is reasonable as well.

3) The convergence obtained in the present paper is uniform convergence (see (26)), which supports the reliability of the proposed method.

3. Lemmas

To characterize the optimal solutions of Algorithms (17)-(19), we need the concept of the Gâteaux derivative.

Let $(H, \|\cdot\|_H)$ be a Hilbert space and $F(f) : H \to \mathbb{R} \cup \{\infty\}$ be a functional on $H$. We say $F$ is Gâteaux differentiable at $f \in H$ if there is a $\xi \in H$ such that for any $g \in H$:

$$\lim_{t \to 0} \frac{F(f + tg) - F(f)}{t} = \langle g, \xi\rangle_H,$$

and we write $\nabla_f F(f) = \xi$, the Gâteaux derivative of $F$ at $f$.
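A quick numerical illustration of this definition: in $H = \mathbb{R}^n$ with $F(f) = \|f\|_H^2$ (whose Gâteaux derivative is $2f$, cf. Proposition A1), the directional difference quotient from the definition should match $\langle g, 2f\rangle_H$. The dimension, step size and tolerance below are assumptions for the example.

```python
import numpy as np

def gateaux_directional(F, f, g, t=1e-6):
    """Finite-difference estimate of the directional limit
    (F(f + t g) - F(f)) / t from the definition of the Gateaux derivative."""
    return (F(f + t * g) - F(f)) / t

# Example in H = R^n with F(f) = ||f||_H^2, whose Gateaux derivative is 2 f.
F = lambda f: float(np.dot(f, f))

rng = np.random.default_rng(4)
f = rng.standard_normal(5)
g = rng.standard_normal(5)

lhs = gateaux_directional(F, f, g)
rhs = float(np.dot(g, 2 * f))          # <g, grad F(f)>_H with grad F(f) = 2 f
print(abs(lhs - rhs) < 1e-4)           # True up to finite-difference error
```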

To prove Theorem 2.1, we need some lemmas.

Lemma 3.1. The following equations hold:

1) $f_{z,\lambda}$ satisfies the equation:

$$\frac{1}{m}\sum_{i=1}^m (f_{z,\lambda}(x_i) - y_{x_i})\, K_{x_i}^\mu(\cdot) + \frac{1}{l}\sum_{k=1}^l \Big( \frac{\partial f_{z,\lambda}(x_{m+k})}{\partial \mathbf{n}} - g(x_{m+k}) \Big)\, \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \mathbf{n}} + \lambda f_{z,\lambda}(\cdot) = 0. \tag{27}$$

2) $f_{\bar X,\lambda}$ satisfies the equation:

$$\frac{1}{m}\sum_{i=1}^m (f_{\bar X,\lambda}(x_i) - f^*(x_i))\, K_{x_i}^\mu(\cdot) + \frac{1}{l}\sum_{k=1}^l \Big( \frac{\partial f_{\bar X,\lambda}(x_{m+k})}{\partial \mathbf{n}} - g(x_{m+k}) \Big)\, \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \mathbf{n}} + \lambda f_{\bar X,\lambda}(\cdot) = 0. \tag{28}$$

3) $f_\lambda$ satisfies the equation:

$$\int_{B^d} (f_\lambda(x) - f^*(x))\, K_x^\mu(\cdot)\, d\rho_{B^d} + \int_{S^{d-1}} \Big( \frac{\partial f_\lambda(x)}{\partial \mathbf{n}} - g(x) \Big)\, \frac{\partial K_x^\mu(\cdot)}{\partial \mathbf{n}}\, d\rho_{S^{d-1}} + \lambda f_\lambda(\cdot) = 0. \tag{29}$$

Proof. Proof of 1). Define Ω z ( f ) as:

$$\Omega_z(f) = \mathcal E_z(f) + \frac{1}{l}\sum_{k=1}^l \Big( \frac{\partial f(x_{m+k})}{\partial \mathbf{n}} - g(x_{m+k}) \Big)^2 + \lambda \|f\|_{K^\mu}^2, \quad f \in H_{K^\mu}^n.$$

Then,

$$\lim_{t \to 0^+} \frac{\Omega_z(f + th) - \Omega_z(f)}{t} = \lim_{t \to 0^+} \frac{\mathcal E_z(f + th) - \mathcal E_z(f)}{t} + 2\lambda \langle h, f\rangle_{K^\mu} + \frac{1}{l}\sum_{k=1}^l \lim_{t \to 0^+} \frac{\big( \frac{\partial f(x_{m+k})}{\partial \mathbf{n}} + t\frac{\partial h(x_{m+k})}{\partial \mathbf{n}} - g(x_{m+k}) \big)^2 - \big( \frac{\partial f(x_{m+k})}{\partial \mathbf{n}} - g(x_{m+k}) \big)^2}{t},$$

where

$$\lim_{t \to 0^+} \frac{\mathcal E_z(f + th) - \mathcal E_z(f)}{t} = \frac{1}{m}\sum_{i=1}^m \lim_{t \to 0^+} \frac{(f(x_i) + th(x_i) - y_{x_i})^2 - (f(x_i) - y_{x_i})^2}{t} = \frac{2}{m}\sum_{i=1}^m (f(x_i) - y_{x_i})\, h(x_i) = \frac{2}{m}\sum_{i=1}^m (f(x_i) - y_{x_i})\, \langle h, K_{x_i}^\mu(\cdot)\rangle_{K^\mu} = \Big\langle h,\ \frac{2}{m}\sum_{i=1}^m (f(x_i) - y_{x_i})\, K_{x_i}^\mu(\cdot) \Big\rangle_{K^\mu}$$

and

$$\frac{1}{l}\sum_{k=1}^l \lim_{t \to 0^+} \frac{\big( \frac{\partial f(x_{m+k})}{\partial \mathbf{n}} + t\frac{\partial h(x_{m+k})}{\partial \mathbf{n}} - g(x_{m+k}) \big)^2 - \big( \frac{\partial f(x_{m+k})}{\partial \mathbf{n}} - g(x_{m+k}) \big)^2}{t} = \Big\langle h,\ \frac{2}{l}\sum_{k=1}^l \Big( \frac{\partial f(x_{m+k})}{\partial \mathbf{n}} - g(x_{m+k}) \Big)\, \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \mathbf{n}} \Big\rangle_{K^\mu}. \tag{30}$$

It follows

$$\lim_{t \to 0^+} \frac{\Omega_z(f + th) - \Omega_z(f)}{t} = \Big\langle h,\ \frac{2}{m}\sum_{i=1}^m (f(x_i) - y_{x_i})\, K_{x_i}^\mu(\cdot) + \frac{2}{l}\sum_{k=1}^l \Big( \frac{\partial f(x_{m+k})}{\partial \mathbf{n}} - g(x_{m+k}) \Big)\, \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \mathbf{n}} + 2\lambda f \Big\rangle_{K^\mu}.$$

By the definition of Gâteaux derivative, we have:

$$\nabla_f \Omega_z(f) = \frac{2}{m}\sum_{i=1}^m (f(x_i) - y_{x_i})\, K_{x_i}^\mu(\cdot) + \frac{2}{l}\sum_{k=1}^l \Big( \frac{\partial f(x_{m+k})}{\partial \mathbf{n}} - g(x_{m+k}) \Big)\, \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \mathbf{n}} + 2\lambda f(\cdot).$$

By Fermat’s rule (see 1) of Proposition A1) and the definition of $f_{z,\lambda}$, we have $\nabla_f \Omega_z(f)\big|_{f = f_{z,\lambda}} = 0$, i.e. (27) holds. Equations (28) and (29) follow in the same way.

Lemma 3.2. There holds the inequality:

$$\|f_{z,\lambda} - f_{\bar X,\lambda}\|_{K^\mu} \le \frac{2}{\lambda m} \Big\| \sum_{i=1}^m \eta_{x_i}(y)\, K_{x_i}^\mu(\cdot) \Big\|_{K^\mu} \tag{31}$$

and the inequality:

$$\|f_{\bar X,\lambda} - f_\lambda\|_{K^\mu} \le \frac{2}{\lambda}\Big( \Big\| \int_{B^d} (f_\lambda(x) - f^*(x))\, K_x^\mu(\cdot)\, d\rho_{B^d} - \frac{1}{m}\sum_{i=1}^m (f_\lambda(x_i) - f^*(x_i))\, K_{x_i}^\mu(\cdot) \Big\|_{K^\mu} + \Big\| \int_{S^{d-1}} \Big( \frac{\partial f_\lambda(x)}{\partial \mathbf{n}} - g(x) \Big)\, \frac{\partial K_x^\mu(\cdot)}{\partial \mathbf{n}}\, d\rho_{S^{d-1}} - \frac{1}{l}\sum_{k=1}^l \Big( \frac{\partial f_\lambda(x_{m+k})}{\partial \mathbf{n}} - g(x_{m+k}) \Big)\, \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \mathbf{n}} \Big\|_{K^\mu} \Big). \tag{32}$$

Proof. The definition of f z , λ and the inequality (43) give:

$$0 \ge \Omega_z(f_{z,\lambda}) - \Omega_z(f_{\bar X,\lambda}) \ge \frac{2}{m}\sum_{i=1}^m (f_{\bar X,\lambda}(x_i) - y_{x_i})\big(f_{z,\lambda}(x_i) - f_{\bar X,\lambda}(x_i)\big) + \frac{2}{l}\sum_{k=1}^l \Big( \frac{\partial f_{\bar X,\lambda}(x_{m+k})}{\partial \mathbf{n}} - g(x_{m+k}) \Big)\Big( \frac{\partial f_{z,\lambda}(x_{m+k})}{\partial \mathbf{n}} - \frac{\partial f_{\bar X,\lambda}(x_{m+k})}{\partial \mathbf{n}} \Big) + \lambda\big( \|f_{z,\lambda}\|_{K^\mu}^2 - \|f_{\bar X,\lambda}\|_{K^\mu}^2 \big)$$

$$= \Big\langle f_{z,\lambda} - f_{\bar X,\lambda},\ \frac{2}{m}\sum_{i=1}^m (f_{\bar X,\lambda}(x_i) - y_{x_i})\, K_{x_i}^\mu(\cdot) + \frac{2}{l}\sum_{k=1}^l \Big( \frac{\partial f_{\bar X,\lambda}(x_{m+k})}{\partial \mathbf{n}} - g(x_{m+k}) \Big)\, \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \mathbf{n}} \Big\rangle_{K^\mu} + \langle f_{z,\lambda} - f_{\bar X,\lambda},\ 2\lambda f_{\bar X,\lambda}\rangle_{K^\mu} + \lambda \|f_{z,\lambda} - f_{\bar X,\lambda}\|_{K^\mu}^2,$$

where we have used (44). Since (28) holds and $y_{x_i} = f^*(x_i) + \eta_{x_i}$, we have:

$$0 \ge \Omega_z(f_{z,\lambda}) - \Omega_z(f_{\bar X,\lambda}) \ge \Big\langle f_{z,\lambda} - f_{\bar X,\lambda},\ -\frac{2}{m}\sum_{i=1}^m \eta_{x_i}(y)\, K_{x_i}^\mu(\cdot) \Big\rangle_{K^\mu} + \lambda \|f_{z,\lambda} - f_{\bar X,\lambda}\|_{K^\mu}^2.$$

It follows:

$$\lambda \|f_{z,\lambda} - f_{\bar X,\lambda}\|_{K^\mu}^2 \le \Big\langle f_{z,\lambda} - f_{\bar X,\lambda},\ \frac{2}{m}\sum_{i=1}^m \eta_{x_i}(y)\, K_{x_i}^\mu(\cdot) \Big\rangle_{K^\mu} \le \Big\| \frac{2}{m}\sum_{i=1}^m \eta_{x_i}(y)\, K_{x_i}^\mu(\cdot) \Big\|_{K^\mu} \cdot \|f_{z,\lambda} - f_{\bar X,\lambda}\|_{K^\mu}.$$

(31) thus holds.

Proof of (32). Define Ω X ¯ ( f ) as:

$$\Omega_{\bar X}(f) = \mathcal E_{\bar X}(f) + \frac{1}{l}\sum_{k=1}^l \Big( \frac{\partial f(x_{m+k})}{\partial \mathbf{n}} - g(x_{m+k}) \Big)^2 + \lambda \|f\|_{K^\mu}^2, \quad f \in H_{K^\mu}^n.$$

Then, by the definition of f X ¯ , λ and inequalities (43) and (44), we have:

$$0 \ge \Omega_{\bar X}(f_{\bar X,\lambda}) - \Omega_{\bar X}(f_\lambda) \ge \Big\langle f_{\bar X,\lambda} - f_\lambda,\ \frac{2}{m}\sum_{i=1}^m (f_\lambda(x_i) - f^*(x_i))\, K_{x_i}^\mu(\cdot) \Big\rangle_{K^\mu} + \Big\langle f_{\bar X,\lambda} - f_\lambda,\ \frac{2}{l}\sum_{k=1}^l \Big( \frac{\partial f_\lambda(x_{m+k})}{\partial \mathbf{n}} - g(x_{m+k}) \Big)\, \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \mathbf{n}} \Big\rangle_{K^\mu} + \langle f_{\bar X,\lambda} - f_\lambda,\ 2\lambda f_\lambda\rangle_{K^\mu} + \lambda \|f_{\bar X,\lambda} - f_\lambda\|_{K^\mu}^2.$$

By (29), we have:

$$0 \ge \Big\langle f_{\bar X,\lambda} - f_\lambda,\ 2\Big( \frac{1}{m}\sum_{i=1}^m (f_\lambda(x_i) - f^*(x_i))\, K_{x_i}^\mu(\cdot) - \int_{B^d} (f_\lambda(x) - f^*(x))\, K_x^\mu(\cdot)\, d\rho_{B^d} \Big) + 2\Big( \frac{1}{l}\sum_{k=1}^l \Big( \frac{\partial f_\lambda(x_{m+k})}{\partial \mathbf{n}} - g(x_{m+k}) \Big)\, \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \mathbf{n}} - \int_{S^{d-1}} \Big( \frac{\partial f_\lambda(x)}{\partial \mathbf{n}} - g(x) \Big)\, \frac{\partial K_x^\mu(\cdot)}{\partial \mathbf{n}}\, d\rho_{S^{d-1}} \Big) \Big\rangle_{K^\mu} + \lambda \|f_{\bar X,\lambda} - f_\lambda\|_{K^\mu}^2.$$

It follows:

$$\lambda \|f_{\bar X,\lambda} - f_\lambda\|_{K^\mu}^2 \le 2\Big\langle f_{\bar X,\lambda} - f_\lambda,\ \int_{B^d} (f_\lambda(x) - f^*(x))\, K_x^\mu(\cdot)\, d\rho_{B^d} - \frac{1}{m}\sum_{i=1}^m (f_\lambda(x_i) - f^*(x_i))\, K_{x_i}^\mu(\cdot) + \int_{S^{d-1}} \Big( \frac{\partial f_\lambda(x)}{\partial \mathbf{n}} - g(x) \Big)\, \frac{\partial K_x^\mu(\cdot)}{\partial \mathbf{n}}\, d\rho_{S^{d-1}} - \frac{1}{l}\sum_{k=1}^l \Big( \frac{\partial f_\lambda(x_{m+k})}{\partial \mathbf{n}} - g(x_{m+k}) \Big)\, \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \mathbf{n}} \Big\rangle_{K^\mu}$$

$$\le 2\Big( \Big\| \int_{B^d} (f_\lambda(x) - f^*(x))\, K_x^\mu(\cdot)\, d\rho_{B^d} - \frac{1}{m}\sum_{i=1}^m (f_\lambda(x_i) - f^*(x_i))\, K_{x_i}^\mu(\cdot) \Big\|_{K^\mu} + \Big\| \int_{S^{d-1}} \Big( \frac{\partial f_\lambda(x)}{\partial \mathbf{n}} - g(x) \Big)\, \frac{\partial K_x^\mu(\cdot)}{\partial \mathbf{n}}\, d\rho_{S^{d-1}} - \frac{1}{l}\sum_{k=1}^l \Big( \frac{\partial f_\lambda(x_{m+k})}{\partial \mathbf{n}} - g(x_{m+k}) \Big)\, \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \mathbf{n}} \Big\|_{K^\mu} \Big) \|f_{\bar X,\lambda} - f_\lambda\|_{K^\mu}.$$

The above inequality gives (32).

Lemma 3.3. The following inequalities hold.

1) For any $\delta \in (0,1)$, with confidence $1 - \delta$, it holds that:

$$\|f_{z,\lambda} - f_{\bar X,\lambda}\|_{K^\mu} \le \frac{2k\sigma}{\lambda\sqrt{m\delta}}. \tag{33}$$

2) For any $\delta \in (0,1)$, with confidence $1 - \delta$, it holds that:

$$\|f_{\bar X,\lambda} - f_\lambda\|_{K^\mu} \le \Big( \frac{2k^2\sqrt{\mathcal K(f^*,g,\lambda)/\lambda}}{\lambda\sqrt{m}} + \frac{2k\,\|f^*\|_{C(B^d)}}{\lambda\sqrt{m}} \Big)\log\frac{2}{\delta} + k\sqrt{\frac{\mathcal K(f^*,g,\lambda)}{l\,\delta}}. \tag{34}$$

Proof. Proof of (33). The definition of $\|\cdot\|_{K^\mu}$ and the reproducing property (13) give:

$$\Big\| \frac{1}{m}\sum_{i=1}^m \eta_{x_i}(y)\, K_{x_i}^\mu(\cdot) \Big\|_{K^\mu}^2 = \Big\langle \frac{1}{m}\sum_{i=1}^m \eta_{x_i}(y)\, K_{x_i}^\mu(\cdot),\ \frac{1}{m}\sum_{j=1}^m \eta_{x_j}(y)\, K_{x_j}^\mu(\cdot) \Big\rangle_{K^\mu} = \frac{1}{m^2}\sum_{i,j=1}^m \eta_{x_i}(y)\, \eta_{x_j}(y)\, K^\mu(x_i, x_j).$$

By Markov inequality, we have:

$$P\big( \|f_{z,\lambda} - f_{\bar X,\lambda}\|_{K^\mu} > \varepsilon \big) \le \frac{E\big( \|f_{z,\lambda} - f_{\bar X,\lambda}\|_{K^\mu}^2 \big)}{\varepsilon^2}. \tag{35}$$

Then, by (31), we have:

$$\|f_{z,\lambda} - f_{\bar X,\lambda}\|_{K^\mu}^2 \le \frac{4}{\lambda^2 m^2} \Big\| \sum_{i=1}^m \eta_{x_i}(y)\, K_{x_i}^\mu(\cdot) \Big\|_{K^\mu}^2 = \frac{4}{\lambda^2 m^2} \Big\langle \sum_{i=1}^m \eta_{x_i}(y)\, K_{x_i}^\mu(\cdot),\ \sum_{j=1}^m \eta_{x_j}(y)\, K_{x_j}^\mu(\cdot) \Big\rangle_{K^\mu} = \frac{4}{\lambda^2 m^2} \sum_{i,j=1}^m \eta_{x_i}(y)\, \eta_{x_j}(y)\, K^\mu(x_i, x_j).$$

Since $E_{\rho(\cdot|x)}(\eta_x) = 0$, we have:

$$E\big( \|f_{z,\lambda} - f_{\bar X,\lambda}\|_{K^\mu}^2 \big) \le \frac{4k^2}{\lambda^2 m} \int_{B^d} E_{\rho(\cdot|x)}(\eta_x^2)\, d\rho_{B^d} = \frac{4k^2\sigma^2}{\lambda^2 m}.$$

By (35), we have:

$$P\big( \|f_{z,\lambda} - f_{\bar X,\lambda}\|_{K^\mu} \le \varepsilon \big) \ge 1 - \frac{4k^2\sigma^2}{\lambda^2 m \varepsilon^2}.$$

Taking $\delta = \frac{4k^2\sigma^2}{\lambda^2 m \varepsilon^2}$, we have $\varepsilon = \frac{2k\sigma}{\lambda\sqrt{m\delta}}$, and (33) thus holds.

We now show (34). Take

$$A = \Big\| \int_{B^d} (f_\lambda(x) - f^*(x))\, K_x^\mu(\cdot)\, d\rho_{B^d} - \frac{1}{m}\sum_{i=1}^m (f_\lambda(x_i) - f^*(x_i))\, K_{x_i}^\mu(\cdot) \Big\|_{K^\mu}$$

and

$$B = \Big\| \int_{S^{d-1}} \Big( \frac{\partial f_\lambda(x)}{\partial \mathbf{n}} - g(x) \Big)\, \frac{\partial K_x^\mu(\cdot)}{\partial \mathbf{n}}\, d\rho_{S^{d-1}} - \frac{1}{l}\sum_{k=1}^l \Big( \frac{\partial f_\lambda(x_{m+k})}{\partial \mathbf{n}} - g(x_{m+k}) \Big)\, \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \mathbf{n}} \Big\|_{K^\mu}.$$

Then,

$$\|f_{\bar X,\lambda} - f_\lambda\|_{K^\mu} \le \frac{2}{\lambda}(A + B), \tag{36}$$

where

$$A \le \Big\| \int_{B^d} f_\lambda(x)\, K_x^\mu(\cdot)\, d\rho_{B^d} - \frac{1}{m}\sum_{i=1}^m f_\lambda(x_i)\, K_{x_i}^\mu(\cdot) \Big\|_{K^\mu} + \Big\| \int_{B^d} f^*(x)\, K_x^\mu(\cdot)\, d\rho_{B^d} - \frac{1}{m}\sum_{i=1}^m f^*(x_i)\, K_{x_i}^\mu(\cdot) \Big\|_{K^\mu}.$$

Since

$$\|f_\lambda(x)\, K_x^\mu(\cdot)\|_{K^\mu} = |f_\lambda(x)|\, \sqrt{K^\mu(x,x)} \le k\, |f_\lambda(x)| \le k^2\, \|f_\lambda\|_{K^\mu} \le k^2 \sqrt{\frac{\mathcal K(f^*,g,\lambda)}{\lambda}}$$

and $\|f^*(x)\, K_x^\mu(\cdot)\|_{K^\mu} \le k\,|f^*(x)| \le k\,\|f^*\|_{C(B^d)}$, we have by (47) that, with confidence $1 - \delta$:

$$A \le \Big( \frac{2k^2\sqrt{\mathcal K(f^*,g,\lambda)/\lambda}}{\lambda\sqrt{m}} + \frac{2k\,\|f^*\|_{C(B^d)}}{\lambda\sqrt{m}} \Big)\log\frac{2}{\delta}. \tag{37}$$

On the other hand, take $\xi(x) = \frac{\partial f_\lambda(x)}{\partial \mathbf{n}} - g(x)$. Then,

$$B = \Big\| \int_{S^{d-1}} \xi(x)\, \frac{\partial K_x^\mu(\cdot)}{\partial \mathbf{n}}\, d\rho_{S^{d-1}} - \frac{1}{l}\sum_{k=1}^l \xi(x_{m+k})\, \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \mathbf{n}} \Big\|_{K^\mu}.$$

By the definition of $\|\cdot\|_{K^\mu}$, we have:

$$B^2 = \Big\langle \int_{S^{d-1}} \xi(x)\, \frac{\partial K_x^\mu(\cdot)}{\partial \mathbf{n}}\, d\rho_{S^{d-1}} - \frac{1}{l}\sum_{k=1}^l \xi(x_{m+k})\, \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \mathbf{n}},\ \int_{S^{d-1}} \xi(u)\, \frac{\partial K_u^\mu(\cdot)}{\partial \mathbf{n}}\, d\rho_{S^{d-1}} - \frac{1}{l}\sum_{i=1}^l \xi(x_{m+i})\, \frac{\partial K_{x_{m+i}}^\mu(\cdot)}{\partial \mathbf{n}} \Big\rangle_{K^\mu} = \int_{S^{d-1}}\!\!\int_{S^{d-1}} \xi(x)\,\xi(u)\, \Big\langle \frac{\partial K_x^\mu(\cdot)}{\partial \mathbf{n}}, \frac{\partial K_u^\mu(\cdot)}{\partial \mathbf{n}} \Big\rangle_{K^\mu} d\rho_{S^{d-1}}(x)\, d\rho_{S^{d-1}}(u) - 2\int_{S^{d-1}} \xi(x)\Big( \frac{1}{l}\sum_{i=1}^l \xi(x_{m+i})\, \Big\langle \frac{\partial K_x^\mu(\cdot)}{\partial \mathbf{n}}, \frac{\partial K_{x_{m+i}}^\mu(\cdot)}{\partial \mathbf{n}} \Big\rangle_{K^\mu} \Big) d\rho_{S^{d-1}}(x) + \frac{1}{l^2}\sum_{k=1}^l \xi^2(x_{m+k})\, \Big\langle \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \mathbf{n}}, \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \mathbf{n}} \Big\rangle_{K^\mu} + \frac{1}{l^2}\sum_{\substack{k,i=1 \\ k \ne i}}^l \xi(x_{m+k})\,\xi(x_{m+i})\, \Big\langle \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \mathbf{n}}, \frac{\partial K_{x_{m+i}}^\mu(\cdot)}{\partial \mathbf{n}} \Big\rangle_{K^\mu}.$$

By (14), we have:

$$\Big\langle \frac{\partial K_x^\mu(\cdot)}{\partial \mathbf{n}}, \frac{\partial K_u^\mu(\cdot)}{\partial \mathbf{n}} \Big\rangle_{K^\mu} = \frac{\partial}{\partial \mathbf{n}_u}\frac{\partial}{\partial \mathbf{n}_x} K_x^\mu(u), \qquad \Big\langle \frac{\partial K_x^\mu(\cdot)}{\partial \mathbf{n}}, \frac{\partial K_{x_{m+i}}^\mu(\cdot)}{\partial \mathbf{n}} \Big\rangle_{K^\mu} = \frac{\partial}{\partial \mathbf{n}_u}\frac{\partial}{\partial \mathbf{n}_x} K_x^\mu(u)\Big|_{u = x_{m+i}}$$

and

$$\Big\langle \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \mathbf{n}}, \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \mathbf{n}} \Big\rangle_{K^\mu} = \frac{\partial}{\partial \mathbf{n}_u}\frac{\partial}{\partial \mathbf{n}_u} K_u^\mu(u)\Big|_{u = x_{m+k}}, \qquad \Big\langle \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \mathbf{n}}, \frac{\partial K_{x_{m+i}}^\mu(\cdot)}{\partial \mathbf{n}} \Big\rangle_{K^\mu} = \frac{\partial}{\partial \mathbf{n}_u}\frac{\partial}{\partial \mathbf{n}_x} K_x^\mu(u)\Big|_{u = x_{m+i},\, x = x_{m+k}}.$$

It follows that:

$$B^2 = \int_{S^{d-1}}\!\!\int_{S^{d-1}} \xi(x)\,\xi(u)\, \frac{\partial}{\partial \mathbf{n}_u}\frac{\partial}{\partial \mathbf{n}_x} K_x^\mu(u)\, d\rho_{S^{d-1}}(x)\, d\rho_{S^{d-1}}(u) - 2\int_{S^{d-1}} \xi(x)\Big( \frac{1}{l}\sum_{i=1}^l \xi(x_{m+i})\, \frac{\partial}{\partial \mathbf{n}_u}\frac{\partial}{\partial \mathbf{n}_x} K_x^\mu(u)\Big|_{u = x_{m+i}} \Big) d\rho_{S^{d-1}}(x) + \frac{1}{l^2}\sum_{k=1}^l \xi^2(x_{m+k})\, \frac{\partial}{\partial \mathbf{n}_u}\frac{\partial}{\partial \mathbf{n}_u} K_u^\mu(u)\Big|_{u = x_{m+k}} + \frac{1}{l^2}\sum_{\substack{k,i=1 \\ k \ne i}}^l \xi(x_{m+k})\,\xi(x_{m+i})\, \frac{\partial}{\partial \mathbf{n}_u}\frac{\partial}{\partial \mathbf{n}_x} K_x^\mu(u)\Big|_{u = x_{m+i},\, x = x_{m+k}}.$$

Since $(x_{m+1}, x_{m+2}, \dots, x_{m+l})$ are i.i.d. according to $\rho_{S^{d-1}}$, we have:

$$E(B^2) = \frac{1}{l}\Big( \int_{S^{d-1}} \xi^2(x)\, \frac{\partial}{\partial \mathbf{n}_x}\frac{\partial}{\partial \mathbf{n}_x} K_x^\mu(x)\, d\rho_{S^{d-1}}(x) - \int_{S^{d-1}}\!\!\int_{S^{d-1}} \xi(x)\,\xi(u)\, \frac{\partial}{\partial \mathbf{n}_u}\frac{\partial}{\partial \mathbf{n}_x} K_x^\mu(u)\, d\rho_{S^{d-1}}(x)\, d\rho_{S^{d-1}}(u) \Big) \le \frac{1}{l} \int_{S^{d-1}} \xi^2(x)\, \frac{\partial}{\partial \mathbf{n}_x}\frac{\partial}{\partial \mathbf{n}_x} K_x^\mu(x)\, d\rho_{S^{d-1}}(x),$$

where we have used the fact that:

$$\int_{S^{d-1}}\!\!\int_{S^{d-1}} \xi(x)\,\xi(u)\, \frac{\partial}{\partial \mathbf{n}_u}\frac{\partial}{\partial \mathbf{n}_x} K_x^\mu(u)\, d\rho_{S^{d-1}}(x)\, d\rho_{S^{d-1}}(u) \ge 0,$$

since $\frac{\partial}{\partial \mathbf{n}_u}\frac{\partial}{\partial \mathbf{n}_x} K_x^\mu(u)$ is a positive definite function of $u$ and $x$. According to Assumption B, we have:

$$\Big| \frac{\partial}{\partial \mathbf{n}_u}\frac{\partial}{\partial \mathbf{n}_x} K_x^\mu(u) \Big| \le k^2.$$

It follows that:

$$E(B^2) \le \frac{k^2}{l} \int_{S^{d-1}} \xi^2(x)\, d\rho_{S^{d-1}}(x).$$

By Markov inequality, we have:

$$P(B > \varepsilon) \le \frac{E(B^2)}{\varepsilon^2} \le \frac{k^2}{l\varepsilon^2} \int_{S^{d-1}} \xi^2(x)\, d\rho_{S^{d-1}}(x) \le \frac{k^2}{l\varepsilon^2}\, \mathcal K(f^*, g, \lambda).$$

Take $\delta = \frac{k^2}{l\varepsilon^2}\, \mathcal K(f^*,g,\lambda)$. Then $\varepsilon = k\sqrt{\frac{\mathcal K(f^*,g,\lambda)}{l\,\delta}}$. It follows that, with confidence $1 - \delta$:

$$B \le k\sqrt{\frac{\mathcal K(f^*,g,\lambda)}{l\,\delta}}. \tag{38}$$

Collecting (38), (37) and (36), we arrive at (34).

4. Proofs

Proof of Proposition 2.1. Proof of 1). For any $f(x) = \sum_{k=0}^\infty \sum_{l=1}^{a_k^d} a_{k,l}(f)\, Q_{k,l}(x) \in H_{K^\mu}^n$, we rewrite $K^\mu(x,y)$ as:

$$K^\mu(x,y) = K_y^\mu(x) = \sum_{k=0}^\infty \sum_{l=1}^{a_k^d} \big( \lambda_k\, Q_{k,l}(y) \big)\, Q_{k,l}(x).$$

Then,

$$\langle f, K_y^\mu\rangle_{K^\mu} = \sum_{k=0}^\infty \sum_{l=1}^{a_k^d} \frac{\big( \lambda_k\, Q_{k,l}(y) \big)\, a_{k,l}(f)}{\lambda_k} = f(y),$$

which yields (13).

Proof of 2). By (9) and (46), we have:

$$\frac{\partial f(x)}{\partial \mathbf{n}} = \Big\langle f,\ \sum_{i=1}^d x_i\, \frac{\partial K(x,\cdot)}{\partial x_i} \Big\rangle_{K^\mu} = \Big\langle f,\ \frac{\partial K(x,\cdot)}{\partial \mathbf{n}} \Big\rangle_{K^\mu}, \quad x = (x_1,\dots,x_d) \in S^{d-1}.$$

(14) thus holds.

Proof of 3). By (14), we have:

$$\Big| \frac{\partial f(x)}{\partial \mathbf{n}} \Big| = \Big| \Big\langle f, \frac{\partial K_x^\mu(\cdot)}{\partial \mathbf{n}} \Big\rangle_{K^\mu} \Big| \le \|f\|_{K^\mu}\, \Big\| \frac{\partial K_x^\mu(\cdot)}{\partial \mathbf{n}} \Big\|_{K^\mu} \le k\, \|f\|_{K^\mu}.$$

By the same method, we have by (13) that:

$$|f(x)| = |\langle f, K_x^\mu(\cdot)\rangle_{K^\mu}| \le \|f\|_{K^\mu}\, \|K_x^\mu(\cdot)\|_{K^\mu} \le k\, \|f\|_{K^\mu}.$$

(15) thus holds.

Proof of Proposition 2.2. By (27), we have:

$$f_{z,\lambda}(\cdot) = \frac{1}{\lambda m}\sum_{i=1}^m (y_{x_i} - f_{z,\lambda}(x_i))\, K_{x_i}^\mu(\cdot) + \frac{1}{\lambda l}\sum_{k=1}^l \Big( g(x_{m+k}) - \frac{\partial f_{z,\lambda}(x_{m+k})}{\partial \mathbf{n}} \Big)\, \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \mathbf{n}}. \tag{39}$$

Taking $a_i = \frac{1}{\lambda m}(y_{x_i} - f_{z,\lambda}(x_i))$ and $a_{m+k} = \frac{1}{\lambda l}\big( g(x_{m+k}) - \frac{\partial f_{z,\lambda}(x_{m+k})}{\partial \mathbf{n}} \big)$ in (39), we obtain (20).

By (28), we have:

$$f_{\bar X,\lambda}(\cdot) = \frac{1}{\lambda m}\sum_{i=1}^m (f^*(x_i) - f_{\bar X,\lambda}(x_i))\, K_{x_i}^\mu(\cdot) + \frac{1}{\lambda l}\sum_{k=1}^l \Big( g(x_{m+k}) - \frac{\partial f_{\bar X,\lambda}(x_{m+k})}{\partial \mathbf{n}} \Big)\, \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \mathbf{n}}. \tag{40}$$

Taking $b_i = \frac{1}{\lambda m}(f^*(x_i) - f_{\bar X,\lambda}(x_i))$ and $b_{m+k} = \frac{1}{\lambda l}\big( g(x_{m+k}) - \frac{\partial f_{\bar X,\lambda}(x_{m+k})}{\partial \mathbf{n}} \big)$ in (40), we obtain (21).

By (29), we have:

$$f_\lambda(\cdot) = \frac{1}{\lambda}\int_{B^d} (f^*(x) - f_\lambda(x))\, K_x^\mu(\cdot)\, d\rho_{B^d} + \frac{1}{\lambda}\int_{S^{d-1}} \Big( g(x) - \frac{\partial f_\lambda(x)}{\partial \mathbf{n}} \Big)\, \frac{\partial K_x^\mu(\cdot)}{\partial \mathbf{n}}\, d\rho_{S^{d-1}}. \tag{41}$$

Taking $G_{\lambda,f^*}(x) = \frac{1}{\lambda}(f^*(x) - f_\lambda(x))$ and $p_{\lambda,g}(x) = \frac{1}{\lambda}\big( g(x) - \frac{\partial f_\lambda(x)}{\partial \mathbf{n}} \big)$ in (41), we obtain (22).

Proof of Theorem 2.1. Collecting (33), (34) and (23), we have:

$$\|f_{z,\lambda} - f^*\|_{L^2(\rho_{B^d})} + \Big\| \frac{\partial f_{z,\lambda}}{\partial \mathbf{n}} - g \Big\|_{L^2(\rho_{S^{d-1}})} \le \frac{2k^2\sigma}{\lambda\sqrt{m\delta}} + k\Big( \frac{\sqrt{\mathcal K(f^*,g,\lambda)}}{\lambda^{3/2}\sqrt{m}} + \frac{\|f^*\|_{C(B^d)}}{\lambda\sqrt{m}} \Big)\log\frac{4}{\delta} + k\sqrt{\frac{\mathcal K(f^*,g,\lambda)}{l\,\delta}} + 2\sqrt{\mathcal K(f^*,g,\lambda)}. \tag{42}$$

(42) yields (24).

Funding

Supported partially by the NSF (Project No. 61877039), the NSFC/RGC Joint Research Scheme (Project No. 12061160462 and N_CityU102/20) of China.

Appendices

Appendix 1. Gâteaux Derivative and the Convex Function

The following Proposition A1 can be found in Proposition 17.4, Proposition 17.10 and Proposition 17.12 of [38].

Proposition A1. Let $(H, \|\cdot\|_H)$ be a Hilbert space and $F(f) : H \to \mathbb{R} \cup \{\infty\}$ be a function defined on $H$. Then,

1) If $F(f)$ is a convex function, then $F(f)$ attains its minimal value at $f_0$ if and only if $\nabla_f F(f_0) = 0$.

2) If $F(f) : H \to \mathbb{R} \cup \{\infty\}$ is Gâteaux differentiable, then $F(f)$ is convex on $H$ if and only if for any $f, g \in H$:

$$F(g) - F(f) \ge \langle g - f, \nabla_f F(f)\rangle_H.$$

In particular, we have:

$$x^2 - y^2 \ge 2y(x - y), \quad x, y \in \mathbb{R}. \tag{43}$$

3) For the function $F(f) = \|f\|_H^2$, we have $\nabla_f F(f) = 2f$ and the equality:

$$\|f\|_H^2 - \|g\|_H^2 = \langle f - g, 2g\rangle_H + \|f - g\|_H^2, \tag{44}$$

i.e.

$$\|f\|_H^2 - \|g\|_H^2 = \langle f - g, \nabla_g F(g)\rangle_H + \|f - g\|_H^2.$$

Appendix 2. Derivatives Reproducing Property

Let $\Omega \subset \mathbb{R}^d$ be a compact subset which is the closure of its nonempty interior $\Omega^0$. Let $K(x,y)$ be a Mercer kernel on $\Omega \times \Omega$ having the expansion (see e.g. [39]):

$$K(x,y) = \sum_{k=0}^{+\infty} \lambda_k\, \varphi_k(x)\, \varphi_k(y), \quad x, y \in \Omega, \tag{45}$$

where the convergence is absolute (for each x , y Ω ) and uniform on Ω × Ω . Then, we have a proposition.

Proposition A2. Let $K(x,y)$ be a Mercer kernel of the form (45) with $K \in C^{(1)}(\Omega \times \Omega)$, and let $H_K$ be the reproducing kernel Hilbert space such that:

$$f(x) = \langle f, K(\cdot, x)\rangle_{H_K}, \quad f \in H_K, \ x \in \Omega.$$

Then,

$$\partial_x^\alpha f(x) = \langle f, \partial_x^\alpha K(\cdot, x)\rangle_{H_K}, \quad |\alpha| \le 1, \ f \in H_K, \ x \in \Omega, \tag{46}$$

where $|\alpha| = \sum_{i=1}^d \alpha_i \le 1$.

Proof. See Theorem 1 in [12], or (v) of Theorem 4.7 in [40].

Appendix 3. A Probability Inequality

Proposition A4. [41] Let $\xi$ be a random variable taking values in a real separable Hilbert space $H$ on a probability space $(\Omega, \mathcal{F}, P)$. Assume that there is a positive constant $L$ such that $\|\xi\|_H \le L$. Then, for all $n \ge 1$ and $0 < \eta < 1$, with confidence $1 - \eta$, it holds that:

$$\Big\| \frac{1}{n}\sum_{i=1}^n \xi(\omega_i) - E(\xi) \Big\|_H \le \frac{4L}{\sqrt{n}} \log\frac{2}{\eta}. \tag{47}$$
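As a sanity check of the flavor of Proposition A4, the sketch below draws bounded random vectors in $\mathbb{R}^d$ (a finite-dimensional Hilbert space) and verifies empirically that the deviation of the empirical mean from the true mean stays below $\frac{4L}{\sqrt{n}}\log\frac{2}{\eta}$ in well over a $1-\eta$ fraction of trials; the distribution and all parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
d, n, L, eta, trials = 3, 200, 1.0, 0.05, 2000

def draw_bounded(size):
    """Random vectors uniform in the ball of radius L (so ||xi||_H <= L)."""
    v = rng.standard_normal((size, d))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    r = L * rng.uniform(0.0, 1.0, size) ** (1.0 / d)
    return v * r[:, None]

true_mean = np.zeros(d)                 # the distribution above is symmetric
bound = 4.0 * L / np.sqrt(n) * np.log(2.0 / eta)

deviations = np.array([
    np.linalg.norm(draw_bounded(n).mean(axis=0) - true_mean)
    for _ in range(trials)
])
print((deviations <= bound).mean())     # empirically close to 1 (>= 1 - eta)
```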

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Atkinson, K., Hansen, O. and Chien, D. (2011) A Spectral Method for Elliptic Equations: The Neumann Problem. Advances in Computational Mathematics, 34, 295-317.
https://doi.org/10.1007/s10444-010-9154-3
[2] Atkinson, K., Chien, D. and Hansen, O. (2010) A Spectral Method for Elliptic Equation: The Dirichlet Problem. Advances in Computational Mathematics, 33, 169-189.
https://doi.org/10.1007/s10444-009-9125-8
[3] Atkinson, K. and Hansen, O. (2010) A Spectral Method for the Eigenvalue Problem for Elliptic Equations. Electronic Transactions on Numerical Analysis, 37, 386-412.
[4] Li, X. (2009) Approximation of Potential Integral by Radial Bases for Solutions of Helmholtz Equation. Advances in Computational Mathematics, 30, 201-230.
https://doi.org/10.1007/s10444-008-9065-8
[5] Li, X. (2008) Rate of Convergence of the Method of Fundamental Solutions and Hyperinterpolation for Modified Helmholtz Equations on the Unit Ball. Advances in Computational Mathematics, 29, 393-413.
https://doi.org/10.1007/s10444-007-9056-1
[6] Li, X. (2008) Convergence of the Method of Fundamental Solutions for Poisson’s Equation on the Unit Sphere. Advances in Computational Mathematics, 28, 269-282.
https://doi.org/10.1007/s10444-006-9022-3
[7] Cialenco, I., Fasshauer, G.E. and Ye, Q. (2012) Approximation of Stochastic Partial Differential Equations by a Kernel-Based Collocation Method. International Journal of Computer Mathematics, 89, 2543-2561.
https://doi.org/10.1080/00207160.2012.688111
[8] Ding, L.L., Liu, Z.Y. and Xu, Q.Y. (2021) Multilevel RBF Collocation Method for the Fourth-Order Thin Plate Problem. International Journal of Wavelets, Multiresolution and Information Processing, 19, Article ID: 2050079.
https://doi.org/10.1142/S0219691320500794
[9] Fasshauer, G.E. and Ye, Q. (2013) Kernel-Based Collocation Methods versus Galerkin Finite Element Methods for Approximating Elliptic Stochastic Partial Differential Equation. In: Griebel, M. and Schweitzer, M., Eds., Meshfree Methods for Partial Differential Equations VI, Springer, Berlin, 155-170.
https://doi.org/10.1007/978-3-642-32979-1_10
[10] Fasshauer, G.E. and Ye, Q. (2012) A Kernel-Based Collocation Method for Elliptic Partial Differential Equations with Random Coefficients. In: Dick, J., Kuo, F., Peters, G. and Sloan, I., Eds., Monte Carlo and Quasi-Monte Carlo Methods 2012, Springer, Berlin, 331-347.
https://doi.org/10.1007/978-3-642-41095-6_14
[11] Ye, Q. (2014) Approximation of Nonlinear Stochastic Partial Differential Equations by a Kernel-Based Collocation Method. International Journal of Applied Nonlinear Science, 1, 156-172.
https://doi.org/10.1504/IJANS.2014.061018
[12] Zhou, D.X. (2008) Derivative Reproducing Properties for Kernel Methods in Learning Theory. Journal of Computational and Applied Mathematics, 220, 456-463.
https://doi.org/10.1016/j.cam.2007.08.023
[13] Bao, K.J., Qian, X., Liu, Z.Y. and Song, S.B. (2022) An Operator Learning Approach via Function—Valued Reproducing Kernel Hilbert Space for Diferential Equations. arXiv: 2202.09488.
[14] Mo, Y. and Qian, T. (2014) Support Vector Machine Adapted Tikhonov Regularization Method to Solve Dirichlet Problem. Applied Mathematics and Computation, 245, 509-519.
https://doi.org/10.1016/j.amc.2014.07.089
[15] Sheng, B.H., Zhou, D.P. and Wang, S.H. (2022) The Kernel Regularized Learning Algorithm for Solving Laplace Equationn with Dirichlet Boundary. International Journal of Wavelets, Multiresolution and Information Processing, 20, Article ID: 2250031.
https://doi.org/10.1142/S021969132250031X
[16] Stepaniants, G. (2023) Learning Partial Differential Equations in Reproducing Kernel Hilbert Spaces. Journal of Machine Learning Research, 24, 1-72
[17] Harosko, D.D. and Triebel, H. (2008) Distributions, Sobolev Spaces, Elliptic Equations. European Mathematical Society, Helsinki.
https://doi.org/10.4171/042
[18] Belkin, M. and Niyogi, P. (2004) Semi-Supervised Learning on Riemannian Manifolds. Machine Learning, 56, 209-239.
https://doi.org/10.1023/B:MACH.0000033120.25363.1e
[19] Belkin, M., Niyogi, P. and Sindhwani, V. (2006) Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples. Journal of Machine Learning Research, 7, 2399-2434.
[20] Niyogi, P. (2013) Manifold Regularization and Semi-Supervised Learning: Some Theoretical Analysis, Journal of Machine Learning Research, 14, 1229-1250.
[21] Sheng, B.H. and Zhu, H.C. (2018) The Convergence Rate of Semi-Supervised Regression with Quadratic Loss. Applied Mathematics and Computation, 321, 11-24.
https://doi.org/10.1016/j.amc.2017.10.033
[22] Smale, S. and Zhou, D.X. (2004) Shannon Sampling and Function Reconstruction from Point Values. Bulletin of the American Mathematical Society, 41, 279-305.
https://doi.org/10.1090/S0273-0979-04-01025-0
[23] Cucker, F. and Smale, S. (2002) On the Mathematical Foundations of Learning Theory. Bulletin of the American Mathematical Society, 39, 1-49.
https://doi.org/10.1090/S0273-0979-01-00923-5
[24] Cucker, F. and Zhou, D.X. (2007) Learning Theory: An Approximation Theory Viewpoint. Cambridge University Press, Cambridge.
https://doi.org/10.1017/CBO9780511618796
[25] Chen, H., Zhou, Y.C., Tang, Y.Y., Li, L.Q. and Pan, Z.B. (2013) Convergence Rate of the Semi-Supervised Greedy Algorithm. Neural Networks, 44, 44-50.
https://doi.org/10.1016/j.neunet.2013.03.001
[26] Sheng, B.H. and Xiang, D.H. (2017) The Performance of Semi-Supervised Laplacian Regularized Regression with Least Square Loss. International Journal of Wavelets, Multiresolution and Information Processing, 15, Article ID: 1750016.
https://doi.org/10.1142/S0219691317500163
[27] Sheng, B.H., Xiang, D.H. and Ye, P.X. (2015) Convergence Rate of Semi-Supervised Gradient Learning. International Journal of Wavelets, Multiresolution and Information Processing, 13, Article ID: 1550021.
https://doi.org/10.1142/S0219691315500216
[28] Sheng, B.H. and Zhang, H.Z. (2020) Performance Analysis of the LapRSSLG Algorithmin Learning Theory. Analysis and Applications, 18, 79-108.
https://doi.org/10.1142/S0219530519410033
[29] Wang, K.Y. and Li, L.Q. (2000) Harmonic Analysis and Approximation on the Unit Sphere. Science Press, Beijing.
[30] Delgado, A.M., Fernández, L., Lubinsky, D., Pérez, T.E. and Piñar, M.A. (2016) Sobolev Orthogonal Polynomials on the Unit Ball via Outward Normal Derivatives. Journal of Mathematical Analysis and Applications, 440, 716-740.
https://doi.org/10.1016/j.jmaa.2016.03.041
[31] Jordao, T. and Menegatto, V.A. (2012) Reproducing Properties of Differentiable Mercer-Like Kernels on the Sphere. Numerical Functional Analysis and Optimization, 33, 1221-1243.
https://doi.org/10.1080/01630563.2012.660590
[32] Castro, M.H., Menegatto, V.A. and Oliveira, C.P. (2013) Laplace-Beltrami Differentiability of Positive Definite Kernels on the Sphere. Acta Mathematica Sinica, English Series, 29, 93-104.
https://doi.org/10.1007/s10114-012-1067-2
[33] Ferreira, J.C. and Menegatto, V.A. (2013) Positive Definiteness, Reproducing Kernel Hilbert Spaces and Beyond. Annals of Functional Analysis, 4, 64-88.
https://doi.org/10.15352/afa/1399899838
[34] Sun, H.W. and Wu, Q. (2011) Least Square Regression with Independent Kernels and Coefficient Regularization. Applied and Computational Harmonic Analysis, 30, 96-109.
https://doi.org/10.1016/j.acha.2010.04.001
[35] Zhang, J., Wang, J.L. and Sheng, B.H. (2011) Learning from Regularized Regression Algorithms with P-Order Markov Chain Sampling. Applied Mathematics: A Journal of Chinese Universities, 226, 295-306.
https://doi.org/10.1007/s11766-011-2701-y
[36] Sheng, B.H. and Wang, J.L. (2024) Moduli of Smoothness, K-Functionals and Jackson-Type Inequalities Associated with Kernel Function Approximation in Learning Theory. Analysis and Applications.
https://doi.org/10.1142/S021953052450009X
[37] Dai, F. and Xu, Y. (2013) Approximation Theory and Harmonic Analysis on Spheres and Balls. Springer-Verlag, New York.
https://doi.org/10.1007/978-1-4614-6660-4
[38] Bauschke, H.H. and Combettes, P.L. (2010) Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer-Verlag, New York.
https://doi.org/10.1007/978-1-4419-9467-7
[39] Aronszajn, N. (1950) Theory of Reproducing Kernels. Transactions of the American Mathematical Society, 68, 337-404.
https://doi.org/10.1090/S0002-9947-1950-0051437-7
[40] Ferreira, J.C. and Menegatto, V.A. (2012) Reproducing Properties of Differentiable Mercer-Like Kernels. Mathematische Nachrichten, 285, 959-973.
https://doi.org/10.1002/mana.201100072
[41] Smale, S. and Zhou, D.X. (2007) Learning Theory Estimates via Integral Operators and Their Applications. Constructive Approximation, 26, 153-172.
https://doi.org/10.1007/s00365-006-0659-y
