Solving Neumann Boundary Problem with Kernel-Regularized Learning Approach
1. Introduction
It is known that approximation theory and its techniques have been used to give analytic solutions of PDE boundary value problems and form the method of fundamental solutions (see e.g. [1] - [6] ; Appendices 1-3). Recently, the kernel-based collocation method for solving several PDEs with boundary problems has been developed (see e.g. [7] [8] [9] [10] [11] ) from the viewpoint of minimal-norm interpolation in reproducing kernel Hilbert spaces (RKHSs); the existence theorem and the representation theorem for the numerical solutions are shown qualitatively. It is suggested in [12] that kernel-regularized gradient learning may be used to give numerical solutions for PDEs. Indeed, some kernel-regularized learning algorithms have been used to study the PDE with Dirichlet boundary value problem quantitatively (see e.g. [13] [14] [15] [16] ). For a given domain
, the pairwise distinct collocation points chosen in the kernel-based collocation method are:
and the sample values from the PDE (where P and B are given differential operators):
(1)
at the given collocation points are (see [7] [11] ):
where the random variable
.
There are two typical PDE problems (see Chapter 1 of [17] ). When
is the Laplace operator and
is the unit operator, we have the Dirichlet problem:
(2)
When
is the unit operator and
is the directional derivative along the outward normal vector
, i.e.
, problem (1) becomes the Neumann boundary problem:
(3)
The observations
can be regarded as the supervised labeled observations in the setting of semi-supervised learning and
may be regarded as the unlabeled ones. This similarity encourages us to construct kernel-regularized learning algorithms that give numerical solutions for problem (3) by referring to the kernel semi-supervised learning frameworks (see e.g. [18] [19] [20] [21] ). Along this line, we constructed in [15] a kernel-regularized learning algorithm for solving problem (2) and showed its convergence rate. In the present paper, we shall construct a kernel-regularized regression algorithm to solve problem (3) in the case that D is the unit ball and
is the unit sphere. To this aim, we restate problem (3) in the setting of Sobolev spaces.
Let
be a given bounded closed domain with a smooth surface (i.e. the outward normal derivative is continuous) and
be a Borel measure on Ω. Let
denote the set of functions such that
and
, where for
we define
and
For a
, we define:
Denote by
, the set of all functions whose first-order partial derivatives are all in
, i.e.
and
. Denote by
, the outward normal derivative operator, i.e.
, where
and
is the outward normal vector at
.
To borrow the setting of learning theory (see [22] ), we rewrite problem (3). Let
be an unknown function and g be a given function. Then, the collocation points
and
for the Neumann boundary problem:
(4)
are scattered points with values
,
, and
,
, where for a given
,
is a random variable subject to a conditional distribution
satisfying
, B is a given constant,
and
and
. The counterpart of (4) is:
(5)
We shall carry out a convergence analysis of problem (5) when
is the unit ball
and
is the unit sphere
. The paper is organized as follows. In Section 2.1, we shall provide some notions on kernel-regularized regression learning, including the concept of reproducing kernel Hilbert spaces, the kernel-regularized regression learning model and the kernel-regularized semi-supervised regression learning model. In Section 2.2, we shall provide some notions and results on spherical analysis. In particular, we shall define an RKHS built from spherical harmonics which enjoys the reproducing property for the outward normal derivative operator. The Neumann boundary value problem with this RKHS as the hypothesis space is then stated. Based on these notions, we define in Section 2.3 a kernel-regularized learning algorithm for solving the Neumann boundary value problem, show the representation theorem and give the error decomposition, with which we establish the learning rates (i.e. the main Theorem 2.1) in Section 2.4. In Section 3, we shall give some lemmas, which will be used to prove Theorem 2.1 in Section 4. Section 5 contains the appendices, covering some facts about convex functions, a kind of RKHS
defined in a Sobolev space
associated with a general domain
and its boundary
, and a probability inequality defined on a general RKHS.
Throughout the paper, we denote by
the fact that there is a constant C independent of A and B such that
. We say
if both
and
.
2. Kernel-Regularized Regression and Error Analysis
We first provide some notions and results about the kernel-regularized regression learning problem.
2.1. Notions and Results on Kernel-Regularized Regression
We follow all the definitions and notions in Section 1. Let
be a Mercer kernel (i.e. it is continuous, symmetric and positive semi-definite, i.e. for any given integer
and any given set
, the matrix
is positive definite) satisfying
. The reproducing kernel Hilbert space
associated with
is a Hilbert space consisting of all the real functions defined on Ω such that:
Define an operator
as:
Then,
. Denote by
the k-th eigenvalue associated with eigenfunction
. Then, we have by the Mercer theorem that:
where we assume the convergence on the right side is absolute (for every
) and uniform on
. If
forms an orthonormal system in
, then:
where
.
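As a numerical illustration of this construction (a minimal sketch in our own notation, not the kernel used later in the paper), one can assemble a Mercer-type kernel from an explicit orthonormal system with summable eigenvalues and read off the RKHS norm of a finite expansion directly from those eigenvalues:

import numpy as np

# Illustrative sketch only: a Mercer-type kernel built from the orthonormal
# cosine system on [0, 1] with summable eigenvalues mu_k; not the spherical
# kernel constructed in Section 2.2.

def phi(k, x):
    # Orthonormal system on [0, 1] with respect to Lebesgue measure.
    x = np.asarray(x, dtype=float)
    return np.ones_like(x) if k == 0 else np.sqrt(2.0) * np.cos(k * np.pi * x)

def kernel(x, y, mu):
    # Truncated Mercer expansion K(x, y) = sum_k mu_k phi_k(x) phi_k(y).
    return sum(m * phi(k, x) * phi(k, y) for k, m in enumerate(mu))

mu = [(k + 1.0) ** (-4) for k in range(20)]      # summable eigenvalues
c = np.array([1.0, 0.5, 0.25] + [0.0] * 17)      # coefficients of f = sum_k c_k phi_k

# For f = sum_k c_k phi_k: ||f||_K^2 = sum_k c_k^2 / mu_k, while the squared
# L2 norm is simply sum_k c_k^2 (by orthonormality of the phi_k).
print(kernel(0.3, 0.7, mu))
print(float(np.sum(c ** 2 / np.array(mu))), float(np.sum(c ** 2)))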
Let
be a set of observations about an unknown function
. Then, to obtain a good approximation of
, one usually resorts to the kernel-regularized regression learning model:
(6)
where
is the empirical variance. In learning theory, the convergence analysis amounts to bounding the convergence rate for the error ( [23] or [24] ):
(7)
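For orientation, here is a minimal numerical sketch of a model of type (6), assuming it is of the standard least-squares-plus-penalty form; the Gaussian kernel and all parameter values are our illustrative choices. By the representer theorem, the minimizer can be written as f_z = sum_j alpha_j K(., x_j) with (K + lam*m*I) alpha = y.

import numpy as np

# Kernel ridge regression: a hedged sketch of a model of type (6).
# Assumptions (ours): Gaussian kernel on R^d, squared loss, penalty lam*||f||_K^2.

def gauss_kernel(X, Y, sigma=0.5):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_ridge(X, y, lam=1e-3, sigma=0.5):
    # Representer theorem: f_z = sum_j alpha_j K(., x_j), (K + lam*m*I) alpha = y.
    m = len(y)
    K = gauss_kernel(X, X, sigma)
    alpha = np.linalg.solve(K + lam * m * np.eye(m), y)
    return lambda Xnew: gauss_kernel(Xnew, X, sigma) @ alpha

# Usage: noisy samples of an unknown function on [0, 1].
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(40, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(40)
f_z = kernel_ridge(X, y)
print(f_z(np.array([[0.25], [0.5]])))   # roughly sin(pi/2) and sin(pi)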
When the observation set z has the form
, i.e. the nodes
have no labeled observation values
, we call this case semi-supervised learning. In practical applications, most observations are unlabeled since labeled data are precious. Many mathematicians have paid attention to this field (see e.g. [18] [19] [21] [25] [26] [27] [28] ). The main idea for dealing with this problem is to add a term that makes use of the unlabeled data, i.e. we need to modify (6) into the form:
(8)
For example, in [19] , one chooses
, where
are edge weights in the data adjacency graph. Also, in [21] , one chooses:
where
and
These choices encourage us to choose suitable
to give a good approximate solution for problem (5).
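To make this concrete, the following sketch (our simplified variant, in the spirit of the graph-Laplacian penalty of [19]; not the exact functional of [19] or of the algorithm constructed later) augments the least-squares model with a term built from edge weights w_ij, so that the unlabeled nodes also shape the solution. With f = sum_j alpha_j K(., x_j) over all labeled and unlabeled points, the minimizer of
(1/l) sum_{i<=l} (f(x_i) - y_i)^2 + lam ||f||_K^2 + gam sum_{i,j} w_ij (f(x_i) - f(x_j))^2
solves (J K + lam*l*I + gam*l*L K) alpha = J y, where J selects the labeled nodes and L = D - W is the graph Laplacian.

import numpy as np

# Hedged sketch of a semi-supervised (Laplacian-regularized) least squares fit.
# All parameter values and the choice of adjacency weights are illustrative.

def gauss_kernel(X, Y, sigma=0.5):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lap_rls(X, y_labeled, l, lam=1e-3, gam=1e-3, sigma=0.5):
    n = X.shape[0]                       # l labeled + (n - l) unlabeled points
    K = gauss_kernel(X, X, sigma)
    W = gauss_kernel(X, X, sigma)        # adjacency weights w_ij (illustrative choice)
    L = np.diag(W.sum(axis=1)) - W       # graph Laplacian L = D - W
    J = np.diag([1.0] * l + [0.0] * (n - l))
    y = np.concatenate([y_labeled, np.zeros(n - l)])
    A = J @ K + lam * l * np.eye(n) + gam * l * L @ K
    alpha = np.linalg.solve(A, J @ y)
    return lambda Xnew: gauss_kernel(Xnew, X, sigma) @ alpha

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(60, 1))
l = 15                                   # only the first 15 samples are labeled
y_lab = np.sin(2 * np.pi * X[:l, 0]) + 0.1 * rng.standard_normal(l)
f_z = lap_rls(X, y_lab, l)
print(f_z(np.array([[0.25], [0.75]])))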
2.2. Some Results on Spherical Analysis
In this subsection, we shall define a kind of RKHS with spherical harmonics, with which we define a kernel-regularized regression learning algorithm for solving problem (5) when Ω is the unit ball
and show the learning rates.
Let
denote the unit ball in d-dimensional Euclidean space
with the usual inner product
, and
is the usual Euclidean norm. For weight
,
, we denote by
,
, the space of measurable functions defined on
with:
and for
, we assume that
denotes the space
of continuous functions on
with the uniform norm.
We denote by
the space of all polynomials in d variables of degree at most n, and by
the space of all polynomials of degree n which are orthogonal to polynomials of lower degree in
. The
are mutually orthogonal in
and (see [29] ):
Let
denote the Lebesgue measure on
and denote the area of
by
,
. Let
denote the space of homogeneous harmonic polynomials of degree n, i.e. homogeneous polynomials of degree n satisfying the equation
. Also, we denote by
the set of homogeneous polynomials of degree n. It is well known that:
Let
denote the set of functions whose first-order derivatives are all in
, i.e.
In this case,
and
(9)
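Since the outward unit normal at a boundary point x of the unit ball is x itself, the outward normal derivative admits the explicit representation (a standard fact, written in our notation, which is presumably what (9) expresses):
\[
D_{\mathbf n}f(x)=\langle\nabla f(x),x\rangle=\sum_{i=1}^{d}x_i\,\frac{\partial f}{\partial x_i}(x),
\qquad x\in\mathbb{S}^{d-1}.
\]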
Define a subclass of
as:
An inner product defined on
is:
Denote by
the space of orthogonal polynomials with respect to
. Then, by Theorem 3 of [30] , we know
contains an orthonormal basis
with respect to
. Then, there holds the expansion:
where
and by the Bessel inequality, we have:
Let
be a Mercer kernel with the form:
(10)
where
and
with
being defined as
.
Define
and
. Then,
For
and
, we define:
Then, we shall show in Proposition 2.1 that
is an RKHS associated with kernel (10).
To give a quantitative description of the kernel
, we give two assumptions.
Assumption A. Assume
.
Assumption B. Assume
and
(11)
Since
are algebraic polynomials,
and
must exist. The real numbers
satisfying (11) also exist; for example, we can take
.
If (11) holds, then it follows from Theorem 2.4 in [31] , Theorem 4.2 in [32] or Proposition 6.2 in [33] that:
(12)
and the convergence on the right side of (12) is absolute and uniform on
.
Proposition 2.1. Assume that Assumptions A and B above hold. Then,
1) There holds the reproducing property:
(13)
2) There holds the reproducing property for the outward normal derivative operator, i.e.
(14)
3) Define
Then, for all
and
, we have:
(15)
Proof. See the proofs in Section 4.
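In generic RKHS notation (ours; the proposition supplies the precise form for the kernel (10)), properties of the type (13)-(15) read
\[
f(x)=\langle f,K(x,\cdot)\rangle_K,
\qquad
D_{\mathbf n}f(\xi)=\bigl\langle f,\,D_{\mathbf n}^{\xi}K(\xi,\cdot)\bigr\rangle_K,
\]
and hence, by the Cauchy-Schwarz inequality,
\[
|f(x)|\le\|f\|_K\sqrt{K(x,x)},
\qquad
|D_{\mathbf n}f(\xi)|\le\|f\|_K\Bigl(D_{\mathbf n}^{x}D_{\mathbf n}^{y}K(x,y)\big|_{x=y=\xi}\Bigr)^{1/2}.
\]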
Let
be observations drawn i.i.d. according to
,
,
and for a given
is a random variable subject to a conditional distribution
satisfying
(B is a given constant),
,
and
. Then,
can be regarded as observations drawn i.i.d. according to
and
can be regarded as samples drawn i.i.d. according to
. The counterpart of problem (5) is then:
(16)
We shall investigate the numerical solutions of problem (16) with kernel-regularized approaches. A kernel learning algorithm with
being the hypothesis space will be defined in Section 2.3. The representation theorem for it is provided and an error decomposition for its error analysis is given, from which a learning rate for problem (16) is shown.
2.3. Kernel-Regularized Regression
With the above notions in hand, we now give the following kernel-regularized learning algorithms for solving problem (16):
(17)
where
, i.e. there exist
such that
and
Corresponding to (17), we define a model for the observations without the disturbances
by:
(18)
where
The integral model for (18) is defined as:
(19)
where
Proposition 2.2. (Representation theorem). Assume that Assumptions A and B above hold. Then,
1) Algorithm (17) has unique solution
and there are coefficients
depending upon
and g such that:
(20)
2) Algorithm (18) has unique solution
and there are coefficients
depending upon
and g such that:
(21)
3) Algorithm (19) has unique solution
and there is a function
depending upon
and a function
depending upon
and g such that:
(22)
Proof. See the proof in Section 4.
(20) shows that Algorithm (17) can be replaced by certain coefficient-regularized models, which is a new topic; such research can be found in the literature [34] [35] .
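To make the role of a representation such as (20) concrete, the following sketch (ours; a deliberately simplified model, not Algorithm (17) itself) fits boundary normal-derivative data on the unit circle with a Gaussian kernel. By the derivative reproducing property, the minimizer of (1/m) sum_i (D_n f(xi_i) - g(xi_i))^2 + lam ||f||_K^2 can be written as f = sum_i alpha_i D_n^y K(., xi_i), and the coefficient vector solves a finite linear system.

import numpy as np

# Hedged, illustrative sketch (NOT Algorithm (17)): kernel-regularized fitting of
# Neumann-type data D_n f(xi_i) = g(xi_i) on the unit circle in R^2, where the
# outward unit normal at a boundary point xi is xi itself.
#
#   min_f (1/m) sum_i (D_n f(xi_i) - g(xi_i))^2 + lam * ||f||_K^2
#
# With f = sum_i alpha_i D_n^y K(., xi_i), the coefficients solve
#   (G + lam * m * I) alpha = g,   G[j, i] = D_n^x D_n^y K(xi_j, xi_i).

sigma, lam, m = 0.7, 1e-6, 40
theta = 2.0 * np.pi * np.arange(m) / m
xi = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # collocation points
nrm = xi                                                # outward unit normals

def gauss(x, y):
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

# For the Gaussian kernel:
# G[j, i] = ( <n_j, n_i>/sigma^2 - <d, n_i><d, n_j>/sigma^4 ) K(xi_j, xi_i), d = xi_j - xi_i.
G = np.zeros((m, m))
for j in range(m):
    for i in range(m):
        d = xi[j] - xi[i]
        G[j, i] = (nrm[j] @ nrm[i] / sigma ** 2
                   - (d @ nrm[i]) * (d @ nrm[j]) / sigma ** 4) * gauss(xi[j], xi[i])

g = xi[:, 0]                    # Neumann data of the harmonic target f*(x) = x_1
alpha = np.linalg.solve(G + lam * m * np.eye(m), g)
print("residual of the normal-derivative fit:", np.linalg.norm(G @ alpha - g))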
We give the following error decomposition:
(23)
where in the last derivation, we have used (15) and
which controls the approximation errors. Then, to bound the error:
we only need to bound the sample errors
and
respectively.
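Schematically (in our notation, with f_z, \bar f_z and f_lambda the solutions of (17), (18) and (19) and f* the target function), a decomposition of this type rests on the triangle inequality
\[
\|f_{\mathbf z}-f^{*}\|\le\|f_{\mathbf z}-\bar f_{\mathbf z}\|+\|\bar f_{\mathbf z}-f_{\lambda}\|+\|f_{\lambda}-f^{*}\| ,
\]
the first two terms playing the role of sample errors and the last one that of the approximation error.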
2.4. Learning Rates
Theorem 2.1. Let the above Assumptions A and B hold and let
be the solution of (17) and let
and
. Then, for any
, with confidence
, there holds:
(24)
If
for
, then
and in this case,
where the norm
is defined in Section 2.2. The decay for
has recently been discussed in [36] .
We then have the following Corollary 2.1.
Corollary 2.1. Let Assumptions A and B hold and let
be the solution of (17) and
. If
for
, then for any
, with confidence
, there holds:
(25)
By (25), we know if
is chosen in such a way that
,
when
, then, with confidence
, there holds (since
):
(26)
2.5. Further Discussions
We now give some explanations of the results and the assumptions.
1) There are three reasons encouraging us to choose
and
to study the kernel-regularized regression model for solving the Neumann boundary problem (5).
i) When
and
, we can easily give an explicit representation of the outward normal derivatives (see (9)). Therefore, we can extend the method used in this paper to domains whose outward normal derivatives can be computed easily.
ii) By the statement in Section 2.1, we know that a tool for constructing a reproducing kernel Hilbert space is an orthonormal basis. We notice that the theory of orthogonal function bases has been developed not only on the unit ball
(see [37] ) and the unit sphere
(see [29] ), but also on some Sobolev spaces associated with both
and
(see e.g. [30] ). These facts also encourage us to choose
and
for problem (5).
iii) In learning theory, learning rate estimates for kernel-regularized learning algorithms amount to bounding error terms, which belongs to the scope of approximation theory. One can make use of the rich techniques of spherical approximation theory to bound the learning rates (see [24] ).
2) In the present paper, we give two Assumptions A and B. They are reasonable.
Since
is a bivariate polynomial on
for a given k,
is still a polynomial whose supremum is attained on
. Assumption B is thus reasonable. In the same way, we know
can be attained. If we choose
, then, we can have
. So, Assumption A is reasonable as well.
3) The convergence in the present paper is uniform convergence (see (26)), which ensures the reliability of the proposed method.
3. Lemmas
To characterize the optimal solutions of Algorithms (17)-(19), we need the concept of the Gâteaux derivative.
Let
be a Hilbert space,
be a real function. We say F is Gâteaux differentiable at
if there is a
such that for any
there holds:
and write
as the Gâteaux derivative of
at f.
To prove Theorem 2.1, we need some lemmas.
Lemma 3.1. There hold the following equations:
1)
satisfies equation:
(27)
2)
satisfies equation:
(28)
3)
satisfies equation:
(29)
Proof. Proof of 1). Define
as:
Then,
where
and
(30)
It follows
By the definition of the Gâteaux derivative, we have:
By Fermat’s rule (see 1) in Proposition A1) and the definition of
, we have
, i.e. (27) holds.
Lemma 3.2. There hold the inequality:
(31)
and the inequality:
(32)
Proof. The definition of
and the inequality (43) give:
where we have used (44). By (28) and
, we have:
It follows:
(31) thus holds.
Proof of (32). Define
as:
Then, by the definition of
and inequalities (43) and (44), we have:
By (29), we have:
It follows:
The above inequality gives (32).
Lemma 3.3. There hold the following inequalities.
1) For any
, with confidence
, there holds:
(33)
2) For any
, with confidence
, there holds:
(34)
Proof. Proof of (33). The definition of
and (14) give:
By the Markov inequality, we have:
(35)
Then, by (31), we have:
Since
, we have:
By (35), we have:
Taking
, we have
and (33) thus holds.
We now show (34). Take
and
Then,
(36)
where
Since
and
, we have by (47) that, with confidence
, there holds:
(37)
On the other hand, take
. Then,
By the definition of
, we have:
By (14), we have:
and
It follows that:
Since
are i.i.d. according to
, we have:
where we have used the fact that:
since
is a positive definite function of u and x. According to Assumption B, we have:
It follows that:
By the Markov inequality, we have:
Take
. Then,
. It follows that with confidence
:
(38)
Collecting (38), (37) and (36), we arrive at (34).
4. Proofs
Proof of Proposition 2.1. Proof of 1). For any
, we rewrite
as:
Then,
which yields (13).
Proof of 2). By (9) and (46), we have:
(14) thus holds.
Proof of 3). By (14), we have:
By the same method, we have by (13) that:
(15) thus holds.
Proof of Proposition 2.2. By (27), we have:
(39)
Taking
and
into (39), we have (20).
By (28), we have:
(40)
Taking
and
into (40), we have (21).
By (29), we have:
(41)
Taking
and
into (41), we have (22).
Proof of Theorem 2.1. Collecting (33), (34) and (23), we have:
(42)
(42) yields (24).
Funding
Supported partially by the NSF (Project No. 61877039) and the NSFC/RGC Joint Research Scheme (Project Nos. 12061160462 and N_CityU102/20) of China.
Appendices
Appendix 1. Gâteaux Derivative and the Convex Function
The following Proposition A1 can be found in Proposition 17.4, Proposition 17.10 and Proposition 17.12 of [38] .
Proposition A1. Let
be a Hilbert space and
be a function defined on
. Then,
1) If
is a convex function, then,
attains its minimal value at
if and only if
.
2) If
is a Gâteaux differentiable function, then
is convex on
if and only if for any
, there holds:
In particular, we have:
(43)
3) For function
, there holds
and there holds equality:
(44)
i.e.
Appendix 2. Derivatives Reproducing Property
Let
be a compact subset which is the closure of its nonempty interior
. Let
be a Mercer kernel on
having the expansion (see e.g. [39] ):
(45)
where the convergence is absolute (for each
) and uniform on
. Then, we have the following proposition.
Proposition A2. Let
be a Mercer kernel of form (45) and
. If
is a reproducing kernel Hilbert space such that:
Then,
(46)
where
.
Proof. It can be found in Theorem 1 of [12] , or see (v) of Theorem 4.7 in [40] .
Appendix 3. A Probability Inequality
Proposition A3. [41] Let
be a random variable taking values in a real separable Hilbert space H on a probability space
. Assume that there is a positive constant L such that
. Then, for all
and
, it holds, with confidence
, that:
(47)