Kernel-Based Partial Conditional Mean Dependence

Abstract

We introduce the Kernel-based Partial Conditional Mean Dependence, a scalar-valued measure of the conditional mean dependence of $Y$ given $X$, adjusting for the nonlinear dependence on $Z$, where $X$, $Y$ and $Z$ are random elements from arbitrary separable Hilbert spaces. This measure extends the Kernel-based Conditional Mean Dependence. An estimator of the measure is developed, and its concentration property is proved. Numerical results demonstrate the effectiveness of the new dependence measure in the context of dependence testing, highlighting its advantages in capturing nonlinear partial conditional mean dependencies.

Share and Cite:

Tian, Z. and Zhang, Z. (2025) Kernel-Based Partial Conditional Mean Dependence. Open Journal of Statistics, 15, 294-311. doi: 10.4236/ojs.2025.153015.

1. Introduction

Before constructing a regression model, it is important to determine whether the covariate $X$ has an effect on the response $Y$. As pointed out by [1], in most cases we are more concerned with the conditional mean of the response. Thus the conditional mean dependence, which measures the departure of $E[Y|X]$ from $E[Y]$, has received attention. When $X$ has no effect on the conditional mean of $Y$, i.e., $E[Y|X]=E[Y]$, $X$ should not be included in a conditional mean regression model. In practice, some covariates related to the response are already known from historical analysis or domain knowledge. Our aim is to determine whether $X$ contributes to the conditional mean of $Y$ after controlling for the effect of the known variable $Z$.

Work on partial dependence has been increasing recently. An intuitive way to measure partial conditional mean dependence is

$$E\big[E(Y|X,Z)-E(Y|Z)\big]^{2}. \qquad (1)$$

Based on a plug-in estimator of equation (1), [2] developed a partial conditional mean independence test. However, as pointed out by [3], under the null hypothesis, i.e., when quantity (1) equals zero, the test statistic has a degenerate distribution. To deal with the degenerate limiting distribution, [3] developed a significance test based on a black-box learner, [4] proposed a general framework to evaluate feature importance, and [5] considered measuring partial dependence based on a decomposition formula of the conditional variance. These methods combine machine learning with sample splitting and therefore, to a certain extent, suffer from the loss of power caused by sample splitting. Another issue is that they only consider scalar responses and cannot handle vector or functional responses. In the field of vector or functional data analysis, conditional mean regression is an important analytical tool (see [6] for a regression model with vector response and [7]-[9] for regression models with functional response, among others), and it is necessary to consider partial conditional mean dependence for vector or functional responses.

To our knowledge, among these tools for partial conditional mean dependence, the Partial Martingale Difference Divergence (pMDD) introduced in [10] is currently the only one applicable to response variables in Hilbert spaces. pMDD is a scalar-valued measure of the conditional mean dependence of $Y$ given $X$, adjusting for the nonlinear dependence on $Z$, where $X$, $Y$ and $Z$ are random vectors of arbitrary dimensions. It extends the martingale difference divergence (MDD) introduced in [11]. However, as shown in [12] [13], the performance of MDD suffers from the curse of dimensionality. Let $(X',Y')$ be an independent copy of $(X,Y)$, and let $\mathrm{MDD}(Y|X)$ be the martingale difference divergence of $Y$ given $X$. When $X=(X_1,\dots,X_p)^{\top}\in\mathbb{R}^{p}$ and $Y\in\mathbb{R}$, [12] shows that

$$\mathrm{MDD}(Y|X)\approx\frac{1}{\tau}\sum_{i=1}^{p}\mathrm{cov}(X_i,Y),$$

where $\tau=E\|X-X'\|^{2}$ and $\mathrm{cov}(X_i,Y)$ is the covariance of $X_i$ and $Y$. Since the covariance only captures linear dependence, the martingale difference divergence may have low power when it is employed to detect nonlinear relationships, especially in high dimensions. pMDD, as an extension of MDD, suffers from the curse of dimensionality for the same reason. This phenomenon can be seen in the numerical results in Section 4.
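As a simple illustration of this point (an example we add for concreteness, not taken from [12]), consider a single standard normal covariate with a purely quadratic signal:

$$X_1\sim N(0,1),\qquad Y=X_1^{2}\ \Longrightarrow\ \mathrm{cov}(X_1,Y)=E[X_1^{3}]-E[X_1]E[X_1^{2}]=0,$$

so a statistic dominated by componentwise covariances carries essentially no signal, even though $E[Y|X_1]=X_1^{2}$ depends strongly on $X_1$.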

In this paper, we introduce a new tool to measure partial conditional mean dependence for vector or functional responses. The numerical experiments in [13] demonstrate the advantages of the kernel-based conditional mean dependence over MDD in identifying nonlinear relationships and handling high-dimensional variables. This prompts us to develop a kernel-based tool for measuring partial conditional mean dependence. Our development follows that in [10], so we name our tool the Kernel-based Partial Conditional Mean Dependence. Simulation results show that the Kernel-based Partial Conditional Mean Dependence has an advantage over the Partial Martingale Difference Divergence in identifying the nonlinear dependence of $Y$'s conditional mean on $X$ after controlling for $Z$.

The rest of the paper is organized as follows. In Section 2, we review the kernel-based conditional mean dependence measure. In Section 3, we describe the construction of the Kernel-based Partial Conditional Mean Dependence and give its sample analogue. A group of finite-sample simulation studies is carried out in Section 4, and a real data example is analyzed in Section 5. Section 6 concludes. All technical proofs are presented in the Appendix.

2. Kernel-Based Conditional Mean Dependence

Before formally introducing the Kernel-based Partial Conditional Mean Dependence, it is necessary to review a tool for measuring conditional mean dependence, the Kernel-based Conditional Mean Dependence.

As proposed by [13], the kernel-based conditional mean dependence (KCMD) is defined as

$$\mathrm{KCMD}(Y|Z)=E\big[\langle Y-EY,\,Y'-EY'\rangle K_{Z}(Z,Z')\big],$$

where $\mathcal{Y}$, $\mathcal{Z}$ are separable Hilbert spaces, and $Y$, $Z$ are random elements valued in $\mathcal{Y}$, $\mathcal{Z}$, respectively. $(Z',Y')$ is an independent copy of $(Z,Y)$, and $K_{Z}(\cdot,\cdot)$ is a characteristic kernel (for details on characteristic kernels, please refer to [14] [15]) defined on $\mathcal{Z}\times\mathcal{Z}$. We specifically point out that in this article the kernel function $K_{Z}(\cdot,\cdot)$ used in KCMD is not fixed: its subscript $Z$ indicates that the form and domain of the kernel depend on $Z$ and the space in which $Z$ takes values. Throughout the paper, $\langle\cdot,\cdot\rangle$ and $\|\cdot\|$ represent inner products and norms, respectively. The kernel-based conditional mean dependence (KCMD) is intended to measure departure from the relationship

$$E(Y|Z)=E(Y)\quad\text{almost surely}$$

for $Y\in\mathcal{Y}$ and $Z\in\mathcal{Z}$. Lemma 1 below summarizes the fundamental properties of KCMD.

Lemma 1. Suppose that $K_{Z}(\cdot,\cdot)$ is a positive definite and bounded characteristic kernel. Then $\mathrm{KCMD}(Y|Z)$ is well defined, and

a) $\mathrm{KCMD}(Y|Z)\ge 0$;

b) $\mathrm{KCMD}(Y|Z)=0$ if and only if $E(Y|Z)=E(Y)$ almost surely.

Denote

$$\phi_{Z}(z,z')=K_{Z}(z,z')-EK_{Z}(z,Z')-EK_{Z}(Z,z')+EK_{Z}(Z,Z'), \qquad (2)$$

$$\psi_{Y}(y,y')=\langle y-EY,\,y'-EY'\rangle=\langle y,y'\rangle-E\langle y,Y'\rangle-E\langle Y,y'\rangle+E\langle Y,Y'\rangle.$$

As shown in [13], one can give another expression of the KCMD as follows:

$$\mathrm{KCMD}(Y|Z)=E\big[\phi_{Z}(Z,Z')\,\psi_{Y}(Y,Y')\big]. \qquad (3)$$

Using this expression, we can provide an unbiased estimator of $\mathrm{KCMD}(Y|Z)$. The estimator is closely related to the so-called $U$-centred matrix of [16]. For an $n\times n$ symmetric matrix $A=(a_{ij})$, its $U$-centred version $\tilde{A}$ has entries

$$\tilde{A}_{ij}=\begin{cases}a_{ij}-\bar{a}_{i\cdot}-\bar{a}_{\cdot j}+\bar{a}_{\cdot\cdot}, & i\neq j,\\ 0, & i=j,\end{cases}$$

where $\bar{a}_{i\cdot}=\frac{1}{n-2}\sum_{l=1}^{n}a_{il}$, $\bar{a}_{\cdot j}=\frac{1}{n-2}\sum_{k=1}^{n}a_{kj}$ and $\bar{a}_{\cdot\cdot}=\frac{1}{(n-1)(n-2)}\sum_{k,l=1}^{n}a_{kl}$. Let $\mathcal{H}_n$ denote the set of all such $U$-centred matrices, and for $\tilde{A},\tilde{B}\in\mathcal{H}_n$ define

$$(\tilde{A}\cdot\tilde{B})=\frac{1}{n(n-3)}\sum_{i\neq j}\tilde{A}_{ij}\tilde{B}_{ij} \qquad (4)$$

and $|\tilde{A}|=(\tilde{A}\cdot\tilde{A})^{1/2}$ as the norm of $\tilde{A}$. Theorem 1 in [16] shows that the linear span of all matrices in $\mathcal{H}_n$ is a Hilbert space with the inner product defined in (4). Using the $U$-centred matrices, we can construct an estimator of $\mathrm{KCMD}(Y|Z)$. Given independent and identically distributed (i.i.d.) observations $(Z_i,Y_i)_{i=1}^{n}$ from the joint distribution of $(Z,Y)$, the unbiased estimator of $\mathrm{KCMD}(Y|Z)$ provided in [13] is defined as

$$\widehat{\mathrm{KCMD}}(Y|Z)=\frac{1}{n(n-3)}\sum_{i\neq j}\tilde{A}_{ij}\tilde{B}_{ij}. \qquad (5)$$

Here, $\tilde{A}_{ij}$ and $\tilde{B}_{ij}$ are the $U$-centred versions of the matrices $A$ and $B$, respectively, where the $(i,j)$-th element of $A$ is $a_{ij}=K_{Z}(Z_i,Z_j)$ and the $(i,j)$-th element of $B$ is $b_{ij}=\langle Y_i,Y_j\rangle$. [13] has shown that the estimator (5) is unbiased and admits a U-statistic expression

$$\widehat{\mathrm{KCMD}}(Y|Z)=\binom{n}{4}^{-1}\sum_{i<j<s<t}h(V_i,V_j,V_s,V_t), \qquad (6)$$

with the kernel

$$h(V_i,V_j,V_s,V_t)=\frac{1}{4!}\sum_{(u,v,q,r)}^{(i,j,s,t)}\big(a_{uv}b_{uv}-a_{uv}b_{uq}-a_{uv}b_{vr}+a_{uv}b_{qr}\big),$$

where $V_i=(Z_i,Y_i)$ and the sum $\sum_{(u,v,q,r)}^{(i,j,s,t)}$ is over all $4!$ permutations $(u,v,q,r)$ of $(i,j,s,t)$.
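To make the construction concrete, the following minimal Python sketch (our own illustration; the helper names `u_center`, `inner_u` and `kcmd_hat` are not from [13] or [16]) computes the $U$-centred matrices and the estimator (5) for vector-valued data, with the kernel matrix of $Z$ supplied by the caller.

```python
import numpy as np

def u_center(M):
    """U-centred version of a symmetric n x n matrix, in the sense of [16]."""
    n = M.shape[0]
    row = M.sum(axis=1, keepdims=True) / (n - 2)   # row sums scaled by 1/(n-2)
    col = M.sum(axis=0, keepdims=True) / (n - 2)   # column sums scaled by 1/(n-2)
    tot = M.sum() / ((n - 1) * (n - 2))            # grand sum scaled by 1/((n-1)(n-2))
    Mc = M - row - col + tot
    np.fill_diagonal(Mc, 0.0)                      # diagonal entries are set to zero
    return Mc

def inner_u(M1, M2):
    """Inner product (4) between two U-centred matrices."""
    n = M1.shape[0]
    return float((M1 * M2).sum()) / (n * (n - 3))

def kcmd_hat(Kz, Y):
    """Estimator (5): Kz[i, j] = K_Z(Z_i, Z_j); Y is an (n, d) array of responses."""
    B = Y @ Y.T                                    # b_ij = <Y_i, Y_j>
    return inner_u(u_center(Kz), u_center(B))
```

A curve-valued response can be passed through the same code after discretising it on a grid, so that the Euclidean inner product approximates the inner product of the underlying Hilbert space.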

3. Kernel-Based Partial Conditional Mean Dependence

In this section, we introduce the kernel-based partial conditional mean dependence (partial KCMD), which measures the conditional mean dependence of a response $Y$ given a predictor variable $X$ after controlling for a variable $Z$, where $X$ is a random element valued in a separable Hilbert space $\mathcal{X}$.

3.1. Population Partial KCMD

For any symmetric function $f(g,g')$ defined on $\mathcal{G}\times\mathcal{G}$, define the $U$-centring operator $D$ by

$$Df(g,g')=f(g,g')-Ef(g,G')-Ef(G,g')+Ef(G,G'),$$

where $G$ is a random element valued in a Hilbert space $\mathcal{G}$ and $G'$ is an independent copy of $G$. The functional class $\mathcal{F}=\{Df: f \text{ is a symmetric function}\}$ is a linear space. Define the inner product on $\mathcal{F}$ as

$$\langle Df_1,Df_2\rangle=E\big[Df_1(G,G')\,Df_2(G,G')\big].$$

It can be verified that this map satisfies the conditions of an inner product. In addition, define the norm on $\mathcal{F}$ as $\|Df\|=\sqrt{\langle Df,Df\rangle}$. Take $\mathcal{G}=\mathcal{Z}\times\mathcal{Y}$. Let $K_{Z}((z,y),(z',y'))=K_{Z}(z,z')$ and $\phi_{Z}((z,y),(z',y'))=\phi_{Z}(z,z')$ for any $(z,y),(z',y')\in\mathcal{Z}\times\mathcal{Y}$. Then $\phi_{Z}\in\mathcal{F}$; similarly, $\psi_{Y}\in\mathcal{F}$. Thus $\mathrm{KCMD}(Y|Z)$ in equation (3) can also be written as

$$\mathrm{KCMD}(Y|Z)=\langle\phi_{Z},\psi_{Y}\rangle. \qquad (7)$$

This implies that the KCMD measures the conditional mean dependence of Y on Z through the inner product of ϕ Z and ψ Y in a linear space. The cosine value

$$\cos(\phi_{Z},\psi_{Y})=\frac{\langle\phi_{Z},\psi_{Y}\rangle}{\|\phi_{Z}\|\,\|\psi_{Y}\|}$$

measures the strength of conditional mean dependence.

Lemma 2. Suppose $\|\phi_{Z}\|\neq 0$ and $\|\psi_{Y}\|\neq 0$. Then $\psi_{Y}$ can be decomposed into two orthogonal parts,

$$\psi_{Y}=\|\psi_{Y}\|\cos(\phi_{Z},\psi_{Y})\frac{\phi_{Z}}{\|\phi_{Z}\|}+\left(\psi_{Y}-\|\psi_{Y}\|\cos(\phi_{Z},\psi_{Y})\frac{\phi_{Z}}{\|\phi_{Z}\|}\right). \qquad (8)$$

The first term in (8) represents the part of $Y$'s conditional mean that is explained by $Z$; the second term represents the part that cannot be explained by $Z$, and in a sense it corresponds to $U=Y-E(Y|Z)$ since $E(U|Z)=0$. Next we define $W=(X,Z)$ and $\mathcal{W}=\mathcal{X}\times\mathcal{Z}$. One way to measure the additional contribution of $X$ to the conditional mean of $Y$, controlling for $Z$, is to measure $E(U|W)$.

Inspired by [10], we provide definitions for the Kernel-based Partial Conditional Mean Dependence and the Kernel-based Partial Conditional Mean Correlation. Define $\phi_{W}$ analogously to $\phi_{Z}$, with $K_{Z}(z,z')$ in (2) replaced by $K_{W}(w,w')$.

Definition 1. The population partial KCMD of $Y$ given $X$, after controlling for the effect of $Z$, denoted $\mathrm{pKCMD}(Y|X;Z)$, is defined as

$$\mathrm{pKCMD}(Y|X;Z)=\left\langle\phi_{W},\,\psi_{Y}-\|\psi_{Y}\|\cos(\phi_{Z},\psi_{Y})\frac{\phi_{Z}}{\|\phi_{Z}\|}\right\rangle.$$

If $\|\phi_{Z}\|=0$, then we define $\mathrm{pKCMD}(Y|X;Z)=\langle\phi_{W},\psi_{Y}\rangle$.

The population Kernel-based Partial Conditional Mean Correlation (pKCMC) is defined as

$$\mathrm{pKCMC}(Y|X;Z)=\frac{\mathrm{pKCMD}(Y|X;Z)}{\|\phi_{W}\|\left\|\psi_{Y}-\|\psi_{Y}\|\cos(\phi_{Z},\psi_{Y})\frac{\phi_{Z}}{\|\phi_{Z}\|}\right\|}.$$

If $\|\phi_{W}\|\left\|\psi_{Y}-\|\psi_{Y}\|\cos(\phi_{Z},\psi_{Y})\frac{\phi_{Z}}{\|\phi_{Z}\|}\right\|=0$, then we define $\mathrm{pKCMC}(Y|X;Z)=0$.

After performing some straightforward calculations, we obtain an equivalent expression for pKCMD, which is given by

$$\mathrm{pKCMD}(Y|X;Z)=\langle\phi_{W},\psi_{Y}\rangle-\frac{\langle\phi_{Z},\psi_{Y}\rangle\langle\phi_{W},\phi_{Z}\rangle}{\|\phi_{Z}\|^{2}}=\mathrm{KCMD}(Y|W)-\frac{\mathrm{KCMD}(Y|Z)\,\mathrm{HSIC}(Z,W)}{\mathrm{HSIC}(Z,Z)}. \qquad (9)$$

Here $\mathrm{HSIC}(Z,W)$ is the Hilbert-Schmidt Independence Criterion (HSIC) between $Z$ and $W$; it measures the dependence between these two random elements. The kernel functions used in $\mathrm{HSIC}(Z,W)$ are $K_{Z}(\cdot,\cdot):\mathcal{Z}\times\mathcal{Z}\to\mathbb{R}$ and $K_{W}(\cdot,\cdot):\mathcal{W}\times\mathcal{W}\to\mathbb{R}$, which, as in KCMD, depend on the variables in their subscripts. Background on HSIC can be found in [15] [17]-[19], among others. We review the specific form of HSIC in the Appendix and derive the last equality in (9) there. When the conditional mean of $Y$ does not depend on $Z$ or $Z$ is a constant, we have $\mathrm{KCMD}(Y|Z)=0$ or $\mathrm{HSIC}(Z,Z)=0$, respectively. As a result, $\mathrm{pKCMD}(Y|X;Z)=\mathrm{KCMD}(Y|W)=\mathrm{KCMD}(Y|X)$.

3.2. Sample pKCMD

Given the sample $(X_i,Y_i,Z_i)_{i=1}^{n}$, we define the sample partial KCMD, denoted $\mathrm{pKCMD}_n(Y|X;Z)$, as the sample analogue of the population partial KCMD. Let $W_i=(X_i,Z_i)$, and let $\tilde{C}$ be the $n\times n$ matrix with entries $\tilde{C}_{ij}$,

$$\tilde{C}_{ij}=\begin{cases}c_{ij}-\bar{c}_{i\cdot}-\bar{c}_{\cdot j}+\bar{c}_{\cdot\cdot}, & i\neq j,\\ 0, & i=j,\end{cases}$$

where $c_{ij}=K_{W}(W_i,W_j)$, and $\bar{c}_{i\cdot}$, $\bar{c}_{\cdot j}$ and $\bar{c}_{\cdot\cdot}$ are defined similarly to $\bar{a}_{i\cdot}$, $\bar{a}_{\cdot j}$ and $\bar{a}_{\cdot\cdot}$.

Definition 2. Given a random sample from the joint distribution of $(X,Y,Z)$, the sample partial kernel-based conditional mean dependence of $Y$ given $X$, after controlling for the effect of $Z$, is given by

$$\mathrm{pKCMD}_n(Y|X;Z)=(\tilde{C}\cdot\tilde{B})-\frac{(\tilde{A}\cdot\tilde{B})(\tilde{A}\cdot\tilde{C})}{(\tilde{A}\cdot\tilde{A})},$$

provided $(\tilde{A}\cdot\tilde{A})\neq 0$; otherwise we define $\mathrm{pKCMD}_n(Y|X;Z)=(\tilde{C}\cdot\tilde{B})$. The sample partial kernel-based conditional mean correlation is given by

$$\mathrm{pKCMC}_n(Y|X;Z)=\frac{|\tilde{A}|^{2}\,\mathrm{pKCMD}_n(Y|X;Z)}{|\tilde{C}|\,\big|\,|\tilde{A}|^{2}\tilde{B}-(\tilde{A}\cdot\tilde{B})\tilde{A}\,\big|}.$$

If $|\tilde{C}|\,\big|\,|\tilde{A}|^{2}\tilde{B}-(\tilde{A}\cdot\tilde{B})\tilde{A}\,\big|=0$, then we define $\mathrm{pKCMC}_n(Y|X;Z)=0$.
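Definition 2 translates directly into code. The sketch below (again our own illustration, reusing the hypothetical `u_center` and `inner_u` helpers from Section 2) evaluates the sample pKCMD and pKCMC from the kernel matrices of $Z$ and $W=(X,Z)$ and the responses $Y$.

```python
import numpy as np
# assumes u_center and inner_u from the sketch in Section 2

def pkcmd_n(Kz, Kw, Y):
    """Sample pKCMD of Definition 2."""
    A, C = u_center(Kz), u_center(Kw)     # U-centred kernel matrices of Z and W
    B = u_center(Y @ Y.T)                 # U-centred Gram matrix of Y
    aa = inner_u(A, A)
    cb = inner_u(C, B)
    if aa == 0.0:                         # degenerate case (A.A) = 0
        return cb
    return cb - inner_u(A, B) * inner_u(A, C) / aa

def pkcmc_n(Kz, Kw, Y):
    """Sample pKCMC of Definition 2."""
    A, C = u_center(Kz), u_center(Kw)
    B = u_center(Y @ Y.T)
    aa = inner_u(A, A)
    resid = aa * B - inner_u(A, B) * A    # |A|^2 B - (A.B) A
    denom = np.sqrt(inner_u(C, C)) * np.sqrt(inner_u(resid, resid))
    if denom == 0.0:
        return 0.0
    return aa * pkcmd_n(Kz, Kw, Y) / denom
```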

We next outline theoretical properties of the sample pKCMD. Analogous results hold for the sample pKCMC, which we omit discussing further here.

Theorem 1. If one of the following two conditions holds:

a) $K_{Z}(\cdot,\cdot)$ and $K_{W}(\cdot,\cdot)$ are bounded kernels and $E\|Y\|<\infty$;

b) $EK_{Z}^{2}(Z,Z')<\infty$, $EK_{W}^{2}(W,W')<\infty$, and $E\|Y\|^{2}<\infty$.

Then, as $n\to\infty$, we have $\mathrm{pKCMD}_n(Y|X;Z)\to\mathrm{pKCMD}(Y|X;Z)$ a.s.

We also show that $\mathrm{pKCMD}_n(Y|X;Z)$ is concentrated. To obtain bounds on the deviation $\mathrm{pKCMD}_n(Y|X;Z)-\mathrm{pKCMD}(Y|X;Z)$, we impose the following condition.

(C1) There exists a constant $s_0>0$ such that for all $0<s\le 2s_0$, $E\exp(s\|Y\|^{2})<\infty$.

Condition (C1) holds immediately when $\|Y\|$ is uniformly bounded or when $Y$ has a Gaussian distribution. Condition (C1) is widely used in statistical research, for example in [11] [20] [21], to analyze the theoretical properties of feature screening.
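As a worked check of the Gaussian case (added here for illustration), in the scalar setting $Y\sim N(0,\sigma^{2})$ the moment generating function of $\|Y\|^{2}=Y^{2}$ is available in closed form:

$$E\exp(sY^{2})=(1-2s\sigma^{2})^{-1/2}<\infty\quad\text{for all }0<s<\frac{1}{2\sigma^{2}},$$

so (C1) holds with any $s_0<1/(4\sigma^{2})$.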

Theorem 2. If $K_{Z}(\cdot,\cdot)$ and $K_{W}(\cdot,\cdot)$ are bounded kernels, $E\|Y\|<\infty$, and Condition (C1) holds, then for any $\epsilon>0$ there exist constants $0<\beta<1/2$, $r_1>0$ and $r_2>0$ such that

$$P\big(|\mathrm{pKCMD}_n(Y|X;Z)-\mathrm{pKCMD}(Y|X;Z)|\ge\epsilon\big)\le O\big(\exp(-r_1 n^{1-2\beta}\epsilon^{2})+n\exp\{-r_2 n^{\beta}\}\big).$$

Take $\epsilon=\eta n^{-\gamma}$ with $0<\gamma<1/2$ and a constant $\eta>0$. According to Theorem 2, there exists $\beta$ with $0<\beta+\gamma<1/2$ such that $P(|\mathrm{pKCMD}_n(Y|X;Z)-\mathrm{pKCMD}(Y|X;Z)|\ge\eta n^{-\gamma})=o(1)$. This implies that $\mathrm{pKCMD}_n(Y|X;Z)$ is concentrated: the deviation between $\mathrm{pKCMD}_n(Y|X;Z)$ and $\mathrm{pKCMD}(Y|X;Z)$ is less than $\eta n^{-\gamma}$ with probability at least $1-O(\exp(-r_1 n^{1-2\beta}\epsilon^{2})+n\exp\{-r_2 n^{\beta}\})$.
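To spell out the rate calculation behind this claim, substitute $\epsilon=\eta n^{-\gamma}$ into the first exponential term of Theorem 2:

$$\exp(-r_1 n^{1-2\beta}\epsilon^{2})=\exp(-r_1\eta^{2}n^{1-2\beta-2\gamma})\to 0\quad\text{whenever }\beta+\gamma<1/2,$$

while $n\exp\{-r_2 n^{\beta}\}\to 0$ for any $\beta>0$; hence the bound in Theorem 2 is $o(1)$.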

4. Simulation

In this section, we examine tests of the null hypothesis of zero pKCMD. When calculating pKCMD, we need to choose the kernel functions $K_{Z}(\cdot,\cdot)$ and $K_{W}(\cdot,\cdot)$. We use Gaussian kernels for both, defined as

$$K_{Z}(z_i,z_j)=\exp\{-\|z_i-z_j\|^{2}/(2\sigma_{Z}^{2})\}$$

and

$$K_{W}(w_i,w_j)=\exp\{-\|w_i-w_j\|^{2}/(2\sigma_{W}^{2})\},$$

respectively. For the bandwidths $\sigma_{Z}$ and $\sigma_{W}$ of these kernels, we use the median heuristic ([13]). We compare our proposed method with the pMDD introduced in [10]. We use permutation to obtain critical values, with the number of permutations set to 300; the permutation method is described in Section 5 of [10].
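The following Python sketch shows one way the whole testing pipeline could be wired together. It is a simplified stand-in, not the exact procedure of [10] or [13]: the median-heuristic convention, the helper names, and the permutation scheme (randomly permuting the $X$-sample while keeping $(Y,Z)$ pairs fixed) are our own illustrative choices.

```python
import numpy as np
# assumes u_center, inner_u and pkcmd_n from the earlier sketches

def median_bandwidth(Z):
    """Median heuristic: sigma^2 = median nonzero squared distance / 2 (one common convention)."""
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.sqrt(np.median(d2[d2 > 0]) / 2.0)

def gaussian_kernel(Z, sigma):
    """K(z_i, z_j) = exp{-||z_i - z_j||^2 / (2 sigma^2)}."""
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def pkcmd_perm_test(X, Y, Z, n_perm=300, seed=None):
    """Permutation p-value for H0: pKCMD(Y|X;Z) = 0 (schematic version)."""
    rng = np.random.default_rng(seed)
    Kz = gaussian_kernel(Z, median_bandwidth(Z))
    W = np.hstack([X, Z])
    stat = pkcmd_n(Kz, gaussian_kernel(W, median_bandwidth(W)), Y)
    count = 0
    for _ in range(n_perm):
        Xp = X[rng.permutation(len(X))]            # permute the X-sample only
        Wp = np.hstack([Xp, Z])
        count += pkcmd_n(Kz, gaussian_kernel(Wp, median_bandwidth(Wp)), Y) >= stat
    return (1 + count) / (1 + n_perm)
```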

The simulations take into account varying sample sizes and levels of dependence between the two random variables to evaluate the performance of tests. For each setting, the empirical sizes or powers of the tests (represented by the proportions of rejections) are recorded through 1000 repetitions at different significance levels.

Example 1. Generate an i.i.d. sample of $(X,Y,Z)$ from the following model: $X=(X_1,\dots,X_p)^{\top}$, $Z=(Z_1,\dots,Z_p)^{\top}$, $Y=\cos(Z)$, where $\cos(Z)=(\cos(Z_1),\dots,\cos(Z_p))^{\top}$. Consider two scenarios:

1) $X_i\sim N(0,1)$ and $Z_i\sim N(0,1)$.

2) X 1 ,, X p are independent and identically distributed (i.i.d.) random variables from the Cauchy distribution with location parameter 0 and scale parameter 1. Z 1 ,, Z p are i.i.d. random variables from the standard normal distribution.

In this example, $X$ and $Z$ are independent of each other, and $Y$ depends only on $Z$. Thus, after controlling for $Z$, the conditional mean of $Y$ does not depend on $X$. From Table 1, both methods control the type-I error rate reasonably well.

Table 1. Empirical size of the two tests for Example 1 with n=50 and p=5.

| Scenario | Method | α=0.01 | α=0.05 | α=0.10 |
|----------|--------|--------|--------|--------|
| (1) | pKCMD | 0.014 | 0.050 | 0.098 |
| (1) | pMDD  | 0.011 | 0.048 | 0.105 |
| (2) | pKCMD | 0.009 | 0.051 | 0.091 |
| (2) | pMDD  | 0.010 | 0.059 | 0.111 |

Example 2. Generate an i.i.d. sample of $(X,Y,Z)$ from the following model: $X=(X_1,\dots,X_p)^{\top}\sim N(0,I_p)$, $Z=(Z_1,\dots,Z_p)^{\top}\sim N(0,I_p)$, $Y=f(0.6X+Z)$, where $N(0,I_p)$ is the multivariate normal distribution with zero mean and identity covariance matrix $I_p$, and $f(x)=(f(x_1),\dots,f(x_p))^{\top}$ for any $x=(x_1,\dots,x_p)^{\top}$.

We consider the following four relationships for $f(x)$: a) $f(x)=x$, b) $f(x)=x^{2}$, c) $f(x)=\sin(x)$, and d) $f(x)=\cos(x)$.
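For concreteness, a sample from this model can be generated and passed to the schematic permutation test sketched at the start of this section (an illustration built on our hypothetical helpers; it is not the exact simulation code used for the tables):

```python
import numpy as np
# assumes pkcmd_perm_test from the sketch at the start of this section

def example2_sample(n, p, f, rng):
    X = rng.standard_normal((n, p))
    Z = rng.standard_normal((n, p))
    Y = f(0.6 * X + Z)          # componentwise f, e.g. np.cos for relationship d)
    return X, Y, Z

rng = np.random.default_rng(0)
X, Y, Z = example2_sample(n=30, p=2, f=np.cos, rng=rng)
p_value = pkcmd_perm_test(X, Y, Z, n_perm=300, seed=1)
```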

Table 2. Empirical powers of the two tests for Example 2 with p=2.

| Relationship | α | Method | n=10 | n=15 | n=20 | n=25 | n=30 |
|---|---|---|---|---|---|---|---|
| f(x)=x | 0.01 | pKCMD | 0.271 | 0.492 | 0.664 | 0.780 | 0.843 |
| f(x)=x | 0.01 | pMDD | 0.424 | 0.667 | 0.810 | 0.900 | 0.939 |
| f(x)=x | 0.05 | pKCMD | 0.470 | 0.669 | 0.801 | 0.878 | 0.908 |
| f(x)=x | 0.05 | pMDD | 0.606 | 0.784 | 0.893 | 0.943 | 0.972 |
| f(x)=x | 0.10 | pKCMD | 0.573 | 0.749 | 0.856 | 0.911 | 0.933 |
| f(x)=x | 0.10 | pMDD | 0.698 | 0.836 | 0.924 | 0.955 | 0.979 |
| f(x)=x^2 | 0.01 | pKCMD | 0.050 | 0.102 | 0.158 | 0.288 | 0.421 |
| f(x)=x^2 | 0.01 | pMDD | 0.028 | 0.051 | 0.055 | 0.085 | 0.111 |
| f(x)=x^2 | 0.05 | pKCMD | 0.165 | 0.287 | 0.398 | 0.538 | 0.688 |
| f(x)=x^2 | 0.05 | pMDD | 0.101 | 0.141 | 0.166 | 0.223 | 0.273 |
| f(x)=x^2 | 0.10 | pKCMD | 0.270 | 0.420 | 0.551 | 0.696 | 0.810 |
| f(x)=x^2 | 0.10 | pMDD | 0.180 | 0.238 | 0.284 | 0.346 | 0.420 |
| f(x)=sin(x) | 0.01 | pKCMD | 0.235 | 0.480 | 0.694 | 0.835 | 0.909 |
| f(x)=sin(x) | 0.01 | pMDD | 0.241 | 0.474 | 0.676 | 0.793 | 0.897 |
| f(x)=sin(x) | 0.05 | pKCMD | 0.456 | 0.692 | 0.843 | 0.923 | 0.953 |
| f(x)=sin(x) | 0.05 | pMDD | 0.465 | 0.673 | 0.828 | 0.904 | 0.957 |
| f(x)=sin(x) | 0.10 | pKCMD | 0.583 | 0.778 | 0.905 | 0.949 | 0.966 |
| f(x)=sin(x) | 0.10 | pMDD | 0.602 | 0.772 | 0.886 | 0.940 | 0.972 |
| f(x)=cos(x) | 0.01 | pKCMD | 0.044 | 0.087 | 0.158 | 0.251 | 0.401 |
| f(x)=cos(x) | 0.01 | pMDD | 0.027 | 0.052 | 0.051 | 0.084 | 0.097 |
| f(x)=cos(x) | 0.05 | pKCMD | 0.157 | 0.276 | 0.383 | 0.543 | 0.679 |
| f(x)=cos(x) | 0.05 | pMDD | 0.089 | 0.143 | 0.168 | 0.217 | 0.255 |
| f(x)=cos(x) | 0.10 | pKCMD | 0.254 | 0.416 | 0.542 | 0.707 | 0.816 |
| f(x)=cos(x) | 0.10 | pMDD | 0.166 | 0.230 | 0.279 | 0.356 | 0.428 |

This example compares the empirical powers of pKCMD and pMDD across different functional relationships, significance levels, and sample sizes. According to Table 2, for the linear function $f(x)=x$, pMDD consistently outperforms pKCMD, showing higher sensitivity. For the quadratic $f(x)=x^{2}$ and cosine $f(x)=\cos(x)$ relationships, pKCMD generally demonstrates superior power, especially at larger sample sizes, indicating better detection of nonlinear effects. For the $f(x)=\sin(x)$ relationship, both tests perform comparably well, with slight advantages for pMDD at lower significance levels. Overall, pMDD excels with linear relationships, while pKCMD is preferable for nonlinear ones, particularly with larger sample sizes.

Example 3. Consider the model in Example 2 with $f(x)=\cos(x)$ and $n=400$.

Table 3 compares the empirical powers of pKCMD and pMDD at varying significance levels $\alpha$ and varying dimension $p$, with $n=400$. pKCMD consistently outperforms pMDD across all settings, and its power decreases more gradually as $p$ increases. At the lower significance level ($\alpha=0.01$), pKCMD maintains high power even with $p=20$, while pMDD's power drops sharply. At higher $\alpha$, both tests improve, but pKCMD remains superior, especially for larger $p$. Overall, according to the powers shown in Table 3, pKCMD is more robust and powerful, particularly in high-dimensional settings.

Table 3. Empirical powers of the two tests for Example 3 with n=400.

| α | Method | p=10 | p=20 | p=30 | p=40 | p=50 |
|---|---|---|---|---|---|---|
| 0.01 | pKCMD | 1.000 | 0.960 | 0.670 | 0.399 | 0.281 |
| 0.01 | pMDD | 0.933 | 0.385 | 0.183 | 0.101 | 0.091 |
| 0.05 | pKCMD | 1.000 | 0.995 | 0.863 | 0.638 | 0.506 |
| 0.05 | pMDD | 0.986 | 0.647 | 0.413 | 0.262 | 0.222 |
| 0.10 | pKCMD | 1.000 | 0.999 | 0.937 | 0.761 | 0.662 |
| 0.10 | pMDD | 0.998 | 0.771 | 0.552 | 0.394 | 0.336 |

Example 4. Consider two models of the form $Y(t)=f(X(t))+g(Z(t))$, where $Z(t)$ is generated by a Wiener process. Two processes for $X$, the Ornstein-Uhlenbeck process (OU) and a Gaussian process with exponential variogram (VP), are employed; they are generated by the rproc2fdata function in the R package fda.usc with default parameters. The models for generating $Y(t)$ are as follows, with a simulation sketch given after the list:

1) $Y(t)=1.5X^{2}(t)+Z(t)$, $t\in[0,1]$.

2) $Y(t)=2\cos(X(t))+0.5Z^{3}(t)$, $t\in[0,1]$.
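Because we work in Python rather than R for these sketches, the snippet below hand-rolls Wiener and OU paths on a grid instead of calling rproc2fdata (the drift and volatility parameters are arbitrary placeholders, not the fda.usc defaults). Discretised curves can then be fed to the earlier helpers; rescaling by the square root of the grid spacing makes Euclidean inner products approximate the $L^{2}$ inner product.

```python
import numpy as np

def wiener(n, grid, rng):
    """Standard Wiener paths on a grid via cumulative Gaussian increments."""
    dt = np.diff(grid, prepend=grid[0])
    return np.cumsum(rng.standard_normal((n, len(grid))) * np.sqrt(dt), axis=1)

def ou(n, grid, rng, theta=1.0, sigma=1.0):
    """Ornstein-Uhlenbeck paths started at zero (Euler scheme; stand-in for rproc2fdata's OU)."""
    x = np.zeros((n, len(grid)))
    for k in range(1, len(grid)):
        dt = grid[k] - grid[k - 1]
        x[:, k] = x[:, k - 1] - theta * x[:, k - 1] * dt + sigma * np.sqrt(dt) * rng.standard_normal(n)
    return x

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 101)
Xf, Zf = ou(50, grid, rng), wiener(50, grid, rng)
Yf = 1.5 * Xf ** 2 + Zf                    # model (1): Y(t) = 1.5 X^2(t) + Z(t)
w = np.sqrt(grid[1] - grid[0])             # sqrt of grid spacing
X, Y, Z = Xf * w, Yf * w, Zf * w           # Euclidean inner products now approximate L2 inner products
```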

Table 4 and Table 5 reveal the performance of pKCMD and pMDD in detecting partial conditional mean dependencies under two distinct models of Y( t ) . For the quadratic model in (1), pKCMD consistently outperforms pMDD across varying sample sizes and significance levels, with its empirical power improving significantly as the sample size increases. For instance, at α=0.05 , pKCMD achieves a power of 0.943 and 0.952 for n=50 with the OU and VP processes, respectively, while pMDD attains only 0.741 and 0.591. This indicates that pKCMD is more effective in capturing the quadratic relationship, and the choice between the OU and VP processes does not substantially affect the relative performance of the tests. For the cosine model in (2), similar trends are observed, with pKCMD maintaining higher power than pMDD across all conditions. The empirical power of both tests increases with the sample size. Although pKCMD shows slightly higher power for the OU process compared to the VP process at larger sample sizes, the difference is not substantial, suggesting that the tests’ power is generally robust to the underlying stochastic process of X( t ) .

Overall, our analysis shows that the tests based on the Kernel-based Partial Conditional Mean Dependence perform effectively in most of the situations we examined. As the sample size grows, the power of these tests increases. Compared to pMDD, our pKCMD test is more efficient at capturing nonlinear relationships. Importantly, even when the dimension of the variables increases, the decline in our test's power is significantly slower than that of pMDD.

Table 4. Empirical powers of the two tests for Example 4 (1).

| α | Method | n=30, OU | n=30, VP | n=50, OU | n=50, VP |
|---|---|---|---|---|---|
| 0.01 | pKCMD | 0.606 | 0.561 | 0.884 | 0.863 |
| 0.01 | pMDD | 0.216 | 0.237 | 0.449 | 0.389 |
| 0.05 | pKCMD | 0.777 | 0.760 | 0.943 | 0.952 |
| 0.05 | pMDD | 0.409 | 0.402 | 0.741 | 0.591 |
| 0.10 | pKCMD | 0.835 | 0.848 | 0.957 | 0.973 |
| 0.10 | pMDD | 0.543 | 0.498 | 0.845 | 0.689 |

Table 5. Empirical powers of the two tests for Example 4 (2).

| n | Method | α=0.01, OU | α=0.01, VP | α=0.05, OU | α=0.05, VP | α=0.10, OU | α=0.10, VP |
|---|---|---|---|---|---|---|---|
| 30 | pKCMD | 0.388 | 0.274 | 0.526 | 0.416 | 0.593 | 0.519 |
| 30 | pMDD | 0.139 | 0.116 | 0.320 | 0.230 | 0.438 | 0.328 |
| 50 | pKCMD | 0.472 | 0.378 | 0.583 | 0.542 | 0.643 | 0.628 |
| 50 | pMDD | 0.206 | 0.147 | 0.428 | 0.309 | 0.529 | 0.429 |
| 70 | pKCMD | 0.575 | 0.459 | 0.671 | 0.588 | 0.730 | 0.671 |
| 70 | pMDD | 0.350 | 0.219 | 0.541 | 0.415 | 0.638 | 0.548 |
| 90 | pKCMD | 0.639 | 0.516 | 0.743 | 0.658 | 0.785 | 0.742 |
| 90 | pMDD | 0.435 | 0.338 | 0.644 | 0.550 | 0.711 | 0.633 |

5. Real Data

In this section, we explore the Tecator dataset contained in the R package fda.usc. This dataset includes values of a 100-channel absorbance spectrum (wavelengths 850 - 1050 nm) ($Z$), water content ($X_1$), fat content ($Y$), and protein content ($X_2$) for 215 meat samples. The dataset has been widely studied in functional data analysis, and the literature mainly focuses on characterizing the influence of the functional covariate $Z$ on the scalar response $Y$. Our goal is to determine whether the other two variables, $X_1$ and $X_2$, have an impact on the conditional mean of the response after controlling for the influence of $Z$. For $X_1$, the p-values computed by pKCMD and pMDD are both 0.000; for $X_2$, we obtain the same results. These values mean that, when constructing a conditional mean regression model, $X_1$ and $X_2$ should also be considered. [22] has already considered the influence of $X_1$, $X_2$, and $Z$ on $Y$ through a semi-functional partial linear regression.
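A minimal sketch of how this analysis could be reproduced with the earlier helpers is given below; the CSV file names are hypothetical placeholders for the Tecator variables (e.g. exported from R's fda.usc), not files that ship with this paper.

```python
import numpy as np
# assumes pkcmd_perm_test from Section 4; file names below are placeholders
absorbance = np.loadtxt("tecator_absorbance.csv", delimiter=",")   # (215, 100) spectra, Z
water = np.loadtxt("tecator_water.csv").reshape(-1, 1)             # X1
fat = np.loadtxt("tecator_fat.csv").reshape(-1, 1)                 # Y
protein = np.loadtxt("tecator_protein.csv").reshape(-1, 1)         # X2

p_water = pkcmd_perm_test(water, fat, absorbance, n_perm=300, seed=0)
p_protein = pkcmd_perm_test(protein, fat, absorbance, n_perm=300, seed=0)
```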

6. Conclusion

This paper introduces pKCMD, a novel measure and test for detecting partial conditional mean dependencies in Hilbert spaces, extending existing measures. We derive equivalent expressions for pKCMD at both the population and sample levels. Numerical experiments show that pKCMD outperforms pMDD across sample sizes and significance levels for nonlinear relationships. The simulation results also indicate that, compared with pMDD, pKCMD is more robust and performs better in high-dimensional situations without additional computational cost. Overall, pKCMD is a competitive, reliable method for analyzing partial conditional mean dependencies, especially in nonlinear settings. How to choose the optimal kernel function in dependence testing is an important issue and a direction for our future research. In addition, we are considering extending pKCMD to other tasks, such as conditional independence testing, goodness-of-fit testing, and feature screening, to broaden its applicability.

Appendix

Appendix A1. Hilbert-Schmidt Independent Criterion and Equation (9)

Definition A1. (HSIC) The Hilbert-Schmidt Independence Criterion of random elements $Z\in\mathcal{Z}$ and $W\in\mathcal{W}$ is defined as

$$\begin{aligned}\mathrm{HSIC}(Z,W)=\;&E\big[K_{Z}(Z,Z')K_{W}(W,W')\big]+E\big[K_{Z}(Z,Z')\big]E\big[K_{W}(W,W')\big]\\ &-2E\big\{E\big[K_{Z}(Z,Z')|Z\big]E\big[K_{W}(W,W')|W\big]\big\},\end{aligned}$$

where $(Z',W')$ is an independent copy of $(Z,W)$, and $K_{Z}(\cdot,\cdot)$ and $K_{W}(\cdot,\cdot)$ are two kernel functions.

Note that in Definition A1 the kernel functions $K_{Z}(\cdot,\cdot)$ and $K_{W}(\cdot,\cdot)$ are not fixed and depend on $Z$ and $W$, respectively. HSIC was developed to test the independence of $Z$ and $W$ and has many good properties: a) it is non-negative; b) if $K_{Z}(\cdot,\cdot)$ and $K_{W}(\cdot,\cdot)$ are characteristic, then $\mathrm{HSIC}(Z,W)=0$ if and only if $Z$ and $W$ are independent. Using $\phi_{Z}$ defined in (2), and $\phi_{W}$ defined analogously with the kernel $K_{W}(\cdot,\cdot)$, we have

$$\begin{aligned}\langle\phi_{Z},\phi_{W}\rangle&=E\big[\phi_{Z}(Z,Z')\phi_{W}(W,W')\big]\\ &=E\Big(\big\{K_{Z}(Z,Z')-E[K_{Z}(Z,Z')|Z]-E[K_{Z}(Z,Z')|Z']+EK_{Z}(Z,Z')\big\}\\ &\qquad\times\big\{K_{W}(W,W')-E[K_{W}(W,W')|W]-E[K_{W}(W,W')|W']+EK_{W}(W,W')\big\}\Big)\\ &=E\big[K_{Z}(Z,Z')K_{W}(W,W')\big]+E\big[K_{Z}(Z,Z')\big]E\big[K_{W}(W,W')\big]\\ &\qquad-2E\big\{E[K_{Z}(Z,Z')|Z]E[K_{W}(W,W')|W]\big\}\\ &=\mathrm{HSIC}(Z,W),\end{aligned}$$

thus (9) holds.

Appendix A2. The Proofs of Main Results

Proof of Lemma 1. Lemma 1 is Proposition 1 in [13]. □

Proof of Lemma 2. Because

$$\Big(\|\psi_{Y}\|\cos(\phi_{Z},\psi_{Y})\frac{\phi_{Z}}{\|\phi_{Z}\|}\Big)\big((y,z),(y',z')\big)=\|\psi_{Y}\|\cos(\phi_{Z},\psi_{Y})\frac{\phi_{Z}(z,z')}{\|\phi_{Z}\|},$$

and

$$\Big(\psi_{Y}-\|\psi_{Y}\|\cos(\phi_{Z},\psi_{Y})\frac{\phi_{Z}}{\|\phi_{Z}\|}\Big)\big((y,z),(y',z')\big)=\psi_{Y}(y,y')-\|\psi_{Y}\|\cos(\phi_{Z},\psi_{Y})\frac{\phi_{Z}(z,z')}{\|\phi_{Z}\|},$$

their inner product is

$$\begin{aligned}&\Big\langle\|\psi_{Y}\|\cos(\phi_{Z},\psi_{Y})\frac{\phi_{Z}}{\|\phi_{Z}\|},\;\psi_{Y}-\|\psi_{Y}\|\cos(\phi_{Z},\psi_{Y})\frac{\phi_{Z}}{\|\phi_{Z}\|}\Big\rangle\\ &=\frac{\|\psi_{Y}\|\cos(\phi_{Z},\psi_{Y})}{\|\phi_{Z}\|}\langle\phi_{Z},\psi_{Y}\rangle-\frac{\|\psi_{Y}\|^{2}[\cos(\phi_{Z},\psi_{Y})]^{2}}{\|\phi_{Z}\|^{2}}\langle\phi_{Z},\phi_{Z}\rangle\\ &=\|\psi_{Y}\|^{2}\cos(\phi_{Z},\psi_{Y})\times\frac{\langle\phi_{Z},\psi_{Y}\rangle}{\|\phi_{Z}\|\,\|\psi_{Y}\|}-\|\psi_{Y}\|^{2}[\cos(\phi_{Z},\psi_{Y})]^{2}\\ &=0,\end{aligned}$$

thus these two parts in (8) are orthogonal. □

Proof of Theorem 1. a) Suppose $K_{Z}(\cdot,\cdot)$ and $K_{W}(\cdot,\cdot)$ are bounded kernels, say $K_{Z}(\cdot,\cdot)<M<\infty$ and $K_{W}(\cdot,\cdot)<M<\infty$. First, consider $(\tilde{A}\cdot\tilde{B})$ in $\mathrm{pKCMD}_n(Y|X;Z)$. $(\tilde{A}\cdot\tilde{B})$ is an unbiased estimator of $\langle\phi_{Z},\psi_{Y}\rangle$ and can be written as a U-statistic by equation (6), with

$$E|h(V_1,V_2,V_3,V_4)|\le E\big|a_{uv}(b_{uv}-b_{uq}-b_{vr}+b_{qr})\big|\le ME\big|b_{uv}-b_{uq}-b_{vr}+b_{qr}\big|\le 4ME|b_{uv}|\le 4ME\|Y\|\,E\|Y'\|<\infty.$$

By the strong law of large numbers for U-statistics,

$$(\tilde{A}\cdot\tilde{B})\to\langle\phi_{Z},\psi_{Y}\rangle,\quad\text{a.s.}$$

Similarly, we have

$$(\tilde{C}\cdot\tilde{B})\to\langle\phi_{W},\psi_{Y}\rangle,\quad\text{a.s.},$$

$$(\tilde{A}\cdot\tilde{C})\to\langle\phi_{Z},\phi_{W}\rangle,\quad\text{a.s.},$$

$$(\tilde{A}\cdot\tilde{A})\to\langle\phi_{Z},\phi_{Z}\rangle,\quad\text{a.s.}$$

According to the continuous mapping theorem, we have $\mathrm{pKCMD}_n(Y|X;Z)\to\mathrm{pKCMD}(Y|X;Z)$ a.s.

b) For general kernels:

$$\begin{aligned}E|h(V_1,V_2,V_3,V_4)|&\le E\big|a_{uv}(b_{uv}-b_{uq}-b_{vr}+b_{qr})\big|\\ &\le E|a_{uv}b_{uv}|+E|a_{uv}b_{uq}|+E|a_{uv}b_{vr}|+E|a_{uv}b_{qr}|\\ &\le 2Ea_{uv}^{2}+2Eb_{uv}^{2}\\ &\le 2EK_{Z}^{2}(Z_u,Z_v)+2(E\|Y\|^{2})^{2}\quad(\text{because }EK_{Z}^{2}(Z,Z')<\infty\text{ and }E\|Y\|^{2}<\infty)\\ &<\infty.\end{aligned}$$

Similar to a), we can also prove that $\mathrm{pKCMD}_n(Y|X;Z)\to\mathrm{pKCMD}(Y|X;Z)$ a.s. This completes the proof. □

Proof of Theorem 2. Denote $w=\langle\phi_{Z},\psi_{Y}\rangle$ and $\hat{w}=\widehat{\mathrm{KCMD}}(Y|Z)$. By the Markov inequality, for any $\epsilon>0$ and $t>0$,

$$P(\hat{w}-w>\epsilon)=P\big(\exp(\hat{w}t-wt)>\exp(\epsilon t)\big)\le\exp(-\epsilon t)E\exp(\hat{w}t-wt)=\exp(-\epsilon t-wt)E\exp(\hat{w}t).$$

Note that $\hat{w}$ admits a U-statistic expression

$$\hat{w}=\binom{n}{4}^{-1}\sum_{i<j<s<t}h(V_i,V_j,V_s,V_t),$$

with the kernel $h(V_i,V_j,V_s,V_t)=\frac{1}{4!}\sum_{(u,v,q,r)}^{(i,j,s,t)}(a_{uv}b_{uv}-a_{uv}b_{uq}-a_{uv}b_{vr}+a_{uv}b_{qr})$, where

$V_i=(Z_i,Y_i)$ and the sum is over all $4!$ permutations of $(i,j,s,t)$. Following [23] (Section 5.1.6), we write

$$\hat{w}=(n!)^{-1}\sum_{n!}\Big[\sum_{i=1}^{m}h\big(V_{(4i-3)},V_{(4i-2)},V_{(4i-1)},V_{(4i)}\big)/m\Big],$$

where $\sum_{n!}$ denotes summation over all $n!$ permutations of $(1,\dots,n)$, $V_{(i)}$ is the $i$-th element under the permutation, and $m=\lfloor n/4\rfloor$ is the integer part of $n/4$. Denote $h_i=h(V_{(4i-3)},V_{(4i-2)},V_{(4i-1)},V_{(4i)})$ and write $\hat{w}=\hat{w}_1+\hat{w}_2$, where

$$\hat{w}_1=(n!)^{-1}\sum_{n!}\sum_{i=1}^{m}h_i I(|h_i|\le M_0)/m,$$

$$\hat{w}_2=(n!)^{-1}\sum_{n!}\sum_{i=1}^{m}h_i I(|h_i|>M_0)/m,$$

and $M_0>0$. Correspondingly, the population counterpart can be decomposed as $w=Eh_1 I(|h_1|\le M_0)+Eh_1 I(|h_1|>M_0)=w_1+w_2$.

Jensen's inequality yields

$$E\exp(\hat{w}_1 t)=E\exp\Big(t(n!)^{-1}\sum_{n!}\sum_{i=1}^{m}h_i I(|h_i|\le M_0)/m\Big)\le E\exp\Big(\frac{t}{m}\sum_{i=1}^{m}h_i I(|h_i|\le M_0)\Big)=\Big[E\exp\Big(\frac{t}{m}h_1 I(|h_1|\le M_0)\Big)\Big]^{m},$$

so

$$P(\hat{w}_1-w_1>\epsilon)\le\exp(-\epsilon t)\Big[E\exp\Big(\frac{t}{m}h_1 I(|h_1|\le M_0)-\frac{t}{m}w_1\Big)\Big]^{m}.$$

Since $Eh_1 I(|h_1|\le M_0)=w_1$ and $|h_1 I(|h_1|\le M_0)|\le M_0$, applying Lemma 5.6.1.A of [23] gives

$$E\exp\Big(\frac{t}{m}h_1 I(|h_1|\le M_0)-\frac{t}{m}w_1\Big)\le\exp\Big(\frac{t^{2}M_0^{2}}{2m^{2}}\Big).$$

So,

$$P(\hat{w}_1-w_1>\epsilon)\le\exp\Big(\frac{t^{2}M_0^{2}}{2m}-\epsilon t\Big).$$

Furthermore, $P(|\hat{w}_1-w_1|>\epsilon)\le 2\exp\big(\frac{t^{2}M_0^{2}}{2m}-\epsilon t\big)$ by the symmetry of the U-statistic.

Now we turn to $\hat{w}_2$. By the Cauchy-Schwarz and Markov inequalities,
$$w_2^{2}=E^{2}\big[h_1 I(|h_1|>M_0)\big]\le Eh_1^{2}\,P(|h_1|>M_0)\le Eh_1^{2}\,\frac{E|h_1|^{q_1}}{M_0^{q_1}}$$

for any $q_1\in\mathbb{N}$. By the assumption that $K_{Z}(\cdot,\cdot)$ is bounded, there exists a positive constant $M$ such that $|a_{uv}|<M$, and

$$|h_1|=|h(V_i,V_j,V_s,V_t)|\le|a_{uv}|\,|b_{uv}-b_{uq}-b_{vr}+b_{qr}|\le M\big(\|Y_u\|^{2}+\|Y_v\|^{2}+\|Y_q\|^{2}+\|Y_r\|^{2}\big);$$

this yields $E|h_1|^{q_1}\le 4^{q_1-1}M^{q_1}\big(E\|Y_u\|^{2q_1}+E\|Y_v\|^{2q_1}+E\|Y_q\|^{2q_1}+E\|Y_r\|^{2q_1}\big)<\infty$ by Condition (C1).

Thus, if we choose $M_0=n^{\beta}$ for $0<\beta<1/2$, then $|w_2|\le\epsilon/2$ for sufficiently large $n$. Hence $P(|\hat{w}_2-w_2|\ge\epsilon)\le P(|\hat{w}_2|\ge\epsilon/2)$.

$$P(|\hat{w}_2|\ge\epsilon/2)\le P\Big(\bigcup_{i=1}^{n}\Big\{\|Y_i\|^{2}>\frac{M_0}{4M}\Big\}\Big)\le\sum_{i=1}^{n}P\Big(\|Y_i\|^{2}>\frac{M_0}{4M}\Big)=nP\Big(\|Y_1\|^{2}>\frac{M_0}{4M}\Big)\le nC\exp\{-r_2 n^{\beta}\},$$

for $r_2\in(0,2s_0]$. Therefore,

$$P(|\hat{w}-w|>2\epsilon)\le P(|\hat{w}_1-w_1|>\epsilon)+P(|\hat{w}_2-w_2|>\epsilon)\le 2\exp\Big(\frac{t^{2}n^{2\beta}}{2m}-\epsilon t\Big)+nC\exp\{-r_2 n^{\beta}\}.$$

Choosing $t=\epsilon m/n^{2\beta}$ gives $P(|\hat{w}-w|\ge\epsilon)\le 2\exp\big(-\frac{\epsilon^{2}m}{2n^{2\beta}}\big)+nC\exp\{-r_2 n^{\beta}\}$; because $m=\lfloor n/4\rfloor$, we have

$$P(|\hat{w}-w|\ge\epsilon)\le 2\exp(-r_1 n^{1-2\beta}\epsilon^{2})+nC\exp\{-r_2 n^{\beta}\},$$

where the constant $r_1$ satisfies $r_1 n=m/2$. Immediately, we have

$$P(|\hat{w}-w|\ge\epsilon)=O\big(\exp(-r_1 n^{1-2\beta}\epsilon^{2})+n\exp\{-r_2 n^{\beta}\}\big).$$

Thus, we obtain

$$P\big(|(\tilde{C}\cdot\tilde{B})-\mathrm{KCMD}(Y|W)|\ge\epsilon\big)\le O\big(\exp(-r_1 n^{1-2\beta}\epsilon^{2})+n\exp\{-r_2 n^{\beta}\}\big),$$

$$P\big(|(\tilde{A}\cdot\tilde{B})-\mathrm{KCMD}(Y|Z)|\ge\epsilon\big)\le O\big(\exp(-r_1 n^{1-2\beta}\epsilon^{2})+n\exp\{-r_2 n^{\beta}\}\big).$$

Because these kernels are bounded, according to Theorem 3 in [24], we have

$$P\big(|(\tilde{A}\cdot\tilde{C})-\mathrm{HSIC}(Z,W)|\ge\epsilon\big)\le O\big(\exp(-r_3 n\epsilon^{2})\big),$$

$$P\big(|(\tilde{A}\cdot\tilde{A})-\mathrm{HSIC}(Z,Z)|\ge\epsilon\big)\le O\big(\exp(-r_3 n\epsilon^{2})\big).$$

$$\begin{aligned}&P\big(|\mathrm{pKCMD}_n(Y|X;Z)-\mathrm{pKCMD}(Y|X;Z)|\ge\epsilon\big)\\ &=P\Big(\Big|(\tilde{C}\cdot\tilde{B})-\frac{(\tilde{A}\cdot\tilde{B})(\tilde{A}\cdot\tilde{C})}{(\tilde{A}\cdot\tilde{A})}-\mathrm{KCMD}(Y|W)+\frac{\mathrm{KCMD}(Y|Z)\mathrm{HSIC}(Z,W)}{\mathrm{HSIC}(Z,Z)}\Big|\ge\epsilon\Big)\\ &\le P\big(|(\tilde{C}\cdot\tilde{B})-\mathrm{KCMD}(Y|W)|\ge\epsilon/2\big)+P\big(|(\tilde{A}\cdot\tilde{B})-\mathrm{KCMD}(Y|Z)|\ge C_1\epsilon\big)\\ &\quad+P\big(|(\tilde{A}\cdot\tilde{C})-\mathrm{HSIC}(Z,W)|\ge C_2\epsilon\big)+P\big(|(\tilde{A}\cdot\tilde{A})-\mathrm{HSIC}(Z,Z)|\ge C_3\epsilon\big)\\ &\le O\big(\exp(-r_1 n^{1-2\beta}\epsilon^{2})+n\exp\{-r_2 n^{\beta}\}\big)+O\big(\exp(-r_3 n\epsilon^{2})\big)\\ &\le O\big(\exp(-r_1 n^{1-2\beta}\epsilon^{2})+n\exp\{-r_2 n^{\beta}\}\big).\end{aligned}$$

This completes the proof of the theorem. □

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Cook, R.D. and Li, B. (2002) Dimension Reduction for Conditional Mean in Regression. The Annals of Statistics, 30, 455-474.
https://doi.org/10.1214/aos/1021379861
[2] Williamson, B.D., Gilbert, P.B., Carone, M. and Simon, N. (2020) Nonparametric Variable Importance Assessment Using Machine Learning Techniques. Biometrics, 77, 9-22.
https://doi.org/10.1111/biom.13392
[3] Dai, B., Shen, X. and Pan, W. (2024) Significance Tests of Feature Relevance for a Black-Box Learner. IEEE Transactions on Neural Networks and Learning Systems, 35, 1898-1911.
https://doi.org/10.1109/tnnls.2022.3185742
[4] Williamson, B.D., Gilbert, P.B., Simon, N.R. and Carone, M. (2022) A General Framework for Inference on Algorithm-Agnostic Variable Importance. Journal of the American Statistical Association, 118, 1645-1658.
https://doi.org/10.1080/01621459.2021.2003200
[5] Cai, L., Guo, X. and Zhong, W. (2024) Test and Measure for Partial Mean Dependence Based on Machine Learning Methods. Journal of the American Statistical Association, 120, 833-845.
https://doi.org/10.1080/01621459.2024.2366030
[6] Welsh, A.H. and Yee, T.W. (2006) Local Regression for Vector Responses. Journal of Statistical Planning and Inference, 136, 3007-3031.
https://doi.org/10.1016/j.jspi.2004.01.024
[7] Scheipl, F., Staicu, A. and Greven, S. (2015) Functional Additive Mixed Models. Journal of Computational and Graphical Statistics, 24, 477-501.
https://doi.org/10.1080/10618600.2014.901914
[8] Sun, X., Du, P., Wang, X. and Ma, P. (2018) Optimal Penalized Function-on-Function Regression under a Reproducing Kernel Hilbert Space Framework. Journal of the American Statistical Association, 113, 1601-1611.
https://doi.org/10.1080/01621459.2017.1356320
[9] Sun, Y. and Wang, Q. (2020) Function-on-Function Quadratic Regression Models. Computational Statistics & Data Analysis, 142, Article ID: 106814.
https://doi.org/10.1016/j.csda.2019.106814
[10] Park, T., Shao, X. and Yao, S. (2015) Partial Martingale Difference Correlation. Electronic Journal of Statistics, 9, 1492-1517.
https://doi.org/10.1214/15-ejs1047
[11] Shao, X. and Zhang, J. (2014) Martingale Difference Correlation and Its Use in High-Dimensional Variable Screening. Journal of the American Statistical Association, 109, 1302-1318.
https://doi.org/10.1080/01621459.2014.887012
[12] Zhang, X., Yao, S. and Shao, X. (2018) Conditional Mean and Quantile Dependence Testing in High Dimension. The Annals of Statistics, 46, 219-246.
https://doi.org/10.1214/17-aos1548
[13] Lai, T., Zhang, Z. and Wang, Y. (2021) A Kernel-Based Measure for Conditional Mean Dependence. Computational Statistics & Data Analysis, 160, Article ID: 107246.
https://doi.org/10.1016/j.csda.2021.107246
[14] Fukumizu, K., Gretton, A., Schölkopf, B. and Sriperumbudur, B.K. (2009) Characteristic Kernels on Groups and Semigroups. In: Advances in Neural Information Processing Systems, Vol. 21, Curran Associates, 473-480.
[15] Gretton, A., Bousquet, O., Smola, A. and Schölkopf, B. (2005) Measuring Statistical Dependence with Hilbert-Schmidt Norms. In: Jain, S., Simon, H.U. and Tomita, E., Eds., Lecture Notes in Computer Science, Springer, 63-77.
https://doi.org/10.1007/11564089_7
[16] Székely, G.J. and Rizzo, M.L. (2014) Partial Distance Correlation with Methods for Dissimilarities. The Annals of Statistics, 42, 2382-2412.
https://doi.org/10.1214/14-aos1255
[17] Albert, M., Laurent, B., Marrel, A. and Meynaoui, A. (2022) Adaptive Test of Independence Based on HSIC Measures. The Annals of Statistics, 50, 858-879.
https://doi.org/10.1214/21-aos2129
[18] Balasubramanian, K., Sriperumbudur, B. and Lebanon, G. (2013) Ultrahigh Dimensional Feature Screening via RKHS Embeddings. Proceedings of the 16th International Conference on Artificial Intelligence and Statistics, Vol. 31, 126-134.
[19] Manfoumbi Djonguet, T.K., Mbina Mbina, A. and Nkiet, G.M. (2024) Testing Independence of Functional Variables by an Hilbert-Schmidt Independence Criterion Estimator. Statistics & Probability Letters, 207, Article ID: 110016.
https://doi.org/10.1016/j.spl.2023.110016
[20] Li, R., Zhong, W. and Zhu, L. (2012) Feature Screening via Distance Correlation Learning. Journal of the American Statistical Association, 107, 1129-1139.
https://doi.org/10.1080/01621459.2012.695654
[21] Wu, Y. and Yin, G. (2015) Conditional Quantile Screening in Ultrahigh-Dimensional Heterogeneous Data. Biometrika, 102, 65-76.
https://doi.org/10.1093/biomet/asu068
[22] Aneiros-Pérez, G. and Vieu, P. (2006) Semi-Functional Partial Linear Regression. Statistics & Probability Letters, 76, 1102-1110.
https://doi.org/10.1016/j.spl.2005.12.007
[23] Serfling, R.J. (1980) Approximation Theorems of Mathematical Statistics. Wiley.
https://doi.org/10.1002/9780470316481
[24] Song, L., Smola, A., Gretton, A., Borgwardt, K. and Bedo, J. (2012) Feature Selection via Dependence Maximization. Journal of Machine Learning Research, 13, 1393-1434.

Copyright © 2025 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.