Function-on-Partially Linear Functional Additive Models

Abstract

We consider a functional partially linear additive model that predicts a functional response by a scalar predictor and functional predictors. A least squares estimator based on B-splines and the eigenbasis is proposed for both the parametric and the nonparametric components. We derive the variance decomposition of the model and establish the asymptotic convergence rate of the estimator.


1. Introduction

Functional data are infinite-dimensional vectors in a function space, and their study helps us better understand data in fields such as finance and medicine. The study of functional responses dates back to Ramsay and Silverman (2005) [1], whose functional concurrent model relates the current response to the current values of the covariates. Wong et al. (2018) [2] investigated a class of partially linear functional additive models (PLFAM) that predict a scalar response by both parametric effects of a multivariate predictor and nonparametric effects of a multivariate functional predictor. Kim et al. (2017) [3] established an additive function-on-function regression using principal component basis functions and B-spline basis functions. Luo and Qi (2016) [4] considered functional linear regression models with a functional response and multiple functional predictors, with the goal of finding the best finite-dimensional approximation to the signal part of the response function. This paper extends function-on-function regression by considering functional partially linear additive models that predict a functional response by a scalar predictor and functional predictors. Specifically, we consider the model

Y(t) = \mu(t)\gamma(t) + \sum_{j=1}^{p} \int_{\tau_x} \beta_j(s,t)\, x_j(s)\, ds + \varepsilon(t) \qquad (1)

where Y(t) is the response function defined on an interval \tau_y. For convenience, we assume that the response has mean zero. The predictor curves x_1(s), x_2(s), \ldots, x_p(s) are defined on \tau_x, where \tau_y and \tau_x are compact intervals. \mu(t) is the scalar predictor term defined on \tau_y, \gamma(t) is its coefficient, and \varepsilon(t) is the noise function with mean zero and unknown autocovariance function A(t,t), independent of the predictors. Previous work has focused on function-on-function, scalar-on-function, and scalar-on-mixed data; no paper has been devoted to function-on-mixed data, that is, regression models in which a functional dependent variable is regressed on mixed data. The setting considered here is closer to practical needs: real-world data are rarely all functional, and a model will usually also contain some scalar components. From this perspective we develop the new model. This paper also uses an estimation method different from previous ones. The usual approach in earlier articles is to approximate the model with nonparametric tools such as spline functions and to estimate the parameters by minimizing an objective function; the resulting models are generally complicated and inconvenient to compute, since B-splines are fitted in three directions at once, which can lead to over-fitting. In this paper the problem is simplified by using the eigenbasis and orthogonal B-splines, that is, the eigenbasis of Y(t) together with an orthogonal B-spline basis is used to fit the model. This preserves the quality of the fit while reducing the complexity of the model and avoiding over-fitting. We use a classical probability formula to express the error term of the model from the variance perspective, and a resampling method is used to approximate the error of the model. Under the stated assumptions, we also establish the asymptotic convergence rate of the estimator.

2. Model

In this section, we consider the estimation of the coefficients in the functional partially linear regression model. The model is estimated using the eigenbasis and B-spline basis functions. Let \phi_k denote the eigenfunctions of y(t), which satisfy

\int_0^1 \phi_n(t)\, \phi_m(t)\, dt = \begin{cases} 1, & n = m \\ 0, & n \neq m \end{cases} \qquad (2)

Let

h(t) = \mu(t)\gamma(t) \qquad (3)

We project h(t) onto the eigenbasis of y(t),

H_k = \int_{\tau_y} h(t)\, \phi_k(t)\, dt \qquad (4)

then we can express h ( t ) as

h(t) = \sum_{k=1}^{\infty} H_k\, \phi_k(t) \qquad (5)
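As a concrete illustration of (4) and (5), the projection can be carried out numerically on a dense time grid. The following Python sketch is purely illustrative: the grid, the curve h, and the orthonormal basis phi are hypothetical placeholders (in practice the eigenfunctions would come from an FPCA of the observed responses).

```python
import numpy as np

# Hypothetical inputs: a dense grid on tau_y, a curve h evaluated on it,
# and K estimated eigenfunctions phi[k, :] evaluated on the same grid.
t = np.linspace(0.0, 1.0, 201)           # grid on tau_y (assumed to be [0, 1])
h = np.sin(2 * np.pi * t)                # placeholder for h(t) = mu(t) * gamma(t)
K = 5
phi = np.array([np.sqrt(2) * np.sin((k + 1) * np.pi * t) for k in range(K)])  # orthonormal basis

# Scores H_k = int h(t) phi_k(t) dt, approximated by the trapezoidal rule (Equation (4))
H = np.array([np.trapz(h * phi[k], t) for k in range(K)])

# Truncated reconstruction h(t) ~ sum_k H_k phi_k(t) (Equation (5))
h_hat = H @ phi
print("scores:", np.round(H, 3))
print("max reconstruction error:", np.abs(h_hat - h).max())
```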

For the functional coefficient part of (1), we estimate it by means of B-splines and the eigenbasis. For convenience we write model (1) as

y(t) = \mu(t)\gamma(t) + \int_{\tau_x} X(s)^T \beta(s,t)\, ds + \varepsilon(t) \qquad (6)

where X(s) = (x_1(s), \ldots, x_p(s))^T is the p-dimensional vector of predictor functions and \beta(s,t) = (\beta_1(s,t), \ldots, \beta_p(s,t))^T is the p-dimensional vector of coefficient functions. Let

F(s,t) = X(s)^T \beta(s,t) \qquad (7)

We transform F(s,t) in the same way as h(t):

G_k(s) = \int_{\tau_y} F(s,t)\, \phi_k(t)\, dt \qquad (8)

so we can give F(s,t) a Karhunen-Loève-type expansion

F(s,t) = \sum_{k=1}^{\infty} G_k(s)\, \phi_k(t) \qquad (9)

In this paper, by projecting the relevant functions onto the orthogonal eigenbasis of y, the model parameters can be estimated without penalizing the complexity in the t direction. As in functional principal component analysis, we preset the proportion of variance to be explained, usually 85%. Next we use the orthogonal B-spline method (2010) [5] [6] to represent the parameters in the s direction.

G_k(s) = \sum_{l=1}^{M} B_l(s)\, \theta_{l,k} \qquad (10)

F(s,t) = \sum_{k=1}^{\infty} \sum_{l=1}^{M} B_l(s)\, \theta_{l,k}\, \phi_k(t) \qquad (11)

where the B_l(s), l = 1, \ldots, M, are orthogonalized B-spline basis functions of dimension M and the \theta_{l,k} are unknown parameters. Substituting (5) and (11) into (6), we get

y(t) = \sum_{k=1}^{\infty} H_k\, \phi_k(t) + \int_{\tau_x} \sum_{k=1}^{\infty} \sum_{l=1}^{M} B_l(s)\, \theta_{l,k}\, \phi_k(t)\, ds + \varepsilon(t) \qquad (12)

Truncating at K in the t direction according to the preset proportion of variance, we get

y(t) \approx \sum_{k=1}^{K} H_k\, \phi_k(t) + \int_{\tau_x} \sum_{k=1}^{K} \sum_{l=1}^{M} B_l(s)\, \theta_{l,k}\, \phi_k(t)\, ds \qquad (13)

We denote Z = (Z_1, Z_2, \ldots, Z_M)^T with Z_l = \int_{\tau_x} B_l(s)\, ds, and let \Phi_k = (\theta_{1,k}, \theta_{2,k}, \ldots, \theta_{M,k})^T.

Then we obtain the simplified expression of y(t)

y(t) \approx \sum_{k=1}^{K} H_k\, \phi_k(t) + \sum_{k=1}^{K} Z^T \Phi_k\, \phi_k(t) \qquad (14)
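To make the construction in (10)-(14) concrete, the following Python sketch builds an L2-orthonormalized cubic B-spline basis numerically and computes the integrated features Z_l = \int_{\tau_x} B_l(s)\, ds. The QR-based orthogonalization is only one possible scheme; the paper's references [5] [6] may use a different one, and the interval, dimension M, and grid are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import BSpline

# Cubic B-spline basis of dimension M on tau_x, taken here as [0, 1].
M, degree = 8, 3
interior = np.linspace(0.0, 1.0, M - degree + 1)
knots = np.r_[[0.0] * degree, interior, [1.0] * degree]   # open knot vector, length M + degree + 1

s = np.linspace(0.0, 1.0, 400)                            # fine grid for numerical integration
ds = s[1] - s[0]
raw = np.column_stack([
    BSpline.basis_element(knots[l:l + degree + 2], extrapolate=False)(s)
    for l in range(M)
])
raw = np.nan_to_num(raw)                                   # zero outside each element's support

# Orthonormalize with respect to the (discretized) L2 inner product via QR;
# one possible construction of an "orthogonalized B-spline" basis.
Q, _ = np.linalg.qr(raw * np.sqrt(ds))
B = Q / np.sqrt(ds)                                        # columns satisfy int B_l B_m ds ~ delta_lm

# Integrated features Z_l = int_{tau_x} B_l(s) ds appearing in Equation (14)
Z = np.trapz(B, s, axis=0)
print(Z.shape)                                             # (M,)
```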

3. Parameter Estimation

We assume that, with probability 1, the trajectory of x_j is contained in a Hilbert space \chi_j with inner product \langle f, g \rangle = \int f(t)\, g(t)\, dt for any f, g \in \chi_j; we focus on the case where the \chi_j are L^2 function spaces. We use a quadratic penalty in the direction s and control the roughness in the direction t through the preset number of orthogonal basis functions. The loss function is as follows

L(H_k, \Phi_k) = \left\| y(t) - \sum_{k=1}^{K} H_k\, \phi_k(t) - \sum_{k=1}^{K} Z^T \Phi_k\, \phi_k(t) \right\|^2 \qquad (15)

We use a least squares penalty to penalize the curvature of the functional parameter, considering the penalty term

\int\!\!\int \left( \frac{\partial^2 F(s,t)}{\partial s^2} \right)^2 ds\, dt = \int\!\!\int \left( \frac{\partial^2}{\partial s^2} \sum_{k=1}^{K} G_k(s)\, \phi_k(t) \right)^2 ds\, dt = \sum_{k=1}^{K} \int \left( \frac{\partial^2 G_k(s)}{\partial s^2} \right)^2 ds \qquad (16)

\lambda \sum_{k=1}^{K} \int \left( \frac{\partial^2 G_k(s)}{\partial s^2} \right)^2 ds = \lambda \sum_{k=1}^{K} \int \left( \sum_{l=1}^{M} \frac{\partial^2 B_l(s)}{\partial s^2}\, \theta_{l,k} \right)^2 ds = \lambda \sum_{k=1}^{K} \Phi_k^T P\, \Phi_k \qquad (17)

where P is the M \times M matrix with entries P_{l,l'} = \int \frac{\partial^2 B_l(s)}{\partial s^2} \frac{\partial^2 B_{l'}(s)}{\partial s^2}\, ds. Then we obtain the penalized criterion

\left\| y(t) - \sum_{k=1}^{K} H_k\, \phi_k(t) - \sum_{k=1}^{K} Z^T \Phi_k\, \phi_k(t) \right\|^2 + \lambda \sum_{k=1}^{K} \Phi_k^T P\, \Phi_k \qquad (18)
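The penalty matrix P in (17)-(18) can be approximated numerically from second derivatives of the basis functions. The Python sketch below is a rough, self-contained illustration with a placeholder basis; in practice B would be the orthogonalized B-spline basis from the sketch after Equation (14), and the finite-difference derivatives are a crude stand-in for exact spline derivatives.

```python
import numpy as np

# Placeholder (n_grid, M) basis matrix evaluated on a grid s; swap in the
# orthogonalized B-spline basis from the sketch after Equation (14) in practice.
s = np.linspace(0.0, 1.0, 400)
ds = s[1] - s[0]
M = 8
B = np.column_stack([np.cos(l * np.pi * s) for l in range(M)])

# Second derivatives approximated column-wise by finite differences.
d2B = np.gradient(np.gradient(B, s, axis=0), s, axis=0)

# Penalty matrix with entries P[l, m] ~ int B_l''(s) B_m''(s) ds, as in (17)-(18)
P = d2B.T @ d2B * ds
print(P.shape, np.allclose(P, P.T))   # (M, M), symmetric by construction
```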

We process y(t) similarly to F(s,t) and h(t). The Karhunen-Loève expansion of y(t) is

y(t) = \sum_{k=1}^{K} \xi_k\, \phi_k(t) + e_t \qquad (19)

where e_t is a zero-mean error and the \xi_k are the FPCA (functional principal component analysis) scores of Y. Noting that \int \phi_k(t)^2\, dt = 1, the criterion can be written more simply as

\sum_{k=1}^{K} \left\| \xi_k - H_k - Z^T \Phi_k \right\|^2 + \lambda \sum_{k=1}^{K} \Phi_k^T P\, \Phi_k \qquad (20)
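To see how (20) follows from (15) and (19), substitute the truncated expansion (19) into (15) and use the orthonormality of the \phi_k; assuming the remainder e_t is orthogonal to the retained eigenfunctions, the cross terms vanish:

```latex
\begin{aligned}
\Big\| y(t) - \sum_{k=1}^{K}\big(H_k + Z^{T}\Phi_k\big)\phi_k(t) \Big\|^{2}
&= \Big\| \sum_{k=1}^{K}\big(\xi_k - H_k - Z^{T}\Phi_k\big)\phi_k(t) + e_t \Big\|^{2} \\
&= \sum_{k=1}^{K}\big(\xi_k - H_k - Z^{T}\Phi_k\big)^{2} + \|e_t\|^{2} .
\end{aligned}
```

Since \|e_t\|^2 does not involve the parameters, minimizing (15) plus the roughness penalty is equivalent to minimizing (20).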

So the estimation process for the parameters H k , Φ k is as follows.

1) We smooth the sample points to obtain the curve \hat{y}(t) and demean it to obtain a curve \hat{y}_c(t) with mean zero. The truncation point K is determined by the pre-specified percentage of variance explained.

2) We use FPCA (functional principal component analysis) to estimate the eigenbasis \hat{\phi}_k, and we project \hat{y}_c(t) onto \hat{\phi}_k to obtain estimates of the principal component scores \hat{\xi}_k = \int_{\tau_y} \hat{y}_c(t)\, \hat{\phi}_k(t)\, dt.

3) We obtain a two-step estimate of H_k, \Phi_k:

a) Obtain estimates of the coefficients \Phi_k, k = 1, 2, \ldots, K, by minimizing the penalized criterion (20) with respect to \Phi_k for given H_k. The resulting estimate of \Phi_k is

\hat{\Phi}_k = \left( Z Z^T + \lambda P \right)^{-1} Z \left( \hat{\xi}_k - H_k \right) \qquad (21)

b) Obtain the estimate of H_k by minimizing

\arg\min_{H_k} \sum_{k=1}^{K} \left\| \hat{\xi}_k - H_k - Z^T \hat{\Phi}_k \right\|^2 \qquad (22)

The resulting estimates of \Phi_k and H_k are \hat{\Phi}_k and \hat{H}_k, and we obtain

\hat{y}(t) = \sum_{k=1}^{K} \hat{\phi}_k(t) \left( \hat{H}_k + Z^T \hat{\Phi}_k \right) \qquad (23)
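The two-step estimator in a) and b) can be written compactly in matrix form. The sketch below is schematic and rests on assumptions the paper leaves implicit: Z is treated as an n x M matrix stacking the integrated basis features across the n subjects, xi is the n x K matrix of estimated FPCA scores from step 2), and H_k is treated as an intercept shared across subjects; it alternates the ridge-type update (21) with the least-squares update (22).

```python
import numpy as np

def fit_component(xi_k, Z, P, lam, n_iter=5):
    """One reading of the two-step estimator (21)-(22) for a single k.
    xi_k : (n,) estimated FPCA scores for component k
    Z    : (n, M) design matrix of integrated basis features (assumed shape)
    P    : (M, M) roughness penalty matrix
    lam  : penalty parameter lambda
    """
    H_k = xi_k.mean()                                 # initialise the intercept
    A = Z.T @ Z + lam * P                             # ridge system matrix
    for _ in range(n_iter):
        Phi_k = np.linalg.solve(A, Z.T @ (xi_k - H_k))   # update as in Equation (21)
        H_k = np.mean(xi_k - Z @ Phi_k)                   # update as in Equation (22)
    return H_k, Phi_k

# Hypothetical example with simulated shapes only
rng = np.random.default_rng(0)
n, M, K, lam = 60, 8, 4, 0.1
Z = rng.normal(size=(n, M))                 # placeholder design matrix
P = np.eye(M)                               # placeholder penalty matrix
xi = rng.normal(size=(n, K))                # placeholder FPCA scores
fits = [fit_component(xi[:, k], Z, P, lam) for k in range(K)]
# Plugging the fitted H_k + Z Phi_k into Equation (23) with the estimated
# eigenfunctions gives the fitted response curves.
```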

4. Error Variance Decomposition

In this section, we derive the error variance decomposition. Let A(t,t) be the variance function of \varepsilon(t); we estimate A(t,t) as follows.

1) Obtain the residuals from the fit of (14): e_j = y(t_j) - \hat{y}(t_j).

2) Apply FPCA to the residual function to approximate the infinite-dimensional covariance function by the estimated finite-dimensional covariance function.

Let x_{new}, \mu_{new} be new observations independent of X and \varepsilon(t), and fit the function-on-partially linear functional additive model; we then measure the uncertainty of the prediction through the prediction error \hat{y}_{new}(t) - y_{new}(t). Based on Ruppert et al. (2009) [7],

\mathrm{var}\{ \hat{y}_{new}(t) - y_{new}(t) \} = \mathrm{var}\{ \hat{y}_{new}(t) \} + \mathrm{var}\{ \varepsilon_{new}(t) \} \qquad (24)

We estimate \mathrm{var}\{\varepsilon_{new}(t)\} by estimating \mathrm{var}\{\varepsilon(t)\}. We then estimate \mathrm{var}\{\hat{y}_{new}(t)\} using the classical variance formula

\mathrm{var}\{ \hat{y}_{new}(t) \} = E_{\pi}\left[ \mathrm{var}\{ \hat{y}_{new}(t) \mid \pi \} \right] + \mathrm{var}_{\pi}\left[ E\{ \hat{y}_{new}(t) \mid \pi \} \right] \qquad (25)

In this paper, we take \pi to be the set of estimated parameter values together with their estimated variance and covariance functions. We expand the term \mathrm{var}\{ \hat{y}_{new}(t) \mid \pi \} in formula (25) as follows

\mathrm{var}\{ \hat{y}_{new}(t) \mid \pi \} = \mathrm{var}\left\{ \sum_{k=1}^{K} \hat{\phi}_k(t) \left( \hat{H}_{new,k} + Z_{new}^T \hat{\Phi}_k \right) \right\} \qquad (26)

If we write \left( Z Z^T + \lambda P \right)^{-1} Z as m_{\lambda} and Z_{new}^T m_{\lambda} as J_{\lambda}, then \hat{\Phi}_k = \left( Z Z^T + \lambda P \right)^{-1} Z ( \hat{\xi}_k - \hat{H}_k ) = m_{\lambda} ( \hat{\xi}_k - \hat{H}_k ) and \hat{H}_k + Z_{new}^T \hat{\Phi}_k = \hat{H}_k + Z_{new}^T m_{\lambda} ( \hat{\xi}_k - \hat{H}_k ) = (1 - J_{\lambda}) \hat{H}_k + J_{\lambda} \hat{\xi}_k. So we expand expression (26) as follows

\mathrm{var}\{ \hat{y}_{new}(t) \mid \pi \} = \sum_{k=1}^{K} \hat{\phi}_k(t)\, \mathrm{var}\left( \hat{H}_k + Z_{new}^T \hat{\Phi}_k \right) \hat{\phi}_k(t) + \sum_{k \neq k'} \hat{\phi}_k(t)\, \mathrm{cov}\left( \hat{H}_k + Z_{new}^T \hat{\Phi}_k,\ \hat{H}_{k'} + Z_{new}^T \hat{\Phi}_{k'} \right) \hat{\phi}_{k'}(t) \qquad (27)
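Given the covariance matrix of the fitted component scores \hat{H}_k + Z_{new}^T \hat{\Phi}_k (part of the parameter set \pi), expression (27) is a quadratic form in the estimated eigenfunctions. A hypothetical Python sketch with placeholder inputs:

```python
import numpy as np

# Hypothetical inputs: K eigenfunctions evaluated on a grid (K, n_grid) and the
# K x K covariance matrix of the fitted scores H_k + Z_new^T Phi_k, assumed to
# be available from the parameter set pi.
K, n_grid = 4, 101
t = np.linspace(0.0, 1.0, n_grid)
phi_hat = np.array([np.sqrt(2) * np.sin((k + 1) * np.pi * t) for k in range(K)])
Sigma = 0.05 * np.eye(K) + 0.01              # placeholder covariance of the fitted scores

# Pointwise variance of y_hat_new(t) | pi as the quadratic form in Equation (27):
# var(t) = sum_{k, k'} phi_k(t) Sigma[k, k'] phi_k'(t)
var_y_new = np.einsum('kt,kl,lt->t', phi_hat, Sigma, phi_hat)
print(var_y_new[:5])
```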

All the variances and covariances in this expression are functions of the variances and covariances of existing estimates. We now turn to the second term of expression (25), \mathrm{var}_{\pi}[ E\{ \hat{y}_{new}(t) \mid \pi \} ]. We approximate this finite-sample variance function by resampling with the bootstrap, through the following steps:

1) For q = 1, 2, \ldots, Q:

2) Resample the data to obtain Q different bootstrap samples.

3) Smooth the resampled points to obtain the corresponding curves X_j^q(s), \mu^q(t), y^q(t).

4) Obtain the demeaned curves (\hat{y}_c)^q(t), the corresponding eigenbasis functions \hat{\phi}_k^q, and the corresponding principal component scores \hat{\xi}_k^q.

5) Fit the function-on-partially linear functional additive model to obtain the corresponding parameter estimates.

6) Obtain the prediction function \hat{y}_{new}^q(t) for any new observation curves x_{new}(t), y_{new}(t), and calculate \hat{A}^q(t,t).

7) Obtain \mathrm{var}\{ \hat{y}_{new}^q(t) \mid \pi \} from expression (27).

Thus the approximation of the finite term of A(t,t) is \hat{A}(t,t) = \frac{1}{Q} \sum_{q=1}^{Q} \hat{A}^q(t,t), and the variance estimate of the error is

\widehat{\mathrm{var}}\{ \hat{y}_{new}(t) - y_{new}(t) \} = \frac{1}{Q} \sum_{q=1}^{Q} \widehat{\mathrm{var}}\{ \hat{y}_{new}^q(t) \mid \pi^q \} + \frac{1}{Q} \sum_{q=1}^{Q} \left\{ \hat{y}_{new}^q(t) - \bar{\hat{y}}_{new}(t) \right\}^2 + \frac{1}{Q} \sum_{q=1}^{Q} \hat{A}^q(t,t) \qquad (28)
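The bootstrap procedure in steps 1)-7) and the combination in (28) can be sketched as follows. The helpers smooth_curves, fit_model and predict_new are hypothetical placeholders for the smoothing, fitting and prediction steps described in Sections 2-4; only the resampling and averaging structure of (28) is shown.

```python
import numpy as np

def bootstrap_prediction_variance(samples, x_new, Q=200, seed=0):
    """Schematic bootstrap for the variance decomposition in Equation (28).
    `samples` is the original data set (one entry per subject); smooth_curves,
    fit_model and predict_new are hypothetical placeholders.
    """
    rng = np.random.default_rng(seed)
    n = len(samples)
    preds, cond_vars, A_qs = [], [], []
    for _ in range(Q):
        idx = rng.integers(0, n, size=n)                   # step 2): resample subjects
        boot = [samples[i] for i in idx]
        curves = smooth_curves(boot)                       # step 3): smooth resampled points
        model = fit_model(curves)                          # steps 4)-5): FPCA scores and fit
        y_hat_q, var_q, A_q = predict_new(model, x_new)    # steps 6)-7)
        preds.append(y_hat_q)
        cond_vars.append(var_q)
        A_qs.append(A_q)
    preds = np.asarray(preds)
    # Equation (28): averaged conditional variance + between-replicate variance
    # of the predictions + averaged error variance estimate.
    return (np.mean(cond_vars, axis=0)
            + np.mean((preds - preds.mean(axis=0)) ** 2, axis=0)
            + np.mean(A_qs, axis=0))
```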

5. Basic Assumptions and Convergence Properties

In this section, we study the asymptotic properties of the estimators of F(t) and h(t), where

\hat{h}(t) = \sum_{k=1}^{K} \hat{H}_k\, \hat{\phi}_k(t) \qquad (29)

F(t) = \int_{\tau_x} F(s,t)\, ds = \int_{\tau_x} \sum_{k=1}^{K} G_k(s)\, \phi_k(t)\, ds = \int_{\tau_x} \sum_{k=1}^{K} \sum_{l=1}^{M} B_l(s)\, \theta_{l,k}\, \phi_k(t)\, ds = \sum_{k=1}^{K} Z^T \Phi_k\, \phi_k(t) \qquad (30)

5.1. Basic Assumptions

1) The square integrable random function \tilde{X}(s) satisfies E\|\tilde{X}(s)\|^q < \infty, 2 < q < \infty;

2) For each k, E[\tilde{U}_{ik}] \le c \lambda_k^2, where c is a constant and c may take different values in different expressions;

3) The eigenvalues \lambda_k satisfy c^{-1} k^{-s} \le \lambda_k \le c\, k^{-s}, k \ge 1, s > 1;

4) For the coefficients H_k, assume there is a constant t with t > s/3 + 1 such that |H_k| \le c\, k^{-t};

5) For the truncation parameter K, assume K = n^{1/(s+3t)};

6) The number of knots is N_n = n^r, with \frac{1}{2q+1} \le r \le \frac{1}{2};

7) The distributions of F(t) and h(t) are absolutely continuous and their densities are bounded on \tau_y;

8) The joint density function of x(s) is bounded on \tau_x;

9) The covariance function \mathrm{cov}(y(t), y(t')) of y(t) is bounded on the interval \tau_y.

Conditions 1)-5) are common in the functional PCA literature and conditions 6)-9) are common in the nonparametric regression literature.

5.2. Related Lemma

Lemma 1. If \tilde{F}(t) is the spline expansion of F(t), then by Chen [8], for n > 2 and h = \frac{1}{N}, we have

\| \tilde{F}(t) - F(t) \|^2 = O(h^2) < O(h^n) \qquad (31)

5.3. Convergence Properties

Consider the model (6)

y(t) = \mu(t)\gamma(t) + \int_{\tau_x} X(s)^T \beta(s,t)\, ds + \varepsilon(t) \qquad (32)

Using the principal components of y(t), model (6) can be represented by the following expression

\begin{aligned}
\sum_{k=1}^{\infty} \xi_k\, \phi_k(t) &= \sum_{k=1}^{\infty} H_k\, \phi_k(t) + \sum_{k=1}^{\infty} Z^T \Phi_k\, \phi_k(t) + \varepsilon(t) \\
&= \sum_{k=1}^{\infty} (H_k, \Phi_k)(1, Z)^T \phi_k(t) + \varepsilon(t) \\
&= \sum_{k=1}^{\infty} D_k Z_0^T\, \phi_k(t) + \varepsilon(t)
\end{aligned} \qquad (33)

where D_k = (H_k, \Phi_k) and Z_0 = (1, Z). The least squares estimator is \hat{D}_k = (\hat{H}_k, \hat{\Phi}_k) = \left( Z_0 Z_0^T \right)^{-1} Z_0\, \hat{\xi}_k. Let \tilde{D}_k = (H_k, \tilde{\Phi}_k), where \tilde{F}(t) = \sum_{k=1}^{K} Z^T \tilde{\Phi}_k\, \hat{\phi}_k(t) is the spline expansion of F(t).

Theorem 2. Under conditions 1)-5), we have

\| \hat{h}(t) - h(t) \|^2 = O\!\left( n^{-\frac{3t-1}{s+3t}} \right), \quad n \to \infty \qquad (34)

Proof. Using the Cauchy-Schwarz inequality, we get

\begin{aligned}
\| \hat{h}(t) - h(t) \|^2 &= \Big\| \sum_{k=1}^{K} \hat{H}_k\, \hat{\phi}_k(t) - \sum_{k=1}^{\infty} H_k\, \phi_k(t) \Big\|^2 \\
&\le 2 \Big\| \sum_{k=1}^{K} \hat{H}_k\, \hat{\phi}_k(t) - \sum_{k=1}^{K} H_k\, \phi_k(t) \Big\|^2 + 2 \Big\| \sum_{k=K+1}^{\infty} H_k\, \phi_k(t) \Big\|^2 \\
&\le 4 \Big\| \sum_{k=1}^{K} \hat{H}_k \big( \hat{\phi}_k(t) - \phi_k(t) \big) \Big\|^2 + 4 \Big\| \sum_{k=1}^{K} \big( \hat{H}_k - H_k \big) \hat{\phi}_k(t) \Big\|^2 + 2 \Big\| \sum_{k=K+1}^{\infty} H_k\, \phi_k(t) \Big\|^2 \\
&\le 8 K \sum_{k=1}^{K} \hat{H}_k^2\, \| \hat{\phi}_k(t) - \phi_k(t) \|^2 + 4 \sum_{k=1}^{K} \big( \hat{H}_k - H_k \big)^2 + 2 \sum_{k=K+1}^{\infty} H_k^2
\end{aligned} \qquad (35)

Combining this bound with assumptions 1)-5), we have

\| \hat{h}(t) - h(t) \|^2 = O\!\left( n^{-\frac{3t-1}{s+3t}} \right), \quad n \to \infty \qquad (36)

Theorem 3. Under conditions 6)-9), and letting \tilde{F}(t) be the spline expansion of F(t), we have

\| \hat{F}(t) - F(t) \|^2 = O\!\left( \frac{MK}{n} + \rho^2 \right) \qquad (37)

Proof.

\| \hat{F}(t) - F(t) \|^2 \le 2 \| \hat{F}(t) - \tilde{F}(t) \|^2 + 2 \| \tilde{F}(t) - F(t) \|^2 \qquad (38)

Consider the second term: since \tilde{F}(t) is a spline expansion of F(t), by Lemma 1 we have

\| \tilde{F}(t) - F(t) \|^2 = O(h^2) \qquad (39)

Now consider the first term

\| \hat{F}(t) - \tilde{F}(t) \|^2 = \Big\| \sum_{k=1}^{K} Z^T \hat{\Phi}_k\, \hat{\phi}_k(t) - \sum_{k=1}^{K} Z^T \tilde{\Phi}_k\, \hat{\phi}_k(t) \Big\|^2 \le \sum_{k=1}^{K} \big\| Z^T \hat{\Phi}_k - Z^T \tilde{\Phi}_k \big\|^2 \le \sum_{k=1}^{K} \big\| \hat{\Phi}_k - \tilde{\Phi}_k \big\|^2 \qquad (40)

Then consider

\sum_{k=1}^{K} \big\| \hat{\Phi}_k - \tilde{\Phi}_k \big\|^2 \le \sum_{k=1}^{K} \Big\| \big( Z_0 Z_0^T \big)^{-1} Z_0\, \varepsilon \Big\|^2 \le \sum_{k=1}^{K} \Big\| \frac{Z_0\, \varepsilon}{n} \Big\|^2 \qquad (41)

Since E[B_l(s_i)\, B_l(s_j)\, \varepsilon_i \varepsilon_j] = 0 for i \neq j,

E \Big\| \frac{Z_0\, \varepsilon}{n} \Big\|^2 = \frac{1}{n^2} E\big[ \varepsilon^T Z_0^T Z_0\, \varepsilon \big] = \frac{1}{n^2} \sum_{l=1}^{M} \sum_{i=1}^{n} E\big[ B_l(s_i)\, \varepsilon_i \big]^2 \le \frac{M}{n} \qquad (42)

Combining the above bounds, we get \| \hat{F}(t) - F(t) \|^2 = O\!\left( \frac{MK}{n} + \rho^2 \right). ☐

6. Conclusion

This paper extends function-on-function regression. A scalar predictor is included alongside the functional predictors, which broadens the scope of application of the model: real data typically include both scalar and functional components, and the model used in this paper can accommodate them. The model is estimated using the eigenbasis and orthogonal B-spline basis functions. When penalizing the loss function, the complexity in the t direction is controlled through the preset number of principal components, which reduces the overall complexity and makes the method more practical. At the same time, a variance decomposition of the error under a finite sample size is given and approximated by the resampling bootstrap method. Finally, the convergence properties of the estimated parameters are studied.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Billheimer, D. (2010) Functional Data Analysis, 2nd Edition Edited by J. O. Ramsay and B. W. Silverman. Biometrics, 63, 300-301.
https://doi.org/10.1111/j.1541-0420.2007.00743_1.x
[2] Wong, R.K.W., Li, Y. and Zhu, Z. (2018) Partially Linear Functional Additive Models for Multivariate Functional Data. Journal of the American Statistical Association.
[3] Kim, J.S., Staicu, A.M., Maity, A., et al. (2017) Additive Function-on-Function Regression. Journal of Computational and Graphical Statistics, 27.
https://doi.org/10.1080/10618600.2017.1356730
[4] Luo, R. and Qi, X. (2016) Function-on-Function Linear Regression by Signal Compression. Journal of the American Statistical Association.
[5] Chen, G.L. and Wang, Z.J. (2010) The Multivariate Partially Linear Model with B-Spline. Chinese Journal of Applied Probability and Statistics.
[6] Benko, M., Hardle, W. and Kneip, A. (2009) Common Functional Principal Components. Annals of Statistics, 37, 1-34.
https://doi.org/10.1214/07-AOS516
[7] Ruppert, D., Wand, M.P. and Carroll, R.J. (2009) Semiparametric Regression during 2003-2007. Electronic Journal of Statistics, 3, 1193-1256.
https://doi.org/10.1214/09-EJS525
[8] Chen, T.P. (1985) Convergence and Asymptotic Expansion of Spline Functions. Chinese Science Bulletin, 30, 1361-1364.
