Tensor Robust Principal Component Analysis via Hybrid Truncation Norm

Abstract

This paper studies tensor robust principal component analysis (TRPCA), which aims to accurately recover the low-rank and sparse components of an observed signal. Most existing robust principal component analysis (RPCA) methods are based on nuclear norm minimization. Because these methods shrink all singular values simultaneously, they cannot approximate the rank well in practice. In this paper, the idea of truncated nuclear norm regularization is extended to tensor RPCA. To improve the stability of the model, a tensor truncated Frobenius norm is newly defined, and the truncated nuclear norm and truncated Frobenius norm are combined into a hybrid truncated tensor model. This model penalizes only the smallest $\min(n_1, n_2) - r$ singular values. In addition, an effective way to determine the shrinkage operator is given, and an efficient iterative algorithm based on the alternating direction method is developed to solve the optimization problem. Experimental results show the effectiveness and accuracy of this method.


1. Introduction

A tensor is a multidimensional extension of a matrix and an important data format that can express the internal structure of complex high-order data. In fact, tensors are the natural form of high-dimensional, multi-way real-world data. For example, in image processing [1], a color image is a 3-order tensor of height × width × channel, and a multispectral image is a 3-order tensor of height × width × band. Therefore, tensor analysis has important practical significance and application value in machine learning [2], computer vision [3], data mining [4], and related fields. The tensors of interest are frequently low-rank, or approximately so [5], and hence have a much lower-dimensional structure. This has motivated the problem of low-rank tensor estimation and recovery, which has attracted great attention in many different fields. Classical principal component analysis (PCA) [6] is the most widely used method for data analysis and dimensionality reduction. For data slightly corrupted by small noise, it is computationally efficient and powerful. However, a major problem with PCA is that it is easily affected by grossly corrupted or outlying observations, which are ubiquitous in real-world data. Many robust PCA models [7] [8] have been proposed, but almost all of them have a high computational cost.

RPCA is the first polynomial-time algorithm with a strong performance guarantee. Suppose that we are given an observation $X \in \mathbb{R}^{m \times n}$ that can be decomposed as $X = L_0 + E_0$, where $L_0$ is low-rank and $E_0$ is sparse. [6] shows that if the singular vectors of $L_0$ satisfy certain incoherence conditions, i.e., $L_0$ is low-rank and $E_0$ is sufficiently sparse, then $L_0$ and $E_0$ can be recovered exactly with high probability by solving the convex problem (1):

$$\min_{L, E} \ \|L\|_* + \lambda \|E\|_1 \quad \text{s.t.} \quad X = L + E \tag{1}$$

where $\|L\|_*$ denotes the nuclear norm (the sum of the singular values of $L$) and $\|E\|_1$ denotes the $\ell_1$-norm (the sum of the absolute values of all entries of $E$). The parameter $\lambda$ is suggested to be set as $1/\sqrt{\max(n_1, n_2)}$, which works well in practice. Algorithmically, (1) can be solved efficiently at a cost not much higher than that of PCA. This method and its generalizations have been successfully applied to background modeling [9], subspace clustering [10], video compressive sensing [11], and other fields.

A major disadvantage of RPCA is that it can only handle two-dimensional (matrix) data. However, real data are usually multi-dimensional in nature; the information is stored in multi-way arrays known as tensors [5]. To use RPCA, we must first convert the high-dimensional data into a matrix. Such preprocessing usually causes a loss of information and hence a degradation in performance. To alleviate this problem, a common approach is to exploit the multidimensional structure of the tensor data directly.

In this paper, tensor robust principal component analysis (TRPCA) is studied in order to accurately recover a low-rank tensor corrupted by sparse noise. Suppose that we are given an observation $\mathcal{X}$ that can be decomposed as $\mathcal{X} = \mathcal{L}_0 + \mathcal{E}_0$, where $\mathcal{L}_0$ is low-rank and $\mathcal{E}_0$ is sparse, and both components may have arbitrary magnitudes. Note that we do not know the locations of the nonzero entries of $\mathcal{E}_0$, nor even how many there are. We then consider a problem analogous to RPCA; this is the tensor RPCA problem studied in this work.

It is not easy to generalize RPCA to tensors. The main difficulty in low-rank tensor estimation is the definition of tensor rank [12]. Several definitions of tensor rank and their convex relaxations have been proposed, but each has its limitations. The CANDECOMP/PARAFAC (CP) rank of a tensor [5] is defined as the minimum number of rank-one terms in a CP decomposition, and it is NP-hard to compute. Because its convex envelope is unclear, the recovery of low-CP-rank tensors is challenging [13]. A robust tensor CP decomposition problem has also been studied; although recovery is guaranteed, the algorithm is nonconvex. To avoid this problem, Tucker [14] presented the tractable Tucker rank, whose convex relaxation has also been widely used. For a $k$-way tensor $\mathcal{X}$, the Tucker rank is a vector defined as $\text{rank}_{tc}(\mathcal{X}) := (\text{rank}(X^{\{1\}}), \text{rank}(X^{\{2\}}), \ldots, \text{rank}(X^{\{k\}}))$, where $X^{\{i\}}$ denotes the mode-$i$ matricization of $\mathcal{X}$. Motivated by the fact that the nuclear norm is the convex envelope of the matrix rank within the unit ball of the spectral norm, the Sum of Nuclear Norms (SNN) [15], defined as $\sum_i \|X^{\{i\}}\|_*$, is used as a convex surrogate of $\sum_i \text{rank}(X^{\{i\}})$. [16] gives a TRPCA model based on SNN:

$$\min_{\mathcal{L}, \mathcal{E}} \ \sum_{i=1}^{k} \lambda_i \|L^{\{i\}}\|_* + \|\mathcal{E}\|_1 \quad \text{s.t.} \quad \mathcal{X} = \mathcal{L} + \mathcal{E}. \tag{2}$$

[17] studied the effectiveness of this approach in detail. However, SNN is not the tightest convex relaxation of the Tucker rank; in other words, model (2) is essentially suboptimal. Recently, with the introduction of the tensor-tensor product (T-product) [18] and the tensor singular value decomposition (T-SVD), Kilmer et al. [15] proposed the notions of tensor multi-rank and tubal rank. A new tensor nuclear norm then appeared and was applied to tensor completion [19] and tensor robust principal component analysis [20]. In [21], the authors observed that the T-product is a convolution-like operation that can be implemented with the discrete Fourier transform (DFT). Soon after, a more general definition of the T-product based on arbitrary invertible linear transforms was proposed. Lu et al. [22] accurately recover the low-rank and sparse components by solving a weighted convex combination of the tensor nuclear norm and the tensor $\ell_1$-norm, that is:

$$\min_{\mathcal{L}, \mathcal{E}} \ \|\mathcal{L}\|_* + \lambda \|\mathcal{E}\|_1 \quad \text{s.t.} \quad \mathcal{X} = \mathcal{L} + \mathcal{E}. \tag{3}$$

However, the tensor nuclear norm treats every singular value of the tensor equally, whereas each singular value influences the result differently. Therefore, [23] proposed the partial sum of the tubal nuclear norm (PSTNN) and established a PSTNN-based minimization model for the tensor RPCA problem. Although the truncated nuclear norm can serve as a surrogate for the rank function, such a model lacks strong stability; high stability means that the entries of the recovered data do not change dramatically. [24] proposed a hybrid model of the truncated nuclear norm and the truncated Frobenius norm for matrix completion, in which the nuclear norm term controls the low-rank property and the Frobenius norm term controls stability.

Inspired by [24], we propose a new tensor robust principal component analysis method, aiming to establish a more stable and better-behaved model. The main contributions are summarized as follows.

1) We propose two new regularization terms, the tensor truncated nuclear norm (T-TNN) and the tensor truncated Frobenius norm (T-TFN). T-TNN is defined as the sum of the smallest $\min(n_1, n_2) - r$ singular values of all frontal slices of $\bar{\mathcal{A}}$, and T-TFN is defined as the square root of the sum of squares of the smallest $\min(n_1, n_2) - r$ singular values of all frontal slices of $\bar{\mathcal{A}}$. Based on these two definitions, a tensor hybrid truncated norm (T-HTN) regularization model is proposed, which combines T-TNN and T-TFN for tensor RPCA. The model not only improves stability but also effectively improves recovery accuracy.

2) A simple two-step iterative algorithm is designed to solve the proposed T-HTN model, and its convergence is derived to ensure the feasibility of the algorithm. To reduce the computational cost, the quadratic penalty parameters are allowed to change adaptively according to certain update rules.

3) A large number of experiments are carried out on synthetic data and real images. The experimental results show the advantages of this method in effectiveness and stability.

The rest of this paper is organized as follows. Section 2 introduces some notation and presents the tensor norms induced by the transform-based T-product. In Section 3, we propose the T-HTN regularization model and give an optimization method to solve it. Section 4 reports experimental results on synthetic and real data, and Section 5 concludes.

2. Preliminaries

2.1. Notations

In this section, we give the notation used in this paper. We denote scalars by lowercase letters, e.g., $a$; vectors by boldface lowercase letters, e.g., $\mathbf{a}$; matrices by boldface capital letters, e.g., $A$; and tensors by boldface Euler script letters, e.g., $\mathcal{A}$. The fields of real and complex numbers are denoted by $\mathbb{R}$ and $\mathbb{C}$, respectively. For a 3-way tensor $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, we denote its $(i, j, k)$-th entry as $\mathcal{A}_{ijk}$ or $a_{ijk}$ and use the MATLAB notation $\mathcal{A}(i, :, :)$, $\mathcal{A}(:, i, :)$, $\mathcal{A}(:, :, i)$ to denote the $i$-th horizontal, lateral, and frontal slice, respectively [25]. Ordinarily, the frontal slice $\mathcal{A}(:, :, i)$ is denoted as $A^{(i)}$. The $(i, j)$-th tube of $\mathcal{A}$ is denoted by $\mathcal{A}(i, j, :)$, which is a vector of length $n_3$. For two matrices $A$ and $B$ in $\mathbb{C}^{n_1 \times n_2}$, their inner product is $\langle A, B \rangle = \text{Tr}(A^T B)$, where $\text{Tr}(\cdot)$ is the trace and $A^T$ is the conjugate transpose of $A$. The $n \times n$ identity matrix is denoted by $I_n$. Moreover, for any two tensors $\mathcal{A}$ and $\mathcal{B}$ in $\mathbb{R}^{n_1 \times n_2 \times n_3}$, $\langle \mathcal{A}, \mathcal{B} \rangle = \sum_{i=1}^{n_3} \langle A^{(i)}, B^{(i)} \rangle$. Some tensor norms are also used. The Frobenius norm of a tensor is defined as $\|\mathcal{A}\|_F = \sqrt{\sum_{ijk} |\mathcal{A}_{ijk}|^2}$ and the $\ell_1$-norm as $\|\mathcal{A}\|_1 = \sum_{ijk} |\mathcal{A}_{ijk}|$. The same definitions apply to vectors and matrices.

Let $L: \mathbb{R}^{n_1 \times n_2 \times n_3} \to \mathbb{C}^{n_1 \times n_2 \times n_3}$ be an invertible linear transform on the tensor space, with inverse transform $L^{-1}$. $\bar{\mathcal{A}}$ is obtained by applying $L$ to each tube fiber of $\mathcal{A}$, i.e., $\bar{\mathcal{A}} = L(\mathcal{A})$ and $\mathcal{A} = L^{-1}(\bar{\mathcal{A}})$.

We define a block diagonal matrix based on the frontal slices as

$$\bar{A} = \text{bdiag}(\bar{\mathcal{A}}) = \begin{bmatrix} \bar{A}^{(1)} & & & \\ & \bar{A}^{(2)} & & \\ & & \ddots & \\ & & & \bar{A}^{(n_3)} \end{bmatrix}, \tag{4}$$

where $\text{bdiag}(\cdot)$ is an operator that maps the tensor $\bar{\mathcal{A}}$ to the block diagonal matrix $\bar{A}$ of size $n_1 n_3 \times n_2 n_3$. Conversely, $\text{unbdiag}(\cdot)$ maps the block diagonal matrix back into a tensor:

$$\text{unbdiag}(\text{bdiag}(\bar{\mathcal{A}})) = \bar{\mathcal{A}}, \tag{5}$$

where $\text{unbdiag}(\cdot)$ denotes the inverse operator of $\text{bdiag}(\cdot)$.

The block circulant matrix $\text{bcirc}(\mathcal{A}) \in \mathbb{R}^{n_1 n_3 \times n_2 n_3}$ of $\mathcal{A}$ is defined as

$$\text{bcirc}(\mathcal{A}) = \begin{bmatrix} A^{(1)} & A^{(n_3)} & \cdots & A^{(2)} \\ A^{(2)} & A^{(1)} & \cdots & A^{(3)} \\ \vdots & \vdots & \ddots & \vdots \\ A^{(n_3)} & A^{(n_3 - 1)} & \cdots & A^{(1)} \end{bmatrix}$$

Lemma 1. Given any real vector $\mathbf{v} \in \mathbb{R}^{n}$, the associated transformed vector $\bar{\mathbf{v}}$ satisfies $\bar{v}_1 \in \mathbb{R}$ and

$$\text{conj}(\bar{v}_i) = \bar{v}_{n - i + 2}, \quad i = 2, \ldots, \left\lceil \frac{n+1}{2} \right\rceil. \tag{6}$$

Conversely, for any given complex $\bar{\mathbf{v}} \in \mathbb{C}^{n}$ satisfying (6), its inverse transform is a real vector.

By using Lemma 1, we have

$$\begin{cases} \bar{A}^{(1)} \in \mathbb{R}^{n_1 \times n_2}, \\ \text{conj}(\bar{A}^{(i)}) = \bar{A}^{(n_3 - i + 2)}, \quad i = 2, \ldots, \left\lceil \dfrac{n_3 + 1}{2} \right\rceil. \end{cases} \tag{7}$$
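For the DFT, the conjugate symmetry in (7) is easy to verify numerically. The following short check is only an illustrative sketch, assuming $L$ is the FFT along the third mode; the use of numpy and the array shapes are our own choices, not part of the paper.

```python
import numpy as np

# Quick numerical check of Lemma 1 / Equation (7) for the DFT case.
A = np.random.randn(4, 5, 6)                 # real tensor, n3 = 6
A_bar = np.fft.fft(A, axis=2)                # transform each tube fiber
print(np.allclose(A_bar[:, :, 0].imag, 0))   # first transformed slice is real
print(np.allclose(np.conj(A_bar[:, :, 1]),   # conj(A_bar^(2)) = A_bar^(n3)
                  A_bar[:, :, -1]))
```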

2.2. Basic Definitions and Theorems

In this part, we will give relevant definitions and theorems.

Now, a pair of folding operators is defined as follows. For a tensor $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, we define

$$\text{unfold}(\mathcal{A}) = \begin{bmatrix} A^{(1)} \\ A^{(2)} \\ \vdots \\ A^{(n_3)} \end{bmatrix}, \quad \text{fold}(\text{unfold}(\mathcal{A})) = \mathcal{A},$$

where the unfold operator maps the tensor $\mathcal{A}$ to a matrix of size $n_1 n_3 \times n_2$ and fold is its inverse operator.

Definition 1 (T-product) [26]. Let $L$ be an arbitrary invertible linear transform. The T-product between tensors $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ and $\mathcal{B} \in \mathbb{R}^{n_2 \times n_4 \times n_3}$ based on $L$ is defined as

$$\mathcal{C} = \mathcal{A} *_L \mathcal{B} = L^{-1}\big(\text{unbdiag}\big(\text{bdiag}(\bar{\mathcal{A}}) \times \text{bdiag}(\bar{\mathcal{B}})\big)\big), \tag{8}$$

where $L^{-1}$ is the inverse transform of $L$ and $\times$ denotes the standard matrix product.

The T-product is analogous to the matrix product, except that the multiplications between elements are replaced by circular convolutions between tubes. When $n_3 = 1$, the T-product reduces to the standard matrix product.
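As an illustration of Definition 1, the sketch below computes the T-product when $L$ is taken to be the DFT along the third mode; the function name, the use of numpy, and the slice-wise loop are our own implementation choices, not the authors' code.

```python
import numpy as np

def t_product(A, B):
    """T-product A *_L B of Definition 1, sketched for L = DFT along mode 3.
    A has shape (n1, n2, n3) and B has shape (n2, n4, n3)."""
    n3 = A.shape[2]
    A_bar = np.fft.fft(A, axis=2)                 # L(A)
    B_bar = np.fft.fft(B, axis=2)                 # L(B)
    C_bar = np.empty((A.shape[0], B.shape[1], n3), dtype=complex)
    for i in range(n3):                           # slice-wise matrix products
        C_bar[:, :, i] = A_bar[:, :, i] @ B_bar[:, :, i]
    return np.real(np.fft.ifft(C_bar, axis=2))    # back to the original domain
```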

Definition 2 (Tensor transpose) [26]. Let $L$ be an arbitrary invertible linear transform. For a tensor $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, the tensor transpose $\mathcal{A}^T \in \mathbb{R}^{n_2 \times n_1 \times n_3}$ satisfies $L(\mathcal{A}^T)^{(i)} = (L(\mathcal{A})^{(i)})^T$, $i = 1, \ldots, n_3$.
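A minimal sketch of Definition 2 for the DFT case follows; the slice-reversal rule used below is the standard one for the FFT-based product and is stated here as an assumption.

```python
import numpy as np

def t_transpose(A):
    """Tensor transpose of Definition 2 for L = DFT: transpose every frontal
    slice and reverse the order of slices 2, ..., n3."""
    return np.concatenate((A[:, :, :1], A[:, :, :0:-1]), axis=2).transpose(1, 0, 2)
```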

Definition 3 (Identity tensor). Suppose that $L$ is an invertible linear transform. A tensor $\mathcal{I} \in \mathbb{R}^{n \times n \times n_3}$ is an identity tensor if its first frontal slice is the $n \times n$ identity matrix and all of its other frontal slices are zero.

It is clear that $\mathcal{A} *_L \mathcal{I} = \mathcal{A}$ and $\mathcal{I} *_L \mathcal{A} = \mathcal{A}$ for tensors of appropriate dimensions. For the DFT case, $\bar{\mathcal{I}} = \text{fft}(\mathcal{I}, [\,], 3)$ is a tensor each of whose frontal slices is the identity matrix.

Definition 4 (Orthogonal tensor). Let $L$ be an invertible linear transform. A tensor $\mathcal{Q} \in \mathbb{R}^{n \times n \times n_3}$ is orthogonal if it satisfies $\mathcal{Q}^T *_L \mathcal{Q} = \mathcal{Q} *_L \mathcal{Q}^T = \mathcal{I}$.

Definition 5 (F-diagonal tensor). A tensor $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ is said to be F-diagonal if each frontal slice of $L(\mathcal{A})$ is a diagonal matrix under the given invertible linear transform $L$.

Definition 6 (Inverse tensor). A frontal-square tensor $\mathcal{A}$ of size $n \times n \times n_3$ has an inverse tensor $\mathcal{B} = \mathcal{A}^{-1}$ provided that

$$\mathcal{A} *_L \mathcal{B} = \mathcal{I} \quad \text{and} \quad \mathcal{B} *_L \mathcal{A} = \mathcal{I}. \tag{9}$$

Theorem 1 (T-SVD). Suppose that $L$ is an arbitrary invertible linear transform and $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$. Then the T-SVD of $\mathcal{A}$ is given by

$$\mathcal{A} = \mathcal{U} *_L \mathcal{S} *_L \mathcal{V}^T, \tag{10}$$

where $\mathcal{U} *_L \mathcal{U}^T = \mathcal{I}$, $\mathcal{V} *_L \mathcal{V}^T = \mathcal{I}$, and $\mathcal{S} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ is an F-diagonal tensor.

The specific process is presented in Algorithm 1.
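Algorithm 1 is not reproduced here, but the computation it describes can be sketched as follows for the DFT case; the function names and the use of numpy are our own, and this is not the authors' implementation.

```python
import numpy as np

def t_svd(A):
    """T-SVD sketch for L = DFT: A = U * S * V^T with U (n1,n1,n3),
    S (n1,n2,n3) f-diagonal, and V (n2,n2,n3)."""
    n1, n2, n3 = A.shape
    A_bar = np.fft.fft(A, axis=2)
    U_bar = np.zeros((n1, n1, n3), dtype=complex)
    S_bar = np.zeros((n1, n2, n3), dtype=complex)
    V_bar = np.zeros((n2, n2, n3), dtype=complex)
    for i in range(n3):
        u, s, vh = np.linalg.svd(A_bar[:, :, i], full_matrices=True)
        m = len(s)                         # m = min(n1, n2)
        U_bar[:, :, i] = u
        S_bar[:m, :m, i] = np.diag(s)      # singular values on the diagonal
        V_bar[:, :, i] = vh.conj().T
    U = np.real(np.fft.ifft(U_bar, axis=2))
    S = np.real(np.fft.ifft(S_bar, axis=2))
    V = np.real(np.fft.ifft(V_bar, axis=2))
    return U, S, V
```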

Definition 7 (Tensor multi-rank and tubal rank). For $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, let $L$ be an arbitrary invertible linear transform satisfying $L^T L = L L^T = \ell I_{n_3}$ with a constant $\ell > 0$. The tensor multi-rank is the vector $\text{rank}_m(\mathcal{A}) = (\text{rank}(\bar{A}^{(1)}), \ldots, \text{rank}(\bar{A}^{(n_3)}))^T$. The tensor tubal rank is defined as the number of nonzero singular tubes of $\mathcal{S}$, i.e.,

$$\text{rank}_t(\mathcal{A}) = \#\{i : \mathcal{S}(i, i, :) \neq 0\} = \#\{i : \mathcal{S}(i, i, 1) \neq 0\}. \tag{11}$$

Definition 8 (TNN). Assume that $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ has the T-SVD $\mathcal{A} = \mathcal{U} *_L \mathcal{S} *_L \mathcal{V}^T$. Its tensor nuclear norm is defined as the (scaled) sum of the singular values of all frontal slices of $\bar{\mathcal{A}}$, i.e.,

$$\|\mathcal{A}\|_* = \sum_{i=1}^{r} \mathcal{S}(i, i, 1) = \langle \mathcal{S}, \mathcal{I} \rangle = \frac{1}{n_3} \|\bar{A}\|_*, \tag{12}$$

where $r = \text{rank}_t(\mathcal{A})$.

Theorem 2 (Tensor Singular Value Thresholding). For any $\tau > 0$ and $\mathcal{Y} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, the solution of the problem

$$\min_{\mathcal{X}} \ \tau \|\mathcal{X}\|_* + \frac{1}{2} \|\mathcal{X} - \mathcal{Y}\|_F^2 \tag{13}$$

is given by $\mathcal{D}_\tau(\mathcal{Y})$, which is defined as

$$\mathcal{D}_\tau(\mathcal{Y}) = \mathcal{U} *_L \mathcal{S}_\tau *_L \mathcal{V}^T, \tag{14}$$

where $\mathcal{Y} = \mathcal{U} *_L \mathcal{S} *_L \mathcal{V}^T$ and $\mathcal{S}_\tau = L^{-1}\big((\bar{\mathcal{S}} - \tau)_+\big)$ with $x_+ = \max(x, 0)$.
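A sketch of the thresholding operator of Theorem 2 for the DFT case: the singular values of every frontal slice are shrunk by $\tau$ in the transform domain. This is illustrative code under that assumption, not the authors' implementation.

```python
import numpy as np

def t_svt(Y, tau):
    """Tensor singular value thresholding D_tau(Y), sketched for L = DFT."""
    n1, n2, n3 = Y.shape
    Y_bar = np.fft.fft(Y, axis=2)
    X_bar = np.empty_like(Y_bar)
    for i in range(n3):
        u, s, vh = np.linalg.svd(Y_bar[:, :, i], full_matrices=False)
        s = np.maximum(s - tau, 0.0)            # (sigma - tau)_+
        X_bar[:, :, i] = (u * s) @ vh           # rebuild the shrunk slice
    return np.real(np.fft.ifft(X_bar, axis=2))
```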

Definition 9 (Tensor Frobenius norm (TFN)). For any $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, the tensor Frobenius norm of $\mathcal{A}$ is defined as

$$\|\mathcal{A}\|_F = \sqrt{\langle \mathcal{A}, \mathcal{A} \rangle} = \frac{1}{\sqrt{n_3}} \|\bar{A}\|_F. \tag{15}$$

2.3. Background

In this section, we briefly review the early work related to RPCA, and then discuss some recent developments of TRPCA and its application in computer vision.

RPCA seeks to separate a low-rank matrix $L \in \mathbb{R}^{n_1 \times n_2}$ and a sparse matrix $E \in \mathbb{R}^{n_1 \times n_2}$ from their sum $M = L + E$. As shown in [2], when the underlying matrix $L$ satisfies the matrix incoherence conditions, the solution of the following problem:

$$\min_{L, E} \ \|L\|_* + \lambda \|E\|_1 \quad \text{s.t.} \quad L + E = M \tag{16}$$

can exactly recover $L$ and $E$ with high probability, with parameter $\lambda = 1/\sqrt{\max(n_1, n_2)}$. Various methods have been proposed to solve (16) [27]; among them, the Alternating Direction Method of Multipliers (ADMM, also known as the inexact augmented Lagrange multiplier method) is widely used. In addition, Zhou et al. and Agarwal et al. [1] proved that even under small noisy measurements, the convex approximation by the nuclear norm can still achieve bounded and stable results.

Unlike the matrix case, it is very difficult to define tensor rank, and many versions have been proposed [5] [28]. Using the block-diagonal matricization of the transformed tensor, Lu et al. [14] introduced the tensor nuclear norm $\|\mathcal{A}\|_* = \frac{1}{n_3} \|\bar{A}\|_*$, where $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$. TRPCA is then given by

$$\min_{\mathcal{L}, \mathcal{E}} \ \|\mathcal{L}\|_* + \lambda \|\mathcal{E}\|_1 \quad \text{s.t.} \quad \mathcal{X} = \mathcal{L} + \mathcal{E}.$$

However, the nuclear norm may not approximate the rank function well. To overcome this problem, the T-TNN method was proposed in [29]; it introduces a new definition of tensor nuclear norm and extends the truncated nuclear norm from the matrix case to the tensor case:

$$\|\mathcal{A}\|_* = \text{Tr}(\mathcal{S}) = \text{Tr}(\bar{S}^{(1)}) = \|\bar{A}^{(1)}\|_*$$

Afterwards, Jiang et al. [23] proposed the partial sum of the tubal nuclear norm (PSTNN) of a tensor:

$$\|\mathcal{A}\|_{\text{PSTNN}} = \sum_{i=1}^{n_3} \|\bar{A}^{(i)}\|_{p=N},$$

where $\|\cdot\|_{p=N}$ denotes the partial sum of the singular values beyond the $N$-th largest one.

The tensor RPCA model using PSTNN was formulated as

$$\min_{\mathcal{L}, \mathcal{E}} \ \|\mathcal{L}\|_{\text{PSTNN}} + \lambda \|\mathcal{E}\|_1 \quad \text{s.t.} \quad \mathcal{O} = \mathcal{L} + \mathcal{E}.$$

The PSTNN is a surrogate of the tensor tubal multi-rank. Although PSTNN approximates the rank function better, the stability issue remains challenging.

3. Tensor Robust Principal Component Analysis Based on the Hybrid Truncated Norm (T-HTN)

In this section, we introduce the tensor truncated nuclear norm (T-TNN) and the tensor truncated Frobenius norm (T-TFN) newly defined in this paper, and propose a tensor hybrid truncated norm (T-HTN) regularization model to improve effectiveness and stability. The iterative strategy is also given.

3.1. T-HTN Model

As mentioned earlier, T-TNN is a better approximation of the rank function because it only penalizes the smallest $\min(n_1, n_2) - r$ singular values [30]. Since a Frobenius-norm term can improve stability, we consider adding one; however, the traditional Frobenius norm still involves all singular values, which would undermine the effect of T-TNN. This motivates us to consider only the smallest $\min(n_1, n_2) - r$ singular values in the Frobenius term as well. Based on this analysis, we construct a more reasonable regularization term, called the "tensor truncated Frobenius norm (T-TFN)", to replace the original Frobenius norm and match it with T-TNN. We first give the definitions of T-TNN and T-TFN:

Definition 10 (Tensor truncated nuclear norm (T-TNN)). For $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, the tensor truncated nuclear norm $\|\mathcal{A}\|_{r,*}$ is defined as the sum of the smallest $\min(n_1, n_2) - r$ singular values of all frontal slices of $\bar{\mathcal{A}}$, i.e.,

$$\|\mathcal{A}\|_{r,*} = \frac{1}{n_3} \sum_{i=1}^{n_3} \sum_{j=r+1}^{\min(n_1, n_2)} \sigma_j(\bar{A}^{(i)}) = \|\mathcal{A}\|_* - \frac{1}{n_3} \sum_{i=1}^{n_3} \sum_{j=1}^{r} \sigma_j(\bar{A}^{(i)}), \tag{17}$$

where $\sigma_j(\bar{A}^{(i)})$ denotes the $j$-th largest singular value of $\bar{A}^{(i)} \in \mathbb{C}^{n_1 \times n_2}$.

Definition 11 (Tensor truncated Frobenius norm (T-TFN)). Given a tensor $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, the tensor truncated Frobenius norm $\|\mathcal{A}\|_{r,F}$ is defined as the square root of the sum of squares of the smallest $\min(n_1, n_2) - r$ singular values of all frontal slices of $\bar{\mathcal{A}}$, i.e.,

$$\|\mathcal{A}\|_{r,F} = \sqrt{\frac{1}{n_3} \sum_{i=1}^{n_3} \left( \sum_{j=1}^{\min(n_1, n_2)} \sigma_j^2(\bar{A}^{(i)}) - \sum_{j=1}^{r} \sigma_j^2(\bar{A}^{(i)}) \right)}. \tag{18}$$

Therefore, $\|\mathcal{A}\|_{r,F}^2$ can be written as

$$\|\mathcal{A}\|_{r,F}^2 = \frac{1}{n_3} \sum_{i=1}^{n_3} \sum_{j=r+1}^{\min(n_1, n_2)} \sigma_j^2(\bar{A}^{(i)}) = \|\mathcal{A}\|_F^2 - \frac{1}{n_3} \sum_{i=1}^{n_3} \sum_{j=1}^{r} \sigma_j^2(\bar{A}^{(i)}). \tag{19}$$
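For concreteness, the two truncated norms of Definitions 10 and 11 can be evaluated directly from the slice-wise SVDs, as in the following sketch for the DFT case (our own illustrative helper, not part of the paper):

```python
import numpy as np

def truncated_norms(A, r):
    """Compute the T-TNN ||A||_{r,*} and squared T-TFN ||A||_{r,F}^2 of
    Definitions 10-11 for L = DFT: only singular values beyond the r-th
    of each frontal slice of A_bar contribute."""
    n3 = A.shape[2]
    A_bar = np.fft.fft(A, axis=2)
    tnn, tfn2 = 0.0, 0.0
    for i in range(n3):
        s = np.linalg.svd(A_bar[:, :, i], compute_uv=False)  # descending order
        tnn += s[r:].sum()
        tfn2 += (s[r:] ** 2).sum()
    return tnn / n3, tfn2 / n3
```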

Evidently, T-TFN corresponds perfectly to T-TNN, as both involve only the smallest $\min(n_1, n_2) - r$ singular values. Therefore, the two can be combined into a more effective hybrid truncated tensor robust principal component model (T-HTN):

$$\min_{\mathcal{L}, \mathcal{E}} \ \|\mathcal{L}\|_{r,*} + \gamma \|\mathcal{L}\|_{r,F}^2 + \lambda \|\mathcal{E}\|_1 \quad \text{s.t.} \quad \mathcal{X} = \mathcal{L} + \mathcal{E}. \tag{20}$$

Since $\|\mathcal{L}\|_{r,*} + \gamma \|\mathcal{L}\|_{r,F}^2$ is nonconvex, model (20) is not easy to solve directly. To this end, we establish the following theorems.

Theorem 3. For any $\mathcal{L} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ with T-SVD $\mathcal{L} = \mathcal{U} *_L \mathcal{S} *_L \mathcal{V}^T$, the following holds:

$$\|\mathcal{L}\|_{r,*} = \|\mathcal{L}\|_* - \max_{\mathcal{A} *_L \mathcal{A}^T = \mathcal{I}, \ \mathcal{B} *_L \mathcal{B}^T = \mathcal{I}} \text{Tr}(\mathcal{A} *_L \mathcal{L} *_L \mathcal{B}^T), \tag{21}$$

where $\mathcal{A} \in \mathbb{R}^{r \times n_1 \times n_3}$ and $\mathcal{B} \in \mathbb{R}^{r \times n_2 \times n_3}$.

Theorem 4. For any $\mathcal{L} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ with T-SVD $\mathcal{L} = \mathcal{U} *_L \mathcal{S} *_L \mathcal{V}^T$, the following holds:

$$\|\mathcal{L}\|_{r,F}^2 = \|\mathcal{L}\|_F^2 - \max_{\mathcal{A} *_L \mathcal{A}^T = \mathcal{I}} \|\mathcal{A} *_L \mathcal{L}\|_F^2. \tag{22}$$

Using the above theorems, model (20) can be rewritten as

$$\begin{aligned} \min_{\mathcal{L}, \mathcal{E}} \ & \|\mathcal{L}\|_* - \max_{\mathcal{A} *_L \mathcal{A}^T = \mathcal{I}, \ \mathcal{B} *_L \mathcal{B}^T = \mathcal{I}} \text{Tr}(\mathcal{A} *_L \mathcal{L} *_L \mathcal{B}^T) + \gamma \Big( \|\mathcal{L}\|_F^2 - \max_{\mathcal{A} *_L \mathcal{A}^T = \mathcal{I}} \|\mathcal{A} *_L \mathcal{L}\|_F^2 \Big) + \lambda \|\mathcal{E}\|_1 \\ \text{s.t.} \ & \mathcal{X} = \mathcal{L} + \mathcal{E}. \end{aligned} \tag{23}$$

3.2. Solving Scheme

To solve the optimization problem (23), we exploit an efficient iterative approach consisting of two steps. First, we take $\mathcal{L}_0 = \mathcal{O}$ as the initial value of $\mathcal{L}$. At the $l$-th iteration, we fix $\mathcal{L}_l$ and perform the T-SVD $\mathcal{L}_l = \mathcal{U} *_L \mathcal{Q} *_L \mathcal{V}^T$, where $\mathcal{U} \in \mathbb{R}^{n_1 \times r \times n_3}$, $\mathcal{V} \in \mathbb{R}^{n_2 \times r \times n_3}$, $\mathcal{Q} \in \mathbb{R}^{r \times r \times n_3}$, and set $\mathcal{A}_l = \mathcal{U}(:, 1{:}r, :)^T \in \mathbb{R}^{r \times n_1 \times n_3}$, $\mathcal{B}_l = \mathcal{V}(:, 1{:}r, :)^T \in \mathbb{R}^{r \times n_2 \times n_3}$. Then, fixing $\mathcal{A}_l$ and $\mathcal{B}_l$, we update $\mathcal{L}_{l+1}$ and $\mathcal{E}_{l+1}$ by solving the following convex optimization problem:

$$\begin{aligned} \min_{\mathcal{L}, \mathcal{E}} \ & \|\mathcal{L}\|_* - \text{Tr}(\mathcal{A}_l *_L \mathcal{L} *_L \mathcal{B}_l^T) + \gamma \big( \|\mathcal{L}\|_F^2 - \|\mathcal{A}_l *_L \mathcal{L}\|_F^2 \big) + \lambda \|\mathcal{E}\|_1 \\ \text{s.t.} \ & \mathcal{X} = \mathcal{L} + \mathcal{E}. \end{aligned} \tag{24}$$

In short, we alternate between these two steps until the iterates meet the tolerance. Algorithm 2 outlines the complete procedure for (23). The remaining key issue is how to solve subproblem (24), which is discussed in the next subsection.
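The truncation step of the outer loop can be sketched by reusing the T-SVD and tensor-transpose helpers sketched in Section 2.2 (again, these names and the DFT assumption are ours, not the authors'):

```python
def truncated_factors(L, r):
    """First step of the outer iteration (sketch): take the T-SVD of the
    current estimate L_l and keep the leading r lateral slices of U and V,
    i.e. A_l = U(:, 1:r, :)^T and B_l = V(:, 1:r, :)^T."""
    U, S, V = t_svd(L)                 # T-SVD sketch from Section 2.2
    A_l = t_transpose(U[:, :r, :])     # shape (r, n1, n3)
    B_l = t_transpose(V[:, :r, :])     # shape (r, n2, n3)
    return A_l, B_l
```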

3.3. Optimization

In this section, an efficient iterative scheme is developed to optimize (24). Following the augmented Lagrangian multiplier method, a common strategy is to approximately minimize the augmented Lagrangian function by an alternating scheme, which we adopt to solve (24). To make the variables separable, we introduce an auxiliary variable $\mathcal{Z}$, so that (24) is equivalent to the following form:

$$\begin{aligned} \min_{\mathcal{L}, \mathcal{Z}, \mathcal{E}} \ & \|\mathcal{L}\|_* - \text{Tr}(\mathcal{A}_l *_L \mathcal{Z} *_L \mathcal{B}_l^T) + \gamma \big( \|\mathcal{L}\|_F^2 - \|\mathcal{A}_l *_L \mathcal{L}\|_F^2 \big) + \lambda \|\mathcal{E}\|_1 \\ \text{s.t.} \ & \mathcal{L} = \mathcal{Z}, \quad \mathcal{X} = \mathcal{L} + \mathcal{E}. \end{aligned} \tag{25}$$

The augmented Lagrangian function of (25) can be written as:

$$\begin{aligned} \mathcal{L}(\mathcal{L}, \mathcal{Z}, \mathcal{E}, \mathcal{Y}, \mathcal{W}) = {} & \|\mathcal{L}\|_* - \text{Tr}(\mathcal{A}_l *_L \mathcal{Z} *_L \mathcal{B}_l^T) + \gamma \big( \|\mathcal{L}\|_F^2 - \|\mathcal{A}_l *_L \mathcal{L}\|_F^2 \big) + \lambda \|\mathcal{E}\|_1 \\ & + \langle \mathcal{W}, \mathcal{L} - \mathcal{Z} \rangle + \frac{\alpha}{2} \|\mathcal{L} - \mathcal{Z}\|_F^2 + \langle \mathcal{Y}, \mathcal{X} - \mathcal{L} - \mathcal{E} \rangle + \frac{\beta}{2} \|\mathcal{X} - \mathcal{L} - \mathcal{E}\|_F^2, \end{aligned} \tag{26}$$

where $\mathcal{Y}$ and $\mathcal{W}$ are the Lagrange multiplier tensors and $\alpha, \beta > 0$ are penalty parameters. We then adopt an alternating iteration strategy, i.e., fix all variables but one and solve for the remaining one.

Step 1: Keep $\mathcal{L}_k$, $\mathcal{Z}_k$, $\mathcal{Y}_k$, $\mathcal{W}_k$ fixed and update $\mathcal{E}_{k+1}$ by minimizing $\mathcal{L}(\mathcal{L}_k, \mathcal{Z}_k, \mathcal{E}, \mathcal{Y}_k, \mathcal{W}_k)$:

$$\begin{aligned} \mathcal{E}_{k+1} &= \arg\min_{\mathcal{E}} \mathcal{L}(\mathcal{L}_k, \mathcal{Z}_k, \mathcal{E}, \mathcal{Y}_k, \mathcal{W}_k) \\ &= \arg\min_{\mathcal{E}} \ \lambda \|\mathcal{E}\|_1 + \langle \mathcal{Y}_k, \mathcal{X} - \mathcal{L}_k - \mathcal{E} \rangle + \frac{\beta}{2} \|\mathcal{X} - \mathcal{L}_k - \mathcal{E}\|_F^2 \\ &= \arg\min_{\mathcal{E}} \ \lambda \|\mathcal{E}\|_1 + \frac{\beta}{2} \big\| \mathcal{E} - (\mathcal{X} - \mathcal{L}_k + \beta^{-1} \mathcal{Y}_k) \big\|_F^2 \\ &= \mathcal{S}_{\lambda \beta^{-1}} (\mathcal{X} - \mathcal{L}_k + \beta^{-1} \mathcal{Y}_k), \end{aligned} \tag{27}$$

where $\mathcal{S}_{\tau}(\cdot)$ denotes the element-wise soft-thresholding operator with threshold $\tau$.
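The shrinkage in (27) is the usual element-wise soft-thresholding; a one-line sketch of the helper (our own naming) is:

```python
import numpy as np

def soft_threshold(T, tau):
    """Element-wise soft-thresholding S_tau used in the E-update (27)."""
    return np.sign(T) * np.maximum(np.abs(T) - tau, 0.0)

# E-update of Step 1 (sketch): E_next = soft_threshold(X - L_k + Y_k / beta, lam / beta)
```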

Step 2: Keep $\mathcal{Z}_k$, $\mathcal{E}_{k+1}$, $\mathcal{Y}_k$, $\mathcal{W}_k$ fixed and update $\mathcal{L}_{k+1}$ by minimizing $\mathcal{L}(\mathcal{L}, \mathcal{Z}_k, \mathcal{E}_{k+1}, \mathcal{Y}_k, \mathcal{W}_k)$:

$$\begin{aligned} \mathcal{L}_{k+1} &= \arg\min_{\mathcal{L}} \mathcal{L}(\mathcal{L}, \mathcal{Z}_k, \mathcal{E}_{k+1}, \mathcal{Y}_k, \mathcal{W}_k) \\ &= \arg\min_{\mathcal{L}} \ \|\mathcal{L}\|_* + \gamma \big( \|\mathcal{L}\|_F^2 - \|\mathcal{A}_l *_L \mathcal{L}\|_F^2 \big) + \langle \mathcal{W}_k, \mathcal{L} - \mathcal{Z}_k \rangle + \frac{\alpha}{2} \|\mathcal{L} - \mathcal{Z}_k\|_F^2 + \langle \mathcal{Y}_k, \mathcal{X} - \mathcal{L} - \mathcal{E}_{k+1} \rangle + \frac{\beta}{2} \|\mathcal{X} - \mathcal{L} - \mathcal{E}_{k+1}\|_F^2 \\ &= \arg\min_{\mathcal{L}} \ \|\mathcal{L}\|_* + \gamma \big( \|\mathcal{L}\|_F^2 - \|\mathcal{A}_l *_L \mathcal{L}\|_F^2 \big) + \frac{\alpha}{2} \|\mathcal{L} - \mathcal{Z}_k + \alpha^{-1} \mathcal{W}_k\|_F^2 + \frac{\beta}{2} \|\mathcal{L} - \mathcal{X} + \mathcal{E}_{k+1} - \beta^{-1} \mathcal{Y}_k\|_F^2. \end{aligned} \tag{28}$$

Obviously, the smooth part of (28) is a quadratic function of $\mathcal{L}$. Let

$$G(\mathcal{L}) = \gamma \big( \|\mathcal{L}\|_F^2 - \|\mathcal{A}_l *_L \mathcal{L}\|_F^2 \big) + \frac{\alpha}{2} \|\mathcal{L} - \mathcal{Z}_k + \alpha^{-1} \mathcal{W}_k\|_F^2 + \frac{\beta}{2} \|\mathcal{L} - \mathcal{X} + \mathcal{E}_{k+1} - \beta^{-1} \mathcal{Y}_k\|_F^2$$

be this smooth function of $\mathcal{L}$; then $G(\mathcal{L})$ can be approximated by its linearization at $\mathcal{L}_k$:

$$G(\mathcal{L}) \approx G(\mathcal{L}_k) + \langle \nabla G(\mathcal{L}_k), \mathcal{L} - \mathcal{L}_k \rangle + \frac{\tau}{2} \|\mathcal{L} - \mathcal{L}_k\|_F^2,$$

where τ is a parameter, and

$$\nabla G(\mathcal{L}_k) = \big[ (2\gamma + \alpha + \beta) \mathcal{I} - 2\gamma \, \mathcal{A}_l^T *_L \mathcal{A}_l \big] *_L \mathcal{L}_k + \mathcal{W}_k - \mathcal{Y}_k + \beta (\mathcal{E}_{k+1} - \mathcal{X}) - \alpha \mathcal{Z}_k$$

denotes the gradient of $G(\cdot)$ at $\mathcal{L}_k$. Thus, subproblem (28) can be reformulated as:

$$\mathcal{L}_{k+1} = \arg\min_{\mathcal{L}} \ \|\mathcal{L}\|_* + \frac{\tau}{2} \Big\| \mathcal{L} - \mathcal{L}_k + \tau^{-1} \nabla G(\mathcal{L}_k) \Big\|_F^2. \tag{29}$$

By Theorem 1 and Theorem 2, this could be solved as

$$\mathcal{L}_{k+1} = \mathcal{D}_{\tau^{-1}} \Big( \mathcal{L}_k - \frac{\nabla G(\mathcal{L}_k)}{\tau} \Big). \tag{30}$$

Step 3: Keep $\mathcal{L}_{k+1}$, $\mathcal{E}_{k+1}$, $\mathcal{Y}_k$, $\mathcal{W}_k$ fixed and update $\mathcal{Z}_{k+1}$ by minimizing $\mathcal{L}(\mathcal{L}_{k+1}, \mathcal{Z}, \mathcal{E}_{k+1}, \mathcal{Y}_k, \mathcal{W}_k)$:

$$\begin{aligned} \mathcal{Z}_{k+1} &= \arg\min_{\mathcal{Z}} \mathcal{L}(\mathcal{L}_{k+1}, \mathcal{Z}, \mathcal{E}_{k+1}, \mathcal{Y}_k, \mathcal{W}_k) \\ &= \arg\min_{\mathcal{Z}} \ \langle \mathcal{W}_k, \mathcal{L}_{k+1} - \mathcal{Z} \rangle + \frac{\alpha}{2} \|\mathcal{L}_{k+1} - \mathcal{Z}\|_F^2 - \text{Tr}(\mathcal{A}_l *_L \mathcal{Z} *_L \mathcal{B}_l^T). \end{aligned} \tag{31}$$

Obviously, the objective function of (31) is quadratic in $\mathcal{Z}$. Setting its derivative to zero gives

$$-\mathcal{A}_l^T *_L \mathcal{B}_l - \mathcal{W}_k - \alpha \mathcal{L}_{k+1} + \alpha \mathcal{Z} = 0.$$

This means that

$$\mathcal{Z}_{k+1} = \mathcal{L}_{k+1} + \alpha^{-1} \big( \mathcal{A}_l^T *_L \mathcal{B}_l + \mathcal{W}_k \big). \tag{32}$$

Step 4: Update Lagrange multipliers Y k + 1 , W k + 1 :

$$\mathcal{Y}_{k+1} = \mathcal{Y}_k + \beta (\mathcal{X} - \mathcal{L}_{k+1} - \mathcal{E}_{k+1}), \qquad \mathcal{W}_{k+1} = \mathcal{W}_k + \alpha (\mathcal{L}_{k+1} - \mathcal{Z}_{k+1}). \tag{33}$$

To sum up, Algorithm 3 summarizes the complete procedure.
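Putting the four steps together, a compact sketch of the inner loop is given below. It assumes the helper functions t_product, t_svt, t_transpose and soft_threshold from the earlier sketches, uses placeholder parameter values, and makes no claim of matching the authors' exact implementation or stopping rule.

```python
import numpy as np

def inner_admm(X, A_l, B_l, gamma, lam, alpha, beta, tau, n_iter=100):
    """Sketch of Algorithm 3 for subproblem (24). A_l, B_l are the truncated
    factors fixed by the current outer step."""
    L = np.zeros_like(X)
    Z = np.zeros_like(X)
    E = np.zeros_like(X)
    W = np.zeros_like(X)
    Y = np.zeros_like(X)
    for _ in range(n_iter):
        # Step 1: E-update, Equation (27)
        E = soft_threshold(X - L + Y / beta, lam / beta)
        # Step 2: L-update via the linearization (28)-(30)
        AtAL = t_product(t_transpose(A_l), t_product(A_l, L))
        grad = ((2 * gamma + alpha + beta) * L - 2 * gamma * AtAL
                + W - Y + beta * (E - X) - alpha * Z)
        L = t_svt(L - grad / tau, 1.0 / tau)
        # Step 3: Z-update, Equation (32)
        Z = L + (t_product(t_transpose(A_l), B_l) + W) / alpha
        # Step 4: multiplier updates, Equation (33)
        Y = Y + beta * (X - L - E)
        W = W + alpha * (L - Z)
    return L, E
```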

3.4. Adaptive Parameter

Previous studies have shown that the computational cost can increase significantly, since the algorithm may converge slowly when the fixed penalty parameters $\alpha$ and $\beta$ are chosen too small or too large [28] [29], and one must repeatedly try different values to find good ones. It is not easy to choose optimal $\alpha$ and $\beta$. Accordingly, dynamic $\alpha$ and $\beta$ may be required to accelerate convergence in real applications. Following [31], the following adaptive update criterion is used:

$$\alpha_{k+1} = \min(\alpha_{\max}, \rho_1 \alpha_k), \tag{34}$$

$$\beta_{k+1} = \min(\beta_{\max}, \rho_2 \beta_k), \tag{35}$$

where $\alpha_{\max}$ and $\beta_{\max}$ are upper bounds on $\alpha_k$ and $\beta_k$, and $\rho_1$, $\rho_2$ are defined as:

$$\rho_1 = \begin{cases} \rho_0, & \text{if } \alpha_k \max\big( \|\mathcal{L}_{k+1} - \mathcal{L}_k\|_F, \|\mathcal{Z}_{k+1} - \mathcal{Z}_k\|_F \big) / \|\mathcal{Y}\|_F < \epsilon, \\ 1, & \text{otherwise}, \end{cases} \tag{36}$$

$$\rho_2 = \begin{cases} \rho_0, & \text{if } \beta_k \max\big( \|\mathcal{L}_{k+1} - \mathcal{L}_k\|_F, \|\mathcal{Z}_{k+1} - \mathcal{Z}_k\|_F \big) / \|\mathcal{Y}\|_F < \epsilon, \\ 1, & \text{otherwise}, \end{cases} \tag{37}$$

where $\rho_0 > 1$ is a constant and $\epsilon > 0$ is a threshold chosen in advance.
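A sketch of the update for $\alpha$ follows (the rule for $\beta$ in (35)/(37) is analogous); the argument eps stands for the fixed threshold in (36), and the function name is our own.

```python
import numpy as np

def update_penalty(alpha_k, rho0, alpha_max, L_new, L_old, Z_new, Z_old, Y, eps):
    """Adaptive rule (34)/(36), sketched: enlarge alpha by rho0 only when the
    scaled change of the iterates relative to ||Y||_F falls below eps."""
    change = alpha_k * max(np.linalg.norm(L_new - L_old),
                           np.linalg.norm(Z_new - Z_old))
    rho = rho0 if change / np.linalg.norm(Y) < eps else 1.0
    return min(alpha_max, rho * alpha_k)
```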

3.5. Convergence Analysis

This section discusses the convergence of the proposed algorithm.

First, the optimal value of (25) is expressed as

$$p^* = \inf \Big\{ \|\mathcal{L}\|_* - \text{Tr}(\mathcal{A}_l *_L \mathcal{Z} *_L \mathcal{B}_l^T) + \gamma \big( \|\mathcal{L}\|_F^2 - \|\mathcal{A}_l *_L \mathcal{L}\|_F^2 \big) + \lambda \|\mathcal{E}\|_1 : \ \mathcal{L} = \mathcal{Z}, \ \mathcal{X} = \mathcal{L} + \mathcal{E} \Big\}. \tag{38}$$

The corresponding Lagrangian function is written as

$$\mathcal{L}_0(\mathcal{L}, \mathcal{Z}, \mathcal{E}, \mathcal{Y}, \mathcal{W}) = \|\mathcal{L}\|_* - \text{Tr}(\mathcal{A}_l *_L \mathcal{Z} *_L \mathcal{B}_l^T) + \gamma \big( \|\mathcal{L}\|_F^2 - \|\mathcal{A}_l *_L \mathcal{L}\|_F^2 \big) + \lambda \|\mathcal{E}\|_1 + \langle \mathcal{W}, \mathcal{L} - \mathcal{Z} \rangle + \langle \mathcal{Y}, \mathcal{X} - \mathcal{L} - \mathcal{E} \rangle. \tag{39}$$

In addition, in order to further analyze the convergence of the algorithm, the following three lemmas are established

Lemma 2. Let $(\mathcal{L}^*, \mathcal{Z}^*, \mathcal{E}^*, \mathcal{Y}^*, \mathcal{W}^*)$ be the optimal (saddle-point) solution of the Lagrangian function (39), satisfying $\mathcal{X} = \mathcal{L}^* + \mathcal{E}^*$. Then:

$$p^* - p_{k+1} \leq \langle \mathcal{W}^*, \mathcal{L}_{k+1} - \mathcal{Z}_{k+1} \rangle + \langle \mathcal{Y}^*, \mathcal{X} - \mathcal{L}_{k+1} - \mathcal{E}_{k+1} \rangle, \tag{40}$$

where $(\mathcal{L}_{k+1}, \mathcal{Z}_{k+1}, \mathcal{E}_{k+1})$ is the solution at the $(k+1)$-th iteration of Algorithm 3 and $p_{k+1}$ is the corresponding objective value of (38), i.e., $p_{k+1} = \|\mathcal{L}_{k+1}\|_* - \text{Tr}(\mathcal{A}_l *_L \mathcal{Z}_{k+1} *_L \mathcal{B}_l^T) + \gamma \big( \|\mathcal{L}_{k+1}\|_F^2 - \|\mathcal{A}_l *_L \mathcal{L}_{k+1}\|_F^2 \big) + \lambda \|\mathcal{E}_{k+1}\|_1$.

Lemma 3. Let $\mathcal{R}_{k+1} = \mathcal{L}_{k+1} - \mathcal{Z}_{k+1}$, $\mathcal{W}_{k+1} = \mathcal{W}_k + \alpha \mathcal{R}_{k+1}$, $\mathcal{Q}_{k+1} = \mathcal{X} - \mathcal{L}_{k+1} - \mathcal{E}_{k+1}$, and $\mathcal{Y}_{k+1} = \mathcal{Y}_k + \beta \mathcal{Q}_{k+1}$. Then the following inequality holds:

$$p_{k+1} - p^* \leq \langle \mathcal{W}_{k+1}, \mathcal{R}_{k+1} \rangle + \langle \mathcal{Y}_{k+1}, \mathcal{Q}_{k+1} \rangle - \alpha \langle \mathcal{Z}_k - \mathcal{Z}_{k+1}, \mathcal{R}_{k+1} + \mathcal{Z}^* - \mathcal{Z}_{k+1} \rangle + \beta \langle \mathcal{E}_k - \mathcal{E}_{k+1}, \mathcal{Q}_{k+1} + \mathcal{L}^* - \mathcal{X} + \mathcal{E}_{k+1} \rangle. \tag{41}$$

Lemma 4. Let $(\mathcal{L}^*, \mathcal{Z}^*, \mathcal{E}^*, \mathcal{Y}^*, \mathcal{W}^*)$ be the optimal solution of the Lagrangian function (39), and let $\mathcal{R}_{k+1} = \mathcal{L}_{k+1} - \mathcal{Z}_{k+1}$, $\mathcal{Q}_{k+1} = \mathcal{X} - \mathcal{L}_{k+1} - \mathcal{E}_{k+1}$. Define:

$$t_k = \frac{1}{\alpha} \|\mathcal{W}_k - \mathcal{W}^*\|_F^2 + \alpha \|\mathcal{Z}^* - \mathcal{Z}_{k+1}\|_F^2 + \frac{1}{\beta} \|\mathcal{Y}_k - \mathcal{Y}^*\|_F^2 + \beta \|\mathcal{L}^* - \mathcal{L}_{k+1}\|_F^2. \tag{42}$$

Then t k is decreasing in each iteration and satisfies the following relationship:

$$t_k - t_{k+1} \geq \alpha \|\mathcal{R}_{k+1} - (\mathcal{Z}_k - \mathcal{Z}_{k+1})\|_F^2 + \beta \|\mathcal{Q}_{k+1} - (\mathcal{E}_k - \mathcal{E}_{k+1})\|_F^2. \tag{43}$$

According to the above results, the following convergence theorems are obtained:

Theorem 5. Under the conditions of Lemmas 2, 3, and 4, the iterative objective value satisfies $p_k \to p^*$ as $k \to \infty$.

4. Experiments

In this section, we conduct numerical experiments to confirm our main results. We study the ability of T-HTN to recover tensors of various tubal ranks from various sparse noises, and apply T-HTN to image denoising.

4.1. Synthetic Data Experiments

In this section, we compare the accuracy and speed of T-HTN and TNN-based TRPCA [22] on synthetic datasets. Given the tensor size $n_1 \times n_2 \times n_3$ and tubal rank $r \leq \min\{n_1, n_2\}$, we first generate a tensor $\mathcal{L}_0 \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ by $\mathcal{L}_0 = \mathcal{A} * \mathcal{B}$, where the entries of $\mathcal{A} \in \mathbb{R}^{n_1 \times r \times n_3}$ and $\mathcal{B} \in \mathbb{R}^{r \times n_2 \times n_3}$ are sampled from an independent and identically distributed (i.i.d.) standard Gaussian distribution. Then we form $\mathcal{L}$ by $\mathcal{L} = n_1 n_2 n_3 \mathcal{L}_0 / \|\mathcal{L}_0\|_F$. Next, the support of $\mathcal{E}$ is sampled uniformly at random, and for any $(i, j, k) \in \text{supp}(\mathcal{E})$ we set $\mathcal{E}_{ijk}$ to an independent Bernoulli $\pm 1$ value. Thus, we obtain the observation tensor $\mathcal{M} = \mathcal{L} + \mathcal{E}$. In the experiments, we set the parameter

$\lambda = 1/\sqrt{n_3 \max\{n_1, n_2\}}$. We use the relative square error (RSE) to evaluate the estimate $\hat{\mathcal{L}}$ of the underlying tensor $\mathcal{L}$, defined as $\text{RSE}(\hat{\mathcal{L}}, \mathcal{L}) := \frac{\|\hat{\mathcal{L}} - \mathcal{L}\|_F}{\|\mathcal{L}\|_F}$.
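The data-generation procedure and the RSE metric can be sketched as follows. The support of the outliers is drawn by i.i.d. sampling with the target density, which only approximates "a uniformly sampled support of exactly $s$ entries"; t_product is the DFT-based helper from Section 2.2, and the scaling constant follows the description above.

```python
import numpy as np

def synth_trpca(n1, n2, n3, r, s):
    """Generate a synthetic observation M = L + E as described in Section 4.1
    (illustrative sketch)."""
    A = np.random.randn(n1, r, n3)
    B = np.random.randn(r, n2, n3)
    L0 = t_product(A, B)                                   # tubal rank <= r
    L = n1 * n2 * n3 * L0 / np.linalg.norm(L0)             # rescale as in the text
    E = np.zeros((n1, n2, n3))
    mask = np.random.rand(n1, n2, n3) < s / (n1 * n2 * n3) # approx. uniform support
    E[mask] = np.sign(np.random.randn(int(mask.sum())))    # Bernoulli +-1 values
    return L + E, L, E

def rse(L_hat, L):
    """Relative square error RSE(L_hat, L) = ||L_hat - L||_F / ||L||_F."""
    return np.linalg.norm(L_hat - L) / np.linalg.norm(L)
```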

4.1.1. Effectiveness and Efficiency of T-HTN

First, we show that T-HTN can accurately recover the underlying tensor $\mathcal{L}$. Based on experience, we test the recovery performance for tensors of different sizes by setting $n = n_1 = n_2 \in \{150, 300, 350\}$ and $n_3 = 3$, with $(r_t(\mathcal{L}), \|\mathcal{E}\|_0) = (0.1n, 0.1 n^2 n_3)$. Then a more difficult setting, $(r_t(\mathcal{L}), \|\mathcal{E}\|_0) = (0.1n, 0.3 n^2 n_3)$, is tested. The results are shown in Table 1 and Table 2. It is not difficult to see that T-HTN and TRPCA both recover the underlying tensor accurately, but T-HTN has an obvious advantage in running time.

To test the influence of the outlier distribution on T-HTN, we draw the entries of the outlier tensor $\mathcal{E}$ from the i.i.d. Gaussian distribution $N(0, 1)$ or the uniform distribution $U[0, 1]$. The corresponding results are given in Table 3 and Table 4. The performance of T-HTN and TRPCA is not significantly affected by the outlier distribution. This phenomenon can be explained by Theorem 4.1 in [22]: when the support of the outlier tensor $\mathcal{E}$ is uniformly distributed, TRPCA can accurately recover the underlying tensor $\mathcal{L}$ under some mild conditions. In [22], only an assumption on the random location distribution is made; no assumption is made on the magnitudes or signs of the nonzero entries. Because the proposed T-HTN can well approximate TRPCA for low-tubal-rank tensors, T-HTN is also robust to outliers.

Table 1. Comparison with TRPCA [22] in both accuracy and speed for different tensor sizes when ( r t ( L ) , E 0 ) = ( 0.1 n ,0.1 n 2 n 3 ) .

Table 2. Comparison with TRPCA [22] in both accuracy and speed for different tensor sizes when ( r t ( L ) , E 0 ) = ( 0.1 n ,0.3 n 2 n 3 ) .

Table 3. Comparison with TRPCA [22] in both accuracy and speed for different tensor sizes when the entries of the outlier tensor $\mathcal{E}$ follow the i.i.d. Gaussian distribution $N(0, 1)$.

Table 4. Comparison with TRPCA [22] in both accuracy and speed for different tensor sizes when the entries of the outlier tensor $\mathcal{E}$ follow the i.i.d. uniform distribution $U(0, 1)$.

To further verify the efficiency of the proposed T-HTN, we consider a special case: fixing the tubal rank of the underlying tensor $\mathcal{L}$ and changing its size. Specifically, we fix $r_t(\mathcal{L}) = 10$ and vary $n \in \{100, 150, 300, 400, 500\}$ with $n_3 = 3$. We test each setting 10 times and report the average running time. Figure 1 shows the relationship between average time and tensor size. Both methods accurately recover the underlying tensor, but in terms of running time, TRPCA scales super-linearly while T-HTN scales approximately linearly.

4.1.2. Effects of the Number of Truncated Singular Values

The performance of T-HTN largely depends on the number of truncated elements of $\mathcal{L}$ in model (20), that is, the number of singular values to be retained. Here, we discuss the influence of different truncation levels of the Frobenius norm and nuclear norm on the accuracy and speed of T-HTN. Specifically, we consider tensors of size $300 \times 300 \times 3$ and set four combinations of truncation $r = r_t(\mathcal{L})$ and sparsity $s = \|\mathcal{E}\|_0$ as $(r, s) \in \{(10, 4.05 \times 10^4), (10, 8.1 \times 10^4), (15, 4.05 \times 10^4), (15, 8.1 \times 10^4)\}$, where the outlier entries follow i.i.d. $N(0, 1)$, and vary the initialized $r \in \{5, 10, 15, \ldots, 50\}$. First, we show the effect of the number of truncated singular values on the recovery of the underlying tensor $\mathcal{L}$; the results are shown in Figure 2. There is a phase transition point $r_{pt}$: once the number of retained singular values exceeds it, the RSE of $\hat{\mathcal{L}}$ decreases rapidly. Then, we show the influence of the number of truncated singular values on the estimation of the outlier tensor $\mathcal{E}$; Figure 3 shows the RSE and $\ell_0$-norm of the final estimate $\hat{\mathcal{E}}$. Finally, we show the effect on the running time of T-HTN in Figure 4. Clearly, the more singular values are retained, the longer the running time and the more complex the estimated underlying tensor becomes, which is consistent with intuition.

4.2. Real Data Experiments

In this section, we evaluate the efficiency of the proposed T-HTN on real datasets and compare it with TRPCA. Specifically, we conduct tensor restoration experiments on color image data, aiming to restore the original image from its corrupted observation. For the estimate $\hat{\mathcal{L}}$ of the underlying tensor $\mathcal{L}$, we choose the relative square error (RSE), the peak signal-to-noise ratio (PSNR), and the structural similarity index (SSIM) as evaluation indices, defined respectively as

Figure 1. Computation time of TRPCA and proposed T-HTN.

Figure 2. Effects of initialized tubal rank r on the recovery performance of the underlying tensor L.

Figure 3. Effects of initialized tubal rank r on the estimation performance of the outlier tensor E.

Figure 4. Effects of initialized tubal rank r on the running time of T-HTN.

$$\text{PSNR} = 10 \log_{10} \left( \frac{n_1 n_2 n_3 \|\mathcal{L}\|_\infty^2}{\|\hat{\mathcal{L}} - \mathcal{L}\|_F^2} \right),$$

and

$$\text{SSIM} = \frac{(2 \mu_{\mathcal{L}} \mu_{\hat{\mathcal{L}}} + c_1)(2 \sigma_{\mathcal{L} \hat{\mathcal{L}}} + c_2)}{(\mu_{\mathcal{L}}^2 + \mu_{\hat{\mathcal{L}}}^2 + c_1)(\sigma_{\mathcal{L}}^2 + \sigma_{\hat{\mathcal{L}}}^2 + c_2)},$$

where $\mu_{\mathcal{L}}$ and $\mu_{\hat{\mathcal{L}}}$ are the mean values of the original and recovered tensors, $\sigma_{\mathcal{L}\hat{\mathcal{L}}}$ is the covariance of $\mathcal{L}$ and $\hat{\mathcal{L}}$, and $\sigma_{\mathcal{L}}$ and $\sigma_{\hat{\mathcal{L}}}$ denote the standard deviations of $\mathcal{L}$ and $\hat{\mathcal{L}}$, respectively. Natural scene images follow natural statistics [32]. As shown by Hu et al. [33], the information in image scenes is dominated by the top 20 singular values, i.e., it is low-rank. As shown in Figure 5, our experimental data come from 10 common color images of size $300 \times 300 \times 3$. When these color images are converted into matrices, they can be regarded as having low-rank structure; the same is true in tensor form.
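A short sketch of the PSNR computation following the formula above (our own helper; SSIM is typically evaluated with a standard image-quality toolbox and is omitted here):

```python
import numpy as np

def psnr(L_hat, L):
    """PSNR of the recovery, following the formula above (sketch)."""
    n1, n2, n3 = L.shape
    err = np.linalg.norm(L_hat - L) ** 2
    return 10 * np.log10(n1 * n2 * n3 * np.max(np.abs(L)) ** 2 / err)
```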

Table 5. PSNR values for different numbers of truncated singular values.

Figure 5. Tested color images.

Optimal Number of Truncated Singular Values

First, we apply our algorithm to image data. The number of truncated singular values must be preset; therefore, in this part we design an experiment that sets the number of retained singular values to $r_0 \in \{2, 4, \ldots, 18, 20\}$, tests all possible values, and selects the best one. We test 10 images and select the optimal number of truncated singular values; the detailed results are shown in Table 5. In the worst case, it is sufficient to test $r_0$ from 1 to 20, and even if the best $r_0$ is not selected, the effectiveness of the method is not lost.

5. Conclusions

This paper introduces a new regularization term, T-TFN, integrates T-TFN and T-TNN, and proposes a new T-HTN model for TRPCA. Thanks to the T-TFN term, the new T-HTN regularization method improves both the effectiveness and the stability of the model. An effective iterative framework is then designed to solve the new optimization model, and its convergence is derived mathematically. The experimental results on synthetic data and real visual images illustrate the advantages of T-HTN over other methods; in other words, the T-HTN method is more effective and stable.

There is still room for further research in this field, for example, how to prove the stability of the proposed method mathematically, and how to select the proportion of corrupted entries so that effective performance can be guaranteed theoretically.

This work is supported by Innovative talents support plan of colleges and universities in Liaoning Province (2021).

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Lu, C., Feng, J., Chen, Y., et al. (2017) Tensor Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Tensors via Convex Optimization. 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, 27-30 June 2016, 5249-5257. https://doi.org/10.1109/CVPR.2016.567
[2] De Lathauwer, L. and Vandewalle, J. (2004) Dimensionality Reduction in Higher-Order Signal Processing and Rank-(R1, R2, …, RN) Reduction in Multilinear Algebra. Linear Algebra and Its Applications, 391, 31-55. https://doi.org/10.1016/j.laa.2004.01.016
[3] Vasilescu, M. and Terzopoulos, D. (2003) Multilinear Subspace Analysis of Image Ensembles. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Madison, 18-20 June 2003, II-93.
[4] Cyganek, B. (2015) Visual Pattern Recognition Framework Based on the Best Rank Tensor Decomposition. In: Tavares, J.M.R.S. and Jorge, R.N., Eds., Developments in Medical Image Processing and Computational Vision, Springer International Publishing, Berlin, 89-103.
[5] Kolda, T.G. and Bader, B.W. (2009) Tensor Decompositions and Applications. SIAM Review, 51, 455-500. https://doi.org/10.1137/07070111X
[6] Candès, E.J., Li, X.D., et al. (2009) Robust Principal Component Analysis? Journal of the ACM, 58, 1-37.
[7] Wright, J., Ganesh, A., Rao, S., et al. (2009) Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization. 23rd Annual Conference on Neural Information Processing Systems 2009, Vancouver, 7-10 December 2009.
[8] Bao, B.K., Liu, G., Xu, C., et al. (2012) Inductive Robust Principal Component Analysis. IEEE Transactions on Image Processing, 21, 3794-3800. https://doi.org/10.1109/TIP.2012.2192742
[9] Javed, S., Mahmood, A., Bouwmans, T., et al. (2017) Motion-Aware Graph Regularized RPCA for Background Modeling of Complex Scenes. 2016 IEEE 23rd International Conference on Pattern Recognition (ICPR), Cancun, 4-8 December 2016, 120-125. https://doi.org/10.1109/ICPR.2016.7899619
[10] Liu, G., Lin, Z., Yan, S., et al. (2010) Robust Recovery of Subspace Structures by Low-Rank Representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35, 171-184.
[11] Golbabaee, M. and Vandergheynst, P. (2012) Compressed Sensing of Simultaneous Low-Rank and Joint-Sparse Matrices. https://doi.org/10.1109/ICASSP.2012.6288484
[12] Hillar, C.J. and Lim, L.H. (2009) Most Tensor Problems Are NP-Hard. Journal of the ACM, 60, Article No. 45. https://doi.org/10.1145/2512329
[13] Gandy, S., Recht, B. and Yamada, I. (2011) Tensor Completion and Lown-Rank Tensor Recovery via Convex Optimization. Inverse Problems, 27, Article ID: 025010. https://doi.org/10.1088/0266-5611/27/2/025010
[14] Tucker, L. (1966) Some Mathematical Notes on Three-Mode Factor Analysis. Psychometrika, 31, 279-311. https://doi.org/10.1007/BF02289464
[15] Kilmer, M.E., Braman, K., Hao, N., et al. (2013) Third-Order Tensors as Operators on Matrices: A Theoretical and Computational Framework with Applications in Imaging. SIAM Journal on Matrix Analysis and Applications, 34, 148-172. https://doi.org/10.1137/110837711
[16] Bo, H., Mu, C., Goldfarb, D., et al. (2015) Provable Models for Robust Low-Rank Tensor Completion. Pacific Journal of Optimization, 11, 339-364.
[17] Anandkumar, A., Jain, P., Shi, Y., et al. (2015) Tensor vs Matrix Methods: Robust Tensor Decomposition under Block Sparse Perturbations.
[18] Kilmer, M.E., et al. (2011) Factorization Strategies for Third-Order Tensors. Linear Algebra and Its Applications, 435, 641-658. https://doi.org/10.1016/j.laa.2010.09.020
[19] Wang, A., Zhong, J. and Li, X. (2018) Coherent Low-Tubal-Rank Tensor Completion. 2017 4th IAPR Asian Conference on Pattern Recognition (ACPR), Nanjing, 26-29 November 2017, 518-523. https://doi.org/10.1109/ACPR.2017.66
[20] Lu, C., Feng, J., Chen, Y., et al. (2020) Tensor Robust Principal Component Analysis with a New Tensor Nuclear Norm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42, 925-938. https://doi.org/10.1109/TPAMI.2019.2891760
[21] Li, T. and Ma, J. (2019) Non-Convex Penalty for Tensor Completion and Robust PCA.
[22] Lu, C., Feng, J., Chen, Y., et al. (2020) Tensor Robust Principal Component Analysis with a New Tensor Nuclear Norm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42, 925-938. https://doi.org/10.1109/TPAMI.2019.2891760
[23] Jiang, T.X., Huang, T.Z., Zhao, X.L., et al. (2017) Multi-Dimensional Imaging Data Recovery via Minimizing the Partial Sum of Tubal Nuclear Norm. Journal of Computational and Applied Mathematics, 372, Article ID: 112680.
[24] Ye, H., Li, H., et al. (2019) A Hybrid Truncated Norm Regularization Method for Matrix Completion. IEEE Transactions on Image Processing, 28, 5171-5186.
[25] Kilmer, M.E., Martin, C.D. and Perrone, L. (2008) A Third-Order Generalization of the Matrix SVD as a Product of Third-Order Tensors.
[26] Lu, C., Peng, X. and Wei, Y. (2019) Low-Rank Tensor Completion with a New Tensor Nuclear Norm Induced by Invertible Linear Transforms. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, 15-20 June 2019, 5989-5997. https://doi.org/10.1109/CVPR.2019.00615
[27] Lu, C., Feng, J., Lin, Z. and Yan, S. (2018) Exact Low Tubal Rank Tensor Recovery from Gaussian Measurements. Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, 13-19 July 2018, 2504-2510. https://doi.org/10.24963/ijcai.2018/347
[28] He, B.S., Yang, H. and Wang, S.L. (2000) Alternating Direction Method with Self-Adaptive Penalty Parameters for Monotone Variational Inequalities. Journal of Optimization Theory and Applications, 106, 337-356. https://doi.org/10.1023/A:1004603514434
[29] Yang, J. and Yuan, X. (2012) Linearized Augmented Lagrangian and Alternating Direction Methods for Nuclear Norm Minimization. Mathematics of Computation, 82, 301-329. https://doi.org/10.1090/S0025-5718-2012-02598-1
[30] Xue, S.K., Qiu, W.Y., Liu, F., et al. (2017) Low-Rank Tensor Completion by Truncated Nuclear Norm Regularization. 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, 20-24 August 2018, 2600-2605. https://doi.org/10.1109/ICPR.2018.8546008
[31] Lin, Z., Liu, R. and Li, H. (2015) Linearized Alternating Direction Method with Parallel Splitting and Adaptive Penalty for Separable Convex Programs in Machine Learning. JMLR: Workshop and Conference Proceedings, Vol. 29, 116-132. https://doi.org/10.1007/s10994-014-5469-5
[32] Fan, H., Kuang, G. and Qiao, L. (2017) Fast Tensor Principal Component Analysis via Proximal Alternating Direction Method with Vectorized Technique. Applied Mathematics, 8, 77-86. https://doi.org/10.4236/am.2017.81007
[33] Hu, Y., et al. (2013) Fast and Accurate Matrix Completion via Truncated Nuclear Norm Regularization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35, 2117-2130. https://doi.org/10.1109/TPAMI.2012.271
