Robust Principal Component Analysis Integrating Sparse and Low-Rank Priors

Abstract

Principal Component Analysis (PCA) is a widely used technique for data analysis and dimensionality reduction, but its sensitivity to feature scale and outliers limits its applicability. Robust Principal Component Analysis (RPCA) addresses these limitations by decomposing data into a low-rank matrix capturing the underlying structure and a sparse matrix identifying outliers, enhancing robustness against noise and outliers. This paper introduces a novel RPCA variant, Robust PCA Integrating Sparse and Low-rank Priors (RPCA-SL). Each prior targets a specific aspect of the data’s underlying structure, and their combination allows for a more nuanced and accurate separation of the main data components from outliers and noise. RPCA-SL is then solved with a proximal gradient algorithm for improved anomaly detection and data decomposition. Experimental results on simulated and real data demonstrate significant advancements.


1. Introduction

Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The number of principal components is less than or equal to the number of original variables. This transformation is defined in such a way that the first principal component has the largest possible variance, accounting for as much of the variability in the data as possible. Each succeeding component, in turn, has the highest variance possible under the constraint that it is orthogonal to the preceding components. The resulting vectors (principal components) form an uncorrelated orthogonal basis set [1] [2] [3]. PCA has been widely used in data analysis and dimensionality reduction [4]. However, it has several drawbacks that limit its application in certain scenarios. For instance, it is sensitive to the scale of the features: variables with larger scales can dominate the outcome, leading to a biased representation of the data unless the data is properly normalized. Furthermore, PCA is sensitive to outliers in the data, which can significantly influence the direction of the first principal components and skew the analysis [5].
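To make the transformation concrete, the following minimal Python sketch (an illustration added here, not code from the paper; all names are ours) computes principal components from the singular value decomposition of the centered data matrix:

```python
import numpy as np

def pca(X, k):
    """Project n samples (rows of X) onto the top-k principal components."""
    Xc = X - X.mean(axis=0)                  # center each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                      # orthonormal directions of maximal variance
    scores = Xc @ components.T               # uncorrelated principal-component scores
    explained = s[:k] ** 2 / (len(X) - 1)    # variance captured by each component
    return scores, components, explained

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated features
scores, comps, var = pca(X, k=2)
```

Because the scores are linear functions of the centered data, a single gross outlier can rotate the leading directions, which is exactly the sensitivity that motivates RPCA below.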

Robust Principal Component Analysis (RPCA) [6] [7] is an extension of traditional PCA designed to overcome some of the inherent limitations of PCA, particularly its sensitivity to outliers and noise. RPCA seeks to decompose a data matrix into a low-rank matrix and a sparse matrix, where the low-rank matrix captures the data’s underlying structure, and the sparse matrix captures the outliers or anomalies [8] [9] . This decomposition allows for a more robust analysis of data, especially in scenarios where the data is corrupted by noise or contains outliers that would otherwise skew the results of traditional PCA [10] .

The sparse and low-rank characteristics play crucial roles in the efficacy of RPCA, significantly enhancing its ability to extract meaningful information from complex data sets. Each characteristic targets specific aspects of the data, allowing RPCA to address a broad range of challenges in data analysis, particularly in the presence of outliers, noise, and complex underlying structures.

The sparse prior in RPCA is a fundamental concept that enables the effective separation of outliers and anomalies from the predominant data structure [11]. This prior assumes that anomalies or outliers in a dataset are not pervasive but occur sparsely across the data. The sparse prior allows RPCA to robustly identify and isolate these irregular components, ensuring they do not influence the extraction of the main low-rank structure, which represents the data’s underlying patterns and relationships. RPCA with a focus on the sparse prior has been applied to detecting moving objects in videos [12], showcasing its practical use in separating dynamic foreground elements from a static background. Variations of RPCA incorporate a weighted approach to nuclear norm minimization, highlighting the role of the sparse prior in image denoising applications [13]. Furthermore, discussions on scalable optimization techniques for RPCA underline the importance of the sparse prior in making the problem tractable for large datasets.

The low-rank prior in RPCA underpins the assumption that the data’s intrinsic structure can be captured by a matrix of significantly lower rank compared to the original data dimensions. This principle is crucial for applications like background subtraction in video surveillance, image denoising [14] [15] , and data compression, where the essence of the data can be distilled into a simpler form without losing critical information.

Thus, integrating sparse and low-rank priors can address a comprehensive range of challenges in data decomposition and anomaly detection, enhancing the method’s versatility and effectiveness across various applications. Each prior targets a specific aspect of the data’s underlying structure, and their combination allows for a more nuanced and accurate separation of the main data components from outliers or noise. In this paper, we propose a novel RPCA variant called Robust PCA Integrating Sparse and Low-rank Priors (RPCA-SL) and employ a proximal gradient algorithm to solve it. Furthermore, a comparative evaluation shows the performance of our algorithm.

The rest of the paper is organized as follows: Section 2 reviews the related RPCA models. In Section 3, we propose a novel model, RPCA-SL, and its corresponding optimization algorithm. Section 4 presents experiments on synthetic problems and multispectral imaging. Section 5 concludes the paper.

2. Related Work

Mathematically, RPCA via decomposition into low-rank and sparse matrices can be directly formulated as:

$$\min_{L,S}\; \operatorname{rank}(L) + \lambda \|S\|_0 \quad \text{s.t.} \quad D = L + S. \qquad (1)$$

In this model, $D \in \mathbb{R}^{m \times n}$ represents the observed matrix, which is the sum of a low-rank matrix $L \in \mathbb{R}^{m \times n}$ and a sparse matrix $S \in \mathbb{R}^{m \times n}$, where the sparse matrix captures the outliers or anomalies present in the data. The goal of the model is to decompose $D$ into its low-rank component $L$ and sparse component $S$. The cardinality $\|S\|_0$ counts the number of non-zero elements in $S$, which is a measure of its sparseness [16].

The direct approach to achieving this decomposition involves solving an optimization problem that minimizes the rank and the cardinality. However, this problem is computationally intractable because the rank function and the L0-norm are non-convex and, in the case of rank, also discrete.

To overcome this computational challenge, the problem is reformulated by substituting the rank function with the nuclear norm (the sum of the singular values of the matrix), and the L0-norm with the L1-norm (the sum of the absolute values of the matrix elements). This substitution results in a convex relaxation of the original problem, making it computationally feasible to solve:

$$\min_{L,S}\; \|L\|_* + \lambda \|S\|_1 \quad \text{s.t.} \quad D = L + S. \qquad (2)$$
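Problem (2) is commonly solved by augmented Lagrangian and proximal methods. The following minimal NumPy sketch is our own illustration (not the paper’s implementation), with default parameter choices borrowed from the RPCA literature [6]; it alternates singular value thresholding for L and soft thresholding for S:

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: prox of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(M, tau):
    """Elementwise soft thresholding: prox of tau * L1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def rpca_pcp(D, lam=None, mu=None, n_iter=200, tol=1e-7):
    """Inexact augmented-Lagrangian iteration for problem (2)."""
    m, n = D.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))      # standard choice from [6]
    mu = mu or 0.25 * m * n / np.abs(D).sum()  # common step-size heuristic
    S = np.zeros_like(D)
    Y = np.zeros_like(D)                       # dual variable for D = L + S
    for _ in range(n_iter):
        L = svt(D - S + Y / mu, 1.0 / mu)
        S = soft(D - L + Y / mu, lam / mu)
        R = D - L - S                          # constraint residual
        Y += mu * R
        if np.linalg.norm(R) <= tol * np.linalg.norm(D):
            break
    return L, S
```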

Furthermore, [17] extends RPCA to projected robust PCA (PRPCA):

$$\min_{X,S}\; \frac{1}{2}\|D - PXQ - S\|_F^2 + \lambda_1 \|X\|_* + \lambda_2 \|S\|_1, \qquad (3)$$

where $P$ and $Q$ are certain row-smoothing and column-smoothing matrices, respectively.

3. Proposed Model and Algorithm

3.1. Robust PCA Integrating Sparse and Low-Rank Priors

To further pursue the low-rank prior, we extend the nuclear norm in PRPCA to the truncated nuclear norm [18] [19]. The truncated nuclear norm is a variation of the nuclear norm used to enhance the performance of low-rank matrix recovery in various signal processing and machine learning tasks. The nuclear norm treats all singular values equally, which is not always ideal, especially when some singular values are significantly larger than others, indicating the presence of important structural information in the matrix that should be preserved. The truncated nuclear norm is defined as

$$\|X\|_r = \sum_{i=r+1}^{\min(m,n)} \sigma_i(X). \qquad (4)$$

This definition penalizes only the lower part of the spectrum of singular values, under the premise that the smallest singular values mostly capture noise or redundancy in the data, whereas the largest singular values carry the most significant structural information about the matrix. By omitting the r largest singular values from the summation, the truncated nuclear norm provides a closer surrogate of the matrix rank than the nuclear norm, which can lead to better recovery of the low-rank component: the dominant singular values, assumed to represent essential features of the data, are left unpenalized.
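For concreteness, a direct NumPy evaluation of (4) (an illustrative sketch of ours, not from the paper) is:

```python
import numpy as np

def truncated_nuclear_norm(X, r):
    """Sum of all singular values beyond the r largest, i.e. ||X||_r in (4)."""
    s = np.linalg.svd(X, compute_uv=False)  # singular values, descending
    return s[r:].sum()

# The r largest singular values are unpenalized, so an exactly rank-r
# matrix has truncated nuclear norm zero:
rng = np.random.default_rng(0)
A = rng.normal(size=(40, 3)) @ rng.normal(size=(3, 30))  # rank 3
assert np.isclose(truncated_nuclear_norm(A, 3), 0.0, atol=1e-8)
```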

On the other hand, we extend the L1-norm in PRPCA to a non-convex regularizer, called the half-quadratic function (HQF), to achieve robustness and sparsity. The scalar half-quadratic function is defined by

$$g_\delta(t) = \begin{cases} \dfrac{\delta^2 - (|t| - \delta)^2}{2}, & |t| < \delta, \\[6pt] \dfrac{\delta^2}{2}, & |t| \ge \delta. \end{cases} \qquad (5)$$
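The following small sketch (ours, for illustration) evaluates $g_\delta$ elementwise and can be used to reproduce the comparison in Figure 1:

```python
import numpy as np

def g_delta(t, delta):
    """Scalar half-quadratic function (5), applied elementwise."""
    t = np.asarray(t, dtype=float)
    quad = (delta**2 - (np.abs(t) - delta)**2) / 2.0  # quadratic part, |t| < delta
    return np.where(np.abs(t) < delta, quad, delta**2 / 2.0)

t = np.linspace(-2, 2, 401)
# For small |t|, g_delta(t) ~ delta*|t| - t^2/2, so it tracks the L1 norm
# near zero while saturating at delta^2/2, capping the influence of outliers.
values = {d: g_delta(t, d) for d in (1.0, 0.5)}
```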

As shown in Figure 1, $g_\delta(t)$ is quadratic on the interval $[-\delta, \delta]$ and constant for $|t| \ge \delta$.

Figure 1. The function values of $g_\delta$ at δ = 1 and δ = 0.5. The L1 norm is also displayed.

From Figure 1, one can see that the parameter δ can be tuned to approximate the L1 norm. Finally, we integrate the half-quadratic function and the truncated nuclear norm and propose our model, Robust PCA Integrating Sparse and Low-rank Priors (RPCA-SL), as follows:

$$\min_{L,S}\; \frac{1}{2}\|D - L - S\|_F^2 + \lambda_1 \|L\|_r + \lambda_2 \|S\|_{g_\delta}, \qquad (6)$$

where $\|S\|_{g_\delta} = \sum_{i,j} g_\delta(S_{ij})$ denotes the half-quadratic function applied elementwise to S.

3.2. Optimization Algorithm

In (6), optimizing L and S simultaneously is intractable, so we solve (6) by alternating minimization. Alternating minimization is an optimization technique for minimizing a function with respect to two (or more) sets of variables by alternately fixing one set and optimizing over the other. This approach is particularly useful when the problem simplifies or becomes more tractable once one set of variables is held constant. Alternating minimization for (6) involves two main steps that are repeated iteratively:

Step One: Keep S constant and minimize the objective function with respect to L. Thus we have the sub-problem:

$$\min_{L}\; \frac{1}{2}\|D - L - S\|_F^2 + \lambda_1 \|L\|_r. \qquad (7)$$

Step Two: Keep L constant and minimize over S via the following sub-problem:

$$\min_{S}\; \frac{1}{2}\|D - L - S\|_F^2 + \lambda_2 \|S\|_{g_\delta}. \qquad (8)$$

First, (8) can be solved by employing Lemma 1:

Lemma 1. For any given matrix $X \in \mathbb{R}^{m \times n}$ [20], we have

$$P(X) = \arg\min_{S}\; \frac{1}{2}\|X - S\|_F^2 + \|S\|_{g_\delta}, \qquad (9)$$

where $P(\cdot)$ acts elementwise as the hard-thresholding operator

$$P(r) = \begin{cases} 0, & |r| < \epsilon, \\ r, & |r| \ge \epsilon. \end{cases} \qquad (10)$$

(For the unit-weight problem (9), one can verify that the threshold is $\epsilon = \delta$.)

By Lemma 1, we can obtain the optimum $P\big(\tfrac{1}{\lambda_2}(D - L)\big)$.
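To make the S-update concrete, the sketch below (our illustration; the threshold choice ε = δ for the unit-weight case of (9) is our assumption, flagged in the code) implements P and checks it against a brute-force minimization of (9) on scalars:

```python
import numpy as np

def prox_hqf(X, delta):
    """Hard-thresholding operator (10): minimizer of (9), applied elementwise."""
    X = np.asarray(X, dtype=float)
    return np.where(np.abs(X) < delta, 0.0, X)  # epsilon = delta assumed

# Brute-force verification of Lemma 1 on a scalar grid
delta = 0.7
s_grid = np.linspace(-3, 3, 20001)
g = np.where(np.abs(s_grid) < delta,
             (delta**2 - (np.abs(s_grid) - delta)**2) / 2, delta**2 / 2)
for x in (-2.0, -0.3, 0.0, 0.5, 1.5):
    objective = 0.5 * (x - s_grid) ** 2 + g     # scalar version of (9)
    s_star = s_grid[np.argmin(objective)]
    assert abs(s_star - prox_hqf(x, delta)) < 1e-3
```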

Now we return to solving (7); before that, we give the following lemma:

Lemma 2. For any given matrix $X \in \mathbb{R}^{m \times n}$ [18], and matrices $A \in \mathbb{R}^{r \times m}$, $B \in \mathbb{R}^{r \times n}$ such that $AA^{\mathrm{T}} = I$, $BB^{\mathrm{T}} = I$, where $r \le \min(m, n)$, we have

$$\operatorname{Tr}(AXB^{\mathrm{T}}) \le \sum_{i=1}^{r} \sigma_i(X). \qquad (11)$$

By Lemma 2, we have

$$\|L\|_r = \|L\|_* - \max_{AA^{\mathrm{T}}=I,\; BB^{\mathrm{T}}=I} \operatorname{Tr}(ALB^{\mathrm{T}}). \qquad (12)$$

Thus, (7) can be rewritten as follows:

$$\min_{L}\; \frac{1}{2}\|D - L - S\|_F^2 + \lambda_1\Big(\|L\|_* - \max_{AA^{\mathrm{T}}=I,\; BB^{\mathrm{T}}=I} \operatorname{Tr}(ALB^{\mathrm{T}})\Big). \qquad (13)$$

Then a two-step approach for solving (13), following the truncated nuclear norm scheme of [18], can be given by Algorithm 1: in each outer iteration, A and B are fixed to the top-r left and right singular vectors of the current L, and the resulting convex inner problem, labeled (14) in Algorithm 1, is solved.

Problem (14) can be solved by the Accelerated Proximal Gradient method (APG). The APG method, also known as accelerated proximal gradient descent or FISTA (Fast Iterative Shrinkage-Thresholding Algorithm), is an optimization algorithm designed to efficiently minimize the sum of two functions, one of which is not necessarily differentiable. The details can be found in [18] [21] [22] and are omitted here for simplicity.
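The following generic FISTA sketch is our own illustration of the method’s structure, not the paper’s Algorithm 1; the quadratic smooth term and nuclear-norm prox in the usage example are stand-ins we chose, and the unit step size relies on the gradient being 1-Lipschitz:

```python
import numpy as np

def svt(M, tau):
    """Prox of tau*||.||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def apg(grad_f, prox_h, x0, step, n_iter=300):
    """Generic FISTA loop for min_x f(x) + h(x), f smooth, h prox-friendly."""
    x_prev, y, t = x0, x0, 1.0
    for _ in range(n_iter):
        x = prox_h(y - step * grad_f(y))               # forward-backward step
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)    # Nesterov extrapolation
        x_prev, t = x, t_next
    return x_prev

# Usage on a nuclear-norm-regularized least-squares surrogate:
rng = np.random.default_rng(0)
D = rng.normal(size=(30, 20)); S = np.zeros_like(D); lam = 0.5
L_hat = apg(grad_f=lambda L: L + S - D,        # gradient of 0.5*||D - L - S||_F^2
            prox_h=lambda M: svt(M, lam),      # step = 1 since the gradient is 1-Lipschitz
            x0=np.zeros_like(D), step=1.0)
```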

The whole optimization procedure is summarized in Algorithm 2.
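Since Algorithm 2 is given in the paper as a figure, we add a simplified end-to-end sketch of the alternating scheme for orientation only: the L-update below uses a spectral thresholding surrogate for the truncated nuclear norm (our heuristic, not the Algorithm 1 inner loop), and the S-update uses the hard-thresholding form of Lemma 1 with threshold δ (our assumption):

```python
import numpy as np

def prox_tnn(M, r, tau):
    """Approximate L-update: keep the top-r singular values, soft-threshold the rest."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s[r:] = np.maximum(s[r:] - tau, 0.0)
    return (U * s) @ Vt

def prox_hqf(M, delta):
    """S-update via Lemma 1: elementwise hard thresholding at delta (assumed)."""
    return np.where(np.abs(M) < delta, 0.0, M)

def rpca_sl(D, r, lam1, delta, n_iter=100, tol=1e-6):
    """Simplified alternating-minimization sketch for model (6)."""
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    for _ in range(n_iter):
        L_new = prox_tnn(D - S, r, lam1)   # sub-problem (7), approximately
        S = prox_hqf(D - L_new, delta)     # sub-problem (8) by Lemma 1
        if np.linalg.norm(L_new - L) <= tol * (1 + np.linalg.norm(L)):
            L = L_new
            break
        L = L_new
    return L, S
```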

4. Experiments

The proposed algorithm is compared with four state-of-the-art RPCA algorithms, namely, RPCA-HQF [9], accelerated alternating projections (AccAltProj) [23], TNNR-APGL [18] and RPCA-CUR [20]. In our experiments, the parameters of the algorithms are tuned to achieve the best performance. The evaluation includes synthetic examples, where target matrices are generated and the various algorithms are used to recover the low-rank part, as well as applications to computational multispectral imaging.

4.1. Synthetic Data

Two random matrices $U \in \mathbb{R}^{400 \times r}$, $V \in \mathbb{R}^{300 \times r}$, whose entries follow the standard Gaussian distribution, are generated to construct the synthetic matrix $D = UV^{\mathrm{T}}$. To verify robustness, we add white Gaussian noise to D at signal-to-noise ratios (SNR) of 5, 10, 15, 20 and 25 dB. The noisy matrix is denoted as X, and the matrix recovered from X by each algorithm is denoted as M. To test the performance of all algorithms, the relative error is employed, given by

$$\text{relError} = \frac{\|D - M\|_F}{\|D\|_F}. \qquad (15)$$
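A sketch of this experimental setup (ours; the noise scaling follows the usual power-ratio definition of SNR, which we assume):

```python
import numpy as np

def make_problem(m=400, n=300, r=20, snr_db=10.0, seed=0):
    """Rank-r ground truth D = U V^T plus white Gaussian noise at a given SNR."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((m, r))
    V = rng.standard_normal((n, r))
    D = U @ V.T
    signal_power = np.mean(D ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    X = D + rng.standard_normal((m, n)) * np.sqrt(noise_power)
    return D, X

def rel_error(D, M):
    """Relative recovery error (15)."""
    return np.linalg.norm(D - M) / np.linalg.norm(D)
```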

The relative errors are shown in Figure 2 for r = 20. As Figure 2 shows, the relative error of every algorithm decreases as the SNR increases, and the curves converge at SNR = 25 dB. RPCA-SL starts off with the highest error at the smallest SNR but improves significantly as the SNR increases. RPCA-CUR appears to have the best performance across all SNRs, starting at the lowest point and maintaining this relative position.

Furthermore, we vary the rank r = 5, 10, 15, 20, 25. The relative error at 10 dB noise is shown in Figure 3. In contrast to the previous figure, the relative error increases with r for all algorithms, suggesting that a higher rank correlates with a higher recovery error. RPCA-SL and RPCA-HQF start with the lowest error at r = 5, indicating better initial performance, and their performance degrades more slowly than the others as r increases. TNNR-APGL and AccAltProj begin with slightly higher error values at r = 5 but follow a similar trend as RPCA-SL and RPCA-HQF as r increases.

Figure 2. The relative recovery error of algorithms at varying SNR.

4.2. Computational Hyperspectral Imaging

Multispectral imaging captures a broad spectrum, generating spectral response vectors at each pixel, thereby acquiring information in the form of a third-order tensor. Low-rank models play a crucial role in multispectral imaging through the linear spectral mixing model, which posits that the spectral response of an imaging scene can be well approximated as a linear combination of spectral responses of a few core materials, known as endmembers. Hence, the low-rank structure can be exploited by computational imaging systems, which acquire images in a compressed format and employ computational methods to reconstruct high-resolution images [24]. However, when different materials are closely situated, the resulting spectra might be a highly nonlinear combination of the endmembers, leading to model anomalies. Herein, we propose low-rank plus sparse matrix recovery as a method to model spectral anomalies within the low-rank structure.

Figure 3. The relative recovery error of algorithms at varying rank.

Figure 4. The RGB rendering of the hyperspectral image.


Figure 5. The RGB renderings of the HSIs recovered by the various algorithms. (a) RPCA-SL, (b) RPCA-HQF, (c) TNNR-APGL, (d) AccAltProj, (e) RPCA-CUR.


Figure 6. The spatial PSNRs of the HSIs recovered by the various algorithms. (a) RPCA-SL, (b) RPCA-HQF, (c) TNNR-APGL, (d) AccAltProj, (e) RPCA-CUR.

A 512 × 512 × 31 hyperspectral image from the CAVE dataset [25] is rearranged into a matrix of size 262,144 × 31, denoted as D. The RGB rendering using wavelengths 700 nm (red channel), 550 nm (green channel) and 440 nm (blue channel) is shown in Figure 4. The algorithms are then applied to D. The Peak Signal-to-Noise Ratio (PSNR) is a widely used criterion for evaluating image quality, and we use the PSNR of the recovered image to evaluate the performance of the different methods. The mean PSNRs over all spectral bands are RPCA-SL 46.1977 dB, RPCA-HQF 44.6328 dB, TNNR-APGL 40.4962 dB, AccAltProj 39.4791 dB, and RPCA-CUR 30.4724 dB.
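Per-band PSNR can be computed as below (our sketch; it assumes the peak value is taken from the reference band, which is one common convention):

```python
import numpy as np

def psnr(reference, recovered, peak=None):
    """Peak Signal-to-Noise Ratio in dB between two images or bands."""
    mse = np.mean((reference - recovered) ** 2)
    peak = peak if peak is not None else reference.max()
    return 10.0 * np.log10(peak ** 2 / mse)

def mean_band_psnr(D, M, shape=(512, 512, 31)):
    """Mean PSNR over the 31 spectral bands of the unfolded 262144 x 31 matrices."""
    ref = D.reshape(shape)
    rec = M.reshape(shape)
    return np.mean([psnr(ref[:, :, b], rec[:, :, b]) for b in range(shape[2])])
```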

In Figure 5, each rendering shows the result of image recovery by the respective algorithm. The quality of the recovery can be assessed visually by how well details and colors are preserved or restored in comparison to the original image. All images display a stuffed toy and a color chart, which are commonly used to evaluate color accuracy and the preservation of spatial details. Subfigures (a) RPCA-SL and (e) RPCA-CUR show more vibrant colors and greater contrast, which suggests better performance in terms of color recovery. Subfigures (b) RPCA-HQF, (c) TNNR-APGL, and (d) AccAltProj show images with varying degrees of color fidelity and brightness, which indicate how each algorithm handles the recovery process.

Furthermore, Figure 6 shows the spatial PSNRs. On the right side of each heatmap is a color scale with values ranging from 15 to 35; these correspond to PSNR values, which quantify the quality of the reconstructed images. Higher PSNR values indicate better image quality, with less distortion or noise. The heatmaps for algorithms (a) RPCA-SL and (b) RPCA-HQF are predominantly blue, indicating higher PSNR values throughout most of the image and hence higher image quality compared to the others.

In contrast, heatmaps (c) TNNR-APGL, (d) AccAltProj, and (e) RPCA-CUR show areas of yellow and red, particularly in the central square, indicating lower PSNR values and presumably worse reconstruction quality in these regions. Algorithm (e) RPCA-CUR displays the most extensive areas of yellow and red, implying the lowest PSNR values across the largest portion of the image and suggesting that it performs worst among the algorithms in terms of image recovery quality. It is worth noting that the central squares of the heatmaps differ noticeably in color from the surrounding areas, indicating that the algorithms perform differently in this region.

5. Conclusion

In conclusion, this paper presents a comprehensive analysis of Principal Component Analysis (PCA) and its robust counterpart, Robust Principal Component Analysis (RPCA), emphasizing their pivotal roles in data analysis and dimensionality reduction. While PCA is instrumental in transforming correlated variables into a set of uncorrelated principal components, its limitations, such as sensitivity to feature scale and outliers, are significantly addressed by RPCA. RPCA enhances data analysis robustness by decomposing data into low-rank and sparse matrices, effectively isolating outliers and capturing the underlying data structure. The introduction of a novel RPCA variant, Robust PCA Integrating Sparse and Low-rank Priors (RPCA-SL), which employs a proximal gradient algorithm, marks a significant advancement in anomaly detection and data decomposition. This work underscores the importance of integrating sparse and low-rank priors in overcoming the challenges presented by complex data sets, thereby enhancing the versatility and effectiveness of data analysis techniques across various applications.

Acknowledgements

This work was supported partly by the National Natural Science Foundation of China under Grant No. 62276137.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Jolliffe, I.T. (2002) Principal Component Analysis. 2nd Edition, Springer-Verlag, Berlin.
[2] Abdi, H. and Williams, L.J. (2010) Principal Component Analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 433-459.
https://doi.org/10.1002/wics.101
[3] Ringnér, M. (2008) What Is Principal Component Analysis? Nature Biotechnology, 26, 303-304.
https://doi.org/10.1038/nbt0308-303
[4] Jafarzadegan, M., Safi-Esfahani, F. and Beheshti, Z. (2019) Combining Hierarchical Clustering Approaches Using the PCA Method. Expert Systems with Applications, 137, 1-10.
https://doi.org/10.1016/j.eswa.2019.06.064
[5] Wang, Q., Gao, Q.X., Sun, G., et al. (2020) Double Robust Principal Component Analysis. Neurocomputing, 391, 119-128.
https://doi.org/10.1016/j.neucom.2020.01.097
[6] Candes, E.J., Li, X., Ma, Y. and Wright, J. (2011) Robust Principal Component Analysis? Journal of the ACM, 58, 1-37.
https://doi.org/10.1145/1970392.1970395
[7] Vaswani, N., Bouwmans, T., Javed, S. and Narayanamurthy, P. (2018) Robust Subspace Learning: Robust PCA Robust Subspace Tracking and Robust Subspace Recovery. IEEE Signal Processing Magazine, 35, 32-55.
https://doi.org/10.1109/MSP.2018.2826566
[8] Lyu, Q., Guo, M., Ma, M., et al. (2019) External Prior Learning and Internal Mean Sparse Coding for Image Denoising. Journal of Electronic Imaging, 28, 1-10.
https://doi.org/10.1117/1.JEI.28.3.033014
[9] Wang, Z., Li, X., So, H.C. and Liu, Z. (2022) Robust PCA via Non-Convex Half-Quadratic Regularization. Signal Processing, 204, Article 108816.
https://doi.org/10.1016/j.sigpro.2022.108816
[10] Bouwmans, T. and Zahzah, E.H. (2014) Robust PCA via Principal Component Pursuit: A Review for a Comparative Evaluation in Video Surveillance. Computer Vision and Image Understanding, 122, 22-34.
https://doi.org/10.1016/j.cviu.2013.11.009
[11] Hu, Z., Wang, Y., Su, R., Bian, X., Wei, H. and He, G. (2020) Moving Object Detection Based on Non-Convex RPCA with Segmentation Constraint. IEEE Access, 8, 41026-41036.
https://doi.org/10.1109/ACCESS.2020.2977273
[12] Shijila, B., Tom, A.J. and George, S.N. (2018) Moving Object Detection by Low Rank Approximation and l1-TV Regularization on RPCA Framework. Journal of Visual Communication and Image Representation, 56, 188-200.
https://doi.org/10.1016/j.jvcir.2018.09.009
[13] Liu, J. and Osher, S. (2019) Block Matching Local SVD Operator Based Sparsity and TV Regularization for Image Denoising. Journal of Scientific Computing, 78, 607-624.
https://doi.org/10.1007/s10915-018-0785-8
[14] Chen, Y., Xiao, X. and Zhou, Y. (2020) Low-Rank Quaternion Approximation for Color Image Processing. IEEE Transactions on Image Processing, 29, 1426-1439.
https://doi.org/10.1109/TIP.2019.2941319
[15] Chen, Y., Wang, S. and Zhou, Y. (2018) Tensor Nuclear Norm-Based Low-Rank Approximation with Total Variation Regularization. IEEE Journal of Selected Topics in Signal Processing, 12, 1364-1377.
https://doi.org/10.1109/JSTSP.2018.2873148
[16] Mousavi, A., Rezaee, M. and Ayanzadeh, R. (2020) A Survey on Compressive Sensing: Classical Results and Recent Advancements. Journal of Mathematical Modeling, 8, 309-344.
[17] Feng, L. and Wang, J. (2022) Projected Robust PCA with Application to Smooth Image Recovery. Journal of Machine Learning Research, 23, 1-41.
https://dl.acm.org/doi/pdf/10.5555/3586589.3586838
[18] Hu, Y., Zhang, D., Ye, J., Li, X. and He, X. (2012) Fast and Accurate Matrix Completion via Truncated Nuclear Norm Regularization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35, 2117-2130.
https://doi.org/10.1109/TPAMI.2012.271
[19] Zhao, Q., Lin, Y., Wang, F. and Meng, D. (2024) Adaptive Weighting Function for Weighted Nuclear Norm Based Matrix/Tensor Completion. International Journal of Machine Learning and Cybernetics, 15, 697-718.
https://doi.org/10.1007/s13042-023-01935-1
[20] Cai, H.Q., Hamm, K., Huang, L., Li, J. and Wang, T. (2020) Rapid Robust Principal Component Analysis: CUR Accelerated Inexact Low Rank Estimation. IEEE Signal Processing Letters, 28, 116-120.
https://doi.org/10.1109/LSP.2020.3044130
[21] Antonello, N., Stella, L., Patrinos, P. and Van Waterschoot, T. (2018) Proximal Gradient Algorithms: Applications in Signal Processing. arXiv:1803.01621
[22] Parikh, N. and Boyd, S. (2014) Proximal Algorithms. Foundations and Trends® in Optimization, 1, 127-239.
https://doi.org/10.1561/2400000003
[23] Cai, H., Cai, J.-F. and Wei, K. (2019) Accelerated Alternating Projections for Robust Principal Component Analysis. Journal of Machine Learning Research, 20, 685-717.
[24] Li, L., Huang, W., Gu, I.Y.H., et al. (2004) Statistical Modeling of Complex Backgrounds for Foreground Object Detection. IEEE Transactions on Image Processing, 13, 1459-1472.
https://doi.org/10.1109/TIP.2004.836169
[25] Yasuma, F., Mitsunaga, T., Iso, D. and Nayar, S.K. (2008) Generalized Assorted Pixel Camera: Post-Capture Control of Resolution, Dynamic Range and Spectrum. Technical Report, Department of Computer Science, Columbia University, New York.
