Low-Rank Multi-View Subspace Clustering Based on Sparse Regularization
1. Introduction
Clustering plays a significant role in machine learning and artificial intelligence (AI) for several reasons, acting as a foundational technique that underpins many of the processes and applications within these fields [1] [2] .
Multi-view Subspace Clustering (MVSC) is an advanced clustering technique that is particularly suited for handling data that naturally comes from multiple sources or “views.” This approach is based on the principle that different views of the data can provide complementary information that should be integrated when performing clustering. The goal of MVSC is to find a common subspace that best represents the underlying structure of the data across all views, thereby improving the quality and accuracy of the clustering results. As shown in [3] and [4] , by leveraging multiple views of the data, MVSC can achieve higher clustering accuracy than single-view clustering methods, especially when the views are complementary. The works [5] and [6] claim that the integration of multiple views can make the clustering process more robust to noise and redundancy within individual views, as the method can exploit the clean and informative parts of each view.
As a result, Multi-view Subspace Clustering has been applied in various areas, such as image and video analysis [7], bioinformatics [8] [9], and social network analysis [10]. Among these methods, the low-rank prior is a critical technique used to capture the global structure of data across multiple views while ensuring that the representation is compact and meaningful. The low-rank prior is based on the assumption that the data from all views lie on or near a low-dimensional subspace, and this inherent structure can be exploited to improve clustering performance. Notable works in the domain of low-rank subspace-based methods include Latent Multi-view Subspace Clustering (LMSC) [11], Multimodal Sparse and Low-rank Subspace Clustering (MLRSSC) [12], Flexible Multi-view Representation Learning for Subspace Clustering (FMR) [13], and Dual Shared-Specific Multi-view Subspace Clustering (DSS-MSC) [14]. These methods, grounded in the well-established framework of low-rank representation, have demonstrated competitive clustering performance in empirical studies.
In particular, the recent work [15] introduced an efficient and effective approach termed Facilitated Low-rank Multi-view Subspace Clustering (FLMSC), which factorizes each view-specific representation matrix into two small factor matrices, i.e., an orthogonal dictionary and a latent representation, so that the underlying subspace structure of multiple views can be fully explored. However, this approach still suffers from one issue: it relies on the F-norm to measure the error between the observed data and the reconstructed data. As is well known, the F-norm is sensitive to outliers, since it squares entries before summing them, which disproportionately amplifies the impact of large deviations.
To address the aforementioned drawback, this paper develops a Low-Rank Multi-view Subspace Clustering Based on Sparse Regularization approach. The main contributions and novelty of this work can be summarized as follows. 1) By employing sparse regularization, this paper presents a robust low-rank multi-view subspace clustering approach termed LMVSC-Sparse. 2) This paper develops an Alternating Direction Method of Multipliers (ADMM) algorithm to solve the proposed optimization problem. 3) Comprehensive experiments are conducted on benchmark data sets, which demonstrate the advantage of our approach in both efficiency and effectiveness.
The rest of the paper is organized as follows: Section 2 reviews the related model. In Section 3, we propose a novel model, low-rank multi-view subspace clustering based on sparse regularization (LMVSC-Sparse), and its corresponding optimization algorithm. Section 4 reports experiments on benchmark data sets, and Section 5 concludes the paper.
2. Foundational Model
Denote by $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$ a collection of data samples, where $d$ is the feature dimension and $n$ is the number of data samples. Then, traditional subspace clustering based on the self-expression property can be modeled by (1):

$$X = XZ + E, \qquad (1)$$
where $Z \in \mathbb{R}^{n \times n}$ denotes the subspace representation at dictionary $X$ and $E$ denotes the error matrix. The corresponding optimization problem based on the low-rank prior can be given by (2):

$$\min_{Z, E} \; \|Z\|_* + \lambda \|E\|_F^2 \quad \text{s.t.} \quad X = XZ + E, \qquad (2)$$
in which $\|Z\|_*$ denotes the nuclear norm, computed by summing the singular values of $Z$, and $\lambda > 0$ is a balancing parameter. The nuclear norm is widely used to measure the low-rank property. Problem (2) is also called low-rank representation (LRR) [16]. LRR has been successfully applied in dimensionality reduction [17], noise reduction [18], recommendation systems [19], image processing [20], and also clustering and classification [21] [22].
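As a quick illustration, the nuclear norm can be computed directly from the singular values. The following minimal numpy sketch (the function name is ours) mirrors the definition above:

```python
import numpy as np

def nuclear_norm(Z):
    """Nuclear norm ||Z||_*: the sum of the singular values of Z."""
    return np.linalg.svd(Z, compute_uv=False).sum()
```

This agrees with numpy's built-in `np.linalg.norm(Z, 'nuc')`.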
After solving (2), the affinity matrix $W$ can be obtained by

$$W = \frac{1}{2}\left(|Z| + |Z|^\top\right). \qquad (3)$$

Then one can obtain the final clustering results by running a spectral clustering algorithm on $W$.
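This post-processing step can be sketched in numpy as follows (function names are ours, the symmetrization $W = (|Z| + |Z|^\top)/2$ is the commonly used choice, and the final k-means step on the embedding rows is omitted for brevity):

```python
import numpy as np

def affinity_from_representation(Z):
    """Symmetrize the self-expressive representation Z into an
    affinity matrix W = (|Z| + |Z|^T) / 2."""
    A = np.abs(Z)
    return (A + A.T) / 2.0

def spectral_embedding(W, k):
    """Bottom-k eigenvectors of the normalized Laplacian
    L = I - D^{-1/2} W D^{-1/2}; the rows of this n x k embedding are
    then clustered (e.g. by k-means) to produce the final labels."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))  # guard isolated nodes
    L = np.eye(W.shape[0]) - (d_inv_sqrt[:, None] * W) * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return vecs[:, :k]
```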
Problem (2) can be extended to the general problem (4):

$$\min_{Z, E} \; f(Z) + \lambda \|E\|_F^2 \quad \text{s.t.} \quad X = XZ + E, \qquad (4)$$

where the function $f$ denotes a general regularizer.
Now, consider multi-view data samples, denoted by $\{X^{(v)}\}_{v=1}^{m}$, where $X^{(v)} \in \mathbb{R}^{d_v \times n}$ is the $v$th view. The LRR model (2) can be extended to (5):

$$\min_{\{Z^{(v)}, E^{(v)}\}} \; \sum_{v=1}^{m} \left( \|Z^{(v)}\|_* + \lambda \|E^{(v)}\|_F^2 \right) \quad \text{s.t.} \quad X^{(v)} = X^{(v)} Z^{(v)} + E^{(v)}, \; v = 1, \ldots, m. \qquad (5)$$
Recently, [15] further extended (5) to the following (6):

$$\min_{\{U^{(v)}, V^{(v)}\}} \; \sum_{v=1}^{m} \left( \lambda_1 \|V^{(v)}\|_* + \lambda_2 \left\|X^{(v)} - X^{(v)} U^{(v)} (V^{(v)})^\top\right\|_F^2 \right) \quad \text{s.t.} \quad (U^{(v)})^\top U^{(v)} = I, \qquad (6)$$

where $\lambda_1$ and $\lambda_2$ are two positive balancing parameters. Compared to (5), problem (6) factorizes each view-specific representation $Z^{(v)} \in \mathbb{R}^{n \times n}$ into two small matrices $U^{(v)} \in \mathbb{R}^{n \times k}$ and $V^{(v)} \in \mathbb{R}^{n \times k}$ with $k \ll n$, and employs the property $\|Z^{(v)}\|_* = \|V^{(v)}\|_*$ when $U^{(v)}$ has orthogonal columns, i.e., $(U^{(v)})^\top U^{(v)} = I$.
3. Proposed Approach
In this section, we utilize sparse regularization instead of the F-norm regularization in (6). Sparse regularization is often preferred over F-norm regularization in machine learning and signal processing applications due to its unique properties, especially when dealing with high-dimensional data or models with many parameters. Sparse regularization offers two benefits. 1) It promotes sparsity of the solution: it encourages the model to use fewer features or parameters by driving the coefficients of less important features to zero. This is particularly useful for feature selection and for models where interpretability is important, as it highlights which features are most relevant to the prediction. 2) It is effective at preventing overfitting, especially in high-dimensional settings where the number of features greatly exceeds the number of observations. By encouraging the model to concentrate on fewer variables, it reduces the model's complexity and its capacity to fit noise.
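The robustness motivation can be illustrated with a toy scalar example: the squared loss underlying the F-norm is minimized by the mean, which a single outlier drags arbitrarily far, while the absolute loss underlying the L1-norm is minimized by the median, which is barely affected:

```python
import numpy as np

# Toy illustration of outlier sensitivity: fitting a single constant to
# these values under the squared loss gives the mean; under the absolute
# loss it gives the median.
values = np.array([0.1, -0.2, 0.05, 0.15, 10.0])  # one gross outlier
mean_fit = values.mean()        # pulled far toward the outlier
median_fit = np.median(values)  # stays near the inliers
```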
Thus, we replace the F-norm in (6) by the L1-norm, and obtain the Low-Rank Multi-view Subspace Clustering Based on Sparse Regularization (LMVSC-Sparse) model:

$$\min_{\{U^{(v)}, V^{(v)}\}} \; \sum_{v=1}^{m} \left( \lambda_1 \|V^{(v)}\|_* + \lambda_2 \left\|X^{(v)} - X^{(v)} U^{(v)} (V^{(v)})^\top\right\|_1 \right) \quad \text{s.t.} \quad (U^{(v)})^\top U^{(v)} = I. \qquad (7)$$
Based on the framework of ADMM [23], we propose an efficient optimization algorithm to solve the minimization problem above. By introducing auxiliary variables $Z^{(v)}$ with the constraints $Z^{(v)} = U^{(v)} (V^{(v)})^\top$, the corresponding augmented Lagrangian function is formulated as follows:

$$\mathcal{L} = \sum_{v=1}^{m} \Big( \lambda_1 \|V^{(v)}\|_* + \lambda_2 \|X^{(v)} - X^{(v)} Z^{(v)}\|_1 + \langle Y^{(v)}, Z^{(v)} - U^{(v)} (V^{(v)})^\top \rangle + \frac{\mu}{2} \|Z^{(v)} - U^{(v)} (V^{(v)})^\top\|_F^2 \Big), \qquad (8)$$

where $\{Y^{(v)}\}$ represent the Lagrange multipliers and $\mu$ represents the penalty parameter. Apparently, it is not easy to optimize all the variables at the same time. Therefore, we adopt an iterative optimization scheme to update the variables one by one. The corresponding updating steps are shown in what follows.
1) Update the variables $Z^{(v)}$

When fixing the other variables, we can solve the following minimization sub-problem w.r.t. variable $Z^{(v)}$:

$$\min_{Z^{(v)}} \; \lambda_2 \|X^{(v)} - X^{(v)} Z^{(v)}\|_1 + \langle Y^{(v)}, Z^{(v)} - U^{(v)} (V^{(v)})^\top \rangle + \frac{\mu}{2} \|Z^{(v)} - U^{(v)} (V^{(v)})^\top\|_F^2. \qquad (9)$$
This can be rewritten equivalently as:

$$\min_{Z^{(v)}} \; \lambda_2 \|X^{(v)} - X^{(v)} Z^{(v)}\|_1 + \frac{\mu}{2} \left\|Z^{(v)} - \left(U^{(v)} (V^{(v)})^\top - \frac{1}{\mu} Y^{(v)}\right)\right\|_F^2. \qquad (10)$$
By introducing the auxiliary variable $H$, and omitting the view superscript for simplicity, we have:

$$\min_{Z, H} \; \lambda_2 \|H\|_1 + \frac{\mu}{2} \left\|Z - \left(U V^\top - \frac{1}{\mu} Y\right)\right\|_F^2 \quad \text{s.t.} \quad H = X - XZ. \qquad (11)$$
This is a constrained optimization problem. We adopt the half-quadratic splitting (HQS) [24] algorithm for its simplicity and fast convergence. The problem is then solved by the following minimization:

$$\min_{Z, H} \; \lambda_2 \|H\|_1 + \frac{\eta}{2} \|H - (X - XZ)\|_F^2 + \frac{\mu}{2} \left\|Z - \left(U V^\top - \frac{1}{\mu} Y\right)\right\|_F^2, \qquad (12)$$

where $\eta$ is a penalty parameter that forces $X - XZ$ and $H$ to approach the same fixed point. Subsequently, $H$ and $Z$ can be updated by the following two sub-problems.
Sub-problem one:

$$\min_{H} \; \lambda_2 \|H\|_1 + \frac{\eta}{2} \|H - (X - XZ)\|_F^2. \qquad (13)$$

Sub-problem two:

$$\min_{Z} \; \frac{\eta}{2} \|H - (X - XZ)\|_F^2 + \frac{\mu}{2} \left\|Z - \left(U V^\top - \frac{1}{\mu} Y\right)\right\|_F^2. \qquad (14)$$
For the first sub-problem, let $\mathcal{S}_\tau(\cdot)$ denote the shrinkage operator $\mathcal{S}_\tau(x) = \operatorname{sign}(x) \max(|x| - \tau, 0)$, and extend it to matrices by applying it to each element. It is easy to show that the solution of the first sub-problem is given by

$$H = \mathcal{S}_{\lambda_2 / \eta}(X - XZ). \qquad (15)$$
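The shrinkage operator is a one-liner in numpy. The following sketch (function name ours) applies it element-wise, with the $\lambda_2/\eta$ threshold used in the $H$ update above:

```python
import numpy as np

def shrink(x, tau):
    """Elementwise soft-thresholding (shrinkage) operator
    S_tau(x) = sign(x) * max(|x| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

# H update for sub-problem one: H = S_{lam2/eta}(X - X @ Z)
```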
For the second sub-problem, it is equal to the following problem (writing $B = U V^\top - \frac{1}{\mu} Y$):

$$\min_{Z} \; \frac{\eta}{2} \|X - XZ - H\|_F^2 + \frac{\mu}{2} \|Z - B\|_F^2. \qquad (16)$$

Its solution can be obtained by setting the derivative of the above sub-problem to zero:

$$\eta X^\top (XZ + H - X) + \mu (Z - B) = 0. \qquad (17)$$

Then

$$Z = \left(\eta X^\top X + \mu I\right)^{-1} \left(\eta X^\top (X - H) + \mu B\right). \qquad (18)$$
Let $C = \eta X^\top (X - H) + \mu B$; we have

$$Z = \left(\eta X^\top X + \mu I\right)^{-1} C. \qquad (19)$$

By the Sherman-Morrison-Woodbury equation, depending on the size of $X$, $Z$ can also be rewritten as

$$Z = \frac{1}{\mu} \left( C - X^\top \left(\frac{\mu}{\eta} I + X X^\top\right)^{-1} X C \right), \qquad (20)$$

which only requires inverting a $d \times d$ matrix and is therefore cheaper when $d < n$.
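The two equivalent solves can be sketched in numpy as follows (function names ours); the Woodbury route replaces the $n \times n$ system by a $d \times d$ one:

```python
import numpy as np

def solve_z_direct(X, C, eta, mu):
    """Solve (eta * X^T X + mu * I) Z = C directly: an n x n system."""
    n = X.shape[1]
    return np.linalg.solve(eta * X.T @ X + mu * np.eye(n), C)

def solve_z_woodbury(X, C, eta, mu):
    """Same solve via Sherman-Morrison-Woodbury:
    (eta X^T X + mu I)^{-1} = (1/mu)(I - X^T (mu/eta I + X X^T)^{-1} X),
    which only inverts a d x d matrix (cheaper when d < n)."""
    d = X.shape[0]
    M = np.linalg.solve((mu / eta) * np.eye(d) + X @ X.T, X @ C)
    return (C - X.T @ M) / mu
```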
2) Update rule for the variables $U^{(v)}$

When fixing the other variables, we can solve the following minimization sub-problem w.r.t. variable $U^{(v)}$ (again omitting the superscript):

$$\min_{U} \; \frac{\mu}{2} \left\|Z - U V^\top + \frac{1}{\mu} Y\right\|_F^2 \quad \text{s.t.} \quad U^\top U = I. \qquad (21)$$

This constrained problem can be further reduced to the form as follows:

$$\max_{U} \; \operatorname{tr}(U^\top A) \quad \text{s.t.} \quad U^\top U = I, \qquad (22)$$

where $A = \left(Z + \frac{1}{\mu} Y\right) V$. Before solving (22), we need the following lemma:
Lemma 1. For any matrix $A$, suppose the singular value decomposition (SVD) of matrix $A$ is $A = S \Sigma D^\top$. Then the following constrained problem

$$\max_{Q} \; \operatorname{tr}(Q^\top A) \quad \text{s.t.} \quad Q^\top Q = I \qquad (23)$$

has the closed-form solution:

$$Q = S D^\top. \qquad (24)$$
Based on Lemma 1, by performing the SVD of the matrix $A = \left(Z + \frac{1}{\mu} Y\right) V$ as $A = S \Sigma D^\top$, the solution for (22) can be achieved by:

$$U = S D^\top. \qquad (25)$$
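Lemma 1 is the classical orthogonal Procrustes solution, a one-liner via the thin SVD (function name ours):

```python
import numpy as np

def orthogonal_procrustes(A):
    """Closed-form maximizer of tr(Q^T A) subject to Q^T Q = I:
    if A = S Sigma D^T is the thin SVD, the solution is Q = S D^T."""
    S, _, Dt = np.linalg.svd(A, full_matrices=False)
    return S @ Dt
```

The resulting $Q$ always has orthonormal columns, and attains the objective value $\operatorname{tr}(\Sigma)$, the sum of singular values of $A$.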
3) Update rule for the variables $V^{(v)}$

When fixing the other variables, we obtain the problem (26):

$$\min_{V} \; \lambda_1 \|V\|_* + \frac{\mu}{2} \left\|Z - U V^\top + \frac{1}{\mu} Y\right\|_F^2. \qquad (26)$$
Before solving (26), we need the following lemma [25]:

Lemma 2. For a given matrix $F$ and a positive parameter $\tau$, the optimal solution to the following problem

$$\min_{V} \; \tau \|V\|_* + \frac{1}{2} \|V - F\|_F^2 \qquad (27)$$

is given by

$$V = P \, \Omega_\tau(\Sigma) \, Q^\top, \qquad (28)$$

where $F = P \Sigma Q^\top$ is the SVD decomposition of matrix $F$. Meanwhile, $\Omega_\tau(\Sigma)$ is defined as follows:

$$\Omega_\tau(\Sigma) = \operatorname{diag}\left(\max(\sigma_1 - \tau, 0), \ldots, \max(\sigma_r - \tau, 0)\right), \qquad (29)$$

where $\sigma_1, \ldots, \sigma_r$ are the singular values of $F$.
Based on Lemma 2, by setting $F = \left(Z + \frac{1}{\mu} Y\right)^\top U$ and $\tau = \lambda_1 / \mu$ (this reduction is valid since $U^\top U = I$), the closed-form solution for the variable $V$ is shown as follows:

$$V = P \, \Omega_{\lambda_1 / \mu}(\Sigma) \, Q^\top, \qquad (30)$$

where $P \Sigma Q^\top$ represents the SVD decomposition of

$$F = \left(Z + \frac{1}{\mu} Y\right)^\top U. \qquad (31)$$
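Lemma 2 is the standard singular value thresholding (SVT) operator, which soft-thresholds the singular values. A minimal numpy sketch (function name ours):

```python
import numpy as np

def svt(F, tau):
    """Singular value thresholding: soft-threshold the singular values
    of F, giving the minimizer of tau*||V||_* + 0.5*||V - F||_F^2."""
    P, sigma, Qt = np.linalg.svd(F, full_matrices=False)
    return P @ np.diag(np.maximum(sigma - tau, 0.0)) @ Qt
```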
The final affinity matrix $S$ can be obtained as follows:

$$S = \frac{1}{2m} \sum_{v=1}^{m} \left( |Z^{(v)}| + |Z^{(v)}|^\top \right), \qquad (32)$$

where $Z^{(v)} = U^{(v)} (V^{(v)})^\top$.
In a nutshell, the detailed optimization process for LMVSC-Sparse is summarized in Algorithm 1.
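Algorithm 1 is not reproduced here; as a rough illustration, a single-view numpy sketch of the update cycle derived above ($Z$ via HQS, $U$ via Lemma 1, $V$ via Lemma 2, followed by standard multiplier and penalty updates) might look as follows. The function name, the initialization, and the $\mu$ schedule are our assumptions, not the authors' exact implementation:

```python
import numpy as np

def lmvsc_sparse_single_view(X, k, lam1=1.0, lam2=1.0, eta=10.0,
                             mu=1.0, rho=1.1, n_iter=30):
    """Illustrative single-view LMVSC-Sparse ADMM cycle:
    Z (representation), U (orthonormal dictionary), V (latent factor)."""
    d, n = X.shape
    Z = np.zeros((n, n))
    U = np.eye(n, k)              # orthonormal columns
    V = np.zeros((n, k))
    Y = np.zeros((n, n))          # Lagrange multiplier for Z = U V^T
    for _ in range(n_iter):
        # --- Z step (HQS): soft-threshold the residual, then a ridge solve
        R = X - X @ Z
        H = np.sign(R) * np.maximum(np.abs(R) - lam2 / eta, 0.0)
        B = U @ V.T - Y / mu
        C = eta * X.T @ (X - H) + mu * B
        Z = np.linalg.solve(eta * X.T @ X + mu * np.eye(n), C)
        # --- U step: orthogonal Procrustes on A = (Z + Y/mu) V (Lemma 1)
        S, _, Dt = np.linalg.svd((Z + Y / mu) @ V, full_matrices=False)
        U = S @ Dt
        # --- V step: singular value thresholding of (Z + Y/mu)^T U (Lemma 2)
        P, s, Qt = np.linalg.svd((Z + Y / mu).T @ U, full_matrices=False)
        V = P @ np.diag(np.maximum(s - lam1 / mu, 0.0)) @ Qt
        # --- standard ADMM housekeeping (our assumption): multiplier/penalty
        Y = Y + mu * (Z - U @ V.T)
        mu = min(rho * mu, 1e6)
    return Z, U, V
```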
4. Experiments
The proposed algorithm is compared with four state-of-the-art clustering algorithms, namely, Facilitated Low-rank Multi-view Subspace Clustering (FLMSC) [15], Scalable Multi-view Subspace Clustering with Unified Anchors (SMVSC) [26], Large-Scale Multi-View Subspace Clustering (LMVSC) [3], and Graph-based Multi-view Clustering (GMC) [27].
4.1. Data and Metrics
To verify performance, the BBC data set [28] is used for clustering. It contains 2225 documents over 5 annotated topics. In the experiments, we use a sampled subset of the original BBC data set consisting of 685 documents and four different views, with 4659, 4633, 4665, and 4684 features in each view, respectively.
For the evaluation metrics, F-score, Normalized Mutual Information (NMI), Accuracy (ACC), and Adjusted Rand index (AR) are employed. The F-score, also known as the F1-score or F-measure, considers both precision and recall to compute the score. NMI evaluates the similarity between two clusterings of a data set; it measures the mutual dependence between two variables, in this case the clustering assignments produced by different methods, and it handles varying cluster sizes and shapes well. ACC is a common evaluation metric used in clustering when ground-truth labels are available; it measures the proportion of data points that are correctly assigned to their true clusters and is popular for its simplicity. AR assesses the similarity between two clusterings by considering all pairs of samples and counting pairs that are assigned to the same or different clusters in the predicted and true clusterings; it adjusts the raw Rand index to account for the similarity expected by chance.
4.2. Comparison with the State of the Art
In the first experiment, noise is added to 1% of the elements in the BBC data set, with the noise level varying from 0 to 0.5. We then compare the F-score, NMI, ACC, and AR of the various algorithms. The results are shown in Figures 1-4.
Figure 1. The F-score values at different noise level.
Figure 2. The NMI values at different noise level.
Figure 3. The ACC values at different noise level.
Figure 4. The AR values at different noise level.
From Figure 1, one might conclude that LMVSC-Sparse is the most robust method in the presence of noise, maintaining a high F-score across all tested noise levels. In contrast, GMC is the least effective method in terms of F-score, regardless of the noise level. The other methods show varying degrees of decline in their F-scores as the noise level increases, suggesting that they are more sensitive to noise than LMVSC-Sparse. These observations could be useful for selecting a method for applications where data is expected to have a certain level of noise. LMVSC-Sparse might be preferable in environments where noise is unavoidable or difficult to control.
In Figure 2, the overall trend indicates that all methods suffer a decline in clustering performance as noise increases, but to varying degrees. LMVSC-Sparse appears to be the most robust against noise, maintaining a high NMI throughout. GMC is markedly affected by noise, with a significant decrease in NMI as the noise level rises. For applications where maintaining clustering quality in the presence of noise is important, LMVSC-Sparse would likely be the preferred choice based on this data. The other methods may still be considered, but their performance will depend on the acceptable threshold for NMI in the context of the specific application. Figure 3 and Figure 4 show the similar trend as Figure 1 and Figure 2.
Also, the mean running times of algorithms are summarized in Table 1. From this table, we can conclude that, LMVSC is the fastest algorithm across all noise levels, making it suitable for applications where running time is critical. LMVSC-Sparse is the slowest, which might be a trade-off for its robust performance in terms of F-score, NMI, ACC and AR, as indicated in the previous figures. Choosing the right algorithm would depend on the balance between accuracy (as measured by F-score, NMI, ACC and AR) and efficiency (as measured by running time), alongside the specific requirements of the application or task at hand.
In the second experiment, noise with level 0.1 is added to 1%, 5%, 10%, 20%, and 50% of the elements in the BBC data set. We then compare the F-score, NMI, ACC, and AR of the various algorithms. The results are shown in Figures 5-8. From these figures, we can conclude that:
1) All algorithms show a decline in F-score with increasing sparsity levels, with LMVSC-Sparse and FLMSC being the least affected. GMC’s performance drops significantly and remains low across sparsity levels.
2) For NMI again, all algorithms show a decline as sparsity increases, with LMVSC-Sparse showing the least impact. GMC performs poorly at higher sparsity levels.
3) For ACC, we see a sharp decline for all methods as sparsity increases, with LMVSC-Sparse being the most robust but still affected.
4) For AR, all algorithms experience a drop as sparsity increases. LMVSC-Sparse and FLMSC tend to have better robustness compared to the others.
4.3. Parameter Sensitivity Analysis
Figure 9 shows a series of heatmaps that represent a parameter analysis for different evaluation metrics and rank bounds. Each heatmap corresponds to a combination of the two balancing parameters $\lambda_1$ and $\lambda_2$. The colors in each heatmap represent different values of the metric being evaluated, with brighter colors typically indicating better performance. A breakdown of the analysis follows:
Table 1. The running times compare (Time unit: s).
Figure 5. The F-score values at different sparsity level.
Figure 6. The NMI values at different sparsity level.
Figure 7. The ACC values at different sparsity level.
Figure 8. The AR values at different sparsity level.
In summary, the performance of all algorithms degrades with increasing sparsity, which is expected as sparser data tends to have less information for the algorithms to leverage in the clustering process. LMVSC-Sparse seems to be the most robust across all evaluated metrics, maintaining higher values than the others as sparsity increases.
1) There is a consistent trend where the F-score, NMI, and AR appear to be more sensitive to the second parameter across rank bounds.
Figure 9. Parameter analysis: the first row shows F-score, the second row NMI, the third row ACC, and the fourth row AR; the first column corresponds to rank bound 50, the second column to 100, and the third column to 200.
2) For all metrics, there are specific parameter combinations that yield high values, indicating optimal regions for each rank bound setting.
3) The first parameter appears to have a less significant impact on the F-score and NMI compared to ACC and AR.
4) The optimal regions for high values seem to shift and become less extensive as the rank bound increases, which may indicate that models with higher complexity (larger rank bounds) do not necessarily perform better and can be harder to tune.
In conclusion, these heatmaps can be used to identify the optimal parameter settings for each rank bound and metric. It is important to balance model complexity with the ability to tune the parameters effectively, as overly complex models may not yield better performance and can be more challenging to optimize.
5. Conclusion
This paper introduced an innovative Low-Rank Multi-view Subspace Clustering based on Sparse Regularization (LMVSC-Sparse) method. LMVSC-Sparse incorporated sparse regularization to mitigate the impact of outliers, thus enhancing the robustness of the clustering process. The developed ADMM algorithm efficiently solved the optimization problem, ensuring both effectiveness and efficiency, as evidenced by the experimental results on benchmark datasets. The performance of LMVSC-Sparse, particularly in noisy and sparse conditions, demonstrated its superiority over other state-of-the-art MVSC methods. This robustness is critical for practical applications in fields such as image analysis, bioinformatics, and social network analysis, where data often contain noise and come from diverse sources. The results of this work not only further the understanding of multi-view clustering dynamics but also open avenues for future research in optimizing clustering methods for complex, real-world datasets.
Acknowledgements
This work was supported partly by the National Natural Science Foundation of China under Grant No. 62276137.