Weighted Low-Rank Tensor Representation for Multi-View Subspace Clustering

Multi-view clustering has been studied in depth because it can capture both the compatible and the complementary information among views. Recently, low-rank tensor representation-based methods have effectively improved clustering performance by exploring high-order correlations between multiple views. However, most of them express the low-rank structure of the self-representation tensor by the sum of the nuclear norms of its unfolded matrices, which may cause a loss of information in the tensor structure. In addition, the amount of effective information differs across views, so it is unreasonable to treat their contributions to clustering equally. To address these issues, we propose a novel weighted low-rank tensor representation (WLRTR) method for multi-view subspace clustering, which encodes the low-rank structure of the representation tensor through Tucker decomposition and weights the core tensor to retain the main information of the views. An iterative algorithm is designed under the augmented Lagrangian method framework to solve the WLRTR model. Numerical studies on four real databases demonstrate that WLRTR is superior to eight state-of-the-art clustering methods.


INTRODUCTION
The advance of information technology has unleashed a multi-view feature deluge, which allows data to be described by multiple views. For example, an article can be expressed in multiple languages; an image can be characterized by colors, edges, and textures. Multi-view features not only contain compatible information, but also provide complementary information, which boosts the performance of data analysis. Recently, [1] applied multi-view binary learning to obtain supplementary information from multiple views, and [2] proposed a kernelized multi-view subspace analysis method via self-weighted learning. Due to the lack of labels, clustering using multiple views has become a popular research direction [3].
A large number of clustering methods have been developed over the past several decades. The most classic is the k-means method [4][5][6]. However, it cannot guarantee clustering accuracy since it relies on distances in the original feature space, which are easily affected by outliers and noise. Many researchers have pointed out that subspace clustering can effectively overcome this problem. As a promising technique, subspace clustering aims to find clusters within different subspaces under the assumption that each data point can be represented as a linear combination of the other samples [7]. Subspace clustering-based methods can be roughly divided into three types: matrix factorization methods [8][9][10][11], statistical methods [12], and spectral clustering methods [7,13,14]. Matrix factorization-based subspace clustering methods perform low-rank matrix factorization on the data matrix to achieve clustering, but they are only suitable for noise-free data matrices and thus lack generality. Although statistical subspace clustering methods can explicitly deal with the influence of outliers and noise, their clustering performance is affected by the number of subspaces, which hinders their practical application. At present, spectral clustering-based subspace clustering methods are widely used because they handle high-dimensional data with noise and outliers well. Two representative examples are sparse subspace clustering (SSC) [13] and low-rank representation (LRR) [7], which obtain a sparse or low-rank linear representation of the dataset, respectively. When encountering multi-view features, SSC and LRR cannot well discover the high correlation among them. To overcome this limitation, Xia et al. [15] applied LRR to multi-view clustering to learn a low-rank transition probability matrix as the input of the standard Markov chain clustering method.
Taking the different types of noise in samples into account, Najafi et al. [16] combined low-rank approximation with error learning to eliminate noise and outliers. The work in [17] used low-rank and sparse constraints for multi-view clustering simultaneously. One common limitation of the above methods is that they only capture the pairwise correlation between different views. Considering the possible high-order correlation of multiple views, Zhang et al. [3] proposed a low-rank tensor constraint-regularized multi-view subspace clustering method. The study in [18], inspired by [3], introduced a hyper-Laplacian constraint to preserve the geometric structure of the data. Compared with most matrix-based methods [15,17], tensor-based multi-view clustering methods have achieved satisfactory results, which demonstrates that the high-order correlation of the data is indispensable. The above methods impose the low-rank constraint on the constructed self-representation tensor through the nuclear norms of its unfolding matrices. Unfortunately, this rank-sum tensor norm lacks a clear physical meaning for a general tensor [19].
In this paper, we propose the weighted low-rank tensor representation (WLRTR) method for multi-view subspace clustering. Similar to the above tensor-based methods [3,18], WLRTR stacks the self-representation matrices of all views into a representation tensor and then applies a low-rank constraint on it to capture the high-order correlation among multiple views. Different from them, we exploit the classic Tucker decomposition to encode the low-rank property, which decomposes the representation tensor into one core tensor and three factor matrices. Considering that the information contained in different views may partially differ, and that the complementary information between views contributes differently to clustering, the proposed WLRTR treats the singular values unequally to improve the discriminative capability. The main contributions of this paper are summarized as follows: (1) We propose a weighted low-rank tensor representation (WLRTR) method for multi-view subspace clustering, in which all representation matrices are stacked into a representation tensor with two spatial modes and one view mode. (2) Tucker decomposition is used to calculate the core tensor of the representation tensor, and low-rank constraints are applied to capture the high-order correlation among multiple views and remove redundant information. WLRTR assigns different weights to the singular values in the core tensor so that they are treated unequally. (3) Based on the augmented Lagrangian multiplier method, we design an iterative algorithm to solve the proposed WLRTR model, and conduct experiments on four challenging databases to verify the superiority of WLRTR over eight state-of-the-art single-view and multi-view clustering methods.
The remainder of this paper is organized as follows. Section 2 summarizes the notations, basic definitions and related content of subspace clustering involved in this paper. In Section 3, we introduce the proposed WLRTR model, and design an iterative algorithm to solve it. Extensive experiments and model analysis are reported in Section 4. The conclusion of this paper is summarized in Section 5.

RELATED WORKS
In this section, we introduce the notations and basic definitions used throughout this paper, as well as the framework of subspace clustering methods.

Notations
For a third-order tensor, we use a bold calligraphic letter (e.g., X). Matrices and vectors are represented by upper-case letters (e.g., X) and lower-case letters (e.g., x), respectively. The elements of a tensor and a matrix are denoted X_ijk and x_ij, respectively. The l_{2,1} norm of a matrix X is defined as ||X||_{2,1} = Σ_j ||x_j||_2, i.e., the sum of the l_2 norms of its columns. The Tucker decomposition of a third-order tensor X ∈ R^{n_1×n_2×n_3} is X = S ×_1 U_1 ×_2 U_2 ×_3 U_3, where S ∈ R^{n_1×n_2×n_3} is the core tensor and U_i ∈ R^{n_i×n_i} (i = 1, 2, 3) are the orthogonal factor matrices. The Tucker decomposition will be exploited to depict the low-rank property of the representation tensor.
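To make the Tucker machinery concrete, the following is a minimal higher-order SVD (HOSVD) sketch in NumPy. The helper names (`unfold`, `fold`, `mode_product`, `hosvd`) are ours for illustration, not from the paper:

```python
import numpy as np

def unfold(T, mode):
    # Mode-n unfolding: move axis `mode` to the front and flatten the rest.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def fold(M, mode, shape):
    # Inverse of `unfold` for a tensor with the given full shape.
    full = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape(full), 0, mode)

def mode_product(T, U, mode):
    # Mode-n product T ×_n U: multiply the mode-n unfolding by U and refold.
    M = U @ unfold(T, mode)
    shape = list(T.shape)
    shape[mode] = U.shape[0]
    return fold(M, mode, shape)

def hosvd(T):
    # Higher-order SVD: factor U_i from the SVD of each unfolding,
    # core tensor S = T ×_1 U_1^T ×_2 U_2^T ×_3 U_3^T.
    Us = [np.linalg.svd(unfold(T, m), full_matrices=False)[0]
          for m in range(T.ndim)]
    S = T.copy()
    for m, U in enumerate(Us):
        S = mode_product(S, U.T, m)
    return S, Us
```

Multiplying the core back by the factors, T = S ×_1 U_1 ×_2 U_2 ×_3 U_3, recovers the original tensor exactly, which is the decomposition the representation tensor is constrained with.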

Subspace Clustering
Subspace clustering is an effective method for clustering high-dimensional data. It divides the original feature space into several subspaces and then imposes constraints on each subspace to construct the similarity matrix. Suppose X = [x_1, x_2, ..., x_n] ∈ R^{d×n} is a feature matrix with n samples, where d is the dimension of each sample. The subspace clustering model based on LRR is expressed as follows:

min_{Z,E} ||Z||_* + λ||E||_{2,1}, s.t. X = XZ + E,

where ||Z||_* denotes the nuclear norm of Z (the sum of all singular values of Z). This model achieves promising clustering performance because the self-representation matrix Z captures the correlation between samples, from which the final similarity matrix C = (|Z| + |Z^T|)/2 is conveniently obtained. However, the above LRR method is only suitable for single-view clustering. For data with V views, the effective single-view clustering method is usually extended to multi-view clustering:

min_{Z^(v),E^(v)} Σ_{v=1}^{V} ( ||Z^(v)||_* + λ||E^(v)||_{2,1} ), s.t. X^(v) = X^(v)Z^(v) + E^(v), v = 1, ..., V.

The LRR-based multi-view method not only improves the accuracy of clustering, but also detects outliers from multiple angles [20]. However, as the number of feature views increases, the above models inevitably suffer from information loss when fusing high-dimensional data views. It is therefore urgent to explore more efficient clustering methods.
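Two building blocks of typical LRR-style solvers can be sketched directly: singular value thresholding, the proximal operator of the nuclear norm, and the construction of the affinity matrix C = (|Z| + |Z^T|)/2. The function names below are our own illustration, not the paper's code:

```python
import numpy as np

def svt(M, tau):
    # Singular value thresholding: the proximal operator of tau * ||.||_*,
    # i.e., shrink every singular value of M by tau.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (U * s) @ Vt

def similarity_from_Z(Z):
    # Symmetric affinity matrix C = (|Z| + |Z^T|) / 2 that is then fed
    # to a standard spectral clustering routine.
    return (np.abs(Z) + np.abs(Z.T)) / 2.0
```

The resulting C is nonnegative and symmetric, as required by spectral clustering.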

WEIGHTED LOW-RANK TENSOR REPRESENTATION MODEL
In this section, we first introduce an existing tensor-based multi-view clustering method, and then propose the novel weighted low-rank tensor representation (WLRTR) method. Finally, WLRTR is solved by the augmented Lagrangian multiplier (ALM) method.

Model Formulation
In order to make full use of the compatible and complementary information among multiple views, Zhang et al. [3] used LRR to perform tensor-based multi-view clustering. The main process is to stack the self-representation matrix of each view as a frontal slice of a third-order representation tensor, on which a low-rank constraint is imposed. The whole model is formulated as follows:

min_{Z^(v),E^(v)} ||Z||_* + λ||E||_{2,1}, s.t. X^(v) = X^(v)Z^(v) + E^(v), Z = Φ(Z^(1), ..., Z^(V)), E = [E^(1); ...; E^(V)],

where Φ(·) stacks the self-representation matrices into the representation tensor, and the tensor nuclear norm is directly extended from the matrix nuclear norm: ||Z||_* = Σ_m ξ_m ||Z_(m)||_*, with ξ_m > 0. However, this rank-sum tensor norm lacks a clear physical meaning for a general tensor [19]. In addition, the meaningful information contained in each view is not completely equal, so it is unreasonable to penalize the singular values of Z_(m) in the tensor nuclear norm with the same weight. To overcome these limitations, we propose a novel weighted low-rank tensor representation (WLRTR) method, which uses Tucker decomposition to simplify the calculation of the tensor nuclear norm and assigns different weights to the core tensor to exploit the main information in different views. The proposed WLRTR is formulated as:

min_{Z,S,U_i,E} ||Z − S ×_1 U_1 ×_2 U_2 ×_3 U_3||_F^2 + α||ω ⊙ S||_1 + β||E||_{2,1},
s.t. X^(v) = X^(v)Z^(v) + E^(v), Z = Φ(Z^(1), ..., Z^(V)), E = [E^(1); ...; E^(V)], U_i^T U_i = I,

where ω = c/(|σ(Z)| + ϵ), c and ϵ are constants, and α and β are two nonnegative parameters. The WLRTR model consists of three parts: the first term obtains the core tensor through Tucker decomposition; the second term weights the core tensor to preserve the main feature information and uses the l_1 norm to encode the low-rank structure of the self-representation tensor; since the errors are sample-specific, the third term uses the l_{2,1} norm to encourage column sparsity and eliminate noise and outliers.

Optimization of WLRTR
Correspondingly, by introducing an auxiliary tensor Y with Z = Y, the augmented Lagrangian function of the constrained model in Eq. 5 is obtained as

L = ||Z − S ×_1 U_1 ×_2 U_2 ×_3 U_3||_F^2 + α||ω ⊙ S||_1 + β||E||_{2,1} + Σ_v ( ⟨Θ^(v), X^(v) − X^(v)Y^(v) − E^(v)⟩ + (ρ/2)||X^(v) − X^(v)Y^(v) − E^(v)||_F^2 ) + ⟨Π, Z − Y⟩ + (ρ/2)||Z − Y||_F^2,

where Θ and Π are the Lagrange multipliers and ρ > 0 is the penalty parameter. Each variable is then updated iteratively while fixing the others. The detailed iteration procedure is as follows.

Update self-representation tensor Z: when the other variables are fixed, Z is updated by solving

min_Z ||Z − S ×_1 U_1 ×_2 U_2 ×_3 U_3||_F^2 + ⟨Π, Z − Y⟩ + (ρ/2)||Z − Y||_F^2.

By setting the derivative of Eq. 7 with respect to Z to zero, we have

Z* = (2 S ×_1 U_1 ×_2 U_2 ×_3 U_3 + ρY − Π) / (2 + ρ).

Update auxiliary variable Y^(v): updating Y^(v) with the remaining variables fixed is equivalent to optimizing

min_{Y^(v)} ⟨Θ^(v), X^(v) − X^(v)Y^(v) − E^(v)⟩ + (ρ/2)||X^(v) − X^(v)Y^(v) − E^(v)||_F^2 + ⟨Π^(v), Z^(v) − Y^(v)⟩ + (ρ/2)||Z^(v) − Y^(v)||_F^2.

The closed form of Y^(v)* can be calculated by setting the derivative of Eq. 9 to zero:

Y^(v)* = (X^(v)T X^(v) + I)^{−1} ( X^(v)T (X^(v) − E^(v) + Θ^(v)/ρ) + Z^(v) + Π^(v)/ρ ).

Update core tensor S: by fixing the other variables, the subproblem of updating S can be written as

min_S ||Z − S ×_1 U_1 ×_2 U_2 ×_3 U_3||_F^2 + α||ω ⊙ S||_1.

According to [21], Eq. 11 can be rewritten as

min_S ||O − S||_F^2 + α||ω ⊙ S||_1, where O = Z ×_1 U_1^T ×_2 U_2^T ×_3 U_3^T.

The closed-form solution S* is

S* = sign(O) ⊙ max(|O| − ωα/2, 0).

Frontiers in Physics | www.frontiersin.org January 2021 | Volume 8 | Article 618224
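The S-update is an elementwise weighted soft-thresholding of the back-projected tensor O. A minimal sketch of this step (function names are ours), including the reweighting rule ω = c/(|σ| + ϵ) that shrinks large singular values less:

```python
import numpy as np

def weighted_soft_threshold(O, omega, alpha):
    # Closed-form core-tensor update: S* = sign(O) ⊙ max(|O| − ω·α/2, 0),
    # applied elementwise with an elementwise weight tensor omega.
    return np.sign(O) * np.maximum(np.abs(O) - omega * alpha / 2.0, 0.0)

def reweight(sigma, c=0.1, eps=1e-6):
    # ω = c / (|σ| + ε): entries carrying large energy (large |σ|)
    # receive a small weight and are therefore penalized less.
    return c / (np.abs(sigma) + eps)
```

Because the weights depend on the current singular values, they are recomputed once per outer iteration before this proximal step.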
Update error matrix E: similar to the subproblems of Z, Y^(v) and S, the subproblem of E is expressed as

min_E (β/ρ)||E||_{2,1} + (1/2)||F − E||_F^2,

where F is the matrix that vertically concatenates the matrices X^(v) − X^(v)Y^(v) + (1/ρ)Θ^(v) along the column. The j-th column of the optimal solution E* can be obtained by

E*_{:,j} = max(1 − (β/ρ)/||F_{:,j}||_2, 0) · F_{:,j}.

Update Lagrangian multipliers Θ, Π and penalty parameter ρ: the Lagrangian multipliers Θ, Π and the penalty parameter ρ are updated by

Θ^(v) = Θ^(v) + ρ(X^(v) − X^(v)Y^(v) − E^(v)), Π = Π + ρ(Z − Y), ρ = min(λρ, ρ_max),

where λ > 0 facilitates the convergence speed [22]. In order to increase ρ, we set λ = 1.5; ρ_max is the maximum value of the penalty parameter ρ. The WLRTR algorithm is summarized in Algorithm 1.
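The E-update is the standard column-wise shrinkage that realizes the proximal operator of the l_{2,1} norm. A self-contained sketch (helper name is ours):

```python
import numpy as np

def l21_column_shrink(F, tau):
    # Proximal operator of tau * ||E||_{2,1}: each column f_j of F is
    # scaled toward zero by max(1 - tau / ||f_j||_2, 0), so columns with
    # small energy (likely noise/outliers) are zeroed out entirely.
    norms = np.linalg.norm(F, axis=0)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return F * scale
```

With tau = β/ρ this reproduces the column-wise update of E* given above.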

EXPERIMENTAL RESULTS
In this section, we conduct experiments on four real databases and compare WLRTR with eight state-of-the-art methods to verify its effectiveness. In addition, we report a detailed analysis of the parameter selection and convergence behavior of the proposed WLRTR method.

Experimental Settings
(1) Datasets: We evaluate the performance of WLRTR on three categories of databases: news stories (BBC4view, BBCSport), face images (ORL), and handwritten digits (UCI-3views). BBC4view contains 685 documents and BBCSport consists of 544 documents, both belonging to 5 clusters; we use 4 and 2 views to construct the multi-view data, respectively. ORL includes 400 face images with 40 clusters. We use 3 features for clustering on the ORL database, i.e., 4096d (dimension, d) intensity, 3304d LBP, and 6750d Gabor. UCI-3views includes 2000 instances with 10 clusters. For the UCI-3views database, we adopt the 240d Fourier coefficients, the 76d pixel averages, and the 6d morphological features to construct 3 views. Some examples of ORL and UCI-3views are shown in Figure 1. Table 1 summarizes the statistics of these four databases. (2) Compared methods: We compare WLRTR with eight state-of-the-art methods, including three single-view clustering methods and five multi-view clustering methods. Single-view clustering methods: SSC [13], LRR [7] and LSR [23], which use the l_1 norm, the nuclear norm, and least squares regression to learn a self-representation matrix, respectively. Multi-view clustering methods: RMSC [15]: RMSC utilizes low-rank and sparse matrix decomposition to learn the shared transition probability matrix; LT-MSC [3]: LT-MSC is the first tensor-based multi-view clustering method, using a tensor nuclear norm constraint to learn a representation tensor; MLAN [24]: MLAN performs clustering and local structure learning with adaptive neighbors simultaneously; GMC [25]: GMC is a graph-based multi-view clustering method; AWP [26]: AWP performs multi-view clustering via Adaptively Weighted Procrustes; SMSC [27]: SMSC uses non-negative embedding and spectral embedding for multi-view clustering.
(3) Evaluation metrics: We adopt six popular clustering metrics, namely accuracy (ACC), normalized mutual information (NMI), adjusted Rand index (AR), F-score, Precision, and Recall, to comprehensively evaluate the clustering performance. The closer the values of all evaluation metrics are to 1, the better the clustering results. We run 10 trials for each experiment and report the average performance.
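ACC is commonly computed by finding the best one-to-one mapping between predicted and ground-truth cluster labels with the Hungarian algorithm; a sketch of this standard computation (our own helper, assuming labels are nonnegative integers) using `scipy.optimize.linear_sum_assignment`:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    # Build the contingency matrix, then find the label permutation that
    # maximizes the number of matched samples (Hungarian algorithm).
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = int(max(y_true.max(), y_pred.max())) + 1
    cost = np.zeros((k, k), dtype=int)
    for t, p in zip(y_true, y_pred):
        cost[t, p] += 1
    row, col = linear_sum_assignment(-cost)  # negate to maximize matches
    return cost[row, col].sum() / y_true.size
```

The permutation step is what distinguishes ACC from plain label agreement, since cluster indices are arbitrary.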

Experimental Results
Tables 2-5 report the clustering performance of all comparison methods on the four databases. The best results are highlighted in bold and the second-best results are underlined. From the four tables, we can draw the following conclusions. Overall, the proposed WLRTR method achieves better or comparable clustering results on all databases relative to all competing methods. Especially on the BBC4view database, WLRTR outperforms all competing methods on all six metrics. On the ACC metric, the proposed WLRTR is higher than all methods on all datasets. In particular, WLRTR shows better results than the single-view clustering methods SSC, LRR, and LSR in most cases, because multi-view clustering methods fully capture the complementary information among multiple views. These observations verify the effectiveness of the proposed WLRTR method.
On the ORL database, the proposed WLRTR and LT-MSC methods achieve the best clustering performance among all the comparison methods, which shows that tensor-based clustering methods can well explore the high-order correlation of multi-view features. Compared with LT-MSC, WLRTR improves the ACC, AR, and F-score metrics by 5.1, 2.6, and 1.3%, respectively. The main reason is that WLRTR takes into account the different contribution of each view to the construction of the affinity matrix and assigns weights to retain the important information. Moreover, WLRTR uses Tucker decomposition to impose low-rank constraints on the core tensor instead of directly calculating the matrix-based tensor nuclear norm. On the UCI-3views and BBCSport databases, although MLAN is better than WLRTR on some metrics, its clustering results across databases are unstable, falling below even all single-view clustering methods on the ORL database. In addition, the recently proposed GMC method cannot achieve satisfactory performance on the four databases. The reason may be that graph-based clustering methods such as MLAN and GMC usually construct the affinity matrix from the original features, which are often corrupted by noise and outliers.

Model Analysis
In this section, we conduct the parameter selection and convergence analysis of the proposed WLRTR method.

Parameter Selection
We perform experiments on the ORL and BBCSport databases to investigate the influence of the three parameters α, β, and c on the proposed WLRTR method, where α and β are empirically selected from [0.001, 0.1] and c is selected from [0.01, 0.2]. The influence of α and β on ACC is shown in the first column of Figure 2. After fixing c, we find that WLRTR achieves the best result when α is set to a larger value, which shows that noise has a strong impact on clustering. Similarly, we fix the parameters α and β to analyze the influence of c on ACC. As shown in the second column of Figure 2, on the ORL database ACC reaches its maximum when c = 0.2. On the BBCSport database, ACC peaks at c = 0.04 and decreases as c becomes larger. These results show that c is very important for the weight distribution of the core tensor.

Numerical Convergence
This subsection investigates the numerical convergence of the proposed WLRTR method. Figure 3 shows the iterative error curves on the ORL and BBCSport databases. The iterative error is calculated as the maximum infinity-norm residual of the constraints, max_v ||X^(v) − X^(v)Y^(v) − E^(v)||_∞. It can be seen that the error curves gradually decrease as the number of iterations increases, and the errors are close to 0 after 25 iterations. In addition, the error curves stabilize after only a few fluctuations. These results show that the proposed WLRTR method converges well numerically; similar conclusions can be obtained on the BBC4view and UCI-3views databases.
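A minimal sketch of such a residual-based stopping criterion, assuming the error monitors the reconstruction constraints X^(v) = X^(v)Y^(v) + E^(v) together with the tensor constraint Z = Y (function name is ours):

```python
import numpy as np

def iteration_error(Xs, Ys, Es, Z, Y):
    # Largest infinity-norm residual over the per-view reconstruction
    # constraints and the tensor agreement constraint Z = Y.
    errs = [np.abs(X - X @ Yv - E).max() for X, Yv, E in zip(Xs, Ys, Es)]
    errs.append(np.abs(Z - Y).max())
    return max(errs)
```

The ALM loop stops once this value falls below a small tolerance (e.g., 1e-6) or a maximum iteration count is reached.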

CONCLUSION AND FUTURE WORK
In this paper, we developed a novel clustering method called weighted low-rank tensor representation (WLRTR) for multi-view subspace clustering. The main advantage of WLRTR is that it encodes the low-rank structure of the representation tensor through Tucker decomposition and the l_1 norm, avoiding the error incurred by approximating the tensor nuclear norm with the sum of the nuclear norms of the unfolded matrices, and assigns different weights to the core tensor to exploit the main feature information of different views.