1. Introduction
Classification is an important research field in machine learning, in which the support vector classifier is a common and important tool. Many excellent support vector classifiers have been designed. For example, the proximal support vector machine (PSVM) [1] considers the relationship between samples within a class, so that congener samples exhibit an aggregation effect. Subsequently, a great number of vector classifiers were developed on the basis of the PSVM classifier, see [2] [3] [4] [5] [6]. Moreover, different from PSVM with its single model, the twin support vector machine (TSVM) [7] designs two relatively small SVM-type models, so that samples can be quickly classified by a pair of nonparallel separating planes. Inspired by TSVM, a lot of twin-type vector classifiers have also been developed, see [8] [9] [10] [11]. For more innovative designs of support vector classifiers, see [12] [13] [14] [15]. However, with the progress of modern science and technology, the data to be processed are often expressed as matrices rather than vectors in applications. When vector classifiers are used on matrix data, the input matrices have to be reshaped into vectors, which destroys the structure of rows or columns and thus loses important classification information unique to matrix data. Additionally, reshaping a matrix inevitably stacks its rows or columns, resulting in a high-dimensional vector, which is not conducive to large-scale testing in matrix classification tasks.
Realizing the above problems, many works have extended the classification strategy of support vector machines to matrix space and obtained excellent results on matrix classification problems. For example, L. Luo et al. proposed the low rank support matrix machine (LRSMM) [16] on the basis of the soft-margin support vector machine (Soft-SVM) [17], which considers the correlation of rows and columns of matrix samples by using the nuclear norm as the convex approximation of the matrix rank. X. Gao et al. proposed the linear twin multiple rank support matrix machine (LTMRSMM) [18] on the basis of TSVM [7], which considers the practical situation that matrix data are of multiple rank. Q. Zheng et al. proposed the robust support matrix machine [19] on the basis of the bilinear support vector machine [20], which considers the spatial-temporal structural information of the input matrices and thus can eliminate non-standard noise in samples. For more cutting-edge works, see [21] [22] [23] [24] [25]. The designs of these SMM-type models provide a new idea for solving matrix classification problems.
In this paper, to generalize the classification method of PSVM [1] to matrix space, we propose the proximal support matrix machine (PSMM) on the basis of PSVM [1] and LRSMM [16]; the novel model is given in (4) of Section 4, where $X_i$ and $y_i$ are the given matrix sample and category label for each $i$, and $C > 0$ and $\tau > 0$ are two given penalty parameters. In design, our model absorbs the advantages of PSVM and LRSMM, comprehensively considering both the relationship between samples within a class and the structure of rows and columns of matrix samples. Meanwhile, since the novel model is essentially a convex problem, we establish an alternating direction method of multipliers (ADMM) algorithm [26] to solve the PSMM model. Finally, we mainly focus on image classification, which is a very significant matrix classification problem. We conduct a series of comparative experiments on the MNIST digit database [27], the MIT face database, the INRIA person database [28], the students face database [29] and the Japanese female facial expression database [30]. The final experimental results show that our method performs better than PSVM [1], TSVM [7], LRSMM [16] and LTMRSMM [18] in demanding image classification tasks.
The remainder of this paper is arranged as follows. In Section 2, we give the notations and lemma used in our paper. In Section 3, we recall the models of PSVM [1] and LRSMM [16] related to our novel model. In Section 4, the PSMM model is proposed and then an ADMM algorithm is established to solve it. In Section 5, some experimental analyses are presented to verify the performance of the PSMM classifier. In Section 6, our work is summarized.
2. Preliminaries
For convenience, in this section we give the notation and the lemma that will be used throughout this paper.
For a given matrix $A \in \mathbb{R}^{m \times n}$ with $\operatorname{rank}(A) = r$ and singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$, we define the singular value decomposition of the matrix $A$ by
$$A = U \Sigma V^{\top},$$   (1)
where $U \in \mathbb{R}^{m \times r}$, $V \in \mathbb{R}^{n \times r}$, and $\Sigma \in \mathbb{R}^{r \times r}$ is a diagonal matrix with diagonal entries $\sigma_1, \ldots, \sigma_r$. The Frobenius norm of the matrix $A$ is denoted as $\|A\|_F$, and the nuclear norm of $A$ is denoted as $\|A\|_* = \sum_{i=1}^{r} \sigma_i$, whose subdifferential is denoted by $\partial\|A\|_*$. We reshape $A$ into a vector $a \in \mathbb{R}^{mn}$ with the traditional and common matrix reconstruction approach of stacking the rows (or columns) of $A$ one after another, and define the Frobenius norm of the vector by $\|a\|_F = \sqrt{a^{\top} a}$, which equals $\|A\|_F$.
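As a small numerical illustration of this reshaping convention (the stacking order is left implicit in the text, so row-stacking is assumed here), the Frobenius norm is unchanged by the operation:

```python
import numpy as np

# A small matrix sample and its reshaped (row-stacked) vector counterpart.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
a = A.reshape(-1)                 # use A.T.reshape(-1) for column-stacking instead

# The Frobenius norm of the matrix equals the Euclidean norm of the reshaped vector.
print(np.linalg.norm(A, 'fro'))   # 9.5394...
print(np.linalg.norm(a))          # identical value
```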
Lemma 1. (see [31], Theorem 2.1) Given a matrix $A \in \mathbb{R}^{m \times n}$ with $\operatorname{rank}(A) = r$ and the singular value decomposition as defined in (1). For $\tau > 0$, the following problem
$$\min_{X \in \mathbb{R}^{m \times n}} \ \tau\|X\|_* + \frac{1}{2}\|X - A\|_F^2$$
has a unique closed-form solution, denoted by $\mathcal{D}_{\tau}(A)$, as follows:
$$\mathcal{D}_{\tau}(A) = U \Sigma_{\tau} V^{\top},$$
where $\Sigma_{\tau}$ is a diagonal matrix with $(\sigma_1 - \tau)_+, \ldots, (\sigma_r - \tau)_+$ as diagonal entries and $(t)_+ = \max\{t, 0\}$ for any $t \in \mathbb{R}$.
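To make Lemma 1 concrete, the following short sketch evaluates the closed-form solution numerically; the helper name svt is ours, but the computation is a direct transcription of the thresholding operator in [31]:

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding: the closed-form minimizer in Lemma 1."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)        # (sigma_i - tau)_+
    return U @ np.diag(s_shrunk) @ Vt

# Example: shrinking the singular values of a random 5 x 4 matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 4))
X = svt(A, tau=0.5)
print(np.linalg.svd(X, compute_uv=False))      # singular values reduced by 0.5 (floored at 0)
```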
In this paper, we are given a matrix data set $T = \{(X_1, y_1), (X_2, y_2), \ldots, (X_l, y_l)\}$ with $X_i \in \mathbb{R}^{m \times n}$ and $y_i \in \{-1, +1\}$, and denote the label vector as $y = (y_1, y_2, \ldots, y_l)^{\top}$. For the convenience of calculation, we define a linear mapping $\Phi : \mathbb{R}^{m \times n} \to \mathbb{R}^{l}$ by $\Phi(W) = \left(\langle X_1, W\rangle, \langle X_2, W\rangle, \ldots, \langle X_l, W\rangle\right)^{\top}$, where $\langle X, W\rangle = \operatorname{tr}\left(X^{\top} W\right)$, and denote $\odot$ as the Hadamard product between two matrices. Additionally, we also denote the $n$-order identity matrix as $I_n$ and define the all-one vector $e = (1, 1, \ldots, 1)^{\top}$.
3. Related Work
The design inspiration of PSMM proposed in this paper mainly comes from PSVM [1] and LRSMM [16]. In order to show the rationality and advantages of the PSMM model in design, we review and analyze the models of PSVM [1] and LRSMM [16] in this section.
3.1. Proximal Support Vector Machine (PSVM)
Different from the Soft-SVM model [17], the PSVM model is described in the space $\mathbb{R}^{mn+1}$ of $(w, b)$. Additionally, PSVM defines a pair of proximal hyperplanes $w^{\top} x + b = \pm 1$ and constrains samples of the same class to lie as close to the corresponding proximal plane as possible. This is because the higher the fitting degree of the proximal planes to the samples, the more obvious the boundary between the two types of samples, and thus the better the classification effect of the classification hyperplane. The strategy of PSVM can be presented as the following model; for details, see [1]:
$$\min_{w \in \mathbb{R}^{mn},\, b \in \mathbb{R}} \ \frac{1}{2}\left(\|w\|_F^2 + b^2\right) + \frac{C}{2}\sum_{i=1}^{l}\left[y_i\left(w^{\top} x_i + b\right) - 1\right]^2,$$   (2)
where $x_i \in \mathbb{R}^{mn}$ for each $i$ is the reconstructed vector sample, $(w, b)$ is the decision variable and $C > 0$ is a given penalty parameter. A new input $X \in \mathbb{R}^{m \times n}$ has to be reshaped into a vector $x \in \mathbb{R}^{mn}$, and then it can be assigned by the decision function $f(x) = \operatorname{sign}\left(w^{\top} x + b\right)$, where $\operatorname{sign}(\cdot)$ is the symbolic function.
Compared with Soft-SVM [17], PSVM [1] considers the relationship between samples within a class: samples of the same class are required to be close together, so that congener samples exhibit an aggregation effect. The advantage of this consideration is that the distribution trend of congener samples can be grasped, which gives PSVM good performance in prediction. Meanwhile, the PSVM model (2) is essentially an unconstrained quadratic programming problem, so the cost of the PSVM classifier is lower than that of the Soft-SVM classifier. Thus, PSVM has more advantages than Soft-SVM in large-database classification; for details, see [1].
3.2. Low Rank Support Matrix Machine (LRSMM)
Different from vector data, each matrix datum has an identifying intrinsic structure, which reveals significant category information in matrix classification. As an important piece of structural information, the correlation of rows or columns of a matrix is closely related to the rank of the regression matrix $W$. Meanwhile, imposing a low-rank constraint on $W$ is an effective measure for filtering out the redundant information of samples in applications. However, the matrix rank minimization problem is a typical NP-hard problem. Realizing this difficulty, and inspired by the idea that the nuclear norm is the best convex approximation of the matrix rank, LRSMM was developed on the basis of Soft-SVM [17], which achieves the low-rank target by minimizing the nuclear norm of $W$. The LRSMM model is presented as follows; for details, see [16]:
$$\min_{W \in \mathbb{R}^{m \times n},\, b \in \mathbb{R}} \ \frac{1}{2}\operatorname{tr}\left(W^{\top} W\right) + \tau\|W\|_* + C \sum_{i=1}^{l}\left\{1 - y_i\left[\operatorname{tr}\left(W^{\top} X_i\right) + b\right]\right\}_+,$$   (3)
where $(W, b)$ are the decision variables, and $C > 0$ and $\tau > 0$ are two given penalty parameters. A new input $X \in \mathbb{R}^{m \times n}$ can be assigned by the decision function $f(X) = \operatorname{sign}\left(\operatorname{tr}\left(W^{\top} X\right) + b\right)$.
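For illustration, the two decision rules just described reduce to a few lines of NumPy (the helper names are ours, and w, b and W, b denote already trained parameters):

```python
import numpy as np

def predict_psvm(w, b, X):
    """PSVM prediction: reshape the matrix sample into a vector first."""
    x = X.reshape(-1)                    # destroys the row/column structure
    return np.sign(w @ x + b)

def predict_smm(W, b, X):
    """Matrix-based prediction: tr(W^T X) + b, computed without reshaping."""
    return np.sign(np.sum(W * X) + b)    # tr(W^T X) equals the sum of the Hadamard product
```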
4. Proximal Support Matrix Machine (PSMM)
In this section, we propose a novel matrix classification method, PSMM, on the basis of PSVM [1] and LRSMM [16], which generalizes the classification method of PSVM to matrix space and also inherits the design of LRSMM of constraining the regression matrix to be low-rank. The formulation of our novel model is given as follows:
(4)
where $(W, b)$ is the decision variable, and $C > 0$ and $\tau > 0$ are two given penalty parameters. In order to take the relationship between congener samples into consideration, PSMM also defines a pair of proximal hyperplanes $\langle X, W\rangle + b = \pm 1$, and the Euclidean distance between the $i$th sample and the corresponding proximal hyperplane is measured. In (4), we achieve the target that samples of the same class gather near the same proximal plane by minimizing the sum of these Euclidean distances. Additionally, to meet the need that matrix samples should be low-rank in application, the nuclear norm of the regression matrix $W$ is introduced into our objective function, inspired by LRSMM [16].
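For concreteness, one natural way to write such a synthesis of the squared proximal loss in (2) and the nuclear-norm term in (3) is the following sketch, in which the placement of the constants and the regularization of the bias $b$ are our assumptions rather than a reproduction of (4):

$$\min_{W \in \mathbb{R}^{m \times n},\, b \in \mathbb{R}} \ \frac{1}{2}\left(\|W\|_F^2 + b^2\right) + \tau\|W\|_* + \frac{C}{2}\sum_{i=1}^{l}\left[y_i\left(\langle X_i, W\rangle + b\right) - 1\right]^2 .$$

Under a formulation of this type, the squared terms pull every sample toward the proximal plane of its own class, while the nuclear norm plays the low-rank role inherited from LRSMM.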
The design of the PSMM model (4) has many advantages. On the one hand, since the nuclear norm of $W$ is the only nonsmooth part of the PSMM model, the PSMM algorithm presented below is simpler than the LRSMM algorithm [16] in framework design. And, in the following experiments, it can be found that the cost of the PSMM classifier is much lower than that of the LRSMM classifier. On the other hand, compared with PSVM [1], PSMM takes the intrinsic structure of matrix data into account, so that the performance of the PSMM classifier is better than that of the PSVM classifier in the following matrix classification tasks. Thus, to a certain degree, the novel model (4) is a synthesis of the PSVM model (2) and the LRSMM model (3).
4.1. Model Solution
Since the PSMM model (4) is a convex problem, we can devise an ADMM-type algorithm [26] to solve the proposed novel model. Firstly, the PSMM model (4) can be equivalently written in the following form:
(5)
And, its corresponding augmented Lagrangian function is given as follows:
(6)
where $\Lambda$ is the Lagrangian multiplier of the problem (5) and $\rho > 0$ is a given parameter of the penalty term. Consequently, we can take advantage of the augmented Lagrangian function (6) to establish the iterative framework of the PSMM algorithm, which is based on the framework of ADMM. Given the state variables at iteration $k$, the proposed framework for updating the variable status is given as follows:
(7)
(8)
(9)
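For orientation, the updates (7)-(9) instantiate the generic two-block ADMM scheme of [26]: for a problem $\min_{x, z}\, f(x) + g(z)$ subject to $Ax + Bz = c$, with augmented Lagrangian $L_{\rho}$ and multiplier $\Lambda$, the scheme reads (the identification of $x$ and $z$ with the concrete blocks of (5) is assumed here):

$$x^{k+1} = \arg\min_{x}\, L_{\rho}\left(x, z^{k}, \Lambda^{k}\right), \qquad z^{k+1} = \arg\min_{z}\, L_{\rho}\left(x^{k+1}, z, \Lambda^{k}\right), \qquad \Lambda^{k+1} = \Lambda^{k} + \rho\left(Ax^{k+1} + Bz^{k+1} - c\right).$$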
Note that we have to solve the two optimization problems (7) and (8) on each update. Thus, it is necessary to determine their explicit solutions, which can reduce the cost of our algorithm to a great extent.
Solving (7) can be transformed into solving the following model:
(10)
That is because the optimal partial variable of (10) is actually the optimal solution of the problem (7). Since the model (10) is essentially a smooth convex optimization problem with only equality constraints, we can solve it via its dual problem. The Lagrange function of (10) is given as follows:
where $\alpha$ is the Lagrange multiplier of (10). For each $k$, we define the following matrices:
Consequently, the Karush-Kuhn-Tucker (KKT) conditions for the model (10) are given as follows:
(11)
According to duality theory and (11), after simple calculation, we can get the dual problem of (10) as follows:
(12)
where the coefficient matrix of (12) has the following expression:
and the optimal solution of (12) is denoted by $\alpha^{k+1}$. Since the dual problem (12) is essentially an unconstrained quadratic programming problem, by the optimality theorem, we find that solving (12) is equivalent to solving the following system:
i.e.,
(13)
If the coefficient matrix of (13) is invertible, then the optimal solution $\alpha^{k+1}$ of (12) has the following explicit expression:
Therefore, by the KKT system (11), the optimal partial variable of (10) has the following explicit expression:
which means that the iterative format (7) can be written in a more concise form:
Additionally, the problem (8) can be equivalently written as follows:
(14)
By Lemma 1, we can deduce the closed-form solution of (14), so that the iterative format (8) has a more straightforward expression, as follows:
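As an illustration of how closed-form updates of this kind can be organized into one loop, the following rough Python sketch implements ADMM for a PSMM-like model of the form sketched after (4), using the splitting $W = Z$ to carry the nuclear norm and the svt helper from Section 2; the splitting, the variable names and the dense $(W, b)$-solve are our assumptions and not a transcription of the paper's algorithm:

```python
import numpy as np

def admm_psmm(X_list, y, C=1.0, tau=1.0, rho=1.0, max_iter=200, tol=1e-4):
    """Rough ADMM sketch for a PSMM-like model (assumptions described in the text):

        min_{W,b,Z}  0.5*(||W||_F^2 + b^2) + tau*||Z||_*
                     + (C/2) * sum_i (<X_i, W> + b - y_i)^2    s.t.  W = Z.
    """
    l = len(X_list)
    m, n = X_list[0].shape
    D = np.stack([X.reshape(-1) for X in X_list])       # l x (m*n) matrix of vectorized samples
    y = np.asarray(y, dtype=float)

    # Coefficient matrix of the (W, b)-subproblem; it is fixed, so it is built once.
    p = m * n
    K = np.zeros((p + 1, p + 1))
    K[:p, :p] = (1.0 + rho) * np.eye(p) + C * (D.T @ D)
    K[:p, p] = C * D.sum(axis=0)
    K[p, :p] = C * D.sum(axis=0)
    K[p, p] = 1.0 + C * l

    Z = np.zeros((m, n))
    Lam = np.zeros((m, n))
    for _ in range(max_iter):
        # (W, b)-update: minimize the smooth quadratic part, i.e. solve a linear system.
        v = (Z - Lam / rho).reshape(-1)
        rhs = np.concatenate([C * (D.T @ y) + rho * v, [C * y.sum()]])
        sol = np.linalg.solve(K, rhs)
        W, b = sol[:p].reshape(m, n), sol[p]

        # Z-update: singular value thresholding (Lemma 1) with threshold tau / rho.
        Z_old = Z
        Z = svt(W + Lam / rho, tau / rho)

        # Multiplier update (cf. (9)) and a simple check on the primal/dual residuals.
        Lam = Lam + rho * (W - Z)
        if np.linalg.norm(W - Z) < tol and rho * np.linalg.norm(Z - Z_old) < tol:
            break
    return W, b
```

Note that the linear system here has dimension $mn + 1$; solving a smaller dual system instead, which appears to be the motivation for deriving the dual problem (12)-(13), is preferable when $mn$ is large.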
4.2. Convergence
Since the model (5) is a convex problem with only equality constraints, the convergence of the PSMM iterative framework can be guaranteed by [26] [32], so that we can obtain the following results:
Definition 1. For given parameters, we say a point is a proximal stationary (P-stationary) point of (5) if there exists a Lagrangian multiplier such that
(15a)
(15b)
(15c)
(15d)
Theorem 2. Suppose that a point is the limit point of the sequence generated by the PSMM iterative framework; then it is a P-stationary point and thus a locally optimal solution to the problem (5).
Remark. Since the update (7) minimizes the augmented Lagrangian (6) with respect to its own block of variables, we have the following relations:
(16)
(17)
Similarly, since the update (8) minimizes the augmented Lagrangian (6) with respect to the remaining block, we also have the following relation:
(18)
1) It is not difficult to find that the iteration formula (9) always produces a gap between the two primal blocks of state variables at iteration $k+1$, so that there is always a residual between the current iterate and the locally optimal point on each update. Meanwhile, from the perspective of feasibility, the locally optimal point of (5) must meet the feasibility condition (15d); nevertheless, the two primal blocks are generally unequal on each update. To this end, we call their gap the primal residual at iteration $k+1$. Additionally, combining (9) and (16), we can obtain the following equality:
which means that there always exists a residual for the condition (15a) on each iteration. We call this quantity the dual residual at iteration $k+1$.
2) Conversely, by (17), it is easy to know that the iterates always satisfy the condition (15b) at iteration $k+1$. Similarly, combining (9) and (18), we can obtain the following equality:
which means that the iterates also always satisfy the condition (15c) at iteration $k+1$.
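For reference, in the generic two-block notation used after (9), the primal and dual residuals and the tolerance rule recommended in [26] take the form (the identification with the concrete variables of (5) is again assumed):

$$r^{k+1} = Ax^{k+1} + Bz^{k+1} - c, \qquad s^{k+1} = \rho A^{\top} B\left(z^{k+1} - z^{k}\right),$$

$$\varepsilon^{\mathrm{pri}} = \sqrt{p}\,\varepsilon^{\mathrm{abs}} + \varepsilon^{\mathrm{rel}} \max\left\{\|Ax^{k+1}\|, \|Bz^{k+1}\|, \|c\|\right\}, \qquad \varepsilon^{\mathrm{dual}} = \sqrt{d}\,\varepsilon^{\mathrm{abs}} + \varepsilon^{\mathrm{rel}}\left\|A^{\top}\Lambda^{k+1}\right\|,$$

where $p$ and $d$ are the dimensions of $c$ and $x$, respectively, and the iterations stop once $\|r^{k+1}\| \le \varepsilon^{\mathrm{pri}}$ and $\|s^{k+1}\| \le \varepsilon^{\mathrm{dual}}$.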
4.3. Stopping Criterion
Based on the above analysis, we know that the PSMM algorithm can obtain a locally optimal solution when the following conditions are satisfied:
Therefore, inspired by [26], we set a stopping criterion under which the decision variables stop updating and are output once the primal and dual residuals fall below the corresponding tolerances. The tolerances can be predetermined by the following criterion:
where $\varepsilon^{\mathrm{abs}}$ and $\varepsilon^{\mathrm{rel}}$ are absolute and relative tolerances, respectively, whose settings depend on the application; for details, see [26]. Furthermore, the specific procedure of the PSMM algorithm is shown as follows:
5. Experiments
To verify the binary classification effect of our method, we conduct an experimental analysis of the PSMM classifier in this section. Five image data sets are selected, including the MNIST digit database [27], the MIT face database, the INRIA person database [28], the students face database [29] and the Japanese female facial expression database (JAFFE) [30]. We conduct comparative experiments on these data sets against PSVM [1], TSVM [7], LRSMM [16] and LTMRSMM [18]. We summarize the main information of the selected data sets in Table 1. Some pictures from these databases are shown in Figures 1-5. All experiments are implemented in Matlab R2019a on a workstation with an AMD A10-7300 Radeon R6 processor (1.90 GHz, 10 Compute Cores 4C+6G), 4 GB RAM, and a 64-bit Windows Server 2009 system.
Table 1. Summary of five image data sets.
Figure 1. Some pictures from the MNIST digit database.
Figure 2. Some pictures from MIT face database.
Figure 3. Some pictures from the students face database.
5.1. Classification on the MNIST Digit Database
The MNIST digit database [27] comes from the National Institute of Standards and Technology and includes handwritten characters from zero to nine. Due to
Figure 4. Some pictures from the INRIA person database.
Figure 5. Some pictures from the JAFFE database.
the differences in the writing habits of volunteers, different digits may have similar contours, resulting in wrong judgments. The reason we chose this database is to distinguish handwritten numerals with little difference in contour.
Through the observation of a large number of handwritten character samples, we found five digit pairs in which the two numbers of each pair are similar in morphology: (0, 6), (1, 7), (2, 7), (3, 8) and (5, 6). Their comparisons are shown in Figure 6. We conducted binary classification experiments on the two numbers in each digit pair. The experimental results show that PSMM performs better than PSVM [1], TSVM [7], LRSMM [16] and LTMRSMM [18] in distinguishing handwritten digits with similar contours; for details, see Table 2.
Figure 6. Comparison pictures of handwritten digits with similar contours.
Table 2. Experimental results on the MNIST digit database.
5.2. Classification on Portrait Database
The results on the MNIST digit database show that the novel classifier can distinguish two categories with little difference between them. This prompted us to further apply it to more complex portrait classification tasks, such as face recognition, gender judgment, pedestrian detection and emotion recognition, which require classifiers to have the ability to distinguish details. To test the performance of the PSMM classifier in these complex tasks, we selected the following four portrait databases.
1) The MIT face database is affiliated with the Massachusetts Institute of Technology and has a total of 7087 samples, including 2706 face samples and 4381 non-face samples. All images are grayscale images stored in BMP format, with a width and height of 20 pixels. We chose it for face recognition experiments, i.e., judging whether there is a face in a picture.
2) The INRIA person database [28] was collected to detect whether people are present in an image; it includes 2416 pedestrian pictures and 12,180 landscape pictures. All images are color images stored in JPG or PNG format, and we normalized the samples into 61 × 128 gray-level matrices to unify the specifications of the samples.
3) The students face database [29] contains 400 photos of medical students at Stanford University, consisting of 200 males and 200 females. All images are grayscale images stored in JPG format, with a width and height of 200 pixels. In gender judgment, hair style can be a disturbance, because girls may have short hair and boys may have long hair. Therefore, this database further tests the ability of classifiers to distinguish facial details.
4) The Japanese female facial expression database [30] has 213 facial expression images, which cover 7 facial expressions posed by 10 women. The seven expressions are afraid, surprised, happy, sad, angry, disgusted and neutral. All images are grayscale images stored in TIFF format, with a width and height of 256 pixels. We conducted binary classification experiments between each emotion and the rest to observe the sensitivity of the PSMM classifier to facial expression differences.
Through a large number of preliminary experiments, we have determined the appropriate training set size of each database. And, the stability and progressiveness of the novel algorithm have been verified by properly increasing the number of test samples and massive repeated experiments, for details, see Tables 3-6.
Table 3. Experimental results on MIT face database.
Table 4. Experimental results on INRIA person database.
Table 5. Experimental results on the students face database.
5.3. Parameters Selection
The penalty parameter C affects the fitting degree of the proximal planes to the samples. If the value of C is too small, the prediction ability of the proximal planes will be lost, resulting in underfitting. Conversely, a value of C that is too large will produce overfitting, which leads to the poor generalization ability of
Table 6. Experimental results on the JAFFE database.
Figure 7. Effects of different C on the performance of PSMM classifier in the task of judging the existence of human face.
Figure 8. Effects of different C on the performance of PSMM classifier in the task of judging the existence of pedestrian.
Figure 9. Effects of different C on the performance of PSMM classifier in the task of judging the gender of students.
Figure 10. Effects of different C on the performance of PSMM classifier in the task of distinguishing afraid expression.
Figure 11. Effects of different C on the performance of PSMM classifier in the task of distinguishing surprised expression.
Figure 12. Effects of different C on the performance of PSMM classifier in the task of distinguishing happy expression.
Figure 13. Effects of different C on the performance of PSMM classifier in the task of distinguishing sad expression.
Figure 14. Effects of different C on the performance of PSMM classifier in the task of distinguishing angry expression.
Figure 15. Effects of different C on the performance of PSMM classifier in the task of distinguishing disgusted expression.
Figure 16. Effects of different C on the performance of PSMM classifier in the task of distinguishing neutral expression.
our model. Thus, predetermining the value of C is of vital importance for the PSMM classifier. To observe the influence of different C on the classification accuracy of the novel model, we select C from {1 × 10^{-3}, 2.5 × 10^{-3}, 5 × 10^{-3}, 1 × 10^{-2}, 2.5 × 10^{-2}, 5 × 10^{-2}, ..., 1 × 10^{1}, 2.5 × 10^{1}, 5 × 10^{1}} and fix the values of the other parameters. The results on the different data sets are shown in Figures 7-16.
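As a minimal sketch of this selection procedure, the loop below scores each candidate C on a held-out validation split; the training routine admm_psmm refers to the illustrative sketch in Section 4.1 and stands in for the actual PSMM solver:

```python
import numpy as np

# Candidate values of C, following the grid used in the text.
C_grid = [c * 10.0 ** e for e in range(-3, 2) for c in (1.0, 2.5, 5.0)]

def select_C(X_train, y_train, X_val, y_val, tau=1.0):
    """Return the C with the highest validation accuracy; other parameters stay fixed."""
    best_C, best_acc = None, -np.inf
    for C in C_grid:
        W, b = admm_psmm(X_train, y_train, C=C, tau=tau)   # illustrative trainer from Section 4.1
        preds = np.array([np.sign(np.sum(W * X) + b) for X in X_val])
        acc = np.mean(preds == np.asarray(y_val))
        if acc > best_acc:
            best_C, best_acc = C, acc
    return best_C, best_acc
```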
6. Conclusion
In this paper, a novel SMM-type method called PSMM is proposed for matrix classification problems, which absorbs the advantages of PSVM and LRSMM. In design, the novel method considers both the relationship between samples within a class and the structure of rows or columns of matrix data, which gives PSMM good properties for meeting the challenges of complex image classification problems. Finally, to verify the performance of our design, we conduct a large number of comparative experiments. It can be seen from the experimental results that PSMM performs better than PSVM, TSVM, LRSMM and LTMRSMM in demanding image classification tasks. Moreover, since we only considered the linear binary classification situation and the selection of the penalty parameter C in this paper, there are still some relevant topics worthy of in-depth study in the future, for example, how to extend the PSMM algorithm to the multi-class situation, how to introduce an appropriate kernel function to create a nonlinear version of PSMM, and how to select a stable parameter combination. We will take these topics as our future directions.
Acknowledgements
We sincerely thank Professor Yulan Liu for the support and guidance of this research, and gratefully acknowledge Guangdong University of Technology for providing a good learning platform.