Two-Stage Procrustes Rotation with Sparse Target Matrix and Least Squares Criterion with Regularization and Generalized Weighting

Abstract

Yamashita, N. (2023) Two-Stage Procrustes Rotation with Sparse Target Matrix and Least Squares Criterion with Regularization and Generalized Weighting. Open Journal of Statistics, 13, 264-284. doi: 10.4236/ojs.2023.132014.

1. Introduction

Exploratory factor analysis is a method of multivariate data analysis and a popular psychometric tool that reconstructs the observed correlation structure using a reduced number of common factors underlying the observed variables [1] [2] . In factor analysis, factor rotation is widely used to transform the initial loading matrix into a simple and interpretable one, which makes it easier to name the common factors by abstracting from the names of the variables. In other words, factor rotation is a post-hoc transformation that simplifies a factor loading matrix. Consider the situation where we have an N (objects) × P (variables) data matrix denoted as X, and wish to explain the variation of the P variables by r (< P) latent variables called common factors. Hereafter, X is assumed to be column-wise centered. Using the sample covariance matrix $S={N}^{-1}{X}^{\prime }X$, maximum likelihood factor analysis is commonly formulated as minimizing the following function, which equals the negative log-likelihood under normality up to scaling and additive constants,

$l\left(\Lambda ,\Psi \right)=\text{tr}S{\left(\Lambda {\Lambda }^{\prime }+\Psi \right)}^{-1}-\text{log}|S{\left(\Lambda {\Lambda }^{\prime }+\Psi \right)}^{-1}|$ (1)

over Λ and Ψ subject to some constraints, where Λ (P × r) and Ψ (P × P) denote the loading matrix and the covariance matrix of the unique factors, respectively [1] [3] . A similar but distinct formulation of factor analysis was recently presented [4] [5] , and the indeterminacy described below, on which the proposed procedure is based, also holds in that formulation.

Factor analysis is said to have rotational indeterminacy with respect to nonsingular transformations. This can be shown as follows, using an arbitrary r × r nonsingular matrix U that satisfies $\text{diag}({U}^{\prime }U)={I}_{r}$:

$\Lambda {\Lambda }^{\prime }=\Lambda U{\prime }^{-1}\left(U\prime U\right){U}^{-\text{1}}{\Lambda }^{\prime }={\Lambda }^{\ast }\Phi {\Lambda }^{\ast }\prime$ . (2)

We thus have $l(\Lambda ,\Psi )=l({\Lambda }^{\ast },\Psi )$, where ${\Lambda }^{\ast }=\Lambda {U}^{\prime -1}$, and $\Phi ={U}^{\prime }U$ is the correlation matrix of the transformed factors. Factor rotation aims to transform the initial loading matrix Λ into one with a simple structure [6] by right-multiplying Λ by ${U}^{\prime -1}$.
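The indeterminacy in (2) can be checked numerically. The following is a minimal sketch, assuming NumPy; the matrix sizes and the random draws are illustrative choices and do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_var, r = 6, 2                        # illustrative sizes (assumption)
Lam = rng.normal(size=(n_var, r))      # stand-in for an initial loading matrix

# Random nonsingular U with unit-length columns, so that diag(U'U) = I_r.
U = rng.normal(size=(r, r))
U /= np.linalg.norm(U, axis=0)

Lam_star = Lam @ np.linalg.inv(U.T)    # rotated loadings, Lambda U'^{-1}
Phi = U.T @ U                          # correlation matrix of the rotated factors

# The common part of the covariance model is unchanged by the transformation.
common_before = Lam @ Lam.T
common_after = Lam_star @ Phi @ Lam_star.T
print(np.allclose(common_before, common_after))
print(np.allclose(np.diag(Phi), 1.0))
```

Because the common part is unchanged, any criterion that depends on Λ only through ΛΛ' + Ψ cannot distinguish Λ from Λ*, which is what leaves room for rotation.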

Various methods of factor rotation for obtaining a U that simplifies Λ have been developed over the decades [7] . Among them, this paper focuses on Procrustes rotation, a family of rotational procedures that rotate Λ so as to minimize the difference between $\Lambda {U}^{\prime -1}$ and a predetermined target matrix T that has a simple structure. Because the simple structure is expressed through T, Procrustes rotation can accommodate various types of simple structure, whereas other rotational methods assume a specific type of simplicity. For example, Varimax rotation [8] maximizes the sum of the within-column variances of the squared loadings as a measure of simplicity. The discrepancy between $\Lambda {U}^{\prime -1}$ and T is often defined in the least squares sense, and the Procrustes rotation of Λ toward T minimizes

${f}_{PR}\left(U\right)={‖T-\Lambda U{\prime }^{-1}‖}^{\text{2}}$ (3)

over U [9] [10] . Zhang et al. [11] discussed another benefit of Procrustes rotation; users can incorporate their prior knowledge on factor structure and correlation and thus Procrustes rotation can be viewed as an intermediate between exploratory and confirmatory factor analysis.

The minimization of (3) requires a fixed target matrix T, which expresses the prespecified simple structure to be attained by the rotated Λ. However, in many practical cases, a suitable T is difficult to specify. One can set T by referring to previous research or prior knowledge of the factor structure, but such information is often unavailable. Promax rotation [12] , one of the most common rotational procedures, overcomes the issue of specifying T by referring to the Varimax-rotated loading matrix. Namely, Promax fixes $T=\{{t}_{jl}\}$ as

${t}_{jl}=\left\{{\lambda }_{jl}^{\left(V\right)}×{|{\lambda }_{jl}^{\left(V\right)}|}^{\alpha -\text{1}}\right\}$ (4)

where α is a positive integer and ${\lambda }_{jl}^{\left(V\right)}$ is the (j, l)-th element of the Varimax-rotated Λ. In other words, Promax sharpens the simple structure obtained by Varimax rotation by raising the absolute value of each element to the power α while keeping its sign, and Λ is then rotated to approximate the resulting T. Table 1 shows an example of Promax. The artificial factor loading matrix in Panel B was obtained as a random rotation of the true simple structure in Panel A. It was rotated by Promax; the target matrix used and the rotational result are shown in Panels C and D, respectively. The example indicates that Λ is successfully simplified by approximating the target matrix.
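The construction of the Promax target in (4) can be sketched as follows. This is an illustration only: the function name `promax_target`, the choice α = 4, and the small loading matrix are assumptions made for the example, not values taken from the paper.

```python
import numpy as np

def promax_target(varimax_loadings, alpha=4):
    """Target t_jl = lambda_jl * |lambda_jl|**(alpha - 1), as in Eq. (4)."""
    L = np.asarray(varimax_loadings, dtype=float)
    return L * np.abs(L) ** (alpha - 1)

# Made-up Varimax-rotated loadings (hypothetical values).
L_v = np.array([[0.8, 0.2],
                [0.7, -0.1],
                [0.1, 0.9]])
T = promax_target(L_v, alpha=4)
print(T)
print(np.count_nonzero(T == 0))   # number of exact zeros in the target
```

Small loadings are pushed toward zero and signs are preserved, but no element becomes exactly zero unless the corresponding Varimax loading is already exactly zero.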

The primary issue with Promax is that the suitability of the target matrix depends entirely on the simple structure attained by Varimax. Thus, Λ cannot be simplified by Promax when Varimax fails to simplify Λ, which is often the case in practice. Further, as shown in Panel C of Table 1, the target matrix is often filled with non-zero entries. According to Thurstone’s simple structure [6] [13] , an ideal simple structure should contain several zero elements so as to emphasize the correspondence between factors and variables. Thus, T should be a sparse matrix. In general, a target matrix specified by (4) is not sparse; it contains zero elements only when the Varimax-rotated Λ contains exact zeros, which is rare.

Table 1. Target matrices and rotational results of Promax and Simplimax (k = 7, 15, 11) rotation applied to an artificial loading matrix. Blank cells show elements equal to zero.

A possible strategy to solve the above problem of the target matrix is not to fix the target matrix, but to treat it as an unknown parameter and sequentially update the optimal target matrix and rotation matrix. Kiers [14] proposed a rotation procedure called Simplimax to estimate a sparse target matrix that is considered suitable for simplifying Λ. It is formulated by the minimization of

${f}_{SIMP}\left(U,T\right)={‖T-\Lambda U{\prime }^{-1}‖}^{\text{2}}$ (5)

over U and T subject to $\text{diag}({U}^{\prime }U)={I}_{r}$. In Simplimax, to avoid the trivial solution $T=\Lambda {U}^{\prime -1}$ and to obtain a sparse T, the following constraint is imposed on T:

$\text{card}\left(T\right)=Pr-k$ (6)

where card(T) denotes the cardinality of T, that is, the number of its non-zero elements, and k is a positive integer satisfying 0 < k < Pr. In other words, T is constrained to have k zero elements, and thus Λ is rotated toward a sparse target matrix. In Simplimax, two steps are repeated until the decrease in the objective function value becomes negligible: 1) the estimation of U that minimizes ${f}_{SIMP}(U,T)$ with T kept fixed, and 2) the estimation of T that minimizes the same function with U kept fixed. Recently, Yamashita and Adachi [13] improved on Simplimax by proposing a procedure that estimates the target matrix under a modified version of Thurstone’s simple structure.

Table 1 also shows the target matrix in Simplimax with k = 11 (Panel E) and its rotational result (Panel F). It can be seen that T has 11 zero elements shown as blank cells because of the constraint.
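The paper does not spell out the update of T in Simplimax. For a fixed U, however, the minimizer of (5) under the constraint (6) follows directly: an element of T set to zero costs the square of the corresponding rotated loading, so the k smallest loadings in absolute value are zeroed and the rest are copied. A minimal sketch under that reading, with a hypothetical function name and made-up numbers:

```python
import numpy as np

def simplimax_target(rotated, k):
    """Zero the k smallest |loadings|; copy the others (T-step for fixed U)."""
    A = np.asarray(rotated, dtype=float)
    T = A.copy()
    # Positions, in the flattened matrix, of the k smallest absolute loadings.
    smallest = np.argsort(np.abs(A), axis=None)[:k]
    T.flat[smallest] = 0.0
    return T

A = np.array([[0.9, 0.05],
              [-0.1, 0.8],
              [0.3, -0.7]])      # hypothetical rotated loadings
T = simplimax_target(A, k=3)
print(T)
print(np.count_nonzero(T))      # equals P*r - k when A has no exact zeros
```

Ties among equal absolute values are broken arbitrarily by the sort in this sketch.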

A well-known problem with Simplimax is its computational inefficiency: it is empirically known that the minimization of ${f}_{SIMP}(U,T)$ under the constraint (6) has a considerable number of local minima [15] [16] . Therefore, Simplimax’s algorithm must be started from many initial values to avoid local minima, which increases the computational load. For instance, in the example shown above, the optimization algorithm was started from 100 random initial values, and the final solution was the one attaining the smallest value of ${f}_{SIMP}(U,T)$ among the 100 solutions obtained. None of the other 99 solutions was equivalent to the final solution. In other words, the percentage of local minima in this application was 99%. Note that the equivalence of two solutions is defined in the section Simulation Study.

Further, Simplimax is said to be sensitive to the specification of the cardinality of the target matrix, that is, the hyperparameter k. In the example in Table 1, the correct specification is k = 11, since the true simple structure has seven non-zero elements, and the loading matrix is simplified under this setting. However, when the cardinality is misspecified, the attained simplicity is poor: Simplimax with k = 15 and k = 7 could not recover the true simple structure. Their rotational results are shown in Panels G and H of Table 1, respectively. In practical cases, the true simple structure and its cardinality are often unknown. Thus, misspecification often occurs, leading to poor performance in simplifying the loading matrix.

The paper further addresses an issue with the use of the target matrix that is common to the existing Procrustes rotation procedures. From the perspective of facilitating the interpretation of factor loadings, it may not always be appropriate to minimize the difference between a target matrix and the rotated factor loading matrix through Equation (3). To illustrate this, given the target matrix T and the initial loading matrix Λ shown in Figure 1, consider estimating the U that minimizes the objective function ${\| T-\Lambda {U}^{\prime -1}\| }^{2}$. The target matrix has 0.5 as its (2, 1)-th element and 0.0 as its (3, 3)-th element. U should therefore be estimated so that the (2, 1)-th element of the rotated Λ is close to the corresponding element 0.5 in T, while the (3, 3)-th element of the rotated Λ is close to the corresponding 0.0 in T. In other words, the minimization considered here involves two types of approximation: approximations that bring non-zeros closer to non-zeros, and approximations that bring non-zeros closer to zeros. The former is labeled approximation A, and the latter approximation B, in the figure. For ease of interpretation of the rotated factors, it is generally desirable that the rotated Λ be close to sparse. Therefore, more emphasis should be placed on approximation B, that is, the approximation that brings the non-zeros in Λ closer to the zeros in T. However, in the objective function (3), the two types of approximation are treated as equivalent. In the above example, matching the (2, 1)-th element of Λ to the corresponding element of T and matching the (3, 3)-th element of Λ to the corresponding element of T are evaluated identically in (3), as reductions in the function value of ${(0.5-0.1)}^{2}=0.16$ and ${(0.0-0.4)}^{2}=0.16$, respectively.

Figure 1. Examples of the ordinary least squares (upper panel) and the least squares with generalized weighting (lower panel) applied to an artificial rotation problem.

The problems of Procrustes rotation, including Promax and Simplimax, described so far can be summarized in three points: first, it is difficult to estimate a sparse target matrix in a computationally efficient way; second, Simplimax works well only when the cardinality of the target matrix is correctly specified; third, the ordinary least squares criterion does not emphasize the approximation of non-zero elements to zero elements in the target.

This article proposes a new factor rotation method, called two-stage Procrustes rotation, that improves on Procrustes rotation by addressing the above three problems. The proposed method consists of the following two stages. In the first stage, a sparse target matrix is estimated by minimizing the objective function (3), augmented with a regularization term on T, over both U and T, without imposing a cardinality constraint such as the one used in Simplimax. In this study, the Lasso penalty is used as the regularization term; the first stage minimizes

${f}_{STG1}\left(U,T\right)={‖T-\Lambda U{\prime }^{-1}‖}^{\text{2}}+\lambda {\sum }_{j,l}|{t}_{jl}|$ (7)

where λ > 0 is a tuning parameter that controls the strength of the regularization [17] . The regularization shrinks some elements of T to exactly zero, and therefore the estimated T is a sparse matrix. The minimization of (7) allows the sparse target matrix to be obtained at a lower computational cost than Simplimax. In the second stage, using the T and U estimated in the first stage, we estimate the rotation matrix V that minimizes

${f}_{STG2}\left(V\right)={‖\left(T-\Lambda U{\prime }^{-\text{1}}V{\prime }^{-\text{1}}\right)\cdot W‖}^{\text{2}}$ (8)

subject to the constraint $\text{diag}({V}^{\prime }V)={I}_{r}$, and rotate Λ as $\Lambda {U}^{\prime -1}{V}^{\prime -1}$. Here, W denotes a pre-specified P × r weight matrix, and $\cdot$ denotes the Hadamard (element-wise) product. The element-wise weights in W make it possible to control which elements of T are emphasized in the approximation, as exemplified in the lower panel of Figure 1. Approximation B is emphasized by the weight matrix W, while approximation A contributes little to the reduction of the function value because the corresponding weight in W is small. The elements of W are required to be non-negative, and one can set the weight matrix as

$W=\left\{{w}_{jl}\right\}=\left\{\text{exp}{\left(|{t}_{jl}|\right)}^{m}\right\}$ (9)

where m is a negative integer, so that bringing the non-zero elements of Λ closer to the zero elements of T is emphasized in the rotation of Λ. The weight function attains its maximum of 1 when $|{t}_{jl}|=0$, and ${w}_{jl}\to 0$ as $|{t}_{jl}|\to \infty$. That is, the function assigns a larger weight when $|{t}_{jl}|$ is zero and a smaller weight when $|{t}_{jl}|$ is large. Therefore, approximation B is emphasized, as shown in Figure 1. This is expected to simplify Λ further than the existing rotational procedures do. We set m = −5 in the simulation study and the real data examples.
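The effect of the weights in (9) can be illustrated with the two elements discussed around Figure 1. The sketch below uses only the Python standard library; the pairs of target and loading values are read from the arithmetic given in the Introduction, and the helper name `weight` is hypothetical.

```python
import math

def weight(t, m=-5):
    """Weight w = exp(|t|)**m from Eq. (9); m is a negative integer."""
    return math.exp(abs(t)) ** m

# (target element, current loading) for the two elements in the example.
pairs = {"(2,1)": (0.5, 0.1), "(3,3)": (0.0, 0.4)}

for name, (t, loading) in pairs.items():
    plain = (t - loading) ** 2                    # contribution to Eq. (3)
    weighted = ((t - loading) * weight(t)) ** 2   # contribution to Eq. (8)
    print(name, round(plain, 4), round(weighted, 6))
```

With m = −5 the weight for a target value of 0.5 is exp(−2.5), so the non-zero target element contributes far less to the weighted criterion than the zero target element, although both contribute equally to the unweighted one.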

The novelty of the proposed method lies in its two stages: the first stage uses penalized estimation to specify the target, and the second stage minimizes a weighted least squares criterion. To the best of the author’s knowledge, no similar procedure has been proposed in the context of factor rotation. Further, the proposed method can be seen as an integration of two distinct approaches to obtaining a simple structure in factor analysis: sparse estimation in factor analysis [18] and factor rotation.

The remainder of the paper is organized as follows. The next section formally introduces the proposed method and derives the optimization algorithm for both stages. Two simulation studies are reported in the third section, where the proposed procedure and other existing ones are applied to artificial factor loading matrices. The fourth section illustrates the proposed procedure using the three real data problems. The fifth section summarizes the previous sections and concludes the paper.

2. Proposed Method

The proposed method first estimates the target matrix that minimizes ${f}_{STG1}(U,T)$ in (7) and then obtains the rotation matrix that minimizes ${f}_{STG2}(V)$ in (8); its objective is to simplify a loading matrix more than the existing rotational methods do. This section formally defines the two-stage Procrustes rotation and presents the optimization algorithm for each of the two stages.

2.1. Stage 1: Estimation of Sparse Target Matrix by Least Squares with Regularization

In the first stage, ${f}_{STG1}(U,T)$ defined in (7) is minimized subject to $\text{diag}({U}^{\prime }U)={I}_{r}$. It should be noted that no constraint is imposed on T, unlike in Simplimax rotation, where a cardinality constraint is imposed on the target matrix. Nevertheless, a sparse target matrix is obtained because of the regularization term in (7). No closed-form solution is available for the minimization of (7), and therefore it is minimized by the following iterative algorithm:

Step 1. Randomly initialize U and T.

Step 2. Minimize ${f}_{STG1}(U,T)$ over U with T fixed, and update U to the minimizer.

Step 3. Minimize ${f}_{STG1}(U,T)$ over T with U fixed, and update T to the minimizer.

Step 4. Stop if the decrease in ${f}_{STG1}(U,T)$ is sufficiently small; otherwise return to Step 2.

This iterative algorithm is an alternating least squares algorithm, in that it alternately minimizes a least squares function over each parameter matrix, and it guarantees that the function value does not increase at Steps 2 and 3.

First, consider Step 2, the minimization of ${f}_{STG1}(U,T)$ over U with T fixed. Jennrich [19] proposed a general algorithm, called the Gradient Projection (GP) algorithm, that minimizes a least squares criterion over a non-singular matrix under the constraint that its columns have unit length. The GP algorithm is applicable to the problem considered in Step 2, and it yields the U that minimizes ${f}_{STG1}(U,T)$ subject to $\text{diag}({U}^{\prime }U)={I}_{r}$ for a given T.

Next, in Step 3, ${f}_{STG1}(U,T)$ is minimized over T for a given U. Using the subdifferential of ${f}_{STG1}(U,T)$ with respect to T, the (j, l)-th element ${t}_{jl}$ of the optimal T is obtained as

${t}_{jl}=\text{sign}\left({\left[\Lambda U{\prime }^{-1}\right]}_{jl}\right){\left(|{\left[\Lambda U{\prime }^{-1}\right]}_{jl}|-\frac{\lambda }{2}\right)}_{+}$ (10)

where [ΛU'−1]jl denotes the (j, l)-th element of ΛU'−1 and

${\left(a\right)}_{+}=\left\{\begin{array}{ll}a& \left(a\ge 0\right)\\ 0& \left(a<0\right)\end{array}\right.$ (11)

for $a\in \mathbb{R}$ [17] . After the above algorithm converges, Λ has been rotated by U and a sparse target matrix T has been obtained.
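The first stage can be sketched in code as follows. This is a minimal illustration under several assumptions rather than the author's implementation: all function names are hypothetical, the U-step is a simplified gradient projection iteration with a backtracking line search whose constants are arbitrary, a step is accepted only if it does not increase the function value, and T is initialized from the random U by (10) instead of being drawn at random.

```python
import numpy as np

def soft_threshold(A, lam):
    """Element-wise T-update of Eq. (10): sign(a) * (|a| - lam/2)_+."""
    return np.sign(A) * np.maximum(np.abs(A) - lam / 2.0, 0.0)

def rotate(Lam, U):
    """Rotated loadings Lambda U'^{-1}."""
    return Lam @ np.linalg.inv(U.T)

def objective(Lam, U, T, lam):
    """f_STG1(U, T) of Eq. (7)."""
    return np.sum((T - rotate(Lam, U)) ** 2) + lam * np.sum(np.abs(T))

def update_U(Lam, T, U, max_iter=50, tol=1e-6):
    """Gradient projection over U with T fixed, keeping diag(U'U) = I."""
    step = 1.0
    L = rotate(Lam, U)
    f = np.sum((T - L) ** 2)
    for _ in range(max_iter):
        G = -np.linalg.inv(U.T) @ (2.0 * (L - T)).T @ L   # gradient w.r.t. U
        Gp = G - U * np.sum(U * G, axis=0)                # projected gradient
        s = np.linalg.norm(Gp)
        if s < tol:
            break
        step *= 2.0
        for _ in range(10):                               # backtracking search
            U_new = U - step * Gp
            U_new /= np.linalg.norm(U_new, axis=0)        # restore unit columns
            L_new = rotate(Lam, U_new)
            f_new = np.sum((T - L_new) ** 2)
            if f - f_new > 0.5 * s ** 2 * step:
                break
            step /= 2.0
        if f_new > f:                                     # never accept an increase
            break
        U, L, f = U_new, L_new, f_new
    return U

def stage1(Lam, lam, n_outer=100, tol=1e-8, seed=0):
    rng = np.random.default_rng(seed)
    r = Lam.shape[1]
    U = rng.normal(size=(r, r))
    U /= np.linalg.norm(U, axis=0)                        # Step 1 (U only)
    T = soft_threshold(rotate(Lam, U), lam)
    history = [objective(Lam, U, T, lam)]
    for _ in range(n_outer):
        U = update_U(Lam, T, U)                           # Step 2
        T = soft_threshold(rotate(Lam, U), lam)           # Step 3
        history.append(objective(Lam, U, T, lam))
        if history[-2] - history[-1] < tol:               # Step 4
            break
    return U, T, history

Lam = np.random.default_rng(1).normal(size=(8, 2))       # made-up loadings
U, T, history = stage1(Lam, lam=0.4)
print(len(history), history[0], history[-1])
```

In practice the sketch would be run from many random starts (different `seed` values), keeping the solution with the smallest final function value, as the paper does with 100 starts.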

2.2. Stage 2: Estimation of Rotation Matrix by Least Squares with Generalized Weighting

In the second stage, the rotated matrix $\Lambda {U}^{\prime -1}$ is further rotated toward T. As mentioned above, bringing the non-zero elements of the rotated Λ closer to the zeros in T should be given greater importance than approximating the non-zeros in T. This is accomplished by the weight matrix W in the newly introduced criterion ${f}_{STG2}\left(V\right)={‖\left(T-\Lambda U{\prime }^{-\text{1}}V{\prime }^{-\text{1}}\right)\cdot W‖}^{\text{2}}$ of (8), which we call the least squares criterion with generalized weighting. In the minimization of (8), the two types of approximation noted in the Introduction are treated asymmetrically; approximation B, which brings the non-zero elements of Λ closer to the zero elements of T, is emphasized. A similar criterion is considered by Gower & Dijksterhuis [10] as a generalized form of weighting in the Procrustes problem.

${f}_{STG2}(V)$ is minimized over V subject to $\text{diag}({V}^{\prime }V)={I}_{r}$. As in Step 2 of the first stage, this can be accomplished by the GP algorithm. After convergence in the second stage, $\Lambda {U}^{\prime -1}$ is further rotated by V; that is, Λ is rotated by ${U}^{\prime -1}{V}^{\prime -1}={\left(UV\right)}^{\prime -1}$. The overall rotation matrix is thus expressed as UV. Its columns should have unit length, but this does not hold in general. The column lengths of UV are therefore adjusted so as to satisfy $\text{diag}({\left(UV\right)}^{\prime }\left(UV\right))={I}_{r}$ by the replacement

$UV\to UVD$ (12)

with $D=\text{diag}{\left({\left(UV\right)}^{\prime }\left(UV\right)\right)}^{-1/2}$. It should be noted that the simplicity of $\Lambda {\left(UV\right)}^{\prime -1}$ remains unchanged after the adjustment, because the adjustment only rescales the columns of the rotated loading matrix by the diagonal matrix ${D}^{-1}$.
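The adjustment in (12) and its effect on the rotated loadings can be verified numerically. The following sketch assumes NumPy and uses randomly generated U, V, and Λ purely for illustration; the helper `random_unit_columns` is hypothetical.

```python
import numpy as np

def random_unit_columns(rng, r):
    """Random r x r matrix whose columns have unit length."""
    M = rng.normal(size=(r, r))
    return M / np.linalg.norm(M, axis=0)

rng = np.random.default_rng(0)
n_var, r = 5, 3                                  # illustrative sizes
U = random_unit_columns(rng, r)
V = random_unit_columns(rng, r)
Lam = rng.normal(size=(n_var, r))

UV = U @ V
col_len = np.sqrt(np.sum(UV ** 2, axis=0))       # generally not equal to 1
D = np.diag(1.0 / col_len)                       # D = diag((UV)'(UV))^{-1/2}
UVD = UV @ D

rotated = Lam @ np.linalg.inv(UV.T)              # Lambda (UV)'^{-1}
rotated_adj = Lam @ np.linalg.inv(UVD.T)         # after the adjustment

print(np.sum(UVD ** 2, axis=0))                  # unit column lengths
print(np.allclose(rotated_adj, rotated * col_len))
```

Each column of the rotated loading matrix is multiplied by the corresponding column length of UV, so the pattern of zeros and the relative sizes of loadings within a column are unaffected.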

Before applying the proposed procedure to a factor loading matrix, the value of λ, which controls the strength of the regularization on T, has to be specified. Unlike the tuning parameter k in Simplimax, there is no true value for the λ used in the first stage of the proposed procedure. For example, if the true simple structure to be recovered by rotation has 10 zero elements, k should be set to 10, but λ cannot be specified in the same way. Thus, it is necessary to try several values of λ and compare the rotated loading matrices in terms of their simplicity. Fortunately, as shown empirically in the section Simulation Study, the rotational result of the proposed procedure is robust to the setting of λ. Further, the proposed procedure yields fewer local minima than the existing procedure, so the exploratory search for λ within a specific range can be accomplished at a reasonable computational cost.

The simplicity attained by the proposed method can be evaluated by the following two indices. The first is the LS index proposed by Lorenzo-Seva [20] , which measures a matrix’s simplicity from 0 (lowest simplicity) to 1 (highest simplicity). The second index is the number of close-to-zero elements in the target matrix estimated in the first stage, which is defined as the number of elements in the target matrix satisfying the condition

$0<|{t}_{jl}|<\tau$ (13)

with τ = 0.1 hereafter. The number of close-to-zero elements is regarded as a measure of the unsuitability of the target matrix for simplifying the loading matrix, for the following reason. Close-to-zero elements cannot be ignored in interpretation, since they are not equal to zero even though they are smaller than the threshold τ in absolute value, and they thus confuse the interpretation of the factors. In other words, close-to-zero elements blur the contrast in the target between the zeros and the non-zeros with large absolute values, and hence the correspondence between variables and factors, which makes the rotational results unsatisfactory. As demonstrated in the sections Simulation Study and Real Data Examples, in many cases the values of the two indices decrease suddenly as λ increases. Based on this observation, the article recommends choosing the value of λ at a point where the LS index is reasonably high and, at the same time, the number of close-to-zero elements is small. The best λ was specified in this way in the real data examples, and we confirmed that the proposed procedure performs fairly well compared with the existing procedures.
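The second index, the count defined by (13), is straightforward to compute. A minimal sketch, with a hypothetical function name and a made-up target matrix:

```python
import numpy as np

def n_close_to_zero(T, tau=0.1):
    """Number of elements with 0 < |t_jl| < tau, as in Eq. (13)."""
    A = np.abs(np.asarray(T, dtype=float))
    return int(np.sum((A > 0) & (A < tau)))

T = np.array([[0.62, 0.00],
              [0.00, 0.05],
              [-0.03, 0.71]])    # hypothetical estimated target
print(n_close_to_zero(T))
```

Exact zeros are excluded from the count by the strict inequality on the left of (13), so a target consisting only of exact zeros and large loadings scores 0.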

3. Simulation Study

Two simulation studies were conducted to examine the performance of the proposed rotational procedure. The first experiment evaluated how the second stage of two-stage Procrustes rotation, the major novelty of the method, contributes to simplifying a loading matrix, and also examined how the value of λ affects the resulting simplicity. In the second experiment, two-stage Procrustes rotation was compared with the existing rotational procedures in terms of their performance in simplifying artificial loading matrices.

3.1. Experiment 1

The first experiment was designed as follows. Given a randomly generated true simple structure ΛT (P × r), an artificial factor loading matrix Λ was generated by

$\Lambda ={\Lambda }_{T}{P}^{-\text{1}}\prime$ (14)

where P (r × r) is a randomly generated non-singular matrix satisfying $\text{diag}({P}^{\prime }P)={I}_{r}$ (not to be confused with the number of variables, also denoted P); Λ was thus generated as a random transformation of ${\Lambda }_{T}$ by P. The true simple structure ${\Lambda }_{T}$ was constructed as

${\Lambda }_{T}=\left[\begin{array}{ccc}1& 0& 0\\ -1& 0& 0\\ 1& 0& 0\\ -1& 0& 0\\ 1& -1& 0\\ -1& 1& 0\\ 0& -1& 0\\ 0& 1& 0\\ 0& -1& 0\\ 0& 1& 0\\ 0& -1& 1\\ 0& 1& -1\\ 0& 0& 1\\ 0& 0& -1\\ 0& 0& 1\\ 0& 0& -1\end{array}\right]\cdot E$ (15)

where E is a P × r matrix whose elements were randomly generated from the uniform distribution U(0.5, 1.0). Equation (15) indicates that the true simple structure considered in the experiment has several cross-loadings, which commonly occur in applications of factor analysis.
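The data-generating scheme in (14) and (15) can be sketched as below. The sign pattern is copied from (15); how the random non-singular matrix was drawn is not stated in the paper, so the Gaussian entries followed by column normalization are an assumption, as are the names `pattern`, `generate_loadings`, and `Pmat`.

```python
import numpy as np

# Sign pattern of the true simple structure in Eq. (15).
pattern = np.array(
    [[1, 0, 0], [-1, 0, 0], [1, 0, 0], [-1, 0, 0],
     [1, -1, 0], [-1, 1, 0],
     [0, -1, 0], [0, 1, 0], [0, -1, 0], [0, 1, 0],
     [0, -1, 1], [0, 1, -1],
     [0, 0, 1], [0, 0, -1], [0, 0, 1], [0, 0, -1]], dtype=float)

def generate_loadings(rng):
    n_var, r = pattern.shape
    E = rng.uniform(0.5, 1.0, size=(n_var, r))   # magnitudes from U(0.5, 1.0)
    Lam_T = pattern * E                          # Hadamard product, Eq. (15)
    Pmat = rng.normal(size=(r, r))               # assumption: Gaussian entries
    Pmat /= np.linalg.norm(Pmat, axis=0)         # enforce diag(P'P) = I_r
    Lam = Lam_T @ np.linalg.inv(Pmat.T)          # Eq. (14)
    return Lam_T, Pmat, Lam

Lam_T, Pmat, Lam = generate_loadings(np.random.default_rng(0))
print(Lam.shape, np.count_nonzero(Lam_T == 0))
```

The pattern contains 28 zeros among its 48 elements, which matches the true number of zero elements used for Simplimax in the second experiment.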

Next, 1) two-stage Procrustes rotation with only the first stage and 2) two-stage Procrustes rotation with both stages were applied to Λ. Here, λ was varied from 0 to 1 in increments of 0.05 to investigate how the value of λ affects the performance in simplifying Λ. The rotational algorithm was started from 100 random initial values. The rotational results were evaluated with respect to the following two indices. The first index is the LS index, which is used as a measure of the simplicity of a matrix. The second index is the number of elements of the rotated Λ that are less than 0.1 in absolute value; it counts the elements of the rotated factor loading matrix that are close to zero and thus ignorable in interpretation. We call the second index the number of ignorable elements.

In Figure 2(a), the average and the 25th and 75th percentiles of the LS index for the rotated loading matrices are plotted against the values of λ. The figure shows that the second stage of the proposed method, in which the least squares criterion with generalized weighting is minimized with the target matrix fixed, improves the attained simplicity, in that the LS values are higher in the second stage. Importantly, as the value of λ increases, the simplicity attained in the first stage gradually decreases. However, the simplicity in the second stage remains high until λ reaches about 0.6. The same tendency is observed in Figure 2(b), in which the average and the 25th and 75th percentiles of the number of ignorable elements in the rotated matrix are plotted against λ. The number of ignorable elements in the second stage is always larger than in the first stage. These results suggest that the second stage effectively serves to simplify Λ. The simplicity measured by the two indices indicates that the performance of the proposed method is robust to the choice of λ as long as a relatively small value (below about 0.6) is selected. By contrast, Simplimax’s performance is said to be sensitive to whether the number of zero elements in the target matrix is correctly specified, as demonstrated in the Introduction and investigated in the next experiment. Figure 2(b) also displays the number of zeros in the target matrix; the target becomes sparser as λ gets larger, indicating that the regularization used in the first stage serves to make the target sparse. The resulting sparseness is controlled by λ.

Figure 2. (a) Average and 25th and 75th percentiles of the LS index in the first and second stages of the proposed method. (b) Average and 25th and 75th percentiles of the number of elements less than 0.1 in absolute value in the first and second stages of the proposed method. The average and the 25th and 75th percentiles of the number of zero elements in the target matrix are also plotted.

3.2. Experiment 2

The objective of the second experiment was to compare performance in simplifying artificial loading matrices and to show that the proposed procedure is superior to other existing Procrustes rotational procedures. The design of the second experiment was the same as that of the first. An artificial loading matrix was generated based on (14) and (15), and it was rotated by the following four rotational procedures: two-stage Procrustes rotation with both stages, Simplimax, Promax, and Geomin [21] . Among the existing rotational procedures, Simplimax and Promax were selected as well-known Procrustes rotation procedures, and Geomin was included because it is known to produce satisfactory results in many cases [22] . The value of λ was fixed at 0.4 with reference to Figure 2(a) and Figure 2(b), because the attained simplicity was stable and the number of zero elements in the target matrix was constant within the range λ < 0.6. For the number of zero elements in the Simplimax target matrix, the experiment considered the following three cases: equal to the true number of zero elements (k = 28), approximately 20% fewer than the true k (k = 22), and approximately 20% more than the true k (k = 34). These cases were used to examine Simplimax’s performance when the number of zero elements in the target was misspecified. Promax was applied with Kaiser’s normalization [8] ; that is, the lengths of the row vectors of Λ were normalized by left-multiplying Λ by

$\text{diag}{\left(\Lambda \Lambda \prime \right)}^{-1/2}$ (16)

before rotation.
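The normalization in (16) amounts to scaling each row of Λ to unit length. A minimal sketch with a hypothetical function name and made-up loadings; it also returns the row lengths, since in common practice they are multiplied back after rotation, although the paper does not state whether this was done.

```python
import numpy as np

def kaiser_normalize(Lam):
    """Left-multiply Lam by diag(Lam Lam')^{-1/2}, i.e. scale rows to unit length."""
    Lam = np.asarray(Lam, dtype=float)
    row_len = np.sqrt(np.sum(Lam ** 2, axis=1))   # square roots of diag(Lam Lam')
    return Lam / row_len[:, None], row_len

Lam = np.array([[3.0, 4.0],
                [0.6, 0.8]])                      # hypothetical loadings
Lam_norm, row_len = kaiser_normalize(Lam)
print(Lam_norm)
print(row_len)
```

After normalization every variable carries the same weight in the rotation criterion regardless of its communality.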

For two-stage Procrustes, Simplimax, and Geomin, the optimization algorithms were started from 100 different initial values as in the first experiment. In order to evaluate the computational efficiency of the two procedures, the rate of local minima was computed. For both Simplimax and two-stage Procrustes rotation, a final rotation matrix R is said to be a local minimum when it satisfies

${‖R-\hat{R}P‖}^{\text{2}}>{10}^{-4}$ (17)

where $\hat{R}$ is the solution attaining the smallest value of the rotational criterion among the 100 solutions, and P is the permutation matrix of suitable dimension that minimizes the left-hand side of (17).
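The criterion in (17) can be sketched as follows. The exhaustive search over column permutations is an assumption that is practical only for a small number of factors; the function names and the toy solutions are hypothetical.

```python
import itertools
import numpy as np

def distance_up_to_permutation(R, R_best):
    """Minimum of ||R - R_best P||^2 over all column permutation matrices P."""
    r = R_best.shape[1]
    eye = np.eye(r)
    return min(
        float(np.sum((R - R_best @ eye[:, list(perm)]) ** 2))
        for perm in itertools.permutations(range(r))
    )

def local_minimum_rate(solutions, R_best, tol=1e-4):
    """Share of solutions flagged as local minima by Eq. (17)."""
    flags = [distance_up_to_permutation(R, R_best) > tol for R in solutions]
    return sum(flags) / len(flags)

R_best = np.diag([1.0, 2.0, 3.0])                 # toy "best" solution
solutions = [R_best, R_best[:, [2, 0, 1]], R_best + 0.1]
print(local_minimum_rate(solutions, R_best))
```

A solution that differs from the best one only by the order of the factors is therefore not counted as a local minimum. Sign reflections of columns are not handled here, since (17) mentions permutations only.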

Figure 3(a) and Figure 3(b) show the simplicity attained by the proposed method, Simplimax with three different settings, Promax, and Geomin. In both figures, two-stage Procrustes rotation attains the highest simplicity in terms of the LS index and the number of ignorable elements. Importantly, the simplicity attained by the proposed method is higher than that attained by Geomin; the averages (s.d.) of the LS index were 0.835 (0.001) and 0.812 (0.007) for the proposed method and Geomin, respectively. Simplimax with k = 28 is also comparable to the proposed method.

Figure 3. Boxplots of (a) the LS index of the loading matrices rotated by two-stage Procrustes, Simplimax (k = 28, 22, 34), Promax, and Geomin, (b) the number of elements less than 0.1 in absolute value in the rotated loading matrices, and (c) the percentage of local minima in two-stage Procrustes, Simplimax (k = 28, 22, 34), and Geomin.

With Simplimax with k = 22 and 34, the cases in which the cardinality of the target matrix was misspecified, Λ was not simplified, indicating that Simplimax is sensitive to the misspecification of k and works well only when k is correctly set. Further, Figure 3(c) shows the frequency of local minima that occurred in the rotational procedures within 100 different random starts. Two-stage Procrustes rotation yields no local minima in any case except for some outliers, whereas Simplimax yields approximately 70% to 100% local minima.

The second experiment showed that the proposed method is superior to the existing procedures, as summarized in the following. First, the simplicity attained by the proposed method is better than that of Geomin and Promax, as shown in Figure 3(a) and Figure 3(b). The value of the LS index attained by Simplimax is as high as that of the proposed method only when the cardinality of the true simple structure is correctly specified. Simplimax is thus sensitive to misspecification of the cardinality, while the proposed procedure is relatively stable with respect to the choice of λ, as shown in the first experiment. Further, the proposed method yields fewer local minima, while they occur frequently in Simplimax.

4. Real Data Examples

This section illustrates that the proposed procedure works fairly well in simplifying factor loading matrices by comparing the simplicity obtained by the proposed procedure with that obtained by other existing procedures.

4.1. Thurstone’s Box Problem

The first example is Thurstone’s box problem, for which a 26 (variables) × 3 (factors) loading matrix was obtained by Cureton and Mulaik [23] ; it is often used as a benchmark for evaluating the performance of rotational procedures. Two-stage Procrustes rotation was applied to the loading matrix in the following manner. First, to find the best λ for specifying an appropriate target matrix, two-stage Procrustes rotation was applied in parallel to the loading matrix for λ = 0.01, 0.02, …, 0.50. The LS index, as a measure of the simplicity of the rotated loading matrix, and the number of close-to-zero elements, defined in the section Proposed Method, are plotted against λ in Figure 4. Around λ = 0.26, the number of close-to-zero elements attains its minimum of 0, which means that the target matrix estimated in the first stage contains only zeros and non-zeros that are sufficiently large in absolute value. Such a target matrix is considered suitable for simplifying Λ. Accordingly, the value of the LS index for the loading matrix rotated in the second stage is high around λ = 0.26. The value of λ was thus set to 0.26 based on this observation.

For comparison, Simplimax with k = 27, which was derived from the true simple structure, Promax with Kaiser’s normalization, and Geomin were also applied.

Table 2 shows the loading matrices rotated by the four rotational procedures and their simplicity measured by the LS index. The proposed procedure attained the highest LS index and recovered the true simple structure; the three rotated factors correspond to x, y, and z, respectively. The rotational result is comparable to that of Geomin, while Simplimax and Promax both failed to recover the true simple structure. The first example thus shows empirically that the proposed procedure works satisfactorily in simplifying the loading matrix, and its performance is comparable to that of Geomin.

Figure 4. Values of the LS index of the rotated loading matrix and the number of close-to-zero elements of the estimated target matrix for λ = 0.01, 0.02, …, 0.50 in Thurstone’s box problem.

Table 2. Rotated loading matrices of Thurstone’s box dataset. Elements less than 0.1 in absolute value are shaded, and those greater than 0.3 in absolute value are in bold.

4.2. Car Purchase Data

The second example concerns a questionnaire on the purchase of a car. Ninety consumers responded to the fourteen questions listed in Table 3 about what is important to them when purchasing a car. They answered the questions using a five-point Likert scale from 1 (not at all) to 5 (very important). The full dataset is available online [24]. Parallel analysis of the sample correlation matrix suggested five common factors, and therefore the five-factor solution was obtained by the maximum likelihood method.

Table 3. Rotated loading matrices of car purchase dataset. Elements less than 0.1 in absolute value are shaded, and those more than 0.3 in absolute value are bolded.

In line with the first example, the initial factor loading matrix was rotated by the two-stage Procrustes rotation for λ = 0.01, 0.02, …, 0.50, and the values of the LS index for the rotated matrices and the numbers of close-to-zero elements in the target matrices are plotted in Figure 5. The figure suggests that λ = 0.31 is the best choice, in that the number of close-to-zero elements is low while the LS index is higher than for λ > 0.37. The other procedures, namely Simplimax, Promax with Kaiser’s normalization, and Geomin, were also applied to the initial loading matrix. Note that, for Simplimax, the number of zero elements in the target matrix was set at 56, so that each row has only one non-zero element, because the true simple structure for this example is unknown.
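The value 56 follows directly from the dimensions of the loading matrix: with P = 14 variables and r = 5 factors, keeping exactly one non-zero element in each row leaves

$k=Pr-P=14\times 5-14=56$

zero elements in the Simplimax target.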

Table 3 shows that the proposed procedure yielded the simplest loading matrix, while Simplimax and Promax totally failed to simplify the loading matrix. The rotated loadings by the proposed procedure attained the highest LS index value. Although the rotational result of the proposed procedure is comparable to the one by Geomin, a distinct feature of the proposed method is that the estimated target matrix would help interpret the rotated factors. Table 4 shows the target matrix, which contains exact zero elements because of the L1 penalty employed in the first stage. The averaged difference between the rotated loadings and the target matrix computed as

$\frac{\sqrt{{\left\Vert T-A{U'}^{-1}{V'}^{-1}\right\Vert }^{2}}}{Pr}$ (17)

was 0.054, which indicates that the difference between the two matrices is considerably small and that their interpretations are mutually consistent. In other words, one can refer to the estimated target matrix as an archetype of the simple structure extracted by the proposed procedure. The estimated target shown in Table 4 contains forty-six zero elements, which is approximately 65.7% of all elements, and they emphasize the correspondence between the variables in the rows and the factors in the columns. The first factor loads positively and highly on maintenance, resale value, and price, which can be interpreted as cost-oriented motivation. The second factor expresses the contrast between color and exterior looks. The third factor loads positively on fuel efficiency and after sales service, which expresses the cost after purchase. The fourth and fifth factors are interpreted as reviews by other customers and functionality, respectively. A similar interpretation is also possible with the Geomin-rotated loading matrix, but it is much easier with the proposed method, in that its resulting simplicity is better than that of Geomin and it yields a target matrix that helps the interpretation.

Figure 5. Values of LS index of the rotated loading matrix and the number of close-to-zero elements of the estimated target matrix for λ = 0.01, 0.02, …, 0.50 in car purchase data.

Table 4. The estimated target matrix in car purchase data. Blank cells show the elements that are equal to zero.
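As a rough sketch of how the averaged difference in (17) and the proportion of zeros in the target could be computed, the following assumes that the norm in (17) is the Frobenius norm and that U and V are the transformation matrices of the first and second stages, as the formula suggests. The function names are hypothetical, and the sketch does not reproduce the reported value of 0.054 because the fitted matrices are not included here.

```python
import numpy as np

def averaged_difference(T, A, U, V):
    """Evaluate (17): Frobenius norm of T - A U'^{-1} V'^{-1}, divided by P * r.

    T: estimated target (P x r), A: initial loadings (P x r),
    U, V: assumed stage-one and stage-two transformation matrices (r x r).
    """
    T, A, U, V = (np.asarray(M, dtype=float) for M in (T, A, U, V))
    P, r = A.shape
    rotated = A @ np.linalg.inv(U.T) @ np.linalg.inv(V.T)
    return float(np.linalg.norm(T - rotated, ord="fro") / (P * r))

def zero_proportion(T):
    """Share of exactly-zero elements in a target matrix."""
    T = np.asarray(T)
    return np.count_nonzero(T == 0) / T.size
```

For a 14 × 5 target with 46 zeros, `zero_proportion` returns 46/70, which matches the roughly 65.7% stated in the text.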

5. Conclusions

The article dealt with a problem of Procrustes rotation, as used in Promax and Simplimax, namely that the approximation of non-zero elements in the initial loading matrix to zero elements in the target matrix is prioritized over the other type of approximation. This problem leads to unsatisfactory rotation results even if the target matrix is estimated to be sparse. In addition, the computational inefficiency of Simplimax in estimating the optimal target matrix under the cardinality constraint was considered. To address these issues, the article proposed the two-stage Procrustes rotation, which modifies the existing Procrustes rotation procedures and consists of the following two stages. The first stage aims to estimate a target matrix under the constraint that it possesses several elements equal to zero. The novelty of the first stage is that L1 regularization is used in the optimization. Consequently, the cardinality constraint used in Simplimax, which is a cause of severe local minima, is unnecessary, and the first stage obtains a sparse target matrix at a lower computational cost than Simplimax. The second stage further rotates the loading matrix rotated in the first stage, given the estimated sparse target matrix, by minimizing the least squares criterion with generalized weighting. The criterion newly introduced in the article emphasizes the aforementioned type of approximation, which should be prioritized in simplifying the loading matrix.
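To illustrate why an L1 penalty can produce exact zeros without a cardinality constraint, consider the generic elementwise problem of minimizing ½(t − a)² + λ|t| over t, whose minimizer is the soft-thresholding of a. The sketch below shows only this textbook operator; it is not claimed to be the exact update used in the first stage, and the function name is hypothetical.

```python
import numpy as np

def soft_threshold(a, lam):
    """Elementwise minimizer of 0.5 * (t - a)**2 + lam * |t|.

    Entries with |a| <= lam are set exactly to zero; the others are
    shrunk toward zero by lam.
    """
    a = np.asarray(a, dtype=float)
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

# Small loadings vanish exactly, large ones are shrunk but kept.
example = soft_threshold([0.8, -0.2, 0.1], lam=0.26)
```

The number of zeros is thus governed by the continuous parameter λ rather than by a prespecified count.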

The first simulation study revealed that the second stage boosts the attained simplicity. The second simulation study showed that the resulting simplicity and the computational efficiency of the proposed procedure are superior to those of the existing procedures, including Simplimax and Geomin, the latter of which is known to work well in many situations. The two real data examples showed similar results: the proposed procedure is better than the other procedures in terms of simplicity, and it provides useful insights into the factor structure that the other procedures do not provide.

Recent studies have revealed that rotational procedures are useful in multivariate analysis techniques other than factor analysis, such as correspondence analysis [25] [26], canonical correlation analysis [27] [28] [29], and principal component analysis [2]. The proposed procedure would help potential users of those techniques by facilitating interpretation.

There still remain topics to be discussed in future studies. First, a more detailed procedure for determining the optimal λ, the tuning parameter in the first stage, is desired. Setting λ could be a troublesome step for users, which is a potential limitation of the proposed method. Even though the performance of the proposed procedure is relatively robust to the choice of λ compared with that of k in Simplimax, as shown in the section Simulation Study, an incorrect setting of λ could slightly degrade the rotational performance. Second, the penalized estimation in the first stage would not work properly when the loading matrix cannot be simplified by rotation. In such a case, the existing rotational procedures would also fail to simplify the matrix. Third, the two stages in the proposed procedure should be merged into a single step in order to simplify the whole procedure. For example, one might consider minimizing the criterion

${\tilde{f}}_{STG2}\left(U,T\right)={\left\Vert \left(T-\Lambda {U'}^{-1}\right)\cdot W\right\Vert }^{2}+\lambda \sum_{j,l}\left|{t}_{jl}\right|$ (19)

under suitable constraints over U and T simultaneously. The paper did not consider this problem and used the two stages because the minimization over T is difficult in that W is a function of T. Even though the two-stage procedure works fairly well, the feasibility and performance of a rotation procedure based on the minimization of (19) should be investigated. The above three topics are future directions for research on rotational procedures.
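For readers who wish to experiment with (19), a minimal sketch of evaluating the criterion is given below. It assumes that the dot denotes the elementwise product and that the norm is the Frobenius norm. Because the dependence of W on T is defined in the section Proposed Method, W is passed here as a fixed argument, which sidesteps exactly the difficulty mentioned above; the function name and argument names are hypothetical.

```python
import numpy as np

def merged_criterion(U, T, Lam, W, lam):
    """Evaluate (19) for given U, T, loading matrix Lam, weights W, and penalty lam.

    W is treated as fixed here; in the paper it is a function of T.
    """
    U, T, Lam, W = (np.asarray(M, dtype=float) for M in (U, T, Lam, W))
    # Weighted residual between the target and the rotated loadings
    residual = (T - Lam @ np.linalg.inv(U.T)) * W
    # Squared Frobenius norm plus the L1 penalty on the target
    return float(np.sum(residual ** 2) + lam * np.sum(np.abs(T)))
```

Such a function could serve as a starting point for checking candidate solutions, although an actual minimization would have to account for the constraints on U and for the dependence of W on T.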

Acknowledgements

The author thanks the editor and the anonymous reviewers for their careful review and useful comments.

Conflicts of Interest

The author declares no conflicts of interest.