Computational Identification of Confirmatory Factor Analysis Model with Simplimax Procedures

Abstract

Confirmatory factor analysis (CFA) refers to the FA procedure with some loadings constrained to be zeros. A difficulty in CFA is that the constraint must be specified by users in a subjective manner. For dealing with this difficulty, we propose a computational method, in which the best CFA solution is obtained optimally without relying on users’ judgements. The method consists of the procedures at lower (L) and higher (H) levels: at the L level, for a fixed number of zero loadings, it is determined both which loadings are to be zeros and what values are to be given to the remaining nonzero parameters; at the H level, the procedure at the L level is performed over the different numbers of zero loadings, to provide the best solution. In the L level procedure, Kiers’ (1994) simplimax rotation fulfills a key role: the CFA solution under the constraint computationally specified by that rotation is used for initializing the parameters of a new FA procedure called simplimax FA. The task at the H level can be easily performed using information criteria. The usefulness of the proposed method is demonstrated numerically.

Citation: Cai, J., Kiers, H. and Adachi, K. (2021) Computational Identification of Confirmatory Factor Analysis Model with Simplimax Procedures. Open Journal of Statistics, 11, 1044-1061. doi: 10.4236/ojs.2021.116062.

1. Introduction

In factor analysis (FA), the variation of p observed variables is assumed to be explained by m common factors and p unique factors, with m < p and the two types of factors mutually uncorrelated. The m common factors serve to explain the variations of all p variables. On the other hand, each of the p unique factors has a one-to-one correspondence to each variable: a unique factor explains specifically the variation of the corresponding variable that remains unaccounted for by the common factors [1] [2] [3].

The parameters to be estimated in FA are a factor loading matrix Λ = (λ_ij) (p × m), a unique variance matrix Ψ = (ψ_ii′) (p × p), and a factor correlation matrix Φ = (φ_jk) (m × m). Here, the loadings in Λ stand for how the variables load on the common factors, Ψ is the diagonal matrix whose diagonal element ψ_ii expresses the variance of the ith unique factor, and Φ contains the correlation coefficients among the m common factors. Upon making certain distributional assumptions for the factors, the covariance matrix Σ among the p observed variables is modeled as

Σ = ΛΦΛ′ + Ψ. (1)

(e.g., Adachi, 2019 [1]; Mulaik, 2010 [2]). Thus, a loss function f(Λ, Ψ, Φ | S) can be defined, which stands for the discrepancy between (1) and its sample counterpart S (p × p): FA can be formulated as minimizing f(Λ, Ψ, Φ | S) over Λ, Ψ, and Φ, for a given S.

FA can be classified into two types: confirmatory (CFA) and exploratory (EFA) [2]. In CFA, some loadings in Λ are constrained to take zero values, while no constraints are imposed on the loadings in EFA. In this paper, we focus on CFA. An example of the CFA model with a particular constraint is illustrated in Figure 1. There, a constrained loading matrix Λ is shown on the left, and the corresponding CFA model is depicted on the right as a diagram, in which only the pairs of variables and factors with unconstrained loadings are linked by paths. In Figure 1, the binary matrix B is also presented, which specifies the constraints in Λ and the links in the right diagram. The elements of B = (b_ij) (p × m) are defined generally as

b_ij = 0 if λ_ij = 0, and b_ij = 1 otherwise; (2)

in other words, b_ij = 1 if variable i is linked to factor j; otherwise, b_ij = 0. In this sense, we call B a link matrix. Any CFA constraint can be expressed as Λ = B ⊙ Λ. Here, ⊙ denotes the element-wise (Hadamard) product, with B ⊙ Λ = (b_ij λ_ij). It should be kept in mind that specifying a CFA model amounts to selecting a particular link matrix B. Thus, CFA can be formally expressed as

min_{Λ,Ψ,Φ} f(Λ, Ψ, Φ | S) s.t. Λ = B ⊙ Λ for a specified B ∈ 𝒮_B, (3)

Figure 1. Illustration of constrained loadings and the corresponding CFA model.

with “s.t.” the abbreviation for “subject to” and 𝒮_B denoting the set of considered matrices B.
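As a concrete illustration of the constraint Λ = B ⊙ Λ, the following sketch applies a link matrix to a loading matrix via the Hadamard product; the 4 × 2 matrices and their numerical values are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical link matrix B and loadings for p = 4 variables, m = 2 factors.
# Zeros in B force the matching loadings to be exactly zero.
B = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [0, 1]])
Lam = np.array([[ 0.8,  0.1],
                [ 0.7, -0.2],
                [ 0.3,  0.9],
                [-0.1,  0.6]])
print(B * Lam)  # element-wise (Hadamard) product: the constrained B ⊙ Λ
```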

A problem in CFA (3) is that the link matrix B = (b_ij) defined as (2) must be selected by users. That is, it must be decided in a subjective manner which elements in B are to be zeros/ones, in other words, which pairs of variables and factors are linked as in Figure 1. Whether the CFA model with a particular constraint is acceptable or not is checked afterwards by referring to the goodness-of-fit of the solution [4] [5] [6]. However, even if the model is found acceptable, better CFA models with other link matrices B may exist. This implies that all possible B in 𝒮_B must be considered for finding the best B. However, this is infeasible unless pm is very small, as the number of all possible B is enormous. That number can be calculated as 2^{pm}, since each of the pm elements in B takes zero or one as in (2): for example, for p = 12 and m = 3 one finds 2^{pm} ≈ 6.87 × 10^{10}. In short, the problems in CFA can be summarized as follows:

[P1] The link matrix B (specifying a CFA model) must be selected subjectively by users.

[P2] An enormous number of possible matrices B in 𝒮_B must be considered for finding the best B.

To the best of our knowledge, the problems [P1] and [P2] in CFA have not been considered in the existing literature. In order to deal with those problems, we propose an FA procedure for computationally and optimally identifying a suitable CFA model. Here, model identification includes estimating the model parameter values. The outline of our proposed procedure is described in the next section. Then, we detail the procedure in Section 3, report its assessment in a simulation study in Section 4, give numerical examples in Section 5, and conclude this paper in Section 6.

2. Outline of the Proposed Procedure

First, in Section 2.1, we outline our approach to the CFA model identification under the condition that the number of zero loadings is fixed to a particular integer. Then, in Section 2.2, the approach is extended to the case where the number of zero loadings is not fixed. We then preview the contents of the following sections.

2.1. Model Identification for a Specified Number of Nonzero Loadings (Cardinality)

For dealing with the difficulties [P1] and [P2] in CFA, we can consider the two procedures introduced in the next paragraphs, on the condition that Card(Λ) = Card(B), that is, the number of nonzero elements in Λ and in B (with Card(·) standing for the number of nonzero elements of the matrix between parentheses) equals a specified integer c; hence

Card(Λ) = c, or equivalently, Card(B) = c. (4)

Clearly, pm − c equals the number of zeros in B or Λ.

The first procedure considered can be formulated as

min_{Λ,Ψ,Φ} f(Λ, Ψ, Φ | S) s.t. Λ = B ⊙ Λ, after B is estimated optimally s.t. (4). (5)

This minimization can be rewritten as performing CFA under the constraint indicated by link matrix B, with B estimated by another method in advance. Thus, the problem [P1] is overcome. Further, we can also consider [P2] to be dealt with, provided that the value c in (4) and the resulting B are suitable.

The above procedure consists of two stages: first B is estimated, then CFA is performed, see (5). In contrast, the second procedure considered is formulated with a single stage as follows:

min_{B,Λ,Ψ,Φ} f(Λ, Ψ, Φ | S) s.t. Λ = B ⊙ Λ and (4). (6)

Here, B has been added to the subscripts of “min”: the link matrix B, which indicates the pairs of variables and factors to be linked, is estimated jointly with the other parameters Λ, Ψ, and Φ.

Between (5) and (6), we can find the following difference: in minimizing the loss function f(Λ, Ψ, Φ | S), the B value is kept fixed in (5), but allowed to change in (6). The difference implies that the resulting loss function value in (6) cannot exceed that in (5):

min_{B,Λ,Ψ,Φ} f(Λ, Ψ, Φ | S) ≤ min_{Λ,Ψ,Φ} f(Λ, Ψ, Φ | S). (7)

This inequality shows that (6) can provide a better solution than (5). We will propose an iterative algorithm for (6), which is started from the optimum for (5). As empirically shown later, in almost all cases the solutions of (5) and (6) are equivalent: we can then obtain the final solution by (5) alone, without performing the iterative algorithm for (6). However, it is worthwhile to perform the latter step, as (6) can provide a better solution in a few cases.

In this paper, we use the maximum likelihood (ML) method for estimating the parameters in (5) and (6). This implies that the loss function to be minimized is given as the negative of the log likelihood. It is explicitly expressed as

f(Λ, Ψ, Φ | S) = log|Σ| + tr(SΣ^{-1}) = log|ΛΦΛ′ + Ψ| + tr[S(ΛΦΛ′ + Ψ)^{-1}], (8)

following from the normality assumptions for factors [7]. For minimizing (8), we use Rubin & Thayer’s [8] EM algorithm for FA, whose properties are discussed in Adachi [9]. This algorithm is detailed in Appendix A1.
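Loss (8) translates directly into code. The sketch below is a plain NumPy transcription of it; the function name fa_negloglik is ours, and Ψ is passed as a full diagonal matrix.

```python
import numpy as np

def fa_negloglik(Lam, Psi, Phi, S):
    """Loss (8): log|Sigma| + tr(S Sigma^{-1}) with Sigma = Lam Phi Lam' + Psi."""
    Sigma = Lam @ Phi @ Lam.T + Psi
    _, logdet = np.linalg.slogdet(Sigma)          # stable log-determinant
    return logdet + np.trace(S @ np.linalg.inv(Sigma))
```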

2.2. Selection of the Best Cardinality

We should notice that the above approach is conditional upon c in (4). Thus, it remains to select the best value for c. This can be attained by the following procedure:

Select the value c with the lowest IC(c) among c = c_min, …, c_max. (9)

Here, c_min/c_max expresses a reasonable minimum/maximum of c, and IC(c) denotes the value of an information criterion statistic [10], with the statistic being a function of c. Information criteria are statistics that can be defined in ML-based procedures and that show smaller values for better solutions. Explicit expressions of IC(c) are introduced later. We can rewrite (9) as follows: the procedure of (6) following (5) (in the last subsection) is performed for c = c_min, …, c_max, so that we have multiple solutions, among which the solution with the least IC(c) is selected as the best one. The largest possible interval [c_min, c_max] is obviously [1, pm]. It can reasonably be reduced to

c_min = p and c_max = pm − m(m − 1)/2. (10)

This is because solutions with c smaller than p would surely have one or more zero rows of loadings. On the other hand, the c_max value takes into account that m(m − 1)/2 elements in Λ can be set to zero without a change in the value of loss function (8).

Now, let us discuss how our proposed procedure is related to the set 𝒮_B containing all possible B, using 𝒮_B(c) for the subset of 𝒮_B that contains the matrices B with c nonzero elements. The number of B in 𝒮_B(c) is given by

N_B(c) = C(pm, c), (11)

i.e., the number of combinations in which c elements among all pm elements in B equal one. Thus, the number of all B contained in 𝒮_B is given by

N_B = Σ_{c=c_min}^{c_max} N_B(c), (12)

with (10) and (11). For example, when p = 12 and m = 3, the value in (12) is approximately 1.25 × 10^9, which is enormous (though less than the 2^{pm} ≈ 6.87 × 10^{10} presented in the last section, where (10) was not yet considered). However, it is not required to assess all N_B matrices B in 𝒮_B in our proposed procedure, as performing (6) following (5) allows us to find the optimal B in 𝒮_B(c) for a particular c, with this performed for c = c_min, …, c_max as in (9). That is, our proposed procedure can find the optimal B among all N_B matrices B in 𝒮_B by the c_max − c_min + 1 runs of (6) following (5), not by assessing all B. The example of c_max − c_min + 1 = 22 for p = 12 and m = 3 demonstrates how easily we can arrive at the optimal B.
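The quantities above are easy to evaluate. The short sketch below computes the bounds in (10), a single binomial count as in (11), and the 2^{pm} total from Section 1, for p = 12 and m = 3.

```python
from math import comb

p, m = 12, 3
c_min, c_max = p, p * m - m * (m - 1) // 2   # (10): c_min = 12, c_max = 33
print(c_min, c_max, c_max - c_min + 1)       # 12 33 22 cardinalities to examine
print(comb(p * m, c_min))                    # N_B(c_min) = C(36, 12) = 1,251,677,700
print(2 ** (p * m))                          # 2^36 = 68,719,476,736 candidate B
```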

The remaining parts of this paper are organized as follows. In Sections 3.1-3.3 and in Section 3.4, we detail (5) and (6) in turn. There, it is described that Kiers' [11] simplimax rotation is used for the optimal estimation of B subject to (4) in (5), and that the idea of this rotation also fulfills a key role in (6). Thus, the procedures for (5) and (6) are called simplimax-based CFA (SbCFA) and simplimax FA (SimpFA), respectively. In the subsections after treating SbCFA (5) and SimpFA (6), we detail the whole process of the proposed procedure, i.e., SimpFA (6) following SbCFA (5) with the cardinality selection by (9). The proposed procedure is studied in a simulation study and illustrated with real data examples, as reported in the sections before concluding this paper.

3. Simplimax-Based Confirmatory Factor Analysis

In this section, we detail the procedure formulated as (5) with f(Λ, Ψ, Φ | S) defined as (8). A key point is that we use exploratory FA (EFA) followed by Kiers' [11] simplimax rotation for estimating B optimally subject to (4) in (5). That is, in the procedure, firstly EFA is performed for a data set, secondly simplimax is applied to the EFA solution for estimating the link matrix B, and finally the resulting B is used for minimizing (8) subject to Λ = B ⊙ Λ. These three stages are described respectively in the following subsections. The last stage can be regarded as confirmatory FA (CFA) on the basis of the B given by simplimax. We thus call the procedure in this section simplimax-based CFA (SbCFA).

3.1. Exploratory Factor Analysis

Let T denote any m × m nonsingular matrix satisfying

diag(TT′) = I_m, (13)

where diag(TT′) is the diagonal matrix whose diagonal elements are those of TT′. EFA has the rotational indeterminacy shown next:

Σ = ΛT^{-1}TT′(T^{-1})′Λ′ + Ψ = Λ_T Φ Λ_T′ + Ψ, (14)

with Φ = TT′ and Λ_T = ΛT^{-1}. Here, the latter matrix can also be regarded as a loading matrix. Thus, even if Φ = TT′ is fixed to I_m by choosing T that meets TT′ = I_m, the value of (8) remains unchanged when Λ is replaced by ΛT^{-1}. By taking account of this property, f(Λ, Ψ, Φ = I_m | S), i.e., (8) with Φ = I_m, is minimized over Λ and Ψ in EFA. This is attained with the EM algorithm described in Appendix A1. We use Λ_E for the Λ resulting from EFA.

The indeterminacy (14) implies that Λ_T = Λ_E T^{-1} and Φ = TT′ can also serve as the EFA solutions of the factor loading and correlation matrices, respectively, with f(Λ_E, Ψ, Φ = I_m | S) = f(Λ_T, Ψ, Φ | S). This fact is often exploited by obtaining a T that allows Λ_T = Λ_E T^{-1} to be interpretable. The procedures for obtaining T are generally referred to as factor rotation.
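The invariance in (14) is easy to verify numerically. The sketch below builds a random T satisfying (13) and checks that the two covariance representations coincide; all numerical inputs are arbitrary test values.

```python
import numpy as np

rng = np.random.default_rng(0)
p, m = 6, 2
Lam = rng.normal(size=(p, m))
Psi = np.diag(rng.uniform(0.3, 1.0, size=p))

T = rng.normal(size=(m, m))                 # any nonsingular T
T /= np.sqrt(np.diag(T @ T.T))[:, None]     # rescale rows so diag(TT') = I_m, per (13)
Lam_T, Phi = Lam @ np.linalg.inv(T), T @ T.T

Sigma_E = Lam @ Lam.T + Psi                 # EFA representation, Phi = I_m
Sigma_T = Lam_T @ Phi @ Lam_T.T + Psi       # rotated representation (14)
print(np.allclose(Sigma_E, Sigma_T))        # True
```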

3.2. Simplimax Rotation

This subsection concerns the factor rotation intermediating between the first EFA stage and the final CFA stage. Here, the loading matrix is unconstrained in EFA, but constrained to meet Λ = B ⊙ Λ in CFA as in (3). The latter constraint leads to

min_{Λ,Ψ,Φ} f(B ⊙ Λ, Ψ, Φ | S) ≥ f(Λ_E, Ψ, Φ = I_m | S) = f(Λ_T, Ψ, Φ | S), (15)

where f(B ⊙ Λ, Ψ, Φ | S) stands for the loss function in (3) into which the CFA constraint Λ = B ⊙ Λ has been substituted. Inequality (15) implies that the value of the EFA loss function f(Λ_E, Ψ, Φ = I_m | S) = f(Λ_T, Ψ, Φ | S) gives the lower limit of the f(B ⊙ Λ, Ψ, Φ | S) to be minimized in CFA (3). This suggests that the best attainable CFA solution would be one whose min_{Λ,Ψ,Φ} f(B ⊙ Λ, Ψ, Φ | S) value is close to the EFA counterpart, i.e., the f(Λ_E, Ψ, Φ = I_m | S) = f(Λ_T, Ψ, Φ | S) value.

Now, if T can be chosen in the factor rotation such that Λ_T = Λ_E T^{-1} is similar to a matrix that can be written as B ⊙ Λ for particular matrices B and Λ, then we can expect that such B and Λ will be good candidates for giving a CFA solution with a fit close to that of EFA. This can be attained by the simplimax rotation [11], which is formulated as minimizing the simplimax function

spx(B, Λ, T) = ‖B ⊙ Λ − Λ_E T^{-1}‖² (16)

over B, Λ, and T subject to (4) and (13). Thus, this rotation serves as a suitable bridge between the first and final stages.

For the constrained minimization of (16), two steps are iterated alternately. In one step, the simplimax function (16) is minimized over T under (13) for a given B ⊙ Λ, using Browne's [12] algorithm. In the other step, (16) is minimized over B = (b_ij) and Λ under (4) for a given Λ_E T^{-1}. As this problem is also related to the procedure in the next section, we express the minimization in a generalized form as

min_{B,Λ} spx(B, Λ | H) = ‖B ⊙ Λ − H‖² for a given p × m matrix H = (h_ij), (17)

with H = Λ_E T^{-1} in the present context. As explained in Appendix A3, the minimum in (17) is attained for

b_ij = 1 if h_ij² ≥ h²_[c], and b_ij = 0 otherwise, together with Λ = H, (18)

with h²_[c] denoting the cth largest value of all elements in H ⊙ H = (h_ij²). The steps of the simplimax rotation algorithm are listed in Appendix A2.
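A minimal sketch of the update (18) is given below, assuming no ties among the squared elements of H; the function name simplimax_update is ours.

```python
import numpy as np

def simplimax_update(H, c):
    """Minimize ||B * Lam - H||^2 over binary B and Lam s.t. Card(B) = c, per (18):
    keep the c largest squared entries of H (B = 1 there) and set Lam = H."""
    H2 = H**2
    thresh = np.sort(H2, axis=None)[-c]   # the c-th largest squared element
    B = (H2 >= thresh).astype(int)        # may exceed c entries if ties occur
    return B, H.copy()                    # Lam = H; B * Lam zeroes the rest
```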

3.3. Confirmatory Factor Analysis Following Simplimax Rotation

The final stage is simply to perform CFA using B obtained by the simplimax rotation. Thus, SbCFA is formulated by making (5) concrete as

min_{Λ,Ψ,Φ} f(Λ, Ψ, Φ | S) s.t. Λ = B ⊙ Λ,

after B is estimated s.t. Card(B) = c by simplimax, (5′)

with f(Λ, Ψ, Φ | S) defined as (8). The minimization in (5) or (5′) is attained with the EM algorithm, which is described in Appendix A1.

3.4. Simplimax Factor Analysis

We will now describe simplimax FA (SimpFA). This is the procedure minimizing (6), using the solution from SbCFA as a starting configuration. Specifically, we minimize the negative of the log likelihood (see (8)), with B ⊙ Λ substituted for Λ, over B, Λ, Ψ, and Φ subject to Card(Λ) = c (see (4)). We call this procedure simplimax FA (SimpFA), as it is a new FA procedure in which the minimization of the simplimax function (17) fulfills a key role, as will be seen in the next paragraph.

The EM algorithm described in Appendix A1 can also be used for SimpFA. The algorithm for SimpFA differs from those for the other FA procedures in that it includes a step for minimizing (8) over both B and Λ with Ψ and Φ fixed. An innovative feature of the algorithm is the use of Kiers' [13] [14] majorization method for dealing with the difficulties in treating (8) as a function of B and Λ. In that method, an auxiliary function that majorizes (8) is minimized. For this minimization,

spx(B, Λ | W) = ‖B ⊙ Λ − W‖², (19)

i.e., the simplimax function in (17) with H = W, fulfills a key role. Here, W = Λ_C − β^{-1}(Ψ^{-1}Λ_C Q − Ψ^{-1}C), with Q and C defined in Appendix A1, Λ_C the current Λ value (before the update), and β a constant not less than the greatest eigenvalue of Ψ^{-1} ⊗ Q; see Appendix A4 for the derivation.

3.5. Multiple Runs of SimpFA Following SbCFA

Here, we describe the whole procedure of SimpFA following SbCFA, in which SbCFA provides the initial values of the parameters for SimpFA, and SimpFA provides the final solution. We take a multiple-run approach for SimpFA following SbCFA, in order to reduce the possibility of selecting a local minimizer as the optimal solution: the algorithm of SbCFA followed by SimpFA is run multiple times by starting SbCFA with mutually different initial values, and the best solution is selected among the multiple SimpFA solutions. The procedure is listed as follows:

Stage 1. For S, perform EFA to provide Λ_E and set l = 1.

Stage 2. For each of l = 1, …, 100, perform the following sub-stages:

Stage 2.1. Initialize T to T_l and perform simplimax. Express the resulting B as B_l.

Stage 2.2. For S, perform CFA using B_l as B. Express the resulting Λ, Ψ, and Φ as Λ̃_l, Ψ̃_l, and Φ̃_l, respectively, with their set Θ̃_l = {Λ̃_l, Ψ̃_l, Φ̃_l}.

Stage 2.3. For S, perform SimpFA with Λ, Ψ, and Φ initialized at Λ̃_l, Ψ̃_l, and Φ̃_l, respectively. Express the resulting Λ, Ψ, and Φ as Λ_l, Ψ_l, and Φ_l, respectively, with their set Θ_l = {Λ_l, Ψ_l, Φ_l}.

Stage 3. Select {Λ̂, Ψ̂, Φ̂} with f(Λ̂, Ψ̂, Φ̂ | S) = min_l f(Λ_l, Ψ_l, Φ_l | S) as the optimal solution.

Here, the initial value T_l for T in Stage 2.1 is chosen as follows: T_l is obtained by the varimax rotation of Λ_E if l = 1; otherwise, T_l is set to diag(T_0 T_0′)^{-1/2} T_0 with the elements in T_0 chosen randomly, so that the resulting T_l satisfies (13) and can be substituted for T. In Figure 2, the above stages are graphically illustrated, so that they can be captured visually.
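The random initialization of T_l can be sketched as follows; random_start_T is our naming, and any continuous distribution for T_0 would do as long as it yields a nonsingular matrix almost surely.

```python
import numpy as np

def random_start_T(m, rng):
    """Random initial T for Stage 2.1: rescale a random T0 so that
    diag(T T') = I_m, as required by (13)."""
    T0 = rng.normal(size=(m, m))
    D = np.sqrt(np.diag(T0 @ T0.T))   # row norms of T0
    return T0 / D[:, None]            # diag(T0 T0')^{-1/2} T0
```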

3.6. Selection of Cardinality by Information Criteria

The final solution {Λ̂, Ψ̂, Φ̂} resulting from Stage 3 (the last subsection) depends on the cardinality value c in (4). We thus use {Λ̂_c, Ψ̂_c, Φ̂_c} for the solution {Λ̂, Ψ̂, Φ̂} for a particular value of c. The best value for c can be selected with the procedure (9). This can be rewritten using {Λ̂_c, Ψ̂_c, Φ̂_c} as

Choose the {Λ̂_c, Ψ̂_c, Φ̂_c} for the value c = arg min_{c_min ≤ c ≤ c_max} IC(c), (20)

with c_min and c_max defined as in (10).

Figure 2. Graphical illustration of Stages 1-3.

As IC(c) in (20), we consider using either Akaike's [15] information criterion (AIC) or Schwarz's [16] Bayesian information criterion (BIC), which are particularly popular among the information criteria. AIC and BIC can be calculated as

AIC(c) = 2f(Λ̂_c, Ψ̂_c, Φ̂_c | S) + 2κ(c), (21)

BIC(c) = 2f(Λ̂_c, Ψ̂_c, Φ̂_c | S) + (log n)κ(c), (22)

respectively, for a particular value of c. Here, n is the number of observations, and κ(c) = c + α, with α the number of unique variances plus the number of inter-factor correlations. Thus, (21) or (22) is substituted for IC(c) in (20). Whether we should use (21) or (22) is assessed in the next section.
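In code, (21) and (22) can be sketched as below. The reading α = p + m(m − 1)/2 is our interpretation of "the number of unique variances plus the number of inter-factor correlations", and aic_bic is our naming.

```python
import math

def aic_bic(f_val, c, p, m, n):
    """AIC (21) and BIC (22) for a solution with cardinality c, where f_val is
    the attained value of loss (8); kappa(c) = c + alpha."""
    kappa = c + p + m * (m - 1) // 2   # assumed: alpha = p + m(m-1)/2
    return 2 * f_val + 2 * kappa, 2 * f_val + math.log(n) * kappa
```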

4. Simulation Study

We performed a simulation study to assess the procedure proposed in the last section, with respect to (i) how often the c value in (4) is selected correctly by AIC and BIC, (ii) how similar the SbCFA and SimpFA solutions are, and (iii) how well parameter values are recovered. The procedures for synthesizing and analyzing data in the study are described in the first subsection; the results for (i), (ii), and (iii) are then reported in the following ones.

4.1. Data Synthesis and Analysis

With p = 12 and m = 3, we set the true {Λ, Ψ, Φ} as in Table 1 and sampled n = 300 rows of an n × p data matrix X from the p-variate normal distribution whose mean vector is 0_p and whose covariance matrix is defined as Σ = ΛΦΛ′ + Ψ; see (1). The resulting X provided the sample covariance matrix S = n^{-1}X′X. We replicated this procedure 200 times to have 200 matrices S.
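The data synthesis above can be sketched as follows; synthesize_S is our naming, and the true Λ, Ψ, Φ of Table 1 would be passed in.

```python
import numpy as np

def synthesize_S(Lam, Psi, Phi, n=300, rng=None):
    """Draw n rows of X from N(0_p, Sigma) with Sigma = Lam Phi Lam' + Psi (1),
    and return the sample covariance matrix S = n^{-1} X'X."""
    rng = np.random.default_rng(rng)
    Sigma = Lam @ Phi @ Lam.T + Psi
    X = rng.multivariate_normal(np.zeros(Lam.shape[0]), Sigma, size=n)
    return X.T @ X / n
```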

For each S, we carried out the procedure of SimpFA following SbCFA with the selection of the best c by (20), using both AIC and BIC. This gives the SimpFA solutions {Λ̂_c, Ψ̂_c, Φ̂_c}, which are classified into two types according to whether c was chosen by AIC or by BIC.

4.2. Correctness of Selected Cardinality

Let c_true denote the true Card(Λ). As found in Table 1, c_true = 15. On the other

Table 1. True parameters in Λ, Ψ1_p, and Φ, with blank cells indicating zero elements.

hand, c* expresses the Card(Λ) selected by the proposed procedure (20). We obtained the deviation c* − c_true for each of the 200 data sets. The average (standard deviation) of these deviations over all data sets is 2.43 (1.40) when using AIC and 0.34 (0.62) when using BIC. This result shows that AIC tends to overestimate Card(Λ). Moreover, the absolute deviation |c* − c_true| for BIC was smaller than that for AIC in 184 of the 200 data sets. These results show that BIC is substantially better than AIC. Accordingly, we take only the BIC-based solution into consideration from here onward.

4.3. Equivalence of SbCFA and SimpFA Solutions

It was found that 98 percent of the 200 solutions of SimpFA were equivalent to the SbCFA ones used for initializing the SimpFA parameters: no iteration in the SimpFA algorithm was required. Also in the remaining two percent of the runs, the SbCFA and SimpFA solutions were almost equivalent, with the average of the differences between the two solutions over the 200 data sets being 0.001. Here, the difference is defined as (‖Λ̂_c − Λ̃_c‖_1 + ‖Ψ̂_c − Ψ̃_c‖_1 + ‖Φ̂_c − Φ̃_c‖_1)/{pm + p + m(m − 1)}, with ‖Γ‖_1 = Σ_i Σ_j |γ_ij| denoting the L1 norm of a matrix Γ = (γ_ij) and {Λ̃_c, Ψ̃_c, Φ̃_c} being the SbCFA counterpart of {Λ̂_c, Ψ̂_c, Φ̂_c}.
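A sketch of this difference measure is given below, assuming each solution is a (Λ, Ψ, Φ) tuple of NumPy arrays and that the divisor pm + p + m(m − 1) counts the loadings, unique variances, and off-diagonal correlations; solution_difference is our naming.

```python
import numpy as np

def solution_difference(sol_a, sol_b, p, m):
    """Average elementwise L1 difference between two solutions (Lam, Psi, Phi),
    normalized by pm + p + m(m-1) as in Section 4.3."""
    num = sum(np.abs(A - B).sum() for A, B in zip(sol_a, sol_b))
    return num / (p * m + p + m * (m - 1))
```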

The above results do not show that SimpFA (following SbCFA) is useless, as SimpFA serves to show that its solution is equivalent to the SbCFA one, which allows us to establish the optimality of the SbCFA solution.

4.4. Recovery of Parameters

We assess how well the true parameter values are recovered by the SimpFA solution Θ̂_c = {Λ̂_c, Ψ̂_c, Φ̂_c}, which is provided by (20) with IC(c) = BIC(c). As indices standing for the badness of the recovery of Λ, Ψ, and Φ, we obtained ‖Λ̂_c − Λ_true‖_1/(pm), ‖Ψ̂_c − Ψ_true‖_1/p, and ‖Φ̂_c − Φ_true‖_1/M, respectively, with M = m(m − 1). Further, for assessing the incorrectness in identifying the true zero and nonzero loadings, we recorded MIR0 = (the number of true zero loadings estimated as nonzero)/(pm − c) and MIR# = (the number of true nonzero loadings estimated as zero)/c for each data set. Here, MIR stands for misidentification rate.
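The two misidentification rates can be computed as sketched below (our naming; loadings exactly equal to zero are treated as the estimated zeros).

```python
import numpy as np

def misidentification_rates(Lam_hat, Lam_true):
    """MIR0: share of true zero loadings estimated as nonzero;
    MIR#: share of true nonzero loadings estimated as zero."""
    zero_true = Lam_true == 0
    mir0 = np.sum(zero_true & (Lam_hat != 0)) / zero_true.sum()
    mir_hash = np.sum(~zero_true & (Lam_hat == 0)) / (~zero_true).sum()
    return mir0, mir_hash
```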

The statistics of the resulting five index values over the 200 data sets are presented in Table 2. It shows that the proposed procedure recovered {Λ, Ψ, Φ}, i.e., the links of variables to factors and the parameter values, fairly well, except that the 95th percentile for the factor correlations can be considered large. These results allow us to conclude that the true CFA model can be recovered satisfactorily, though the estimates of factor correlations might deviate from their true counterparts in a few cases.

5. Real Data Demonstration

In order to demonstrate how useful our proposed procedure is, we apply it to two data sets that have already been analyzed by CFA with zero constraints selected by users. The solutions of the latter user-specified CFA are compared with those of our proposed procedure.

The first data set is Carlson and Mulaik's [17] personality trait data matrix of n = 280 (participants) × p = 15 (personality traits). Its correlation matrix (Mulaik, 2010, p. 198) has been analyzed by Mulaik [2] using CFA with his selected constraints, and his solution is shown in Mulaik (2010, p. 437) and also in Table 3, together with the solution of the proposed procedure. Here, the latter solution is the same as that of SbCFA before SimpFA: no iteration was required in the SimpFA algorithm. Comparing the BIC values in Table 3, we can find that the solution of the proposed procedure is better than Mulaik's [2].

Another data set is Kojima's housing preference data with n = 1120 (participants) and p = 13 (features). It describes to what degree the participants wish to

Table 2. Percentiles, average (Avg) and standard deviation (SD) of each index over 200 data sets.

Table 3. Solution for personality trait data, with blank cells indicating zero elements.

live in the houses featured by the variables. The correlation matrix for this data set is presented in Table 4, as the original source is written in Japanese and not easily available. This matrix has been analyzed by Kojima using CFA with his selected constraints, and his solution is shown in Kojima's report and also in Table 5 with the solution of the proposed procedure. Here, this solution differs from its SbCFA counterpart: iteration was required in the SimpFA algorithm. In Table 5, BIC shows that the solution of the proposed procedure is better.

In both of the above examples, the proposed procedure outperformed the user-specified CFA in terms of the BIC values, but its solutions are similar to the counterparts from CFA with users' selected constraints (Table 3 and Table 5). This similarity shows that the selected constraints were rational. However, the proposed procedure is fully computational and does not require any user effort for considering constraints. This property would be helpful whenever there is some uncertainty as to which loadings to constrain to zero.

6. Conclusions

A problem in confirmatory factor analysis (CFA) is that users must select

Table 4. Correlation matrix for Kojima’s (2013) housing preference data.

Table 5. Solution for housing preference data, with blank cells indicating zero elements.

what pairs of variables and factors are linked, in other words, which loadings are to be zero or nonzero, in a subjective manner. To deal with this problem, we proposed the procedure of SimpFA following SbCFA for computationally identifying an optimally suitable CFA model and its solution without relying on any user's judgment. The simulation study showed that the true CFA model and parameter values can be recovered fairly well by the proposed procedure. Real data examples demonstrated that it can outperform CFA with users' selected constraints in terms of the BIC statistic.

In Section 4.3, we found the SbCFA solutions to be equivalent to the SimpFA ones in almost all cases. In particular, the good performance of SbCFA was somewhat surprising. Theoretically studying the reasons for this SbCFA performance is considered a subject for future research.

Appendix

A1. EM Algorithm for Factor Analysis

In Rubin and Thayer's (1982) EM algorithm for FA, an auxiliary function of (8) is defined as

φ(Λ, Ψ, Φ | S) = log|Ψ| + tr[Ψ^{-1}(S − 2CΛ′ + ΛQΛ′)] + log|Φ| + tr(QΦ^{-1}). (A1)

Here, C = SA and Q = A′SA + U, with A and U being computed as

A = (ΛΦΛ′ + Ψ)^{-1}ΛΦ and U = Φ^{1/2}[I_m + (Φ^{1/2}Λ′)Ψ^{-1}(ΛΦ^{1/2})]^{-1}Φ^{1/2} (A2)

using the current Λ, Ψ, and Φ values. In the EM algorithm, (A1) rather than (8) is minimized iteratively. This minimization is known to allow (8) to be minimized (Rubin & Thayer, 1982). This fact also holds true when Λ is constrained as Λ = B ⊙ Λ. Thus, the minimization of (8) in EFA, SbCFA ((5) or (5′)), and SimpFA (6) is attained with the EM algorithm.

The EM algorithm for EFA, SbCFA, and SimpFA can be summarized as follows:

Step 1. Initialize Λ, Ψ, and Φ, with B also initialized in SimpFA.

Step 2. Obtain C and Q in (A1) through (A2).

Step 3. Minimize (A1) over Ψ with Λ and Φ fixed.

Step 4. Minimize (A1) over Λ with Ψ and Φ fixed, with “over Λ” replaced by “over B and Λ under (4)” only in SimpFA.

Step 5. Minimize (A1) over Φ with Ψ and Λ fixed.

Step 6. Transform Φ and Λ so that Φ is a correlation matrix, and finish if convergence is reached; otherwise, go back to Step 2.

Here, the minimization in Step 3 is attained when the ith diagonal element in Ψ is updated as ψ_ii = s_ii − 2λ_i′c_i + λ_i′Qλ_i (i = 1, …, p), with λ_i′ the ith row of Λ, c_i′ that of C, and s_ii the (i, i) element of S. The task of Step 5 is attained simply by updating Φ as Φ = Q. This is because Φ may be regarded simply as a covariance matrix during the iterations and finally be transformed into a correlation matrix. Thus, if convergence is reached in Step 6, we must transform Φ and Λ as Φ_R = diag(Φ)^{-1/2} Φ diag(Φ)^{-1/2} and Λ_R = Λ diag(Φ)^{1/2}, then regard the resulting Φ_R and Λ_R as the solutions of Φ and Λ, respectively. Here, we should notice that the transformation does not change the value of (8), which is shown by ΛΦΛ′ = ΛD(D^{-1}ΦD^{-1})(ΛD)′ with D an m × m diagonal matrix. Convergence in Step 6 is defined here as the decrease in (8) from the previous round being less than 10^{-6}. The details of Steps 1 and 4 differ among EFA, SbCFA, and SimpFA, as described in the next paragraphs.

The initialization in Step 1 is detailed here. In EFA, the eigenvalue decomposition of S defined as S = LΔ²L′ is used, with Δ² the p × p diagonal matrix whose diagonal elements are arranged in descending order, and LL′ = I_p. That is, the initial Λ and Ψ are set to L_mΔ_m and diag(S − L_mΔ_m²L_m′), respectively, in EFA, with Δ_m² the first m × m diagonal block of Δ² and L_m the p × m matrix containing the first m columns of L. In SbCFA, the initial Ψ is set to the one obtained in the preceding EFA, while the initial Λ and Φ are set to the matrices B ⊙ Λ and TT′, respectively, that are obtained by the preceding simplimax rotation. In SimpFA, Λ, Ψ, and Φ are initialized at their SbCFA solutions and B at its simplimax rotation solution.

The minimization in Step 4 is attained by the update of Λ with Λ = CQ^{-1} in EFA and the row-wise λ_i = B_i′(B_iQB_i′)^{-1}B_ic_i (i = 1, …, p) in SbCFA. Here, B_i is the m_i × m binary matrix satisfying 1_{m_i}′B_i = b_i′, with b_i′ the ith row of the link matrix B = (b_ij) defined as (2) and m_i = b_i′1_m: for example, if b_i′ = [1, 0, 1], then B_i = [1 0 0; 0 0 1]. Step 4 in SimpFA is detailed in Appendix A4.
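To make Appendix A1 concrete, here is a minimal sketch of one EM cycle for the EFA case, where Φ stays fixed at I_m (so Step 5 and the Step 6 rescaling do not apply). The function name em_step_efa is ours, Ψ is stored as a vector of unique variances, and U is computed via the algebraically equivalent form [I_m + Λ′Ψ^{-1}Λ]^{-1} of (A2).

```python
import numpy as np

def em_step_efa(Lam, Psi, S):
    """One cycle of Steps 2-4 of the EM algorithm, sketched for EFA (Phi = I_m).
    Psi is a 1-D array of unique variances."""
    p, m = Lam.shape
    Sigma = Lam @ Lam.T + np.diag(Psi)                   # model covariance, Phi = I
    A = np.linalg.solve(Sigma, Lam)                      # A in (A2) with Phi = I_m
    U = np.linalg.inv(np.eye(m) + Lam.T @ (Lam / Psi[:, None]))  # U in (A2), rewritten
    C = S @ A                                            # Step 2
    Q = A.T @ S @ A + U
    Psi_new = np.diag(S) - 2 * np.einsum('ij,ij->i', Lam, C) \
              + np.einsum('ij,jk,ik->i', Lam, Q, Lam)    # Step 3: psi_ii updates
    Lam_new = C @ np.linalg.inv(Q)                       # Step 4: Lam = C Q^{-1}
    return Lam_new, Psi_new
```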

A2. Algorithm for Simplimax Rotation

The algorithm for the simplimax rotation can be summarized as follows:

Step 1. Initialize T

Step 2. Update B and Λ as in (18) with H = Λ_E T^{-1}

Step 3. Update T with Browne’s (1972) algorithm

Step 4. Finish if convergence is reached; otherwise, go back to Step 2.

How T is initialized is described in Section 3.5, which treats the whole process of the proposed procedure. Convergence is defined in this paper as the decrease in (16) from the previous round being less than 10^{-6}.

A3. Derivation of (18)

The simplimax function spx(B, Λ | H) = ‖B ⊙ Λ − H‖² in (17) can be rewritten as

spx(B, Λ | H) = Σ_{(i,j)∈ℬ} (b_ij λ_ij − h_ij)² + Σ_{(i,j)∈ℬ̄} h_ij² ≥ Σ_{(i,j)∈ℬ̄} h_ij², (A3)

with H = (h_ij). Here, ℬ denotes the set of the index pairs (i, j) with b_ij = 1, while ℬ̄ is the set of (i, j) with b_ij = 0, and we have used Σ_{(i,j)∈ℬ̄} (b_ij λ_ij − h_ij)² = Σ_{(i,j)∈ℬ̄} h_ij². The inequality in (A3) shows that the lower limit of spx(B, Λ | H) is Σ_{(i,j)∈ℬ̄} h_ij², which is attained if b_ij λ_ij = h_ij for (i, j) ∈ ℬ. Furthermore, the limit Σ_{(i,j)∈ℬ̄} h_ij² is minimal when ℬ̄ contains the (i, j) with h_ij² ≤ [h_ij²]_q, with q = pm − c and [h_ij²]_q the qth smallest value among h_ij², i = 1, …, p; j = 1, …, m. This can be rewritten as (18). That is, the minimum in (17) is attained for (18).

A4. Majorization Algorithm for SimpFA Loadings

Let us rewrite (A1) as φ*(Λ) + const, where const is the part that does not depend on Λ = B ⊙ Λ, and

φ*(Λ) = −2tr(Ψ^{-1}CΛ′) + tr(Ψ^{-1}ΛQΛ′) = tr(Ψ^{-1}ΛQΛ′) − 2tr(C′Ψ^{-1}Λ). (A4)

Minimizing this over B ⊙ Λ is the task of Step 4 in Appendix A1 for SimpFA. Using Δ = Λ − Λ_C, or Λ = Λ_C + Δ, with Λ_C the current Λ value, (A4) can be rewritten as

φ*(Λ) = tr[Ψ^{-1}(Λ_C + Δ)Q(Λ_C + Δ)′] − 2tr[C′Ψ^{-1}(Λ_C + Δ)] = φ*(Λ_C) + tr(Ψ^{-1}ΔQΔ′) + 2tr(VΔ) (A4′)

with V = QΛ_C′Ψ^{-1} − C′Ψ^{-1}. Kiers (1990, Theorem 1) shows that tr(Ψ^{-1}ΔQΔ′) in (A4′) satisfies the inequality tr(Ψ^{-1}ΔQΔ′)/‖Δ‖² ≤ α, i.e., tr(Ψ^{-1}ΔQΔ′) ≤ α‖Δ‖², where α is the greatest eigenvalue of Ψ^{-1} ⊗ Q, with ⊗ denoting the Kronecker product. Using that inequality, a function majorizing (A4′) can be defined as

η(Λ) = φ*(Λ_C) + β‖Δ‖² + 2tr(VΔ) = φ*(Λ_C) + β‖Δ + β^{-1}V′‖² − β^{-1}‖V‖², (A5)

with β ≥ α and β > 0: (A5) satisfies φ*(Λ_C) = η(Λ_C) ≥ η(Λ) ≥ φ*(Λ) if Λ is the minimizer of η(Λ). Since only β‖Δ + β^{-1}V′‖² is a function of Λ on the right side of (A5), with β > 0, the task of Step 4 is attained by minimizing η*(Λ) = ‖Δ + β^{-1}V′‖². From Δ = Λ − Λ_C, we can rewrite η*(Λ) as η*(Λ) = ‖Λ − Λ_C + β^{-1}V′‖² = ‖Λ − W‖², which is the simplimax function (19), with W = Λ_C − β^{-1}V′. Thus, the B and Λ to be obtained in Step 4 are given by (18) with H = (h_ij) and h²_[c] replaced by W = (w_ij) and w²_[c], respectively. Here, w²_[c] is the cth largest value of all elements in W ⊙ W = (w_ij²).

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Adachi, K. (2019) Factor Analysis: Latent Variable, Matrix Decomposition, and Constrained Uniqueness Formulations. WIREs Computational Statistics, 11, e1458.
https://onlinelibrary.wiley.com/doi/abs/10.1002/wics.1458
https://doi.org/10.1002/wics.1458
[2] Mulaik, S.A. (2010) Foundations of Factor Analysis. 2nd Edition, CRC Press, Boca Raton.
[3] Yanai, H. and Ichikawa, M. (2007) Factor Analysis. In: Rao, C.R. and Sinharay, S., Eds., Handbook of Statistics, Vol. 26, Elsevier, Amsterdam, 257-296.
https://doi.org/10.1016/S0169-7161(06)26009-7
[4] Arbuckle, J.L. (2008) AMOS 17 User’s Guide. SPSS Inc., Chicago.
[5] Bollen, K.A. (1989) Structural Equations with Latent Variables. Wiley, New York.
https://doi.org/10.1002/9781118619179
[6] Jöreskog, K.G. (1969) A General Approach to Confirmatory Maximum Likelihood Factor Analysis. Psychometrika, 34, 183-202.
[7] Lawley, D.N. (1941) Further Investigation of Factor Estimation. Proceedings of the Royal Society of Edinburgh A, 61, 176-185.
https://doi.org/10.1017/S0080454100006178
[8] Rubin, D.B. and Thayer, D.T. (1982) EM Algorithms for ML Factor Analysis. Psychometrika, 47, 69-76.
https://doi.org/10.1007/BF02293851
[9] Adachi, K. (2013) Factor Analysis with EM Algorithm Never Gives Improper Solutions When Sample and Initial Parameter Matrices Are Proper. Psychometrika, 78, 380-394.
https://doi.org/10.1007/s11336-012-9299-8
[10] Konishi, S. and Kitagawa, G. (2008) Information Criteria and Statistical Modeling. Springer, New York.
https://doi.org/10.1007/978-0-387-71887-3
[11] Kiers, H.A.L. (1994) Simplimax: Oblique Rotation to an Optimal Target with Simple Structure. Psychometrika, 59, 567-579.
https://doi.org/10.1007/BF02294392
[12] Browne, M.W. (1972) Oblique Rotation to a Partially Specified Target. British Journal of Mathematical and Statistical Psychology, 25, 207-212.
[13] Kiers, H.A.L. (1990) Majorization as a Tool for Optimizing a Class of Matrix Functions. Psychometrika, 55, 417-428.
https://doi.org/10.1007/BF02294758
[14] Kiers, H.A.L. (2002) Setting Up Alternating Least Squares and Iterative Majorization Algorithms for Solving Various Matrix Optimization Problems. Computational Statistics and Data Analysis, 41, 157-170.
https://doi.org/10.1016/S0167-9473(02)00142-1
[15] Akaike, H. (1974) A New Look at the Statistical Model Identification. IEEE Transactions on Automatic Control, 19, 716-723.
https://doi.org/10.1109/TAC.1974.1100705
[16] Schwarz, G. (1978) Estimating the Dimension of a Model. Annals of Statistics, 6, 461-464.
https://doi.org/10.1214/aos/1176344136
[17] Carlson, M. and Mulaik, S.A. (1993) Trait Ratings from Descriptions of Behavior as Mediated by Components of Meaning. Multivariate Behavioral Research, 28, 111-159.
https://doi.org/10.1207/s15327906mbr2801_7
