%\VignetteEncoding{UTF-8}
%\VignetteIndexEntry{Factor Model Fit and Simple Structure Diagnostics}
%\VignetteEngine{utils::Sweave}
%\VignettePackage{GPArotation}
%\VignetteDepends{GPArotation}
%\VignetteKeyword{factor model fit}
%\VignetteKeyword{simple structure}

\documentclass[english, 10pt]{article}
\usepackage{hyperref}
\usepackage[margin=0.7in, letterpaper]{geometry}
\bibstyle{apacite}
\bibliographystyle{apa}
\usepackage{natbib}

\begin{document}
\SweaveOpts{echo=TRUE, results=hide}

\begin{center}
\section*{Factor Model Fit and Simple Structure Diagnostics \\ ~~\\
The \texttt{GPArotation} Package}
\end{center}
\begin{center}
Author: Coen A. Bernaards
\end{center}

\section{Introduction}

Evaluating an exploratory factor analysis solution involves two distinct
questions. First, does the $k$-factor model adequately reproduce the observed
correlations? Second, once a factor solution is accepted, does rotation produce
clean simple structure with loading patterns that are easy to interpret?

This vignette covers both questions using the \texttt{GriffithMulaik} and
\texttt{CCAI} datasets included in \texttt{GPArotation}. Model fit is assessed
via RMSEA and SRMR. Simple structure quality is assessed via AUC, FSI,
hyperplane counts, and the Hoffman, Gini, and Bentler (\cite{bentler1977})
indices, all of which are reported automatically by \texttt{summary()}.

\section{Model Fit: How Well Does the Factor Model Reproduce the Data?}

We use the Griffith and Mulaik (\cite{griffith1998}) interpersonal personality
data (\texttt{GriffithMulaik}, $n = 523$, 24 variables, 6 hypothetical
interpersonal factors) to illustrate the complete model evaluation workflow.
The data are described in \cite{mulaik2018}.

<<>>=
library("GPArotation")
data("GriffithMulaik", package = "GPArotation")
n.obs <- 523
@

\subsection{Choosing the Number of Factors}

\subsubsection{Fit Statistics}

Note that \texttt{n.obs} must be specified in the \texttt{factanal} call for
fit statistics to be available.
It is stored automatically in the \texttt{GPArotation} object and used by
\texttt{calc\_fitstats} to compute RMSEA and SRMR.

<<>>=
for (k in 3:8) {
  fa.k  <- factanal(factors = k, covmat = GriffithMulaik,
                    n.obs = n.obs, rotation = "none")
  res.k <- quartimax(fa.k)
  fit.k <- GPArotation:::calc_fitstats(res.k)
  cat(sprintf("k = %d RMSEA = %.4f [%.4f, %.4f] SRMR = %.4f\n",
              k, fit.k$RMSEA, fit.k$RMSEA.l, fit.k$RMSEA.u, fit.k$SRMR))
}
@

RMSEA values below 0.05 are conventionally taken to indicate close fit and
values below 0.08 acceptable fit (\cite{hu1999}). SRMR below 0.08 is generally
considered acceptable. For these data $k = 6$ achieves RMSEA below 0.05 and
SRMR below 0.025, consistent with Mulaik's theoretical prediction of six
interpersonal factors.

\subsection{Residual Analysis}

Global fit indices summarize overall model adequacy but can mask localized
problems. Inspecting the residual correlation matrix identifies which specific
item pairs are poorly reproduced by the factor model.

The residuals here are \emph{correlation residuals}: the difference between
the observed correlation $r_{ij}$ and the model-implied correlation
$\hat{r}_{ij}$, where $L$ denotes the loadings and $\Phi$ the factor
correlation matrix:
\[
e_{ij} = r_{ij} - \hat{r}_{ij}, \quad \hat{R} = L \Phi L'
\]

A \textbf{positive residual} means the model underestimates the relationship
between two items: they share more variance than the factors account for.
This may indicate a missing factor, shared method variance, or item
redundancy. A \textbf{negative residual} means the model overestimates the
relationship. Residuals near zero indicate the factor model reproduces that
pair well.

As a rough guide:
\begin{itemize}
  \item $|e_{ij}| < 0.05$: excellent local fit
  \item $0.05 \leq |e_{ij}| < 0.10$: acceptable
  \item $|e_{ij}| \geq 0.10$: worth investigating
  \item $|e_{ij}| \geq 0.20$: serious local misfit
\end{itemize}

Patterns in the residuals are informative. Large residuals clustered around a
particular item suggest that the item may need its own factor or has method
variance not captured by the model.
Large residuals between items from different factors suggest those factors may
be more correlated than the model captures.

\subsection{Residuals Are a Property of the Extraction, Not the Rotation}

An important property of factor rotation is that it does not change the
model-implied correlation matrix $\hat{R}$. Let $A$ denote the unrotated
loadings, $L$ the rotated loadings, and $T$ the rotation matrix. For orthogonal
rotation, $L = AT$, $\Phi = I$, and $T$ is orthogonal ($TT' = I$), so:
\[
\hat{R} = LL' = AT(AT)' = ATT'A' = AA'
\]
For oblique rotation, $L = A(T')^{-1}$ and $\Phi = T'T$, so:
\[
\hat{R} = L \Phi L' = A(T')^{-1} \cdot T'T \cdot T^{-1}A' = AA'
\]
Both reduce to $AA'$, the model-implied correlation matrix, which depends only
on the unrotated loadings $A$ and not on the rotation matrix $T$. This means
the residual matrix $R - \hat{R}$ and all derived fit statistics (SRMR, RMSEA)
are identical across all rotation methods applied to the same unrotated
solution.

<<>>=
fa.un <- factanal(factors = 6, covmat = GriffithMulaik,
                  n.obs = n.obs, rotation = "none")
res.vm <- Varimax(fa.un)
res.ob <- oblimin(fa.un, randomStarts = 100)
res.t1 <- tandemI(fa.un)
fit.vm <- GPArotation:::calc_fitstats(res.vm)
fit.ob <- GPArotation:::calc_fitstats(res.ob)
fit.t1 <- GPArotation:::calc_fitstats(res.t1)
cat(sprintf("Varimax SRMR = %.4f RMSEA = %.4f\n", fit.vm$SRMR, fit.vm$RMSEA))
cat(sprintf("Oblimin SRMR = %.4f RMSEA = %.4f\n", fit.ob$SRMR, fit.ob$RMSEA))
cat(sprintf("TandemI SRMR = %.4f RMSEA = %.4f\n", fit.t1$SRMR, fit.t1$RMSEA))
@

All three produce identical SRMR and RMSEA values. Fit statistics therefore
inform the choice of \emph{how many} factors to extract, not \emph{how} to
rotate them. To compare rotation methods, use the criterion-free simple
structure measures (AUC, FSI, and hyperplane counts), which do vary across
rotation methods and are reported automatically by \texttt{print()} and
\texttt{summary()}.

<<>>=
GPArotation:::audit_residuals(res.ob)
@

Items with high mean absolute residuals may cross-load on an additional factor
or have method variance not captured by the model.
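
A per-item residual summary of this kind can also be approximated by hand.
The sketch below is illustrative only. It uses the built-in
\texttt{ability.cov} data from base R rather than \texttt{GriffithMulaik}, and
it assumes that an item's mean absolute residual is the average of $|e_{ij}|$
over all $j \neq i$, which may differ from what \texttt{audit\_residuals}
computes internally. The object names ending in \texttt{.ab} are hypothetical
and chosen only to avoid overwriting objects used elsewhere in this vignette.

<<>>=
# Illustrative sketch: per-item mean absolute residuals using base R only.
fa.ab <- factanal(factors = 2, covmat = ability.cov, rotation = "none")
R.ab  <- cov2cor(ability.cov$cov)      # observed correlation matrix
A.ab  <- unclass(loadings(fa.ab))      # unrotated loadings A
E.ab  <- R.ab - A.ab %*% t(A.ab)       # correlation residuals e_ij = r_ij - rhat_ij
diag(E.ab) <- NA                       # keep off-diagonal entries only
# Assumed definition: mean of |e_ij| over j != i for each item i.
mar.ab <- rowMeans(abs(E.ab), na.rm = TRUE)
round(sort(mar.ab, decreasing = TRUE), 3)
@

Because the residuals depend only on $AA'$, the same values would be obtained
from any rotation of \texttt{fa.ab}.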
For the \texttt{GriffithMulaik} data, \texttt{IMPARTL} is flagged with two
pairs exceeding 0.10. It loads weakly on multiple factors, suggesting it does
not discriminate cleanly between the six interpersonal dimensions.

\begin{center}
<<>>=
plot(res.ob, "residuals")
@
\end{center}

The heatmap shows the full residual matrix. Blue cells indicate that the model
overestimates the correlation; red cells indicate underestimation. The sorted
bar chart shows the distribution of all off-diagonal absolute residuals.

\section{Simple Structure: How Clean Is the Rotated Solution?}

Once an adequate model is accepted, rotation is evaluated by how well it
achieves simple structure. \texttt{GPArotation} reports five criterion-free
simple structure measures automatically via \texttt{print()} and
\texttt{summary()}.

\subsection{Per-Factor Measures: AUC and FSI}

\texttt{print()} reports AUC (\cite{liu2023}) and FSI (\cite{lorenzoseva2003})
for each factor. AUC measures how rapidly a factor's variance accumulates when
loadings are sorted by size. AUC values near 1 indicate one or two dominant
loadings with the rest near zero. FSI measures the kurtosis of the squared
loadings. FSI values near 1 indicate a sharp contrast between large and small
loadings.

<<>>=
print(res.ob)
@

\subsection{Overall Solution Measures}

\texttt{summary()} adds three overall measures (Hoffman, Gini, Bentler), the
hyperplane count, and the structure matrix for oblique solutions.

<<>>=
summary(res.ob)
@

The hyperplane count reports how many loadings fall below a cutoff of 0.10 in
absolute value. For a perfectly simple solution with $p$ items and $k$
factors, the theoretical maximum is $p(k-1)$.

\subsection{Comparing Rotation Methods}

These measures are criterion-free, so they can be used to compare simple
structure quality across different rotation methods applied to the same
unrotated solution.
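
As a reference point for the hyperplane figures, the theoretical maximum
$p(k-1)$ can be worked out by hand for these data, with $p = 24$ items and
$k = 6$ factors. The percentage shown assumes the count is expressed relative
to all $pk$ loadings, which is an assumption about how the reported
percentage is normalized:
\[
p(k-1) = 24 \times 5 = 120, \qquad
\frac{p(k-1)}{pk} = \frac{120}{144} \approx 83.3\%
\]
Under that assumption, a hyperplane percentage approaching 83\% would
correspond to every item loading on exactly one factor.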
<<>>=
res.gm <- geominQ(fa.un, randomStarts = 100)
auc.ob <- GPArotation:::calc_AUC(res.ob)$AUC_mean
auc.vm <- GPArotation:::calc_AUC(res.vm)$AUC_mean
auc.gm <- GPArotation:::calc_AUC(res.gm)$AUC_mean
hp.ob <- GPArotation:::calc_hyperplane(res.ob)$HP_pct
hp.vm <- GPArotation:::calc_hyperplane(res.vm)$HP_pct
hp.gm <- GPArotation:::calc_hyperplane(res.gm)$HP_pct
cat(sprintf("%-12s AUC = %.3f Hyperplane = %.1f pct\n", "Varimax", auc.vm, hp.vm))
cat(sprintf("%-12s AUC = %.3f Hyperplane = %.1f pct\n", "Oblimin", auc.ob, hp.ob))
cat(sprintf("%-12s AUC = %.3f Hyperplane = %.1f pct\n", "GeominQ", auc.gm, hp.gm))
@

Oblique rotations achieve higher AUC and hyperplane counts than Varimax
because the underlying interpersonal factors are correlated; an orthogonal
constraint forces loadings to spread across factors.

\subsection{Unrotated vs Rotated: Does Rotation Help?}

Simple structure measures are expected to improve after rotation:

<<>>=
sim.un <- GPArotation:::calc_simplicity(fa.un)
sim.ob <- GPArotation:::calc_simplicity(res.ob)
cat(sprintf("%-12s Hoffman = %.3f Gini = %.3f Bentler = %.3f\n",
            "Unrotated", sim.un$Hoffman, sim.un$Gini, sim.un$Bentler))
cat(sprintf("%-12s Hoffman = %.3f Gini = %.3f Bentler = %.3f\n",
            "Oblimin", sim.ob$Hoffman, sim.ob$Gini, sim.ob$Bentler))
@

All three indices increase after rotation. If Bentler remains low despite high
AUC and hyperplane counts, this typically indicates high factor
intercorrelations rather than poor simple structure.

\section{Summary}

A complete EFA diagnostic workflow in \texttt{GPArotation}:
\begin{enumerate}
  \item Extract a $k$-factor solution via \texttt{factanal}.
  \item Compare scree plots from raw and reduced correlation matrices
        (\cite{mulaik2018}).
  \item Check global fit via \texttt{GPArotation:::calc\_fitstats} across a
        range of $k$.
  \item Rotate with \texttt{randomStarts} to avoid local minima.
  \item Inspect residuals via \texttt{audit\_residuals} and
        \texttt{plot(..., "residuals")} for local misfit.
  \item Compare rotation methods using AUC and hyperplane counts.
  \item Use \texttt{summary()} for the full diagnostic scorecard.
\end{enumerate}

\begin{thebibliography}{}

\bibitem[\protect\citeauthoryear{Bentler}{Bentler}{1977}]{bentler1977}
Bentler, P.M. (1977).
\newblock Factor simplicity index and transformations.
\newblock \textit{Psychometrika}, \textbf{42}, 277--295.
\newblock \href{https://doi.org/10.1007/BF02294054}
  {doi: 10.1007/BF02294054}

\bibitem[\protect\citeauthoryear{Griffith \& Mulaik}{Griffith \& Mulaik}{1998}]{griffith1998}
Griffith, D. and Mulaik, S.A. (1998).
\newblock Personality trait factors from the interpersonal domain.
\newblock \textit{Poster presented at the Society for Multivariate
  Experimental Psychology (SMEP)}.

\bibitem[\protect\citeauthoryear{Hu \& Bentler}{Hu \& Bentler}{1999}]{hu1999}
Hu, L. and Bentler, P.M. (1999).
\newblock Cutoff criteria for fit indexes in covariance structure analysis.
\newblock \textit{Structural Equation Modeling}, \textbf{6}(1), 1--55.
\newblock \href{https://doi.org/10.1080/10705519909540118}
  {doi: 10.1080/10705519909540118}

\bibitem[\protect\citeauthoryear{Liu et al.}{Liu et al.}{2023}]{liu2023}
Liu, X., Wallin, G., Chen, Y., and Moustaki, I. (2023).
\newblock Rotation to sparse loadings using $L^p$ losses and related
  inference problems.
\newblock \textit{Psychometrika}, \textbf{88}(2), 527--553.
\newblock \href{https://doi.org/10.1007/s11336-023-09911-y}
  {doi: 10.1007/s11336-023-09911-y}

\bibitem[\protect\citeauthoryear{Lorenzo-Seva}{Lorenzo-Seva}{2003}]{lorenzoseva2003}
Lorenzo-Seva, U. (2003).
\newblock A factor simplicity index.
\newblock \textit{Psychometrika}, \textbf{68}(1), 49--60.
\newblock \href{https://doi.org/10.1007/BF02296652}
  {doi: 10.1007/BF02296652}

\bibitem[\protect\citeauthoryear{Mulaik}{Mulaik}{2018}]{mulaik2018}
Mulaik, S.A. (2018).
\newblock Fundamentals of common factor analysis.
\newblock In P. Irwing, T. Booth, and D.J. Hughes (Eds.),
  \textit{The Wiley Handbook of Psychometric Testing} (pp. 211--252).
\newblock Wiley.

\end{thebibliography}

\end{document}