Introduction to Factor Analysis

Factor analysis is used to describe the covariance relationships among many variables in terms of a few underlying, but unobservable, random quantities called factors. If the variables (not the observations) can be grouped by their correlations, then all variables within a particular group are highly correlated among themselves. Each such group of variables then represents a single underlying hidden factor that is responsible for the observed correlations.

For example, correlations from a group of test scores in Mathematics, Statistics, Chemistry and Physics might correspond to a factor named “intelligence”. A second group of variables representing physical fitness scores might correspond to another factor named “fitness”.

We can use factor analysis to identify the hidden factors behind observed variables which cannot be measured directly. The variables that cannot be measured directly are called latent variables or common factors. However, the latent variables can be examined indirectly by measuring observed variables which are called indicators or manifest variables.

After extracting factors using factor analysis, meaningful names should be given to these factors. A well-defined factor should have at least three high-loading variables.

The theory behind factor analysis is built on a model with some underlying assumptions.
For example, suppose we have p variables, \(X_{1}\), \(X_{2}\), …. , \(X_{p}\), and assume that m latent (common) factors, \(F_{1}\), \(F_{2}\), …, \(F_{m}\), are hidden behind these p variables. Then the first variable \({X}_{1}\) can be explained by these factors as follows:

\[X_{1}-\mu{_{1}}={l}_{11}F_{1}+{l}_{12}F_{2}+...+{l}_{1m}F_{m}+\epsilon{_{1}}\]
The second variable \({X}_{2}\) can be written as

\[X_{2}-\mu{_{2}}={l}_{21}F_{1}+{l}_{22}F_{2}+...+{l}_{2m}F_{m}+\epsilon{_{2}} \] The other variables can be written in the same way, and finally the \(p^{th}\) variable is written as

\[X_{p}-\mu{_{p}}={l}_{p1}F_{1}+{l}_{p2}F_{2}+...+{l}_{pm}F_{m}+\epsilon{_{p}}\]
Here \(\mu_{i}\) is the mean of the variable \(X_{i}\), and \(l_{ij}\), \(i=1,2,...,p\), \(j=1,2,...,m\), indicates the correlation of each variable with the underlying factor, which is usually called the factor loading. For example, \(l_{ij}\) is the correlation between the variable \(X_{i}\) and the factor \(F_{j}\).
The symbol \(\epsilon_{i}\) represents the error term, which is also known as the \(i^{th}\) specific factor and cannot be explained by the common factors. The above equations can be written in matrix notation as follows:
\[\mathbf{X}-\mathbf{\mu}=\mathbf{L}\mathbf{F}+\mathbf{\epsilon}\]
where \(\mathbf{\mu}=\left( \mu_{1}, \mu_{2},...,\mu_{p}\right)^{'}\), \(\mathbf{F}=\left(F_{1},F_{2},...,F_{m}\right)^{'}\), \(\mathbf{\epsilon}=\left(\epsilon_{1},\epsilon_{2},...,\epsilon_{p}\right)^{'}\) and

\(\mathbf{L}=\begin{pmatrix} l_{11} & l_{12} & \cdots & l_{1m} \\ l_{21} & l_{22} & \cdots & l_{2m}\\ \vdots & \vdots & \ddots & \vdots\\ l_{p1} & l_{p2} & \cdots & l_{pm} \end{pmatrix}\).

Assumptions on \(\mathbf{F}\) and \(\mathbf{\epsilon}\)
(a) Common factors have zero mean and unit variance, and they are uncorrelated.
i.e. \(E\left(\mathbf{F}\right )=\mathbf{0}\) and \(Cov\left(\mathbf{F}\right )=E\left(FF^{'}\right )=\mathbf{I}\)
(b) \(E\left(\mathbf{\epsilon }\right )=\mathbf{0}\) and
\[Cov\left(\mathbf{\epsilon}\right )=E\left(\mathbf{\epsilon}\mathbf{\epsilon}^{'}\right )=\mathbf{\Psi}=\begin{pmatrix} \psi_{1} & 0 & \cdots & 0 \\ 0 & \psi_{2} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \psi_{p} \end{pmatrix}\]
(c) Specific factors \(\mathbf{\epsilon}\) and common factors \(\mathbf{F}\) are uncorrelated, i.e. \(Cov\left(\mathbf{\epsilon, F}\right )=E\left(\mathbf{\epsilon}\mathbf{F}^{'}\right )=\mathbf{0}\).

The model \(\mathbf{X}-\mathbf{\mu}=\mathbf{L}\mathbf{F}+\mathbf{\epsilon}\) with the above assumptions is called the orthogonal factor model. From the properties of this model, it can be shown that the covariance matrix \(\mathbf{\Sigma}\) is \(\mathbf{\Sigma}=\mathbf{L}\mathbf{L}'+\mathbf{\Psi}\), and
\[Var(X_{i})=\sigma_{ii}=l_{i1}^{2}+l_{i2}^{2}+...+l_{im}^{2}+\psi_{i}=h_{i}^{2}+\psi_{i}\] for each \(i=1,2,...,p\). Here \(h_{i}^{2}\) is called the \(i^{th}\) communality and \(\psi_{i}\) is called the uniqueness or \(i^{th}\) specific variance. This result shows that a portion of the variance of the \(i^{th}\) variable is explained by the m common factors.
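This decomposition is easy to check numerically. The following R sketch uses a hypothetical \(p=4\), \(m=2\) loading matrix and hypothetical specific variances (both invented for illustration) to verify that each diagonal element of \(\mathbf{\Sigma}=\mathbf{L}\mathbf{L}'+\mathbf{\Psi}\) equals \(h_{i}^{2}+\psi_{i}\):

```r
# A minimal sketch with a hypothetical 4 x 2 loading matrix L and
# hypothetical specific variances Psi (values invented for illustration)
L <- matrix(c(0.9, 0.1,
              0.8, 0.2,
              0.1, 0.7,
              0.2, 0.8), nrow = 4, byrow = TRUE)
Psi <- diag(c(0.18, 0.32, 0.50, 0.32))   # specific variances psi_i

Sigma <- L %*% t(L) + Psi                # implied covariance matrix
h2    <- rowSums(L^2)                    # communalities h_i^2

# each diagonal element of Sigma equals h_i^2 + psi_i
all.equal(diag(Sigma), h2 + diag(Psi))
```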
Since the latent factors, \(F_{1}\), \(F_{2}\), …, \(F_{m}\), and the variances of the specific factors, \(\psi_{1}\), \(\psi_{2}\), …, \(\psi_{p}\), cannot be measured directly, we have to estimate them.
Several methods of estimation used in factor analysis are available in the literature. Two such methods are the principal component method and the maximum likelihood method. The principal component method is a non-parametric method and uses principal component analysis to obtain the factors. To apply the maximum likelihood method, the data should have a multivariate normal distribution; hence it is a parametric method. First we consider the principal component method.

Principal component (PC) method

The principal component method is based on the eigen analysis of the covariance or correlation matrix of the data. If the variances of the variables differ greatly, we use the correlation matrix for the eigen analysis. When the correlation matrix is used, the standardized variables are automatically used in the analysis.
In this method, we select the number of principal components that describe most of the covariance or correlation structure of the data. The variance of each principal component is equal to the corresponding eigenvalue of the covariance or correlation matrix. Jolliffe's cutoff value and the scree plot can be used to select the number of principal components (PCs).
According to Jolliffe's method, the cutoff value for an eigenvalue is 0.7 if the correlation matrix \(\mathbf{R}\) is used, and \(0.7\bar{\lambda}\) if the covariance matrix \(\mathbf{S}\) is used for the analysis, where \(\bar{\lambda}\) is the average of the eigenvalues.
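As a rough sketch (assuming a numeric data frame named dat, a hypothetical name), the number of PCs suggested by Jolliffe's cutoff can be read directly from the eigenvalues, and the scree plot can be drawn with base R:

```r
# A minimal sketch, assuming a numeric data frame `dat` (hypothetical name)
R  <- cor(dat)                      # correlation matrix (standardized variables)
ev <- eigen(R)$values               # eigenvalues of R
m  <- sum(ev > 0.7)                 # Jolliffe's cutoff for the correlation matrix

S   <- cov(dat)                     # covariance matrix alternative
evS <- eigen(S)$values
mS  <- sum(evS > 0.7 * mean(evS))   # cutoff 0.7 times the average eigenvalue

screeplot(prcomp(dat, scale. = TRUE), type = "lines")   # scree plot
```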
The selected eigenvalue-eigenvector pairs are then used to estimate the latent factor loadings and the specific variances. For example, if we have selected two PCs based on the cutoff value, the matrix \(\mathbf{L}\) of factor loadings is estimated as follows:
\[\mathbf{\hat{L}}=\left(\sqrt{\lambda_{1}}\mathbf{e_{1}}, \sqrt{\lambda_{2}}\mathbf{e_{2}}\right)\]
Here \(\lambda_{1}\) and \(\lambda_{2}\) are the largest and second largest eigenvalues of the covariance or correlation matrix, and \(\mathbf{e_{1}}\) and \(\mathbf{e_{2}}\) are the corresponding eigenvectors. The elements of this estimated loading matrix are the estimated \(l_{ij}\) values. The variance of each specific factor can then be estimated as follows:
(i) \(\hat{\psi}_{i}=S_{ii}-\hat{h}_{i}^{2}\) if the sample covariance matrix \(\mathbf{S}\) is used, where \(S_{ii}\) is the \(i^{th}\) diagonal element of \(\mathbf{S}\).
(ii) \(\hat{\psi}_{i}=1-\hat{h}_{i}^{2}\) if the sample correlation matrix \(\mathbf{R}\) is used.
Then, the proportion of total sample variance due to the \(j^{th}\) factor is \(\frac{\hat{\lambda}_{j}}{tr(\mathbf{S})}\) if the sample covariance matrix \(\mathbf{S}\) is used for the analysis.
Similarly, the proportion of total sample variance due to the \(j^{th}\) factor is \(\frac{\hat{\lambda}_{j}}{p}\) if the sample correlation matrix \(\mathbf{R}\) is used for the analysis.
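The whole principal component estimation step can be sketched in a few lines of R. This is only an illustration, assuming the correlation matrix of a hypothetical data frame dat is used and two PCs are retained:

```r
# A minimal sketch of the PC method for a hypothetical data frame `dat`,
# keeping the first two principal components of the correlation matrix
eig    <- eigen(cor(dat))
lambda <- eig$values
e      <- eig$vectors

L_hat   <- cbind(sqrt(lambda[1]) * e[, 1],
                 sqrt(lambda[2]) * e[, 2])   # estimated factor loadings
h2      <- rowSums(L_hat^2)                  # communalities
psi_hat <- 1 - h2                            # specific variances (correlation matrix case)

prop_var <- lambda[1:2] / ncol(dat)          # proportion of total variance per factor
```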

Maximum Likelihood (ML) method

Suppose the \(\mathbf{X}\) variables have a multivariate normal distribution with mean vector \(\mathbf{\mu}\) and covariance matrix \(\mathbf{\Sigma}\). i.e. \(\mathbf{X}\sim N\left(\mathbf{\mu},\mathbf{\Sigma}\right)\).
Then, to estimate the matrix \(\mathbf{L}\) of factor loadings and the matrix \(\mathbf{\Psi}\) of specific variances, we maximize the following log-likelihood function, derived from the above multivariate normal distribution, using an iterative algorithm:
\[\log L\left(\mathbf{\mu},\mathbf{L},\mathbf{\Psi}\right)=-\frac{np}{2}\log(2\pi)-\frac{n}{2}\log\left|\mathbf{\Sigma}\right|-\frac{1}{2}\sum_{j=1}^{n}\left(\mathbf{x}_{j}-\mathbf{\mu}\right)'\mathbf{\Sigma}^{-1}\left(\mathbf{x}_{j}-\mathbf{\mu}\right)\] where \(\mathbf{\Sigma}=\mathbf{L}\mathbf{L}'+\mathbf{\Psi}\) and \(\mathbf{x}_{1},\mathbf{x}_{2},...,\mathbf{x}_{n}\) are the observed data vectors.
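In R, maximum likelihood factor analysis is provided by the built-in factanal() function; the sketch below assumes a hypothetical numeric data frame dat and m = 2 factors:

```r
# A minimal sketch, assuming a numeric data frame `dat` (hypothetical name)
fit <- factanal(dat, factors = 2, rotation = "none")
fit$loadings       # estimated loading matrix L
fit$uniquenesses   # estimated specific variances psi_i
```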

In addition to the ML and PC methods, other methods, namely the least squares method, alpha factoring and image factoring, are available to estimate \(\mathbf{L}\) and \(\mathbf{\Psi}\).

Factor rotation

If the original loadings in \(\mathbf{L}\) are not readily interpretable, it is usual practice to rotate them until a simpler structure is achieved. The covariance (or correlation) matrix remains unchanged if we apply an orthogonal transformation to rotate the factor loading matrix \(\mathbf{L}\).
Many popular orthogonal factor rotation methods can be obtained by maximizing the following function:
\[V\left(\mathbf{L},\mathbf{R}\mid\gamma\right)=\frac{1}{p}\sum_{j=1}^{m}\left[\sum_{i=1}^{p}\left(\tilde{l}_{ij}/\tilde{h}_{i}\right)^{4}-\frac{\gamma}{p}\left(\sum_{i=1}^{p}\left ( \tilde{l}_{ij}/\tilde{h}_{i} \right)^{2}\right)^{2}\right ]\] where \(\tilde{l}_{ij}\) is the rotated loading of the \(i^{th}\) variable on the \(j^{th}\) factor, and \(\tilde{h}_{i}\) is the square root of the communality for variable \(X_{i}\).
If we use
\(\gamma=1\) we have the varimax rotation,
\(\gamma=0\) we have the quartimax rotation,
\(\gamma=m/2\) we have the equamax rotation, and
\(\gamma=p(m-1)/(p+m-2)\) we have the parsimax rotation.
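For example, varimax rotation is available in base R and quartimax rotation in the GPArotation package; the sketch below assumes L_hat is an unrotated loading matrix, such as the one estimated with the PC method above:

```r
# A minimal sketch; `L_hat` is assumed to hold unrotated factor loadings
library(GPArotation)

varimax(L_hat)     # orthogonal rotation with gamma = 1
quartimax(L_hat)   # orthogonal rotation with gamma = 0
```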

Oblique Factor Model

In some practical situations the factors may be correlated, so the factor model is no longer orthogonal. Such a model is called an oblique factor model, and the first assumption of the orthogonal factor model becomes: common factors have zero mean, unit variance, and they may be correlated, i.e.
\[E\left(\mathbf{F}\right )=\mathbf{0}\] and \[Cov\left(\mathbf{F}\right )=\mathbf{\phi}.\]

According to the correlation structure of the oblique factor model, the covariance matrix \(\mathbf{\Sigma}\) now becomes \(\mathbf{\Sigma}=\mathbf{L}\mathbf{\phi }\mathbf{L}'+\mathbf{\psi}\).
For oblique factor models, \(\mathbf{L}\) is called the pattern matrix, and \(\mathbf{L}\mathbf{\phi}\) is called the structure matrix.
The structure matrix \(\mathbf{L}\mathbf{\phi}\) gives the covariances between the observed variables \(\mathbf{X}\) and the latent factors \(\mathbf{F}\). For the orthogonal factor model the factors are uncorrelated, so \(\mathbf{\phi}=\mathbf{I}\) and \(\mathbf{L}\mathbf{\phi}=\mathbf{L}\); therefore, for the orthogonal factor model the pattern and structure matrices are identical.
To estimate the parameters, the oblique factor model is transformed to an orthogonal factor model, and oblique factor rotation methods are then used to rotate the resulting solution. Popular oblique rotation methods include oblimin, promax and quartimin. The R package GPArotation provides many other options for oblique rotation.
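As a sketch (again assuming L_hat holds unrotated loadings), an oblimin rotation with GPArotation returns both the pattern matrix and the factor correlation matrix \(\mathbf{\phi}\), from which the structure matrix can be formed:

```r
# A minimal sketch; `L_hat` is assumed to hold unrotated factor loadings
library(GPArotation)

ob <- oblimin(L_hat)
ob$loadings               # pattern matrix (rotated loadings)
ob$Phi                    # factor correlation matrix phi
ob$loadings %*% ob$Phi    # structure matrix L phi
```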

An example for factor analysis is given here.

Further details:
Applied Multivariate Statistical Analysis by R. A. Johnson & D. W. Wichern