Factor Analysis - Example
To understand the theory and details behind factor analysis, read the Introduction to Factor Analysis.
This post works through an example of factor analysis.
Example
Suppose a survey was conducted among customers purchasing a car. The questionnaire included 9 variables, and 75 customers participated in the study. The survey questions were framed on a 5-point Likert scale, with 1 being very low and 5 being very high. The variables included in the questionnaire were price, safety, exterior design, space and comfort, technology, resale value, fuel type, color, and maintenance.
Now we import the data set. Suppose your data set is a .csv file named data.csv saved in your working directory. We import the data using the following code:
data=read.csv("data.csv")
head(data)
## Price Safety Exterior_Design Space_comfort Technology Resale_Value
## 1 4 4 5 4 3 5
## 2 3 5 3 3 4 3
## 3 4 4 3 4 5 5
## 4 4 4 4 3 3 5
## 5 5 5 4 4 5 5
## 6 4 4 5 3 4 3
## Fuel_Type Color Maintenance
## 1 4 2 4
## 2 4 4 3
## 3 4 4 5
## 4 5 4 4
## 5 3 5 5
## 6 4 2 3
Summary statistics
First, we find summary statistics of the variables. We could simply use the summary() function to get them. However, here we use the stat.desc() function in the pastecs package, since it gives more descriptive statistics. We apply the round function to round the resulting values to two decimal places.
suppressMessages(library(pastecs))
sumstat<- stat.desc(data)
round(sumstat, 2)
## Price Safety Exterior_Design Space_comfort Technology
## nbr.val 75.00 75.00 75.00 75.00 75.00
## nbr.null 0.00 0.00 0.00 0.00 0.00
## nbr.na 0.00 0.00 0.00 0.00 0.00
## min 3.00 3.00 1.00 2.00 1.00
## max 5.00 5.00 5.00 5.00 5.00
## range 2.00 2.00 4.00 3.00 4.00
## sum 314.00 316.00 283.00 297.00 310.00
## median 4.00 4.00 4.00 4.00 4.00
## mean 4.19 4.21 3.77 3.96 4.13
## SE.mean 0.07 0.07 0.10 0.08 0.11
## CI.mean.0.95 0.14 0.15 0.19 0.15 0.21
## var 0.40 0.41 0.69 0.44 0.85
## std.dev 0.63 0.64 0.83 0.67 0.92
## coef.var 0.15 0.15 0.22 0.17 0.22
## Resale_Value Fuel_Type Color Maintenance
## nbr.val 75.00 75.00 75.00 75.00
## nbr.null 0.00 0.00 0.00 0.00
## nbr.na 0.00 0.00 0.00 0.00
## min 1.00 3.00 1.00 2.00
## max 5.00 5.00 5.00 5.00
## range 4.00 2.00 4.00 3.00
## sum 290.00 308.00 279.00 300.00
## median 4.00 4.00 4.00 4.00
## mean 3.87 4.11 3.72 4.00
## SE.mean 0.13 0.08 0.11 0.09
## CI.mean.0.95 0.27 0.16 0.22 0.19
## var 1.33 0.47 0.93 0.65
## std.dev 1.15 0.69 0.97 0.81
## coef.var 0.30 0.17 0.26 0.20
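Several of the rows reported by stat.desc() can be reproduced with base R, which makes their meaning explicit. The sketch below does this for a small made-up vector of Likert responses; the vector x is hypothetical and is not taken from the survey data.

```r
# Hypothetical Likert-scale responses (illustrative only, not the survey data)
x <- c(4, 3, 4, 4, 5, 4, 5, 3, 4, 4)

n       <- length(x)
se_mean <- sd(x) / sqrt(n)                  # SE.mean: standard error of the mean
ci_half <- qt(0.975, df = n - 1) * se_mean  # CI.mean.0.95: half-width of the 95% CI
cv      <- sd(x) / mean(x)                  # coef.var: coefficient of variation

round(c(mean = mean(x), std.dev = sd(x), SE.mean = se_mean,
        CI.mean.0.95 = ci_half, coef.var = cv), 2)
```

Note that CI.mean.0.95 is the half-width of the interval, so the 95% confidence interval for the mean is the mean plus or minus this value.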
Then, we obtain the covariance and correlation matrices of the data set. We can use either the covariance matrix or the correlation matrix to identify factors. Using the correlation matrix is equivalent to working with standardized variables.
S=cov(data)
S
## Price Safety Exterior_Design Space_comfort
## Price 0.39711712 -0.04036036 0.015855856 0.06162162
## Safety -0.04036036 0.41333333 -0.045585586 0.14378378
## Exterior_Design 0.01585586 -0.04558559 0.691171171 0.07189189
## Space_comfort 0.06162162 0.14378378 0.071891892 0.44432432
## Technology 0.01531532 0.05225225 0.030630631 0.16756757
## Resale_Value 0.29549550 -0.18738739 -0.003603604 -0.14054054
## Fuel_Type 0.06090090 0.08504505 0.038018018 0.22054054
## Color 0.05297297 -0.02054054 -0.280540541 -0.07891892
## Maintenance 0.16216216 -0.12162162 -0.067567568 0.05405405
## Technology Resale_Value Fuel_Type Color
## Price 0.01531532 0.295495495 0.06090090 0.05297297
## Safety 0.05225225 -0.187387387 0.08504505 -0.02054054
## Exterior_Design 0.03063063 -0.003603604 0.03801802 -0.28054054
## Space_comfort 0.16756757 -0.140540541 0.22054054 -0.07891892
## Technology 0.84684685 -0.076576577 0.13423423 0.06486486
## Resale_Value -0.07657658 1.333333333 -0.02612613 0.19189189
## Fuel_Type 0.13423423 -0.026126126 0.47495495 -0.01027027
## Color 0.06486486 0.191891892 -0.01027027 0.93405405
## Maintenance 0.14864865 0.445945946 0.02702703 0.27027027
## Maintenance
## Price 0.16216216
## Safety -0.12162162
## Exterior_Design -0.06756757
## Space_comfort 0.05405405
## Technology 0.14864865
## Resale_Value 0.44594595
## Fuel_Type 0.02702703
## Color 0.27027027
## Maintenance 0.64864865
#Total variance
sum(diag(S))
## [1] 6.183784
cor.data=cor(data)
cor.data
## Price Safety Exterior_Design Space_comfort
## Price 1.00000000 -0.09961977 0.030264784 0.1466979
## Safety -0.09961977 1.00000000 -0.085287329 0.3355132
## Exterior_Design 0.03026478 -0.08528733 1.000000000 0.1297290
## Space_comfort 0.14669786 0.33551323 0.129728997 1.0000000
## Technology 0.02640974 0.08831864 0.040036923 0.2731728
## Resale_Value 0.40608990 -0.25241826 -0.003753832 -0.1825922
## Fuel_Type 0.14022912 0.19194314 0.066354511 0.4800784
## Color 0.08697792 -0.03305793 -0.349153835 -0.1225025
## Maintenance 0.31951074 -0.23488529 -0.100911513 0.1006870
## Technology Resale_Value Fuel_Type Color
## Price 0.02640974 0.406089905 0.14022912 0.08697792
## Safety 0.08831864 -0.252418262 0.19194314 -0.03305793
## Exterior_Design 0.04003692 -0.003753832 0.06635451 -0.34915384
## Space_comfort 0.27317282 -0.182592194 0.48007841 -0.12250254
## Technology 1.00000000 -0.072064959 0.21165798 0.07293250
## Resale_Value -0.07206496 1.000000000 -0.03283065 0.17194963
## Fuel_Type 0.21165798 -0.032830647 1.00000000 -0.01541948
## Color 0.07293250 0.171949632 -0.01541948 1.00000000
## Maintenance 0.20056436 0.479521510 0.04869309 0.34722222
## Maintenance
## Price 0.31951074
## Safety -0.23488529
## Exterior_Design -0.10091151
## Space_comfort 0.10068702
## Technology 0.20056436
## Resale_Value 0.47952151
## Fuel_Type 0.04869309
## Color 0.34722222
## Maintenance 1.00000000
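The statement that the correlation matrix corresponds to standardized variables can be checked numerically. The following is a minimal sketch on synthetic stand-in data; the data frame X and its columns are assumptions made for illustration, not the survey data.

```r
set.seed(1)
# Synthetic stand-in data (assumption): 75 rows, 3 numeric variables
X <- data.frame(a = rnorm(75, 4, 0.6),
                b = rnorm(75, 4, 0.9),
                c = rnorm(75, 3.8, 1.1))

S <- cov(X)
R <- cor(X)

# The correlation matrix is the covariance matrix of the standardized variables
all.equal(R, cov(scale(X)))

# Equivalently, R is S rescaled by the standard deviations
all.equal(R, cov2cor(S))

# Total variance: trace of S for raw variables, number of variables for standardized ones
c(total_var_raw = sum(diag(S)), total_var_std = sum(diag(R)))
```

For the survey data this means the total variance is 6.18 on the original scale, as computed with sum(diag(S)), and 9 (the number of variables) when the correlation matrix is used.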
To visualize these correlations, we now obtain the correlation plot using the corrplot package.
suppressMessages(library(corrplot))
corrplot(cor.data, order = "AOE", method = "circle", bg = "grey", tl.cex=0.8, col = colorRampPalette(c("yellow","green","navyblue"))(100))
In this plot, the color scale set by the col argument runs from yellow (negative correlations) through green to navy blue (positive correlations). The size of the circles is proportional to the magnitude of the correlation coefficients. The correlation plot shows that several pairs of variables are moderately correlated.
Correlation tests
Now, we apply Bartlett’s test of sphericity to check whether the above correlations are significant. The null hypothesis of the test is that there is no correlation between the variables, i.e. the correlation matrix is an identity matrix. The test statistic approximately follows a chi-square distribution.
suppressMessages(library(REdaS))
bart_spher(data)
## Bartlett's Test of Sphericity
##
## Call: bart_spher(x = data)
##
## X2 = 110.805
## df = 36
## p-value < 2.22e-16
According to the results of Bartlett’s test of sphericity, there is significant (p < 0.001) correlation among the variables. This indicates that factor analysis can be performed and the variables can be grouped.
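The test statistic can also be computed by hand from the determinant of the correlation matrix, which shows where the 36 degrees of freedom come from (p(p - 1)/2 with p = 9). A minimal sketch follows; the helper name bartlett_sphericity and the synthetic data are assumptions for illustration and are not part of the REdaS package.

```r
# Bartlett's test of sphericity computed from the correlation matrix
bartlett_sphericity <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X)
  p <- ncol(X)
  R <- cor(X)
  chi2 <- -((n - 1) - (2 * p + 5) / 6) * log(det(R))
  df <- p * (p - 1) / 2
  list(X2 = chi2, df = df,
       p.value = pchisq(chi2, df, lower.tail = FALSE))
}

set.seed(2)
# Synthetic stand-in data (assumption): 9 variables sharing one common component
shared <- rnorm(75)
X <- sapply(1:9, function(j) 0.6 * shared + rnorm(75))

res <- bartlett_sphericity(X)
res
```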
To test the assumptions for factor analysis further, we use the Kaiser-Meyer-Olkin (KMO) test, which gives a Measure of Sampling Adequacy (MSA). The overall MSA value from the KMO test has to be greater than 0.5 to do factor analysis.
library(psych)
data.KMO <- KMO(cor.data)
data.KMO
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = cor.data)
## Overall MSA = 0.58
## MSA for each item =
## Price Safety Exterior_Design Space_comfort
## 0.67 0.60 0.53 0.52
## Technology Resale_Value Fuel_Type Color
## 0.65 0.58 0.64 0.58
## Maintenance
## 0.56
Here the overall MSA = 0.58, which is greater than 0.5, so we can perform factor analysis.
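The overall MSA compares the squared correlations with the squared partial correlations: it is close to 1 when the partial correlations are small relative to the ordinary ones. A minimal base R sketch of that calculation is given below; the helper kmo_overall and the synthetic data are hypothetical and are not part of the psych package.

```r
# Overall KMO measure of sampling adequacy from a correlation matrix
kmo_overall <- function(R) {
  Rinv <- solve(R)
  # Partial correlation of each pair, controlling for all other variables
  Q <- -Rinv / sqrt(outer(diag(Rinv), diag(Rinv)))
  off <- row(R) != col(R)        # off-diagonal entries only
  r2 <- sum(R[off]^2)
  q2 <- sum(Q[off]^2)
  r2 / (r2 + q2)
}

set.seed(2)
# Synthetic stand-in data (assumption): 9 variables sharing one common component
shared <- rnorm(75)
X <- sapply(1:9, function(j) 0.6 * shared + rnorm(75))

msa <- kmo_overall(cor(X))
msa
```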
Extracting factors
We use the correlation matrix for eigen analysis.
R.eigen <- eigen(cor(data))
R.eigen
## eigen() decomposition
## $values
## [1] 2.0991580 1.8826227 1.3426882 0.9493682 0.6852855 0.6128772 0.5617019
## [8] 0.5521248 0.3141736
##
## $vectors
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.35208657 -0.29959115 0.26223876 -0.40688984 0.25290772
## [2,] -0.35104767 -0.23797874 -0.32525215 -0.38787589 0.58564776
## [3,] -0.15889801 -0.07501709 0.67225367 0.24010197 0.09309004
## [4,] -0.22084213 -0.57248368 0.03323658 -0.08809602 -0.02650783
## [5,] -0.02342334 -0.40184482 -0.13248698 0.71305186 0.31788215
## [6,] 0.52939758 -0.05999686 0.22690195 -0.19187880 0.16172774
## [7,] -0.12536500 -0.52480053 0.01805726 -0.16813505 -0.65848265
## [8,] 0.35617344 -0.05608604 -0.54939200 0.05411791 -0.14736644
## [9,] 0.50536562 -0.27677794 -0.03488957 0.20524260 0.03412889
## [,6] [,7] [,8] [,9]
## [1,] 0.431400452 -0.50119993 0.1636832 0.15456214
## [2,] -0.342595309 0.05087181 -0.2016247 0.24555996
## [3,] -0.583047558 -0.30690187 0.1010309 0.08089313
## [4,] 0.004657572 0.26643126 0.5055957 -0.53599311
## [5,] 0.268603616 -0.16486300 -0.3227215 -0.08631563
## [6,] -0.220083482 0.26172032 -0.5122925 -0.47149003
## [7,] -0.054034721 -0.07776970 -0.4310997 0.22772426
## [8,] -0.450906799 -0.51300094 0.1858511 -0.20613131
## [9,] -0.172437302 0.46490956 0.2823838 0.54578095
fit <- princomp(data, cor=TRUE)
plot(fit,type="lines") # scree plot
Here we check the number of eigenvalues greater than or equal to 0.7, which is Jolliffe’s method for identifying the number of principal components. We have four eigenvalues greater than 0.7. In the graph, which is called a scree plot, the curve flattens after the fourth eigenvalue. That implies we can have four hidden common factors underlying the data. Now we obtain the factor loadings, the proportion of variance and the biplot.
fit <- princomp(data, cor=TRUE)
summary(fit) # print variance accounted for
## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Standard deviation 1.4488471 1.3720870 1.1587442 0.9743553 0.82781974
## Proportion of Variance 0.2332398 0.2091803 0.1491876 0.1054854 0.07614284
## Cumulative Proportion 0.2332398 0.4424201 0.5916076 0.6970930 0.77323584
## Comp.6 Comp.7 Comp.8 Comp.9
## Standard deviation 0.78286472 0.74946772 0.7430510 0.56051192
## Proportion of Variance 0.06809746 0.06241132 0.0613472 0.03490818
## Cumulative Proportion 0.84133330 0.90374462 0.9650918 1.00000000
loadings(fit) # pc loadings
##
## Loadings:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
## Price 0.352 0.300 0.262 0.407 0.253 0.431 0.501 0.164
## Safety -0.351 0.238 -0.325 0.388 0.586 -0.343 -0.202
## Exterior_Design -0.159 0.672 -0.240 -0.583 0.307 0.101
## Space_comfort -0.221 0.572 -0.266 0.506
## Technology 0.402 -0.132 -0.713 0.318 0.269 0.165 -0.323
## Resale_Value 0.529 0.227 0.192 0.162 -0.220 -0.262 -0.512
## Fuel_Type -0.125 0.525 0.168 -0.658 -0.431
## Color 0.356 -0.549 -0.147 -0.451 0.513 0.186
## Maintenance 0.505 0.277 -0.205 -0.172 -0.465 0.282
## Comp.9
## Price 0.155
## Safety 0.246
## Exterior_Design
## Space_comfort -0.536
## Technology
## Resale_Value -0.471
## Fuel_Type 0.228
## Color -0.206
## Maintenance 0.546
##
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
## SS loadings 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
## Proportion Var 0.111 0.111 0.111 0.111 0.111 0.111 0.111 0.111
## Cumulative Var 0.111 0.222 0.333 0.444 0.556 0.667 0.778 0.889
## Comp.9
## SS loadings 1.000
## Proportion Var 0.111
## Cumulative Var 1.000
fit$scores # the principal components
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## [1,] -0.3035647 -0.64402922 2.401017368 0.54219661 -0.002072474
## [2,] -1.5357999 -1.69042873 -1.847094738 -0.02285481 0.041620717
## [3,] 1.4037520 0.51633338 -0.705154680 -0.80574458 0.203640827
## [4,] 0.7812858 -0.71602472 0.418573635 0.83275816 -1.343683825
## [5,] 1.7782536 0.75023698 -0.580130769 -0.14107251 2.445867595
## [6,] -1.5504758 -1.51963095 1.853840625 -0.44895365 0.061053399
## [7,] -1.2940078 -1.63806818 -1.142645305 0.28289125 -1.263240227
## [8,] 1.7610426 0.23452307 0.618215901 -1.70386881 -0.600699841
## [9,] 1.0009405 -0.04857035 1.675391215 1.07947933 0.094947781
## [10,] 1.9251728 0.47675597 -1.301257466 -0.74955884 -0.871737542
## [11,] 1.5176545 -0.11765081 -0.268980126 -0.14477340 1.526919964
## [12,] 2.4655956 -0.57167906 -0.004264804 -1.89329350 -0.714171625
## [13,] 0.3281424 -0.29805680 2.357405401 0.28564336 0.040588637
## [14,] 0.2327117 0.97543083 -1.719850741 0.31725795 1.046566607
## [15,] 2.1202064 -1.06971520 0.712957619 -1.05685090 -0.908423998
## [16,] 1.5736536 0.59479523 -0.498629430 1.85021420 -1.245918065
## [17,] 0.6780655 -0.71590939 1.372030770 -0.44928822 1.237528021
## [18,] 0.4806337 0.41171598 -1.100806921 -1.14032643 -0.078366187
## [19,] 2.1210760 0.56139556 -1.418007843 0.26223468 1.411269652
## [20,] 1.4293769 0.07672020 -0.560215587 -0.02567480 -0.144118250
## [21,] 1.5906813 1.85241211 0.554221362 -0.20085440 -0.241505817
## [22,] 2.2094365 -1.77511726 -0.801088373 -1.59160206 0.092347571
## [23,] 1.4037520 0.51633338 -0.705154680 -0.80574458 0.203640827
## [24,] 1.3345222 0.64897391 -0.242602207 0.10083704 0.565013378
## [25,] 1.0538289 0.93418597 0.280219950 -0.07081300 -0.993298557
## [26,] 3.0700391 -1.64807614 1.492233587 -0.21948465 0.068976748
## [27,] -1.2376319 -0.47983122 -1.840509329 -0.14635618 0.044246908
## [28,] 0.4806337 0.41171598 -1.100806921 -1.14032643 -0.078366187
## [29,] 1.2934657 0.13078355 -1.257645499 -0.49300559 -0.914398653
## [30,] 2.5322769 -0.49310188 1.155717581 -0.51938110 0.417481330
## [31,] -0.3035647 -0.64402922 2.401017368 0.54219661 -0.002072474
## [32,] 0.1787338 0.23603967 -0.563364110 1.74483565 1.091660136
## [33,] -2.1498078 -0.89034086 -0.454504583 0.39023364 0.449602711
## [34,] -1.3117294 1.94066453 3.222550278 0.32367380 0.206383430
## [35,] 0.2012052 -1.49543259 0.283849069 0.22521592 1.998462073
## [36,] 1.5302949 0.50302445 -0.339101202 0.45706264 0.118911906
## [37,] 2.2173047 1.72309056 -0.369444440 0.68638891 -1.771819318
## [38,] -1.8583900 -1.09461904 2.048977094 0.46416800 -0.326740599
## [39,] -0.3968574 -4.44702999 0.108183041 1.68149221 0.086859885
## [40,] -1.9690735 -2.07581138 -0.939505109 -0.85370707 0.058928633
## [41,] -0.5891435 0.90855621 -0.383307167 0.62582492 -0.259896229
## [42,] -0.7389407 -2.85766686 -1.164653964 1.94170267 0.712042811
## [43,] -0.5516685 1.83160426 -0.905393120 0.70250397 -0.453437854
## [44,] -0.9483073 2.21279448 -0.478048884 -0.02119298 0.047827927
## [45,] -2.7677848 -0.85900317 -0.242329057 0.97133755 0.042086352
## [46,] 0.1436950 -0.23071987 0.099627878 -0.22716111 -0.215056590
## [47,] 0.8860344 3.23895257 0.336880063 0.19245161 0.594745782
## [48,] -0.1603567 -0.61004012 -0.269515255 -1.42013226 0.953605565
## [49,] -0.9495711 -0.62900099 -0.054586275 -0.13789879 -0.398721208
## [50,] -0.4924520 -0.24114189 1.251214057 -0.27600020 -1.983184529
## [51,] -1.2561896 -1.69519880 -0.544384060 -0.09112014 -0.458909368
## [52,] -0.7794232 -0.33533727 -0.296024363 -0.56174297 -0.497063603
## [53,] 1.7581964 1.11713564 -2.339594033 4.47572524 -1.247195639
## [54,] -1.1407230 2.30363553 0.336008733 -0.31194162 0.160554205
## [55,] 1.6406672 -1.86860973 1.267562158 1.24257934 -1.383573107
## [56,] -0.8462550 0.21614906 0.625150716 1.34316892 1.216889612
## [57,] -0.8590260 -0.63511525 0.715523175 0.08576496 -0.104210997
## [58,] 0.3268273 -0.99734460 0.073249959 -0.47277155 0.746849995
## [59,] -2.7964664 1.42674146 -0.237145925 -0.87270766 -0.427144076
## [60,] -0.1565597 1.49266334 1.016238706 -0.56963164 -0.453447663
## [61,] -1.8798760 2.25257679 -1.356913502 -0.43249756 -0.291741056
## [62,] 0.3351582 0.18947012 1.090851712 0.47924008 0.342483777
## [63,] -1.0595808 2.95295203 2.428187258 0.25065198 1.067769092
## [64,] -0.3887173 -1.48449642 -1.282347745 0.20827263 0.758617497
## [65,] -1.8501103 2.23470336 0.358960797 0.77447053 0.421703676
## [66,] -1.8501103 2.23470336 0.358960797 0.77447053 0.421703676
## [67,] -2.0873794 -0.29534941 -1.127778838 -0.96062135 0.462071152
## [68,] -1.1129623 0.52928782 -0.245826987 -0.42869109 -0.537098524
## [69,] -1.7821443 -0.73973265 0.319870933 -0.24881690 -0.386218011
## [70,] -0.1136409 -0.57337774 0.288515140 0.03849628 -0.744599600
## [71,] -1.1385871 0.96890100 -0.390766081 -1.20876087 -0.189339447
## [72,] -0.3434890 0.15658461 -0.243137336 -1.17452182 -0.008301020
## [73,] -0.5083638 0.34755386 -2.380565535 0.01435104 0.683313370
## [74,] -3.0149377 -0.06441640 -0.415456867 -1.60059553 -1.545402689
## [75,] 0.2893523 -1.92039265 0.595335912 -0.54945060 0.940391620
## Comp.6 Comp.7 Comp.8 Comp.9
## [1,] -0.6141886842 -1.08694448 -0.17272212 -0.06537187
## [2,] -0.4810407951 -0.02157190 -1.18377885 0.30187907
## [3,] 0.2306425558 -0.98189173 -0.38334346 -0.19735549
## [4,] -0.9335111884 0.12616621 -1.30122559 0.54940243
## [5,] -0.7134383005 0.53155158 0.50810910 -0.01533283
## [6,] 0.2719353454 0.53330859 -0.74906706 0.78963097
## [7,] -0.2313847683 -0.52466221 0.24860280 -0.79772871
## [8,] 0.0610786738 -0.53059228 0.05472322 -0.48392172
## [9,] -0.8643924552 0.78250128 0.47596043 -0.24789078
## [10,] -0.3250199011 0.06848217 -1.58309967 0.73009550
## [11,] 1.2143080960 0.28633362 0.15491677 -0.96531828
## [12,] -0.4156502630 0.40617622 -0.51528574 0.11087135
## [13,] -0.8297353110 -1.66808142 0.18025764 0.61685432
## [14,] 0.5129721176 1.31102027 0.29633188 0.35928638
## [15,] -0.2398046616 -0.30855781 -0.35582737 0.42001969
## [16,] -0.0009514879 0.68720428 -0.20487626 -0.32586613
## [17,] 0.2490817246 -1.97296795 0.33460943 0.09181162
## [18,] 0.6144053324 -0.52552619 0.50994850 0.62478856
## [19,] -0.0933716406 0.67591031 -1.00758419 1.02888171
## [20,] -0.0632064249 -1.16224979 -0.03029018 -0.10292727
## [21,] 0.1348598939 0.30404666 -0.62925579 0.48018032
## [22,] 0.5848644189 0.50206858 -0.36086043 -1.00196933
## [23,] 0.2306425558 -0.98189173 -0.38334346 -0.19735549
## [24,] 1.1353745758 0.39993902 -0.47483048 -0.63266036
## [25,] -0.8481744799 -0.67700520 -0.53769525 0.32768722
## [26,] -0.9412638458 1.10418770 1.20440298 0.31194776
## [27,] -0.6895530673 -1.00510136 -0.06719523 0.17459207
## [28,] 0.6144053324 -0.52552619 0.50994850 0.62478856
## [29,] -0.1094732743 0.64961911 -1.93607942 0.04786931
## [30,] 0.5353486064 -0.02386194 1.29901648 -0.47523010
## [31,] -0.6141886842 -1.08694448 -0.17272212 -0.06537187
## [32,] 0.5206015692 0.72105765 -0.79048146 -0.83593553
## [33,] 0.2735013354 -1.72089914 -1.25401077 -0.48926591
## [34,] 1.2760367137 -0.16756072 -1.03005397 -0.95166832
## [35,] -0.5485712379 -0.53472277 -0.90410380 0.38890131
## [36,] 0.8178603565 -0.13337322 0.67784853 0.55506607
## [37,] 1.7965849902 -0.94232204 0.55844151 -0.81534061
## [38,] -0.0148792807 -0.04944199 0.36759009 0.07454599
## [39,] 0.7221762757 0.60543138 -1.02753231 -0.04858881
## [40,] 0.8029737029 -0.76410276 0.77834784 -0.59902270
## [41,] -0.9772171175 0.05265425 -0.75975353 0.44105585
## [42,] -0.3006198431 0.30480040 0.41356769 0.40499890
## [43,] -1.4398773449 0.18463771 0.19744524 -0.58317746
## [44,] -0.6763337822 -0.16938021 -0.34920294 -0.46288555
## [45,] 0.0687527959 0.49000007 1.01641026 0.96853615
## [46,] -0.3618129445 0.01870912 0.18571822 -0.27612487
## [47,] -1.5703055211 0.72800904 0.13456066 -0.06157361
## [48,] 0.2028509447 0.31364454 0.90905816 -0.29213898
## [49,] 0.0456150707 0.82802883 0.27938444 -0.54727904
## [50,] 0.7809653970 0.25873574 -0.67487918 -0.79549583
## [51,] 1.4061912212 0.55258896 -0.35351053 0.79006973
## [52,] 0.0219498321 0.47507465 1.07901017 0.54601918
## [53,] 0.0784366953 -1.47997993 1.05740505 -0.07568704
## [54,] -1.3823683170 0.20225897 -0.22686076 -0.36492898
## [55,] 0.2691107854 0.94073959 0.70543815 0.58464327
## [56,] 0.4761430048 0.78650362 -0.41508821 -0.11218682
## [57,] 0.3234282644 0.06547008 -0.36085644 -0.74363094
## [58,] -0.2828794243 -0.09489628 0.81546546 -0.60878279
## [59,] -1.6641256949 0.21088489 -0.39468728 -0.88300447
## [60,] 0.0281347626 1.71318833 0.03339859 0.71805475
## [61,] 0.8594928727 -0.97198402 1.02777766 1.56932025
## [62,] 0.7970670307 0.28502694 0.25361605 0.18551657
## [63,] 0.5003541955 -1.18131211 -0.19317299 1.20837889
## [64,] -2.2720790330 -0.62123242 0.78489640 -0.59147320
## [65,] 0.8899742082 0.90625732 -0.18763889 -0.27239828
## [66,] 0.8899742082 0.90625732 -0.18763889 -0.27239828
## [67,] -0.8861919945 0.12803282 -0.65088608 -0.50410577
## [68,] 0.0289841867 0.07268213 1.84261404 -0.26349402
## [69,] 0.7071910410 0.52183562 0.53243551 0.07851311
## [70,] -1.6395096538 0.97254899 1.30135456 0.11261182
## [71,] 0.3228331674 0.25304018 1.48956077 -0.35792223
## [72,] 0.1239174245 0.42724994 0.27931092 0.04051894
## [73,] 0.9995158413 -0.39568870 -0.28109794 0.22912860
## [74,] 0.4248024967 0.24328804 -0.26521176 0.48089351
## [75,] 0.1797808031 -0.22687974 -0.14173331 0.41545053
biplot(fit,scale=0)
Note that the first four principal components together explain approximately 70% of the variance of the data. Loadings for all 9 components are shown, and many variables load on several components at once. The biplot, which plots the variables against the first two components, gives a visual impression of how the variables group.
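The counts and proportions quoted above follow directly from the eigenvalues. The sketch below re-enters the eigenvalues printed by eigen() earlier in this post; Kaiser's rule (eigenvalues greater than 1) is not used in this post and is included only as a comparison.

```r
# Eigenvalues of the correlation matrix, copied from the eigen() output above
ev <- c(2.0991580, 1.8826227, 1.3426882, 0.9493682, 0.6852855,
        0.6128772, 0.5617019, 0.5521248, 0.3141736)

n_jolliffe <- sum(ev >= 0.7)        # Jolliffe's rule: keep eigenvalues >= 0.7
n_kaiser   <- sum(ev > 1)           # Kaiser's rule, shown for comparison only
cum_prop   <- cumsum(ev) / sum(ev)  # cumulative proportion of variance

n_jolliffe
n_kaiser
round(cum_prop[1:4], 3)
sqrt(ev[1])  # matches the "Standard deviation" of Comp.1 in summary(fit)
```

The two rules disagree here (four components versus three), which is one reason to confirm the number of factors with a further method.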
Parallel Analysis
To confirm the number of factors further, we now perform Parallel Analysis. The fa.parallel function in the psych package can be used to execute parallel analysis. Here we apply minres as the factor method, and identify the acceptable number of factors from the resulting scree plot.
suppressMessages(library(psych))
fa.parallel(data, fm = 'minres', fa = 'fa')
## The estimated weights for the factor scores are probably incorrect. Try a different factor extraction method.
## The estimated weights for the factor scores are probably incorrect. Try a different factor extraction method.
## The estimated weights for the factor scores are probably incorrect. Try a different factor extraction method.
## The estimated weights for the factor scores are probably incorrect. Try a different factor extraction method.
## The estimated weights for the factor scores are probably incorrect. Try a different factor extraction method.
## The estimated weights for the factor scores are probably incorrect. Try a different factor extraction method.
## The estimated weights for the factor scores are probably incorrect. Try a different factor extraction method.
## The estimated weights for the factor scores are probably incorrect. Try a different factor extraction method.
## Parallel analysis suggests that the number of factors = 3 and the number of components = NA
The output suggests that the number of factors is 3, although 4 is also a close candidate. The blue line of the scree plot shows the eigenvalues of the actual data, and the two red lines show the simulated and resampled data. Here, we look for the steep drop in the actual data and identify the point where the curve levels off to the right. We also locate the point where the gap between the simulated data and the actual data is smallest. Therefore, the parallel analysis scree plot suggests that anywhere between 3 and 4 factors would be a good choice.
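The idea behind parallel analysis can be sketched in base R: eigenvalues of the observed correlation matrix are compared with eigenvalues obtained from uncorrelated random data of the same size. The version below is a simplified stand-in, not a reimplementation of fa.parallel: it uses principal-component eigenvalues rather than minres factor eigenvalues, and the helper name parallel_pc, the 95th-percentile threshold, the number of simulations and the synthetic data are all assumptions.

```r
# Horn-style parallel analysis on principal-component eigenvalues
parallel_pc <- function(X, n_sim = 200, prob = 0.95) {
  X <- as.matrix(X)
  n <- nrow(X)
  p <- ncol(X)
  observed <- eigen(cor(X), only.values = TRUE)$values
  # Eigenvalues from uncorrelated normal data with the same n and p
  sims <- replicate(n_sim, eigen(cor(matrix(rnorm(n * p), n, p)),
                                 only.values = TRUE)$values)
  threshold <- apply(sims, 1, quantile, probs = prob)
  exceeds <- observed > threshold
  # Keep leading components up to the first one that fails the threshold
  n_keep <- if (all(exceeds)) p else which(!exceeds)[1] - 1
  list(observed = observed, threshold = threshold, n_keep = n_keep)
}

set.seed(3)
# Synthetic stand-in data (assumption): two latent factors behind six variables
f1 <- rnorm(150)
f2 <- rnorm(150)
X <- cbind(sapply(1:3, function(j) 0.8 * f1 + rnorm(150, sd = 0.6)),
           sapply(1:3, function(j) 0.8 * f2 + rnorm(150, sd = 0.6)))

pa <- parallel_pc(X)
pa$n_keep
```

Plotting pa$observed against pa$threshold gives the same kind of picture as the scree plot described above: components are retained while the observed line lies above the simulated one.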
Factor rotation
If the original loadings are not clear and may not be readily interpretable, it is usual practice to rotate them until a simple structure is achieved. The fa( ) function in the psych package can be used to rotate factors. If the factors are uncorrelated we can use the varimax factor rotation; otherwise we use the oblimin factor rotation. As the factor extraction method, we apply Minimum Residual (OLS). Other available methods include Maximum Likelihood, Principal Component, etc.
Since the original factors are correlated, we apply an oblique rotation (oblimin). Also, we use Ordinary Least Squares factoring, i.e. minres, for the argument fm in the fa( ) function. This factoring method provides results similar to the Maximum Likelihood method without assuming a multivariate normal distribution, and derives solutions through iterative eigen analysis. First, we start by considering a three-factor model.
suppressMessages(library(GPArotation))
obli3factor <- fa(data,nfactors = 3,rotate = "oblimin",fm="minres")
print(obli3factor)
## Factor Analysis using method = minres
## Call: fa(r = data, nfactors = 3, rotate = "oblimin", fm = "minres")
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR2 MR1 MR3 h2 u2 com
## Price 0.18 0.53 -0.06 0.29 0.71 1.3
## Safety 0.34 -0.37 0.14 0.25 0.75 2.3
## Exterior_Design 0.10 0.15 -0.51 0.25 0.75 1.2
## Space_comfort 0.91 -0.01 -0.05 0.83 0.17 1.0
## Technology 0.34 0.03 0.12 0.13 0.87 1.3
## Resale_Value -0.14 0.75 -0.05 0.57 0.43 1.1
## Fuel_Type 0.54 0.05 0.00 0.30 0.70 1.0
## Color -0.04 0.07 0.72 0.56 0.44 1.0
## Maintenance 0.15 0.63 0.26 0.57 0.43 1.4
##
## MR2 MR1 MR3
## SS loadings 1.43 1.42 0.90
## Proportion Var 0.16 0.16 0.10
## Cumulative Var 0.16 0.32 0.42
## Proportion Explained 0.38 0.38 0.24
## Cumulative Proportion 0.38 0.76 1.00
##
## With factor correlations of
## MR2 MR1 MR3
## MR2 1.00 -0.04 -0.06
## MR1 -0.04 1.00 0.27
## MR3 -0.06 0.27 1.00
##
## Mean item complexity = 1.3
## Test of the hypothesis that 3 factors are sufficient.
##
## The degrees of freedom for the null model are 36 and the objective function was 1.58 with Chi Square of 110.8
## The degrees of freedom for the model are 12 and the objective function was 0.12
##
## The root mean square of the residuals (RMSR) is 0.03
## The df corrected root mean square of the residuals is 0.06
##
## The harmonic number of observations is 75 with the empirical chi square 6.43 with prob < 0.89
## The total number of observations was 75 with Likelihood Chi Square = 8.05 with prob < 0.78
##
## Tucker Lewis Index of factoring reliability = 1.165
## RMSEA index = 0 and the 90 % confidence intervals are 0 0.08
## BIC = -43.76
## Fit based upon off diagonal values = 0.97
## Measures of factor score adequacy
## MR2 MR1 MR3
## Correlation of (regression) scores with factors 0.92 0.86 0.80
## Multiple R square of scores with factors 0.85 0.75 0.65
## Minimum correlation of possible factor scores 0.70 0.49 0.29
Since every variable has a loading on all three factors, we use 0.3 as a cutoff value and display only the loadings above it, to see whether any variable still loads on more than one factor.
print(obli3factor$loadings,cutoff = 0.3)
##
## Loadings:
## MR2 MR1 MR3
## Price 0.531
## Safety 0.340 -0.370
## Exterior_Design -0.510
## Space_comfort 0.906
## Technology 0.345
## Resale_Value 0.748
## Fuel_Type 0.544
## Color 0.722
## Maintenance 0.633
##
## MR2 MR1 MR3
## SS loadings 1.435 1.409 0.890
## Proportion Var 0.159 0.157 0.099
## Cumulative Var 0.159 0.316 0.415
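The effect of the cutoff can be checked directly by counting, for each variable, how many loadings survive it. The sketch below re-enters the rounded pattern loadings from the print(obli3factor) output above; the object names L3, salient and cross_loaded are illustrative.

```r
# Pattern loadings of the three-factor solution, copied from print(obli3factor)
L3 <- matrix(c( 0.18,  0.53, -0.06,
                0.34, -0.37,  0.14,
                0.10,  0.15, -0.51,
                0.91, -0.01, -0.05,
                0.34,  0.03,  0.12,
               -0.14,  0.75, -0.05,
                0.54,  0.05,  0.00,
               -0.04,  0.07,  0.72,
                0.15,  0.63,  0.26),
             ncol = 3, byrow = TRUE,
             dimnames = list(c("Price", "Safety", "Exterior_Design",
                               "Space_comfort", "Technology", "Resale_Value",
                               "Fuel_Type", "Color", "Maintenance"),
                             c("MR2", "MR1", "MR3")))

salient <- abs(L3) >= 0.3        # loadings that survive the 0.3 cutoff
n_salient <- rowSums(salient)    # number of salient loadings per variable
cross_loaded <- names(n_salient)[n_salient > 1]
cross_loaded
```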
Since the variable Safety loads on two factors, we will consider the four-factor model.
obli4factor <- fa(data,nfactors = 4,rotate = "oblimin",fm="minres")
print(obli4factor$loadings,cutoff = 0.3)
##
## Loadings:
## MR2 MR1 MR4 MR3
## Price 0.602
## Safety 0.423
## Exterior_Design -0.671
## Space_comfort 0.868
## Technology 0.338
## Resale_Value 0.694
## Fuel_Type 0.571
## Color 0.476
## Maintenance 0.896
##
## MR2 MR1 MR4 MR3
## SS loadings 1.408 1.085 0.918 0.736
## Proportion Var 0.156 0.121 0.102 0.082
## Cumulative Var 0.156 0.277 0.379 0.461
Factor mapping
Now each variable loads on a single factor, and we have a simple structure. Next, we look at the factor mapping.
fa.diagram(obli4factor)
Adequacy Test
print(obli4factor)
## Factor Analysis using method = minres
## Call: fa(r = data, nfactors = 4, rotate = "oblimin", fm = "minres")
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR2 MR1 MR4 MR3 h2 u2 com
## Price 0.23 -0.02 0.60 0.01 0.37 0.63 1.3
## Safety 0.42 -0.26 -0.09 0.23 0.29 0.71 2.4
## Exterior_Design 0.03 0.02 0.00 -0.67 0.45 0.55 1.0
## Space_comfort 0.87 0.05 -0.02 -0.06 0.78 0.22 1.0
## Technology 0.28 0.34 -0.22 -0.02 0.19 0.81 2.7
## Resale_Value -0.12 0.11 0.69 0.00 0.61 0.39 1.1
## Fuel_Type 0.57 -0.03 0.11 0.02 0.31 0.69 1.1
## Color -0.05 0.29 0.01 0.48 0.37 0.63 1.7
## Maintenance 0.02 0.90 0.06 0.03 0.87 0.13 1.0
##
## MR2 MR1 MR4 MR3
## SS loadings 1.41 1.13 0.95 0.75
## Proportion Var 0.16 0.13 0.11 0.08
## Cumulative Var 0.16 0.28 0.39 0.47
## Proportion Explained 0.33 0.27 0.22 0.18
## Cumulative Proportion 0.33 0.60 0.82 1.00
##
## With factor correlations of
## MR2 MR1 MR4 MR3
## MR2 1.00 0.05 -0.13 -0.13
## MR1 0.05 1.00 0.55 0.17
## MR4 -0.13 0.55 1.00 -0.01
## MR3 -0.13 0.17 -0.01 1.00
##
## Mean item complexity = 1.5
## Test of the hypothesis that 4 factors are sufficient.
##
## The degrees of freedom for the null model are 36 and the objective function was 1.58 with Chi Square of 110.8
## The degrees of freedom for the model are 6 and the objective function was 0.03
##
## The root mean square of the residuals (RMSR) is 0.01
## The df corrected root mean square of the residuals is 0.04
##
## The harmonic number of observations is 75 with the empirical chi square 1.12 with prob < 0.98
## The total number of observations was 75 with Likelihood Chi Square = 2.07 with prob < 0.91
##
## Tucker Lewis Index of factoring reliability = 1.334
## RMSEA index = 0 and the 90 % confidence intervals are 0 0.057
## BIC = -23.83
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## MR2 MR1 MR4 MR3
## Correlation of (regression) scores with factors 0.90 0.94 0.84 0.76
## Multiple R square of scores with factors 0.82 0.88 0.71 0.57
## Minimum correlation of possible factor scores 0.63 0.76 0.41 0.15
The root mean square of the residuals (RMSR) for the final four-factor model is 0.01. This value should be close to zero for an acceptable model. Also, the Root Mean Square Error of Approximation (RMSEA) index is 0, which shows a good model fit as it is below 0.05. Finally, the Tucker-Lewis Index (TLI) is 1.334, which is also an acceptable value considering it is over 0.9.
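Two of the reported indices can be recovered from the likelihood chi-square, the model degrees of freedom and the sample size. The sketch below is a rough check under stated assumptions: RMSEA definitions vary slightly between software packages, so the form used here is one common textbook version and not necessarily the exact formula psych applies, and the helper name fit_indices is hypothetical.

```r
# RMSEA and BIC from a chi-square statistic, its degrees of freedom and n
fit_indices <- function(chi2, df, n) {
  c(RMSEA = sqrt(max(chi2 - df, 0) / (df * (n - 1))),  # one common definition
    BIC   = chi2 - df * log(n))
}

# Values copied from the three- and four-factor outputs above
three_factor <- fit_indices(chi2 = 8.05, df = 12, n = 75)
four_factor  <- fit_indices(chi2 = 2.07, df = 6,  n = 75)

round(rbind(three_factor, four_factor), 2)
```

In both models the chi-square is smaller than its degrees of freedom, which is why RMSEA is reported as 0.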
What are the factors?
Here, we have four factors. According to the grouping of the variables, we can name them as Technological benefits, Functional benefits, Aesthetics, and Economic value.
Factor 1 | Factor 2 | Factor 3 | Factor 4 |
---|---|---|---|
Maintenance | Space comfort | Exterior design | Resale value |
Technology | Fuel type | Color | Price |
 | Safety | | |
Technological benefits | Functional benefits | Aesthetics | Economic value |
factanal function for maximum likelihood factor analysis
The factanal( ) function performs maximum likelihood factor analysis, which assumes multivariate normal data. Since the variables are on a Likert scale, this method is not a good choice for this data set. If you want to apply the maximum likelihood method to a data set, first test the multivariate normality of the data using the following code:
library(MVN)
mvn(data, univariateTest="SW",univariatePlot = "qqplot",multivariatePlot="qq",mvnTest="mardia")
Here, the “mardia” test is used to check the multivariate normality of the data. Now, to apply the Maximum Likelihood method to extract factors, use the following code:
fitmax=factanal(data,factors = 1,rotation = "varimax")
print(fitmax, digits=2, cutoff=.3, sort=TRUE)
Start with one factor, and increase the number of factors until the hypothesis test reported by factanal indicates that the number of factors is sufficient. Then, you can plot the resulting solution using the following code:
load <- fitmax$loadings[,1]
plot(load,type="n") # set up plot
text(load,labels=names(data),cex=.7) # add variable names
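The procedure of starting with one factor and adding factors can be written as a short loop that collects the sufficiency test from each fit. This is a minimal sketch on synthetic stand-in data: the matrix X, its three-factor structure and the object names are assumptions for illustration, not the survey data.

```r
set.seed(4)
# Synthetic stand-in data (assumption): three latent factors behind nine variables
n <- 300
latent <- matrix(rnorm(n * 3), n, 3)
X <- sapply(1:9, function(j) 0.8 * latent[, (j - 1) %/% 3 + 1] + rnorm(n, sd = 0.6))
colnames(X) <- paste0("V", 1:9)

# Fit models with 1, 2 and 3 factors and collect the test that k factors are sufficient
results <- do.call(rbind, lapply(1:3, function(k) {
  fit_k <- factanal(X, factors = k, rotation = "varimax")
  data.frame(factors   = k,
             statistic = unname(fit_k$STATISTIC),
             df        = unname(fit_k$dof),
             p.value   = unname(fit_k$PVAL))
}))
results
```

A small p-value means the hypothesis that k factors are sufficient is rejected, so the number of factors is increased until the test is no longer rejected.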
Further details:
1. https://rpubs.com/jeeva1407/367782
2. https://rpubs.com/aaronsc32/factor-analysis-introduction
3. https://rpubs.com/ykwon0407/89429
4. http://rpubs.com/nikkev/pca-factor
5. Applied Multivariate Statistical Analysis by R.A. Johnson & D.W. Wichern
Reading data in different formats:
http://www.sthda.com/english/wiki/reading-data-from-txt-csv-files-r-base-functions
Univariate and multivariate normality tests
https://rpubs.com/lozza44/263609
https://journal.r-project.org/archive/2014/RJ-2014-031/RJ-2014-031.pdf
http://www.biosoft.hacettepe.edu.tr/MVN/