Statistical Methods for Researchers
To perform a sound data analysis, it is important to have a solid understanding of statistics. The first step of data analysis is applying descriptive statistical methods, which include obtaining summary statistics (mean, median, mode, sample standard deviation, sample variance, etc.) and data visualization. These methods help to reveal the variation and distribution patterns hidden in the data. The second step is applying inferential statistics to make decisions. In inferential statistics, one generalizes sample results to the relevant population through hypothesis tests and other inferential procedures.
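The summary statistics listed above can be computed in a few lines. The following is a minimal sketch using Python's standard `statistics` module; the values in `sample` are made up purely for illustration and do not come from any real data set.

```python
import statistics

# Hypothetical sample, invented only to illustrate the calls
sample = [2, 4, 4, 5, 7, 8]

summary = {
    "mean": statistics.mean(sample),
    "median": statistics.median(sample),
    "mode": statistics.mode(sample),
    # variance() and stdev() are the sample versions (divide by n - 1)
    "sample_variance": statistics.variance(sample),
    "sample_std_dev": statistics.stdev(sample),
}

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Note that `statistics.pvariance` and `statistics.pstdev` give the population versions (dividing by n), which differ from the sample versions used here.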
Posts on Statistical Methods
Some of the statistical methods used in data analysis, with practical applications, are given in the following posts.
To understand the theory and details behind factor analysis, read the Introduction to Factor Analysis.
In this post, an example of factor analysis is given.
Example
Suppose a customer survey was conducted at the time of car purchase. The questionnaire included 9 variables, and 75 customers participated in the study. The survey questions were framed on a 5-point Likert scale, with 1 being very low and 5 being very high.
Most parametric tests are based on the normal distribution. Therefore, if you have a small sample (n < 25), it is good practice to check whether the data are normally distributed before conducting a hypothesis test. According to the central limit theorem, the sample mean is approximately normally distributed if the sample size is large. Therefore, if the test statistic is based on the sample mean and the sample size is large, checking the normality assumption is not mandatory.
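One common way to check normality for a small sample is the Shapiro-Wilk test. The excerpt above does not name a specific test, so this choice is an assumption, as are the simulated data, the sample size of 20, and the 0.05 significance level. A minimal sketch using NumPy and SciPy:

```python
import numpy as np
from scipy import stats

# Simulated small sample (n = 20 < 25); a stand-in for real measurements
rng = np.random.default_rng(42)
small_sample = rng.normal(loc=50.0, scale=5.0, size=20)

# Shapiro-Wilk test: the null hypothesis is that the data are normal
statistic, p_value = stats.shapiro(small_sample)

alpha = 0.05  # assumed significance level
looks_normal = p_value > alpha

print(f"W = {statistic:.3f}, p-value = {p_value:.3f}")
print("No evidence against normality" if looks_normal
      else "Normality is rejected at the chosen level")
```

A large p-value does not prove that the data are normal; it only means the test found no evidence against normality. A normal Q-Q plot is a useful visual companion to the test.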
Factor analysis is used to describe the covariance relationships among many variables in terms of a few underlying, but unobservable, random quantities called factors. If the variables (not the observations) can be grouped by their correlations, then all variables within a particular group are highly correlated among themselves. This means that each group of variables represents a single underlying hidden factor that is responsible for the observed correlations.
For example, correlations from a group of test scores in Mathematics, Statistics, Chemistry and Physics might correspond to a factor named “intelligence”.
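The grouping idea can be illustrated with simulated data. The sketch below does not fit a factor model; it only generates four observed variables from two hidden factors and shows that the correlation matrix splits into two blocks. The loading of 0.9, the noise level of 0.4, and the sample size of 500 are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500  # number of simulated observations (assumed)

# Two independent hidden factors that are never observed directly
factor_1 = rng.normal(size=n)
factor_2 = rng.normal(size=n)

loading, noise_sd = 0.9, 0.4  # assumed values, for illustration only


def observed(factor):
    """Observed variable = loading * hidden factor + independent noise."""
    return loading * factor + noise_sd * rng.normal(size=n)


# Columns 0-1 are driven by factor_1, columns 2-3 by factor_2
data = np.column_stack([
    observed(factor_1), observed(factor_1),
    observed(factor_2), observed(factor_2),
])

corr = np.corrcoef(data, rowvar=False)
print(np.round(corr, 2))
```

Variables that share a factor are strongly correlated, while variables from different groups are nearly uncorrelated. Factor analysis works in the opposite direction: it starts from the observed correlations and tries to recover the hidden factors.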
In some statistical techniques, it is essential to convert a set of correlated variables to a set of uncorrelated variables. This can be done by using Principal Component Analysis (PCA). The resulting uncorrelated variables are called principal components, the first few of which typically represent most of the information in the original set of variables.
This statistical technique is also a useful descriptive tool to examine your data, and to reduce the number of variables of the original data set.
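A minimal sketch of PCA through the eigendecomposition of the sample covariance matrix, using NumPy only. The three simulated variables are an assumption made for illustration; in practice the variables are often standardized first when they are measured on different scales.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # number of simulated observations (assumed)

# Simulated data: the first two variables are strongly correlated
base = rng.normal(size=n)
X = np.column_stack([
    base + 0.3 * rng.normal(size=n),
    base + 0.3 * rng.normal(size=n),
    rng.normal(size=n),
])

# Center the data and compute the sample covariance matrix
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)

# Eigenvectors of the covariance matrix define the principal components
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]          # sort by variance, largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Principal component scores: the new, uncorrelated variables
pc_scores = X_centered @ eigvecs
explained_ratio = eigvals / eigvals.sum()
score_cov = np.cov(pc_scores, rowvar=False)

print("Proportion of variance explained:", np.round(explained_ratio, 3))
print("Covariance of the scores:")
print(np.round(score_cov, 3))
```

The covariance matrix of the scores is diagonal, which confirms that the principal components are uncorrelated. Keeping only the leading components reduces the number of variables while retaining most of the variance.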