R for Statistical Applications

One of the most appealing R packages I came across at rstudio::global(2021), a 24-hour virtual conference, was the flipbookr package, developed by Gina Reynolds, Garrick Aden-Buie and Emi Tanaka. With flipbookr you can present your code step by step, side by side with its output. This incremental code-output evolution is really helpful for learning how the output changes as lines of R code are added one at a time. How do we create a flipbook? First install the package from GitHub.

Writing a book is now an amazing experience with open-source R software. The R package bookdown, developed by Yihui Xie, generates printer-ready books and ebooks from R Markdown documents. The package produces books in many output formats (PDF, HTML, EPUB, LaTeX, Word, Kindle, etc.). We can also add dynamic graphics and interactive applications (HTML widgets and Shiny apps) to books, and the package supports a wide range of languages (R, C/C++, Python, Fortran, Julia, shell scripts, SQL, etc.).

Creating a dashboard is an attractive way to visualize different groups of related data. To set up a dashboard we can use the R package flexdashboard. First, set the orientation of the dashboard in the YAML header. The default orientation is columns, which shows individual charts stacked vertically within each column. To lay the dashboard out row-wise, specify the orientation: rows option in the YAML header. Similarly, we can display several components in different tabs using a tabset.

R or Jupyter Notebook? Although I am a fan of R, RStudio and R Notebook, some researchers are more familiar with Jupyter Notebook. In particular, those who work in industry may prefer Jupyter Notebook. Jupyter Notebook is a web application in which you can create and share documents that contain live code, equations, text and graphical visualizations. Therefore, we can use Jupyter Notebook to perform data analysis in real time.

We have to remember the style and syntax of Markdown when preparing Markdown files. If you are very familiar with Markdown this may not be a problem. Instead of remembering the syntax, what if a Word-like platform were available? Many such editors are now available. Some editors that you can use: Typora - This works on Windows, Mac and Linux, and is very similar to LyX. You can download Typora to your computer and easily type your document as a Markdown file.

Before using R for data analysis, you should know some key points that help you avoid many error messages. In this post, I will explain some of those key points. Preparing Data for the Analysis: Variable names. Since R is a case-sensitive language, the variables ‘Age’ and ‘age’ will be treated as different variables in R. Therefore, you should choose one common form for naming all your variables. I usually use all lowercase letters for variable names.
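As a small illustration of the case-sensitivity point, here is a minimal sketch. The data frame `survey` and its values are made up for the example and do not come from the post.

```r
# Hypothetical survey data with capitalised column names (illustrative values only)
survey <- data.frame(Age = c(23, 35, 41), Income = c(40, 55, 62))

# R distinguishes 'Age' from 'age': only the first is a column here
has_upper <- "Age" %in% names(survey)
has_lower <- "age" %in% names(survey)

# Convert every variable name to lowercase so one convention applies throughout
names(survey) <- tolower(names(survey))

# After renaming, the lowercase name can be used consistently
mean_age <- mean(survey$age)
```

Applying `tolower()` to the names once, right after importing the data, is one simple way to enforce a single naming convention.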

To understand the theory and details behind factor analysis, read the Introduction to Factor Analysis. In this post, an example of factor analysis is given. Example: Suppose a customer survey was conducted among people purchasing a car. The questionnaire included 9 different variables, and 75 customers participated in the study. The survey questions were framed using a 5-point Likert scale, with 1 being very low and 5 being very high.

Most parametric tests are based on the Normal distribution. Therefore, if you have a small sample (n < 25), it is good practice to check whether the data are normally distributed before conducting a hypothesis test. According to the Central Limit Theorem, the sample mean is approximately normally distributed if the sample size is large. Therefore, if the test statistic is based on the sample mean and the sample size is large, checking the normality assumption is not mandatory.
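One possible way to check normality for a small sample is sketched below. The Shapiro-Wilk test is used here as an assumption; the post does not name a specific test, and the data are simulated purely for illustration.

```r
# Simulated small sample (n = 20); values are illustrative, not real data
set.seed(1)
x <- rnorm(20, mean = 50, sd = 5)

# Shapiro-Wilk test: the null hypothesis is that the data come from a Normal distribution
sw <- shapiro.test(x)

# A small p-value (for example below 0.05) would suggest a departure from normality
p_value <- sw$p.value
```

A normal Q-Q plot, drawn with `qqnorm(x)` followed by `qqline(x)`, is a common graphical companion to the test.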

Factor analysis is used to describe the covariance relationships among many variables in terms of a few underlying, but unobservable, random quantities called factors. If the variables (not the observations) can be grouped by their correlations, then all variables within a particular group are highly correlated among themselves. This means that each group of variables represents a single underlying hidden factor that is responsible for the observed correlations. For example, correlations among a group of test scores in Mathematics, Statistics, Chemistry and Physics might correspond to a factor named “intelligence”.
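The idea of grouped variables sharing a hidden factor can be sketched with simulated data. Everything below is an assumption made for illustration: the two latent variables `f1` and `f2`, the second group of scores (reading, writing, spelling), the noise level, and the use of `factanal()` from base R's stats package.

```r
# Simulate two hidden factors and seven observed scores (illustrative data only)
set.seed(42)
n  <- 200
f1 <- rnorm(n)   # hypothetical factor behind the science scores
f2 <- rnorm(n)   # hypothetical factor behind the language scores

scores <- data.frame(
  mathematics = f1 + rnorm(n, sd = 0.5),
  statistics  = f1 + rnorm(n, sd = 0.5),
  chemistry   = f1 + rnorm(n, sd = 0.5),
  physics     = f1 + rnorm(n, sd = 0.5),
  reading     = f2 + rnorm(n, sd = 0.5),
  writing     = f2 + rnorm(n, sd = 0.5),
  spelling    = f2 + rnorm(n, sd = 0.5)
)

# Maximum-likelihood factor analysis with two factors (varimax rotation by default)
fit <- factanal(scores, factors = 2)

# Show the loadings, hiding small values to make the grouping easier to read
print(fit$loadings, cutoff = 0.3)
```

In the printed loadings, variables that load strongly on the same factor form one of the correlated groups described above.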

The box plot is useful for comparing the quartiles and variation of quantitative variables. In a box plot, the lower and upper ends of a box (the hinges) are the first (Q1) and third (Q3) quartiles, and the middle horizontal line represents the median (Q2) of the data. The whiskers extend to the most extreme observations that lie within 1.5 * IQR of the box, where the inter-quartile range IQR = Q3 - Q1. Observations more than 1.5 * IQR below Q1 or above Q3 are plotted individually beyond the whiskers as outliers.
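The 1.5 * IQR rule can be worked through by hand in R. This is a minimal sketch on a made-up vector `x`; the fence names are chosen for the example.

```r
# Illustrative data with one unusually large value
x <- c(2, 4, 5, 6, 7, 8, 9, 30)

# Quartiles and inter-quartile range
q   <- quantile(x, c(0.25, 0.5, 0.75))
iqr <- IQR(x)

# Fences at 1.5 * IQR below Q1 and above Q3
lower_fence <- unname(q[1] - 1.5 * iqr)
upper_fence <- unname(q[3] + 1.5 * iqr)

# Observations outside the fences are flagged as outliers
outliers <- x[x < lower_fence | x > upper_fence]

# The same points as identified by R's box plot machinery
bp_out <- boxplot.stats(x)$out
```

Note that `boxplot.stats()` uses Tukey's hinges, which can differ slightly from the quartiles returned by `quantile()`. For this vector both approaches flag the same single value, 30. Calling `boxplot(x)` draws the plot itself.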

Posts

Teaching and Learning R using flipbookr package

Writing books using bookdown package

Creating dashboards using "flexdashboard" R package

Jupyter Notebook

Markdown Editors

Initial steps for your Data Analysis

Factor Analysis - Example

Testing Hypotheses

Introduction to Factor Analysis

Drawing Box Plots