One of the R packages that impressed me most at rstudio::global(2021), a 24-hour virtual conference, was flipbookr, developed by Gina Reynolds, Garrick Aden-Buie and Emi Tanaka. With flipbookr you can present your code step by step, side by side with its output.
This incremental view of code and output makes it easy to see how the output changes as each line of R code is added.
How do we create a flipbook?
First install the package from GitHub.
Writing a book with open-source R software is now an enjoyable experience. The R package bookdown, developed by Yihui Xie, generates printer-ready books and ebooks from R Markdown documents. It produces books in many output formats (PDF, HTML, ePub, LaTeX, Word, Kindle, etc.). We can also add dynamic graphics and interactive applications (HTML widgets and Shiny apps) to books, and the package supports a wide range of languages (R, C/C++, Python, Fortran, Julia, Shell scripts, SQL, etc.).
Creating a dashboard is an attractive way to visualize different groups of related data. To set up a dashboard we can use the R package flexdashboard.
First, set the orientation of the dashboard in the YAML header. The default orientation is columns, which shows individual charts stacked vertically within each column. To lay the dashboard out row-wise instead, specify the orientation: rows option in the YAML header.
Similarly, we can display several components as separate tabs using a tabset.
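As a minimal sketch of such a header, the R snippet below writes a skeleton dashboard file with a row-wise layout and one row shown as a tabset. The file name, the title and the chart headings are placeholders made up for illustration; rendering the file would additionally require the rmarkdown and flexdashboard packages to be installed.

```r
# Skeleton of a flexdashboard source file (title and headings are placeholders)
dashboard_lines <- c(
  "---",
  "title: \"Example dashboard\"",
  "output:",
  "  flexdashboard::flex_dashboard:",
  "    orientation: rows",
  "---",
  "",
  "Row",
  "-----------------------------------------------------------------------",
  "",
  "### Chart A",
  "",
  "Row {.tabset}",
  "-----------------------------------------------------------------------",
  "",
  "### Tab 1",
  "",
  "### Tab 2"
)

# Write the skeleton to a temporary .Rmd file
dashboard_file <- file.path(tempdir(), "example_dashboard.Rmd")
writeLines(dashboard_lines, dashboard_file)

# To render it (assumes rmarkdown and flexdashboard are installed):
# rmarkdown::render(dashboard_file)
```

Each level-3 heading (###) under the row marked {.tabset} becomes one tab, while the first row keeps its chart permanently visible.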
Before using R for data analysis, you should know a few key points that will help you avoid many error messages. In this post, I will explain some of those key points.
Preparing Data for the Analysis
Variable names
Since R is a case-sensitive language, ‘Age’ and ‘age’ are treated as different variables. You should therefore pick one consistent convention for naming all your variables. I usually use all lowercase letters for variable names.
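The short sketch below illustrates the point. The data frame and its column names are made up for the example; tolower() is one simple way to bring all names to a common lowercase form.

```r
# Age and age are two distinct objects because R is case sensitive
Age <- 40
age <- 25
identical(Age, age)   # FALSE

# A made-up data frame with inconsistently cased column names
survey <- data.frame(Age = c(23, 35), GENDER = c("F", "M"), Income = c(500, 750))

# Convert every variable name to lowercase
names(survey) <- tolower(names(survey))
names(survey)         # "age" "gender" "income"
```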
To understand the theory and details behind factor analysis, read the Introduction to Factor Analysis.
This post works through an example of factor analysis.
Example
Suppose a customer survey was conducted among people purchasing a car. The questionnaire included 9 different variables, and 75 customers participated in the study. The survey questions were framed on a 5-point Likert scale, with 1 being very low and 5 being very high.
Factor analysis is used to describe the covariance relationships among many variables in terms of a few underlying, but unobservable, random quantities called factors. If the variables (not the observations) can be grouped by their correlations, then all variables within a particular group are highly correlated among themselves. This means that each group of variables represents a single underlying hidden factor that is responsible for the observed correlations.
For example, correlations from a group of test scores in Mathematics, Statistics, Chemistry and Physics might correspond to a factor named “intelligence”.
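To make the idea concrete, here is a minimal sketch on simulated data (not the survey or test scores mentioned in these posts). It assumes two hidden factors, each driving three observed variables; the variable names, loadings and sample size are arbitrary choices for illustration. The model is fitted with factanal() from the stats package that ships with R.

```r
set.seed(123)
n <- 300

# Two unobserved (latent) factors
f1 <- rnorm(n)
f2 <- rnorm(n)

# Six observed variables: x1-x3 are driven by f1, x4-x6 by f2
scores <- data.frame(
  x1 = 0.8 * f1 + rnorm(n, sd = 0.5),
  x2 = 0.7 * f1 + rnorm(n, sd = 0.5),
  x3 = 0.9 * f1 + rnorm(n, sd = 0.5),
  x4 = 0.8 * f2 + rnorm(n, sd = 0.5),
  x5 = 0.7 * f2 + rnorm(n, sd = 0.5),
  x6 = 0.9 * f2 + rnorm(n, sd = 0.5)
)

# Variables in the same group are highly correlated with each other
round(cor(scores), 2)

# Fit a two-factor model by maximum likelihood (varimax rotation by default)
fit <- factanal(scores, factors = 2)

# Show only the larger loadings to make the grouping visible
print(fit$loadings, cutoff = 0.3)
```

In the printed loadings, each variable should load mainly on one of the two factors, which is the grouping described above.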
The box plot is useful for comparing the quartiles and variation of quantitative variables. In a box plot, the lower and upper ends of the box (the hinges) are the first (Q1) and third (Q3) quartiles, and the horizontal line inside the box represents the median (Q2) of the data. The whiskers extend to the most extreme observations that lie within 1.5 * IQR of the box, where the inter-quartile range IQR = Q3 - Q1; observations beyond that distance are plotted individually as outliers.
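The 1.5 * IQR rule can be checked by hand. The sketch below uses a small made-up sample with one extreme value, computes the quartiles and fences directly, and compares the result with what boxplot() reports.

```r
# A small made-up sample with one extreme value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)

q1 <- quantile(x, 0.25, names = FALSE)
q3 <- quantile(x, 0.75, names = FALSE)
iqr <- q3 - q1                      # same value as IQR(x)

# Fences: observations beyond these limits are flagged as outliers
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr
outliers <- x[x < lower_fence | x > upper_fence]
outliers

# boxplot() applies the same rule, using the hinges as quartiles;
# plot = FALSE returns the statistics without drawing anything
bp <- boxplot(x, plot = FALSE)
bp$out
```

The hinges used by boxplot() can differ slightly from the quartiles returned by quantile(), so the fences are not always identical, but here both approaches flag the same observation.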
The monthly dengue incidence data for each administrative district, published on the website of the Epidemiology Unit, Ministry of Health, Sri Lanka, are used for the analysis.
Administrative units of Sri Lanka
Sri Lanka is divided into 9 provinces as the first layer of administrative units. The 9 provinces are Central Province (CP), Eastern Province (EP), North Central Province (NC), Northern Province (NP), North Western Province (NW), Sabaragamuwa Province (SG), Southern Province (SP), Uva Province (UP) and Western Province (WP).
Some statistical techniques require converting a set of correlated variables into a set of uncorrelated variables. This can be done using Principal Component Analysis (PCA). The resulting uncorrelated variables are called principal components, and the first few of them typically capture most of the information in the original set of variables.
This statistical technique is also a useful descriptive tool to examine your data, and to reduce the number of variables of the original data set.
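As a minimal sketch, the snippet below runs PCA with prcomp() on USArrests, a data set that ships with R and is used here only for illustration. It shows that the original variables are correlated while the principal component scores are not.

```r
# USArrests ships with R; its four variables are correlated
round(cor(USArrests), 2)

# PCA on standardized variables
pca <- prcomp(USArrests, scale. = TRUE)

# The principal component scores are uncorrelated with each other
round(cor(pca$x), 2)

# Proportion of the total variance explained by each component
summary(pca)$importance["Proportion of Variance", ]
```

Keeping only the first few components is how PCA reduces the number of variables while retaining most of the variation.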
We can keep R code, results and their interpretation in one document by using an R Notebook for the data analysis. An R Notebook is an R Markdown document with chunks that can be executed independently and interactively; it interacts with R directly while producing a reproducible document with publication-quality output. Therefore, at least a basic knowledge of R Markdown is needed to use R Notebook. Some basic concepts required are given below, and you can learn more from the lessons on the RStudio website https://rmarkdown.