Initial Steps for Your Data Analysis
Before using R for data analysis, you should know a few key points that will help you avoid many error messages. In this post, I will explain some of those key points.
Preparing Data for the Analysis
Variable names
Since R is a case-sensitive language, it treats 'Age' and 'age' as two different variables. Therefore, choose one naming style and use it for all your variables. I usually write variable names in all lowercase letters. Alternatively, you can capitalize the first letter and use lowercase for the rest. Avoid long names.
Example: To specify the variable height, you can use either height or Height as the variable name. If you want a variable name with two or more words, insert a . or _ sign between the words instead of a space.
Example: To use the male height as a variable name, you can write male.height, male_height or MaleHeight. Do not include symbols such as ?, $, %, ^, &, *, (, ), -, #, the comma, <, >, /, |, \, [, ], {, and } in variable names.
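A short sketch of these naming rules in R. The values and the raw labels are hypothetical; make.names() and tolower() are base R functions, and make.names() replaces spaces and invalid symbols with a dot.

```r
# Hypothetical example: R treats Age and age as two different variables
Age <- 30
age <- 25
print(Age == age)  # FALSE: the names differ only in case

# make.names() turns arbitrary labels into syntactically valid names:
# spaces and symbols such as ?, ( and ) are replaced by "."
raw_names <- c("male height", "Age?", "weight(kg)")
clean_names <- make.names(raw_names)
print(clean_names)

# Apply an all-lowercase convention to the columns of a data frame
df <- data.frame(Age = c(30, 41), MaleHeight = c(175, 182))
names(df) <- tolower(names(df))
print(names(df))
```

Cleaning names once, right after import, keeps the rest of the analysis free of backticks and quoting.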
Check your data
If there are any missing values in your data set, indicate them as NA.
If you use a specific R package for the data analysis, check the examples given in that package's help files. For example, if you want to use the MASS package, run help(package = "MASS"). If you want to understand a particular function, such as lm (linear model), which belongs to the stats package rather than MASS, run help(lm, package = "stats").
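Once missing values are marked as NA, R can detect and handle them. A minimal sketch using a small hypothetical CSV held in a string (read.csv(text = ...) avoids needing a file); na.strings tells R which entries to read as NA.

```r
# Hypothetical data set: one value typed as "NA", one left blank
csv_text <- "id,height,weight
1,170,65
2,NA,72
3,165,"

# Treat both "NA" and empty fields as missing values
data <- read.csv(text = csv_text, na.strings = c("NA", ""))

print(is.na(data))                      # TRUE where a value is missing
print(colSums(is.na(data)))             # number of missing values per column
print(mean(data$height, na.rm = TRUE))  # NA values are skipped
print(complete.cases(data))             # rows with no missing value
```

Without na.rm = TRUE, mean() would return NA for the height column.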
Many help pages include examples that illustrate how the functions work. For example, to run the examples for the lm (linear model) function, use the command example(lm).
One of the most important points is to understand the data format that the package requires. The functions in a package usually work only with data in that format, so if your data set is not in that format, you have to reshape it accordingly.
You may also need a particular data structure, such as a data frame, matrix or list, to use a given package.
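Before calling a package's functions, it helps to check what structure your data currently has and convert it if needed. A sketch with a hypothetical data frame; which structure a particular package expects is something to confirm in its help files.

```r
# Hypothetical data frame
data <- data.frame(id = 1:3, height = c(170, 182, 165), sex = c("M", "M", "F"))

str(data)           # column names, types and first values
print(class(data))  # "data.frame"

# Some functions expect a numeric matrix rather than a data frame
m <- as.matrix(data[, c("id", "height")])
print(is.matrix(m))

# Categorical variables are often expected as factors
data$sex <- factor(data$sex)
print(levels(data$sex))
```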
If you want to create a new variable, or recode or rename variables, refer to Quick R. If you want to sort, subset or merge your data, refer to the following links: sort, subset, merge.
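A base R sketch of these operations, using two small hypothetical data frames; the linked pages cover these and other approaches in more detail.

```r
# Hypothetical data sets sharing an id column
people <- data.frame(id = c(3, 1, 2), height = c(165, 182, 170))
ages   <- data.frame(id = c(1, 2, 3), age = c(34, 51, 28))

# Sort by id
people <- people[order(people$id), ]

# Create a new variable (height in metres)
people$height_m <- people$height / 100

# Recode a numeric variable into categories
people$tall <- ifelse(people$height >= 175, "yes", "no")

# Rename a variable
names(people)[names(people) == "height"] <- "height_cm"

# Subset: rows with height of at least 170 cm, selected columns only
tall_people <- subset(people, height_cm >= 170, select = c(id, height_cm))

# Merge the two data sets on the common id column
combined <- merge(people, ages, by = "id")
print(combined)
```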
To reshape your data, read the following two suggestions from the STHDA website:
(i) tidyr R package
(ii) tibble R package
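A minimal reshaping sketch with tidyr's pivot_longer() and pivot_wider() (available from tidyr 1.0.0), using hypothetical height data. It assumes tidyr is installed; the result of pivot_longer() is a tibble, the data frame variant provided by the tibble package.

```r
library(tidyr)  # install.packages("tidyr") if needed

# Hypothetical "wide" data: one column per year
wide <- data.frame(id = 1:2,
                   height_2020 = c(150, 160),
                   height_2021 = c(152, 163))

# Wide -> long: one row per id and year
long <- pivot_longer(wide,
                     cols = c(height_2020, height_2021),
                     names_to = "year",
                     names_prefix = "height_",
                     values_to = "height")
print(long)

# Long -> wide again
wide_again <- pivot_wider(long, names_from = year, values_from = height,
                          names_prefix = "height_")
print(wide_again)
```

Many modelling and plotting functions expect the long layout, with one observation per row.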
Importing Data for the analysis
The next step is importing data into your R session. Before importing data, check whether R's current working directory is the folder that contains your data set.
Run getwd() to see your current working directory. If it is not the folder that contains your data set, change it with setwd("<path to your dataset>"). For example, if your data set is in the D:\Rworks directory, use setwd("D:/Rworks"). Note that R file paths use forward slashes (or doubled backslashes, as in "D:\\Rworks"), not single backslashes.
To choose a data file interactively from your computer, run data <- read.delim(file.choose()) for tab-delimited txt files or data <- read.csv(file.choose()) for comma-delimited (csv) files. These functions come with base R, so no extra package is needed; the readr package (library("readr")) provides similar functions such as read_tsv() and read_csv().
You can also import files by specifying their paths. For example, to import a data file from the D:\Rworks directory, use:
(i) data <- read.csv("D:/Rworks/data.csv", header = TRUE, sep = ",", row.names = "id")
for a comma-delimited (csv) file; row.names = "id" uses the column named id as the row names.
(ii) data <- read.delim("D:/Rworks/data.txt", header = TRUE, sep = "\t", dec = ".")
for a TAB-delimited file; you can pass further arguments as needed.
(iii) data <- read.table("D:/Rworks/data.txt", header = FALSE, sep = "", dec = ".")
for general tabular data; sep = "" means that any white space separates the values.
To read a txt file hosted on a website, use
data <- read.table("https://s3.amazonaws.com/assets.datacamp.com/blog_assets/test.txt", header = FALSE)
You can also use read.delim() and read.csv() with a URL in the same way for the corresponding data formats.
To read xlsx files, use the readxl package as below:
library("readxl")
data <- read_excel("data.xlsx")
To read the first worksheet from the workbook named dataexcel.xlsx with the xlsx package (which requires Java), use
library(xlsx)
data <- read.xlsx("c:/dataexcel.xlsx", 1)
If you want to read another sheet named "sheet4", use data <- read.xlsx("D:/Rworks/dataexcel.xlsx", sheetName = "sheet4").
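The working-directory and import steps can be tried end to end without touching your own files. This sketch assumes a temporary folder stands in for D:\Rworks and writes two small hypothetical files (data.csv and data.txt) before reading them back with base R.

```r
# Use a temporary folder as a stand-in for a folder such as D:/Rworks
old_wd <- getwd()
work_dir <- file.path(tempdir(), "Rworks")
dir.create(work_dir, showWarnings = FALSE)
setwd(work_dir)
print(getwd())

# Create small hypothetical data files to import
writeLines(c("id,height,weight", "a,170,65", "b,182,80"), "data.csv")
writeLines(c("id\theight\tweight", "a\t170\t65", "b\t182\t80"), "data.txt")

# (i) comma-delimited file, using the id column as row names
data_csv <- read.csv("data.csv", header = TRUE, sep = ",", row.names = "id")

# (ii) tab-delimited file
data_tab <- read.delim("data.txt", header = TRUE, sep = "\t", dec = ".")

print(data_csv)
print(data_tab)

setwd(old_wd)  # restore the original working directory
```

Because the files are read with relative paths, they are looked up in the working directory set by setwd().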
Exporting Data in R
To write data to a txt file with tab-separated values, use
write.table(data, file = "data.txt", sep = "\t", row.names = TRUE, col.names = NA)
Setting col.names = NA together with row.names = TRUE adds a blank column name above the row names, so the column names line up with the data.
To write data to a comma-delimited (csv) file, use
write.csv(data, file = "data.csv")
To write data to a new xlsx workbook, use
library("xlsx")
write.xlsx(data1, file = "dataworkbook1.xlsx", sheetName = "firstsheet", append = FALSE)
Then add a second worksheet to the same workbook using write.xlsx(data2, file = "dataworkbook1.xlsx", sheetName = "secondsheet", append = TRUE).
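A quick way to confirm that an export worked is to write the data and read it back. This sketch uses a hypothetical data frame and temporary file paths instead of your own folder; the xlsx calls are left out because they need the xlsx package and Java.

```r
# Hypothetical data with meaningful row names
data <- data.frame(height = c(170, 182), weight = c(65, 80),
                   row.names = c("a", "b"))

out_dir  <- tempdir()
txt_file <- file.path(out_dir, "data.txt")
csv_file <- file.path(out_dir, "data.csv")

# Tab-separated txt file; col.names = NA adds a blank header over the row names
write.table(data, file = txt_file, sep = "\t", row.names = TRUE, col.names = NA)

# Comma-delimited csv file (row names are written by default)
write.csv(data, file = csv_file)

# Read both files back, taking the first column as row names
from_txt <- read.delim(txt_file, row.names = 1)
from_csv <- read.csv(csv_file, row.names = 1)
print(from_txt)
print(from_csv)
```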