Intro to RMarkdown

Introduction

R Markdown is a format for writing reproducible, dynamic reports with R. Use it to embed R code and results into slideshows, PDFs, HTML documents, Word files and more.
One advantage of R Markdown is that it converts your work into accessible formats in which the embedded R code and plots are reproducible.

Getting Started

You can install the R Markdown package as follows:

install.packages("rmarkdown")

To create an R Markdown file, follow these steps in order:
1. In the upper left corner of your current RStudio session, click on File
2. Go to New File
3. Lastly, click on where it says R Markdown
You should see a pop-up window titled “New R Markdown”. On the left are the different types of R Markdown documents; at the top are the title and author fields, which you can fill in or modify; and at the bottom is the default output format, which determines whether the R Markdown file is converted into HTML, PDF or Word.
For the purposes of this tutorial, I created an R Markdown Document in HTML format, but other options exist. I also provided a title and an author (simply my first and last name in this case).

A newly created R Markdown file starts with a header containing basic information such as the title and author of the file. The header is enclosed by two hyphen boundaries ---. Make any changes to this information between the two boundaries. Do NOT touch the boundaries themselves, and always write the rest of your document BELOW the last one.
The example text and code that appear below the hyphen boundaries each time you create a new R Markdown file can be deleted and replaced by your own text and code.
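
As a rough illustration of the header described above, the following sketch writes a minimal R Markdown file from R. The file name `intro-demo.Rmd` and the use of a temporary directory are assumptions chosen for illustration; the title, author and date mirror this tutorial, and the template that RStudio generates contains additional example content.

```r
# Build the lines of a minimal R Markdown file: a YAML header between
# two "---" boundaries, followed by the body of the document.
rmd_lines <- c(
  "---",
  'title: "Intro to RMarkdown"',
  'author: "Javier Rojas"',
  'date: "June 30, 2017"',
  "output: html_document",
  "---",
  "",
  "Your own text and code go below the second boundary."
)

# Hypothetical file name, written to a temporary directory.
rmd_path <- file.path(tempdir(), "intro-demo.Rmd")
writeLines(rmd_lines, rmd_path)

# Locate the two boundaries that enclose the header.
boundaries <- which(readLines(rmd_path) == "---")
print(boundaries)
```

Everything between the two reported line positions is the header; everything after the second one is the document body.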

Adding Text

You can start adding text by directly typing on each line:

This tutorial is meant to teach and guide novice R users how to create their own R Markdown file.

You can create headers by using a pound sign “#” and subheadings with more “#” signs:

### Subheading or #### Smaller Subheading

Add bullet points using a hyphen followed by a space:

- bullet point

Add sub-points using four spaces and a plus sign:

    + sub-point

Add an ordered list by typing a number or letter followed by a period:

1. first point

    a. sub-sub-point

Formatting Text

italics

*italics* or _italics_

bold

**bold** or __bold__

link

[link](https://www.stu.edu/)

backticks for “code” format: code here

`code here`

equations, such as \(\frac{1}{n} \sum_{i=1}^{n} x_{i}\), written as they appear in textbooks and on the web by placing LaTeX syntax between dollar signs:

$\frac{1}{n} \sum_{i=1}^{n} x_{i}$

Embedding R Code

R code chunks begin with a triple backtick, an open brace, r, and a close brace. Display options can be added inside the braces. Chunks end with a triple backtick.

# ```{r}
#
# ```
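
The display options mentioned above can also be set once for every chunk. A minimal sketch, assuming the knitr package (which the rmarkdown package relies on) is installed; the particular options chosen here are illustrative, not the settings used for this tutorial:

```r
# Set default display options for all chunks, typically from a first
# "setup" chunk. Guarded so the sketch is harmless if knitr is missing.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(
    echo = TRUE,      # show the R code in the output document
    message = FALSE,  # hide messages such as package start-up notes
    warning = FALSE   # hide warnings
  )
  # Query one of the defaults that was just set.
  print(knitr::opts_chunk$get("message"))
}
```

The same options can be written for a single chunk inside its braces, for example {r, echo=FALSE}.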

In this section, I demonstrate how to include code in your R Markdown file. Specifically, I show how to read and work with one .csv file from the web and another from a local directory.
The first example is a .csv file taken from the GitHub repository of my professor, Dr. Sanchez. The file is named HomesForSale.csv and, as the name suggests, contains the price, size, beds and baths of homes for sale in California, New Jersey, New York and Pennsylvania.

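# Read the "Homes for Sale" dataset from Dr. Sanchez's GitHub repository.
suppressMessages(library(RCurl))
suppressMessages(library(tidyverse))
x <- getURL("https://raw.githubusercontent.com/reisanar/datasets/master/HomesForSale.csv")
# stringsAsFactors = TRUE keeps State as a factor, as in the output below
# (since R 4.0.0 the default is FALSE).
Homes <- read.csv(text = x, stringsAsFactors = TRUE)
# as_tibble() replaces the deprecated as_data_frame().
(Homes <- as_tibble(Homes))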

## # A tibble: 120 × 5
##     State Price  Size  Beds Baths
##    <fctr> <int> <dbl> <int> <dbl>
## 1      NJ   375   2.1     3   2.5
## 2      NJ   200   0.9     1   1.0
## 3      NJ   599   2.3     5   2.5
## 4      NJ   365   2.1     3   3.0
## 5      NJ   220   2.1     5   2.0
## 6      NJ   250   1.9     4   2.5
## 7      NJ   410   2.2     4   2.5
## 8      NJ   429   2.8     5   2.5
## 9      NJ   325   2.0     3   2.5
## 10     NJ   235   1.1     4   1.0
## # ... with 110 more rows
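
To see what the text argument of read.csv does without needing a network connection, here is a small sketch that parses CSV content held in a string. The three rows are copied from the output above; the variable names `csv_text` and `homes_excerpt` are made up for this illustration.

```r
# A three-row excerpt in CSV form; the values are the first rows of the
# output shown above.
csv_text <- "State,Price,Size,Beds,Baths
NJ,375,2.1,3,2.5
NJ,200,0.9,1,1.0
NJ,599,2.3,5,2.5"

# text = parses the string as if it were the contents of a file.
# stringsAsFactors = TRUE stores State as a factor.
homes_excerpt <- read.csv(text = csv_text, stringsAsFactors = TRUE)
str(homes_excerpt)
```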

The summary function can be used to assess the statistical properties of each attribute.

summary(Homes)

##  State       Price             Size            Beds           Baths      
##  CA:30   Min.   :  47.0   Min.   :0.600   Min.   :1.000   Min.   :1.000  
##  NJ:30   1st Qu.: 186.2   1st Qu.:1.300   1st Qu.:3.000   1st Qu.:2.000  
##  NY:30   Median : 270.0   Median :1.700   Median :3.000   Median :2.000  
##  PA:30   Mean   : 479.7   Mean   :2.034   Mean   :3.275   Mean   :2.324  
##          3rd Qu.: 483.8   3rd Qu.:2.500   3rd Qu.:4.000   3rd Qu.:2.500  
##          Max.   :5900.0   Max.   :6.900   Max.   :7.000   Max.   :8.000

Scatter plots are used to plot two variables against each other (or 3 in the case of 3D plots).

plot(Homes$Size, Homes$Price)

For data sets with only a few attributes, all the pairwise scatter plots may be constructed.

pairs(as.matrix(Homes[,-1]))

Histograms show how frequently the values of a given attribute fall into each of a set of intervals (bins).

b <- seq(min(Homes$Size), max(Homes$Size), length.out=10)
b

##  [1] 0.6 1.3 2.0 2.7 3.4 4.1 4.8 5.5 6.2 6.9

hist(Homes$Size, breaks=b, xlab="Size", main="Histogram of Homes Size")
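
The relationship between break points and bins can be checked without drawing anything. This is a minimal sketch on a made-up vector `sizes` (not the Homes data): ten break points produce nine equal-width bins, and every observation is counted exactly once.

```r
# Hypothetical sizes, used only for illustration.
sizes <- c(0.6, 1.1, 1.3, 1.7, 1.9, 2.0, 2.1, 2.5, 3.4, 6.9)

# Ten break points give nine equal-width bins.
b <- seq(min(sizes), max(sizes), length.out = 10)
bin_widths <- diff(b)

# plot = FALSE returns the bin counts without drawing the histogram.
h <- hist(sizes, breaks = b, plot = FALSE)
print(h$counts)
```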

A density plot can be viewed as a smoothed version of a histogram. We can estimate the density using R’s density function.

dens.size <- density(Homes$Size)
plot(dens.size, ylab = "f(Size)", xlab = "Size", main= "Density of Homes Size")
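
One way to convince yourself that the result of density is a density estimate is to check that the area under the curve is close to 1. The sketch below does this with the trapezoidal rule on a simulated sample; the sample, the seed and the variable names are assumptions for illustration only.

```r
# Hypothetical sample, used only for illustration.
set.seed(1)
sizes <- rnorm(200, mean = 2, sd = 0.5)

# density() returns the evaluation points (x) and estimated heights (y).
dens <- density(sizes)

# Approximate the area under the curve with the trapezoidal rule.
area <- sum(diff(dens$x) * (head(dens$y, -1) + tail(dens$y, -1)) / 2)
print(round(area, 2))
```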

Box plots are used to compactly show many pieces of information about a variable’s distribution including some summary statistics.

boxplot(Homes$Size ~ Homes$State)
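
The summary statistics that a box plot draws can also be obtained as numbers with boxplot.stats. A small sketch on a made-up vector (not the Homes data):

```r
# Hypothetical sizes, used only for illustration.
sizes <- c(0.6, 1.1, 1.3, 1.7, 1.9, 2.0, 2.1, 2.5, 3.4, 6.9)

box <- boxplot.stats(sizes)

# Five values: lower whisker, lower hinge, median, upper hinge, upper whisker.
print(box$stats)

# Points beyond the whiskers, which a box plot draws individually.
print(box$out)
```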

To work with a .csv file from a directory, first check the working directory of your current R session. To do so, execute the following command:

getwd()

Next, make sure that your .csv file is actually in this working directory. Otherwise, you can expect to receive an error message similar to this:

In file(file, "rt") :
  cannot open file 'FisherIris.csv': No such file or directory

To check whether the file is in your current working directory, use R’s dir function, which lists all the files located there:

dir()
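
The same check can be done programmatically before reading. The helper `read_csv_if_present` below is hypothetical (it is not part of R or of this tutorial), and the demonstration uses a tiny made-up file in a temporary directory:

```r
# Hypothetical helper: read a CSV only if it exists in `dir_path`.
read_csv_if_present <- function(file_name, dir_path = getwd()) {
  full_path <- file.path(dir_path, file_name)
  if (!file.exists(full_path)) {
    message("Not found: ", full_path)
    return(NULL)
  }
  read.csv(full_path)
}

# Demonstration in a temporary directory, with a tiny made-up file.
demo_dir <- tempfile("csv-demo-")
dir.create(demo_dir)
write.csv(data.frame(a = 1:2), file.path(demo_dir, "present.csv"),
          row.names = FALSE)

found <- read_csv_if_present("present.csv", demo_dir)
not_found <- read_csv_if_present("absent.csv", demo_dir)
```

Returning NULL with a message, instead of stopping with an error, is only one possible design; it suits interactive exploration but may hide problems in a report.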

If it is not, you can change the working directory to the directory of your choice with R’s setwd function. While typing the path, press the Tab key after each forward slash / to get a list of options; scroll to the one you want and press Enter. This auto-completion makes it quick to specify the directory.

# Example of using the "setwd" command.
setwd("C:/Users/javyr/Documents/")
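
Because setwd returns the previous working directory invisibly, it is easy to switch back afterwards. A minimal sketch, using R's temporary directory as a stand-in for a real folder:

```r
# Remember where we started.
start_dir <- getwd()

# setwd() invisibly returns the previous working directory.
old_dir <- setwd(tempdir())
print(getwd())

# ... read or write files relative to the new directory here ...

# Switch back to the original working directory.
setwd(old_dir)
```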

Once this is taken care of, you can read and work with the .csv file in the directory where you placed it on your computer. In this case, I will use the “Fisher’s Iris” dataset located on my desktop, which consists of 3 classes of flowers (setosa, virginica and versicolor) and 4 attributes (sepal width, sepal length, petal width and petal length).
Afterwards, the same kinds of summary statistics and visualizations used for the previous dataset can be applied to this one.

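# stringsAsFactors = TRUE keeps Type as a factor, as in the output below
# (since R 4.0.0 the default is FALSE).
iris <- read.csv("FisherIris.csv", stringsAsFactors = TRUE)
# as_tibble() replaces the deprecated as_data_frame().
(iris <- as_tibble(iris))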

## # A tibble: 150 × 5
##          Type PetalWidth PetalLength SepalWidth SepalLength
##        <fctr>      <int>       <int>      <int>       <int>
## 1      Setosa          2          14         33          50
## 2   Virginica         24          56         31          67
## 3   Virginica         23          51         31          69
## 4      Setosa          2          10         36          46
## 5   Virginica         20          52         30          65
## 6   Virginica         19          51         27          58
## 7  Versicolor         13          45         28          57
## 8  Versicolor         16          47         33          63
## 9   Virginica         17          45         25          49
## 10 Versicolor         14          47         32          70
## # ... with 140 more rows

names(iris) <- c("iris.type", "petal.width", "petal.length", "sepal.width", "sepal.length")

Summary statistics of the iris dataset.

summary(iris)

##       iris.type   petal.width     petal.length    sepal.width   
##  Setosa    :50   Min.   : 1.00   Min.   :10.00   Min.   :20.00  
##  Versicolor:50   1st Qu.: 3.00   1st Qu.:16.00   1st Qu.:28.00  
##  Virginica :50   Median :13.00   Median :44.00   Median :30.00  
##                  Mean   :11.93   Mean   :37.79   Mean   :30.55  
##                  3rd Qu.:18.00   3rd Qu.:51.00   3rd Qu.:33.00  
##                  Max.   :25.00   Max.   :69.00   Max.   :44.00  
##   sepal.length  
##  Min.   :43.00  
##  1st Qu.:51.00  
##  Median :58.00  
##  Mean   :58.45  
##  3rd Qu.:64.00  
##  Max.   :79.00

All pairwise scatter plots for the iris dataset, with the observations of each type drawn in a different color.

pairs(as.matrix(iris[,-1]), pch=21, bg=c("red", "blue", "green")[unclass(iris$iris.type)])
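
The color trick in the call above relies on how R stores factors: each level has an integer code, and that code is used to index the vector of colors. A small sketch with a made-up four-element factor:

```r
# Hypothetical factor, used only for illustration.
iris_type <- factor(c("Setosa", "Virginica", "Versicolor", "Setosa"))

# Levels are sorted alphabetically, so the integer codes follow that order.
print(levels(iris_type))
codes <- as.integer(unclass(iris_type))

# Index the color vector by the codes: one color per observation.
point_colors <- c("red", "blue", "green")[unclass(iris_type)]
print(point_colors)
```

Since the levels sort as Setosa, Versicolor, Virginica, the three colors are assigned to the types in that order.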

Histogram of petal.width with equal-width bins, filled in “blue”.

b <- seq(min(iris$petal.width), max(iris$petal.width), length.out=11)
b

##  [1]  1.0  3.4  5.8  8.2 10.6 13.0 15.4 17.8 20.2 22.6 25.0

hist(iris$petal.width, breaks=b, col="blue", xlab="Petal Width", main="Histogram of Petal Width")

Density functions of petal.length for each of the three classes of irises.

density.setosa <- density(iris$petal.length[iris$iris.type == "Setosa"])
density.versicolor <- density(iris$petal.length[iris$iris.type == "Versicolor"])
density.virginica <- density(iris$petal.length[iris$iris.type == "Virginica"])

plot(density.setosa, ylab="f(length)", xlab="length", main="Density plot of Petal Lengths", xlim = c(0,80), lwd=4, col="red")
lines(density.versicolor, col="blue", lwd=4)
lines(density.virginica, col="green", lwd=4)
legend(40, 0.25, c("Setosa", "Versicolor", "Virginica"), lwd=rep(4,3), col=c("red", "blue", "green"))
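
The three density calls above can also be written as one step by splitting the measurements by class first. This sketch is an alternative formulation, not the approach used in this tutorial, and it assumes R's built-in iris data frame (columns Petal.Length and Species, measured in cm) rather than the FisherIris.csv file:

```r
# Assumption: uses R's built-in iris data frame, not FisherIris.csv.
petal_by_species <- split(datasets::iris$Petal.Length, datasets::iris$Species)

# One density estimate per species.
densities <- lapply(petal_by_species, density)

# Location of the peak of each estimated density, for a quick comparison.
peaks <- sapply(densities, function(d) d$x[which.max(d$y)])
print(round(peaks, 1))
```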

A box plot of each attribute of the iris data set.

boxplot(iris[,-1])

Finally, to render an R Markdown document into its final output format, click on the “Knit” button or press Ctrl+Shift+K.
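
Rendering can also be started from the R console with the render function of the rmarkdown package. The sketch below is hedged: the file name `render-demo.Rmd` and its contents are made up, and the call is skipped when rmarkdown or pandoc is not available on your machine.

```r
# Three backticks, built programmatically to keep this example readable.
fence <- strrep("`", 3)

# Write a small, made-up R Markdown file to a temporary directory.
rmd_path <- file.path(tempdir(), "render-demo.Rmd")
writeLines(c(
  "---",
  'title: "Render demo"',
  "output: html_document",
  "---",
  "",
  "Some text.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
), rmd_path)

# Render only if the rmarkdown package and pandoc can be found.
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()
if (can_render) {
  out_file <- rmarkdown::render(rmd_path, quiet = TRUE)
  print(out_file)
} else {
  message("rmarkdown or pandoc not available; skipping render.")
}
```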


Javier Rojas

June 30, 2017
