R Markdown is a format for writing reproducible, dynamic reports with R. Use it to embed R code and results into slideshows, PDFs, HTML documents, Word files and more.
Some advantages of using R Markdown include converting your work into accessible formats in which the embedded R code and plots are reproducible.
install.packages("rmarkdown")
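The installation only needs to happen once per machine. The sketch below is my own addition, not part of the original steps: a hypothetical helper, ensure_package(), that installs a package only when it is missing. It relies on requireNamespace(), which is part of base R.

```r
# Hypothetical helper (not part of rmarkdown): install a package only when
# it cannot already be loaded.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # TRUE when the package is available after the check
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this call never triggers an install.
ensure_package("stats")
```

Calling ensure_package("rmarkdown") would then install rmarkdown only on the first run.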
To create an R Markdown file, follow these steps in order:
In the upper left corner of your current RStudio session, click on File
Go to New File
Lastly, click on where it says R Markdown
A pop-up window titled “New R Markdown” should appear. The left side lists the different types of R Markdown files, the top holds the title and author fields, which you should fill in or modify, and the bottom shows the default output format, which determines whether the R Markdown file is converted to HTML, PDF or Word.
For the purposes of this tutorial, I created an R Markdown Document in HTML format, but other options exist. I also provided a title and author (which is simply my first and last name in this case).
Any R Markdown file that you create should initially look similar to this:
The top of the newly created file contains basic information such as the title and author associated with the file. Any modifications to this portion of the file should be made within the hyphen boundaries (---). Do NOT touch the hyphen boundaries themselves, and always write your own content BELOW the last one.
The example text and code that appears below the hyphen boundaries each time you create a new R Markdown can be deleted and replaced by your own text and code.
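To make the header layout concrete, here is a minimal sketch that writes a small R Markdown file from R and picks out the lines between the two hyphen boundaries. The title, author and file name are placeholders I made up, and the file goes to a temporary folder so nothing in your project is touched.

```r
# A minimal R Markdown file as a character vector, one element per line.
# The title and author values are placeholders.
rmd_lines <- c(
  "---",
  'title: "My First R Markdown"',
  'author: "Your Name"',
  "output: html_document",
  "---",
  "",
  "Your own text and code go below the closing hyphens."
)

# Write the file to a temporary folder.
rmd_path <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_path)

# The header is everything between the two "---" lines.
boundaries <- which(rmd_lines == "---")
header <- rmd_lines[(boundaries[1] + 1):(boundaries[2] - 1)]
print(header)
```

If rmarkdown and pandoc are installed, rmarkdown::render(rmd_path) should convert this file to HTML, which is what the Knit button in RStudio does for you.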
This tutorial is meant to teach and guide novice R users how to create their own R Markdown file.
### Subheading or #### Smaller Subheading
- bullet point
+ sub-point
1. first point
a. sub-sub-point
*italics* or _italics_
**bold** or __bold__
[link](https://www.stu.edu/)
code here
`code here`
$\frac{1}{n} \sum_{i=1}^{n} x_{i}$
# ```{r}
#
# ```
In this section, I will be demonstrating how to implement code in your R Markdown. Specifically, I will be showing how to read and work with a .csv file from the web and another from a directory.
The first example is a .csv file I extracted from my professor Dr. Sanchez’s GitHub repository. The file is named HomesForSale.csv, which, as the name suggests, contains information on homes for sale in California, New Jersey, New York and Pennsylvania, namely their price, size, number of beds and number of baths.
# Reading the "Homes for Sale" dataset from Dr. Sanchez's github repository.
suppressMessages(library(RCurl))
suppressMessages(library(tidyverse))
x <- getURL("https://raw.githubusercontent.com/reisanar/datasets/master/HomesForSale.csv")
Homes <- read.csv(text = x)
# as_tibble() replaces the deprecated as_data_frame()
(Homes <- as_tibble(Homes))
## # A tibble: 120 × 5
## State Price Size Beds Baths
## <fctr> <int> <dbl> <int> <dbl>
## 1 NJ 375 2.1 3 2.5
## 2 NJ 200 0.9 1 1.0
## 3 NJ 599 2.3 5 2.5
## 4 NJ 365 2.1 3 3.0
## 5 NJ 220 2.1 5 2.0
## 6 NJ 250 1.9 4 2.5
## 7 NJ 410 2.2 4 2.5
## 8 NJ 429 2.8 5 2.5
## 9 NJ 325 2.0 3 2.5
## 10 NJ 235 1.1 4 1.0
## # ... with 110 more rows
summary(Homes)
## State Price Size Beds Baths
## CA:30 Min. : 47.0 Min. :0.600 Min. :1.000 Min. :1.000
## NJ:30 1st Qu.: 186.2 1st Qu.:1.300 1st Qu.:3.000 1st Qu.:2.000
## NY:30 Median : 270.0 Median :1.700 Median :3.000 Median :2.000
## PA:30 Mean : 479.7 Mean :2.034 Mean :3.275 Mean :2.324
## 3rd Qu.: 483.8 3rd Qu.:2.500 3rd Qu.:4.000 3rd Qu.:2.500
## Max. :5900.0 Max. :6.900 Max. :7.000 Max. :8.000
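The text = argument of read.csv(), used above to parse the downloaded string, can be tried without any network access. The sketch below uses a tiny made-up CSV string (the values are invented for illustration and are not taken from HomesForSale.csv) to show that read.csv() treats the string just like the contents of a file.

```r
# A tiny made-up CSV string standing in for the text returned by getURL().
csv_text <- "State,Price,Size\nAA,100,1.5\nBB,250,2.0\n"

# read.csv() parses the string exactly as it would parse a file on disk.
toy <- read.csv(text = csv_text)

# Inspect the column names and types that were inferred.
str(toy)
```

On recent versions of R, read.csv() can usually be given an https:// URL directly, which would make RCurl optional; I have not checked this against the setup used in this tutorial.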
plot(Homes$Size, Homes$Price)
pairs(as.matrix(Homes[,-1]))
b <- seq(min(Homes$Size), max(Homes$Size), length=10)
b
## [1] 0.6 1.3 2.0 2.7 3.4 4.1 4.8 5.5 6.2 6.9
hist(Homes$Size, breaks=b, xlab="Size", main="Histogram of Homes Size")
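To see why these break points give equal-width bins, the sketch below rebuilds them from the minimum and maximum Size values reported in the summary (0.6 and 6.9) and checks the spacing. It spells out length.out, the full name of the argument that length= abbreviates in the call above.

```r
# Ten equally spaced break points between the minimum and maximum Size.
b <- seq(0.6, 6.9, length.out = 10)

# Ten break points delimit nine bins; diff() gives the width of each one.
bin_widths <- diff(b)
print(length(bin_widths))
print(round(bin_widths, 10))
```

Every width comes out the same, so hist() draws bars of equal width when given these breaks.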
A density estimate of Size can be obtained with the density function.
dens.size = density(Homes$Size)
plot(dens.size, ylab = "f(Size)", xlab = "Size", main= "Density of Homes Size")
boxplot(Homes$Size ~ Homes$State)
getwd()
In file(file, "rt") :
cannot open file 'FisherIris.csv': No such file or directory
To see which files are available, use the dir function, which lists all the files located in your current working directory:
dir()
To change the working directory, use the setwd command. When specifying the desired directory, you can do so quickly by pressing the Tab key after each forward slash (/). This brings up a list of options; scroll to the one you want and press Enter. This serves as an auto-complete method for choosing the directory you want.
# Example of using the "setwd" command.
setwd("C:/Users/javyr/Documents/")
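The pieces above (getwd, dir and setwd) can be combined into a quick check that a file is really where R is looking before you try to read it. This is a minimal sketch of my own: it works in a temporary folder with a made-up placeholder file, so it runs on any machine, and it restores the original working directory at the end.

```r
# setwd() returns the previous working directory, so it can be restored later.
old_dir <- setwd(tempdir())

# Create a small placeholder file so there is something to find.
writeLines("a,b\n1,2", "placeholder.csv")

# file.exists() answers the question behind the "cannot open file" warning.
found <- file.exists("placeholder.csv")
iris_here <- file.exists("FisherIris.csv")  # TRUE only if you copied it here
print(c(placeholder = found, FisherIris = iris_here))

# List only the .csv files in the current working directory.
print(dir(pattern = "\\.csv$"))

# Clean up and return to where we started.
file.remove("placeholder.csv")
setwd(old_dir)
```

When file.exists() returns FALSE for your file, either move the file or point setwd at the folder that contains it.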
Once this is taken care of, you may proceed with reading and working with the .csv file in the directory where you placed it on your computer. In this case, I will use the “Fisher’s Iris” dataset located on my desktop, which consists of 3 classes of flower types (setosa, virginica and versicolor) along with 4 attributes (sepal width, sepal length, petal width and petal length).
Afterwards, summary statistics and visualizations similar to those shown for the previous dataset can be produced for this one.
iris <- read.csv("FisherIris.csv")
# as_tibble() replaces the deprecated as_data_frame()
(iris <- as_tibble(iris))
## # A tibble: 150 × 5
## Type PetalWidth PetalLength SepalWidth SepalLength
## <fctr> <int> <int> <int> <int>
## 1 Setosa 2 14 33 50
## 2 Virginica 24 56 31 67
## 3 Virginica 23 51 31 69
## 4 Setosa 2 10 36 46
## 5 Virginica 20 52 30 65
## 6 Virginica 19 51 27 58
## 7 Versicolor 13 45 28 57
## 8 Versicolor 16 47 33 63
## 9 Virginica 17 45 25 49
## 10 Versicolor 14 47 32 70
## # ... with 140 more rows
names(iris) = c("iris.type", "petal.width", "petal.length", "sepal.width", "sepal.length")
Below is a summary of the iris dataset.
summary(iris)
## iris.type petal.width petal.length sepal.width
## Setosa :50 Min. : 1.00 Min. :10.00 Min. :20.00
## Versicolor:50 1st Qu.: 3.00 1st Qu.:16.00 1st Qu.:28.00
## Virginica :50 Median :13.00 Median :44.00 Median :30.00
## Mean :11.93 Mean :37.79 Mean :30.55
## 3rd Qu.:18.00 3rd Qu.:51.00 3rd Qu.:33.00
## Max. :25.00 Max. :69.00 Max. :44.00
## sepal.length
## Min. :43.00
## 1st Qu.:51.00
## Median :58.00
## Mean :58.45
## 3rd Qu.:64.00
## Max. :79.00
Below is a scatterplot matrix of the iris dataset, which also distinguishes observations belonging to each type with a different color.
pairs(as.matrix(iris[,-1]), pch=21, bg=c("red", "blue", "green")[unclass(iris$iris.type)])
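The coloring trick in the pairs() call works because a factor stores its values as integer codes, and those codes can be used to index a vector of colors. Here is a small sketch with a four-element factor I made up for illustration. Note that since R 4.0.0, read.csv() returns text columns as character rather than factor by default, so on a current installation you would likely need factor(iris$iris.type) for the indexing to work.

```r
# A made-up factor with the same three labels as the iris types.
types <- factor(c("Setosa", "Virginica", "Versicolor", "Setosa"))

# Levels are sorted alphabetically: Setosa = 1, Versicolor = 2, Virginica = 3.
print(levels(types))

# unclass() exposes the integer codes behind the labels.
codes <- unclass(types)
print(as.integer(codes))

# Indexing the color vector by those codes assigns one color per observation.
point_colors <- c("red", "blue", "green")[codes]
print(point_colors)
```

So Setosa points are drawn in red, Versicolor in blue and Virginica in green, matching the legend used later in the density plot.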
Below is a histogram of iris's petal.width, drawn not only with equal-width bins but also with a distinct color, which is “blue”.
b <- seq(min(iris$petal.width), max(iris$petal.width), length=11)
b
## [1] 1.0 3.4 5.8 8.2 10.6 13.0 15.4 17.8 20.2 22.6 25.0
hist(iris$petal.width, breaks=b, col="blue", xlab="Petal Width", main="Histogram of Petal Width")
Below are density estimates of petal.length for each of the three classes of irises.
density.setosa = density(iris$petal.length[iris$iris.type == "Setosa"])
density.versicolor = density(iris$petal.length[iris$iris.type == "Versicolor"])
density.virginica = density(iris$petal.length[iris$iris.type == "Virginica"])
plot(density.setosa, ylab="f(length)", xlab="length", main="Density plot of Petal Lengths", xlim = c(0,80), lwd=4, col="red")
lines(density.versicolor, col="blue", lwd=4)
lines(density.virginica, col="green", lwd=4)
legend(40, 0.25, c("Setosa", "Versicolor", "Virginica"), lwd=rep(4,3), col=c("red", "blue", "green"))
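The three density calls above repeat the same pattern once per class. As an alternative sketch, split() and lapply() can compute all of them in one step. Because FisherIris.csv may not be on your machine, this example uses the copy of the iris data that ships with R (datasets::iris), which has different column names (Petal.Length, Species) and stores measurements in centimetres, so the numbers will not match the file used in this tutorial.

```r
# datasets::iris refers to the built-in data even if a variable named
# `iris` already exists in your session.
builtin_iris <- datasets::iris

# Split petal lengths by species, then estimate one density per group.
lengths_by_species <- split(builtin_iris$Petal.Length, builtin_iris$Species)
densities <- lapply(lengths_by_species, density)

# One color per species, named so they can be looked up by species.
cols <- c(setosa = "red", versicolor = "blue", virginica = "green")

# Axis limits wide enough to show all three curves.
x_range <- range(builtin_iris$Petal.Length)
y_max <- max(sapply(densities, function(d) max(d$y)))

# Draw the first curve, then add the remaining ones on top.
plot(densities[[1]], xlim = x_range, ylim = c(0, y_max), col = cols[1],
     lwd = 4, xlab = "Petal.Length (cm)",
     main = "Density of Petal.Length by species")
for (sp in names(densities)[-1]) {
  lines(densities[[sp]], col = cols[sp], lwd = 4)
}
legend("topright", legend = names(densities),
       col = cols[names(densities)], lwd = 4)
```

Computing y_max up front keeps the tallest curve from being clipped, which is the role the hand-picked legend position and xlim play in the version above.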
boxplot(iris[,-1])