ggplot2 is a powerful R package that produces data visualizations easily and intuitively.
Because this package is not part of base R, you need to install it once by running the following code in the console, and then load it with library() in each new session:
install.packages("ggplot2")
library("ggplot2")
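If you run the tutorial as a script, one common pattern is to install ggplot2 only when it is missing, so re-running the script does not reinstall it each time. This is a minimal sketch, not the only way to manage packages:

```r
# Install ggplot2 only if it is not already available, then load it
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
library(ggplot2)

# Print the installed version to confirm the package loaded
packageVersion("ggplot2")
```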
Once this is done, you can start using the package. For demonstration purposes, I will use the diamonds dataset, which ships with ggplot2, to show the package's functionality. I will also give examples with two datasets that come pre-installed with R: iris and mtcars.
The datasets can be loaded with the data() function:
data(diamonds)
data(iris)
data(mtcars)
To browse a dataset in a spreadsheet-style viewer (for example in RStudio), use the View() function:
View(diamonds)
View(iris)
View(mtcars)
The help() function gives a description of each dataset and details about each of its columns:
help(diamonds)
help(iris)
help(mtcars)
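View() needs an interactive session such as RStudio. In a plain console or a script, dim(), str() and head() give a quick text summary of the same data. Here is a short sketch:

```r
library(ggplot2)  # provides the diamonds dataset

# Dimensions (rows, columns) of each dataset
dim(diamonds)  # 53940 10
dim(iris)      # 150 5
dim(mtcars)    # 32 11

# Column names and types, plus the first rows
str(diamonds)
head(mtcars)
```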
To start getting familiar with ggplot2, let's explore the relationship between attributes of the diamonds dataset with a scatter plot.
Before we do so, we need to set up the graph's “aesthetics”: the dimensions of a graph that we can perceive visually, such as its x- and y-axes, color, size, and shape.
Let's create a ggplot object with the attribute carat on the x-axis and price on the y-axis, then add a layer of points with geom_point():
d <- ggplot(data = diamonds, aes(x=carat, y=price))
d + geom_point()
The same pattern works for the iris and mtcars datasets:
i <- ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width))
i + geom_point()
m <- ggplot(data = mtcars, aes(x = wt, y = mpg))
m + geom_point()
A ggplot2 graph is built from three components:
1. The data being graphed (the diamonds dataset).
2. The mapping of aesthetics to attributes with the aes() function, here the assignments “x=carat” (carat on the x-axis) and “y=price” (price on the y-axis).
3. The layer, which defines what type of graph it is. Since we want a scatter plot, the corresponding layer is geom_point().
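A ggplot object is an ordinary R value, so you can check these components directly. The sketch below shows that adding a layer with + returns a new plot and leaves the original object unchanged. The name d_points is only illustrative:

```r
library(ggplot2)

# Data + aesthetic mapping, but no layer yet
d <- ggplot(data = diamonds, aes(x = carat, y = price))
length(d$layers)         # 0: nothing would be drawn yet

# Adding a layer returns a new object; d itself is unchanged
d_points <- d + geom_point()
length(d_points$layers)  # 1
class(d_points$layers[[1]]$geom)[1]  # "GeomPoint"
length(d$layers)         # still 0
```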
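Further aesthetics can be mapped inside the layer. For example, color the points by a third attribute:
d + geom_point(aes(colour=clarity))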
i + geom_point(aes(colour=Species))
m + geom_point(aes(colour=factor(cyl)))
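Size and shape can be mapped the same way. Note that ggplot2 warns against mapping size to a discrete variable such as cut:
d + geom_point(aes(colour=clarity, size=cut))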
## Warning: Using size for a discrete variable is not advised.
m + geom_point(aes(colour=factor(cyl), size = qsec))
d + geom_point(aes(colour=clarity, shape=cut))
i + geom_point(aes(colour=Species, shape=Species))
m + geom_point(aes(colour = factor(cyl), shape = factor(cyl)))
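A related distinction is mapping versus setting. An aesthetic placed inside aes() is mapped to a variable and gets a legend. The same argument placed outside aes() sets one fixed value for every point. The sketch below shows this with iris. The object names mapped and fixed are illustrative, and layer_data() returns the values ggplot2 actually computed for a layer:

```r
library(ggplot2)

# Mapped: colour depends on Species, one colour per level plus a legend
mapped <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(colour = Species))

# Set: every point gets the same fixed colour and size, no legend
fixed <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(colour = "steelblue", size = 2)

# Inspect the computed colours in the built layer data
length(unique(layer_data(mapped)$colour))  # 3
unique(layer_data(fixed)$colour)           # "steelblue"
```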
You can also add a geom_smooth layer, which draws a smoothing curve that shows the general trend of the data:
d + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'gam'
i + geom_point(aes(colour = Species)) + geom_smooth(method = "lm")
To remove the gray standard-error band around the curve, pass an argument to the geom_smooth layer, specifically “se=FALSE”, where “se” stands for “standard error”:
d + geom_point() + geom_smooth(se=FALSE)
## `geom_smooth()` using method = 'gam'
m + geom_point(aes(colour = factor(cyl))) + geom_smooth(method="lm", se=FALSE)
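When no method is given, geom_smooth() uses "loess" for fewer than 1,000 observations and "gam" otherwise. This is why the diamonds plots above report method = 'gam'. As a sketch of what method = "lm" computes, the fitted line on mtcars can be compared with a direct call to lm(). Passing formula = y ~ x explicitly also silences the message about which formula was used:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Computed data of the smooth layer (layer 2): x positions and fitted y
smooth <- layer_data(p, 2)

# The same straight-line fit computed directly with lm()
fit <- lm(mpg ~ wt, data = mtcars)
direct <- predict(fit, newdata = data.frame(wt = smooth$x))

all.equal(smooth$y, unname(direct))  # TRUE: same fitted line
"ymin" %in% names(smooth)             # FALSE: se = FALSE skips the band
```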
To split the data into separate panels, use the facet_wrap() function. Put a tilde (~) followed by the attribute you would like to divide the plots by:
d + geom_point(aes(colour=cut)) + facet_wrap(~ clarity)
i + geom_point() + facet_grid(. ~ Species) + geom_smooth(method = "lm")
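To divide the plots by two attributes, one along the rows and one along the columns, use facet_grid(). In this case that would be facet_grid(color ~ clarity). The attribute before the tilde (~) defines the rows and the one after it defines the columns:
d + geom_point(aes(colour=cut)) + facet_grid(color ~ clarity)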
ggplot(data=mtcars, aes(x=mpg, y=disp)) +
geom_point(aes(color = carb)) +
facet_grid(cyl ~ gear)
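The two faceting functions differ in which panels they draw. facet_grid(cyl ~ gear) draws every row/column combination, including the empty one for 8 cylinders and 4 gears. facet_wrap(~ cyl + gear) draws only the combinations that occur in mtcars. This sketch counts the panels through the PANEL column of layer_data(). The object names are illustrative:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = mpg, y = disp)) + geom_point()

p_grid <- p + facet_grid(cyl ~ gear)    # rows = cyl, columns = gear
p_wrap <- p + facet_wrap(~ cyl + gear)  # only combinations that occur

# Number of panels in each layout
nlevels(layer_data(p_grid)$PANEL)  # 9 (3 cylinder values x 3 gear values)
nlevels(layer_data(p_wrap)$PANEL)  # 8 (no car has 8 cylinders and 4 gears)

# Cross-check the empty combination directly from the data
table(mtcars$cyl, mtcars$gear)
```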
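Titles and axis labels are added as extra components with ggtitle() and xlab(). ylab() works the same way for the y-axis:
d + geom_point() + ggtitle("My scatter plot")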
ggplot(diamonds, aes(x=carat, y=price)) +
geom_point() +
ggtitle("My scatter plot") +
xlab("Weight (carats)")
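ggtitle() and xlab() are shortcuts. labs() sets several labels in one call, including the legend title of a mapped aesthetic. This is a sketch, and the label text is only an example:

```r
library(ggplot2)

p <- ggplot(diamonds, aes(x = carat, y = price, colour = cut)) +
  geom_point() +
  labs(
    title  = "My scatter plot",
    x      = "Weight (carats)",
    y      = "Price (US dollars)",
    colour = "Cut quality"  # legend title for the colour mapping
  )
```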
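The visible range of an axis can be restricted with xlim() and ylim(). Rows outside the range are removed, which is what the warnings report:
ggplot(diamonds, aes(x=carat, y=price)) +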
geom_point() +
ggtitle("My scatter plot") +
xlab("Weight (carats)") +
xlim(0, 2)
## Warning: Removed 1889 rows containing missing values (geom_point).
ggplot(diamonds, aes(x=carat, y=price)) +
geom_point() +
ggtitle("My scatter plot") +
xlab("Weight (carats)") +
ylim(0, 10000)
## Warning: Removed 5222 rows containing missing values (geom_point).
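The warnings above appear because xlim() and ylim() change the scale limits. Values outside the range become missing and are dropped, so any statistic, such as a smoothing line, is computed without them. To zoom in without discarding data, a common alternative is coord_cartesian(). This sketch compares the two on the carat axis. The names p_lim and p_zoom are illustrative:

```r
library(ggplot2)

p <- ggplot(diamonds, aes(x = carat, y = price)) + geom_point()

# Scale limits: carat values above 2 become NA and are removed when drawn
p_lim <- p + xlim(0, 2)

# Coordinate limits: the view is cropped, but every row is kept
p_zoom <- p + coord_cartesian(xlim = c(0, 2))

sum(is.na(layer_data(p_lim)$x))   # rows censored by xlim()
sum(is.na(layer_data(p_zoom)$x))  # 0
```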
Skewed values such as price can be displayed on a logarithmic axis with scale_y_log10():
ggplot(diamonds, aes(x=carat, y=price)) +
geom_point() +
ggtitle("My scatter plot") +
xlab("Weight (carats)") +
scale_y_log10()
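scale_y_log10() transforms the data before anything is computed, while the tick labels stay in the original units. The sketch below applies the transformation to both axes and uses layer_data() to confirm that positions are stored on the log10 scale:

```r
library(ggplot2)

p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10()

# Positions are stored on the log10 scale after the transformation
built <- layer_data(p)
all.equal(built$y, log10(diamonds$price))  # TRUE
all.equal(built$x, log10(diamonds$carat))  # TRUE
```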
Lines are drawn with geom_line(). Mapping colour to the cylinder count draws one line per group:
m + geom_line(aes(colour = as.factor(cyl)))
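geom_line() connects observations in order of the x variable within each group. geom_path() instead connects them in the order they appear in the data. This sketch checks the ordering that geom_line() uses. The name line_data is illustrative:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_line()

# Built data of the line layer: one group per cylinder count
line_data <- layer_data(p)

# Within every group, x is sorted, which is the drawing order of the line
tapply(line_data$x, line_data$group, function(v) !is.unsorted(v))
```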
To make a histogram, map only an x attribute and change geom_point() to geom_histogram():
ggplot(diamonds, aes(x=price)) +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
By default the data are split into 30 bins, as the message above says. The binwidth argument of the geom_histogram layer makes the bins wider or thinner:
ggplot(diamonds, aes(x=price)) +
geom_histogram(binwidth=2000)
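Mapping fill to a categorical attribute stacks the bars by group. cyl is stored as a number, so convert it with factor():
ggplot(data = mtcars, aes(x = mpg, fill = factor(cyl))) +
geom_histogram(binwidth = 1)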
ggplot(diamonds, aes(x=price, fill=clarity)) +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
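Each bar's height is a count of observations in that bin. Across all bins and fill groups, the counts add up to the number of rows. This sketch uses an explicit binwidth of 500 US dollars, chosen only for illustration, with a stacked fill by cut:

```r
library(ggplot2)

p <- ggplot(diamonds, aes(x = price, fill = cut)) +
  geom_histogram(binwidth = 500)

# One row per bin and fill level in the computed layer data
hist_data <- layer_data(p)

# Counts cover every diamond exactly once
sum(hist_data$count) == nrow(diamonds)  # TRUE
```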
To draw a smoothed density estimate instead, change geom_histogram to geom_density and remove the fill attribute:
ggplot(diamonds, aes(x=price)) +
geom_density()
To compare densities across groups, use the color aesthetic instead of fill. For example, you can add color=cut:
ggplot(diamonds, aes(x=price, color=cut)) +
geom_density()
With both color and fill mapped, the alpha argument makes overlapping areas transparent:
ggplot(iris, aes(x = Sepal.Length, color = Species, fill=Species)) +
geom_density(alpha=0.3)
Box plots use geom_boxplot(), with a categorical attribute on the x-axis:
ggplot(diamonds, aes(x=color, y=price)) +
geom_boxplot()
ggplot(iris, aes(x=Species,y=Sepal.Length)) +
geom_boxplot()
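geom_boxplot() computes its summaries with stat_boxplot(). The middle line is the median, and the box spans the first to the third quartile. Points beyond 1.5 times the interquartile range from the box are drawn as outliers. This sketch checks the drawn medians against a direct calculation. The name box_stats is illustrative:

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) + geom_boxplot()
box_stats <- layer_data(p)

# One row per box: lower quartile, median, upper quartile
box_stats[, c("lower", "middle", "upper")]

# The medians match a direct per-species calculation
all.equal(box_stats$middle,
          as.vector(tapply(iris$Sepal.Length, iris$Species, median)))  # TRUE
```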
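To save a plot, assign it to a variable and pass it to ggsave(). The file format is chosen from the file extension:
p <- ggplot(diamonds, aes(x=carat, y=price)) +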
geom_point()
ggsave(filename="diamonds.png", p)
ggsave(filename="diamonds.pdf", p)
ggsave(filename="diamonds.jpeg", p)
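If width and height are not given, ggsave() uses the size of the current graphics device. For reproducible output, you can set the size explicitly with width, height and units, and use dpi for raster formats. This sketch writes to a temporary file so it does not clutter the working directory. The 6 x 4 inch size is an arbitrary example:

```r
library(ggplot2)

p <- ggplot(diamonds, aes(x = carat, y = price)) + geom_point()

# Write a 6 x 4 inch PDF to a temporary location
out <- tempfile(fileext = ".pdf")
ggsave(filename = out, plot = p, width = 6, height = 4, units = "in")

file.exists(out)  # TRUE
```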
If you leave out the plot argument, ggsave() saves the last plot displayed:
ggplot(diamonds, aes(x=carat, y=price)) +
geom_point()
ggsave("diamonds.png")