Chapter 2 Plotting

2.1 Plotting in Base R

(useful for exploration of data)

2.1.1 Core Plot Function

Plots data x against data y. If only x is supplied, its values are plotted against their indices (index on the x axis, value on the y axis).

plot(x, y, ...)

Optional Arguments

  • col - colour of points (a colour name or hex RGB string, e.g. "red" or "#FF0000"; can be a vector giving one colour per point)
  • pch - plotting symbol, given as an integer code (e.g. 1 = open circle, 3 = plus, 4 = cross)
  • xlab and ylab - axis labels
  • xlim and ylim - axis limits in the form of a 2-vector (e.g. xlim = c(20,100) restricts x from 20 to 100)
  • main - plot title
  • type
    • "p" - points (default)
    • "l" - line connecting observations
    • "b" - both points and lines
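As a rough sketch of how these arguments combine, the call below uses the built-in mtcars data set (chosen here only so the snippet runs without extra packages). The names cyl_cols and point_cols are illustrative, not part of any API.

```r
# Built-in data set, used here only for illustration
data("mtcars")

# One colour per point: cyl is 4, 6 or 8, so look up a colour for each value
cyl_cols   <- c("4" = "steelblue", "6" = "orange", "8" = "firebrick")
point_cols <- cyl_cols[as.character(mtcars$cyl)]

plot(mtcars$hp, mtcars$mpg,
     col  = point_cols,            # vector of colours, one per point
     pch  = 19,                    # solid circle
     xlab = "Horsepower",
     ylab = "Miles per gallon",
     xlim = c(50, 350),            # restrict the x axis to 50..350
     main = "Fuel economy against power",
     type = "p")                   # points (the default)
```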

2.1.2 Other Plot Functions

  • hist() - Histogram
  • boxplot() - Boxplot
  • barplot() - Categorical Bar Charts (use table to get summary)
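For barplot(), the heights are usually built with table() first. A minimal sketch, again assuming the built-in mtcars data as a stand-in:

```r
data("mtcars")

# table() counts how many cars fall into each category (4, 6 or 8 cylinders)
cyl_counts <- table(mtcars$cyl)

# barplot() draws one bar per category, with the counts as heights
barplot(cyl_counts, xlab = "Cylinders", ylab = "Number of cars")
```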

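Note: you can store the result of a plotting function in a variable. For hist(), this is a list of class "histogram" holding the computed breaks, counts and densities.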

data("diamonds", package = "ggplot2")

hgram <- hist(diamonds$price, freq = FALSE)
str(hgram)
## List of 6
##  $ breaks  : num [1:20] 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 ...
##  $ counts  : int [1:19] 14524 9683 6129 4225 4665 3163 2278 1668 1307 1076 ...
##  $ density : num [1:19] 2.69e-04 1.80e-04 1.14e-04 7.83e-05 8.65e-05 ...
##  $ mids    : num [1:19] 500 1500 2500 3500 4500 5500 6500 7500 8500 9500 ...
##  $ xname   : chr "diamonds$price"
##  $ equidist: logi TRUE
##  - attr(*, "class")= chr "histogram"

2.1.3 Adding to Plots

Each call to plot() creates a new plot. To add to an existing plot, use:

  • points() - adds a plot of points to an existing plot
  • lines() - shorthand for points(x, y, type="l")
  • abline() - adds a straight line \(y=mx+c\) directly (abline(a, b) takes the intercept a = c first, then the slope b = m)
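The three functions can be layered onto one plot. The sketch below assumes the built-in mtcars data; the variables auto and ord are illustrative helpers.

```r
data("mtcars")

# plot() starts a new plot
plot(mtcars$wt, mtcars$mpg, pch = 20)

# points() draws on top of it: circle the automatic cars (am == 0)
auto <- mtcars[mtcars$am == 0, ]
points(auto$wt, auto$mpg, col = "red", pch = 1, cex = 1.5)

# lines() joins points in the order given, so sort by x first
ord <- order(mtcars$wt)
lines(mtcars$wt[ord], mtcars$mpg[ord], col = "grey")

# abline() also accepts h = or v = for horizontal or vertical lines
abline(h = mean(mtcars$mpg), lty = 2)
```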

2.1.3.1 Fitting Lines to Plots

(see linear regression)

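  • lm() - fits a linear model; pass the fitted model to abline() to draw the straight line
  • lowess() - fits a smooth curve; pass the result to lines() (the f argument controls smoothness: a larger f gives a smoother curve)
  • density() - computes a kernel density estimate, a smooth continuous version of a histogram; pass the result to lines()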

Example

data("diamonds", package = "ggplot2")
plot(diamonds$carat, diamonds$price, pch = 20)


abline(lm(price ~ carat, diamonds),
       col = "red")
lines(lowess(diamonds$carat, diamonds$price, f = 0.05),
      col = "green")

hist(diamonds$price, freq = FALSE)
lines(density(diamonds$price), col = "red")

2.1.4 Multiple Plots

(often better to just use ggplot2)

To get a grid of all pairwise scatter plots, use pairs()

pairs(mtcars)

pairs(mtcars[,1:4])

You can also set the grid size manually with par(mfrow = c(n,m)), which gives n rows and m columns, and then fill the grid slots one at a time (row by row) with successive plotting calls.

par(mfrow = c(2,1))
plot(diamonds$carat, diamonds$price)
boxplot(diamonds$carat)
par(mfrow = c(1,1)) # <- need this to reset to a single plot!

Alternatively, dev.off() closes the current graphics device, so the next plot opens a fresh device with the default settings.
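Another common pattern is to save the settings that par() returns and restore them afterwards, which avoids hard-coding the defaults. A minimal sketch, with mtcars as stand-in data and old_par as an illustrative name:

```r
data("mtcars")

# par() returns the previous values of the settings it changes
old_par <- par(mfrow = c(1, 2))   # 1 row, 2 columns

plot(mtcars$hp, mtcars$mpg)       # fills the left slot
boxplot(mtcars$mpg)               # fills the right slot

par(old_par)                      # restore the layout that was active before
```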

2.2 Plotting in ggplot2

(useful for presentation of data)

Loading ggplot2,

# Either load every tidyverse package ...
library("tidyverse")
# ... or load just the plotting package
library("ggplot2")

2.2.1 Main Structure

Starting a plot

Every plot starts with the function ggplot(), which takes the optional arguments:

  • data - the data frame containing the variables we later reference
  • mapping - built with aes(), specifies which variables map to the x axis, y axis, colour legend, etc

For example,

ggplot(diamonds, aes(x = carat, y = price))

Axes are labelled and scaled, but nothing is plotted yet (as we have not added a “Geom”).

Geoms

A geom_*() function adds a layer to the plot. Examples of Geoms:

  • geom_point() - most basic, plots x against y as scatter plot
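  • geom_line() - connects observations with a line, in order of x
  • geom_smooth() - smoothed curve (the default method is chosen automatically: “loess” for fewer than 1,000 observations, “gam” otherwise; can also use “lm”)
  • geom_bar() - bar chart of counts (1 variable; bar heights are the number of rows in each category)
  • geom_col() - bar chart of values (2 variables; bar heights are taken from the y variable)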
  • geom_boxplot() - boxplot

More unusual ones,

  • geom_hex()
  • geom_polygon()
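The difference between geom_bar() and geom_col() is easiest to see side by side. The sketch below assumes the mpg data set that ships with ggplot2; class_counts, p_bar and p_col are illustrative names.

```r
library(ggplot2)

# geom_bar() counts the rows in each class for you
p_bar <- ggplot(mpg, aes(x = class)) +
  geom_bar()

# geom_col() plots heights that have already been computed
class_counts <- as.data.frame(table(class = mpg$class))
p_col <- ggplot(class_counts, aes(x = class, y = Freq)) +
  geom_col()

print(p_bar)
print(p_col)
```

Both plots should show the same bars; only the place where the counting happens differs.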

aes()

Anything that maps a variable to a visual property goes inside aes(): the x and y variables, colouring by a property, grouping by a property, changing the point size based on a property, etc.

The aes(...) that goes into the original call, ggplot(aes()), is inherited by all geoms unless overridden. The aes(...) that goes into a particular geom, geom_...(aes()), only applies to that geom.
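To illustrate the inheritance rule, the sketch below builds two plots from the mpg data set in ggplot2 that differ only in where colour is mapped. The names p_inherit and p_local are illustrative.

```r
library(ggplot2)

# colour is mapped in ggplot(), so both geoms inherit it:
# coloured points and one fitted line per drv group
p_inherit <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

# colour is mapped only inside geom_point(), so the smooth ignores it:
# coloured points but a single fitted line
p_local <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(colour = drv)) +
  geom_smooth(method = "lm", se = FALSE)

print(p_inherit)
print(p_local)
```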

Labels

xlab("X-axis Label"), ylab("Y-axis Label") and ggtitle("Title") can also be added to the plot in the same way as Geoms.

Alternatively, use + labs(title="Title", x="X-axis", y="Y-axis") to set them all in one call.
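A short sketch of labs() in use, assuming the built-in mtcars data (the label text and the name p_lab are made up for illustration):

```r
library(ggplot2)

p_lab <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  labs(title = "Fuel economy against power",   # plot title
       x = "Horsepower",                       # x-axis label
       y = "Miles per gallon")                 # y-axis label

print(p_lab)
```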

2.2.2 Updating a Plot (Plots in Variables)

data("mtcars")
p <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point()

p + geom_smooth()

p + geom_smooth(method = "lm")

p + scale_y_log10() + scale_x_log10() +
  geom_smooth(method = "lm")

p + scale_y_log10() + scale_x_log10() +
  geom_smooth(method = "lm") +
  geom_vline(xintercept = 100)

Here, p stores the basic plot. Each statement adds something different to it to produce a new plot, without modifying p itself.

2.2.3 Faceting

Faceting enables splitting your data into multiple plots according to a categorical variable.

  • facet_wrap() - a single variable split
    • formula notation to indicate the splitting variable: ~ var
    • optionally specify the number of rows (nrow) or columns (ncol)
  • facet_grid() - two variable split
    • formula indicating both splitting variables: rows_var ~ cols_var
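As a small sketch of the optional row count, the call below assumes the mpg data set from ggplot2, whose class variable has seven levels; p_facet is an illustrative name.

```r
library(ggplot2)

# One panel per class, laid out over 2 rows
p_facet <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ class, nrow = 2)

print(p_facet)
```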


For example,

ggplot(mtcars, aes(x = hp, y = mpg)) +
  facet_wrap(~ gear) +
  geom_point()

ggplot(mtcars, aes(x = hp, y = mpg)) +
  facet_grid(cyl ~ gear) +
  geom_point()

2.2.4 Examples

ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(aes(colour = cut), size = 0.2) +
  geom_smooth(aes(colour = cut)) +
  xlab("Number of carats") + ylab("Price in $")

ggplot(mpg, aes(x=displ, y=hwy)) +
  geom_point(aes(colour = class))

ggplot(mpg, aes(x=displ, y=hwy)) +
  facet_wrap(~class) +
  geom_point() +
  geom_point(aes(y=cty), colour="red") + # y is overridden for this layer only; colour is a fixed value, so it goes outside aes()
  ylab("Fuel efficiency")

ggplot(mpg, aes(x=displ, y=hwy)) +
  geom_point(aes(colour=drv)) +
  geom_smooth(colour="black") +
  geom_smooth(aes(colour=drv))

ggplot(mpg, aes(x=class)) +
  geom_bar(aes(fill=drv))