Practice generating layered graphics using ggplot2

library(tidyverse)

Run the code below in your console to download this exercise as a set of R scripts.

usethis::use_course("cis-ds/grammar-of-graphics")

Now that you have prepared for today’s class, let’s practice generating layered graphics in R using data from Gapminder World, which compiles country-level data on quality-of-life measures.

Load the gapminder dataset

If you have not already installed the gapminder package and you try to load it using the following code, you will get an error:

library(gapminder)
Error in library(gapminder) : there is no package called ‘gapminder’

If this happens, install the gapminder package by running install.packages("gapminder") in your console.
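If you would rather not trigger the error at all, one option is to install the package only when it is missing. This is a minimal sketch that assumes a CRAN mirror is already configured for your R session; it uses base R's requireNamespace() to check for the package first:

```r
# Install gapminder only if it is not already available
# (assumes a CRAN mirror is configured for this session)
if (!requireNamespace("gapminder", quietly = TRUE)) {
  install.packages("gapminder")
}

library(gapminder)
```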

Once you’ve done this, run the following code to load the gapminder dataset, the ggplot2 library, and a helper library for printing the contents of gapminder:

library(gapminder)
library(ggplot2)
library(tibble)

glimpse(gapminder)
## Rows: 1,704
## Columns: 6
## $ country   <fct> Afghanistan, Afghanistan, Afghanistan, Afghanistan, Afghani…
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia,…
## $ year      <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997,…
## $ lifeExp   <dbl> 28.8, 30.3, 32.0, 34.0, 36.1, 38.4, 39.9, 40.8, 41.7, 41.8,…
## $ pop       <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14880372, 1…
## $ gdpPercap <dbl> 779, 821, 853, 836, 740, 786, 978, 852, 649, 635, 727, 975,…

Run ?gapminder in the console to open the help file, which describes the data and defines each of the columns.

Using the grammar of graphics and your knowledge of the ggplot2 library, generate a series of graphs that explore the relationships between specific variables.

Generate a histogram of life expectancy

ggplot(data = gapminder, mapping = aes(x = lifeExp)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
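The message above is ggplot2's reminder that 30 bins is only a default. As a sketch, the version below sets the bin width explicitly. The value of 2 years and the object name p_hist are illustrative assumptions, not part of the original solution; try several widths and see how the shape of the distribution changes:

```r
library(ggplot2)
library(gapminder)

# binwidth = 2 is an illustrative choice: each bar spans 2 years of life expectancy
p_hist <- ggplot(data = gapminder, mapping = aes(x = lifeExp)) +
  geom_histogram(binwidth = 2)

p_hist
```

Because the bin width is now specified, ggplot2 no longer prints the `stat_bin()` message.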

Generate separate histograms of life expectancy for each continent

Hint: think about how to split your plots to show different subsets of data.

ggplot(data = gapminder, mapping = aes(x = lifeExp)) +
  geom_histogram() +
  facet_wrap(facets = vars(continent))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Compare the distribution of life expectancy across continents by generating a boxplot

ggplot(data = gapminder, mapping = aes(x = continent, y = lifeExp)) +
  geom_boxplot()

Redraw the plot, but this time use a violin plot

ggplot(data = gapminder, mapping = aes(x = continent, y = lifeExp)) +
  geom_violin()

Generate a scatterplot of the relationship between per capita GDP and life expectancy

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

Add a smoothing line to the scatterplot

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

Identify whether this relationship differs by continent

Use the color aesthetic to identify differences

ggplot(
  data = gapminder,
  mapping = aes(x = gdpPercap, y = lifeExp, color = continent)
) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
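Notice that the message changed from 'gam' to 'loess' once continent was mapped to color. With its default method, geom_smooth() fits one smoother per group and picks loess when the largest group has fewer than 1,000 observations, and a GAM otherwise. A quick check of the group sizes, using only base R (the object names are illustrative), shows why the switch happens:

```r
library(gapminder)

# Total number of observations: used when there is a single, ungrouped smoother
n_total <- nrow(gapminder)

# Observations per continent: used when color = continent splits the data into groups
n_by_continent <- table(gapminder$continent)

n_total
n_by_continent
```

The full dataset has more than 1,000 rows, while every continent on its own has fewer, so the grouped plot falls back to loess.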

Use faceting to identify differences

# using facet_wrap()
ggplot(
  data = gapminder,
  mapping = aes(x = gdpPercap, y = lifeExp, color = continent)
) +
  geom_point() +
  geom_smooth() +
  facet_wrap(facets = vars(continent))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

# using facet_grid()
ggplot(
  data = gapminder,
  mapping = aes(x = gdpPercap, y = lifeExp, color = continent)
) +
  geom_point() +
  geom_smooth() +
  facet_grid(cols = vars(continent))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Why use facet_grid() here instead of facet_wrap()? Good question! Let’s reframe it and instead ask: what is the difference between facet_grid() and facet_wrap()? [1]

The answer below applies when you facet on two variables. facet_grid(rows = vars(x), cols = vars(y)) displays a grid of $n_x \times n_y$ plots, where $n_x$ and $n_y$ are the numbers of distinct values of x and y, even if some plots are empty. For example:

library(palmerpenguins)

ggplot(data = penguins, aes(x = bill_length_mm, y = body_mass_g)) +
  geom_point() +
  facet_grid(rows = vars(species), cols = vars(island))
## Warning: Removed 2 rows containing missing values (geom_point).

There are 3 distinct species and 3 distinct islands. This plot displays $3 \times 3 = 9$ plots, even though some are empty (for example, Chinstrap penguins were not observed on Biscoe Island).

facet_wrap(facets = vars(species, island)) displays only the plots having actual values.

ggplot(data = penguins, aes(x = bill_length_mm, y = body_mass_g)) +
  geom_point() +
  facet_wrap(facets = vars(species, island))
## Warning: Removed 2 rows containing missing values (geom_point).

Only 5 plots are displayed now, one for every combination of species and island that actually occurs in the data. So for this exercise, I would use facet_wrap() because we are faceting on a single variable. If we faceted on multiple variables, facet_grid() may be more appropriate.
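The panel counts for the two faceting functions can be checked directly from the data. This is a small sketch using dplyr's count() and n_distinct(); the object names are illustrative:

```r
library(palmerpenguins)
library(dplyr)

# One row per species/island combination that actually occurs in the data
combos <- count(penguins, species, island)
combos

# facet_wrap() draws one panel per observed combination
n_panels_wrap <- nrow(combos)

# facet_grid() draws every species-by-island cell, observed or not
n_panels_grid <- n_distinct(penguins$species) * n_distinct(penguins$island)

n_panels_wrap
n_panels_grid
```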


  1. Example drawn from a StackOverflow thread.

Bonus: Identify the outlying countries on the right side of the graph by labeling each observation with the country name

ggplot(
  data = gapminder,
  mapping = aes(x = gdpPercap, y = lifeExp, label = country)
) +
  geom_smooth() +
  geom_text()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
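Labeling every observation makes most of the plot unreadable. As a possible variation, you could label only the observations on the far right by passing a filtered data frame to geom_text(). The cutoff of 50,000 and the object names below are assumptions chosen for illustration, not part of the original solution:

```r
library(ggplot2)
library(gapminder)

# Hypothetical cutoff for what counts as "the right side of the graph"
gdp_cutoff <- 50000
outliers <- subset(gapminder, gdpPercap > gdp_cutoff)

p_labels <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.2) +
  geom_smooth() +
  # Only the filtered rows receive a text label
  geom_text(data = outliers, mapping = aes(label = country))

p_labels
```

Each layer can take its own data argument, so the points and the smoothing line still use the full dataset while the labels use only the subset.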

Session Info

devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 4.1.0 (2021-05-18)
##  os       macOS Big Sur 10.16         
##  system   x86_64, darwin17.0          
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  ctype    en_US.UTF-8                 
##  tz       America/Chicago             
##  date     2022-01-06                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package        * version date       lib source        
##  assertthat       0.2.1   2019-03-21 [1] CRAN (R 4.1.0)
##  backports        1.2.1   2020-12-09 [1] CRAN (R 4.1.0)
##  blogdown         1.7     2021-12-19 [1] CRAN (R 4.1.0)
##  bookdown         0.23    2021-08-13 [1] CRAN (R 4.1.0)
##  broom            0.7.9   2021-07-27 [1] CRAN (R 4.1.0)
##  bslib            0.3.1   2021-10-06 [1] CRAN (R 4.1.0)
##  cachem           1.0.6   2021-08-19 [1] CRAN (R 4.1.0)
##  callr            3.7.0   2021-04-20 [1] CRAN (R 4.1.0)
##  cellranger       1.1.0   2016-07-27 [1] CRAN (R 4.1.0)
##  cli              3.1.0   2021-10-27 [1] CRAN (R 4.1.0)
##  codetools        0.2-18  2020-11-04 [1] CRAN (R 4.1.0)
##  colorspace       2.0-2   2021-06-24 [1] CRAN (R 4.1.0)
##  crayon           1.4.2   2021-10-29 [1] CRAN (R 4.1.0)
##  DBI              1.1.1   2021-01-15 [1] CRAN (R 4.1.0)
##  dbplyr           2.1.1   2021-04-06 [1] CRAN (R 4.1.0)
##  desc             1.3.0   2021-03-05 [1] CRAN (R 4.1.0)
##  devtools         2.4.2   2021-06-07 [1] CRAN (R 4.1.0)
##  digest           0.6.28  2021-09-23 [1] CRAN (R 4.1.0)
##  dplyr          * 1.0.7   2021-06-18 [1] CRAN (R 4.1.0)
##  ellipsis         0.3.2   2021-04-29 [1] CRAN (R 4.1.0)
##  evaluate         0.14    2019-05-28 [1] CRAN (R 4.1.0)
##  fansi            0.5.0   2021-05-25 [1] CRAN (R 4.1.0)
##  farver           2.1.0   2021-02-28 [1] CRAN (R 4.1.0)
##  fastmap          1.1.0   2021-01-25 [1] CRAN (R 4.1.0)
##  forcats        * 0.5.1   2021-01-27 [1] CRAN (R 4.1.0)
##  fs               1.5.0   2020-07-31 [1] CRAN (R 4.1.0)
##  gapminder      * 0.3.0   2017-10-31 [1] CRAN (R 4.1.0)
##  generics         0.1.1   2021-10-25 [1] CRAN (R 4.1.0)
##  ggplot2        * 3.3.5   2021-06-25 [1] CRAN (R 4.1.0)
##  glue             1.5.0   2021-11-07 [1] CRAN (R 4.1.0)
##  gtable           0.3.0   2019-03-25 [1] CRAN (R 4.1.0)
##  haven            2.4.3   2021-08-04 [1] CRAN (R 4.1.0)
##  here             1.0.1   2020-12-13 [1] CRAN (R 4.1.0)
##  highr            0.9     2021-04-16 [1] CRAN (R 4.1.0)
##  hms              1.1.1   2021-09-26 [1] CRAN (R 4.1.0)
##  htmltools        0.5.2   2021-08-25 [1] CRAN (R 4.1.0)
##  httr             1.4.2   2020-07-20 [1] CRAN (R 4.1.0)
##  jquerylib        0.1.4   2021-04-26 [1] CRAN (R 4.1.0)
##  jsonlite         1.7.2   2020-12-09 [1] CRAN (R 4.1.0)
##  knitr            1.33    2021-04-24 [1] CRAN (R 4.1.0)
##  labeling         0.4.2   2020-10-20 [1] CRAN (R 4.1.0)
##  lifecycle        1.0.1   2021-09-24 [1] CRAN (R 4.1.0)
##  lubridate        1.7.10  2021-02-26 [1] CRAN (R 4.1.0)
##  magrittr         2.0.1   2020-11-17 [1] CRAN (R 4.1.0)
##  memoise          2.0.0   2021-01-26 [1] CRAN (R 4.1.0)
##  modelr           0.1.8   2020-05-19 [1] CRAN (R 4.1.0)
##  munsell          0.5.0   2018-06-12 [1] CRAN (R 4.1.0)
##  palmerpenguins * 0.1.0   2020-07-23 [1] CRAN (R 4.1.0)
##  pillar           1.6.4   2021-10-18 [1] CRAN (R 4.1.0)
##  pkgbuild         1.2.0   2020-12-15 [1] CRAN (R 4.1.0)
##  pkgconfig        2.0.3   2019-09-22 [1] CRAN (R 4.1.0)
##  pkgload          1.2.1   2021-04-06 [1] CRAN (R 4.1.0)
##  prettyunits      1.1.1   2020-01-24 [1] CRAN (R 4.1.0)
##  processx         3.5.2   2021-04-30 [1] CRAN (R 4.1.0)
##  ps               1.6.0   2021-02-28 [1] CRAN (R 4.1.0)
##  purrr          * 0.3.4   2020-04-17 [1] CRAN (R 4.1.0)
##  R6               2.5.1   2021-08-19 [1] CRAN (R 4.1.0)
##  Rcpp             1.0.7   2021-07-07 [1] CRAN (R 4.1.0)
##  readr          * 2.0.2   2021-09-27 [1] CRAN (R 4.1.0)
##  readxl           1.3.1   2019-03-13 [1] CRAN (R 4.1.0)
##  remotes          2.4.0   2021-06-02 [1] CRAN (R 4.1.0)
##  reprex           2.0.1   2021-08-05 [1] CRAN (R 4.1.0)
##  rlang            0.4.12  2021-10-18 [1] CRAN (R 4.1.0)
##  rmarkdown        2.11    2021-09-14 [1] CRAN (R 4.1.0)
##  rprojroot        2.0.2   2020-11-15 [1] CRAN (R 4.1.0)
##  rstudioapi       0.13    2020-11-12 [1] CRAN (R 4.1.0)
##  rvest            1.0.1   2021-07-26 [1] CRAN (R 4.1.0)
##  sass             0.4.0   2021-05-12 [1] CRAN (R 4.1.0)
##  scales           1.1.1   2020-05-11 [1] CRAN (R 4.1.0)
##  sessioninfo      1.1.1   2018-11-05 [1] CRAN (R 4.1.0)
##  stringi          1.7.5   2021-10-04 [1] CRAN (R 4.1.0)
##  stringr        * 1.4.0   2019-02-10 [1] CRAN (R 4.1.0)
##  testthat         3.0.4   2021-07-01 [1] CRAN (R 4.1.0)
##  tibble         * 3.1.6   2021-11-07 [1] CRAN (R 4.1.0)
##  tidyr          * 1.1.4   2021-09-27 [1] CRAN (R 4.1.0)
##  tidyselect       1.1.1   2021-04-30 [1] CRAN (R 4.1.0)
##  tidyverse      * 1.3.1   2021-04-15 [1] CRAN (R 4.1.0)
##  tzdb             0.1.2   2021-07-20 [1] CRAN (R 4.1.0)
##  usethis          2.0.1   2021-02-10 [1] CRAN (R 4.1.0)
##  utf8             1.2.2   2021-07-24 [1] CRAN (R 4.1.0)
##  vctrs            0.3.8   2021-04-29 [1] CRAN (R 4.1.0)
##  withr            2.4.2   2021-04-18 [1] CRAN (R 4.1.0)
##  xfun             0.29    2021-12-14 [1] CRAN (R 4.1.0)
##  xml2             1.3.2   2020-04-23 [1] CRAN (R 4.1.0)
##  yaml             2.2.1   2020-02-01 [1] CRAN (R 4.1.0)
## 
## [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library