Vectors

library(tidyverse)
library(rcis)
set.seed(1234)

Run the code below in your console to download this exercise as a set of R scripts.

usethis::use_course("cis-ds/vectors-and-iteration")

So far, the only type of data object in R you have encountered is a data.frame (or its tidyverse variant, the tibble). At its core, though, the primary method of data storage in R is the vector. Until now we have only encountered vectors as components of a data frame; data frames are built from vectors. There are a few different types of vectors: logical, numeric, and character. Now we want to understand more precisely how these data objects are structured and related to one another.

Types of vectors

Figure 20.1 from [*R for Data Science*](http://r4ds.had.co.nz/vectors.html)

There are two categories of vectors:

  1. Atomic vectors - these are the types previously covered, including logical, integer, double, and character.
  2. Lists - these are new and we will cover them later in this module. Lists are distinct from atomic vectors because lists can contain other lists.

Atomic vectors are homogeneous - that is, all elements of the vector must be the same type. Lists can be heterogeneous and contain multiple types of elements. NULL is the counterpart to NA. While NA represents the absence of a value, NULL represents the absence of a vector.
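A small sketch of these ideas, using only base R (the object names `mixed` and `kept` are made up for illustration): c() coerces mixed inputs to one common type, list() keeps each element's own type, and NA occupies a slot in a vector while NULL does not.

```r
# c() builds an atomic vector, so mixed inputs are coerced to one common type
mixed <- c(1L, 2.5, "three")
typeof(mixed)            # "character"

# list() keeps each element's own type
kept <- list(1L, 2.5, "three")
sapply(kept, typeof)     # "integer" "double" "character"

# NA is a missing value inside a vector; NULL is no vector at all
length(c(1, NA, 3))      # 3
length(c(1, NULL, 3))    # 2
```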

Atomic vectors

Logical vectors

Logical vectors take on one of three possible values:

  • TRUE
  • FALSE
  • NA (missing value)

    parse_logical(c("TRUE", "TRUE", "FALSE", "TRUE", "NA"))
    
    ## [1]  TRUE  TRUE FALSE  TRUE    NA
    
Whenever you filter a data frame, R is (in the background) creating a vector of TRUE and FALSE values: wherever the condition is TRUE, the row is kept; otherwise it is excluded.
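To make that background step visible, here is a minimal base R sketch. The `scores` data frame and its values are hypothetical, invented only for this illustration. It builds the logical vector by hand and then uses it to keep rows.

```r
# hypothetical data, for illustration only
scores <- data.frame(
  student = c("Ana", "Ben", "Cy", "Dee"),
  score = c(91, 58, NA, 77),
  stringsAsFactors = FALSE
)

# the condition is evaluated once per row and yields a logical vector
passed <- scores$score >= 60
passed   # TRUE FALSE NA TRUE

# keep only the rows where the condition is TRUE; which() drops FALSE and NA,
# which mirrors how dplyr::filter() treats a missing condition
scores[which(passed), ]
```

Note that the missing score produces NA rather than FALSE, so the row is dropped because the condition is not TRUE, not because it failed the test.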

Numeric vectors

Numeric vectors contain numbers (duh!). They can be stored as integers (whole numbers) or doubles (numbers with decimal points). In practice, you rarely need to concern yourself with this difference, but just know that they are different but related things.

parse_integer(c("1", "5", "3", "4", "12423"))
## [1]     1     5     3     4 12423
parse_double(c("4.2", "4", "6", "53.2"))
## [1]  4.2  4.0  6.0 53.2
Doubles can store both whole numbers and numbers with decimal points.
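If you ever do need to tell the two apart, the following base R sketch shows how (the names `whole_int` and `whole_dbl` are illustrative). The L suffix asks for an integer; a number typed without it is stored as a double.

```r
whole_int <- 3L   # the L suffix requests an integer
whole_dbl <- 3    # numbers typed without L are doubles

typeof(whole_int)   # "integer"
typeof(whole_dbl)   # "double"

# both count as numeric, and they compare equal by value
is.numeric(whole_int) && is.numeric(whole_dbl)   # TRUE
whole_int == whole_dbl                           # TRUE

# but they are not the same type of object
identical(whole_int, whole_dbl)                  # FALSE

# arithmetic that mixes the two returns a double
typeof(whole_int + 0.5)                          # "double"
```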

Character vectors

Character vectors contain strings, which are typically text but could also be dates or any other combination of characters.

parse_character(c("Goodnight Moon", "Runaway Bunny", "Big Red Barn"))
## [1] "Goodnight Moon" "Runaway Bunny"  "Big Red Barn"

Using atomic vectors

Be sure to read “Using atomic vectors” for more detail on how to use and interact with atomic vectors. I have no desire to rehash everything Hadley already wrote, but here are a couple of things about atomic vectors I want to reemphasize.

Scalars

A scalar is a single value; a vector is a set of multiple values. In R, a scalar is merely a vector of length 1. So when you perform arithmetic or apply other functions to a vector and a scalar together, R recycles the scalar value to match the length of the vector.

(x <- sample(10))
##  [1] 10  6  5  4  1  8  2  7  9  3
x + c(100, 100, 100, 100, 100, 100, 100, 100, 100, 100)
##  [1] 110 106 105 104 101 108 102 107 109 103
x + 100
##  [1] 110 106 105 104 101 108 102 107 109 103

This is why you don’t need to write an iterative operation when performing these basic operations - R automatically converts it for you.
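For comparison, here is a sketch of the iterative operation that recycling saves you from writing. The vector `values` is an arbitrary example, not the sampled x above.

```r
values <- c(10, 6, 5, 4, 1)

# explicit loop: add 100 to each element, one at a time
looped <- numeric(length(values))
for (i in seq_along(values)) {
  looped[i] <- values[i] + 100
}

# vectorized: R recycles the length-1 vector 100 across values
vectorized <- values + 100

identical(looped, vectorized)   # TRUE
```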

Sometimes this isn’t so great, because R will also recycle vectors if the lengths are not equal:

# create two sequences of numbers with different lengths
(x1 <- seq(from = 1, to = 2))
## [1] 1 2
(x2 <- seq(from = 1, to = 10))
##  [1]  1  2  3  4  5  6  7  8  9 10
# add together two sequences of numbers
x1 + x2
##  [1]  2  4  4  6  6  8  8 10 10 12

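Did you really mean to recycle 1:2 five times, or was this actually an error? Base R recycles silently here because 10 is a multiple of 2; it only warns when the longer length is not a multiple of the shorter one. tidyverse functions will only allow you to implicitly recycle scalars; otherwise they throw an error, and you have to recycle shorter vectors manually.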
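One way to recycle manually is rep() with length.out, sketched below in base R. The names `short`, `long`, and `short_recycled` are illustrative. The first part also captures the warning base R raises when the lengths do not divide evenly.

```r
short <- 1:3
long <- 1:10

# base R warns here because 10 is not a multiple of 3;
# tryCatch() captures the warning message instead of printing it
result <- tryCatch(
  short + long,
  warning = function(w) conditionMessage(w)
)
result

# recycling made explicit: repeat the shorter vector out to the target length
short_recycled <- rep(short, length.out = length(long))
short_recycled          # 1 2 3 1 2 3 1 2 3 1
short_recycled + long   # 2 4 6 5 7 9 8 10 12 11
```

Writing the rep() call yourself documents that the recycling is intended rather than accidental.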

Subsetting

To filter a vector, we cannot use filter() because that only works for filtering rows in a tibble. [ is the subsetting function for vectors. It is used like x[a].

Subset with a numeric vector containing integers

(x <- c("one", "two", "three", "four", "five"))
## [1] "one"   "two"   "three" "four"  "five"

Subset with positive integers keeps the corresponding elements:

x[c(3, 2, 5)]
## [1] "three" "two"   "five"

Negative values drop the corresponding elements:

x[c(-1, -3, -5)]
## [1] "two"  "four"

You cannot mix positive and negative values:

x[c(-1, 1)]
## Error in x[c(-1, 1)]: only 0's may be mixed with negative subscripts
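Two further details of integer subsetting, sketched with the same x: positions may repeat, and the mixed-sign error can be caught with tryCatch() if you want to inspect it without halting a script. The replacement message in the handler is our own wording, not R's.

```r
x <- c("one", "two", "three", "four", "five")

# positions can repeat, and the result follows the order you ask for
x[c(1, 1, 5)]   # "one" "one" "five"

# mixing signs is an error; the handler returns our own placeholder text
mixed_sign <- tryCatch(
  x[c(-1, 1)],
  error = function(e) "error: positive and negative positions were mixed"
)
mixed_sign
```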

Subset with a logical vector

Subsetting with a logical vector keeps all values corresponding to a TRUE value.

(x <- c(10, 3, NA, 5, 8, 1, NA))
## [1] 10  3 NA  5  8  1 NA
# All non-missing values of x
!is.na(x)
## [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE
x[!is.na(x)]
## [1] 10  3  5  8  1
# All even (or missing!) values of x
x[x %% 2 == 0]
## [1] 10 NA  8 NA
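The missing values appear in that last result because the comparison itself is NA wherever x is NA, and an NA index returns NA. If you want only the even values, one option is which(), sketched here (the name `is_even` is illustrative):

```r
x <- c(10, 3, NA, 5, 8, 1, NA)

# the comparison is NA wherever x is NA
is_even <- x %% 2 == 0
is_even   # TRUE FALSE NA FALSE TRUE FALSE NA

# which() returns only the positions that are TRUE, dropping FALSE and NA
x[which(is_even)]        # 10 8

# equivalent: require the value to be non-missing as well
x[!is.na(x) & is_even]   # 10 8
```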

Exercise: subset the vector

(x <- seq(from = 1, to = 10))
##  [1]  1  2  3  4  5  6  7  8  9 10

Create the sequence above in your R session. Write commands to subset the vector in the following ways:

  1. Keep the first through fourth elements, plus the seventh element.

    Solution:

    x[c(1, 2, 3, 4, 7)]
    
    ## [1] 1 2 3 4 7
    
    # use a sequence shortcut
    x[c(seq(1, 4), 7)]
    
    ## [1] 1 2 3 4 7
    

  2. Keep the first through eighth elements, plus the tenth element.

    Solution:

    # long way
    x[c(1, 2, 3, 4, 5, 6, 7, 8, 10)]
    
    ## [1]  1  2  3  4  5  6  7  8 10
    
    # sequence shortcut
    x[c(seq(1, 8), 10)]
    
    ## [1]  1  2  3  4  5  6  7  8 10
    
    # negative indexing
    x[c(-9)]
    
    ## [1]  1  2  3  4  5  6  7  8 10
    

  3. Keep all elements with values greater than five.

    Solution:

    # get the index for which values in x are greater than 5
    x > 5
    
    ##  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
    
    x[x > 5]
    
    ## [1]  6  7  8  9 10
    

  4. Keep all elements evenly divisible by three.

    Solution:

    x[x %% 3 == 0]
    
    ## [1] 3 6 9
    

Lists

Lists are an entirely different type of vector.

x <- list(1, 2, 3)
x
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 2
## 
## [[3]]
## [1] 3

Use str() to view the structure of the list.

str(x)
## List of 3
##  $ : num 1
##  $ : num 2
##  $ : num 3
x_named <- list(a = 1, b = 2, c = 3)
str(x_named)
## List of 3
##  $ a: num 1
##  $ b: num 2
##  $ c: num 3
If you are running RStudio 1.1 or above, you can also use the object explorer to interactively examine the structure of objects.

Unlike atomic vectors, lists are recursive. This means they can:

  1. Store a mix of objects.

    y <- list("a", 1L, 1.5, TRUE)
    str(y)
    
    ## List of 4
    ##  $ : chr "a"
    ##  $ : int 1
    ##  $ : num 1.5
    ##  $ : logi TRUE
    
  2. Contain other lists.

    z <- list(list(1, 2), list(3, 4))
    str(z)
    
    ## List of 2
    ##  $ :List of 2
    ##   ..$ : num 1
    ##   ..$ : num 2
    ##  $ :List of 2
    ##   ..$ : num 3
    ##   ..$ : num 4
    

    It isn’t immediately apparent why you would want to do this, but in later units we will discover the value of lists as many packages for R store non-tidy data as lists.
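As a rough illustration of non-tidy data, the sketch below builds a nested record by hand. The `book` object and its field names are hypothetical, chosen to resemble the kind of structure a web API or JSON file might produce.

```r
# a hypothetical record, shaped like what a web API might return
book <- list(
  title = "Goodnight Moon",
  year = 1947L,
  authors = list(
    list(name = "Margaret Wise Brown", role = "author"),
    list(name = "Clement Hurd", role = "illustrator")
  )
)

str(book)

# the elements have different lengths, so the record does not fit
# into a single row of a data frame without reshaping
lengths(book)   # title 1, year 1, authors 2
```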

You’ve already worked with lists without even knowing it. Data frames and tibbles are a type of list: each column is a vector, and all columns have the same length. This is why a single data frame can store a mix of column types.

str(gun_deaths)
## tibble [100,798 × 10] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ id       : num [1:100798] 1 2 3 4 5 6 7 8 9 10 ...
##  $ year     : num [1:100798] 2012 2012 2012 2012 2012 ...
##  $ month    : chr [1:100798] "Jan" "Jan" "Jan" "Feb" ...
##  $ intent   : chr [1:100798] "Suicide" "Suicide" "Suicide" "Suicide" ...
##  $ police   : num [1:100798] 0 0 0 0 0 0 0 0 0 0 ...
##  $ sex      : chr [1:100798] "M" "F" "M" "M" ...
##  $ age      : num [1:100798] 34 21 60 64 31 17 48 41 50 NA ...
##  $ race     : chr [1:100798] "Asian/Pacific Islander" "White" "White" "White" ...
##  $ place    : chr [1:100798] "Home" "Street" "Other specified" "Home" ...
##  $ education: Factor w/ 4 levels "Less than HS",..: 4 3 4 4 2 1 2 2 3 NA ...

How to subset lists

Sometimes lists (especially deeply-nested lists) can be confusing to view and manipulate. Take the example from R for Data Science:

x <- list(a = c(1, 2, 3), b = "a string", c = pi, d = list(-1, -5))
str(x)
## List of 4
##  $ a: num [1:3] 1 2 3
##  $ b: chr "a string"
##  $ c: num 3.14
##  $ d:List of 2
##   ..$ : num -1
##   ..$ : num -5
  • [ extracts a sub-list. The result will always be a list.

    str(x[c(1, 2)])
    
    ## List of 2
    ##  $ a: num [1:3] 1 2 3
    ##  $ b: chr "a string"
    
    str(x[4])
    
    ## List of 1
    ##  $ d:List of 2
    ##   ..$ : num -1
    ##   ..$ : num -5
    
  • [[ extracts a single component from a list and removes a level of hierarchy.

    str(x[[1]])
    
    ##  num [1:3] 1 2 3
    
    str(x[[4]])
    
    ## List of 2
    ##  $ : num -1
    ##  $ : num -5
    
  • $ can be used to extract named elements of a list.

    x$a
    
    ## [1] 1 2 3
    
    x[["a"]]
    
    ## [1] 1 2 3
    
Figure 20.2 from [R for Data Science](http://r4ds.had.co.nz/vectors.html#fig:lists-subsetting)
Still confused about list subsetting? Review the pepper shaker.
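A compact way to compare the three operators is to look at the type of what each returns. This sketch reuses the list x from above; the variable `wanted` is illustrative.

```r
x <- list(a = c(1, 2, 3), b = "a string", c = pi, d = list(-1, -5))

# [ keeps the list wrapper; [[ and $ return the element itself
typeof(x["a"])     # "list"
typeof(x[["a"]])   # "double"
typeof(x$a)        # "double"

# chain [[ to reach into a nested list
x[["d"]][[2]]      # -5
x$d[[2]]           # -5

# [[ also accepts a name stored in a variable; $ does not
wanted <- "c"
x[[wanted]]        # 3.141593
```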

Exercise: subset a list

y <- list(a = c(1, 2, 3), b = "a string", c = pi, d = list(-1, -5))
str(y)
## List of 4
##  $ a: num [1:3] 1 2 3
##  $ b: chr "a string"
##  $ c: num 3.14
##  $ d:List of 2
##   ..$ : num -1
##   ..$ : num -5

Create the list above in your R session. Write commands to subset the list in the following ways:

  1. Subset a. The result should be an atomic vector.

    Solution:

    # use the index value
    y[[1]]
    
    ## [1] 1 2 3
    
    # use the element name
    y$a
    
    ## [1] 1 2 3
    
    y[["a"]]
    
    ## [1] 1 2 3
    

  2. Subset pi. The result should be a new list.

    Solution:

    # correct method
    str(y["c"])
    
    ## List of 1
    ##  $ c: num 3.14
    
    # incorrect method: $ extracts the element itself,
    # so the result is a length-1 numeric vector, not a list
    str(y$c)
    
    ##  num 3.14
    

  3. Subset the first and third elements from y.

    Solution:

    y[c(1, 3)]
    
    ## $a
    ## [1] 1 2 3
    ## 
    ## $c
    ## [1] 3.141593
    
    y[c("a", "c")]
    
    ## $a
    ## [1] 1 2 3
    ## 
    ## $c
    ## [1] 3.141593
    

Session Info

devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 4.1.0 (2021-05-18)
##  os       macOS Big Sur 10.16         
##  system   x86_64, darwin17.0          
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  ctype    en_US.UTF-8                 
##  tz       America/Chicago             
##  date     2021-10-21                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version date       lib source        
##  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.1.0)
##  backports     1.2.1   2020-12-09 [1] CRAN (R 4.1.0)
##  blogdown      1.4     2021-07-23 [1] CRAN (R 4.1.0)
##  bookdown      0.23    2021-08-13 [1] CRAN (R 4.1.0)
##  broom         0.7.9   2021-07-27 [1] CRAN (R 4.1.0)
##  bslib         0.2.5.1 2021-05-18 [1] CRAN (R 4.1.0)
##  cachem        1.0.6   2021-08-19 [1] CRAN (R 4.1.0)
##  callr         3.7.0   2021-04-20 [1] CRAN (R 4.1.0)
##  cellranger    1.1.0   2016-07-27 [1] CRAN (R 4.1.0)
##  cli           3.0.1   2021-07-17 [1] CRAN (R 4.1.0)
##  colorspace    2.0-2   2021-06-24 [1] CRAN (R 4.1.0)
##  crayon        1.4.1   2021-02-08 [1] CRAN (R 4.1.0)
##  DBI           1.1.1   2021-01-15 [1] CRAN (R 4.1.0)
##  dbplyr        2.1.1   2021-04-06 [1] CRAN (R 4.1.0)
##  desc          1.3.0   2021-03-05 [1] CRAN (R 4.1.0)
##  devtools      2.4.2   2021-06-07 [1] CRAN (R 4.1.0)
##  digest        0.6.27  2020-10-24 [1] CRAN (R 4.1.0)
##  dplyr       * 1.0.7   2021-06-18 [1] CRAN (R 4.1.0)
##  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.1.0)
##  evaluate      0.14    2019-05-28 [1] CRAN (R 4.1.0)
##  fansi         0.5.0   2021-05-25 [1] CRAN (R 4.1.0)
##  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.1.0)
##  forcats     * 0.5.1   2021-01-27 [1] CRAN (R 4.1.0)
##  fs            1.5.0   2020-07-31 [1] CRAN (R 4.1.0)
##  generics      0.1.0   2020-10-31 [1] CRAN (R 4.1.0)
##  ggplot2     * 3.3.5   2021-06-25 [1] CRAN (R 4.1.0)
##  glue          1.4.2   2020-08-27 [1] CRAN (R 4.1.0)
##  gtable        0.3.0   2019-03-25 [1] CRAN (R 4.1.0)
##  haven         2.4.3   2021-08-04 [1] CRAN (R 4.1.0)
##  here          1.0.1   2020-12-13 [1] CRAN (R 4.1.0)
##  hms           1.1.0   2021-05-17 [1] CRAN (R 4.1.0)
##  htmltools     0.5.1.1 2021-01-22 [1] CRAN (R 4.1.0)
##  httr          1.4.2   2020-07-20 [1] CRAN (R 4.1.0)
##  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.1.0)
##  jsonlite      1.7.2   2020-12-09 [1] CRAN (R 4.1.0)
##  knitr         1.33    2021-04-24 [1] CRAN (R 4.1.0)
##  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.1.0)
##  lubridate     1.7.10  2021-02-26 [1] CRAN (R 4.1.0)
##  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.1.0)
##  memoise       2.0.0   2021-01-26 [1] CRAN (R 4.1.0)
##  modelr        0.1.8   2020-05-19 [1] CRAN (R 4.1.0)
##  munsell       0.5.0   2018-06-12 [1] CRAN (R 4.1.0)
##  pillar        1.6.3   2021-09-26 [1] CRAN (R 4.1.0)
##  pkgbuild      1.2.0   2020-12-15 [1] CRAN (R 4.1.0)
##  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.1.0)
##  pkgload       1.2.1   2021-04-06 [1] CRAN (R 4.1.0)
##  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.1.0)
##  processx      3.5.2   2021-04-30 [1] CRAN (R 4.1.0)
##  ps            1.6.0   2021-02-28 [1] CRAN (R 4.1.0)
##  purrr       * 0.3.4   2020-04-17 [1] CRAN (R 4.1.0)
##  R6            2.5.1   2021-08-19 [1] CRAN (R 4.1.0)
##  rcis       * 0.2.1   2020-12-08 [1] local         
##  Rcpp          1.0.7   2021-07-07 [1] CRAN (R 4.1.0)
##  readr       * 2.0.1   2021-08-10 [1] CRAN (R 4.1.0)
##  readxl        1.3.1   2019-03-13 [1] CRAN (R 4.1.0)
##  remotes       2.4.0   2021-06-02 [1] CRAN (R 4.1.0)
##  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.1.0)
##  rlang         0.4.11  2021-04-30 [1] CRAN (R 4.1.0)
##  rmarkdown     2.10    2021-08-06 [1] CRAN (R 4.1.0)
##  rprojroot     2.0.2   2020-11-15 [1] CRAN (R 4.1.0)
##  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.1.0)
##  rvest         1.0.1   2021-07-26 [1] CRAN (R 4.1.0)
##  sass          0.4.0   2021-05-12 [1] CRAN (R 4.1.0)
##  scales        1.1.1   2020-05-11 [1] CRAN (R 4.1.0)
##  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.1.0)
##  stringi       1.7.3   2021-07-16 [1] CRAN (R 4.1.0)
##  stringr     * 1.4.0   2019-02-10 [1] CRAN (R 4.1.0)
##  testthat      3.0.4   2021-07-01 [1] CRAN (R 4.1.0)
##  tibble      * 3.1.5   2021-09-30 [1] CRAN (R 4.1.0)
##  tidyr       * 1.1.3   2021-03-03 [1] CRAN (R 4.1.0)
##  tidyselect    1.1.1   2021-04-30 [1] CRAN (R 4.1.0)
##  tidyverse   * 1.3.1   2021-04-15 [1] CRAN (R 4.1.0)
##  tzdb          0.1.2   2021-07-20 [1] CRAN (R 4.1.0)
##  usethis       2.0.1   2021-02-10 [1] CRAN (R 4.1.0)
##  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.1.0)
##  vctrs         0.3.8   2021-04-29 [1] CRAN (R 4.1.0)
##  withr         2.4.2   2021-04-18 [1] CRAN (R 4.1.0)
##  xfun          0.25    2021-08-06 [1] CRAN (R 4.1.0)
##  xml2          1.3.2   2020-04-23 [1] CRAN (R 4.1.0)
##  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.1.0)
## 
## [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library