Vectors
library(tidyverse)
library(rcis)
set.seed(1234)
Run the code below in your console to download this exercise as a set of R scripts.
usethis::use_course("cis-ds/vectors-and-iteration")
So far the only type of data object in R you have encountered is a data.frame (or the tidyverse variant, tibble). At its core, though, the primary method of data storage in R is the vector. Until now we have only encountered vectors as components of a data frame; data frames are built from vectors. There are a few different types of vectors: logical, numeric, and character. Now we want to understand more precisely how these data objects are structured and related to one another.
Types of vectors
There are two categories of vectors:
- Atomic vectors - these are the types previously covered, including logical, integer, double, and character.
- Lists - these are new, and we will cover them later in this module. Lists are distinct from atomic vectors because lists can contain other lists.
Atomic vectors are homogeneous: all elements of the vector must be the same type. Lists can be heterogeneous and contain multiple types of elements. NULL is the counterpart to NA. While NA represents the absence of a value, NULL represents the absence of a vector.
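A quick base-R illustration of both points (the object names are just for this example): c() coerces mixed inputs into one common type, and NA occupies a slot in a vector while NULL does not.

```r
# c() builds an atomic vector, so mixed inputs are coerced to one common type
mixed <- c(1, "a", TRUE)
typeof(mixed)        # "character"
mixed                # "1" "a" "TRUE"

# NA is a missing value *inside* a vector: it still occupies a position
with_na <- c(1, NA, 3)
length(with_na)      # 3

# NULL is the absence of a vector: it contributes nothing
with_null <- c(1, NULL, 3)
length(with_null)    # 2
length(NULL)         # 0
```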
Atomic vectors
Logical vectors
Logical vectors take on one of three possible values:
- TRUE
- FALSE
- NA (missing value)
parse_logical(c("TRUE", "TRUE", "FALSE", "TRUE", "NA"))
## [1] TRUE TRUE FALSE TRUE NA
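Under the hood, keeping rows is just indexing with a logical vector. Here is a minimal base-R sketch of that mechanism, using a small made-up data frame df (not part of the course data):

```r
# a small made-up data frame (illustrative only)
df <- data.frame(
  name = c("a", "b", "c"),
  score = c(5, 12, 8),
  stringsAsFactors = FALSE
)

# the condition returns one TRUE/FALSE value per row
keep <- df$score > 6
keep        # FALSE TRUE TRUE

# indexing the rows with that logical vector keeps only the TRUE rows
df[keep, ]
```

Conceptually, dplyr's filter(df, score > 6) does the same thing and returns the same two rows.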
When you filter a data frame, R creates a logical vector of TRUE and FALSE values behind the scenes: whenever the condition is TRUE, keep the row, otherwise exclude it.
Numeric vectors
Numeric vectors contain numbers (duh!). They can be stored as integers (whole numbers) or doubles (numbers with decimal points). In practice, you rarely need to concern yourself with this difference, but know that they are two distinct, related types.
parse_integer(c("1", "5", "3", "4", "12423"))
## [1] 1 5 3 4 12423
parse_double(c("4.2", "4", "6", "53.2"))
## [1] 4.2 4.0 6.0 53.2
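If you want to check which storage type you have, typeof() reports it. A short base-R sketch:

```r
typeof(1)          # "double": numeric literals are doubles by default
typeof(1L)         # "integer": the L suffix makes an integer
typeof(1:3)        # "integer": the colon operator returns integers
1L == 1            # TRUE: they compare equal as numbers...
identical(1L, 1)   # FALSE: ...but they are different types
```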
Character vectors
Character vectors contain strings, which are typically text but could also be dates or any other combination of characters.
parse_character(c("Goodnight Moon", "Runaway Bunny", "Big Red Barn"))
## [1] "Goodnight Moon" "Runaway Bunny" "Big Red Barn"
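A date stored in a character vector is only text until you convert it. A minimal base-R sketch (the date string is just an example):

```r
d <- "2021-10-21"
typeof(d)          # "character": R sees only text
nchar(d)           # 10 characters

# converting to a Date makes date arithmetic possible
d_date <- as.Date(d)
class(d_date)      # "Date"
d_date + 1         # "2021-10-22"
```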
Using atomic vectors
Be sure to read “Using atomic vectors” in R for Data Science for more detail on how to use and interact with atomic vectors. I have no desire to rehash everything Hadley already wrote, but here are a couple of things about atomic vectors I want to reemphasize.
Scalars
A scalar is a single value; a vector is a set of values. In R, a scalar is simply a vector of length 1. So when you perform arithmetic or apply another vectorized function to a vector and a scalar, R recycles the scalar to match the length of the vector.
(x <- sample(10))
## [1] 10 6 5 4 1 8 2 7 9 3
x + c(100, 100, 100, 100, 100, 100, 100, 100, 100, 100)
## [1] 110 106 105 104 101 108 102 107 109 103
x + 100
## [1] 110 106 105 104 101 108 102 107 109 103
This is why you don’t need to write a loop for these basic operations: R applies them element-wise and recycles the scalar for you.
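To make that concrete, here is the loop you would otherwise have to write, compared with the vectorized version (a base-R sketch with a short example vector):

```r
x <- c(10, 6, 5, 4)

# explicit loop: add 100 to each element one at a time
out <- numeric(length(x))
for (i in seq_along(x)) {
  out[i] <- x[i] + 100
}

# vectorized: R recycles the scalar 100 across every element
vec <- x + 100

identical(out, vec)  # TRUE
```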
Sometimes this isn’t so great, because R will also recycle vectors if the lengths are not equal:
# create two sequences of numbers: 1 to 2 and 1 to 10
(x1 <- seq(from = 1, to = 2))
## [1] 1 2
(x2 <- seq(from = 1, to = 10))
## [1] 1 2 3 4 5 6 7 8 9 10
# add together two sequences of numbers
x1 + x2
## [1] 2 4 4 6 6 8 8 10 10 12
Did you really mean to recycle 1:2 five times, or was this actually an error? tidyverse functions only allow you to implicitly recycle scalars; for any other length mismatch they throw an error, and you’ll have to recycle shorter vectors manually.
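Base R only warns when the longer length is not a multiple of the shorter one, so 1:2 + 1:10 passes silently. A sketch of the warning case, followed by recycling done explicitly with rep():

```r
# base R warns only when the longer length is not a multiple of the shorter one
msg <- tryCatch(
  1:3 + 1:10,
  warning = function(w) conditionMessage(w)
)
msg  # "longer object length is not a multiple of shorter object length"

# recycling manually with rep() makes the intent explicit
x1 <- 1:2
x2 <- 1:10
x1_long <- rep(x1, times = length(x2) / length(x1))
x1_long        # 1 2 1 2 1 2 1 2 1 2
x1_long + x2   # same result as x1 + x2, but now deliberate
```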
Subsetting
To filter a vector, we cannot use filter() because that only works for filtering rows in a tibble. [ is the subsetting function for vectors. It is used like x[a].
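[ really is a function: the bracket syntax is shorthand for calling it with the vector and the index. A small base-R sketch:

```r
x <- c("one", "two", "three")

# the usual bracket syntax
x[2]           # "two"

# the same call written as an ordinary function (note the backticks)
`[`(x, 2)      # "two"
```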
Subset with a numeric vector containing integers
(x <- c("one", "two", "three", "four", "five"))
## [1] "one" "two" "three" "four" "five"
Subset with positive integers keeps the corresponding elements:
x[c(3, 2, 5)]
## [1] "three" "two" "five"
Negative values drop the corresponding elements:
x[c(-1, -3, -5)]
## [1] "two" "four"
You cannot mix positive and negative values:
x[c(-1, 1)]
## Error in x[c(-1, 1)]: only 0's may be mixed with negative subscripts
Subset with a logical vector
Subsetting with a logical vector keeps all values corresponding to a TRUE value.
(x <- c(10, 3, NA, 5, 8, 1, NA))
## [1] 10 3 NA 5 8 1 NA
# All non-missing values of x
!is.na(x)
## [1] TRUE TRUE FALSE TRUE TRUE TRUE FALSE
x[!is.na(x)]
## [1] 10 3 5 8 1
# All even (or missing!) values of x
x[x %% 2 == 0]
## [1] 10 NA 8 NA
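The NAs sneak in because the comparison returns NA wherever x is NA, and indexing with NA yields NA. One way to keep only the genuinely even values is which(), which returns the positions that are TRUE; another is to test for missingness explicitly:

```r
x <- c(10, 3, NA, 5, 8, 1, NA)

# the comparison is NA wherever x is NA
x %% 2 == 0                   # TRUE FALSE NA FALSE TRUE FALSE NA

# which() returns only the positions that are TRUE, so NAs are dropped
which(x %% 2 == 0)            # 1 5
x[which(x %% 2 == 0)]         # 10 8

# equivalently, exclude missing values in the condition itself
x[!is.na(x) & x %% 2 == 0]    # 10 8
```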
Exercise: subset the vector
(x <- seq(from = 1, to = 10))
## [1] 1 2 3 4 5 6 7 8 9 10
Create the sequence above in your R session. Write commands to subset the vector in the following ways:
Keep the first through fourth elements, plus the seventh element.
Click for the solution
x[c(1, 2, 3, 4, 7)]
## [1] 1 2 3 4 7
# use a sequence shortcut
x[c(seq(1, 4), 7)]
## [1] 1 2 3 4 7
Keep the first through eighth elements, plus the tenth element.
# long way
x[c(1, 2, 3, 4, 5, 6, 7, 8, 10)]
## [1] 1 2 3 4 5 6 7 8 10
# sequence shortcut
x[c(seq(1, 8), 10)]
## [1] 1 2 3 4 5 6 7 8 10
# negative indexing
x[c(-9)]
## [1] 1 2 3 4 5 6 7 8 10
Keep all elements with values greater than five.
# logical test: is each value of x greater than 5?
x > 5
## [1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
x[x > 5]
## [1] 6 7 8 9 10
Keep all elements evenly divisible by three.
x[x %% 3 == 0]
## [1] 3 6 9
Lists
Lists are an entirely different type of vector.
x <- list(1, 2, 3)
x
## [[1]]
## [1] 1
##
## [[2]]
## [1] 2
##
## [[3]]
## [1] 3
Use str() to view the structure of the list.
str(x)
## List of 3
## $ : num 1
## $ : num 2
## $ : num 3
x_named <- list(a = 1, b = 2, c = 3)
str(x_named)
## List of 3
## $ a: num 1
## $ b: num 2
## $ c: num 3
Unlike atomic vectors, lists are recursive. This means they can:
Store a mix of objects.
y <- list("a", 1L, 1.5, TRUE)
str(y)
## List of 4
## $ : chr "a"
## $ : int 1
## $ : num 1.5
## $ : logi TRUE
Contain other lists.
z <- list(list(1, 2), list(3, 4))
str(z)
## List of 2
## $ :List of 2
## ..$ : num 1
## ..$ : num 2
## $ :List of 2
## ..$ : num 3
## ..$ : num 4
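A base-R sketch contrasting the list above with an atomic vector built from the same values: each list element keeps its own type, while c() coerces everything to one type.

```r
y <- list("a", 1L, 1.5, TRUE)

# each element keeps its own type
vapply(y, typeof, character(1))  # "character" "integer" "double" "logical"

# the same values combined with c() are coerced to a single type
typeof(c("a", 1L, 1.5, TRUE))    # "character"

z <- list(list(1, 2), list(3, 4))
is.list(z[[1]])                  # TRUE: an element can itself be a list
is.atomic(y)                     # FALSE
is.recursive(y)                  # TRUE
```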
It isn’t immediately apparent why you would want to do this, but in later units we will see why lists matter: many R packages store non-tidy data as lists.
You’ve already worked with lists without even knowing it. Data frames and tibbles are a type of list, with each column stored as one element. Notice that a data frame can store a mix of column types.
str(gun_deaths)
## tibble [100,798 × 10] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ id : num [1:100798] 1 2 3 4 5 6 7 8 9 10 ...
## $ year : num [1:100798] 2012 2012 2012 2012 2012 ...
## $ month : chr [1:100798] "Jan" "Jan" "Jan" "Feb" ...
## $ intent : chr [1:100798] "Suicide" "Suicide" "Suicide" "Suicide" ...
## $ police : num [1:100798] 0 0 0 0 0 0 0 0 0 0 ...
## $ sex : chr [1:100798] "M" "F" "M" "M" ...
## $ age : num [1:100798] 34 21 60 64 31 17 48 41 50 NA ...
## $ race : chr [1:100798] "Asian/Pacific Islander" "White" "White" "White" ...
## $ place : chr [1:100798] "Home" "Street" "Other specified" "Home" ...
## $ education: Factor w/ 4 levels "Less than HS",..: 4 3 4 4 2 1 2 2 3 NA ...
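The same idea on a small made-up data frame (the id and sex columns are illustrative, not taken from gun_deaths): typeof() reveals the underlying list, and each column is an atomic vector.

```r
df <- data.frame(
  id = 1:3,
  sex = c("M", "F", "M"),
  stringsAsFactors = FALSE
)

typeof(df)         # "list": a data frame is a list under the hood
length(df)         # 2: one element per column
df[["sex"]]        # "M" "F" "M": each column is an atomic vector
is.atomic(df$id)   # TRUE
```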
How to subset lists
Sometimes lists (especially deeply-nested lists) can be confusing to view and manipulate. Take the example from R for Data Science:
x <- list(a = c(1, 2, 3), b = "a string", c = pi, d = list(-1, -5))
str(x)
## List of 4
## $ a: num [1:3] 1 2 3
## $ b: chr "a string"
## $ c: num 3.14
## $ d:List of 2
## ..$ : num -1
## ..$ : num -5
[ extracts a sub-list. The result will always be a list.
str(x[c(1, 2)])
## List of 2
## $ a: num [1:3] 1 2 3
## $ b: chr "a string"
str(x[4])
## List of 1
## $ d:List of 2
## ..$ : num -1
## ..$ : num -5
[[ extracts a single component from a list and removes a level of hierarchy.
str(x[[1]])
## num [1:3] 1 2 3
str(x[[4]])
## List of 2
## $ : num -1
## $ : num -5
$ can be used to extract named elements of a list.
x$a
## [1] 1 2 3
x[["a"]]
## [1] 1 2 3
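A compact base-R sketch summarizing the three operators on the same list: [ keeps the list container, while [[ and $ return the element itself, and [[ can be chained to reach into a nested list.

```r
x <- list(a = c(1, 2, 3), b = "a string", c = pi, d = list(-1, -5))

is.list(x["a"])            # TRUE: `[` keeps the list container
is.list(x[["a"]])          # FALSE: `[[` returns the element itself
identical(x[["a"]], x$a)   # TRUE: `$` extracts the same named element

# chain `[[` to reach inside the nested list d
x[["d"]][[2]]              # -5
```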
Exercise: subset a list
y <- list(a = c(1, 2, 3), b = "a string", c = pi, d = list(-1, -5))
str(y)
## List of 4
## $ a: num [1:3] 1 2 3
## $ b: chr "a string"
## $ c: num 3.14
## $ d:List of 2
## ..$ : num -1
## ..$ : num -5
Create the list above in your R session. Write commands to subset the list in the following ways:
Subset a. The result should be an atomic vector.
# use the index value
y[[1]]
## [1] 1 2 3
# use the element name
y$a
## [1] 1 2 3
y[["a"]]
## [1] 1 2 3
Subset pi. The result should be a new list.
# correct method
str(y["c"])
## List of 1
## $ c: num 3.14
# incorrect method to produce another list
# the result is a scalar
str(y$c)
## num 3.14
Subset the first and third elements from y.
y[c(1, 3)]
## $a
## [1] 1 2 3
##
## $c
## [1] 3.141593
y[c("a", "c")]
## $a
## [1] 1 2 3
##
## $c
## [1] 3.141593
Session Info
devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 4.1.0 (2021-05-18)
## os macOS Big Sur 10.16
## system x86_64, darwin17.0
## ui X11
## language (EN)
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz America/Chicago
## date 2021-10-21
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date lib source
## assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.1.0)
## backports 1.2.1 2020-12-09 [1] CRAN (R 4.1.0)
## blogdown 1.4 2021-07-23 [1] CRAN (R 4.1.0)
## bookdown 0.23 2021-08-13 [1] CRAN (R 4.1.0)
## broom 0.7.9 2021-07-27 [1] CRAN (R 4.1.0)
## bslib 0.2.5.1 2021-05-18 [1] CRAN (R 4.1.0)
## cachem 1.0.6 2021-08-19 [1] CRAN (R 4.1.0)
## callr 3.7.0 2021-04-20 [1] CRAN (R 4.1.0)
## cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.1.0)
## cli 3.0.1 2021-07-17 [1] CRAN (R 4.1.0)
## colorspace 2.0-2 2021-06-24 [1] CRAN (R 4.1.0)
## crayon 1.4.1 2021-02-08 [1] CRAN (R 4.1.0)
## DBI 1.1.1 2021-01-15 [1] CRAN (R 4.1.0)
## dbplyr 2.1.1 2021-04-06 [1] CRAN (R 4.1.0)
## desc 1.3.0 2021-03-05 [1] CRAN (R 4.1.0)
## devtools 2.4.2 2021-06-07 [1] CRAN (R 4.1.0)
## digest 0.6.27 2020-10-24 [1] CRAN (R 4.1.0)
## dplyr * 1.0.7 2021-06-18 [1] CRAN (R 4.1.0)
## ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
## evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.0)
## fansi 0.5.0 2021-05-25 [1] CRAN (R 4.1.0)
## fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.0)
## forcats * 0.5.1 2021-01-27 [1] CRAN (R 4.1.0)
## fs 1.5.0 2020-07-31 [1] CRAN (R 4.1.0)
## generics 0.1.0 2020-10-31 [1] CRAN (R 4.1.0)
## ggplot2 * 3.3.5 2021-06-25 [1] CRAN (R 4.1.0)
## glue 1.4.2 2020-08-27 [1] CRAN (R 4.1.0)
## gtable 0.3.0 2019-03-25 [1] CRAN (R 4.1.0)
## haven 2.4.3 2021-08-04 [1] CRAN (R 4.1.0)
## here 1.0.1 2020-12-13 [1] CRAN (R 4.1.0)
## hms 1.1.0 2021-05-17 [1] CRAN (R 4.1.0)
## htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.1.0)
## httr 1.4.2 2020-07-20 [1] CRAN (R 4.1.0)
## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.1.0)
## jsonlite 1.7.2 2020-12-09 [1] CRAN (R 4.1.0)
## knitr 1.33 2021-04-24 [1] CRAN (R 4.1.0)
## lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.0)
## lubridate 1.7.10 2021-02-26 [1] CRAN (R 4.1.0)
## magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.1.0)
## memoise 2.0.0 2021-01-26 [1] CRAN (R 4.1.0)
## modelr 0.1.8 2020-05-19 [1] CRAN (R 4.1.0)
## munsell 0.5.0 2018-06-12 [1] CRAN (R 4.1.0)
## pillar 1.6.3 2021-09-26 [1] CRAN (R 4.1.0)
## pkgbuild 1.2.0 2020-12-15 [1] CRAN (R 4.1.0)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0)
## pkgload 1.2.1 2021-04-06 [1] CRAN (R 4.1.0)
## prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.1.0)
## processx 3.5.2 2021-04-30 [1] CRAN (R 4.1.0)
## ps 1.6.0 2021-02-28 [1] CRAN (R 4.1.0)
## purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.1.0)
## R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.0)
## rcis * 0.2.1 2020-12-08 [1] local
## Rcpp 1.0.7 2021-07-07 [1] CRAN (R 4.1.0)
## readr * 2.0.1 2021-08-10 [1] CRAN (R 4.1.0)
## readxl 1.3.1 2019-03-13 [1] CRAN (R 4.1.0)
## remotes 2.4.0 2021-06-02 [1] CRAN (R 4.1.0)
## reprex 2.0.1 2021-08-05 [1] CRAN (R 4.1.0)
## rlang 0.4.11 2021-04-30 [1] CRAN (R 4.1.0)
## rmarkdown 2.10 2021-08-06 [1] CRAN (R 4.1.0)
## rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.1.0)
## rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0)
## rvest 1.0.1 2021-07-26 [1] CRAN (R 4.1.0)
## sass 0.4.0 2021-05-12 [1] CRAN (R 4.1.0)
## scales 1.1.1 2020-05-11 [1] CRAN (R 4.1.0)
## sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.1.0)
## stringi 1.7.3 2021-07-16 [1] CRAN (R 4.1.0)
## stringr * 1.4.0 2019-02-10 [1] CRAN (R 4.1.0)
## testthat 3.0.4 2021-07-01 [1] CRAN (R 4.1.0)
## tibble * 3.1.5 2021-09-30 [1] CRAN (R 4.1.0)
## tidyr * 1.1.3 2021-03-03 [1] CRAN (R 4.1.0)
## tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.1.0)
## tidyverse * 1.3.1 2021-04-15 [1] CRAN (R 4.1.0)
## tzdb 0.1.2 2021-07-20 [1] CRAN (R 4.1.0)
## usethis 2.0.1 2021-02-10 [1] CRAN (R 4.1.0)
## utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0)
## vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0)
## withr 2.4.2 2021-04-18 [1] CRAN (R 4.1.0)
## xfun 0.25 2021-08-06 [1] CRAN (R 4.1.0)
## xml2 1.3.2 2020-04-23 [1] CRAN (R 4.1.0)
## yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.0)
##
## [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library