class: center, middle, inverse, title-slide .title[ # Getting data from the web: API access ] .author[ ### INFO 5940
Cornell University ] --- class: inverse, middle # Methods for obtaining data online --- ## Methods for obtaining data online * Click and download * Install and play * API query * Scraping --- ## Click and download * `read.csv` or `readr::read_csv` * `downloader` package or `curl` --- ## Application programming interface (API) - Representational State Transfer (REST) - Uniform Resource Locator (URL) - HTTP methods - GET - POST --- ## Application programming interface (API) <img src="/img/wikipedia.png" width="80%" style="display: block; margin: auto;" /> --- ## RESTful queries 1. Submit a request to the server via a URL 1. Return the result in a structured format 1. Parse the result into a local format --- ## Install and play packages * Packages with R functions written for existing APIs * Useful because * Reproducible * Up-to-date (ideally) * Ease of access --- class: inverse, middle # Using APIs with existing R packages --- ## `manifestoR` * Collects and organizes political party manifestos from around the world * Over 1000 parties from 1945 until today in over 50 countries on five continents * [`manifestoR`](https://github.com/ManifestoProject/manifestoR) --- ## API authentication * Key/token * Obtain key * Store in `.Rprofile` ```r # in .Rprofile options(this_is_my_key = "XXXX") # later, in the R script: key <- getOption("this_is_my_key") ``` -- * `usethis::edit_r_profile()` * Read the documentation: different packages store keys in different ways --- ## Load library and set API key ```r library(manifestoR) # retrieve API key stored in .Rprofile mp_setapikey(key = getOption("manifesto_key")) ``` --- ## Retrieve the database ```r (mpds <- mp_maindataset()) ``` ``` ## Connecting to Manifesto Project DB API... ## Connecting to Manifesto Project DB API... 
corpus version: 2021-1 ``` ``` ## # A tibble: 4,739 × 174 ## country countryname oecdmember eumember edate date party partyname ## <dbl> <chr> <dbl> <dbl> <date> <dbl> <dbl> <chr> ## 1 11 Sweden 0 0 1944-09-17 194409 11220 Communist Pa… ## 2 11 Sweden 0 0 1944-09-17 194409 11320 Social Democ… ## 3 11 Sweden 0 0 1944-09-17 194409 11420 People’s Par… ## 4 11 Sweden 0 0 1944-09-17 194409 11620 Right Party ## 5 11 Sweden 0 0 1944-09-17 194409 11810 Agrarian Par… ## 6 11 Sweden 0 0 1948-09-19 194809 11220 Communist Pa… ## 7 11 Sweden 0 0 1948-09-19 194809 11320 Social Democ… ## 8 11 Sweden 0 0 1948-09-19 194809 11420 People’s Par… ## 9 11 Sweden 0 0 1948-09-19 194809 11620 Right Party ## 10 11 Sweden 0 0 1948-09-19 194809 11810 Agrarian Par… ## # … with 4,729 more rows, and 166 more variables: partyabbrev <chr>, ## # parfam <dbl>, coderid <dbl>, manual <dbl>, coderyear <dbl>, ## # testresult <dbl>, testeditsim <dbl>, pervote <dbl>, voteest <dbl>, ## # presvote <dbl>, absseat <dbl>, totseats <dbl>, progtype <dbl>, ## # datasetorigin <dbl>, corpusversion <chr>, total <dbl>, peruncod <dbl>, ## # per101 <dbl>, per102 <dbl>, per103 <dbl>, per104 <dbl>, per105 <dbl>, ## # per106 <dbl>, per107 <dbl>, per108 <dbl>, per109 <dbl>, per110 <dbl>, … ``` --- <img src="index_files/figure-html/manifesto-dist-1.png" width="80%" style="display: block; margin: auto;" /> --- ## Download manifestos <img src="index_files/figure-html/manifestor-corpus-wordcloud-1.png" width="80%" style="display: block; margin: auto;" /> --- ## Census data with `tidycensus` * API to access data from US Census Bureau * Decennial census * American Community Survey * Returns tidy data frames with (optional) `sf` geometry * Search for variables with `load_variables()` --- ## Store API key ```r library(tidycensus) ``` ```r census_api_key("YOUR API KEY GOES HERE") ``` --- ## Obtain data ```r usa_inc <- get_acs( geography = "state", variables = c(medincome = "B19013_001"), year = 2020 ) usa_inc ``` ``` ## # A tibble: 
52 × 5 ## GEOID NAME variable estimate moe ## <chr> <chr> <chr> <dbl> <dbl> ## 1 01 Alabama medincome 52035 377 ## 2 02 Alaska medincome 77790 1134 ## 3 04 Arizona medincome 61529 286 ## 4 05 Arkansas medincome 49475 431 ## 5 06 California medincome 78672 270 ## 6 08 Colorado medincome 75231 379 ## 7 09 Connecticut medincome 79855 587 ## 8 10 Delaware medincome 69110 1112 ## 9 11 District of Columbia medincome 90842 1580 ## 10 12 Florida medincome 57703 269 ## # … with 42 more rows ``` --- ## Visualize data <img src="index_files/figure-html/income-usa-plot-1.png" width="70%" style="display: block; margin: auto;" /> --- # Twitter API * REST API * Streaming API -- * [`rtweet`](https://docs.ropensci.org/rtweet/) --- # Using `rtweet` ```r library(rtweet) ``` * Requires a Twitter account * Prompt to authorize application on first usage --- # Searching tweets ```r rt <- search_tweets( q = "#rstats", n = 3000, include_rts = FALSE ) rt ``` ``` ## # A tibble: 3,000 × 43 ## created_at id id_str full_…¹ trunc…² displ…³ entities ## <dttm> <dbl> <chr> <chr> <lgl> <dbl> <list> ## 1 2022-08-08 15:34:31 1.56e18 15567255626… "I'm p… FALSE 225 <named list> ## 2 2022-08-09 20:03:29 1.56e18 15571556395… "🫒 De… FALSE 250 <named list> ## 3 2022-08-07 18:56:34 1.56e18 15564140236… "100+ … FALSE 275 <named list> ## 4 2022-08-10 09:30:13 1.56e18 15573586603… "Read … FALSE 107 <named list> ## 5 2022-08-10 09:27:16 1.56e18 15573579189… "@luka… FALSE 121 <named list> ## 6 2022-08-10 09:26:20 1.56e18 15573576819… "Error… FALSE 104 <named list> ## 7 2022-08-10 09:26:20 1.56e18 15573576814… "How d… FALSE 108 <named list> ## 8 2022-08-10 09:26:19 1.56e18 15573576794… "The s… FALSE 262 <named list> ## 9 2022-08-10 09:23:05 1.56e18 15573568662… "Pytho… FALSE 270 <named list> ## 10 2022-08-10 09:19:40 1.56e18 15573560057… "On av… FALSE 146 <named list> ## # … with 2,990 more rows, 36 more variables: metadata <list>, source <chr>, ## # in_reply_to_status_id <dbl>, in_reply_to_status_id_str <chr>, ## 
# in_reply_to_user_id <dbl>, in_reply_to_user_id_str <chr>, ## # in_reply_to_screen_name <chr>, geo <list>, coordinates <list>, ## # place <list>, contributors <lgl>, is_quote_status <lgl>, ## # quoted_status_id <dbl>, quoted_status_id_str <chr>, quoted_status <list>, ## # retweet_count <int>, favorite_count <int>, favorited <lgl>, … ## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names ``` --- # Retrieving a user's timeline ```r countvoncount <- get_timeline(user = "countvoncount", n = 4000) countvoncount ``` ``` ## # A tibble: 3,250 × 43 ## created_at id id_str full_…¹ trunc…² displ…³ entities ## <dttm> <dbl> <chr> <chr> <lgl> <dbl> <list> ## 1 2022-08-09 14:23:11 1.56e18 15570700013… Three … FALSE 36 <named list> ## 2 2022-08-08 12:22:48 1.56e18 15566773163… Three … FALSE 36 <named list> ## 3 2022-08-07 17:22:47 1.56e18 15563904200… Three … FALSE 37 <named list> ## 4 2022-08-07 09:22:46 1.56e18 15562696218… Three … FALSE 35 <named list> ## 5 2022-08-06 20:22:45 1.56e18 15560733242… Three … FALSE 35 <named list> ## 6 2022-08-06 11:22:45 1.56e18 15559374262… Three … FALSE 36 <named list> ## 7 2022-08-05 17:22:43 1.56e18 15556656300… Three … FALSE 36 <named list> ## 8 2022-08-05 12:22:43 1.56e18 15555901309… Three … FALSE 34 <named list> ## 9 2022-08-04 20:22:42 1.56e18 15553485341… Three … FALSE 44 <named list> ## 10 2022-08-04 08:22:41 1.56e18 15551673370… Three … FALSE 31 <named list> ## # … with 3,240 more rows, 36 more variables: source <chr>, ## # in_reply_to_status_id <lgl>, in_reply_to_status_id_str <lgl>, ## # in_reply_to_user_id <lgl>, in_reply_to_user_id_str <lgl>, ## # in_reply_to_screen_name <lgl>, geo <list>, coordinates <list>, ## # place <list>, contributors <lgl>, is_quote_status <lgl>, ## # retweet_count <int>, favorite_count <int>, favorited <lgl>, ## # retweeted <lgl>, lang <chr>, possibly_sensitive <list>, … ## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names ``` --- # Visualizing
tweets ```r ts_plot(countvoncount, by = "1 week") ``` <img src="index_files/figure-html/rstats-freq-1.png" width="80%" style="display: block; margin: auto;" /> --- # Visualizing tweets ```r ts_plot(countvoncount, by = "1 month") ``` <img src="index_files/figure-html/rstats-freq-day-1.png" width="80%" style="display: block; margin: auto;" /> --- # Visualizing tweets ```r ts_plot(countvoncount, by = "1 week") + theme(plot.title = element_text(face = "bold")) + labs( x = NULL, y = NULL, title = "Frequency of @countvoncount Twitter posts", subtitle = "Twitter status (tweet) counts aggregated using one week intervals", caption = "\nSource: Data collected from Twitter's REST API via rtweet" ) ``` <img src="index_files/figure-html/rstats-freq-clean-1.png" width="80%" style="display: block; margin: auto;" /> --- # Exercise: Practice using `rtweet` .pull-left[ <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/2/21/Katy_Perry%E2%80%93Zenith_Paris.jpg/360px-Katy_Perry%E2%80%93Zenith_Paris.jpg" width="80%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/df/Kim_Kardashian_West_2014.jpg/320px-Kim_Kardashian_West_2014.jpg" width="80%" style="display: block; margin: auto;" /> ] --- class: inverse, middle # Writing an API function --- ## Writing an API function * No package for API * Write your own function! * [Open Movie Database](http://www.omdbapi.com/) --- ## Expected elements 1. Authentication Key/Token 1. Base URL 1. Search Parameters 1. 
Response Format --- ## Determine the shape of an API request <img src="../../../../../../../../img/ombd.png" width="80%" style="display: block; margin: auto;" /> --- ## Determine the shape of an API request ```http http://www.omdbapi.com/?apikey=[apikey]&t=Sharknado&y=2013 ``` ``` ## { ## "Title": "Sharknado", ## "Year": "2013", ## "Rated": "Not Rated", ## "Released": "11 Jul 2013", ## "Runtime": "86 min", ## "Genre": "Action, Adventure, Comedy", ## "Director": "Anthony C. Ferrante", ## "Writer": "Thunder Levin", ## "Actors": "Ian Ziering, Tara Reid, John Heard", ## "Plot": "When a freak hurricane swamps Los Angeles, nature's deadliest killer rules sea, land, and air as thousands of sharks terrorize the waterlogged populace.", ## "Language": "English", ## "Country": "United States", ## "Awards": "1 win & 2 nominations", ## "Poster": "https://m.media-amazon.com/images/M/MV5BODcwZWFiNTEtNDgzMC00ZmE2LWExMzYtNzZhZDgzNDc5NDkyXkEyXkFqcGdeQXVyMTQxNzMzNDI@._V1_SX300.jpg", ## "Ratings": [ ## { ## "Source": "Internet Movie Database", ## "Value": "3.3/10" ## }, ## { ## "Source": "Rotten Tomatoes", ## "Value": "74%" ## } ## ], ## "Metascore": "N/A", ## "imdbRating": "3.3", ## "imdbVotes": "49,549", ## "imdbID": "tt2724064", ## "Type": "movie", ## "DVD": "03 Sep 2013", ## "BoxOffice": "N/A", ## "Production": "N/A", ## "Website": "N/A", ## "Response": "True" ## } ## ``` --- ## `httr::GET()` ```r sharknado <- GET(url = "http://www.omdbapi.com/?", query = list(t = "Sharknado", y = 2013, apikey = getOption("omdb_key")) ) ``` --- ## JavaScript Object Notation (JSON) ```r content(sharknado, as = "text") %>% # print the contents in a clear structure prettify() ``` ``` ## { ## "Title": "Sharknado", ## "Year": "2013", ## "Rated": "TV-14", ## "Released": "11 Jul 2013", ## "Runtime": "86 min", ## "Genre": "Action, Adventure, Comedy, Horror, Sci-Fi, Thriller", ## "Director": "Anthony C.
Ferrante", ## "Writer": "Thunder Levin", ## "Actors": "Ian Ziering, Tara Reid, John Heard, Cassandra Scerbo", ## "Plot": "When a freak hurricane swamps Los Angeles, nature's deadliest killer rules sea, land, and air as thousands of sharks terrorize the waterlogged populace.", ## "Language": "English", ## "Country": "USA", ## "Awards": "1 win & 2 nominations.", ## "Poster": "https://m.media-amazon.com/images/M/MV5BODcwZWFiNTEtNDgzMC00ZmE2LWExMzYtNzZhZDgzNDc5NDkyXkEyXkFqcGdeQXVyMTQxNzMzNDI@._V1_SX300.jpg", ## "Ratings": [ ## { ## "Source": "Internet Movie Database", ## "Value": "3.3/10" ## }, ## { ## "Source": "Rotten Tomatoes", ## "Value": "74%" ## } ## ], ## "Metascore": "N/A", ## "imdbRating": "3.3", ## "imdbVotes": "47,284", ## "imdbID": "tt2724064", ## "Type": "movie", ## "DVD": "03 Sep 2013", ## "BoxOffice": "N/A", ## "Production": "N/A", ## "Website": "N/A", ## "Response": "True" ## } ## ``` --- ## JSON ```r sharknado_df <- content(sharknado) %>% as_tibble() sharknado_df ``` ``` ## # A tibble: 2 × 25 ## Title Year Rated Released Runtime Genre Director Writer Actors Plot Language ## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> ## 1 Shar… 2013 TV-14 11 Jul … 86 min Acti… Anthony… Thund… Ian Z… When… English ## 2 Shar… 2013 TV-14 11 Jul … 86 min Acti… Anthony… Thund… Ian Z… When… English ## # … with 14 more variables: Country <chr>, Awards <chr>, Poster <chr>, ## # Ratings <list>, Metascore <chr>, imdbRating <chr>, imdbVotes <chr>, ## # imdbID <chr>, Type <chr>, DVD <chr>, BoxOffice <chr>, Production <chr>, ## # Website <chr>, Response <chr> ``` --- ## Additional information from `GET()` ```r sharknado$url ``` ``` ## [1] "http://www.omdbapi.com/?t=Sharknado&y=2013&apikey=[apikey]" ``` -- ```r status_code(sharknado) ``` ``` ## [1] 200 ``` --- ## HTTP status code

Code | Status
-----|-------
1xx | Informational
2xx | Success
3xx | Redirection
4xx | Client error (you did something wrong)
5xx | Server error (server did something wrong)

> [A
more intuitive guide](https://www.flickr.com/photos/girliemac/sets/72157628409467125) --- ## Iteration through a set of movies ```r omdb_api <- function(title, api_key){ # send GET request response <- GET(url = "http://www.omdbapi.com/?", query = list(t = title, apikey = api_key) ) # parse the JSON response into a tibble response_df <- content(response) %>% as_tibble() # print a message to track progress message(glue::glue("Scraping {title}...")) return(response_df) } ``` --- ## Iteration through a set of movies ```r sharknados <- c("Sharknado", "Sharknado 2", "Sharknado 3", "Sharknado 4", "Sharknado 5") ``` ```r # wrap the function so that each call waits one second (rate limiting) omdb_api_slow <- purrr::slowly(f = omdb_api, rate = purrr::rate_delay(1)) # iterate over all the films and row-bind the results sharknados_df <- map_dfr(.x = sharknados, .f = omdb_api_slow, api_key = getOption("omdb_key")) ``` ``` ## Scraping Sharknado... ``` ``` ## Scraping Sharknado 2... ``` ``` ## Scraping Sharknado 3... ``` ``` ## Scraping Sharknado 4... ``` ``` ## Scraping Sharknado 5...
``` --- ## Iteration through a set of movies ```r sharknados_df ``` ``` ## # A tibble: 10 × 25 ## Title Year Rated Released Runtime Genre Director Writer Actors Plot ## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> ## 1 Sharknado 2013 TV-14 11 Jul … 86 min Acti… Anthony… Thund… Ian Z… When… ## 2 Sharknado 2013 TV-14 11 Jul … 86 min Acti… Anthony… Thund… Ian Z… When… ## 3 Sharknado 2:… 2014 TV-14 30 Jul … 95 min Acti… Anthony… Thund… Ian Z… Fin … ## 4 Sharknado 2:… 2014 TV-14 30 Jul … 95 min Acti… Anthony… Thund… Ian Z… Fin … ## 5 Sharknado 3:… 2015 TV-14 22 Jul … 93 min Acti… Anthony… Thund… Ian Z… A mo… ## 6 Sharknado 3:… 2015 TV-14 22 Jul … 93 min Acti… Anthony… Thund… Ian Z… A mo… ## 7 Sharknado 4:… 2016 TV-14 31 Jul … 95 min Acti… Anthony… Thund… Ian Z… Fin,… ## 8 Sharknado 4:… 2016 TV-14 31 Jul … 95 min Acti… Anthony… Thund… Ian Z… Fin,… ## 9 Sharknado 5:… 2017 TV-14 06 Aug … 93 min Acti… Anthony… Thund… Ian Z… With… ## 10 Sharknado 5:… 2017 TV-14 06 Aug … 93 min Acti… Anthony… Thund… Ian Z… With… ## # … with 15 more variables: Language <chr>, Country <chr>, Awards <chr>, ## # Poster <chr>, Ratings <list>, Metascore <chr>, imdbRating <chr>, ## # imdbVotes <chr>, imdbID <chr>, Type <chr>, DVD <chr>, BoxOffice <chr>, ## # Production <chr>, Website <chr>, Response <chr> ``` --- ## Messy API responses ```r content(sharknado) %>% as_tibble() ``` ``` ## # A tibble: 2 × 25 ## Title Year Rated Released Runtime Genre Director Writer Actors Plot Language ## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> ## 1 Shar… 2013 TV-14 11 Jul … 86 min Acti… Anthony… Thund… Ian Z… When… English ## 2 Shar… 2013 TV-14 11 Jul … 86 min Acti… Anthony… Thund… Ian Z… When… English ## # … with 14 more variables: Country <chr>, Awards <chr>, Poster <chr>, ## # Ratings <list>, Metascore <chr>, imdbRating <chr>, imdbVotes <chr>, ## # imdbID <chr>, Type <chr>, DVD <chr>, BoxOffice <chr>, Production <chr>, ## # Website <chr>, Response <chr> ``` --- ## Whoops ``` 
## List of 25 ## $ Title : chr "Sharknado" ## $ Year : chr "2013" ## $ Rated : chr "TV-14" ## $ Released : chr "11 Jul 2013" ## $ Runtime : chr "86 min" ## $ Genre : chr "Action, Adventure, Comedy, Horror, Sci-Fi, Thriller" ## $ Director : chr "Anthony C. Ferrante" ## $ Writer : chr "Thunder Levin" ## $ Actors : chr "Ian Ziering, Tara Reid, John Heard, Cassandra Scerbo" ## $ Plot : chr "When a freak hurricane swamps Los Angeles, nature's deadliest killer rules sea, land, and air as thousands of s"| __truncated__ ## $ Language : chr "English" ## $ Country : chr "USA" ## $ Awards : chr "1 win & 2 nominations." ## $ Poster : chr "https://m.media-amazon.com/images/M/MV5BODcwZWFiNTEtNDgzMC00ZmE2LWExMzYtNzZhZDgzNDc5NDkyXkEyXkFqcGdeQXVyMTQxNzM"| __truncated__ ## $ Ratings :List of 2 ## ..$ :List of 2 ## .. ..$ Source: chr "Internet Movie Database" ## .. ..$ Value : chr "3.3/10" ## ..$ :List of 2 ## .. ..$ Source: chr "Rotten Tomatoes" ## .. ..$ Value : chr "74%" ## $ Metascore : chr "N/A" ## $ imdbRating: chr "3.3" ## $ imdbVotes : chr "47,284" ## $ imdbID : chr "tt2724064" ## $ Type : chr "movie" ## $ DVD : chr "03 Sep 2013" ## $ BoxOffice : chr "N/A" ## $ Production: chr "N/A" ## $ Website : chr "N/A" ## $ Response : chr "True" ``` --- class: inverse, middle # Rectangling messy data --- ## Rectangling and `tidyr` .task[ The art and craft of taking a deeply nested list and taming it into a tidy data set of rows and columns ] -- * `unnest_longer()` - each element of a list-column becomes its own row (each row contains multiple observations) * `unnest_wider()` - each component of a list-column becomes its own column (each row contains a single observation) * `unnest_auto()` - makes an educated guess between the two * `hoist()` - extracts specific components into new columns --- ## `unnest_wider()` and `hoist()` ```r str(gh_users, list.len = 3) ``` ``` ## List of 6 ## $ :List of 30 ## ..$ login : chr "gaborcsardi" ## ..$ id : int 660288 ## ..$ avatar_url : chr "https://avatars.githubusercontent.com/u/660288?v=3" ## ..
[list output truncated] ## $ :List of 30 ## ..$ login : chr "jennybc" ## ..$ id : int 599454 ## ..$ avatar_url : chr "https://avatars.githubusercontent.com/u/599454?v=3" ## .. [list output truncated] ## $ :List of 30 ## ..$ login : chr "jtleek" ## ..$ id : int 1571674 ## ..$ avatar_url : chr "https://avatars.githubusercontent.com/u/1571674?v=3" ## .. [list output truncated] ## [list output truncated] ``` --- ## `unnest_wider()` and `hoist()` ```r (users <- tibble(user = gh_users)) ``` ``` ## # A tibble: 6 × 1 ## user ## <list> ## 1 <named list [30]> ## 2 <named list [30]> ## 3 <named list [30]> ## 4 <named list [30]> ## 5 <named list [30]> ## 6 <named list [30]> ``` -- ```r names(users$user[[1]]) ``` ``` ## [1] "login" "id" "avatar_url" ## [4] "gravatar_id" "url" "html_url" ## [7] "followers_url" "following_url" "gists_url" ## [10] "starred_url" "subscriptions_url" "organizations_url" ## [13] "repos_url" "events_url" "received_events_url" ## [16] "type" "site_admin" "name" ## [19] "company" "blog" "location" ## [22] "email" "hireable" "bio" ## [25] "public_repos" "public_gists" "followers" ## [28] "following" "created_at" "updated_at" ``` --- ## `unnest_wider()` ```r users %>% unnest_wider(col = user) ``` ``` ## # A tibble: 6 × 30 ## login id avatar_url gravatar_id url html_url followers_url following_url ## <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr> ## 1 gabo… 6.60e5 https://a… "" http… https:/… https://api.… https://api.… ## 2 jenn… 5.99e5 https://a… "" http… https:/… https://api.… https://api.… ## 3 jtle… 1.57e6 https://a… "" http… https:/… https://api.… https://api.… ## 4 juli… 1.25e7 https://a… "" http… https:/… https://api.… https://api.… ## 5 leep… 3.51e6 https://a… "" http… https:/… https://api.… https://api.… ## 6 masa… 8.36e6 https://a… "" http… https:/… https://api.… https://api.… ## # … with 22 more variables: gists_url <chr>, starred_url <chr>, ## # subscriptions_url <chr>, organizations_url <chr>, repos_url <chr>, ## # events_url <chr>, 
received_events_url <chr>, type <chr>, site_admin <lgl>, ## # name <chr>, company <chr>, blog <chr>, location <chr>, email <chr>, ## # hireable <lgl>, bio <chr>, public_repos <int>, public_gists <int>, ## # followers <int>, following <int>, created_at <chr>, updated_at <chr> ``` --- ## `hoist()` ```r users %>% hoist( .col = user, followers = "followers", login = "login", url = "html_url" ) ``` ``` ## # A tibble: 6 × 4 ## followers login url user ## <int> <chr> <chr> <list> ## 1 303 gaborcsardi https://github.com/gaborcsardi <named list [27]> ## 2 780 jennybc https://github.com/jennybc <named list [27]> ## 3 3958 jtleek https://github.com/jtleek <named list [27]> ## 4 115 juliasilge https://github.com/juliasilge <named list [27]> ## 5 213 leeper https://github.com/leeper <named list [27]> ## 6 34 masalmon https://github.com/masalmon <named list [27]> ``` --- ## `gh_repos` and nested list structures ```r (repos <- tibble(repo = gh_repos)) ``` ``` ## # A tibble: 6 × 1 ## repo ## <list> ## 1 <list [30]> ## 2 <list [30]> ## 3 <list [30]> ## 4 <list [26]> ## 5 <list [30]> ## 6 <list [30]> ``` --- ## `unnest_longer()` ```r repos <- repos %>% unnest_longer(col = repo) repos ``` ``` ## # A tibble: 176 × 1 ## repo ## <list> ## 1 <named list [68]> ## 2 <named list [68]> ## 3 <named list [68]> ## 4 <named list [68]> ## 5 <named list [68]> ## 6 <named list [68]> ## 7 <named list [68]> ## 8 <named list [68]> ## 9 <named list [68]> ## 10 <named list [68]> ## # … with 166 more rows ``` --- ## `unnest_longer()` ```r repos %>% hoist( .col = repo, login = c("owner", "login"), name = "name", homepage = "homepage", watchers = "watchers_count" ) ``` ``` ## # A tibble: 176 × 5 ## login name homepage watchers repo ## <chr> <chr> <chr> <int> <list> ## 1 gaborcsardi after <NA> 5 <named list [65]> ## 2 gaborcsardi argufy <NA> 19 <named list [65]> ## 3 gaborcsardi ask <NA> 5 <named list [65]> ## 4 gaborcsardi baseimports <NA> 0 <named list [65]> ## 5 gaborcsardi citest <NA> 0 <named list 
[65]> ## 6 gaborcsardi clisymbols "" 18 <named list [65]> ## 7 gaborcsardi cmaker <NA> 0 <named list [65]> ## 8 gaborcsardi cmark <NA> 0 <named list [65]> ## 9 gaborcsardi conditions <NA> 0 <named list [65]> ## 10 gaborcsardi crayon <NA> 52 <named list [65]> ## # … with 166 more rows ``` --- count: false .panel1-gh-repos-auto-auto[ ```r *tibble(repo = gh_repos) ``` ] .panel2-gh-repos-auto-auto[ ``` ## # A tibble: 6 × 1 ## repo ## <list> ## 1 <list [30]> ## 2 <list [30]> ## 3 <list [30]> ## 4 <list [26]> ## 5 <list [30]> ## 6 <list [30]> ``` ] --- count: false .panel1-gh-repos-auto-auto[ ```r tibble(repo = gh_repos) %>% * unnest_auto(col = repo) ``` ] .panel2-gh-repos-auto-auto[ ``` ## # A tibble: 176 × 1 ## repo ## <list> ## 1 <named list [68]> ## 2 <named list [68]> ## 3 <named list [68]> ## 4 <named list [68]> ## 5 <named list [68]> ## 6 <named list [68]> ## 7 <named list [68]> ## 8 <named list [68]> ## 9 <named list [68]> ## 10 <named list [68]> ## # … with 166 more rows ``` ] --- count: false .panel1-gh-repos-auto-auto[ ```r tibble(repo = gh_repos) %>% unnest_auto(col = repo) %>% * unnest_auto(col = repo) ``` ] .panel2-gh-repos-auto-auto[ ``` ## # A tibble: 176 × 68 ## id name full_name owner private html_url description fork url ## <int> <chr> <chr> <list> <lgl> <chr> <chr> <lgl> <chr> ## 1 6.12e7 after gaborcsa… <named list> FALSE https:/… Run Code i… FALSE http… ## 2 4.05e7 argu… gaborcsa… <named list> FALSE https:/… Declarativ… FALSE http… ## 3 3.64e7 ask gaborcsa… <named list> FALSE https:/… Friendly C… FALSE http… ## 4 3.49e7 base… gaborcsa… <named list> FALSE https:/… Do we get … FALSE http… ## 5 6.16e7 cite… gaborcsa… <named list> FALSE https:/… Test R pac… TRUE http… ## 6 3.39e7 clis… gaborcsa… <named list> FALSE https:/… Unicode sy… FALSE http… ## 7 3.72e7 cmak… gaborcsa… <named list> FALSE https:/… port of cm… TRUE http… ## 8 6.80e7 cmark gaborcsa… <named list> FALSE https:/… CommonMark… TRUE http… ## 9 6.32e7 cond… gaborcsa… <named list> FALSE 
https:/… <NA> TRUE http… ## 10 2.43e7 cray… gaborcsa… <named list> FALSE https:/… R package … FALSE http… ## # … with 166 more rows, and 59 more variables: forks_url <chr>, keys_url <chr>, ## # collaborators_url <chr>, teams_url <chr>, hooks_url <chr>, ## # issue_events_url <chr>, events_url <chr>, assignees_url <chr>, ## # branches_url <chr>, tags_url <chr>, blobs_url <chr>, git_tags_url <chr>, ## # git_refs_url <chr>, trees_url <chr>, statuses_url <chr>, ## # languages_url <chr>, stargazers_url <chr>, contributors_url <chr>, ## # subscribers_url <chr>, subscription_url <chr>, commits_url <chr>, … ``` ] <style> .panel1-gh-repos-auto-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-gh-repos-auto-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-gh-repos-auto-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- ## ASOIAF characters ```r chars <- tibble(char = got_chars) chars ``` ``` ## # A tibble: 30 × 1 ## char ## <list> ## 1 <named list [18]> ## 2 <named list [18]> ## 3 <named list [18]> ## 4 <named list [18]> ## 5 <named list [18]> ## 6 <named list [18]> ## 7 <named list [18]> ## 8 <named list [18]> ## 9 <named list [18]> ## 10 <named list [18]> ## # … with 20 more rows ``` --- ## ASOIAF characters ```r chars2 <- chars %>% unnest_wider(col = char) chars2 ``` ``` ## # A tibble: 30 × 18 ## url id name gender culture born died alive titles aliases father ## <chr> <int> <chr> <chr> <chr> <chr> <chr> <lgl> <list> <list> <chr> ## 1 https://w… 1022 Theo… Male "Ironb… "In … "" TRUE <chr> <chr> "" ## 2 https://w… 1052 Tyri… Male "" "In … "" TRUE <chr> <chr> "" ## 3 https://w… 1074 Vict… Male "Ironb… "In … "" TRUE <chr> <chr> "" ## 4 https://w… 1109 Will Male "" "" "In … FALSE <chr> <chr> "" ## 5 https://w… 1166 Areo… Male "Norvo… "In … "" TRUE <chr> <chr> "" ## 6 https://w… 1267 Chett Male 
"" "At … "In … FALSE <chr> <chr> "" ## 7 https://w… 1295 Cres… Male "" "In … "In … FALSE <chr> <chr> "" ## 8 https://w… 130 Aria… Female "Dorni… "In … "" TRUE <chr> <chr> "" ## 9 https://w… 1303 Daen… Female "Valyr… "In … "" TRUE <chr> <chr> "" ## 10 https://w… 1319 Davo… Male "Weste… "In … "" TRUE <chr> <chr> "" ## # … with 20 more rows, and 7 more variables: mother <chr>, spouse <chr>, ## # allegiances <list>, books <list>, povBooks <list>, tvSeries <list>, ## # playedBy <list> ``` --- ## Nested list objects ```r chars2 %>% select(where(is.list)) ``` ``` ## # A tibble: 30 × 7 ## titles aliases allegiances books povBooks tvSeries playedBy ## <list> <list> <list> <list> <list> <list> <list> ## 1 <chr [3]> <chr [4]> <chr [1]> <chr [3]> <chr [2]> <chr [6]> <chr [1]> ## 2 <chr [2]> <chr [11]> <chr [1]> <chr [2]> <chr [4]> <chr [6]> <chr [1]> ## 3 <chr [2]> <chr [1]> <chr [1]> <chr [3]> <chr [2]> <chr [1]> <chr [1]> ## 4 <chr [1]> <chr [1]> <NULL> <chr [1]> <chr [1]> <chr [1]> <chr [1]> ## 5 <chr [1]> <chr [1]> <chr [1]> <chr [3]> <chr [2]> <chr [2]> <chr [1]> ## 6 <chr [1]> <chr [1]> <NULL> <chr [2]> <chr [1]> <chr [1]> <chr [1]> ## 7 <chr [1]> <chr [1]> <NULL> <chr [2]> <chr [1]> <chr [1]> <chr [1]> ## 8 <chr [1]> <chr [1]> <chr [1]> <chr [4]> <chr [1]> <chr [1]> <chr [1]> ## 9 <chr [5]> <chr [11]> <chr [1]> <chr [1]> <chr [4]> <chr [6]> <chr [1]> ## 10 <chr [4]> <chr [5]> <chr [2]> <chr [1]> <chr [3]> <chr [5]> <chr [1]> ## # … with 20 more rows ``` --- ## Choose your own adventure <img src="https://images-na.ssl-images-amazon.com/images/I/81E88dflPeL.jpg" width="30%" style="display: block; margin: auto;" /> --- class: inverse, middle # Every appearance per book/season --- count: false ## Every appearance per book/season .panel1-got-appearances-auto[ ```r *select( * .data = chars2, * name, books, tvSeries *) ``` ] .panel2-got-appearances-auto[ ``` ## # A tibble: 30 × 3 ## name books tvSeries ## <chr> <list> <list> ## 1 Theon Greyjoy <chr [3]> <chr [6]> ## 2 Tyrion 
Lannister <chr [2]> <chr [6]> ## 3 Victarion Greyjoy <chr [3]> <chr [1]> ## 4 Will <chr [1]> <chr [1]> ## 5 Areo Hotah <chr [3]> <chr [2]> ## 6 Chett <chr [2]> <chr [1]> ## 7 Cressen <chr [2]> <chr [1]> ## 8 Arianne Martell <chr [4]> <chr [1]> ## 9 Daenerys Targaryen <chr [1]> <chr [6]> ## 10 Davos Seaworth <chr [1]> <chr [5]> ## # … with 20 more rows ``` ] --- count: false ## Every appearance per book/season .panel1-got-appearances-auto[ ```r select( .data = chars2, name, books, tvSeries ) %>% * pivot_longer( * cols = c(books, tvSeries), * names_to = "media", * values_to = "value" * ) ``` ] .panel2-got-appearances-auto[ ``` ## # A tibble: 60 × 3 ## name media value ## <chr> <chr> <list> ## 1 Theon Greyjoy books <chr [3]> ## 2 Theon Greyjoy tvSeries <chr [6]> ## 3 Tyrion Lannister books <chr [2]> ## 4 Tyrion Lannister tvSeries <chr [6]> ## 5 Victarion Greyjoy books <chr [3]> ## 6 Victarion Greyjoy tvSeries <chr [1]> ## 7 Will books <chr [1]> ## 8 Will tvSeries <chr [1]> ## 9 Areo Hotah books <chr [3]> ## 10 Areo Hotah tvSeries <chr [2]> ## # … with 50 more rows ``` ] --- count: false ## Every appearance per book/season .panel1-got-appearances-auto[ ```r select( .data = chars2, name, books, tvSeries ) %>% pivot_longer( cols = c(books, tvSeries), names_to = "media", values_to = "value" ) %>% * unnest_longer(col = value) ``` ] .panel2-got-appearances-auto[ ``` ## # A tibble: 180 × 3 ## name media value ## <chr> <chr> <chr> ## 1 Theon Greyjoy books A Game of Thrones ## 2 Theon Greyjoy books A Storm of Swords ## 3 Theon Greyjoy books A Feast for Crows ## 4 Theon Greyjoy tvSeries Season 1 ## 5 Theon Greyjoy tvSeries Season 2 ## 6 Theon Greyjoy tvSeries Season 3 ## 7 Theon Greyjoy tvSeries Season 4 ## 8 Theon Greyjoy tvSeries Season 5 ## 9 Theon Greyjoy tvSeries Season 6 ## 10 Tyrion Lannister books A Feast for Crows ## # … with 170 more rows ``` ] <style> .panel1-got-appearances-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; 
font-size: 80% } .panel2-got-appearances-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-got-appearances-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- class: inverse, middle # Match character's title to their name --- count: false ## Match character's title to their name .panel1-got-title-name-auto[ ```r *select( * .data = chars2, * name, * title = titles *) ``` ] .panel2-got-title-name-auto[ ``` ## # A tibble: 30 × 2 ## name title ## <chr> <list> ## 1 Theon Greyjoy <chr [3]> ## 2 Tyrion Lannister <chr [2]> ## 3 Victarion Greyjoy <chr [2]> ## 4 Will <chr [1]> ## 5 Areo Hotah <chr [1]> ## 6 Chett <chr [1]> ## 7 Cressen <chr [1]> ## 8 Arianne Martell <chr [1]> ## 9 Daenerys Targaryen <chr [5]> ## 10 Davos Seaworth <chr [4]> ## # … with 20 more rows ``` ] --- count: false ## Match character's title to their name .panel1-got-title-name-auto[ ```r select( .data = chars2, name, title = titles ) %>% * unnest_longer(col = title) ``` ] .panel2-got-title-name-auto[ ``` ## # A tibble: 60 × 2 ## name title ## <chr> <chr> ## 1 Theon Greyjoy "Prince of Winterfell" ## 2 Theon Greyjoy "Captain of Sea Bitch" ## 3 Theon Greyjoy "Lord of the Iron Islands (by law of the green lands)" ## 4 Tyrion Lannister "Acting Hand of the King (former)" ## 5 Tyrion Lannister "Master of Coin (former)" ## 6 Victarion Greyjoy "Lord Captain of the Iron Fleet" ## 7 Victarion Greyjoy "Master of the Iron Victory" ## 8 Will "" ## 9 Areo Hotah "Captain of the Guard at Sunspear" ## 10 Chett "" ## # … with 50 more rows ``` ] <style> .panel1-got-title-name-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-got-title-name-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-got-title-name-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; 
font-size: 80% } </style> --- # May the force be with you <img src="https://media.giphy.com/media/C0ZArORmrDQCRTIFnQ/giphy.gif" width="80%" style="display: block; margin: auto;" />
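---

## Sketch: rectangling the OMDb response

The "Whoops" slide shows why `content(sharknado) %>% as_tibble()` returns two rows per film. `Ratings` is a list of two ratings, so every other field gets recycled. What follows is one possible way to apply this deck's rectangling tools to that response, assuming tibble and tidyr are installed. `sharknado_list` is a hypothetical, trimmed stand-in for the parsed response. Its few fields are copied from that `str()` output, so it is not the full API response.

```r
library(tibble)
library(tidyr)

# Hypothetical, trimmed stand-in for content(sharknado): a named list whose
# Ratings element is itself a list of two {Source, Value} lists
sharknado_list <- list(
  Title = "Sharknado",
  Year = "2013",
  imdbID = "tt2724064",
  Ratings = list(
    list(Source = "Internet Movie Database", Value = "3.3/10"),
    list(Source = "Rotten Tomatoes", Value = "74%")
  )
)

# One row per film: wrap the whole response in a list-column, then spread its
# top-level fields into columns; Ratings stays behind as a list-column
film <- tibble(response = list(sharknado_list)) %>%
  unnest_wider(col = response)

# One row per rating: keep the film ID, give each rating its own row,
# then spread Source and Value into separate columns
ratings <- film[, c("imdbID", "Ratings")] %>%
  unnest_longer(col = Ratings) %>%
  unnest_wider(col = Ratings)

film
ratings
```

This splits the result into a film-level table and a ratings table linked by `imdbID`, so film fields are no longer repeated once per rating. With a live request, the same two steps could be applied to `content(response)` inside a function like `omdb_api()`.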