Let’s convert the politicians dataset into a tibble. It is a list with one element per politician:
library(tibblify)
str(politicians[1])
#> List of 1
#> $ :List of 8
#> ..$ id : int 1
#> ..$ name : chr "Barack"
#> ..$ surname : chr "Obama"
#> ..$ dob : chr "1961-08-04"
#> ..$ n_children: num 2
#> ..$ parents :List of 2
#> .. ..$ mother: chr "Ann Dunham"
#> .. ..$ father: chr "Barack Obama Sr."
#> ..$ spouses :List of 1
#> .. ..$ : chr "Michelle Robinson"
#> ..$ offices :List of 2
#> .. ..$ :List of 2
#> .. .. ..$ name : chr "President of the United States"
#> .. .. ..$ start: chr "2009-01-20"
#> .. ..$ :List of 2
#> .. .. ..$ name : chr "United States Senator from Illinois"
#> .. .. ..$ start: chr "2005-01-03"
We can let tibblify() automatically recognize the structure of the list and find an appropriate representation as a tibble:
politicians_tibble <- tibblify(politicians)
politicians_tibble
#> # A tibble: 2 x 8
#> id name surname dob n_children parents$mother $father spouses offices
#> <int> <chr> <chr> <chr> <dbl> <chr> <chr> <list<> <list<t>
#> 1 1 Barack Obama 1961-… 2 Ann Dunham Barack… [1] [2 × 2]
#> 2 2 Boris Johnson 1964-… NA <NA> Stanle… [2] [3 × 2]
The parents column is a tibble with the columns mother and father because in the original list the field parents is a named list.
politicians_tibble$parents
#> # A tibble: 2 x 2
#> mother father
#> <chr> <chr>
#> 1 Ann Dunham Barack Obama Sr.
#> 2 <NA> Stanley Johnson
The spouses column is a list_of character because the spouses field is a list whose elements are all character vectors:
politicians_tibble$spouses
#> <list_of<character>[2]>
#> [[1]]
#> [1] "Michelle Robinson"
#>
#> [[2]]
#> [1] "Allegra Mostyn-Owen" "Marina Wheeler"
In the above example we used tibblify() without any further specification of how to convert the list into a tibble. This is quite useful in an interactive session, but often you want to provide a specification yourself, for example to rename columns, to extract only some of the fields, or to control default values and column types.
First, we use get_spec() to view the specification used to convert our list to a tibble:
get_spec(politicians_tibble)
#> lcols(
#> id = lcol_int("id"),
#> name = lcol_chr("name"),
#> surname = lcol_chr("surname"),
#> dob = lcol_chr("dob"),
#> n_children = lcol_dbl("n_children", .default = NA),
#> parents = lcol_df(
#> "parents",
#> mother = lcol_chr("mother", .default = NA),
#> father = lcol_chr("father")
#> ),
#> spouses = lcol_lst_of(
#> "spouses",
#> .ptype = character(0),
#> .parser = ~vec_c(!!!.x, .ptype = character()),
#> .default = NULL
#> ),
#> offices = lcol_df_lst(
#> "offices",
#> name = lcol_chr("name"),
#> start = lcol_chr("start")
#> )
#> )
A specification always starts with a call to lcols() (similar to readr::cols()). Then you specify the columns you want with name-value pairs. The name is the name of the resulting column and the value is a specification created with one of the lcol_*() functions.
The first argument to lcol_*() is always a path that describes where to find the element. The syntax is the same as the one purrr::map() uses to extract fields. Some examples:
leader <- politicians[[1]]
# get the element `id`
path <- c("id")
leader[["id"]]
#> [1] 1
# get the element `father` in the element `parents`
path <- c("parents", "father")
leader[["parents"]][["father"]]
#> [1] "Barack Obama Sr."
# get the first element in the element `spouses`
path <- list("spouses", 1)
leader[["spouses"]][[1]]
#> [1] "Michelle Robinson"
A couple of typical vector types have a predefined extractor:
lcol_chr(): create a character column.
lcol_lgl(): create a logical column.
lcol_int(): create an integer column.
lcol_dbl(): create a double column.
lcol_dat(): create a date column.
lcol_dtt(): create a datetime column.
See parsing other types to create a column of your own prototype.
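The path syntax described above can be tried outside of tibblify with purrr::pluck() and purrr::chuck(). This is only a sketch of the extraction semantics, not of tibblify's internals; the list leader_demo is a hypothetical, hand-built stand-in for one element of politicians.

```r
library(purrr)

# Hypothetical stand-in for one element of `politicians`
leader_demo <- list(
  id = 1L,
  parents = list(mother = "Ann Dunham", father = "Barack Obama Sr."),
  spouses = list("Michelle Robinson")
)

# A character path walks through nested named elements
father <- pluck(leader_demo, "parents", "father")

# A list path can mix names and positions; splice it in with `!!!`
first_spouse <- pluck(leader_demo, !!!list("spouses", 1))

# pluck() returns NULL for an absent element, whereas chuck() throws an error
missing_pluck <- pluck(leader_demo, "dob")
missing_chuck <- tryCatch(
  chuck(leader_demo, "dob"),
  error = function(e) "error"
)
```

The difference between pluck() and chuck() is the relevant one here: a strict extractor fails on absent elements, while a lenient one needs some notion of a default.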
tibblify(
politicians,
lcols(
lcol_int("id"),
lcol_chr("name"),
`family name` = lcol_chr("surname")
)
)
#> # A tibble: 2 x 3
#> id name `family name`
#> <int> <chr> <chr>
#> 1 1 Barack Obama
#> 2 2 Boris Johnson
If an element doesn’t exist, an error is thrown, as in purrr::chuck(). To use a default value instead of throwing an error, use the .default argument. The .default value is also used when the element at the path is empty:
list_default <- list(
list(a = 1),
list(a = NULL),
list(a = integer()),
list()
)
tibblify(
list_default,
lcols(lcol_int("a"))
)
#> Error: empty or absent element at path a
tibblify(
list_default,
lcols(lcol_int("a", .default = 0))
)
#> # A tibble: 4 x 1
#> a
#> <int>
#> 1 1
#> 2 0
#> 3 0
#> 4 0
When the cast is not possible with vctrs::vec_cast(), you can use the .parser argument to supply a custom parser. It is passed to rlang::as_function(), so you can use a function or a formula. A typical use case is dates stored as strings.
tibblify(
politicians,
lcols(
lcol_chr("surname"),
lcol_dat("dob", .parser = ~ as.Date(.x, format = "%Y-%m-%d"))
)
)
#> # A tibble: 2 x 2
#> surname dob
#> <chr> <date>
#> 1 Obama 1961-08-04
#> 2 Johnson 1964-06-19
A list_of is a list where each element has the same prototype. It is useful when a field has more than one element, as in the spouses field.
spouses_tbl <- tibblify(
politicians,
lcols(
lcol_chr("surname"),
lcol_lst_of("spouses", .ptype = character())
)
)
spouses_tbl$spouses
#> <list_of<character>[2]>
#> [[1]]
#> [[1]][[1]]
#> [1] "Michelle Robinson"
#>
#>
#> [[2]]
#> [[2]][[1]]
#> [1] "Allegra Mostyn-Owen"
#>
#> [[2]][[2]]
#> [1] "Marina Wheeler"
You can use tidyr::unnest() or tidyr::unnest_longer() to flatten these columns to regular columns.
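As a minimal sketch of that flattening step, the following builds a small tibble by hand and passes it to tidyr::unnest_longer(). The tibble spouses_demo is a hypothetical stand-in that uses a plain list column rather than the list_of column produced by tibblify(), so the example runs without tibblify.

```r
library(tibble)
library(tidyr)

# Hypothetical stand-in: one character vector of spouses per politician
spouses_demo <- tibble(
  surname = c("Obama", "Johnson"),
  spouses = list(
    "Michelle Robinson",
    c("Allegra Mostyn-Owen", "Marina Wheeler")
  )
)

# One row per spouse; `surname` is repeated as needed
spouses_long <- unnest_longer(spouses_demo, spouses)
spouses_long
```

The result has three rows, and spouses becomes a regular character column.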
A list column is used when you have a field with mixed elements.
Analogous to readr::col_guess() and readr::col_skip(), you can guess the column type with lcol_guess() or skip a field with lcol_skip(). Skipping a column can be useful when you set a default column type, or when you want to make clear that you know about the field and intentionally skip it.
Guessing a column is useful in interactive sessions but you shouldn’t rely on it in automated scripts.
If a field is a named list where each element has length 1 or 0, the field is converted to a tibble column. This is, for example, the case for the parents field:
leaders_tibble <- tibblify(
politicians,
lcols(
lcol_chr("surname"),
lcol_guess("parents")
)
)
leaders_tibble
#> # A tibble: 2 x 2
#> surname parents$mother $father
#> <chr> <chr> <chr>
#> 1 Obama Ann Dunham Barack Obama Sr.
#> 2 Johnson <NA> Stanley Johnson
Tibble columns are a relatively new concept in the tidyverse. You can unpack a tibble column into regular columns with tidyr::unpack().
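To illustrate the unpacking step, here is a small sketch with a hand-built tibble. The object parents_demo is a hypothetical stand-in that mirrors the shape of the result above (a parents tibble column with mother and father) without calling tibblify().

```r
library(tibble)
library(tidyr)

# Hypothetical stand-in: `parents` is itself a tibble with two columns
parents_demo <- tibble(
  surname = c("Obama", "Johnson"),
  parents = tibble(
    mother = c("Ann Dunham", NA),
    father = c("Barack Obama Sr.", "Stanley Johnson")
  )
)

# Replace the tibble column by its inner columns
parents_flat <- unpack(parents_demo, parents)
names(parents_flat)
```

After unpacking, mother and father are ordinary top-level columns next to surname. If the inner names could clash with existing columns, the names_sep argument of unpack() can be used to prefix them with the outer column name.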
tibblify provides shortcuts for a couple of common types. To parse a vector or record type without a parser, use lcol_vec(). Let’s say you have a list with difftimes:
now <- Sys.time()
past <- now - c(100, 200)
x <- list(
list(timediff = now - past[1]),
list(timediff = now - past[2])
)
x
#> [[1]]
#> [[1]]$timediff
#> Time difference of 1.666667 mins
#>
#>
#> [[2]]
#> [[2]]$timediff
#> Time difference of 3.333333 mins
You need to define a prototype:
ptype <- as.difftime(0, units = "secs")
ptype
#> Time difference of 0 secs
and then use it in lcol_vec():
tibblify(
x,
lcols(
lcol_vec("timediff", ptype = ptype)
)
)
#> # A tibble: 2 x 1
#> timediff
#> <drtn>
#> 1 100 secs
#> 2 200 secs
You can use the .default argument of lcols() to define a parser used for all unspecified fields.
tibblify(
politicians,
lcols(
lcol_chr("name"),
lcol_chr("surname"),
.default = lcol_lst(path = zap(), .default = NULL)
)
)
#> # A tibble: 2 x 8
#> name surname id dob n_children parents spouses offices
#> <chr> <chr> <list> <list> <list> <list> <list> <list>
#> 1 Barack Obama <int [1… <chr [1… <dbl [1]> <named list [… <list [1… <list [2…
#> 2 Boris Johnson <int [1… <chr [1… <NULL> <named list [… <list [2… <list [3…