
Transfer REDCap data to a database in chunks and use in R without exceeding available memory. Whether you have a large or small project, enjoy features such as data labeling, converting coded values, and hearing a “quack” sound on success. Compatible with all databases but optimized for DuckDB.
Is your project outgrowing your computer? Have you seen this error when using the REDCap API to retrieve data?
Error: vector memory limit of 16.0 GB reached, see mem.maxVSize()
What does it mean? Well, R objects a stored in your random access memory (RAM). When your data gets too big, you hit your memory limit. redquack’s solution to this error is to store the data out of memory in a local database for easy retrieval in R.
The solution:
API requests are handled by httr2, which persistently retries to ensure your data is transferred successfully.
From CRAN:
# install.packages("pak")
pak::pak("redquack")From GitHub (development version):
pak::pak("dylanpieper/redquack")These packages are also used in the examples:
pak::pak(c("keyring", "dplyr"))Your API token allows R to interface with your REDCap instance, and it should be stored securely. I recommend using the keyring package to store your API token. For example:
keyring::key_set("redcap", "test")Use this token to run the examples: 9A81268476645C4E5F03428B8AC3AA7B
Data from REDCap is transferred to a database connection in chunks of record IDs:
library(redquack)
library(dplyr)
conn <- use_duckdb()
result <- redcap_to_db(
conn,
url = "https://bbmc.ouhsc.edu/redcap/api/",
token = keyring::key_get("redcap", "test")
)redcap_to_db() returns a list of metadata with class
redcap_transfer_result:
success: Logical if the transfer was completed with no
failed processingerror_chunks: Vector of chunk numbers that failed
processingtime_s: Numeric value for total seconds to process all
dataThe database created by redcap_to_db() contains up to
four tables:
data: Contains the raw REDCap records
data <- tbl_redcap(conn) |> collect()metadata: Contains project metadata for labeling and
coded value conversion
meta <- metadata(conn)redcap_log: Contains REDCap audit logs from the past
week (default)
redcap_log <- redcap_log(conn)transfer_log: Contains operation logs of the
transfer process
transfer_log <- transfer_log(conn)Data is initially inserted into the database as VARCHAR/TEXT for consistent handling across chunks.
For DuckDB, data types are optimized after transfer to improve query performance:
You can also query the database to inspect the data types:
inspect(conn)Retrieve and manipulate your REDCap data with familiar dplyr verbs and tbl-like
syntax. The key difference is you reference the database table first
using tbl_redcap().
Retrieve your REDCap data as a single table:
data <- tbl_redcap(conn) |>
collect()Retrieve data organized by REDCap instruments as a list:
instruments <- tbl_redcap(conn) |>
collect_list() Assign instrument data frames to the global environment:
tbl_redcap(conn) |>
collect_list() |>
list_to_env()Use dplyr verbs to filter and select data before collecting:
# Filter to specific participants
nutmouse_data <- tbl_redcap(conn) |>
filter(name_last == "Nutmouse") |>
collect_labeled_list()
# Select specific columns across all instruments
key_vars <- tbl_redcap(conn) |>
select(record_id, email, sex, bmi) |>
collect_list()
# Simple column selection
analysis_data <- tbl_redcap(conn) |>
select(email, sex) |>
collect_labeled()Perform complex data manipulation with grouping and filtering:
# Group by sex and filter to below-average BMI
below_avg_bmi <- tbl_redcap(conn) |>
select(record_id, sex, bmi) |>
group_by(sex) |>
filter(bmi < mean(bmi)) |>
arrange(bmi) |>
collect_list()Apply column and coded value labels and control value conversion
(i.e., choice mappings). collect_labeled() and
collect_labeled_list() give you control over how these
labels are applied:
Full labeling and coded value conversion (default):
data <- tbl_redcap(conn) |> collect_labeled()
data$sex
#> [1] "Female" "Male" "Male" "Female" "Male"
attr(,"label")
#> [1] "Gender"Keep raw coded values:
data <- tbl_redcap(conn) |> collect_labeled(convert = FALSE)
data$sex
#> <labelled<integer>[5]>: Gender
#> [1] 0 1 1 0 1
#>
#> Labels:
#> value label
#> 0 Female
#> 1 MaleColumn labels only:
data <- tbl_redcap(conn) |> collect_labeled(vals = FALSE)
data$sex
#> [1] 0 1 1 0 1
#> attr(,"label")
#> [1] "Gender"Value conversion only:
data <- tbl_redcap(conn) |> collect_labeled(cols = FALSE)
data$sex
#> [1] "Female" "Male" "Male" "Female" "Male"
attr(data$sex, "label")
#> NULLYou can also write a Parquet file directly from DuckDB. A Parquet file will be much smaller than a DuckDB file and easy to share:
save_parquet(conn, "redcap.parquet")When you’re finished working with your data, clean up your connection:
close_duckdb(conn)Or, if you do not need to access the data again, remove/delete the file:
remove_duckdb(conn)