This vignette details how to effectively use the
{cmdstanr} package within a {rixpress}
pipeline for Bayesian statistical modelling with Stan. For a general
introduction to {rixpress} and its core concepts, please
refer to vignette("intro-concepts") and
vignette("core-functions").
{cmdstanr} provides a user-friendly R interface to
cmdstan, Stan’s command-line interface. While powerful, its
reliance on external processes and file system interactions requires
careful handling within the hermetic build environment of
{rixpress}.
As with any {rixpress} pipeline, the first step is to
define the execution environment using {rix}:
library(rix)

rix(
  date = "2025-04-29",
  r_pkgs = c("readr", "dplyr", "ggplot2"), # Add other R packages as needed
  system_pkgs = "cmdstan", # Crucial: include cmdstan as a system dependency
  git_pkgs = list(
    list(
      package_name = "cmdstanr",
      repo_url = "https://github.com/stan-dev/cmdstanr",
      commit = "79d37792d8e4ffcf3cf721b8d7ee4316a1234b0c" # Pin to a specific commit
    ),
    list(
      package_name = "rixpress",
      repo_url = "https://github.com/ropensci/rixpress",
      commit = "HEAD" # Or pin to a specific commit
    )
  ),
  ide = "none", # Or your preferred IDE
  project_path = ".",
  overwrite = TRUE
)

Key points in this environment definition:
cmdstan is included in system_pkgs. This makes the cmdstan
executables available to the pipeline.
{cmdstanr} is installed from its GitHub repository, as it is not
available on CRAN. Pinning to a specific commit is recommended for
maximum reproducibility.

With the environment set up, we can define the pipeline:
The Stan model code itself should reside in a .stan
file. We use rxp_r_file() to bring its contents into the
pipeline as a character string.
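The contents of the .stan file are not shown here. As a rough sketch, the snippet below writes a hypothetical model.stan for a simple linear regression. The file contents are an assumption, chosen to match the data list (N, x, y) and the parameters (alpha, beta, sigma) used later in this vignette; they are not necessarily the original model.

```r
# Hypothetical contents for model.stan: a linear regression whose data block
# matches list(N = ..., x = ..., y = ...). This is an assumption, not the
# original file.
stan_code <- c(
  "data {",
  "  int<lower=0> N;",
  "  vector[N] x;",
  "  vector[N] y;",
  "}",
  "parameters {",
  "  real alpha;",
  "  real beta;",
  "  real<lower=0> sigma;",
  "}",
  "model {",
  "  y ~ normal(alpha + beta * x, sigma);",
  "}"
)

# Write the model next to the pipeline definition
stan_path <- "model.stan"
writeLines(stan_code, con = stan_path)

# Reading it back with readLines() gives one string per line
model_lines <- readLines(stan_path)
```

In the pipeline, a step along the lines of rxp_r_file(name = bayesian_linear_regression_model, path = "model.stan", read_function = readLines) would then expose these lines under the target name used later. Treat the argument names as an assumption and check the {rixpress} reference before relying on them.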
Next, we define parameters and simulate some data for our model.
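Before wiring this into the pipeline, the same data-generating process can be sketched in plain R to check that the simulated data behave as expected. The seed and the lm() comparison are illustrative assumptions and are not part of the pipeline.

```r
# Plain-R sketch of the simulation, outside of any {rixpress} step.
# The seed is a hypothetical choice for this illustration only.
set.seed(123)

parameters <- list(N = 100, alpha = 2, beta = -0.5, sigma = 1.e-1)

x <- rnorm(parameters$N, 0, 1)
y <- rnorm(
  n = parameters$N,
  mean = parameters$alpha + parameters$beta * x,
  sd = parameters$sigma
)

# Data list in the shape cmdstanr expects for the model
inputs <- list(N = parameters$N, x = x, y = y)

# Ordinary least squares should land close to alpha and beta
ols <- coef(lm(y ~ x))
print(round(ols, 2))
```

The pipeline expresses the same steps as individual derivations: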
rxp_r(
  parameters,
  list(
    N = 100,
    alpha = 2,
    beta = -0.5,
    sigma = 1.e-1
  )
),
rxp_r(
  x,
  rnorm(parameters$N, 0, 1)
),
rxp_r(
  y,
  rnorm(
    n = parameters$N,
    mean = parameters$alpha + parameters$beta * x,
    sd = parameters$sigma
  )
),
rxp_r(
  # Prepare the data list for cmdstanr
  inputs,
  list(N = parameters$N, x = x, y = y)
),

Interfacing with cmdstan from within
{rixpress} requires a specific strategy due to the hermetic
nature of Nix sandboxes. We’ll use a wrapper function to handle model
compilation and sampling within a single rxp_r() step.
First, let’s define the wrapper function (e.g., in a
functions.R file that we’ll include):
# In functions.R
cmdstan_model_wrapper <- function(
  stan_string, # The Stan model code as a character vector
  inputs,      # Data list for the model
  seed,        # Seed for reproducibility
  ...          # Additional arguments forwarded to model$sample()
) {
  # Create a temporary .stan file within the sandbox
  stan_file <- tempfile(pattern = "model_", fileext = ".stan")
  writeLines(stan_string, con = stan_file)

  # Compile the Stan model
  # cmdstanr will find cmdstan via the CMDSTAN environment variable
  model <- cmdstanr::cmdstan_model(stan_file = stan_file)

  # Sample from the posterior. `...` is forwarded only to $sample():
  # an argument meant for sampling (e.g. chains) is not a valid
  # compile argument, so sending it to both calls would fail.
  fitted_model <- model$sample(
    data = inputs,
    seed = seed,
    ...
  )

  return(fitted_model)
}

Now, we use this wrapper in our pipeline:
# ... (continuation of pipeline_steps list)
rxp_r(
  model, # Target name for the fitted model object
  cmdstan_model_wrapper(
    stan_string = bayesian_linear_regression_model,
    inputs = inputs,
    seed = 22
  ),
  user_functions = "functions.R",
  encoder = "save_model",
  env_var = c("CMDSTAN" = "${defaultPkgs.cmdstan}/opt/cmdstan")
)

Explanation of the Wrapper Approach:
stan_string = bayesian_linear_regression_model: We pass the model
code (read by rxp_r_file()) as a string to our wrapper.
writeLines(stan_string, con = stan_file): Inside the wrapper, the
Stan code is written to a temporary .stan file. This file exists
within the sandbox of the current rxp_r() step. This is crucial
because cmdstan_model() needs a file path. Attempting to pass the
original model.stan path directly via additional_files to
cmdstan_model() can lead to permission or path issues when cmdstan
tries to compile it from a different working directory or context.
cmdstanr::cmdstan_model(): Compiles the model from the temporary
stan_file.
model$sample(): Samples from the compiled model.
Single rxp_r() step: Compilation and sampling both happen within the
same rxp_r() step (and thus the same sandbox). This is because the
model object returned by cmdstan_model() contains paths to the
compiled executable. If these were separate steps, the paths from the
compilation sandbox wouldn't be valid in the sampling sandbox.
env_var = c("CMDSTAN" = "${defaultPkgs.cmdstan}/opt/cmdstan"): This
sets the CMDSTAN environment variable within the sandbox for this
specific step. {cmdstanr} uses this variable to locate the cmdstan
installation. The ${defaultPkgs.cmdstan} is a Nix interpolation that
resolves to the path of the cmdstan package in the Nix store. If the
environment providing cmdstan were named differently, for example
cmdstan-env.nix, then you would need to use
${cmdstan_envPkgs.cmdstan}.

{cmdstanr} provides a specific method for saving fitted
model objects to ensure all necessary components are preserved. We
define a simple wrapper for this to use with
{rixpress}.
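The encoder definition itself is not shown in this section. A minimal sketch follows, assuming {rixpress} calls the encoder with the object to save and the output path as its first two arguments, in the same way it would call saveRDS(). The argument names fitted_model and path are illustrative choices. The function would live in functions.R alongside the model wrapper.

```r
# In functions.R
# Sketch of a custom encoder: delegate saving to the fitted object's own
# $save_object() method instead of calling saveRDS() on it directly.
# Assumption: the encoder is called as save_model(object, path).
save_model <- function(fitted_model, path, ...) {
  fitted_model$save_object(file = path, ...)
}
```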
By specifying encoder = "save_model" in the
rxp_r() call, {rixpress} will use this
function instead of the default saveRDS(). The fitted model
can then be read using rxp_read("model"), which will
internally use readRDS().
Using {cmdstanr} with {rixpress} involves
these key considerations:
Include cmdstan in system_pkgs and
{cmdstanr} (from Git) in your {rix}
environment definition.
Read your .stan file into the pipeline using
rxp_r_file().
Implement a wrapper function that takes the Stan code as a string,
writes it to a temporary .stan file inside the wrapper, calls
cmdstanr::cmdstan_model() on this temporary file, and calls
model$sample() to fit the model.
Perform model compilation and sampling within the same
rxp_r() call using the wrapper.
Set the CMDSTAN environment variable for the
rxp_r() step that runs the wrapper, pointing to the Nix
store path of cmdstan.
Use {cmdstanr}’s $save_object() method
via a custom encoder for robust saving of the fitted
model.
This approach ensures that cmdstan can operate correctly
within the isolated and reproducible environment provided by
{rixpress} and Nix.
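When a step fails because cmdstan cannot be located, it can help to check the CMDSTAN variable explicitly before compiling. The helper below is a hypothetical diagnostic, not part of {rixpress} or {cmdstanr}. It uses only base R and could be called at the top of the wrapper function.

```r
# Hypothetical helper: fail early with a clear message if CMDSTAN is unset
# or points to a directory that does not exist in the current sandbox.
check_cmdstan_env <- function() {
  cmdstan_dir <- Sys.getenv("CMDSTAN", unset = "")
  if (!nzchar(cmdstan_dir)) {
    stop(
      "CMDSTAN is not set; pass env_var to the rxp_r() step.",
      call. = FALSE
    )
  }
  if (!dir.exists(cmdstan_dir)) {
    stop(
      "CMDSTAN points to a missing directory: ", cmdstan_dir,
      call. = FALSE
    )
  }
  # Return the path invisibly so the helper can be used for its side effect
  invisible(cmdstan_dir)
}
```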