--- title: "PSsurvival: Propensity Score Methods for Survival Analysis" output: rmarkdown::html_vignette: toc: true toc_depth: 3 vignette: > %\VignetteIndexEntry{PSsurvival: Propensity Score Methods for Survival Analysis} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 6, fig.height = 4 ) ``` ## Introduction PSsurvival implements propensity score methods for causal inference with time-to-event outcomes. The package provides two main functions: - **`surveff()`**: Estimates counterfactual survival functions and survival differences - **`marCoxph()`**: Estimates marginal hazard ratios via weighted Cox regression Both functions support binary and multiple treatment groups, multiple weighting schemes, and propensity score trimming. ```{r load-package} library(PSsurvival) ``` ## Data The package includes two simulated datasets: ```{r data} data(simdata_bin) # Binary treatment (A vs B) data(simdata_multi) # Four treatment groups (A, B, C, D) # Binary treatment data head(simdata_bin) table(simdata_bin$Z) # Multiple treatment data table(simdata_multi$Z) ``` Both datasets contain: - `X1`, `X2`, `X3`: Continuous covariates - `B1`, `B2`: Binary covariates - `Z`: Treatment group - `time`: Follow-up time (range 0-20) - `event`: Event indicator (1 = event, 0 = censored) ## Counterfactual Survival Analysis with `surveff()` ### Key Arguments **Data and model specification:** - `data`: Data frame containing treatment, outcome, and covariates - `ps_formula`: Propensity score model (`treatment ~ covariates`) - `censoring_formula`: Censoring model (`Surv(time, event) ~ covariates`) - `eval_times`: Time points for evaluation (default: all unique event times) - `contrast_matrix`: Matrix for pairwise comparisons (multiple groups only; column names must match treatment levels) **Weighting and trimming:** - `estimand`: Target estimand ("ATE", "ATT", or "overlap") - `att_group`: Target group for ATT (required if `estimand = "ATT"`) - `trim`: Trimming method ("symmetric" or "asymmetric", default: no trimming) - `delta`: Threshold for symmetric trimming - `alpha`: Percentile for asymmetric trimming **Variance estimation:** - `censoring_method`: Method for censoring adjustment ("weibull" or "cox") - `variance_method`: Variance estimation ("analytical" or "bootstrap") - `boot_level`: Bootstrap sampling level ("full" or "strata") - `B`: Number of bootstrap iterations - `parallel`, `mc.cores`, `seed`: Parallel computing and reproducibility ### Basic Usage: Binary Treatment ```{r surveff-basic} result_bin <- surveff( data = simdata_bin, ps_formula = Z ~ X1 + X2 + X3 + B1 + B2, censoring_formula = survival::Surv(time, event) ~ X1 + B1, estimand = "overlap", censoring_method = "weibull" ) print(result_bin, max.len = 3) ``` The `ps_formula` specifies the propensity score model (treatment ~ covariates). The `censoring_formula` models the censoring distribution within each treatment group. ### Multiple Treatment Groups For multiple groups, survival differences require a `contrast_matrix` specifying which comparisons to make. Each row represents one contrast with exactly two non-zero elements: 1 and -1. **Column names must match treatment levels exactly.** ```{r surveff-multi} # Define pairwise comparisons contrast_mat <- matrix( c(1, -1, 0, 0, # A vs B 1, 0, -1, 0, # A vs C 1, 0, 0, -1), # A vs D nrow = 3, byrow = TRUE ) colnames(contrast_mat) <- c("A", "B", "C", "D") # Must match treatment levels rownames(contrast_mat) <- c("A vs B", "A vs C", "A vs D") # row names are optional result_multi <- surveff( data = simdata_multi, ps_formula = Z ~ X1 + X2 + X3 + B1 + B2, censoring_formula = survival::Surv(time, event) ~ X1 + B1, estimand = "ATE", contrast_matrix = contrast_mat, censoring_method = "cox", variance_method = "bootstrap", B = 50, seed = 123 ) print(result_multi, max.len = 3) ``` Without `contrast_matrix`, only group-specific survival functions are estimated (no differences in survivals). ### Estimand Options Three estimands are supported and estimated by corresponding weighting methods: - **ATE** (default): Average treatment effect for the entire population using inverse probability weighting (IPW) - **ATT**: Average treatment effect on the treated (requires `att_group`) using IPW with target-group specific tilting - **overlap**: Targets the overlap population where treatment groups have most similar covariate distributions using overlap weights ```{r estimands, eval=FALSE} # ATE (default) surveff(..., estimand = "ATE") # ATT targeting group B surveff(..., estimand = "ATT", att_group = "B") # Overlap weights surveff(..., estimand = "overlap") ``` ### Propensity Score Trimming Trimming excludes observations with extreme propensity scores. Not available with overlap weights (which already downweight extremes). - **Symmetric** (Crump): Removes observations with PS below `delta` or above `1-delta` for any treatment - **Asymmetric** (Sturmer): Removes based on treatment-specific quantiles ```{r trimming, eval=FALSE} # Symmetric trimming surveff(..., estimand = "ATE", trim = "symmetric", delta = 0.1) # Asymmetric trimming surveff(..., estimand = "ATE", trim = "asymmetric", alpha = 0.05) ``` ### Censoring Methods Two methods are available for estimating censoring distributions: - **`"weibull"`** (default): Parametric Weibull AFT model. Efficient when censoring follows Weibull distribution. Allows analytical variance for binary treatment. - **`"cox"`**: Semi-parametric Cox proportional hazards model. More flexible but requires bootstrap variance. ```{r censoring-methods, eval=FALSE} # Weibull censoring model surveff(..., censoring_method = "weibull") # Cox censoring model (requires bootstrap) surveff(..., censoring_method = "cox", variance_method = "bootstrap", B = 200) ``` **No censoring adjustment**: To skip censoring adjustment entirely (e.g., when there is no censoring), use `censoring_formula = Surv(time, event) ~ 0`. In this case, all censoring scores are set to 1. ### Variance Estimation Two variance estimation methods are available: - **`"analytical"`**: M-estimation theory-based variance. Only available for binary treatment with Weibull censoring. Fast and efficient when applicable. - **`"bootstrap"`**: Resamples the entire estimation pipeline. Required for Cox censoring or multiple treatment groups. More computationally intensive but broadly applicable. ```{r variance-methods, eval=FALSE} # Analytical variance (binary + weibull only) surveff(..., censoring_method = "weibull", variance_method = "analytical") # Bootstrap variance surveff(..., variance_method = "bootstrap", B = 200) ``` **Bootstrap sampling level** (`boot_level`): Controls how bootstrap samples are drawn. - **`"full"`** (default): Resamples from entire dataset. Standard choice for observational studies. - **`"strata"`**: Resamples within treatment groups, preserving group sizes. Useful when treatment assignment follows a stratified or fixed-ratio design. ```{r boot-level, eval=FALSE} # Stratified bootstrap surveff(..., variance_method = "bootstrap", boot_level = "strata", B = 200) ``` ### Summary Output The `summary()` method provides two output styles: **Printed output** (`style = "prints"`, default): Displays formatted tables with confidence intervals. Customize with: - `conf_level`: Confidence level (default 0.95) - `max.len`: Maximum rows to display - `round.digits`: Number of decimal places **Programmatic output** (`style = "returns"`): Returns a list of matrices for further processing: - `survival_summary`: List of survival estimates by group - `difference_summary`: List of treatment effect estimates ```{r summary-surveff} summary(result_bin, conf_level = 0.95, max.len = 5) ``` For programmatic access: ```{r summary-returns} summ <- summary(result_bin, style = "returns") names(summ) head(summ$survival_summary$A) head(summ$difference_summary$`B vs A`) ``` ### Plotting Results The `plot()` method produces survival curves (`type = "surv"`) or treatment effect curves (`type = "survdiff"`). ```{r plot-surv, fig.width=7, fig.height=5} # Survival curves for all groups plot(result_multi, type = "surv", curve_width = 0.5) ``` ```{r plot-survdiff, fig.width=7, fig.height=5} # Treatment effect curves plot(result_multi, type = "survdiff", curve_width = 0.5) ``` **Selecting specific strata**: Use `strata_to_plot` to display a subset of groups or contrasts. ```{r plot-subset-surv, fig.width=7, fig.height=5} # Only groups A and C plot(result_multi, type = "surv", curve_width = 0.5, strata_to_plot = c("A", "C")) ``` ```{r plot-subset-diff, fig.width=7, fig.height=5} # Only specific contrasts (names must match contrast_matrix rownames exactly) plot(result_multi, type = "survdiff", curve_width = 0.5, strata_to_plot = c("A vs B", "A vs D")) ``` **Customization options**: - `strata_to_plot`: Vector of strata names to display - `strata_colors`: Custom colors for each stratum (length must match `strata_to_plot`) - `max_time`: Maximum time on x-axis - `include_CI`: Show confidence interval ribbons (TRUE/FALSE) - `conf_level`: Confidence level for intervals - `curve_width`: Line width for curves - `CI_alpha`: Transparency of CI ribbons (0-1) - `legend_position`: "right" or "bottom" - `legend_title`, `plot_title`: Custom titles ```{r plot-custom, fig.width=7, fig.height=5} plot(result_multi, type = "surv", curve_width = 0.5, strata_to_plot = c("A", "B"), strata_colors = c("steelblue", "coral"), max_time = 15, include_CI = TRUE, legend_position = "bottom", plot_title = "Survival by Treatment Group") ``` ## Marginal Hazard Ratios with `marCoxph()` `marCoxph()` fits a weighted Cox model to estimate marginal hazard ratios. ### Key Arguments **Data and model specification:** - `data`: Data frame containing treatment, outcome, and covariates - `ps_formula`: Propensity score model (`treatment ~ covariates`) - `time_var`: Name of time-to-event variable (character string) - `event_var`: Name of event indicator variable (character string, 1=event, 0=censored) - `reference_level`: Reference treatment level (MANDATORY) **Weighting and trimming:** - `estimand`: Target estimand ("ATE", "ATT", or "overlap") - `att_group`: Target group for ATT (required if `estimand = "ATT"`) - `trim`: Trimming method ("symmetric" or "asymmetric", default: no trimming) - `delta`: Threshold for symmetric trimming - `alpha`: Percentile for asymmetric trimming **Variance estimation:** - `variance_method`: "bootstrap" (default) or "robust" - `boot_level`: Bootstrap sampling level ("full" or "strata") - `B`: Number of bootstrap iterations - `parallel`, `mc.cores`, `seed`: Parallel computing and reproducibility - `robust`: Use robust variance in Cox model (default TRUE) ### Basic Usage: Binary Treatment ```{r marCoxph-basic} hr_result <- marCoxph( data = simdata_bin, ps_formula = Z ~ X1 + X2 + X3 + B1 + B2, time_var = "time", event_var = "event", reference_level = "A", estimand = "overlap" ) print(hr_result) ``` The `reference_level` argument is mandatory and specifies the baseline group for hazard ratio comparisons. ### Multiple Treatment Groups ```{r marCoxph-multi} hr_multi <- marCoxph( data = simdata_multi, ps_formula = Z ~ X1 + X2 + X3 + B1 + B2, time_var = "time", event_var = "event", reference_level = "A", estimand = "ATE", variance_method = "bootstrap", B = 50, seed = 456 ) print(hr_multi) ``` ### Estimand Options Same as `surveff()`: ATE, ATT, and overlap weights are supported. Trimming options (symmetric/asymmetric) are also available. ```{r marcoxph-estimands, eval=FALSE} # ATE (default) marCoxph(..., estimand = "ATE") # ATT targeting specific group marCoxph(..., estimand = "ATT", att_group = "treated") # Overlap weights marCoxph(..., estimand = "overlap") ``` ### Trimming Options Propensity score trimming works identically to `surveff()`: ```{r marcoxph-trimming, eval=FALSE} # Symmetric trimming marCoxph(..., estimand = "ATE", trim = "symmetric", delta = 0.1) # Asymmetric trimming marCoxph(..., estimand = "ATE", trim = "asymmetric", alpha = 0.05) ``` ### Variance Options Two variance estimation methods: - **`"bootstrap"`** (default): Resamples the entire analysis pipeline (propensity scores, weights, and Cox model) - **`"robust"`**: Uses sandwich variance from `coxph()` directly (faster, no resampling) ```{r marCoxph-variance, eval=FALSE} # Bootstrap variance (default) marCoxph(..., variance_method = "bootstrap", B = 200) # Robust sandwich variance (no bootstrap) marCoxph(..., variance_method = "robust") ``` The `boot_level` argument (`"full"` or `"strata"`) also applies here when using bootstrap variance, with the same interpretation as in `surveff()`. ### Summary Output The `summary()` method provides detailed results with confidence intervals on both log and original scales: ```{r summary-marcoxph} summary(hr_multi, conf_level = 0.95) ``` **Output styles**: - **`style = "prints"`** (default): Displays formatted tables with log hazard ratios and exponentiated hazard ratios - **`style = "returns"`**: Returns a list containing: - `logHR`, `logHR_CI_lower`, `logHR_CI_upper`: Log-scale estimates and CIs - `HR`, `HR_CI_lower`, `HR_CI_upper`: Original-scale estimates and CIs - `SE`: Standard errors for log-scale hazard ratio (from specified variance method) - `n_per_group`, `events_per_group`: Sample sizes and event counts ```{r summary-marcoxph-returns} summ_hr <- summary(hr_multi, style = "returns") names(summ_hr) summ_hr$HR # Exponentiated hazard ratios ``` ## Summary | Feature | `surveff()` | `marCoxph()` | |---------|-------------|--------------| | Output | Survival curves, survival differences | Hazard ratios | | Estimands | ATE, ATT, overlap | ATE, ATT, overlap | | Censoring adjustment | Weibull or Cox | None (standard Cox) | | Variance | Analytical or bootstrap | Robust or bootstrap | | Trimming | Symmetric, asymmetric | Symmetric, asymmetric | | Multiple groups | Yes (with contrast_matrix) | Yes | ## References Cheng, C., Li, F., Thomas, L. E., & Li, F. (2022). Addressing extreme propensity scores in estimating counterfactual survival functions via the overlap weights. *American Journal of Epidemiology*, 191(6), 1140-1151. Li, F., & Li, F. (2019). Propensity score weighting for causal inference with multiple treatments. *The Annals of Applied Statistics*, 13(4), 2389-2415.