Survival Analysis with causaldef

causaldef supports survival analysis and competing risks. This is critical for clinical applications where “time-to-event” is the primary outcome.

The executable survival examples in this vignette require a compatible survival runtime. In the current support matrix, that means R >= 4.0.

Survival Specification

We use causal_spec_survival() to handle censoring and event times.

library(causaldef)
library(ggplot2) # Ensure ggplot2 is loaded

# DAG Helper
plot_dag <- function(coords, edges, title = NULL) {
  edges_df <- merge(edges, coords, by.x = "from", by.y = "name")
  colnames(edges_df)[c(3,4)] <- c("x_start", "y_start")
  edges_df <- merge(edges_df, coords, by.x = "to", by.y = "name")
  colnames(edges_df)[c(5,6)] <- c("x_end", "y_end")
  
  ggplot2::ggplot(coords, ggplot2::aes(x = x, y = y)) +
    ggplot2::geom_segment(data = edges_df, ggplot2::aes(x = x_start, y = y_start, xend = x_end, yend = y_end),
                 arrow = ggplot2::arrow(length = ggplot2::unit(0.3, "cm"), type = "closed"), 
                 color = "gray40", size = 1, alpha = 0.8) +
    ggplot2::geom_point(size = 14, color = "white", fill = "#CD5C5C", shape = 21, stroke = 1.5) + # IndianRed
    ggplot2::geom_text(ggplot2::aes(label = name), fontface = "bold", size = 3.5, color = "white") +
    ggplot2::ggtitle(title) + 
    ggplot2::theme_void(base_size = 14) +
    ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5, face = "bold", margin = ggplot2::margin(b = 10))) +
    ggplot2::coord_fixed()
}

data(hct_outcomes)

head(hct_outcomes)
#>   id age disease_status kps  donor_type conditioning_intensity time_to_event
#> 1  1  57          Early  89 HLA-Matched          Myeloablative         14.31
#> 2  2  50          Early  85  Mismatched                Reduced         11.05
#> 3  3  59   Intermediate  97  Mismatched          Myeloablative         20.23
#> 4  4  65          Early  84   Unrelated          Myeloablative          2.90
#> 5  5  54       Advanced  96   Unrelated                Reduced          2.22
#> 6  6  48          Early  64  Mismatched          Myeloablative          1.25
#>   event_status
#> 1      Relapse
#> 2      Relapse
#> 3      Relapse
#> 4      Relapse
#> 5      Relapse
#> 6        Death

We want to estimate the effect of conditioning_intensity on time_to_event for Death, treating “Relapse” as a competing event.

spec_surv <- causal_spec_survival(
  data = hct_outcomes,
  treatment = "conditioning_intensity",
  time = "time_to_event",
  event = "event_status",
  # Note: The event setup needs binary 0/1 or Surv object logic
  # We will convert it internally or preprocess
  covariates = c("age", "disease_status", "kps"),
  estimand = "ATE" # Difference in survival probability
)
#> Warning: Event column 'event_status' has multiple non-censor levels ('death',
#> 'relapse'). Coercing to a binary indicator where any non-censored level is
#> treated as an event. For competing risks, use causal_spec_competing() or create
#> an explicit 0/1 event indicator.
#> ✔ Created survival causal specification: n=800, 677 events

Note: In the current version, ensure your event column follows standard R survival conventions (0=censored, 1=event) or use proper factor levels.

Analysis of HCT Outcomes: Competing Risks

We will analyze the hct_outcomes dataset to estimate the effect of conditioning intensity on the risk of relapse, while accounting for the competing risk of death without relapse.

Data Preparation

In a competing risks setting, we need to distinguish between the event of interest (Relapse) and the competing event (Death). We create separate indicators for proper specification.

# Create indicator for Primary Event (Relapse)
hct_outcomes$relapse <- ifelse(hct_outcomes$event_status == "Relapse", 1, 0)

# Create indicator for Competing Event (Death)
# Patients who die without relapse are removed from the risk set for relapse
# but are NOT merely censored in the same way as loss-to-follow-up.
hct_outcomes$death_cr <- ifelse(hct_outcomes$event_status == "Death", 1, 0)

# Check the distribution of events
# Check the distribution of events
table(hct_outcomes$event_status)
#> 
#> Censored    Death  Relapse 
#>      123      131      546

# Visualize Competing Risks
coords <- data.frame(
  name = c("Covariates", "CondIntensity", "Relapse", "Death"),
  x = c(0, -1.5, 1.5, 1.5),
  y = c(1, 0, 0.5, -0.5)
)
edges <- data.frame(
  from = c("Covariates", "Covariates", "Covariates", "CondIntensity", "CondIntensity"),
  to =   c("CondIntensity", "Relapse", "Death", "Relapse", "Death")
)
plot_dag(coords, edges, title = "Competing Risks Structure")

Specification

We using causal_spec_survival() to fully define the structure, including the competing event.

spec_hct <- causal_spec_survival(
  data = hct_outcomes,
  treatment = "conditioning_intensity",
  time = "time_to_event",
  event = "relapse",           # Primary event of interest
  competing_event = "death_cr", # Competing event
  covariates = c("age", "disease_status", "kps", "donor_type"),
  estimand = "ATE"
)
#> ✔ Created survival causal specification: n=800, 546 events

print(spec_hct)
#> 
#> -- Survival Causal Specification ----------------------------------------
#> 
#> * Treatment: conditioning_intensity ( binary )
#> * Time: time_to_event (max = 86.08 )
#> * Event: relapse ( 546 events)
#> * Covariates: age, disease_status, kps, donor_type 
#> * Sample size: 800 
#> * Estimand: ATE 
#> * Competing event: death_cr

Deficiency Estimation

We compare unadjusted analysis (Naive) with Inverse Probability of Treatment Weighting (IPTW) to see if we can reduce the causal deficiency.

# Estimate deficiency
# Note: In survival settings, this measures how well we can recover the 
# interventional survival curves.
results_hct <- estimate_deficiency(
  spec_hct,
  methods = c("unadjusted", "iptw"),
  n_boot = 0 # Set > 0 for bootstrap confidence intervals
)
#> ℹ Inferred treatment value: Reduced
#> ℹ Estimating deficiency: unadjusted
#> ℹ Estimating deficiency: iptw

print(results_hct)
#> 
#> -- Deficiency Proxy Estimates (PS-TV) ------
#> 
#>      Method  Delta SE            Quality
#>  unadjusted 0.3030  - Insufficient (Red)
#>        iptw 0.0644  -   Caution (Yellow)
#> Note: delta is a propensity-score TV proxy (overlap/balance diagnostic).
#> 
#> Best method: iptw (delta = 0.0644 )

Clinical Interpretation of Deficiency:

The deficiency \(\delta\) quantifies how much information is lost by using observational data instead of a randomized trial:

For this HCT Study:

Unadjusted analysis (high \(\delta\)): Observed distribution differs significantly from target population (High TV distance) -> Biased comparison
IPTW adjusted (lower \(\delta\)): Reweights patients to balance baseline characteristics (Low TV distance) -> Closer to “what-if-randomized” comparison

Clinical Significance:

\(\delta\) < 0.05: Excellent control – observational analysis approximates RCT-quality evidence
0.05 \(\leq\) \(\delta\) < 0.10: Adequate – reasonable for clinical decision support, acknowledge limitations
\(\delta\) \(\geq\) 0.10: Caution – substantial bias risk, consider additional covariates or sensitivity analysis

Why This Matters:
In HCT, patients receiving myeloablative conditioning are typically younger and healthier (confounding by indication). IPTW attempts to create a “synthetic randomization” by upweighting underrepresented patient types. Low \(\delta\) validates this approach.

Survival Modeling and RMST

To validate our findings and translate them into clinical metrics, we use standard survival models. We calculate the Restricted Mean Survival Time (RMST), which provides an interpretable effect size (difference in life expectancy up to a specific time horizon).

Why RMST and not Hazard Ratios? In a decision-theoretic framework, we need an estimand that maps directly to a utility function (e.g., “months of life gained”). Hazard ratios are relative measures that are difficult to translate into absolute utility bounds (\(M\)) needed to interpret regret bounds (transfer penalty \(M\delta\) and minimax floor \((M/2)\delta\)). RMST, being on the time scale (e.g., “months gained”), allows us to define \(M\) concretely (e.g., the horizon), making these guarantees interpretable.

# 1. Visualize Survival Curves
# We can use the standard survival package to visualize the unadjusted curves first
library(survival)
hct_outcomes$rfs_event <- ifelse(hct_outcomes$event_status != "Censored", 1, 0)
surv_obj <- Surv(time = hct_outcomes$time_to_event, event = hct_outcomes$rfs_event)
km_fit_naive <- survfit(surv_obj ~ conditioning_intensity, data = hct_outcomes)

plot(km_fit_naive, col = c("blue", "red"), lwd = 2, 
     main = "Relapse-Free Survival (Unadjusted)",
     xlab = "Time (Months)", ylab = "Survival Probability")
legend("topright", legend = levels(hct_outcomes$conditioning_intensity), 
       col = c("blue", "red"), lwd = 2)


# 2. Estimate Causal Effect (RMST Difference)
# We use estimate_effect() on our deficiency object to compute the RMST difference
# using the IPTW weights we found were effective (low deficiency).

# Horizon = 24 months
horizon <- 24
# We reuse the deficiency object (and its learned weights) for the new estimand.
# Since deficiency measures the distributional match, the same weights apply 
# to any outcome derived from the survival curve.
results_hct$spec$estimand <- "RMST"
results_hct$spec$horizon <- 24 

effect_iptw <- estimate_effect(
  results_hct, 
  target_method = "iptw",
  contrast = c("Myeloablative", "Reduced") # Defined as Treated vs Control
)

print(effect_iptw)
#> 
#> -- Causal Effect Estimate ----------------------
#> Method:    iptw 
#> Type:      RMST Difference 
#> Contrast:  Myeloablative vs Reduced 
#> Estimate:  2.9452 
#> Horizon:   24

# Compare with Unadjusted Effect
effect_naive <- estimate_effect(
  results_hct, 
  target_method = "unadjusted",
  contrast = c("Myeloablative", "Reduced")
)

print(effect_naive)
#> 
#> -- Causal Effect Estimate ----------------------
#> Method:    unadjusted 
#> Type:      RMST Difference 
#> Contrast:  Myeloablative vs Reduced 
#> Estimate:  4.4556 
#> Horizon:   24

cat(sprintf("\nCausal RMST Difference (IPTW): %.2f months\n", effect_iptw$estimate))
#> 
#> Causal RMST Difference (IPTW): 2.95 months
cat(sprintf("Biased RMST Difference (Unadjusted): %.2f months\n", effect_naive$estimate))
#> Biased RMST Difference (Unadjusted): 4.46 months

Interpreting RMST (Restricted Mean Survival Time) Results:

RMST is the average time patients survive (relapse-free) up to the time horizon (here, 24 months). The difference in RMST is the average survival benefit of one conditioning regimen over another.

What These Numbers Mean:

Unadjusted RMST difference (biased):
- Compares patients as-treated, ignoring baseline imbalances
- Confounded by age, disease status, performance status
- May overestimate or underestimate true causal effect
IPTW-adjusted RMST difference (causal estimate):
- Balances baseline covariates via weighting
- Estimates what the difference would be if patients were randomized
- Valid if: (a) no unobserved confounding, (b) positivity holds, (c) correct propensity model

Clinical Decision-Making:

Positive RMST difference (Myeloablative > Reduced): Myeloablative conditioning extends relapse-free survival on average
Negative RMST difference (Reduced > Myeloablative): Reduced conditioning is superior
Magnitude: Each month of RMST difference represents average survival gain/loss across the population

Example Interpretation:
If IPTW RMST difference = +2.5 months favoring Myeloablative:
“After adjusting for age, disease status, and comorbidity index, patients receiving myeloablative conditioning had 2.5 additional months of relapse-free survival on average within the first 24 months post-transplant, compared to reduced-intensity conditioning.”

Caution:
RMST is truncated at the horizon (24 months). Differences beyond 24 months are not captured. Choose the horizon based on clinical relevance (e.g., 1-year, 2-year, 5-year survival).

Quantifying Uncertainty: Policy Regret Bounds

The RMST difference provides the observed treatment effect. However, this is biased if treatment assignment was confounded.

The deficiency \(\delta\) we calculated earlier tells us how much we can trust this estimate.

If \(\delta \approx 0\) (after IPTW), we can interpret the weighted RMST difference as causal (up to sampling error).
If \(\delta > 0\), policy_regret_bound() converts this information gap into decision-relevant regret bounds on the same scale as the utility (here: months of RMST up to the horizon).

# Calculate policy regret bounds (transfer penalty + minimax floor)
# Utility range is [0, horizon] (months of survival).
# The bound is tied to the previously estimated IPTW deficiency proxy.
bound <- policy_regret_bound(results_hct, utility_range = c(0, horizon), method = "iptw")
#> ℹ Transfer penalty: 1.5457 (delta = 0.0644)
print(bound)
#> 
#> -- Policy Regret Bounds -------------------------------------------------
#> 
#> * Deficiency delta: 0.0644 
#> * Delta mode: point 
#> * Delta method: iptw 
#> * Delta selection: pre-specified method 
#> * Utility range: [0, 24]
#> * Transfer penalty: 1.5457 (additive regret upper bound)
#> * Minimax floor: 0.7728 (worst-case lower bound)
#> 
#> Note: this is a plug-in bound using a deficiency proxy rather than an identified exact deficiency.
#> 
#> Interpretation: Transfer penalty is 6.4 % of utility range given delta

Clinical Interpretation (Transfer Penalty and Minimax Safety Floor):

The bounds answer: “How much regret could confounding add to my decision, and what regret is unavoidable without stronger assumptions?”

Formulas:

Transfer penalty = \(M \times \delta\) (additive worst-case inflation term in the regret upper bound)
Minimax safety floor = \((M/2) \times \delta\) (irreducible worst-case regret lower bound)

where \(M\) is the utility range (here, the RMST horizon in months) and \(\delta\) is the deficiency proxy.

What It Means:

The transfer penalty is the conservative quantity for deployment: it upper-bounds how much worse your deployed policy could be (in regret) due to the observational–interventional information gap.
The minimax safety floor is a fundamental limit: if this floor is already clinically unacceptable, no algorithm can guarantee safe decisions from the observational data alone.

Clinical Decision Framework (use Transfer Penalty):

If Transfer Penalty < 1 month:
[PASS] Proceed with confidence – Observational evidence is decision-relevant at this horizon.
If 1 \(\leq\) Transfer Penalty < 3 months:
[CAUTION] Proceed with acknowledgment – Often acceptable, but document the bound: “Based on observational evidence with transfer penalty of X months…”
If Transfer Penalty \(\geq\) 3 months:
[STOP] High uncertainty – Consider:
- Improving adjustment (add covariates, better models)
- Conducting a prospective RCT
- Using shared decision-making (acknowledge uncertainty to patients)
- Limiting recommendations to subgroups with better data

Example:
If \(\delta\) = 0.02 (excellent IPTW performance) and \(M\) = 24 months:
Transfer Penalty = \(24 \times 0.02\) = 0.48 months
Minimax Safety Floor = \((24/2) \times 0.02\) = 0.24 months

Interpretation: If the observed RMST difference is 4 months, then even the conservative transfer-penalty ratio is roughly \(4/0.48 \approx 8:1\), suggesting the decision is robust at this horizon (subject to the modeling assumptions).

Reporting to Clinicians:
“Our observational analysis suggests a 4-month survival advantage for myeloablative conditioning (95% CI: [2.1, 5.9]). After adjustment, the deficiency proxy is \(\delta \approx 0.02\) at a 24-month horizon, implying a transfer penalty of about 0.5 months and a minimax safety floor of about 0.25 months. These bounds are well below the observed benefit, supporting the recommendation, though an RCT would eliminate this residual uncertainty.”

Connection to Deficiency

The key insight:

Low \(\delta\) (successful adjustment) -> Low transfer penalty / minimax floor (trustworthy decisions)
High \(\delta\) (poor adjustment) -> High transfer penalty / minimax floor (risky decisions)

The deficiency \(\delta\) and regret bounds work together:

\(\delta\) tells you how well you approximated an RCT (information quality)
Transfer penalty / minimax floor translate that gap into decision risk on your utility scale

Both should be reported to provide complete transparency about observational evidence quality.