| Title: | Decision-Theoretic Causal Diagnostics via Le Cam Deficiency |
|---|---|
| Description: | Implements Le Cam deficiency theory for causal inference, as described in Akdemir (2026) <doi:10.5281/zenodo.18367347>. Provides theorem-backed bounds together with computable proxy diagnostics for information loss from confounding, selection bias, and distributional shift. Supports continuous, binary, count, survival, and competing risks outcomes. Key features include propensity-score total-variation deficiency proxies, negative control diagnostics, policy regret bounds, and sensitivity analysis via confounding frontiers. |
| Authors: | Deniz Akdemir [aut, cre] |
| Maintainer: | Deniz Akdemir <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.2.0 |
| Built: | 2026-05-31 07:44:56 UTC |
| Source: | https://github.com/denizakdemir/causaldef |
Theory-forward causal diagnostics organized around Le Cam deficiency.
The package centers causal inference around the question: how much information separates the observational study at hand from the interventional experiment we would ideally like to run?
Scientific contract
The package exposes four kinds of quantities:
Theorem-backed utilities: closed-form or theorem-aligned bounds such
as policy_regret_bound(), policy_regret_bound_vc(), confounding_frontier(),
sharp_lower_bound(), and wasserstein_deficiency_gaussian().
Computable deficiency proxies: estimate_deficiency() currently returns
a propensity-score total-variation proxy (metric = "ps_tv"), not a generic
nonparametric estimator of the exact Le Cam deficiency.
Sensitivity diagnostics: functions such as nc_diagnostic() combine
observable diagnostics with user-supplied sensitivity parameters.
Experimental heuristics: some modules currently expose effect estimates together with heuristic risk scores or proxy summaries. These should be read as diagnostics unless the individual help page states a theorem-backed identification result.
Recommended workflow
Specify the causal problem with causal_spec() or a survival variant.
Compute a pre-specified diagnostic or proxy with estimate_deficiency() or
a specialized module.
Stress-test assumptions with sensitivity tools such as nc_diagnostic() or
confounding_frontier().
Translate the resulting information gap into decision consequences with
policy_regret_bound().
Maintainer: Deniz Akdemir [email protected]
Akdemir, D. (2026). Constraints on Causal Inference as Experiment Comparison: A Framework for Identification, Transportability, and Policy Learning. DOI: 10.5281/zenodo.18367347
Le Cam, L., & Yang, G. L. (2000). Asymptotics in Statistics: Some Basic Concepts. Springer.
Useful links:
Report bugs at https://github.com/denizakdemir/causaldef/issues
Automatically scans a dataset for causal validity issues using
negative control diagnostics (manuscript thm:nc_bound). Identifies potential
confounders, instruments, and negative control variables.
audit_data( data, treatment, outcome, covariates = NULL, negative_controls = NULL, alpha = 0.05, verbose = TRUE )audit_data( data, treatment, outcome, covariates = NULL, negative_controls = NULL, alpha = 0.05, verbose = TRUE )
data |
A data.frame containing the analysis data |
treatment |
Character: name of the treatment variable |
outcome |
Character: name of the outcome variable |
covariates |
Character vector: specific covariates to audit (NULL = all non-treatment/outcome columns) |
negative_controls |
Character vector: known negative control outcomes to validate |
alpha |
Numeric: significance level for tests (default 0.05) |
verbose |
Logical: print progress messages |
The auditor tests each variable for:
Independence from treatment (correlation test)
Correlation with outcome
Based on these tests, variables are classified as:
Correlates with treatment but NOT outcome
Correlates with BOTH treatment and outcome
No significant correlations
If negative_controls are specified, they are validated to ensure they do not show spurious treatment effects (negative control logic).
Object of class "data_audit_report" containing:
issues: data.frame with columns (variable, issue_type, p_value, recommendation)
recommendations: character vector of action items
summary_stats: list with n_vars_audited, n_issues, etc.
Akdemir, D. (2026). Constraints on Causal Inference as Experiment Comparison.
DOI: 10.5281/zenodo.18367347. See thm:nc_bound (Negative Control Sensitivity Bound).
nc_diagnostic(), causal_spec()
# Create sample data with known structure n <- 300 U <- rnorm(n) W <- U + rnorm(n, sd = 0.5) # Confounder Z <- rnorm(n) # Potential instrument A <- rbinom(n, 1, plogis(0.5 * Z + 0.3 * U)) Y <- 2 * A + 1.5 * U + rnorm(n) df <- data.frame(W = W, Z = Z, A = A, Y = Y) # Run audit report <- audit_data(df, treatment = "A", outcome = "Y") print(report)# Create sample data with known structure n <- 300 U <- rnorm(n) W <- U + rnorm(n, sd = 0.5) # Confounder Z <- rnorm(n) # Potential instrument A <- rbinom(n, 1, plogis(0.5 * Z + 0.3 * U)) Y <- 2 * A + 1.5 * U + rnorm(n) df <- data.frame(W = W, Z = Z, A = A, Y = Y) # Run audit report <- audit_data(df, treatment = "A", outcome = "Y") print(report)
Defines the causal inference problem including treatment, outcome, covariates, and optional diagnostic variables.
causal_spec( data, treatment, outcome, covariates = NULL, negative_control = NULL, instrument = NULL, estimand = c("ATE", "ATT", "ATC"), outcome_type = c("continuous", "binary", "count"), na.action = na.omit )causal_spec( data, treatment, outcome, covariates = NULL, negative_control = NULL, instrument = NULL, estimand = c("ATE", "ATT", "ATC"), outcome_type = c("continuous", "binary", "count"), na.action = na.omit )
data |
A data.frame or data.table containing the analysis data |
treatment |
Character: name of the treatment variable |
outcome |
Character: name of the outcome variable |
covariates |
Character vector: names of adjustment covariates (NULL for none) |
negative_control |
Character: name of negative control outcome (optional) |
instrument |
Character: name of instrumental variable (optional) |
estimand |
Character: target estimand, one of "ATE", "ATT", or "ATC" |
outcome_type |
Character: type of outcome, one of "continuous", "binary", "count" |
na.action |
Function: how to handle missing values (default: na.omit) |
Object of class "causal_spec" containing:
data: the analysis dataset after applying na.action to the
required variables
treatment: name of the treatment variable
outcome: name of the outcome variable
covariates: adjustment covariates used to define the causal problem
negative_control: optional negative control outcome name
instrument: optional instrument name
estimand: target causal estimand
outcome_type: declared outcome scale
treatment_type: inferred treatment scale ("binary",
"categorical", or "continuous")
n: retained sample size after preprocessing
This object is the package's core problem specification: it records which columns define the observational causal study and is consumed by downstream estimators and diagnostics.
Note: while causal_spec() can describe categorical or continuous
treatments, the current downstream estimation functions
(estimate_deficiency(), estimate_effect()) require a binary treatment.
estimate_deficiency(), nc_diagnostic(), policy_regret_bound()
# Create sample data n <- 200 W <- rnorm(n) A <- rbinom(n, 1, plogis(0.5 * W)) Y <- 1 + 2 * A + W + rnorm(n) df <- data.frame(W = W, A = A, Y = Y) # Create causal specification spec <- causal_spec( data = df, treatment = "A", outcome = "Y", covariates = "W" ) print(spec)# Create sample data n <- 200 W <- rnorm(n) A <- rbinom(n, 1, plogis(0.5 * W)) Y <- 1 + 2 * A + W + rnorm(n) df <- data.frame(W = W, A = A, Y = Y) # Create causal specification spec <- causal_spec( data = df, treatment = "A", outcome = "Y", covariates = "W" ) print(spec)
Creates a specification object for time-to-event data with competing events. Extends standard survival analysis to handle multiple event types.
causal_spec_competing( data, treatment, time, event, covariates = NULL, event_of_interest = 1, horizon = NULL, estimand = c("cif", "cshr") )causal_spec_competing( data, treatment, time, event, covariates = NULL, event_of_interest = 1, horizon = NULL, estimand = c("cif", "cshr") )
data |
Data frame containing the variables |
treatment |
Character: name of the treatment variable |
time |
Character: name of the time-to-event variable |
event |
Character: name of the event indicator. Accepts either:
|
covariates |
Character vector: names of covariate variables |
event_of_interest |
Integer or character: which event type is the primary outcome (default 1).
If |
horizon |
Numeric: time horizon for cumulative incidence |
estimand |
Character: "cif" (cumulative incidence) or "cshr" (cause-specific HR) |
In competing risks, standard survival methods can be biased because:
Censoring the competing event underestimates cumulative incidence
Cause-specific hazards don't translate directly to probabilities
The Aalen-Johansen estimator handles this by modeling all transitions:
Object of class
c("causal_spec_competing", "causal_spec_survival", "causal_spec")
containing:
data: the analysis dataset, with the event column normalized to
integer codes (0 = censored, 1,2,... = event types)
treatment, time, event: names of the core analysis variables
covariates: adjustment covariates
event_of_interest: integer code of the primary event type
event_types: observed non-censor event codes
n_events: number of distinct non-censor event types
event_map: optional mapping from original factor/character labels
to integer event codes
event_levels: original non-censor event labels when the input
event variable was character or factor
horizon: target time horizon for cumulative incidence summaries
estimand: requested competing-risks estimand ("cif" or "cshr")
treatment_type: inferred treatment scale
n: retained sample size
outcome_type: fixed label "competing_risks"
This object records how the competing-risks problem is encoded and is the input consumed by competing-risks deficiency estimation.
estimate_deficiency(), causal_spec_survival()
# Simulate competing risks: death (1) vs transplant (2) set.seed(42) n <- 500 W <- rnorm(n) A <- rbinom(n, 1, plogis(0.3 * W)) # Event times rate_death <- exp(-0.5 * A + 0.2 * W) rate_transplant <- exp(0.3 * A - 0.1 * W) time_death <- rexp(n, rate_death) time_transplant <- rexp(n, rate_transplant) time_censor <- runif(n, 0, 3) observed_time <- pmin(time_death, time_transplant, time_censor) event <- ifelse(observed_time == time_death, 1, ifelse(observed_time == time_transplant, 2, 0)) df <- data.frame(W = W, A = A, time = observed_time, event = event) spec <- causal_spec_competing( df, "A", "time", "event", "W", event_of_interest = 1, horizon = 2 ) print(spec)# Simulate competing risks: death (1) vs transplant (2) set.seed(42) n <- 500 W <- rnorm(n) A <- rbinom(n, 1, plogis(0.3 * W)) # Event times rate_death <- exp(-0.5 * A + 0.2 * W) rate_transplant <- exp(0.3 * A - 0.1 * W) time_death <- rexp(n, rate_death) time_transplant <- rexp(n, rate_transplant) time_censor <- runif(n, 0, 3) observed_time <- pmin(time_death, time_transplant, time_censor) event <- ifelse(observed_time == time_death, 1, ifelse(observed_time == time_transplant, 2, 0)) df <- data.frame(W = W, A = A, time = observed_time, event = event) spec <- causal_spec_competing( df, "A", "time", "event", "W", event_of_interest = 1, horizon = 2 ) print(spec)
Defines a causal inference problem with time-to-event outcomes.
causal_spec_survival( data, treatment, time, event, covariates = NULL, competing_event = NULL, estimand = c("ATE", "RMST", "HR"), horizon = NULL, na.action = na.omit )causal_spec_survival( data, treatment, time, event, covariates = NULL, competing_event = NULL, estimand = c("ATE", "RMST", "HR"), horizon = NULL, na.action = na.omit )
data |
A data.frame containing survival data |
treatment |
Character: name of the treatment variable |
time |
Character: name of the time-to-event variable |
event |
Character: name of the event indicator (1 = event, 0 = censored).
If the column is a factor/character (e.g., |
covariates |
Character vector: names of adjustment covariates |
competing_event |
Character: name of competing event indicator (optional) |
estimand |
Character: target estimand ("ATE", "RMST", "HR") |
horizon |
Numeric: time horizon for RMST (required if estimand = "RMST") |
na.action |
Function: how to handle missing values |
For survival outcomes, the deficiency measures how well we can simulate the interventional survival distribution from observational data.
Object of class c("causal_spec_survival", "causal_spec")
containing:
data: the survival analysis dataset after applying na.action
and coercing the event indicators when needed
treatment: name of the treatment variable
time: name of the follow-up time variable
event: name of the primary event indicator
covariates: adjustment covariates
competing_event: optional competing-event indicator used by
survival workflows that model it separately
estimand: target survival estimand ("ATE", "RMST", or "HR")
horizon: RMST horizon when relevant
treatment_type: inferred treatment scale
n: retained sample size
n_events: number of observed primary events
max_time: maximum observed follow-up time
The returned object defines a time-to-event causal problem for downstream deficiency estimation and effect estimation.
Average treatment effect on survival probability at horizon
Restricted mean survival time difference
Hazard ratio (log scale)
causal_spec(), estimate_deficiency()
# Simulate survival data n <- 200 W <- rnorm(n) A <- rbinom(n, 1, plogis(0.5 * W)) time <- rexp(n, rate = exp(-0.5 * A + 0.3 * W)) event <- rbinom(n, 1, 0.8) df <- data.frame(W = W, A = A, time = time, event = event) spec <- causal_spec_survival( data = df, treatment = "A", time = "time", event = "event", covariates = "W", estimand = "RMST", horizon = 5 )# Simulate survival data n <- 200 W <- rnorm(n) A <- rbinom(n, 1, plogis(0.5 * W)) time <- rexp(n, rate = exp(-0.5 * A + 0.3 * W)) event <- rbinom(n, 1, 0.8) df <- data.frame(W = W, A = A, time = time, event = event) spec <- causal_spec_survival( data = df, treatment = "A", time = "time", event = "event", covariates = "W", estimand = "RMST", horizon = 5 )
Computes deficiency as a function of confounding strengths (, )
for linear Gaussian models.
confounding_frontier( spec = NULL, alpha_range = c(-2, 2), gamma_range = c(-2, 2), grid_size = 25, model = "gaussian" )confounding_frontier( spec = NULL, alpha_range = c(-2, 2), gamma_range = c(-2, 2), grid_size = 25, model = "gaussian" )
spec |
A causal_spec object (optional, for parameter estimation) |
alpha_range |
Numeric vector c(min, max): range of U->A confounding strength |
gamma_range |
Numeric vector c(min, max): range of U->Y confounding strength |
grid_size |
Integer: resolution per dimension |
model |
Character: model type ("gaussian" for closed-form) |
For the linear Gaussian structural causal model:
The manuscript's Confounding Lower Bound theorem (thm:confounding_lb) states the scaling of the causal deficiency:
for some universal constant (typically after rescaling so ).
This function computes a concrete, rigorous lower bound using the underlying
Le Cam two-point construction (i.e., between two
observationally indistinguishable parameterizations).
This creates a "confounding frontier" - the boundary where identification
transitions from possible () to impossible ().
Object of class "confounding_frontier" containing:
grid: Data frame with columns alpha, gamma, delta
frontier: Subset of grid where delta 0
model: Model type used
params: Parameters used in computation
Strength of confounding path U -> A
Strength of confounding path U -> Y
Product determines the confounding bias
When alpha=0 OR gamma=0, the confounder has no effect and .
Akdemir, D. (2026). Constraints on Causal Inference as Experiment Comparison.
DOI: 10.5281/zenodo.18367347. See thm:confounding_lb (Confounding Lower Bound).
estimate_deficiency(), policy_regret_bound()
# Basic frontier frontier <- confounding_frontier( alpha_range = c(-2, 2), gamma_range = c(-2, 2), grid_size = 25 ) # With data-based parameter estimation df <- data.frame(A = rnorm(100), Y = rnorm(100), W = rnorm(100)) spec <- causal_spec(df, "A", "Y", "W") frontier <- confounding_frontier(spec, grid_size = 50)# Basic frontier frontier <- confounding_frontier( alpha_range = c(-2, 2), gamma_range = c(-2, 2), grid_size = 25 ) # With data-based parameter estimation df <- data.frame(A = rnorm(100), Y = rnorm(100), W = rnorm(100)) spec <- causal_spec(df, "A", "Y", "W") frontier <- confounding_frontier(spec, grid_size = 50)
Generates a plumber API definition file for deploying causaldef as a REST service. This enables integration with web applications, dashboards, and other programming languages.
create_plumber_api(path = "causaldef_api", port = 8080)create_plumber_api(path = "causaldef_api", port = 8080)
path |
Directory to write the API files |
port |
Integer: default port for the API (default 8080) |
The API provides the following endpoints:
POST /deficiency: Estimate deficiency from uploaded data
POST /policy-bound: Calculate policy regret bounds
POST /frontier: Generate confounding frontier
GET /health: Health check endpoint
GET /methods: List available methods
Invisibly returns the path to the created files
## Not run: # Create API files create_plumber_api("my_api") # Run the API plumber::plumb("my_api/plumber.R")$run(port = 8080) ## End(Not run)## Not run: # Create API files create_plumber_api("my_api") # Run the API plumber::plumb("my_api/plumber.R")$run(port = 8080) ## End(Not run)
Generates app.R and global.R files for standalone deployment (e.g., shinyapps.io, Shiny Server)
create_shiny_app_files(path = "causaldef_app")create_shiny_app_files(path = "causaldef_app")
path |
Directory to write the app files |
Invisibly returns path, the directory where the generated
standalone app files were written. In the current implementation this is
the folder containing the generated app.R, which can then be deployed
with Shiny hosting tools.
Computes a computable proxy for the (population) Le Cam deficiency between observational and interventional regimes under various adjustment strategies.
estimate_deficiency( spec, methods = c("iptw", "aipw"), treatment_value = NULL, n_boot = 200, ci_level = 0.95, verbose = TRUE, parallel = FALSE )estimate_deficiency( spec, methods = c("iptw", "aipw"), treatment_value = NULL, n_boot = 200, ci_level = 0.95, verbose = TRUE, parallel = FALSE )
spec |
A causal_spec object |
methods |
Character vector: adjustment methods to compare
|
treatment_value |
Numeric: intervention value for do(A=a) |
n_boot |
Integer: bootstrap replicates for CI (0 = none) |
ci_level |
Numeric: confidence level (default 0.95) |
verbose |
Logical: show progress |
parallel |
Logical: use parallel processing for bootstrap (requires future.apply) |
The (population) Le Cam deficiency is defined as:
This quantity is not directly estimable nonparametrically from finite samples without additional structure. The current implementation returns a proxy based on the total variation distance between propensity-score distributions under:
the method-induced reweighting (pseudo-population), and
the unweighted target population.
This PS-TV proxy is an overlap/balance diagnostic: values near 0 indicate near-perfect balance under the estimated propensity model, while large values flag positivity/overlap issues.
Importantly, this is a proxy and should not be interpreted as the exact Le Cam deficiency unless supplemented by a result linking the proxy to the specific decision class of interest.
Object of class "deficiency" containing:
estimates: Named vector of deficiency proxy estimates per method
se: Standard errors (if n_boot > 0)
ci: Confidence intervals
method: Methods used
kernel: Fitted kernels for each method
metric: Identifier for the proxy metric used
causal_spec(), nc_diagnostic(), policy_regret_bound()
# Create sample data n <- 200 W <- rnorm(n) A <- rbinom(n, 1, plogis(0.5 * W)) Y <- 1 + 2 * A + W + rnorm(n) df <- data.frame(W = W, A = A, Y = Y) spec <- causal_spec(df, "A", "Y", "W") results <- estimate_deficiency(spec, methods = c("unadjusted", "iptw"), n_boot = 50) print(results)# Create sample data n <- 200 W <- rnorm(n) A <- rbinom(n, 1, plogis(0.5 * W)) Y <- 1 + 2 * A + W + rnorm(n) df <- data.frame(W = W, A = A, Y = Y) spec <- causal_spec(df, "A", "Y", "W") results <- estimate_deficiency(spec, methods = c("unadjusted", "iptw"), n_boot = 50) print(results)
Computes Le Cam deficiency for competing risks outcomes using cause-specific or subdistribution hazard approaches.
estimate_deficiency_competing( spec, method = c("cshr", "fg"), n_boot = 100, ci_level = 0.95 )estimate_deficiency_competing( spec, method = c("cshr", "fg"), n_boot = 100, ci_level = 0.95 )
spec |
A causal_spec_competing object |
method |
Character: "cshr" (cause-specific) or "fg" (Fine-Gray subdistribution) |
n_boot |
Integer: bootstrap replicates |
ci_level |
Numeric: confidence level |
Object of class c("deficiency_competing", "deficiency")
containing:
estimates: named vector with the competing-risks deficiency proxy
estimate
se: bootstrap standard error for that estimate
ci: bootstrap confidence interval matrix
method: the competing-risks estimation strategy used
cif: weighted and unweighted cumulative-incidence calculations by
event type
spec: the original causal_spec_competing object
kernel: fitted nuisance quantities used by the estimator,
including propensity scores and weights
The reported deficiency summarizes how different the weighted and unweighted cumulative-incidence behavior is for the event-of-interest analysis under the selected competing-risks approach.
Uses the fitted causal kernels (from estimate_deficiency) to estimate
the causal effect (ATE, RMST, etc.).
estimate_effect(object, ...) ## S3 method for class 'deficiency' estimate_effect( object, target_method = NULL, method = NULL, contrast = NULL, ... )estimate_effect(object, ...) ## S3 method for class 'deficiency' estimate_effect( object, target_method = NULL, method = NULL, contrast = NULL, ... )
object |
A |
... |
Additional arguments passed to specific estimators |
target_method |
Deprecated alias for |
method |
Character: which adjustment method to use. If |
contrast |
Vector: c(treated, control) values for contrast. Defaults to c(treatment_value, control_value). |
An object of class "causal_effect". Depending on the underlying
estimand, the object contains:
estimate: scalar causal contrast when a point effect is computed
mu1, mu0: adjusted outcome means for treated and control
groups in standard outcome settings
rmst1, rmst0: restricted mean survival times under the two
contrast levels for RMST analyses
fit: a fitted survival::survfit object when survival curves are
returned
horizon: RMST horizon when relevant
type: human-readable label for the returned estimand
method: adjustment method used to construct the effect estimate
contrast: treated and control values used in the comparison
The object represents the causal effect implied by the fitted kernel stored
in the deficiency object for the chosen method.
set.seed(42) n <- 200 W <- rnorm(n) A <- rbinom(n, 1, plogis(0.5 * W)) Y <- 1 + 2 * A + W + rnorm(n) df <- data.frame(W = W, A = A, Y = Y) spec <- causal_spec(df, "A", "Y", "W") def <- estimate_deficiency(spec, methods = "iptw") effect <- estimate_effect(def) print(effect)set.seed(42) n <- 200 W <- rnorm(n) A <- rbinom(n, 1, plogis(0.5 * W)) Y <- 1 + 2 * A + W + rnorm(n) df <- data.frame(W = W, A = A, Y = Y) spec <- causal_spec(df, "A", "Y", "W") def <- estimate_deficiency(spec, methods = "iptw") effect <- estimate_effect(def) print(effect)
Implements the front-door criterion for causal identification when a mediator blocks all paths from treatment to outcome while being unaffected by confounders.
frontdoor_effect( spec, mediator, method = c("plugin", "dr"), n_boot = 200, ci_level = 0.95 )frontdoor_effect( spec, mediator, method = c("plugin", "dr"), n_boot = 200, ci_level = 0.95 )
spec |
A causal_spec object with mediator specified |
mediator |
Character: name of the mediator variable M |
method |
Character: estimation method for the front-door formula
|
n_boot |
Integer: bootstrap replicates for standard errors (default 200) |
ci_level |
Numeric: confidence level (default 0.95) |
The front-door criterion (Pearl, 1995) identifies causal effects through a mediator M when:
A -> M is unconfounded (M intercepts all directed paths from A to Y)
M -> Y has no direct A effect except through M
There's no unblocked back-door path from M to Y
The front-door formula is:
When these assumptions hold, the manuscript's Front-Door Kernel Existence theorem
(thm:frontdoor) implies .
The current implementation estimates the front-door effect, but the reported
deficiency_proxy is not an estimator of the exact Le Cam deficiency.
It is a heuristic discrepancy score meant for exploratory diagnostics.
Object of class "frontdoor_effect" containing:
estimate: The front-door causal effect
se: Standard error (if bootstrap > 0)
ci: Confidence interval
deficiency_proxy: Heuristic proxy based on disagreement between the naive and front-door effect estimates
deficiency: Backward-compatible alias of deficiency_proxy
Pearl, J. (1995). Causal diagrams for empirical research. Biometrika.
Akdemir, D. (2026). Constraints on Causal Inference as Experiment Comparison.
See thm:frontdoor (Front-Door Kernel Existence).
estimate_deficiency(), causal_spec()
# Simulate front-door scenario n <- 500 U <- rnorm(n) # Unmeasured confounder A <- rbinom(n, 1, plogis(0.5 * U)) M <- 0.5 + 1.2 * A + rnorm(n, sd = 0.5) # Mediator (unconfounded by U) Y <- 1 + 0.8 * M + 0.5 * U + rnorm(n) # Outcome df <- data.frame(A = A, M = M, Y = Y) spec <- causal_spec(df, "A", "Y", covariates = NULL) fd_result <- frontdoor_effect(spec, mediator = "M") print(fd_result)# Simulate front-door scenario n <- 500 U <- rnorm(n) # Unmeasured confounder A <- rbinom(n, 1, plogis(0.5 * U)) M <- 0.5 + 1.2 * A + rnorm(n, sd = 0.5) # Mediator (unconfounded by U) Y <- 1 + 0.8 * M + 0.5 * U + rnorm(n) # Outcome df <- data.frame(A = A, M = M, Y = Y) spec <- causal_spec(df, "A", "Y", covariates = NULL) fd_result <- frontdoor_effect(spec, mediator = "M") print(fd_result)
Simulated high-throughput screening data with potential batch effects. Contains a negative control outcome ("housekeeping_gene").
data(gene_perturbation)data(gene_perturbation)
A data frame with 500 rows and 6 variables:
Sample identifier
Experimental batch (1-4)
Total read count
Intervention (Control vs Knockout)
Expression level of target gene (outcome)
Expression of unrelated housekeeping gene (negative control)
Used to demonstrate negative control diagnostics. The housekeeping gene shares batch effects and library size confounders with the target gene but is not biologically affected by the knockout.
Simulated survival data comparing conditioning intensities in HCT. Illustrates competing risks (Relapse vs Death).
data(hct_outcomes)data(hct_outcomes)
A data frame with 800 rows and 7 variables:
Patient ID
Patient age in years
Disease stage (Early, Intermediate, Advanced)
Karnofsky Performance Score (0-100)
HLA matching status
Treatment group (Myeloablative vs Reduced)
Time to first event or censoring
Type of event (Death, Relapse, Censored)
This dataset mimics a retrospective registry study where treatment assignment is confounded by patient health status (e.g., sicker patients get Reduced intensity).
Estimates causal effects using instrumental variables (IV) together with first-stage strength diagnostics.
iv_effect( spec, instrument = NULL, method = c("2sls", "wald", "liml"), n_boot = 200, ci_level = 0.95, weak_iv_threshold = 10 )iv_effect( spec, instrument = NULL, method = c("2sls", "wald", "liml"), n_boot = 200, ci_level = 0.95, weak_iv_threshold = 10 )
spec |
A causal_spec object with instrument specified |
instrument |
Character: name of the instrumental variable (or use spec$instrument) |
method |
Character: IV estimation method
|
n_boot |
Integer: bootstrap replicates for CI |
ci_level |
Numeric: confidence level |
weak_iv_threshold |
Numeric: F-statistic threshold for weak IV (default 10) |
Instrumental variables identify the Local Average Treatment Effect (LATE) for compliers when:
Relevance: Z affects A (testable via F-stat)
Exclusion: Z affects Y only through A (untestable)
Independence: Z is independent of unmeasured confounders
Weak instruments (small first-stage F-statistics) lead to unstable IV inference.
The current implementation reports a deficiency_proxy derived from instrument
strength; it should be interpreted as a screening diagnostic rather than an exact
Le Cam deficiency.
Object of class "iv_effect" containing:
estimate: The IV causal effect (LATE)
se: Standard error
ci: Confidence interval
f_stat: First-stage F-statistic
weak_iv: Logical, whether instrument is weak
deficiency_proxy: Heuristic proxy derived from first-stage strength
deficiency: Backward-compatible alias of deficiency_proxy
Angrist, J. D., Imbens, G. W., & Rubin, D. B. (1996). Identification of causal effects using instrumental variables. JASA.
causal_spec(), estimate_deficiency()
# Simulate IV setting set.seed(42) n <- 1000 U <- rnorm(n) # Unmeasured confounder Z <- rbinom(n, 1, 0.5) # Instrument (randomized encouragement) A <- 0.3 + 0.4 * Z + 0.3 * U + rnorm(n, sd = 0.3) # Treatment (continuous) A <- as.numeric(A > 0.5) # Dichotomize Y <- 1 + 2 * A + 0.8 * U + rnorm(n) # Outcome df <- data.frame(Z = Z, A = A, Y = Y) spec <- causal_spec(df, "A", "Y", instrument = "Z") iv_result <- iv_effect(spec) print(iv_result)# Simulate IV setting set.seed(42) n <- 1000 U <- rnorm(n) # Unmeasured confounder Z <- rbinom(n, 1, 0.5) # Instrument (randomized encouragement) A <- 0.3 + 0.4 * Z + 0.3 * U + rnorm(n, sd = 0.3) # Treatment (continuous) A <- as.numeric(A > 0.5) # Dichotomize Y <- 1 + 2 * A + 0.8 * U + rnorm(n) # Outcome df <- data.frame(Z = Z, A = A, Y = Y) spec <- causal_spec(df, "A", "Y", instrument = "Z") iv_result <- iv_effect(spec) print(iv_result)
Screens for residual association with a negative control outcome and computes
the corresponding -based sensitivity bound.
nc_diagnostic( spec, method = "iptw", kappa = NULL, kappa_range = NULL, alpha = 0.05, n_boot = 200 )nc_diagnostic( spec, method = "iptw", kappa = NULL, kappa_range = NULL, alpha = 0.05, n_boot = 200 )
spec |
A causal_spec with negative_control specified |
method |
Character: adjustment method to test |
kappa |
Numeric: alignment constant in the theoretical bound. If |
kappa_range |
Numeric vector: range of kappa values for sensitivity analysis. If provided, returns bounds for each kappa and adds class "nc_diagnostic_sensitivity". |
alpha |
Numeric: significance level for the screening test |
n_boot |
Integer: number of permutation draws used for the screening test |
The diagnostic uses the bound: , where
is the negative control deficiency.
A negative control outcome Y' is a variable that:
Shares confounders with Y (affected by U)
Is NOT affected by treatment A
If adjustment correctly removes confounding, the residual association between A and Y' should be zero. Non-zero association indicates confounding remains.
The current implementation separates two tasks:
a screening test based on a weighted correlation statistic and a permutation null distribution, and
a sensitivity bound of the form
based on the observed residual-association proxy delta_nc.
This keeps the observable screening step distinct from the theorem's
user-supplied alignment parameter kappa.
When kappa is unknown, use kappa_range to perform sensitivity analysis across plausible values. The resulting plot shows how the deficiency bound varies with the sensitivity parameter.
Object of class "nc_diagnostic" containing:
delta_nc: Observable residual-association proxy based on a weighted treatment/negative-control association
delta_bound: Upper bound on true deficiency (kappa * delta_nc)
falsified: Logical indicating if assumptions are falsified
p_value: Permutation p-value for the observed residual association
kappa: Alignment constant used
screening: List with the observed association statistic, permutation p-value, and null standard deviation
sensitivity: (if kappa_range provided) data.frame with bounds for each kappa
Akdemir, D. (2026). Constraints on Causal Inference as Experiment Comparison.
DOI: 10.5281/zenodo.18367347. See thm:nc_bound (Negative Control Sensitivity Bound).
Lipsitch, M., Tchetgen, E., & Cohen, T. (2010). Negative controls: A tool for detecting confounding and bias. Epidemiology, 21(3), 383-388.
causal_spec(), estimate_deficiency()
# Create data with negative control n <- 200 U <- rnorm(n) W <- U + rnorm(n, sd = 0.5) A <- rbinom(n, 1, plogis(0.5 * W)) # Binary treatment Y <- 1 + 2 * A + U + rnorm(n) Y_nc <- U + rnorm(n) # Shares U but no effect from A df <- data.frame(W = W, A = A, Y = Y, Y_nc = Y_nc) spec <- causal_spec(df, "A", "Y", "W", negative_control = "Y_nc") # Standard diagnostic nc_result <- nc_diagnostic(spec, method = "iptw", n_boot = 50) print(nc_result) # Sensitivity analysis nc_sens <- nc_diagnostic(spec, method = "iptw", kappa_range = seq(0.5, 2.0, by = 0.25), n_boot = 50) plot(nc_sens)# Create data with negative control n <- 200 U <- rnorm(n) W <- U + rnorm(n, sd = 0.5) A <- rbinom(n, 1, plogis(0.5 * W)) # Binary treatment Y <- 1 + 2 * A + U + rnorm(n) Y_nc <- U + rnorm(n) # Shares U but no effect from A df <- data.frame(W = W, A = A, Y = Y, Y_nc = Y_nc) spec <- causal_spec(df, "A", "Y", "W", negative_control = "Y_nc") # Standard diagnostic nc_result <- nc_diagnostic(spec, method = "iptw", n_boot = 50) print(nc_result) # Sensitivity analysis nc_sens <- nc_diagnostic(spec, method = "iptw", kappa_range = seq(0.5, 2.0, by = 0.25), n_boot = 50) plot(nc_sens)
The classical causal inference benchmark dataset constructed by Dehejia and Wahba (1999) from the original Lalonde (1986) study. This version combines the experimental NSW groups with the observational CPS and PSID comparison groups, enabling precise calculation of the "deficiency" between observational and experimental inference.
data(nsw_benchmark)data(nsw_benchmark)
A data frame with 2915 rows and 11 variables:
Treatment indicator (1 = Job Training, 0 = Control)
Age in years
Years of schooling
Indicator for Black race
Indicator for Hispanic race
Indicator for marital status
Indicator for no high school degree
Real earnings in 1974 (pre-treatment)
Real earnings in 1975 (pre-treatment)
Real earnings in 1978 (outcome)
Source of the observation: "nsw_treated", "nsw_control", "cps_control", or "psid_control".
This dataset allows you to verify causal methods by using the experimental subset ("nsw_treated" vs "nsw_control") as the ground truth, and comparing the results obtained by adjusting the observational subsets ("nsw_treated" vs "cps_control").
LaLonde, R. J. (1986). Evaluating the econometric evaluations of training programs with experimental data. The American Economic Review, 604-620.
Dehejia, R. H., & Wahba, S. (1999). Causal effects in nonexperimental studies: Reevaluating the evaluation of training programs. Journal of the American statistical Association, 94(448), 1053-1062.
Summarizes overlap/positivity issues via propensity score distribution and effective sample size under IPTW weights.
overlap_diagnostic(spec, trim = 0.01)overlap_diagnostic(spec, trim = 0.01)
spec |
A |
trim |
Trimming threshold in (0, 0.5). Values outside |
An object of class "overlap_diagnostic" containing:
trim: trimming threshold used to classify extreme propensity scores
n: sample size
extreme_n: number of observations with propensity scores outside
the acceptable overlap region
kept_n: number of observations remaining after trimming
ps_summary: selected propensity-score quantiles
ess_iptw: effective sample size under IPTW weights
These quantities summarize how much practical overlap is available for weighted causal estimation.
Converts a deficiency radius into a conservative identified interval
for bounded functionals under TV.
partial_id_set( estimate, delta, estimand = c("ate", "mean"), outcome_range = c(0, 1) )partial_id_set( estimate, delta, estimand = c("ate", "mean"), outcome_range = c(0, 1) )
estimate |
Point estimate of the functional (numeric). |
delta |
Deficiency radius |
estimand |
One of |
outcome_range |
Numeric vector c(min, max) bounding the outcome. |
For a mean functional with , TV control yields
.
For an ATE (difference of two means), the half-width doubles again.
An object of class "partial_id_set" containing:
estimand: whether the interval applies to a bounded mean or an ATE
estimate: the supplied point estimate
delta: the deficiency radius used to widen the estimate
outcome_range: assumed outcome bounds
half_width: worst-case expansion implied by delta
lower, upper: conservative identification limits
The interval describes all values consistent with the supplied estimate and the chosen total-variation deficiency radius.
Plot Causal Effect
## S3 method for class 'causal_effect' plot(x, ...)## S3 method for class 'causal_effect' plot(x, ...)
x |
A causal_effect object |
... |
Additional arguments passed to plot functions |
If x contains a fitted survival::survfit object, the function is
called for its side effect of drawing base-R survival curves and returns
NULL invisibly. Otherwise it returns a ggplot2 object showing the
point estimate for x$estimate and, when available, its confidence
interval.
Plot Confounding Frontier
## S3 method for class 'confounding_frontier' plot(x, type = c("heatmap", "contour"), threshold = c(0.05, 0.1, 0.2), ...)## S3 method for class 'confounding_frontier' plot(x, type = c("heatmap", "contour"), threshold = c(0.05, 0.1, 0.2), ...)
x |
A confounding_frontier object |
type |
Character: plot type ("heatmap", "contour") |
threshold |
Numeric vector: threshold values for contour lines |
... |
Additional arguments |
A ggplot2 object visualizing the confounding sensitivity surface
over the (alpha, gamma) grid. type = "heatmap" returns a tile plot of
x$grid$delta with optional contour lines and benchmark points, while
type = "contour" returns filled contours of the same surface.
Plot Deficiency Estimates
## S3 method for class 'deficiency' plot(x, type = c("bar", "forest"), ...)## S3 method for class 'deficiency' plot(x, type = c("bar", "forest"), ...)
x |
A deficiency object |
type |
Character: plot type ("bar", "forest") |
... |
Additional arguments passed to ggplot2 functions |
A ggplot2 object. With type = "bar", the plot shows the
method-specific deficiency estimates as bars with optional confidence
intervals and interpretation thresholds. With type = "forest", the plot
shows the same estimates horizontally with optional confidence intervals.
Visualizes how the deficiency bound varies across different values of
the sensitivity parameter kappa. This implements the sensitivity analysis
approach from the Negative Control Sensitivity Bound in the manuscript (thm:nc_bound).
## S3 method for class 'nc_diagnostic_sensitivity' plot(x, threshold = 0.05, ...)## S3 method for class 'nc_diagnostic_sensitivity' plot(x, threshold = 0.05, ...)
x |
An nc_diagnostic_sensitivity object (from nc_diagnostic with kappa_range) |
threshold |
Numeric: threshold line to add (default 0.05) |
... |
Additional arguments passed to plot functions |
If ggplot2 is available, a ggplot2 object showing how
delta_bound varies across kappa values, with an optional threshold
line. Otherwise the function is called for its side effect of producing a
base-R plot and returns NULL invisibly.
Plot Policy Regret Bound
## S3 method for class 'policy_bound' plot(x, type = c("decomposition", "safety_curve"), ...)## S3 method for class 'policy_bound' plot(x, type = c("decomposition", "safety_curve"), ...)
x |
A policy_bound object |
type |
Character: plot type |
... |
Additional arguments |
A ggplot2 object. With type = "decomposition", the plot
decomposes the policy-regret bound into observed regret, transfer penalty,
and minimax floor components. With type = "safety_curve", the plot shows
how the transfer penalty and minimax floor vary as a function of
deficiency, with fitted-method points overlaid when available.
Plot method for transport_deficiency
## S3 method for class 'transport_deficiency' plot(x, type = c("shift", "weights"), ...)## S3 method for class 'transport_deficiency' plot(x, type = c("shift", "weights"), ...)
x |
A transport_deficiency object |
type |
Character: type of plot ("shift" or "weights") |
... |
Additional arguments passed to plot |
A ggplot2 object. With type = "shift", the returned plot is a
horizontal bar chart of x$covariate_shift, where bar height is the
variable-specific shift metric and fill encodes the severity category.
With type = "weights", the returned plot is a histogram of
x$weights with a reference line at 1 and a subtitle summarizing the
effective sample size after weighting.
Given deficiency , computes worst-case bounds on policy regret.
The bound states:
policy_regret_bound( deficiency, utility_range = c(0, 1), obs_regret = NULL, policy_class = NULL, delta_mode = c("point", "upper"), method = NULL )policy_regret_bound( deficiency, utility_range = c(0, 1), obs_regret = NULL, policy_class = NULL, delta_mode = c("point", "upper"), method = NULL )
deficiency |
A deficiency object or numeric |
utility_range |
Numeric vector c(min, max) of utility bounds |
obs_regret |
Numeric: observed regret from policy on observational data (optional) |
policy_class |
Character: description of policy class (for reporting) |
delta_mode |
Character: how to extract
|
method |
Character: when |
For any policy learned from observational data, the regret bound is:
The "transfer penalty" quantifies worst-case regret inflation from
the information gap between observational and interventional experiments.
When a multi-method deficiency object is supplied, the safest workflow is to
pre-specify method before calling policy_regret_bound(). If method is omitted,
the current implementation falls back to the smallest available estimate (or upper
confidence limit), which is convenient for exploration but optimistic after
post-selection.
Separately, a minimax lower bound ("minimax floor") of holds for
bounded utilities (see manuscript theorem thm:safety_floor).
Object of class "policy_bound" containing:
regret_bound: Upper bound on interventional regret (if obs_regret provided)
safety_floor: Backward-compatible alias for transfer_penalty
transfer_penalty: Additive regret penalty = M*
minimax_floor: Minimax lower bound = (M/2)*
delta: Deficiency value used
delta_selection: How the value was selected
utility_range: Range of utility function
M: Utility range (max - min)
If (unobserved confounding exists), no algorithm can guarantee
zero regret. The deficiency is the "price of safety" for deploying
policies without randomized experimentation.
Akdemir, D. (2026). Constraints on Causal Inference as Experiment Comparison.
DOI: 10.5281/zenodo.18367347. See thm:policy_regret (Policy Regret Transfer)
and thm:safety_floor (Minimax Safety Floor).
estimate_deficiency(), confounding_frontier()
# From a deficiency estimate # From a deficiency estimate df <- data.frame(W=rnorm(100), A=rbinom(100,1,0.5), Y=rnorm(100)) spec <- causal_spec(df, "A", "Y", "W") def <- estimate_deficiency(spec, methods = "iptw", n_boot = 0) bound <- policy_regret_bound(def, utility_range = c(0, 1)) # From a numeric value bound <- policy_regret_bound( deficiency = 0.1, utility_range = c(0, 100), obs_regret = 5 ) print(bound)# From a deficiency estimate # From a deficiency estimate df <- data.frame(W=rnorm(100), A=rbinom(100,1,0.5), Y=rnorm(100)) spec <- causal_spec(df, "A", "Y", "W") def <- estimate_deficiency(spec, methods = "iptw", n_boot = 0) bound <- policy_regret_bound(def, utility_range = c(0, 1)) # From a numeric value bound <- policy_regret_bound( deficiency = 0.1, utility_range = c(0, 100), obs_regret = 5 ) print(bound)
Extends policy_regret_bound() with the explicit VC-complexity penalty from
Result 3: .
policy_regret_bound_vc( deficiency, vc_dim, n, xi = 0.05, utility_range = c(0, 1), obs_regret = NULL, delta_mode = c("point", "upper"), C = 2 )policy_regret_bound_vc( deficiency, vc_dim, n, xi = 0.05, utility_range = c(0, 1), obs_regret = NULL, delta_mode = c("point", "upper"), C = 2 )
deficiency |
A |
vc_dim |
VC dimension of the policy class. |
n |
Sample size used to learn the policy. |
xi |
Failure probability ( |
utility_range |
Numeric vector c(min, max) of utility bounds. |
obs_regret |
Optional observed regret under the observational regime. |
delta_mode |
Passed through to |
C |
Universal constant in the bound (default 2). |
An object of class policy_bound with additional field complexity_penalty.
Print method for causal_effect
## S3 method for class 'causal_effect' print(x, ...)## S3 method for class 'causal_effect' print(x, ...)
x |
A causal_effect object |
... |
Additional arguments (unused) |
Invisibly returns x, unchanged. The printed summary reports the
estimation method, estimand type, treatment contrast, scalar effect
estimate when available, and the horizon for RMST analyses.
Print method for causal_spec
## S3 method for class 'causal_spec' print(x, ...)## S3 method for class 'causal_spec' print(x, ...)
x |
A causal_spec object |
... |
Additional arguments (unused) |
Invisibly returns x, unchanged. The printed summary reports the
treatment and outcome variables with their inferred types, the adjustment
covariates, sample size, estimand, and any registered negative control or
instrument.
Print method for causal_spec_competing
## S3 method for class 'causal_spec_competing' print(x, ...)## S3 method for class 'causal_spec_competing' print(x, ...)
x |
A causal_spec_competing object |
... |
Additional arguments (unused) |
Invisibly returns x, unchanged. The printed output summarizes the
competing-risks specification: treatment, time and event variables,
event-of-interest coding, time horizon, estimand, optional event-label
mapping, covariates, sample size, and the observed event-count breakdown.
Print method for causal_spec_survival
## S3 method for class 'causal_spec_survival' print(x, ...)## S3 method for class 'causal_spec_survival' print(x, ...)
x |
A causal_spec_survival object |
... |
Additional arguments (unused) |
Invisibly returns x, unchanged. The printed summary reports the
treatment, follow-up time, event indicator, number of events, covariates,
sample size, estimand, and any specified horizon or competing event.
Print method for confounding_frontier
## S3 method for class 'confounding_frontier' print(x, ...)## S3 method for class 'confounding_frontier' print(x, ...)
x |
A confounding_frontier object |
... |
Additional arguments (unused) |
Invisibly returns x, unchanged. The printed summary reports the
sensitivity-grid dimensions, alpha and gamma ranges, model label, summary
statistics of the deficiency surface, and the proportion of the grid with
near-zero deficiency.
Print method for data_audit_report
## S3 method for class 'data_audit_report' print(x, ...)## S3 method for class 'data_audit_report' print(x, ...)
x |
A data_audit_report object |
... |
Additional arguments (unused) |
Invisibly returns x, unchanged. The printed report summarizes the
audited treatment/outcome pair, number of variables reviewed, number of
flagged issues, a table of non-safe findings, and the recommended next
actions.
Print method for deficiency
## S3 method for class 'deficiency' print(x, ...)## S3 method for class 'deficiency' print(x, ...)
x |
A deficiency object |
... |
Additional arguments (unused) |
Invisibly returns x, unchanged. The printed summary shows the
method-specific deficiency estimates, optional standard errors and
confidence intervals, qualitative traffic-light labels, any PS-TV proxy
note, and the method with the smallest reported deficiency.
Print method for frontdoor_effect
## S3 method for class 'frontdoor_effect' print(x, ...)## S3 method for class 'frontdoor_effect' print(x, ...)
x |
A frontdoor_effect object |
... |
Additional arguments (unused) |
Invisibly returns x, unchanged. The printed output reports the
mediator, estimation method, front-door effect estimate, optional
bootstrap standard error and confidence interval, and the heuristic
discrepancy proxy stored in x$deficiency_proxy.
Print method for iv_effect
## S3 method for class 'iv_effect' print(x, ...)## S3 method for class 'iv_effect' print(x, ...)
x |
An iv_effect object |
... |
Additional arguments (unused) |
Invisibly returns x, unchanged. The printed output reports the
instrument and estimation method, first-stage F-statistic, weak-IV flag,
IV effect estimate, optional bootstrap uncertainty, and the heuristic
strength proxy stored in x$deficiency_proxy.
Print method for nc_diagnostic
## S3 method for class 'nc_diagnostic' print(x, ...)## S3 method for class 'nc_diagnostic' print(x, ...)
x |
An nc_diagnostic object |
... |
Additional arguments (unused) |
Invisibly returns x, unchanged. The printed summary reports the
screening statistic, observed negative-control proxy delta_nc, the
translated deficiency bound delta_bound, the chosen kappa, the
permutation p-value, and whether the negative-control screen was rejected.
Print method for overlap_diagnostic
## S3 method for class 'overlap_diagnostic' print(x, ...)## S3 method for class 'overlap_diagnostic' print(x, ...)
x |
An overlap_diagnostic object |
... |
Additional arguments (unused) |
Invisibly returns x, unchanged. The printed summary reports the
sample size, trimming threshold, number of extreme propensity scores,
number retained after trimming, IPTW effective sample size, and propensity
score quantiles.
Print method for partial_id_set
## S3 method for class 'partial_id_set' print(x, ...)## S3 method for class 'partial_id_set' print(x, ...)
x |
A partial_id_set object |
... |
Additional arguments (unused) |
Invisibly returns x, unchanged. The printed summary reports the
target estimand, point estimate, chosen deficiency radius, resulting
half-width, and the conservative identification interval implied by that
radius.
Print method for policy_bound
## S3 method for class 'policy_bound' print(x, ...)## S3 method for class 'policy_bound' print(x, ...)
x |
A policy_bound object |
... |
Additional arguments (unused) |
Invisibly returns x, unchanged. The printed summary reports the
deficiency value used in the regret bound, how that value was selected,
the utility range, transfer-penalty and minimax components, any observed
regret supplied by the user, and the resulting interpretation on the
utility scale.
Print method for transport_deficiency
## S3 method for class 'transport_deficiency' print(x, ...)## S3 method for class 'transport_deficiency' print(x, ...)
x |
A transport_deficiency object |
... |
Additional arguments (unused) |
Invisibly returns x, unchanged. The printed summary reports the
transport-risk proxy, optional bootstrap uncertainty, effective sample
size under the transport weights, the per-variable covariate-shift table,
and the source versus transported ATE estimates.
Data from the SUPPORT study (Connors et al., 1996) examining the effectiveness of Right Heart Catheterization (RHC) in the management of critically ill patients. This dataset is a classic example of high-dimensional confounding in medical observational studies.
data(rhc)data(rhc)
A data frame with 5735 rows and 63 variables.
The original study found that RHC was associated with higher mortality, contradicting potential benefits. Careful adjustment for the rich set of covariates (indicating sickness severity) is required. This dataset serves as a testbed for sensitivity analysis and policy bounds.
Key variables include swang1 (treatment), dth30 (30-day mortality),
t3d30 (survival time up to 30 days), and aps1 (APACHE III score).
Connors, A. F., Speroff, T., Dawson, N. V., Thomas, C., Harrell, F. E., Wagner, D., ... & Goldman, L. (1996). The effectiveness of right heart catheterization in the initial care of critically ill patients. JAMA, 276(11), 889-897.
Computes the explicit rate bound stated in Result 1 (finite-sample RKHS rate).
rkhs_rate_bound(n, beta, d_w, eta, xi = 0.05)rkhs_rate_bound(n, beta, d_w, eta, xi = 0.05)
n |
Sample size. |
beta |
Smoothness exponent ( |
d_w |
Covariate dimension ( |
eta |
Positivity constant ( |
xi |
Failure probability ( |
Numeric rate bound (up to universal constants).
Convenience function to start the plumber API server
run_causaldef_api(port = 8080, swagger = TRUE, path = NULL)run_causaldef_api(port = 8080, swagger = TRUE, path = NULL)
port |
Integer: port number (default 8080) |
swagger |
Logical: enable Swagger UI (default TRUE) |
path |
Character: path to API files (will create if missing) |
This function is called primarily for its side effect of starting a
blocking Plumber API server on port. No stable structured return value is
guaranteed by causaldef; when control returns to R, the function returns
whatever value is produced by plumber::plumb(api_file)$run(...).
Launches an interactive Shiny application for deficiency analysis. Provides a point-and-click interface for the full causaldef workflow.
run_causaldef_app(data = NULL, port = 3838, launch.browser = TRUE)run_causaldef_app(data = NULL, port = 3838, launch.browser = TRUE)
data |
Optional data frame to preload |
port |
Integer: port for the Shiny server (default 3838) |
launch.browser |
Logical: open in browser? (default TRUE) |
The dashboard provides:
Data upload and variable selection
Method comparison (unadjusted, IPTW, AIPW, etc.)
Interactive visualization of deficiency estimates
Confounding frontier explorer
Policy regret calculator
Exportable reports
Invisibly returns the Shiny app object
## Not run: # Launch with example data df <- data.frame( W = rnorm(200), A = rbinom(200, 1, 0.5), Y = rnorm(200) ) run_causaldef_app(data = df) ## End(Not run)## Not run: # Launch with example data df <- data.frame( W = rnorm(200), A = rbinom(200, 1, 0.5), Y = rnorm(200) ) run_causaldef_app(data = df) ## End(Not run)
Computes matching lower/upper bounds (sharpness) in the linear Gaussian model,
for either TV-based (two-point construction underlying thm:confounding_lb) or Wasserstein-1-based
(Result 6 two-point construction) deficiencies.
sharp_lower_bound( alpha, gamma, sigma_A = 1, sigma_Y = 1, a = 1, metric = c("tv", "wasserstein") )sharp_lower_bound( alpha, gamma, sigma_A = 1, sigma_Y = 1, a = 1, metric = c("tv", "wasserstein") )
alpha |
Confounding strength U -> A. |
gamma |
Confounding strength U -> Y. |
sigma_A |
Treatment noise standard deviation. |
sigma_Y |
Outcome noise standard deviation (TV case). |
a |
Intervention level. |
metric |
One of |
List with fields: lower, upper, ratio, and metric.
Summary method for iv_effect
## S3 method for class 'iv_effect' summary(object, ...)## S3 method for class 'iv_effect' summary(object, ...)
object |
An iv_effect object |
... |
Additional arguments (unused) |
Invisibly returns object, unchanged. The printed summary includes
the instrument, estimation method, sample size, the coefficient table from
the first-stage regression stored in object$first_stage, the IV effect
estimate with optional bootstrap uncertainty, and the weak-instrument
diagnostics.
Performs diagnostic tests for instrumental variable assumptions
test_instrument(spec, instrument = NULL, overid_test = TRUE)test_instrument(spec, instrument = NULL, overid_test = TRUE)
spec |
A causal_spec with instrument |
instrument |
Instrument name |
overid_test |
Logical: perform overidentification test if multiple instruments |
A named list of IV screening diagnostics with components such as:
relevance: first-stage F-statistic, its p-value, and a logical
pass/fail flag relative to the weak-IV threshold of 10
balance: when covariates are present, per-covariate balance
p-values and an overall pass/fail flag for instrument-covariate balance
reduced_form: coefficient, standard error, p-value, and
significance flag for the reduced-form association of the instrument with
the outcome
These outputs are diagnostics for testable parts of the IV design. They do not by themselves prove the exclusion restriction or full instrument validity.
Computes a transport diagnostic for extrapolating causal effects from a source population to a target population.
transport_deficiency( source_spec, target_data, transport_vars = NULL, method = c("iptw", "calibration", "sace"), n_boot = 100, ci_level = 0.95 )transport_deficiency( source_spec, target_data, transport_vars = NULL, method = c("iptw", "calibration", "sace"), n_boot = 100, ci_level = 0.95 )
source_spec |
A causal_spec object for the source (training) population |
target_data |
Data frame: the target population (may have fewer variables) |
transport_vars |
Character vector: variables that may shift between populations |
method |
Character: method for transport estimation
|
n_boot |
Integer: bootstrap replicates (default 100) |
ci_level |
Numeric: confidence level (default 0.95) |
A transport problem can be framed through deficiency, but the current release does not estimate an exact transport deficiency. Instead, it returns a heuristic transport-risk proxy that combines observable covariate shift, transport-weight instability, and source/target sample-size imbalance.
Key assumptions for valid transport:
Treatment effects are identifiable in source
Effect homogeneity or sufficient transport variables
Positivity: overlap between source and target covariate distributions
Object of class "transport_deficiency" containing:
delta_transport_proxy: Heuristic transport-risk proxy combining covariate shift, weight instability, and sample-size imbalance
delta_transport: Backward-compatible alias of delta_transport_proxy
covariate_shift: Per-variable shift diagnostics
weights: Transport weights for source observations
ate_source: ATE estimate in source population
ate_target: Transported ATE estimate for target
estimate_deficiency(), policy_regret_bound()
# Source population (RCT) set.seed(42) n_source <- 500 age_s <- rnorm(n_source, 50, 10) A_s <- rbinom(n_source, 1, 0.5) # Randomized Y_s <- 10 + 2 * A_s - 0.1 * age_s + rnorm(n_source) source_df <- data.frame(age = age_s, A = A_s, Y = Y_s, S = 1) # Target population (different age distribution) n_target <- 300 age_t <- rnorm(n_target, 65, 8) # Older population target_df <- data.frame(age = age_t, S = 0) source_spec <- causal_spec(source_df, "A", "Y", "age") transport <- transport_deficiency( source_spec, target_data = target_df, transport_vars = "age" ) print(transport)# Source population (RCT) set.seed(42) n_source <- 500 age_s <- rnorm(n_source, 50, 10) A_s <- rbinom(n_source, 1, 0.5) # Randomized Y_s <- 10 + 2 * A_s - 0.1 * age_s + rnorm(n_source) source_df <- data.frame(age = age_s, A = A_s, Y = Y_s, S = 1) # Target population (different age distribution) n_target <- 300 age_t <- rnorm(n_target, 65, 8) # Older population target_df <- data.frame(age = age_t, S = 0) source_spec <- causal_spec(source_df, "A", "Y", "age") transport <- transport_deficiency( source_spec, target_data = target_df, transport_vars = "age" ) print(transport)
Validates a causal_spec object
validate_causal_spec(x)validate_causal_spec(x)
x |
A causal_spec object |
The validated object (invisibly)
Closed-form Wasserstein-1 deficiency for the two-point construction in Result 6:
.
wasserstein_deficiency_gaussian(alpha, gamma, sigma_A = 1, a = 1)wasserstein_deficiency_gaussian(alpha, gamma, sigma_A = 1, a = 1)
alpha |
Confounding strength U -> A. |
gamma |
Confounding strength U -> Y. |
sigma_A |
Treatment noise standard deviation. |
a |
Intervention level |
Numeric deficiency value.