StaggeredDifferenceInDifferences
- class causalpy.experiments.staggered_did.StaggeredDifferenceInDifferences
A class to analyse data from staggered adoption Difference-in-Differences settings.
This class implements the Borusyak, Jaravel, and Spiess (BJS, 2024) imputation estimator for staggered adoption settings. It fits a model on untreated observations only (pre-treatment periods for eventually-treated units plus all periods for never-treated units), then predicts counterfactual outcomes for all observations. Treatment effects are computed as the difference between observed and predicted outcomes for treated observations.
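The three steps described above (fit on untreated observations, predict counterfactuals everywhere, subtract) can be illustrated with a small sketch. This is not the CausalPy implementation: it uses a hypothetical noise-free toy panel, plain least squares via NumPy, and column names chosen for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel: units 0-1 adopt treatment at t=3, units 2-3 never do.
rows = []
for unit in range(4):
    g = 3 if unit < 2 else np.inf
    for t in range(6):
        treated = int(t >= g)
        # Additive unit and time effects plus a constant effect of 2.0 (no noise).
        y = 1.0 * unit + 0.5 * t + 2.0 * treated
        rows.append({"unit": unit, "time": t, "treated": treated, "y": y})
df = pd.DataFrame(rows)

# Two-way fixed-effects design: intercept plus unit and time dummies.
X = pd.get_dummies(df[["unit", "time"]], columns=["unit", "time"], dtype=float)
X.insert(0, "intercept", 1.0)

# Step 1: fit on untreated observations only.
untreated = df["treated"] == 0
beta, *_ = np.linalg.lstsq(
    X[untreated].to_numpy(), df.loc[untreated, "y"].to_numpy(), rcond=None
)

# Step 2: predict the untreated counterfactual for every observation.
df["y_hat0"] = X.to_numpy() @ beta

# Step 3: effect = observed minus counterfactual, kept for treated rows only.
df["tau_hat"] = np.where(df["treated"] == 1, df["y"] - df["y_hat0"], np.nan)
att = df["tau_hat"].mean()  # NaN rows (untreated) are skipped
```

Because the toy outcome is exactly additive, the imputed effects recover the constant effect built into the data. With real data the model passed via `model` takes the place of the least-squares step.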
Assumptions
This estimator requires the following identifying assumptions:
Absorbing treatment: Once a unit receives treatment, it must remain treated in all subsequent periods. Treatment cannot be reversed or temporarily suspended. This is validated at runtime.
Parallel trends: In the absence of treatment, treated and control units would have followed parallel outcome trajectories.
No anticipation: Units do not change their behavior in anticipation of future treatment.
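Of these assumptions, only absorbing treatment can be checked mechanically from the data. A minimal sketch of such a check follows; the helper name `is_absorbing` and its default column names are hypothetical, and the library's own runtime validation may differ.

```python
import pandas as pd

def is_absorbing(df, unit="unit", time="time", treated="treated"):
    """Return True if no unit ever switches from treated back to untreated."""
    ordered = df.sort_values([unit, time])
    # Within each unit, the period-to-period change in the 0/1 indicator
    # must never be negative.
    steps = ordered.groupby(unit)[treated].diff()
    return not (steps < 0).any()

# Unit 1 adopts at t=1 and stays treated; unit 2 is never treated.
ok = pd.DataFrame({
    "unit": [1, 1, 1, 2, 2, 2],
    "time": [0, 1, 2, 0, 1, 2],
    "treated": [0, 1, 1, 0, 0, 0],
})

# Same panel, but unit 1 reverts to untreated at t=2.
bad = ok.copy()
bad.loc[2, "treated"] = 0
```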
- param data:
A pandas dataframe with panel data (unit x time observations).
- type data:
pd.DataFrame
- param formula:
A statistical model formula. Recommended: "y ~ 1 + C(unit) + C(time)" for unit and time fixed effects.
- type formula:
str
- param unit_variable_name:
Name of the column identifying units.
- type unit_variable_name:
str
- param time_variable_name:
Name of the column identifying time periods.
- type time_variable_name:
str
- param treated_variable_name:
Name of the column indicating treatment status (0/1). Defaults to "treated".
- type treated_variable_name:
str, optional
- param treatment_time_variable_name:
Name of the column containing unit-level treatment time (G_i). If None, treatment time is inferred from the treated_variable_name column.
- type treatment_time_variable_name:
str, optional
- param never_treated_value:
Value that marks never-treated units in the treatment-time column. Defaults to np.inf.
- type never_treated_value:
Any, optional
- param model:
A model for the untreated outcome. Defaults to LinearRegression.
- type model:
PyMCModel or RegressorMixin, optional
- param event_window:
Tuple (min_event_time, max_event_time) to restrict event-time aggregation. If None, uses all available event-times.
- type event_window:
tuple[int, int], optional
- param reference_event_time:
Event-time index associated with plots (reserved for future use). Defaults to -1.
- type reference_event_time:
int, optional
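When treatment_time_variable_name is None, the treatment time is inferred from the treatment indicator. One plausible way to do this, assuming the treatment time is the first period in which the indicator equals 1, is sketched below. The helper `infer_treatment_time` is hypothetical and is not part of the CausalPy API.

```python
import numpy as np
import pandas as pd

def infer_treatment_time(df, unit="unit", time="time", treated="treated",
                         never_treated_value=np.inf):
    """Return a per-row treatment time G_i inferred from a 0/1 indicator."""
    # First period in which each unit is observed as treated.
    first_treated = df.loc[df[treated] == 1].groupby(unit)[time].min()
    # Units that never appear as treated map to NaN, then to the sentinel.
    return df[unit].map(first_treated).fillna(never_treated_value)

panel = pd.DataFrame({
    "unit": [1, 1, 1, 2, 2, 2],
    "time": [0, 1, 2, 0, 1, 2],
    "treated": [0, 1, 1, 0, 0, 0],
})
panel["G"] = infer_treatment_time(panel)
```

Here unit 1 receives G = 1 and unit 2, which is never treated, receives the never_treated_value sentinel.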
- data_
Augmented data with G (treatment time), event_time, y_hat0 (counterfactual), and tau_hat (treatment effect) columns.
- Type:
pd.DataFrame
- att_group_time_
Group-time ATT estimates: ATT(g, t) for each cohort g and calendar time t.
- Type:
pd.DataFrame
- att_event_time_
Event-time ATT estimates: ATT(e) for each event-time e = t - G.
- Type:
pd.DataFrame
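The relationship between these attributes can be sketched with plain pandas. The example below assumes a hypothetical augmented frame with G and tau_hat columns as described for data_, averages tau_hat over treated rows only, and applies an event window at the end. The actual attributes may hold additional columns (for example uncertainty intervals) and may treat pre-treatment event times differently.

```python
import numpy as np
import pandas as pd

# Hypothetical augmented data: unit 1 adopts at t=1, unit 2 at t=2,
# unit 3 is never treated. tau_hat is NaN for untreated observations.
data = pd.DataFrame({
    "unit": [1] * 4 + [2] * 4 + [3] * 4,
    "time": [0, 1, 2, 3] * 3,
    "G": [1.0] * 4 + [2.0] * 4 + [np.inf] * 4,
    "tau_hat": [np.nan, 1.0, 2.0, 3.0,
                np.nan, np.nan, 2.0, 4.0,
                np.nan, np.nan, np.nan, np.nan],
})

# Event time e = t - G (minus infinity for never-treated units).
data["event_time"] = data["time"] - data["G"]
treated_rows = data[data["tau_hat"].notna()]

# ATT(g, t): average effect per cohort and calendar time.
att_gt = treated_rows.groupby(["G", "time"])["tau_hat"].mean()

# ATT(e): average effect per event time, pooling cohorts.
att_e = treated_rows.groupby("event_time")["tau_hat"].mean()

# Restrict to an event window (min_event_time, max_event_time).
lo, hi = 0, 1
att_e_windowed = att_e[(att_e.index >= lo) & (att_e.index <= hi)]
```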
Notes
Panel Balance: This implementation supports both balanced and unbalanced panel data. While balanced panels (where each unit is observed in every time period) are common in staggered DiD applications, the imputation-based approach of Borusyak et al. (2024) can accommodate unbalanced panels. The key requirement is that treatment timing is well-defined for each unit, not that all units are observed in all periods. Unit and observation counts in the summary output are computed without assuming balanced panels.
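To see what counting without a balance assumption means in practice, the sketch below counts units and observations directly from the rows instead of multiplying units by periods. The helper `panel_summary` is hypothetical and only illustrates the idea.

```python
import pandas as pd

def panel_summary(df, unit="unit", time="time"):
    """Count units, periods, and observations without assuming balance."""
    n_units = df[unit].nunique()
    n_periods = df[time].nunique()
    # A panel is balanced when every unit is observed in every period.
    periods_per_unit = df.groupby(unit)[time].nunique()
    balanced = bool((periods_per_unit == n_periods).all())
    return {"n_units": n_units, "n_periods": n_periods,
            "n_obs": len(df), "balanced": balanced}

# Unit 2 is not observed at t=1, so the panel is unbalanced.
unbalanced = pd.DataFrame({
    "unit": [1, 1, 1, 2, 2],
    "time": [0, 1, 2, 0, 2],
})
summary = panel_summary(unbalanced)
```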
Example
>>> import causalpy as cp
>>> from causalpy.data.simulate_data import generate_staggered_did_data
>>> df = generate_staggered_did_data(n_units=30, n_time_periods=15, seed=42)
>>> result = cp.StaggeredDifferenceInDifferences(
...     df,
...     formula="y ~ 1 + C(unit) + C(time)",
...     unit_variable_name="unit",
...     time_variable_name="time",
...     treated_variable_name="treated",
...     treatment_time_variable_name="treatment_time",
...     model=cp.pymc_models.LinearRegression(
...         sample_kwargs={
...             "tune": 100,
...             "draws": 200,
...             "chains": 2,
...             "progressbar": False,
...         }
...     ),
... )
References
Borusyak, K., Jaravel, X., & Spiess, J. (2024). Revisiting Event Study Designs: Robust and Efficient Estimation. Review of Economic Studies.
Methods
Run the experiment algorithm: fit model, predict counterfactuals, and aggregate effects.
Generate a decision-ready summary of causal effects for Staggered Difference-in-Differences.
StaggeredDifferenceInDifferences.fit(*args, ...)
Recover the data of an experiment along with the prediction and causal impact information.
StaggeredDifferenceInDifferences.get_plot_data_bayesian([...])
Get plotting data for Bayesian model.
Get plotting data for OLS model.
Validate the input data and parameters.
Plot the model.
Ask the model to print its coefficients.
Print summary of main results.
Attributes
idata
Return the InferenceData object of the model.
supports_bayes
supports_ols
labels
- __init__(data, formula, unit_variable_name, time_variable_name, treated_variable_name='treated', treatment_time_variable_name=None, never_treated_value=inf, model=None, event_window=None, reference_event_time=-1, **kwargs)
- Parameters:
- Return type:
None
- classmethod __new__(*args, **kwargs)