Difference-in-Differences (DiD)

Definition

Core Statement

Difference-in-Differences (DiD) is a quasi-experimental design that estimates causal effects by comparing the change in outcomes over time between a treatment group and a control group. The key insight is that the control group's change over the same period serves as the counterfactual: it estimates how the treatment group's outcome would have changed without treatment.

Purpose

Estimate the causal effect of a policy or intervention.
Control for both time-invariant group differences and common time trends.

When to Use

Use DiD When...

You have observations on both groups before and after treatment (panel data or repeated cross-sections).
A treatment is applied to one group but not another at a specific point in time.
The parallel trends assumption is plausible.

Theoretical Background

The Core Logic

Simple comparisons are biased:

Before vs After (Treatment Group): Ignores natural time trends.
Treatment vs Control (Post-Period): Ignores baseline differences.

DiD removes both biases:

\hat{δ}_{\text{DiD}} = (\bar{Y}_{T,\text{post}} - \bar{Y}_{T,\text{pre}}) - (\bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}})
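
As a rough illustration of the formula, the sketch below computes the two naive comparisons and the DiD estimate from four group-period means. The numbers and the helper names (`cell_means`, `did_from_means`) are hypothetical and chosen only for illustration.

```python
# Hypothetical mean outcomes per (group, period); illustrative numbers only.
cell_means = {
    ("treated", "pre"): 20.0,
    ("treated", "post"): 30.0,
    ("control", "pre"): 15.0,
    ("control", "post"): 20.0,
}


def did_from_means(m):
    """Change in the treated group minus change in the control group."""
    treated_change = m[("treated", "post")] - m[("treated", "pre")]
    control_change = m[("control", "post")] - m[("control", "pre")]
    return treated_change - control_change


# Naive comparison 1: before vs after in the treated group (ignores time trends).
before_after = cell_means[("treated", "post")] - cell_means[("treated", "pre")]

# Naive comparison 2: treated vs control in the post-period (ignores baseline gap).
post_gap = cell_means[("treated", "post")] - cell_means[("control", "post")]

did = did_from_means(cell_means)

print("Before vs after (treated):", before_after)  # 10.0
print("Treated vs control (post):", post_gap)      # 10.0
print("DiD estimate:", did)                        # 5.0
```

In this made-up example both naive comparisons give 10, while DiD gives 5: half of the treated group's change is attributed to the trend that the control group also experienced.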

The Regression Model

Y_{it} = α + β_{1} \cdot \text{Treat}_{i} + β_{2} \cdot \text{Post}_{t} + β_{3} \cdot (\text{Treat}_{i} \times \text{Post}_{t}) + ε_{it}

Coefficient	Interpretation
$β_{1}$	Baseline difference between groups.
$β_{2}$	Change over time common to both groups (the control group's pre-to-post change).
$β_{3}$	The DiD Estimator. Under parallel trends, the causal effect of treatment on the treated group.
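
To see why $β_{3}$ coincides with the difference of differences, it can help to write out the expected outcome in each of the four group-period cells implied by the regression model. This is a short derivation sketch, assuming the error term has mean zero within each cell:

```latex
\begin{aligned}
E[Y_{it} \mid \text{Treat}_{i}=0,\ \text{Post}_{t}=0] &= α \\
E[Y_{it} \mid \text{Treat}_{i}=1,\ \text{Post}_{t}=0] &= α + β_{1} \\
E[Y_{it} \mid \text{Treat}_{i}=0,\ \text{Post}_{t}=1] &= α + β_{2} \\
E[Y_{it} \mid \text{Treat}_{i}=1,\ \text{Post}_{t}=1] &= α + β_{1} + β_{2} + β_{3} \\[4pt]
\underbrace{(α + β_{1} + β_{2} + β_{3}) - (α + β_{1})}_{\text{treated change}}
 - \underbrace{(α + β_{2}) - α}_{\text{control change}} &= (β_{2} + β_{3}) - β_{2} = β_{3}
\end{aligned}
```

The intercept α is the control group's pre-period mean, and the treated group's change minus the control group's change leaves only $β_{3}$.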

Parallel Trends Assumption

Critical Assumption

In the absence of treatment, the treatment and control groups would have followed the same trend.

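This cannot be tested directly, because the treated group's untreated outcomes in the post-period are never observed. Its plausibility can be assessed in the pre-period by comparing how the two groups evolved before treatment.

If trends diverge before treatment, the DiD estimate is not credible.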
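
One simple way to probe pre-period trends is a placebo DiD: apply the DiD formula to two periods that both precede treatment, where the true effect should be zero. The sketch below assumes hypothetical pre-treatment observations stored as `(group, period, outcome)` tuples; the data and function names are illustrative only.

```python
from statistics import mean

# Hypothetical pre-treatment observations: (group, period, outcome).
pre_period_data = [
    ("treated", 1, 20.0), ("treated", 1, 22.0),
    ("treated", 2, 23.0), ("treated", 2, 25.0),
    ("control", 1, 14.0), ("control", 1, 16.0),
    ("control", 2, 17.0), ("control", 2, 19.0),
]


def cell_mean(data, group, period):
    """Mean outcome for one group in one period."""
    return mean(y for g, t, y in data if g == group and t == period)


def placebo_did(data, early, late):
    """DiD computed between two pre-treatment periods."""
    treated_change = cell_mean(data, "treated", late) - cell_mean(data, "treated", early)
    control_change = cell_mean(data, "control", late) - cell_mean(data, "control", early)
    return treated_change - control_change


placebo = placebo_did(pre_period_data, early=1, late=2)
print("Placebo DiD (pre-periods only):", placebo)  # 0.0
```

A placebo estimate near zero is consistent with parallel pre-trends, but it does not prove that the trends would have stayed parallel after treatment. A placebo estimate clearly different from zero is a warning sign.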

Assumptions Checklist

Parallel Trends: Pre-treatment trends are the same for both groups.
No Spillover: Treatment only affects the treated group.
Common Shocks: Both groups are affected equally by external events.
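Stable Unit Treatment Value Assumption (SUTVA): A unit's outcome does not depend on other units' treatment status, and there is only one version of the treatment.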

Limitations

Pitfalls

Parallel Trends Violation: If trends differ pre-treatment, the estimate is biased.
Anticipation Effects: If treatment is anticipated, behavior may change before implementation.
Simultaneous Events: Other events coinciding with treatment can confound results.

Python Implementation

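import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

# Assumes df is a pandas DataFrame with columns Outcome, Treat (0/1), Post (0/1), and Time,
# and that treatment_time holds the period in which treatment starts.
model = smf.ols("Outcome ~ Treat * Post", data=df).fit()
print(model.summary())

# The coefficient on 'Treat:Post' is the DiD estimate.
print("DiD estimate:", model.params["Treat:Post"])

# Visualization: Check Parallel Trends
# Mean outcome per period, with one column (and one line) per group.
df_grouped = df.groupby(["Time", "Treat"])["Outcome"].mean().unstack()
df_grouped.plot(marker="o")
plt.axvline(x=treatment_time, linestyle="--", color="grey")  # start of treatment
plt.title("Parallel Trends Check")
plt.show()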
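
To try the regression without real data, a small synthetic panel can be simulated with a known effect. The sketch below is only an illustration: the data-generating process, the `Unit` column, the constant `TRUE_EFFECT`, and the choice to cluster standard errors by unit are all assumptions added here, not part of the original snippet. Clustering by unit is a common precaution when the same units are observed repeatedly.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)  # fixed seed for reproducibility
TRUE_EFFECT = 5.0               # assumed effect built into the simulation
treatment_time = 3              # assumed first treated period

rows = []
for unit in range(200):
    treat = int(unit < 100)     # first 100 units form the treated group
    for time in range(6):
        post = int(time >= treatment_time)
        # Baseline gap (2.0 * treat) and a common linear time trend (1.5 * time),
        # plus the treatment effect for treated units in post-periods.
        outcome = (10.0 + 2.0 * treat + 1.5 * time
                   + TRUE_EFFECT * treat * post + rng.normal(0.0, 1.0))
        rows.append({"Unit": unit, "Time": time, "Treat": treat,
                     "Post": post, "Outcome": outcome})
df = pd.DataFrame(rows)

# Same specification as above, with standard errors clustered by unit.
model = smf.ols("Outcome ~ Treat * Post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["Unit"]}
)
did_coef = model.params["Treat:Post"]
did_se = model.bse["Treat:Post"]
print(f"DiD estimate: {did_coef:.2f} (clustered SE {did_se:.2f})")
```

Because the simulated trend is common to both groups, the estimate should land close to `TRUE_EFFECT`, which makes this a convenient sanity check before applying the same code to real data.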

R Implementation

# DiD Regression
model <- lm(Outcome ~ Treat * Post, data = df)
summary(model)

# The 'Treat:Post' coefficient is the DiD effect.

# Visualize Parallel Trends
library(ggplot2)
ggplot(df, aes(x = Time, y = Outcome, color = factor(Treat))) +
  stat_summary(fun = mean, geom = "line") +
  geom_vline(xintercept = treatment_time, linetype = "dashed") +
  labs(title = "Parallel Trends Visualization")

Interpretation Guide

Output	Interpretation
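$β_{3}$ = 5.0, p < 0.05	If parallel trends holds, the treatment is estimated to have raised the outcome by 5 units relative to the control group's change; the effect is statistically significant at the 5% level.
Pre-trends diverge	Parallel trends assumption is likely violated. The DiD estimate is likely biased.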