Regression Discontinuity Design (RDD)
Overview
Definition
RDD is a method that estimates causal effects by exploiting a precise cutoff (threshold) rule for treatment assignment.
- Example: Scholarship given if GPA > 3.50.
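- We compare students just below the cutoff (e.g. GPA 3.49, Control) with students just above it (e.g. GPA 3.51, Treated). If students cannot precisely manipulate their GPA, the two groups are virtually identical on average, so a jump in the outcome at the cutoff can be attributed to the scholarship.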
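To make the idea concrete, here is a minimal sketch on simulated data, assuming NumPy is available. The data, the function name `sharp_rd_estimate`, the bandwidth of 0.5, and the true jump of 2.0 are all hypothetical choices for illustration. It fits a straight line on each side of the cutoff and takes the gap between the two fitted values at the cutoff.

```python
import numpy as np

def sharp_rd_estimate(score, outcome, cutoff, bandwidth):
    """Gap between two local linear fits, evaluated at the cutoff.

    Hypothetical helper for illustration, not a library function.
    """
    dist = np.asarray(score, dtype=float) - cutoff
    outcome = np.asarray(outcome, dtype=float)
    # Treatment rule is score > cutoff, so dist > 0 is the treated side
    left = (dist <= 0) & (dist >= -bandwidth)
    right = (dist > 0) & (dist <= bandwidth)
    # np.polyfit(deg=1) returns [slope, intercept];
    # the intercept is the fitted value at dist == 0
    _, left_at_cutoff = np.polyfit(dist[left], outcome[left], deg=1)
    _, right_at_cutoff = np.polyfit(dist[right], outcome[right], deg=1)
    return right_at_cutoff - left_at_cutoff

# Simulated (made-up) data: scholarship if GPA > 3.5, true jump of 2.0
rng = np.random.default_rng(0)
n = 5000
gpa = rng.uniform(2.5, 4.0, n)
treated = (gpa > 3.5).astype(float)
true_jump = 2.0
outcome = 10.0 + 3.0 * gpa + true_jump * treated + rng.normal(0.0, 1.0, n)

estimate = sharp_rd_estimate(gpa, outcome, cutoff=3.5, bandwidth=0.5)
print(f"estimated jump: {estimate:.2f} (true jump: {true_jump})")
```

Because the outcome trends upward with GPA, a plain difference in means between the two sides would mix the trend with the jump. Fitting a line on each side and comparing at the cutoff separates the two.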
1. Types of RDD
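- Sharp RDD: The probability of treatment jumps from 0 to 1 at the cutoff (deterministic assignment).
- Fuzzy RDD: The probability of treatment jumps at the cutoff, but by less than 1 (probabilistic assignment). Estimate it with IV methods, using the above-cutoff indicator as the instrument for treatment.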
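As a rough sketch of the fuzzy case, the IV logic reduces to a ratio: the jump in the outcome at the cutoff divided by the jump in the treatment probability at the cutoff. The example below assumes NumPy and uses simulated data; the helper names, the take-up rates (20% below, 80% above), and the bandwidth are hypothetical.

```python
import numpy as np

def jump_at_cutoff(dist, values, bandwidth):
    """Gap at dist == 0 between linear fits on each side (hypothetical helper)."""
    left = (dist <= 0) & (dist >= -bandwidth)
    right = (dist > 0) & (dist <= bandwidth)
    _, left_fit = np.polyfit(dist[left], values[left], deg=1)
    _, right_fit = np.polyfit(dist[right], values[right], deg=1)
    return right_fit - left_fit

def fuzzy_rd_estimate(score, treated, outcome, cutoff, bandwidth):
    """Ratio of the outcome jump to the take-up jump at the cutoff."""
    dist = np.asarray(score, dtype=float) - cutoff
    outcome_jump = jump_at_cutoff(dist, np.asarray(outcome, dtype=float), bandwidth)
    takeup_jump = jump_at_cutoff(dist, np.asarray(treated, dtype=float), bandwidth)
    return outcome_jump / takeup_jump

# Simulated (made-up) data: crossing the cutoff raises take-up from 20% to 80%
rng = np.random.default_rng(1)
n = 20000
score = rng.uniform(-1.0, 1.0, n)
p_treat = np.where(score > 0, 0.8, 0.2)
treated = (rng.uniform(size=n) < p_treat).astype(float)
true_effect = 2.0
outcome = 1.0 + 0.5 * score + true_effect * treated + rng.normal(0.0, 1.0, n)

fuzzy_effect = fuzzy_rd_estimate(score, treated, outcome, cutoff=0.0, bandwidth=0.5)
print(f"fuzzy RD estimate: {fuzzy_effect:.2f} (true effect: {true_effect})")
```

In a sharp design the denominator equals 1, so the ratio collapses to the plain outcome jump. In practice a 2SLS routine would also give standard errors, which this sketch omits.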
2. Assumptions Checklist
3. Python Implementation
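import statsmodels.formula.api as smf
# Assumes df is a pandas DataFrame with 'score' and 'outcome' columns
cutoff = 3.5  # GPA threshold from the scholarship example
# 1. Center the running variable (distance from cutoff)
df['dist'] = df['score'] - cutoff
# Treatment indicator under a sharp design (skip if df already has 'treated')
df['treated'] = (df['score'] > cutoff).astype(int)
# 2. Run regression with interaction
# dist * treated expands to dist + treated + dist:treated,
# which allows a different slope on either side of the cutoff
model = smf.ols("outcome ~ dist * treated", data=df).fit()
print(model.summary())
# The coefficient on 'treated' is the jump at the cutoff (dist = 0),
# i.e. the treatment effect local to the cutoff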
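The regression above uses every observation, so it relies on the outcome being linear in the score across the whole range. A common refinement is to keep only observations within a bandwidth of the cutoff. The sketch below assumes NumPy and simulated data with a deliberately curved trend. The function name and the bandwidth values are hypothetical, and the design matrix mirrors the formula "outcome ~ dist * treated".

```python
import numpy as np

def rd_jump_within_bandwidth(dist, treated, outcome, bandwidth):
    """OLS of outcome on [1, dist, treated, dist*treated], using |dist| <= bandwidth.

    Hypothetical helper; returns the coefficient on 'treated' (the jump at dist == 0).
    """
    keep = np.abs(dist) <= bandwidth
    d, t, y = dist[keep], treated[keep], outcome[keep]
    design = np.column_stack([np.ones_like(d), d, t, d * t])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs[2]

# Simulated (made-up) data with a cubic trend that a straight line cannot capture
rng = np.random.default_rng(2)
n = 20000
dist = rng.uniform(-1.0, 1.0, n)
treated = (dist > 0).astype(float)
true_jump = 1.0
outcome = 4.0 * dist**3 + true_jump * treated + rng.normal(0.0, 0.5, n)

# Compare the estimated jump across progressively narrower windows
jumps = {h: rd_jump_within_bandwidth(dist, treated, outcome, h) for h in (1.0, 0.5, 0.2)}
for h, jump in jumps.items():
    print(f"bandwidth {h}: estimated jump {jump:.2f} (true jump: {true_jump})")
```

Narrower windows reduce bias from a misspecified trend but use fewer observations, so the estimate gets noisier. Reporting results for several bandwidths is a simple robustness check.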
4. R Implementation
library(rdrobust)
# y: outcome
# x: running variable
# c: cutoff
rd_model <- rdrobust(y = df$outcome, x = df$score, c = 3.5)
summary(rd_model)
# Plotting the Discontinuity
rdplot(y = df$outcome, x = df$score, c = 3.5,
       title = "Regression Discontinuity Plot",
       y.label = "Outcome",
       x.label = "Score")
5. Related Concepts
- Instrumental Variables (IV) - Used for Fuzzy RDD.
- Causal Inference
- Local Linear Regression