Propensity Score Matching (PSM)
Definition
Core Statement
Propensity Score Matching (PSM) is a quasi-experimental method that creates comparable treatment and control groups from observational data by matching units with similar propensity scores (the probability of receiving treatment given observed covariates). It mimics a randomized experiment by balancing the observed confounders across the two groups.
Purpose
- Estimate the Average Treatment Effect on the Treated (ATT) from observational data.
- Reduce selection bias when treatment assignment is not random.
- Create balanced groups for causal inference.
When to Use
Use PSM When...
- Treatment assignment is not random (observational study).
- You have data on covariates that predict treatment selection.
- You want to estimate a causal effect without a natural experiment.
Limitations
- Cannot address unobserved confounders. If unmeasured variables affect both treatment assignment and the outcome, the PSM estimate remains biased.
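The simulation below is a minimal sketch of this failure mode, not a general proof. It assumes an invented data-generating process in which a variable `u`, treated as unmeasured, drives both treatment and outcome, with a true effect of 2.0. Matching on the observed covariate alone leaves the estimate biased; matching on `u` as well (impossible with real data) removes the bias. `psm_att` is a hypothetical helper, not a library function.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)          # observed confounder
u = rng.normal(size=n)          # confounder we pretend is unmeasured
p_treat = 1 / (1 + np.exp(-(x + u)))
treat = (rng.random(n) < p_treat).astype(int)
true_effect = 2.0
y = true_effect * treat + x + 2 * u + rng.normal(size=n)

def psm_att(features, treat, y):
    """Hypothetical helper: logistic propensity score + 1:1 nearest-neighbor
    matching with replacement; returns the matched difference in means (ATT)."""
    ps = LogisticRegression().fit(features, treat).predict_proba(features)[:, 1]
    t_idx, c_idx = np.flatnonzero(treat == 1), np.flatnonzero(treat == 0)
    nn = NearestNeighbors(n_neighbors=1).fit(ps[c_idx].reshape(-1, 1))
    _, j = nn.kneighbors(ps[t_idx].reshape(-1, 1))
    return y[t_idx].mean() - y[c_idx[j.ravel()]].mean()

att_x_only = psm_att(x.reshape(-1, 1), treat, y)          # what is feasible in practice
att_x_and_u = psm_att(np.column_stack([x, u]), treat, y)  # requires the unmeasured u
print(f"true: {true_effect}, x only: {att_x_only:.2f}, x and u: {att_x_and_u:.2f}")
```

Because treated units tend to have higher `u` at every value of `x`, matching on `x` alone compares groups that still differ in `u`, and the estimate overshoots the true effect.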
Theoretical Background
The Propensity Score
Key Insight: Instead of matching on many covariates (curse of dimensionality), match on a single summary: the propensity score.
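In symbols (notation assumed here: $T \in \{0,1\}$ is the treatment indicator, $X$ the covariate vector, and $Y(1), Y(0)$ the potential outcomes), the score, the balancing property that justifies matching on one number instead of all of $X$, and the ATT it targets can be written as follows. The last line holds under the conditional independence and overlap assumptions.

$$
\begin{aligned}
e(X) &= P(T = 1 \mid X) \\
X &\perp T \mid e(X) \qquad \text{(balancing property)} \\
(Y(0), Y(1)) \perp T \mid X \;&\Rightarrow\; (Y(0), Y(1)) \perp T \mid e(X) \\
\text{ATT} &= E[Y(1) - Y(0) \mid T = 1] = E[Y \mid T = 1] - E\big[\, E[Y \mid T = 0, e(X)] \,\big|\, T = 1 \big]
\end{aligned}
$$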
Matching Procedure
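- Estimate Scores: Fit a binary logistic regression with the treatment indicator as the outcome and the covariates as predictors; the fitted probabilities are the propensity scores.
- Match: Pair each treated unit with one or more control units that have similar scores (e.g., nearest neighbor, optionally within a caliper).
- Check Balance: Verify that covariates are balanced after matching. A common rule of thumb is an absolute standardized mean difference (SMD) below 0.1 for every covariate.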
- Estimate Effect: Compare outcomes between matched treated and control groups.
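The four steps can be traced end to end on simulated data. This is a sketch under assumed conditions: two observed confounders, a correctly specified logistic propensity model, a true effect of 1.5, and 1:1 nearest-neighbor matching with replacement (one choice among several). It prints SMDs before and after matching and contrasts the naive difference in means with the matched ATT; `smd` is a hypothetical helper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(42)
n = 4000
X = rng.normal(size=(n, 2))                      # two observed confounders
index = 0.8 * X[:, 0] - 0.5 * X[:, 1]            # true treatment log-odds
treat = (rng.random(n) < 1 / (1 + np.exp(-index))).astype(int)
y = 1.5 * treat + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)  # true effect 1.5

# Step 1: estimate propensity scores
ps = LogisticRegression().fit(X, treat).predict_proba(X)[:, 1]

# Step 2: 1:1 nearest-neighbor matching on the score, with replacement
t_idx, c_idx = np.flatnonzero(treat == 1), np.flatnonzero(treat == 0)
nn = NearestNeighbors(n_neighbors=1).fit(ps[c_idx].reshape(-1, 1))
_, j = nn.kneighbors(ps[t_idx].reshape(-1, 1))
m_idx = c_idx[j.ravel()]                         # matched control for each treated unit

# Step 3: balance check via standardized mean differences
def smd(a, b):
    return (a.mean() - b.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)

smd_before = [smd(X[t_idx, k], X[c_idx, k]) for k in range(2)]
smd_after = [smd(X[t_idx, k], X[m_idx, k]) for k in range(2)]

# Step 4: matched ATT vs naive difference in means
naive = y[t_idx].mean() - y[c_idx].mean()
att = y[t_idx].mean() - y[m_idx].mean()
print("SMD before:", np.round(smd_before, 3), "after:", np.round(smd_after, 3))
print(f"naive: {naive:.2f}, matched ATT: {att:.2f} (true effect 1.5)")
```

The naive comparison is inflated because treated units have higher values of the first covariate, which also raises the outcome; after matching, the SMDs shrink and the estimate moves toward 1.5.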
Assumptions
Critical Assumptions
- Conditional Independence Assumption (CIA): Given covariates $X$, treatment assignment is independent of the potential outcomes (no unobserved confounders).
- Common Support (Overlap): For every treated unit, there exists a control with a similar propensity score.
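One simple way to inspect overlap, assumed here rather than prescribed by this note, is the min/max rule: keep only treated units whose estimated score falls within the range of control scores. `common_support_mask` is a hypothetical helper and the scores are made-up values.

```python
import numpy as np

def common_support_mask(ps, treat):
    """Hypothetical helper: True for treated units whose score lies within
    the [min, max] range of the control units' scores."""
    ps, treat = np.asarray(ps, dtype=float), np.asarray(treat)
    lo, hi = ps[treat == 0].min(), ps[treat == 0].max()
    return (treat == 1) & (ps >= lo) & (ps <= hi)

# Made-up scores: four controls, then three treated units
ps = np.array([0.10, 0.25, 0.40, 0.65, 0.30, 0.60, 0.97])
treat = np.array([0, 0, 0, 0, 1, 1, 1])
on_support = common_support_mask(ps, treat)
print("treated units on support:", on_support[treat == 1])
```

Here the treated units at 0.30 and 0.60 are on support, while the one at 0.97 exceeds every control score (maximum 0.65) and would be dropped. Trimming changes the population the estimate describes, so it is worth reporting how many units were discarded.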
Assumptions Checklist
Limitations
Pitfalls
- Unobserved Confounders: If a key variable is missing, estimates are biased.
- Overlap Violations: If treated and control units have very different characteristics, some treated units have no control with a similar score; they must either be dropped or matched poorly.
- Sensitivity to Model: Propensity score model misspecification can bias results.
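Overlap problems are often handled with a caliper: a treated unit is matched only if its nearest control lies within a maximum score distance, and is otherwise discarded. The sketch below uses a hypothetical helper `caliper_match`, made-up scores, and an arbitrary caliper of 0.05 on the probability scale; it shows the mechanics only, and a caliper is also commonly set on the logit of the score instead.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def caliper_match(ps_treated, ps_control, caliper):
    """Hypothetical helper: 1:1 nearest-neighbor matching with replacement that
    discards treated units whose closest control is farther than `caliper`.
    Returns (indices of kept treated units, indices of their matched controls)."""
    nn = NearestNeighbors(n_neighbors=1).fit(np.asarray(ps_control).reshape(-1, 1))
    dist, idx = nn.kneighbors(np.asarray(ps_treated).reshape(-1, 1))
    keep = dist.ravel() <= caliper
    return np.flatnonzero(keep), idx.ravel()[keep]

ps_t = np.array([0.20, 0.50, 0.95])   # made-up treated scores
ps_c = np.array([0.18, 0.52, 0.60])   # made-up control scores
treated_kept, control_matched = caliper_match(ps_t, ps_c, caliper=0.05)
print(treated_kept, control_matched)
```

The treated unit at 0.95 is dropped because its nearest control (0.60) is 0.35 away; the other two are matched to controls 0.02 away.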
Python Implementation
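```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors
import pandas as pd
import numpy as np

# Assumes `df` is a pandas DataFrame with a binary 'Treated' column (1 = treated),
# an 'Outcome' column, and the covariates listed below.
covariates = ['Age', 'Income', 'Health']

# 1. Estimate propensity scores P(Treated = 1 | covariates)
logit = LogisticRegression(max_iter=1000)
logit.fit(df[covariates], df['Treated'])
df['ps'] = logit.predict_proba(df[covariates])[:, 1]

# 2. 1:1 nearest-neighbor matching on the score, with replacement
#    (the same control can be matched to several treated units)
treated = df[df['Treated'] == 1]
control = df[df['Treated'] == 0]
nn = NearestNeighbors(n_neighbors=1).fit(control[['ps']])
distances, indices = nn.kneighbors(treated[['ps']])
matched_control = control.iloc[indices.flatten()].reset_index(drop=True)
matched_treated = treated.reset_index(drop=True)

# 3. Check balance: standardized mean difference per covariate (aim for |SMD| < 0.1)
for col in covariates:
    pooled_sd = np.sqrt((matched_treated[col].var() + matched_control[col].var()) / 2)
    smd = (matched_treated[col].mean() - matched_control[col].mean()) / pooled_sd
    print(f"{col}: SMD = {smd:.3f}")

# 4. Estimate ATT as the difference in mean outcomes between matched groups
att = matched_treated['Outcome'].mean() - matched_control['Outcome'].mean()
print(f"ATT: {att:.3f}")
```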
R Implementation
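```r
library(MatchIt)

# 1. Matching: 1:1 nearest neighbor on a logistic-regression propensity score
m_out <- matchit(Treated ~ Age + Income + Health, data = df,
                 method = "nearest", distance = "glm")

# 2. Check balance (standardized mean differences before and after matching)
summary(m_out)

# 3. Get matched data (adds 'distance' and 'weights' columns, among others)
matched <- match.data(m_out)

# 4. Estimate effect: the coefficient on Treated estimates the ATT
model <- lm(Outcome ~ Treated + Age + Income + Health, data = matched,
            weights = weights)
summary(model)
```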
Interpretation Guide
| Output | Interpretation |
|---|---|
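| Absolute standardized mean difference < 0.1 for every covariate | Good balance after matching. |
| ATT = 5.2 | Among treated units, treatment is estimated to raise the outcome by 5.2 units on average, relative to their matched controls. |
| Poor overlap (few or no close matches) | Treated and control groups are too different; results are unreliable or apply only to the matched subset. |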
Related Concepts
- Binary Logistic Regression - Estimates propensity score.
- Instrumental Variables (IV) - Alternative for endogeneity.
- Difference-in-Differences (DiD)