Instrumental Variables (IV)


Definition

Core Statement

Instrumental Variables (IV) is a method for estimating causal effects when there is endogeneity (correlation between a predictor and the error term) due to omitted variables, measurement error, or simultaneity. It uses a third variable, the instrument, that is correlated with the endogenous predictor and affects the outcome only through that predictor.


Purpose

  1. Estimate causal relationships in observational data when experiments are impossible.
  2. Address omitted variable bias and reverse causality.
  3. Isolate the "clean" variation in the endogenous variable.

When to Use

Use IV When...

  • You suspect endogeneity (predictor is correlated with error).
  • You have a valid instrument.
  • OLS would give biased and inconsistent estimates.

Limitations

  • Finding a valid instrument is extremely difficult.
  • Weak instruments lead to biased IV estimates, which can be worse than the biased OLS estimates they are meant to replace.


Theoretical Background

The Problem

In the model Y = βX + u, if Cov(X, u) ≠ 0, OLS is biased.

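Example: Effect of Education on Wages. Unobserved ability plausibly raises both education and wages, so it sits in u and makes Cov(X, u) ≠ 0.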
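
The bias described above can be illustrated with a small simulation. This is a minimal sketch on made-up data, not an estimate from any real dataset: the variable names, the true effect of 0.10, and the strength of the "ability" confounder are all assumptions chosen for illustration.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def simulate_ols_slope(n=20000, beta=0.10, seed=0):
    """Simulate wages with an omitted confounder and return the OLS slope."""
    rng = random.Random(seed)
    # Hypothetical unobserved ability: drives both education and wages.
    ability = [rng.gauss(0, 1) for _ in range(n)]
    educ = [12 + 2 * a + rng.gauss(0, 1) for a in ability]
    # The error term u = 0.5 * ability + noise is correlated with educ.
    wage = [beta * e + 0.5 * a + rng.gauss(0, 1) for e, a in zip(educ, ability)]
    # Simple-regression OLS slope: Cov(X, Y) / Var(X).
    return cov(educ, wage) / cov(educ, educ)


true_beta = 0.10
ols_slope = simulate_ols_slope(beta=true_beta)
print(f"true effect: {true_beta:.2f}, OLS slope: {ols_slope:.2f}")
```

Under these assumed parameters the OLS slope converges to β + Cov(X, u) / Var(X) = 0.10 + 1/5, so collecting more data does not remove the gap.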

The Solution: Instrument (Z)

An instrument Z must satisfy:

  1. Relevance: Z is correlated with X. (Cov(Z, X) ≠ 0).
  2. Exclusion Restriction: Z affects Y only through X. (Cov(Z,u)=0).

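Example Instrument: Distance to nearest college. It plausibly shifts education by changing its cost (relevance) and is assumed not to affect wages through any other channel (exclusion).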
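
With a single instrument and no controls, the two conditions above imply that β equals Cov(Z, Y) / Cov(Z, X). The sketch below checks this on simulated data. The data-generating process (a standardized "distance" instrument, an unobserved "ability" confounder, a true effect of 0.10) is a hypothetical choice for illustration only.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def simulate(n=20000, beta=0.10, seed=1):
    """Return (z, x, y) where z is relevant for x and independent of the error."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]        # instrument (assumed valid)
    ability = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
    # Relevance: x depends on z. Endogeneity: x also depends on ability.
    x = [12 - 1.0 * zi + 2 * ai + rng.gauss(0, 1) for zi, ai in zip(z, ability)]
    # Exclusion: z enters y only through x.
    y = [beta * xi + 0.5 * ai + rng.gauss(0, 1) for xi, ai in zip(x, ability)]
    return z, x, y


z, x, y = simulate()
iv_estimate = cov(z, y) / cov(z, x)   # uses only variation in x induced by z
ols_estimate = cov(x, y) / cov(x, x)  # contaminated by ability
print(f"IV: {iv_estimate:.3f}, OLS: {ols_estimate:.3f}")
```

The denominator Cov(Z, X) is why relevance matters: as it approaches zero, the ratio becomes unstable.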

Two-Stage Least Squares (2SLS)

  1. Stage 1: Regress X on Z (and controls). Obtain the fitted values X̂.
  2. Stage 2: Regress Y on X̂ (and the same controls). The coefficient on X̂ is the IV estimate.
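
The two stages can be written out by hand to see what the packaged estimators do internally. The following is a minimal sketch using only the standard library. The helper names (`ols`, `two_stage_least_squares`) and the simulated data are hypothetical, and the small normal-equations solver is for illustration rather than numerical robustness.

```python
import random


def ols(columns, y):
    """Least-squares coefficients of y on the given columns (normal equations)."""
    k = len(columns)
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
        + [sum(p * q for p, q in zip(columns[i], y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def two_stage_least_squares(y, x, z, w):
    """2SLS coefficient on x, with instrument z and exogenous control w."""
    ones = [1.0] * len(y)
    # Stage 1: regress x on the instrument and the control, keep fitted values.
    g0, g1, g2 = ols([ones, z, w], x)
    x_hat = [g0 + g1 * zi + g2 * wi for zi, wi in zip(z, w)]
    # Stage 2: regress y on the fitted values and the same control.
    return ols([ones, x_hat, w], y)[1]


# Hypothetical simulated data with a true effect of 0.10.
rng = random.Random(2)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]        # instrument
w = [rng.gauss(0, 1) for _ in range(n)]        # exogenous control
ability = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
x = [12 - zi + 0.3 * wi + 2 * ai + rng.gauss(0, 1)
     for zi, wi, ai in zip(z, w, ability)]
y = [0.10 * xi + 0.05 * wi + 0.5 * ai + rng.gauss(0, 1)
     for xi, wi, ai in zip(x, w, ability)]

beta_2sls = two_stage_least_squares(y, x, z, w)
beta_ols = ols([[1.0] * n, x, w], y)[1]
print(f"2SLS: {beta_2sls:.3f}, OLS: {beta_ols:.3f}")
```

The manual second stage gives the right coefficient but the wrong standard errors, because its residuals are computed with X̂ instead of X. For inference, use a dedicated IV routine.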

Assumptions


Limitations

Pitfalls

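  1. Weak Instruments: If the first-stage F-statistic is below about 10 (a rule of thumb), the IV estimate is biased toward OLS and its confidence intervals are unreliable. It can be worse than biased OLS.
  2. Exclusion Restriction is Untestable: With a single instrument it cannot be checked from the data. You must justify it theoretically.
  3. Local Average Treatment Effect (LATE): When effects differ across individuals, IV estimates the effect for compliers (those whose X is affected by Z), not the entire population.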
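
The weak-instrument check in pitfall 1 can be sketched for the simplest case. This assumes one instrument and no controls, where the first-stage F-statistic equals r²(n − 2) / (1 − r²), with r the sample correlation between X and Z. The function name and the simulated first-stage strengths are hypothetical.

```python
import random


def first_stage_f(x, z):
    """First-stage F-statistic for a single instrument and no controls."""
    n = len(x)
    mean_x, mean_z = sum(x) / n, sum(z) / n
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    szz = sum((zi - mean_z) ** 2 for zi in z)
    r2 = sxz ** 2 / (sxx * szz)  # R-squared of the regression of x on z
    return r2 * (n - 2) / (1 - r2)


rng = random.Random(3)
n = 500
z = [rng.gauss(0, 1) for _ in range(n)]
# Strong first stage: z moves x a lot. Weak first stage: z barely moves x.
x_strong = [1.0 * zi + rng.gauss(0, 1) for zi in z]
x_weak = [0.02 * zi + rng.gauss(0, 1) for zi in z]

f_strong = first_stage_f(x_strong, z)
f_weak = first_stage_f(x_weak, z)
print(f"F (strong): {f_strong:.1f}, F (weak): {f_weak:.1f}")
```

With controls in the model, the relevant quantity is the partial F-statistic on the excluded instruments, which is what IV software typically reports.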


Python Implementation

from linearmodels.iv import IV2SLS
import pandas as pd

# df is assumed to be a pandas DataFrame with columns Wage, Education, Distance, Experience.
# Data: Y = Wage, X = Education (Endog), Z = Distance (Instrument), W = Experience (Control)
# Formula: dependent ~ exogenous + [endogenous ~ instruments]
model = IV2SLS.from_formula('Wage ~ 1 + Experience + [Education ~ Distance]', data=df)
results = model.fit(cov_type='robust')  # heteroskedasticity-robust standard errors
print(results.summary)

# First-Stage Diagnostics
print(results.first_stage.diagnostics)
# Rule of thumb: the partial F-statistic should be above about 10

R Implementation

library(AER)

# Formula: Y ~ Exog + Endog | Exog + Instruments
model <- ivreg(Wage ~ Experience + Education | Experience + Distance, data = df)

summary(model, diagnostics = TRUE)

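# Diagnostics:
# - Weak instruments: first-stage F-stat should be above about 10 (rule of thumb).
# - Wu-Hausman: Tests whether OLS and IV estimates differ. If significant, OLS is likely inconsistent and IV is preferred.
# - Sargan: Overidentification test. Reported as NA here because there is exactly one instrument for one endogenous variable.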

Interpretation Guide

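Output                        Interpretation
IV Coef (Education) = 0.08    Each additional year of education raises wages by about 8% if the outcome is log wage (by 0.08 wage units if it is in levels), for those whose education responds to the instrument.
First-Stage F = 45            Strong instrument by the F > 10 rule of thumb. Weak-instrument bias is unlikely to be a concern.
First-Stage F = 3             Weak instrument. The IV estimate is biased toward OLS. Do not trust.
Wu-Hausman p < 0.05           OLS and IV estimates differ significantly, which is evidence of endogeneity. IV is preferred, provided the instrument is valid.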