Auto-Correlation (ACF & PACF)

Definition

Core Statement

ACF (Auto-Correlation Function) measures the correlation between a time series and its lagged values (e.g., Y_t vs Y_{t-1}).
PACF (Partial Auto-Correlation Function) measures the correlation between Y_t and Y_{t-k} after removing the effects of the intermediate lags (Y_{t-1}, ..., Y_{t-k+1}).

These are the primary tools for identifying the order (p, q) of ARIMA Models.
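
To make the two definitions concrete, here is a minimal pure-Python sketch that computes a sample ACF directly and derives the PACF from it with the Durbin-Levinson recursion. The helper names (sample_acf, pacf_from_acf) and the simulated AR(1) series with coefficient 0.8 are assumptions for illustration; library routines may use slightly different estimators.

```python
import random


def sample_acf(x, nlags):
    """Sample autocorrelations r_0..r_nlags of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(nlags + 1)
    ]


def pacf_from_acf(r):
    """Partial autocorrelations via the Durbin-Levinson recursion.

    r is a list of autocorrelations with r[0] == 1.
    """
    pacf = [1.0]
    phi_prev = []  # AR coefficients of the order-(k-1) fit
    for k in range(1, len(r)):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den  # partial autocorrelation at lag k
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_new.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi_new
    return pacf


# Hypothetical test series: AR(1) with coefficient 0.8 (an assumption).
random.seed(0)
series = [0.0]
for _ in range(5000):
    series.append(0.8 * series[-1] + random.gauss(0.0, 1.0))

acf_vals = sample_acf(series, 5)
pacf_vals = pacf_from_acf(acf_vals)
print("ACF :", [round(v, 2) for v in acf_vals])
print("PACF:", [round(v, 2) for v in pacf_vals])
```

On a series like this, the ACF should decay gradually while the PACF should be close to zero after lag 1.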


Purpose

  1. Identify Seasonality: Spikes at regular intervals (e.g., every 12 lags for monthly data).
  2. Determine Model Order: Use the shape of ACF/PACF plots to choose p (AR) and q (MA).
  3. Residual Checking: Are the errors "White Noise"? (Ideally, the ACF of the residuals should be statistically indistinguishable from zero at every lag > 0.)
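
The residual check in item 3 can be sketched as a count of lags whose sample autocorrelation falls outside the approximate 95% band of ±1.96/sqrt(n). This is a rough illustration, not a formal test; the helper names and both simulated series are assumptions.

```python
import math
import random


def sample_acf(x, nlags):
    """Sample autocorrelations r_0..r_nlags of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(nlags + 1)
    ]


def lags_outside_band(resid, nlags=20, z=1.96):
    """Return (lags whose |r_k| exceeds z/sqrt(n), the band half-width)."""
    bound = z / math.sqrt(len(resid))
    r = sample_acf(resid, nlags)
    flagged = [k for k in range(1, nlags + 1) if abs(r[k]) > bound]
    return flagged, bound


random.seed(1)

# Hypothetical "good" residuals: independent Gaussian noise.
white = [random.gauss(0.0, 1.0) for _ in range(2000)]

# Hypothetical "bad" residuals: leftover AR(1) structure.
autocorrelated = [0.0]
for _ in range(1999):
    autocorrelated.append(0.8 * autocorrelated[-1] + random.gauss(0.0, 1.0))

white_flagged, bound = lags_outside_band(white)
ar_flagged, _ = lags_outside_band(autocorrelated)
print("band: +/-", round(bound, 4))
print("white noise, flagged lags:", white_flagged)
print("autocorrelated, flagged lags:", ar_flagged)
```

With 20 lags and a 95% band, about one lag is expected to fall outside by chance even for true white noise, so a single isolated spike is not alarming on its own.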

The Rules of Thumb (Box-Jenkins)

Plot | AR Process (p)                          | MA Process (q)       | ARMA (p,q)
ACF  | Decays gradually (Geometric/Sinusoidal) | Cuts off after lag q | Decays gradually
PACF | Cuts off after lag p                    | Decays gradually     | Decays gradually

Mnemonic

  • AR(p): Look at PACF. The last significant spike is at lag p; later lags are approximately zero.
  • MA(q): Look at ACF. The last significant spike is at lag q; later lags are approximately zero.
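
The "cuts off after lag k" rule can be turned into a crude heuristic: find the last lag before the first run of several consecutive values inside the significance band. The sketch below is an assumption-laden illustration (the helper cutoff_lag, the patience of 3 lags, and the band of 0.1 are all hypothetical), applied to theoretical AR(1) and MA(1) correlations rather than real data.

```python
def cutoff_lag(values, bound, patience=3):
    """Last lag before the first run of `patience` lags inside +/-bound.

    values[0] is lag 0. Returns None if no such run exists.
    """
    nlags = len(values) - 1
    for k in range(0, nlags - patience + 1):
        window = values[k + 1 : k + 1 + patience]
        if all(abs(v) <= bound for v in window):
            return k
    return None


bound = 0.1  # hypothetical significance band half-width

# Theoretical AR(1) with coefficient 0.8: ACF is 0.8**k, PACF is zero after lag 1.
ar1_acf = [0.8 ** k for k in range(21)]
ar1_pacf = [1.0, 0.8] + [0.0] * 19

# Theoretical MA(1) with coefficient 0.5: ACF is 0.5 / (1 + 0.5**2) = 0.4 at lag 1, then zero.
ma1_acf = [1.0, 0.4] + [0.0] * 19

print(cutoff_lag(ar1_pacf, bound))  # 1  -> PACF cuts off: suggests p = 1
print(cutoff_lag(ar1_acf, bound))   # 10 -> ACF decays slowly: no sharp cut-off
print(cutoff_lag(ma1_acf, bound))   # 1  -> ACF cuts off: suggests q = 1
```

A small cut-off in one plot combined with a long tail in the other is the pattern the table describes. On real data, chance spikes make this heuristic unreliable, so it should only supplement visual inspection.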


Worked Example: Identifying a Model

Problem

You plot ACF and PACF for a stationary series.

Observation 1 (PACF):

  • Huge spike at Lag 1 (r=0.8).
  • Spike at Lag 2 is small/insignificant.
  • Conclusion: This suggests AR(1).

Observation 2 (ACF):

  • ACF starts high (0.8) and slowly decays (0.64, 0.51...).
  • Each value is about 0.8 times the previous one. This geometric decay is consistent with an AR process.

Model Proposal: ARIMA(1, 0, 0).
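
As a quick sanity check on this proposal, a stationary AR(1) process with coefficient phi has theoretical autocorrelation phi^k at lag k. The sketch below takes the lag-1 value as the estimate of phi (the Yule-Walker estimate for AR(1)) and compares the implied ACF with the observed values from the example; the variable names are illustrative.

```python
# Observed ACF at lags 1..3, taken from the worked example.
observed_acf = [0.8, 0.64, 0.51]

# For AR(1), the lag-1 autocorrelation equals the AR coefficient.
phi_hat = observed_acf[0]

# Theoretical AR(1) autocorrelations: rho_k = phi ** k.
implied_acf = [phi_hat ** k for k in range(1, 4)]

max_gap = max(abs(o - m) for o, m in zip(observed_acf, implied_acf))

print([round(v, 3) for v in implied_acf])  # [0.8, 0.64, 0.512]
print(round(max_gap, 3))                   # 0.002
```

The observed values match the AR(1) pattern to within rounding, which supports ARIMA(1, 0, 0) as a first candidate.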


Assumptions


Python Implementation

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Load Data: `data` must be a 1-D numeric series without missing values
# data = pd.read_csv(...)['Sales']

# Plot
fig, ax = plt.subplots(1, 2, figsize=(12, 4))
plot_acf(data, lags=20, ax=ax[0])
plot_pacf(data, lags=20, ax=ax[1])
plt.show()

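# Interpretation:
# The blue shaded area is the 95% confidence band around zero (alpha=0.05 by default).
# Spikes outside the band are statistically significant at the 5% level.
# Lag 0 is always 1; with 20 lags, about one spike may fall outside the band by chance.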

Common Pitfall

The "Intermediate" Trap

Why do we need PACF?

  • If Y_{t-1} causes Y_t, and Y_{t-2} causes Y_{t-1}...
  • Then Y_{t-2} will correlate with Y_t purely because of the chain reaction.
  • ACF shows this "echo" (Lag 2 is correlated).
  • PACF removes the middleman (Y_{t-1}) and shows the pure correlation of Lag 2. (Result: approximately zero for an AR(1) process.)
  • Mistake: Using ACF to set AR order usually leads to picking a p that is way too high.
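
The "middleman" idea can be demonstrated directly: regress both the current value and the lag-2 value on the lag-1 value, then correlate what is left over. The sketch below does this on a simulated AR(1) series; the coefficient 0.8, the series length, and the helper names are assumptions for illustration.

```python
import random


def residuals_after_regressing(y, x):
    """Residuals of a least-squares fit of y on x (with intercept)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return [(b - my) - slope * (a - mx) for a, b in zip(x, y)]


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5


# Hypothetical AR(1) series: each value depends only on the previous one.
random.seed(42)
y = [0.0]
for _ in range(5000):
    y.append(0.8 * y[-1] + random.gauss(0.0, 1.0))

y_t = y[2:]       # current values
y_lag1 = y[1:-1]  # the "middleman"
y_lag2 = y[:-2]   # values two steps back

# Raw lag-2 correlation: includes the echo passed through lag 1.
raw_lag2 = corr(y_t, y_lag2)

# Partial lag-2 correlation: lag-1 effect removed from both sides first.
partial_lag2 = corr(
    residuals_after_regressing(y_t, y_lag1),
    residuals_after_regressing(y_lag2, y_lag1),
)

print("raw lag-2 correlation:    ", round(raw_lag2, 3))
print("partial lag-2 correlation:", round(partial_lag2, 3))
```

The raw correlation should land near 0.8^2 = 0.64, while the partial correlation should be near zero, which is why the PACF rather than the ACF is read for the AR order.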