ARIMA Models
Definition
Core Statement
ARIMA (AutoRegressive Integrated Moving Average) is a class of models for forecasting time series that are stationary or can be made stationary by differencing. It combines three components:
- AR(p): Autoregressive (past values).
- I(d): Integrated (differencing for stationarity).
- MA(q): Moving Average (past errors).
Purpose
- Forecast future values of a time series.
- Understand the temporal structure of data.
- Serve as a benchmark model for univariate forecasting.
When to Use
Use ARIMA When...
- Data is a univariate time series.
- Series is stationary (or can be made so via differencing).
- Goal is short-term forecasting.
Alternatives
- Multiple predictors: Use Vector Autoregression (VAR) or regression.
- Volatility clustering: Use GARCH Models.
- Seasonality: Use SARIMA (Seasonal ARIMA).
Theoretical Background
The Components
| Component | Notation | Meaning |
|---|---|---|
| AR(p) | $p$ | Regress on past values. |
| I(d) | $d$ | Difference $d$ times to achieve stationarity. |
| MA(q) | $q$ | Regress on past errors. |
Model Equation (ARIMA(p,d,q))
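In backshift notation, with $B y_t = y_{t-1}$, one common way to write the ARIMA(p,d,q) model is shown below. The symbols are assumed here for illustration: $\phi_i$ are the AR coefficients, $\theta_j$ the MA coefficients, $c$ a constant, and $\varepsilon_t$ white noise. Sign conventions for the MA terms vary between texts and software.

```latex
\left(1 - \sum_{i=1}^{p} \phi_i B^i\right)(1 - B)^d \, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j B^j\right)\varepsilon_t
```

Setting $p = 0$, $d = 1$, $q = 1$, and $c = 0$ gives $y_t = y_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1}$, the ARIMA(0,1,1) case.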
Identification (Box-Jenkins Method)
- Plot ACF/PACF: Use patterns to guess $p$ and $q$.
- Test Stationarity: See Stationarity (ADF & KPSS). Apply differencing if needed.
- Fit Model: Estimate parameters.
- Diagnose Residuals: Should be white noise (no autocorrelation).
- Forecast.
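The identification step can be illustrated on synthetic data. The sketch below is a minimal, standard-library-only illustration, not a replacement for a proper ACF plot: it simulates an ARIMA(0,1,1) series with an assumed MA coefficient of 0.6, differences it once, and computes the sample ACF by hand. The helper names `difference` and `sample_acf` and all parameter values are hypothetical.

```python
import random


def difference(series, d=1):
    """Apply first differencing d times: y'_t = y_t - y_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def sample_acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag of a sequence."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(v * v for v in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


# Simulate ARIMA(0,1,1): y_t = y_{t-1} + e_t + theta * e_{t-1}
rng = random.Random(0)
theta, n = 0.6, 5000  # assumed values for illustration
shocks = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
y = [0.0]
for t in range(1, n + 1):
    y.append(y[-1] + shocks[t] + theta * shocks[t - 1])

acf_levels = sample_acf(y, 3)
acf_diff = sample_acf(difference(y, d=1), 3)
print("ACF of levels:     ", [round(r, 2) for r in acf_levels])
print("ACF of differences:", [round(r, 2) for r in acf_diff])
```

For an MA(1) process the theoretical lag-1 autocorrelation is $\theta / (1 + \theta^2)$, about 0.44 for $\theta = 0.6$, and zero beyond lag 1. The differenced series should therefore show a sharp cutoff after lag 1, while the ACF of the undifferenced levels stays close to 1, which is the usual sign that differencing is needed.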
Assumptions
Limitations
Pitfalls
- Requires Stationarity: Non-stationary series must be differenced.
- Univariate: Does not incorporate external predictors. Use ARIMAX or VAR.
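- Short-Term Forecasts Only: Long-horizon forecasts converge to the series mean (or, after differencing, to a constant level or straight-line trend), and forecast intervals widen with the horizon.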
- Manual Order Selection: Box-Jenkins requires expertise. Use auto.arima() in R or pm.auto_arima() in Python.
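The short-term limitation follows from the forecast recursion. As a minimal sketch for the simplest stationary case, AR(1) with no differencing, the h-step forecast is the mean plus $\phi^h$ times the last deviation from the mean. The helper name `ar1_forecast` and the values used (mean 100, coefficient 0.8, last observation 130) are assumptions for illustration, not fitted output.

```python
def ar1_forecast(last_value, mean, phi, steps):
    """h-step-ahead forecasts for a stationary AR(1): mean + phi**h * (y_T - mean)."""
    forecasts = []
    current = last_value
    for _ in range(steps):
        # Each step shrinks the deviation from the mean by a factor phi.
        current = mean + phi * (current - mean)
        forecasts.append(current)
    return forecasts


# Assumed illustrative values, not estimates from real data.
path = ar1_forecast(last_value=130.0, mean=100.0, phi=0.8, steps=30)
for h in (1, 5, 10, 30):
    print(f"h={h:2d}  forecast={path[h - 1]:.2f}")
```

With these values the initial deviation of 30 shrinks to roughly 3 by step 10, so beyond a few steps the forecast carries little information other than the mean.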
Python Implementation
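```python
import matplotlib.pyplot as plt
import pmdarima as pm
from statsmodels.tsa.arima.model import ARIMA

# `series` is assumed to be a univariate pandas Series loaded beforehand.

# Auto-ARIMA (Automatic Order Selection)
auto_model = pm.auto_arima(series, seasonal=False, stepwise=True, trace=True)
print(auto_model.summary())

# Manual ARIMA
model = ARIMA(series, order=(2, 1, 2)).fit()
print(model.summary())

# Forecast the next 10 steps
forecast = model.get_forecast(steps=10)
print(forecast.predicted_mean)

# Plot the point forecast with its 95% interval
frame = forecast.summary_frame(alpha=0.05)
ax = frame["mean"].plot(label="Forecast")
ax.fill_between(frame.index, frame["mean_ci_lower"], frame["mean_ci_upper"], alpha=0.3)
ax.legend()
plt.show()
```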
R Implementation
```r
library(forecast)

# `ts_data` is assumed to be an existing ts object.

# Auto-ARIMA (Best practice)
fit <- auto.arima(ts_data)
summary(fit)

# Forecast
fc <- forecast(fit, h = 12)
plot(fc)

# Check Residuals (Should show no autocorrelation)
checkresiduals(fit)
```
Worked Numerical Example
Forecasting Weekly Sales
Data: 100 weeks of sales data.
Process:
- Visual Check: Trend is upward (Non-stationary).
- Differencing (d=1): Values become stationary ($y'_t = y_t - y_{t-1}$).
- ACF Plot: Sharp cutoff after Lag 1. Suggests MA(1).
- PACF Plot: Exponential decay. Suggests MA(1).
Model: ARIMA(0, 1, 1).
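- Equation: $y_t = y_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1}$.
- This is equivalent to "Simple Exponential Smoothing" with smoothing parameter $\alpha = 1 + \theta_1$.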
Forecast: Next week's sales are a weighted average of recent sales, with more weight on the most recent week.
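The link between ARIMA(0,1,1) and Simple Exponential Smoothing can be checked numerically. The sketch below is a minimal illustration under stated assumptions: the sales figures are hypothetical, the MA coefficient of -0.4 is assumed and not fitted, and the function names are invented for this example. It computes one-step forecasts both ways and shows that they coincide when $\alpha = 1 + \theta$.

```python
def arima011_one_step(series, theta, initial_forecast):
    """One-step forecasts from ARIMA(0,1,1): f_{t+1} = y_t + theta * (y_t - f_t)."""
    forecasts = [initial_forecast]
    for y in series:
        error = y - forecasts[-1]
        forecasts.append(y + theta * error)
    return forecasts


def ses_one_step(series, alpha, initial_forecast):
    """Simple exponential smoothing: f_{t+1} = alpha * y_t + (1 - alpha) * f_t."""
    forecasts = [initial_forecast]
    for y in series:
        forecasts.append(alpha * y + (1 - alpha) * forecasts[-1])
    return forecasts


# Hypothetical weekly sales and an assumed MA coefficient (not fitted values).
sales = [100.0, 104.0, 103.0, 108.0, 112.0, 111.0, 115.0]
theta = -0.4
alpha = 1 + theta  # smoothing parameter implied by theta

arima_fc = arima011_one_step(sales, theta, initial_forecast=sales[0])
ses_fc = ses_one_step(sales, alpha, initial_forecast=sales[0])
print("Next-week forecast (ARIMA form):", round(arima_fc[-1], 2))
print("Next-week forecast (SES form):  ", round(ses_fc[-1], 2))
```

Expanding the SES recursion shows that the weights on past observations decay geometrically as $\alpha (1 - \alpha)^k$, which is why the most recent week counts most.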
Interpretation Guide
| Output | Interpretation | Edge Case Notes |
|---|---|---|
| ARIMA(1,1,1) | 1 AR term, 1 diff, 1 MA term. | Standard robust baseline. |
| ARIMA(0,1,0) | "Random Walk". Forecast = Last Value. | Common for stock prices. Hard to beat. |
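| AIC = 300 vs 350 | The model with AIC 300 is preferred. | Rule of thumb: a difference > 2 is meaningful. Compare AIC only across models with the same d. |
| p-value > 0.05 (Ljung-Box) | No evidence of residual autocorrelation (Good). | Consistent with white-noise residuals; does not prove all signal is captured. |
| AR coef close to 1 | Possible unit root. | Series might need more differencing. |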
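To make the Ljung-Box row concrete, the statistic can be computed by hand. The sketch below is a standard-library illustration and not a substitute for a library routine: `ljung_box_q` is a hypothetical helper, the data are simulated, and the test uses the 5% critical value of a chi-square distribution with 10 degrees of freedom (18.307) in place of a p-value.

```python
import random


def ljung_box_q(residuals, lags):
    """Ljung-Box Q = n(n+2) * sum_{k=1..lags} r_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [x - mean for x in residuals]
    denom = sum(v * v for v in dev)
    q = 0.0
    for k in range(1, lags + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q


CHI2_CRIT_10DF_5PCT = 18.307  # 95th percentile of chi-square with 10 d.o.f.

rng = random.Random(42)
white = [rng.gauss(0.0, 1.0) for _ in range(500)]

# Residuals that still contain AR(1) structure (coefficient 0.7, assumed).
leftover = [white[0]]
for e in white[1:]:
    leftover.append(0.7 * leftover[-1] + e)

for name, resid in (("white noise", white), ("autocorrelated", leftover)):
    q = ljung_box_q(resid, lags=10)
    verdict = "reject H0" if q > CHI2_CRIT_10DF_5PCT else "fail to reject H0"
    print(f"{name:15s} Q={q:7.2f} -> {verdict}")
```

When the test is applied to residuals of a fitted ARIMA(p,d,q) model, the degrees of freedom are usually reduced to lags minus p minus q, so the critical value would differ from the one assumed here.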
Common Pitfall Example
The Stock Price Fallacy
Scenario: Trying to forecast Google Stock Price.
Mistake: Fit ARIMA(2,1,2). Get great "in-sample" fit.
Reality Check:
- Plotting the forecast shows it just lags the real price by 1 day.
- The best predictor of tomorrow's price is often today's price (Random Walk).
- ARIMA cannot predict "shocks" or news.
Lesson: ARIMA works best for inertial systems (sales, temperature, inventory), not efficient markets (stocks).
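The lagging effect can be reproduced on a simulated random walk. The sketch below uses synthetic data, not real stock prices, and the helper name `correlation` is hypothetical. It shows that the ARIMA(0,1,0) forecast (yesterday's value) correlates very strongly with the actual level, yet its error equals each day's shock in full.

```python
import random


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Simulated random-walk "price" (synthetic, not real stock data).
rng = random.Random(7)
shocks = [rng.gauss(0.0, 1.0) for _ in range(1000)]
price = [100.0]
for s in shocks:
    price.append(price[-1] + s)

actual = price[1:]
naive_forecast = price[:-1]  # ARIMA(0,1,0): forecast = last value
errors = [a - f for a, f in zip(actual, naive_forecast)]

# Levels look well predicted, but only because the forecast is yesterday's price.
print("corr(forecast, actual) in levels:", round(correlation(naive_forecast, actual), 3))

# The forecast of the *change* is always zero, so every shock is missed in full.
print("max |error - shock|:", max(abs(e - s) for e, s in zip(errors, shocks)))
```

A high in-sample fit in levels is therefore weak evidence of forecasting skill. Comparing against the last-value forecast on held-out data is a more informative check.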
Related Concepts
- Stationarity (ADF & KPSS) - Prerequisite.
- Vector Autoregression (VAR) - Multivariate extension.
- GARCH Models - For volatility.
- Granger Causality - Testing predictive relationships.