Bias-Variance Trade-off

Definition

Core Statement

The Bias-Variance Trade-off describes the fundamental tension in predictive modeling: simple models have high bias (systematic error), while complex models have high variance (sensitivity to noise). The goal is to find the sweet spot that minimizes total error.


Purpose

  1. Understand why models fail (underfitting vs overfitting).
  2. Guide model selection and regularization strategies.
  3. Explain why more complexity is not always better.

When to Use

This is a conceptual framework for interpreting model performance rather than a formal test. Apply it whenever you need to diagnose underfitting or overfitting, choose a model's complexity, or decide how strongly to regularize.


Theoretical Background

Decomposition of Prediction Error

For a model $\hat{f}(x)$ estimating the true function $f(x)$, the expected test error at a point $x$ decomposes as:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big] =
\underbrace{\big(\operatorname{Bias}[\hat{f}(x)]\big)^2}_{\text{systematic error}}
+ \underbrace{\operatorname{Var}[\hat{f}(x)]}_{\text{sensitivity to data}}
+ \underbrace{\sigma^2}_{\text{irreducible error}}
$$
Component         | Meaning                                      | Cause
Bias              | Error from wrong assumptions.                | Model is too simple (underfitting); misses important patterns.
Variance          | Error from sensitivity to the training data. | Model is too complex (overfitting); fits noise.
Irreducible Error | Noise in the data itself.                    | Cannot be reduced by any model.
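
The decomposition follows from expanding the squared error. A sketch of the derivation, assuming $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$ independent of the fitted model (the expectation is taken over training sets and noise):

$$
\begin{aligned}
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  &= \mathbb{E}\big[(f(x) + \varepsilon - \hat{f}(x))^2\big] \\
  &= \mathbb{E}\big[(f(x) - \hat{f}(x))^2\big] + \sigma^2
     \qquad (\mathbb{E}[\varepsilon] = 0 \text{ removes the cross term}) \\
  &= \big(f(x) - \mathbb{E}[\hat{f}(x)]\big)^2
     + \mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big] + \sigma^2 \\
  &= \operatorname{Bias}[\hat{f}(x)]^2 + \operatorname{Var}[\hat{f}(x)] + \sigma^2
\end{aligned}
$$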

The Trade-off

Model Complexity | Bias                | Variance            | Total Error
Too Simple       | High (underfitting) | Low                 | High
Optimal          | Moderate            | Moderate            | Minimum
Too Complex      | Low                 | High (overfitting)  | High

Goldilocks Zone

The best model is neither too simple nor too complex. It balances bias and variance to minimize total error.


Visual Intuition

Test Error
    │
    │ ╲                               ╱
    │  ╲   Bias² dominates           ╱   Variance dominates
    │   ╲  (underfitting)           ╱    (overfitting)
    │    ╲                         ╱
    │     ╲_____             _____╱
    │           ╲___________╱   ← minimum total error
    │
    └──────────────────────────── Model Complexity
     Simple        Optimal       Complex

Assumptions

This is a mathematical decomposition, not a test with assumptions. Note, however, that the clean additive form above is specific to squared-error loss; other losses (such as 0-1 classification error) do not decompose as simply.


Limitations

Pitfalls

  1. Bias and variance cannot be measured separately on real data; only their sum, the total error, is observable. On simulated data where the true function is known, they can be estimated by refitting on many training sets (see the sketch after this list).
  2. Trade-off is not always smooth. Discontinuities can occur (e.g., adding a critical variable).
  3. Depends on data size: With infinite data, variance goes to zero and only bias matters.
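
A minimal sketch of such a simulation-based estimate in Python, assuming a sine-wave truth like the example in the Python Implementation section below; the grid, noise level, and number of repetitions are arbitrary choices for illustration:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(42)
x_grid = np.linspace(0, 10, 50).reshape(-1, 1)  # fixed evaluation points
f_true = np.sin(x_grid).ravel()                 # known true function
n_sims, n_train, sigma = 200, 50, 0.2

for degree in (1, 3, 15):
    preds = np.empty((n_sims, len(x_grid)))
    for s in range(n_sims):
        # draw a fresh training set from the same data-generating process
        x_tr = rng.uniform(0, 10, n_train).reshape(-1, 1)
        y_tr = np.sin(x_tr).ravel() + rng.normal(0, sigma, n_train)
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        preds[s] = model.fit(x_tr, y_tr).predict(x_grid)

    bias_sq = np.mean((preds.mean(axis=0) - f_true) ** 2)  # average squared bias
    variance = np.mean(preds.var(axis=0))                   # average variance over training sets
    # degree 15 is ill-conditioned without feature scaling, so its variance can be very large
    print(f"degree {degree:2d}: bias^2 ~ {bias_sq:.3f}, variance ~ {variance:.3f}")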


Addressing the Trade-off

Problem                     | Solution
High Bias (Underfitting)    | Add features, increase model complexity, reduce regularization.
High Variance (Overfitting) | Use Cross-Validation, add Ridge Regression/Lasso Regression, reduce features, collect more data.
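
For instance, the high-variance row can be illustrated with Ridge Regression: keep the overly complex features and let the penalty strength control the variance. A minimal sketch, assuming sine-wave data like the Python example below; the alpha values are arbitrary:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score, KFold

rng = np.random.default_rng(0)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.2, 100)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
for alpha in (0.001, 0.1, 10.0):
    # same degree-15 features each time; only the penalty changes
    model = make_pipeline(PolynomialFeatures(15), StandardScaler(), Ridge(alpha=alpha))
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    print(f"alpha = {alpha:6.3f}: CV MSE = {-scores.mean():.3f}")

# A moderate penalty usually lowers the CV error of the degree-15 fit:
# it adds a little bias but removes much of the variance.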

Python Implementation

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Generate Data
np.random.seed(42)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y_true = np.sin(X).ravel()
y = y_true + np.random.normal(0, 0.2, 100)

# Fit Polynomials of Different Degrees
degrees = [1, 3, 15]
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

for i, degree in enumerate(degrees):
    poly = PolynomialFeatures(degree=degree)
    X_poly = poly.fit_transform(X)
    model = LinearRegression().fit(X_poly, y)
    
    y_pred = model.predict(X_poly)
    mse = mean_squared_error(y, y_pred)  # in-sample (training) MSE

    axes[i].scatter(X, y, alpha=0.5, label='Data')
    axes[i].plot(X, y_pred, 'r-', label=f'Degree {degree}')
    axes[i].set_title(f"Degree {degree}\nTrain MSE = {mse:.3f}")
    axes[i].legend()

plt.tight_layout()
plt.show()

# Degree 1: High Bias (Underfitting)
# Degree 3: Good Balance
# Degree 15: High Variance (Overfitting); lowest training MSE, yet the curve chases the noise
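
The MSE printed above is in-sample, so it keeps falling as the degree grows; the damage from high variance only shows up out of sample. A short extension, continuing from the variables defined above and assuming a simple held-out split, makes the U-shape in test error visible:

from sklearn.model_selection import train_test_split

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in range(1, 16):
    poly = PolynomialFeatures(degree=degree)
    model = LinearRegression().fit(poly.fit_transform(X_tr), y_tr)
    train_mse = mean_squared_error(y_tr, model.predict(poly.transform(X_tr)))
    test_mse = mean_squared_error(y_te, model.predict(poly.transform(X_te)))
    print(f"degree {degree:2d}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")

# Train MSE falls monotonically with degree; test MSE typically bottoms out
# at a moderate degree and then rises again as variance takes over.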

R Implementation

set.seed(42)

# True function: sine wave
x <- seq(0, 10, length.out = 100)
y_true <- sin(x)
y <- y_true + rnorm(100, 0, 0.2)

# Fit Polynomial Models
par(mfrow = c(1, 3))

for (degree in c(1, 3, 15)) {
  model <- lm(y ~ poly(x, degree))
  y_pred <- predict(model)
  
  plot(x, y, main = paste("Degree", degree),
       xlab = "x", ylab = "y", pch = 16, col = "gray")
  lines(x, y_pred, col = "red", lwd = 2)
}

Interpretation Guide

Scenario                                  | Diagnosis                             | Action
Training Error = 0.01, Test Error = 0.50  | High Variance (Overfitting)           | Regularize, simplify the model, collect more data.
Training Error = 0.30, Test Error = 0.32  | High Bias (Underfitting)              | Add features, increase complexity.
Training Error = 0.10, Test Error = 0.12  | Good fit; bias and variance balanced  | Deploy the model.
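
These rules of thumb can be wired into a small check. A sketch with a hypothetical diagnose() helper; the gap ratio and error threshold are illustrative defaults, not standard values:

def diagnose(train_err, test_err, gap_ratio=2.0, high_err=0.25):
    """Rough heuristic mirroring the table above; thresholds are illustrative."""
    if test_err > gap_ratio * train_err:
        return "High variance (overfitting): regularize, simplify, or collect more data."
    if train_err > high_err:
        return "High bias (underfitting): add features or increase complexity."
    return "Reasonable balance of bias and variance."

print(diagnose(0.01, 0.50))  # -> high variance
print(diagnose(0.30, 0.32))  # -> high bias
print(diagnose(0.10, 0.12))  # -> balanced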