Bias-Variance Trade-off
Definition
Core Statement
The Bias-Variance Trade-off describes the fundamental tension in predictive modeling: simple models have high bias (systematic error), while complex models have high variance (sensitivity to noise). The goal is to find the sweet spot that minimizes total error.
Purpose
- Understand why models fail (underfitting vs overfitting).
- Guide model selection and regularization strategies.
- Explain why more complexity is not always better.
When to Use
This is a conceptual framework for interpreting model performance:
- Why does my model perform poorly on test data?
- Should I add more features or simplify the model?
- How do regularization methods (Ridge Regression, Lasso Regression) help?
Theoretical Background
Decomposition of Prediction Error
For a given model \(\hat{f}\) estimating a true function \(f\) from data with noise variance \(\sigma^2\), the expected squared prediction error at a point \(x\) decomposes as

\[
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
+ \underbrace{\operatorname{Var}\big[\hat{f}(x)\big]}_{\text{Variance}}
+ \underbrace{\sigma^2}_{\text{Irreducible Error}}
\]
| Component | Meaning | Cause |
|---|---|---|
| Bias | Error from wrong assumptions. Model is too simple. | Underfitting. Missing important patterns. |
| Variance | Error from sensitivity to training data. Model is too complex. | Overfitting. Fitting noise. |
| Irreducible Error | Noise in the data itself. | Cannot be reduced by any model. |
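Because the decomposition is defined over repeated draws of the training set, bias and variance can be estimated in a simulation where the true function is known: refit the model on many fresh training sets and measure how the predictions center and spread. Below is a minimal sketch, assuming the sine-wave setup from the implementation sections; estimate_bias_variance and the simulation sizes are illustrative choices, not a standard API.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x_grid = np.linspace(0, 10, 50).reshape(-1, 1)
f_true = np.sin(x_grid).ravel()  # known truth -- only available in simulation

def estimate_bias_variance(degree, n_sims=200, n=100, noise=0.2):
    # Refit on n_sims fresh training sets, predict on a fixed grid.
    preds = np.empty((n_sims, len(x_grid)))
    for s in range(n_sims):
        x = rng.uniform(0, 10, (n, 1))
        y = np.sin(x).ravel() + rng.normal(0, noise, n)
        poly = PolynomialFeatures(degree)
        model = LinearRegression().fit(poly.fit_transform(x), y)
        preds[s] = model.predict(poly.transform(x_grid))
    bias_sq = np.mean((preds.mean(axis=0) - f_true) ** 2)  # average squared bias
    variance = np.mean(preds.var(axis=0))                  # average variance
    return bias_sq, variance

for d in (1, 3, 15):
    b, v = estimate_bias_variance(d)
    print(f"degree {d:2d}: bias^2 = {b:.4f}  variance = {v:.4f}")

On real data the true function is unknown, which is why only the total error is observable in practice (see Pitfalls below).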
The Trade-off
| Model Complexity | Bias | Variance | Total Error |
|---|---|---|---|
| Too Simple | High (underfitting) | Low | High |
| Optimal | Moderate | Moderate | Minimum |
| Too Complex | Low | High (overfitting) | High |
Goldilocks Zone
The best model is neither too simple nor too complex. It balances bias and variance to minimize total error.
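The U-shape is easy to demonstrate with a held-out test set: training error falls monotonically as complexity grows, while test error falls and then rises again. A minimal sketch, assuming the same sine-wave data as the implementation sections (the 50/50 split and the degree range 1-15 are arbitrary choices):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, (200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.2, 200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for degree in range(1, 16):
    poly = PolynomialFeatures(degree)
    model = LinearRegression().fit(poly.fit_transform(X_tr), y_tr)
    train_mse = mean_squared_error(y_tr, model.predict(poly.transform(X_tr)))
    test_mse = mean_squared_error(y_te, model.predict(poly.transform(X_te)))
    print(f"degree {degree:2d}: train MSE = {train_mse:.3f}  test MSE = {test_mse:.3f}")

# Train MSE shrinks with every added degree; test MSE bottoms out at a
# moderate degree and then climbs -- that minimum is the Goldilocks zone.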
Visual Intuition
Test Error
│
│ ╲   Total Error            ╱
│  ╲______            ______╱
│         ╲__________╱
│
│                      _____╱  Variance
│             _____╱
│    _____╱
│
│ ╲_____
│       ╲_____
│             ╲_____________  Bias²
│
└──────────────────────────── Model Complexity
  Simple                Complex
Assumptions
This is a mathematical decomposition, not a statistical test, so it carries no distributional assumptions of its own. The practical caveats are listed under Limitations below.
Limitations
Pitfalls
- Cannot directly measure bias and variance separately on real data (only the total error).
- Trade-off is not always smooth. Discontinuities can occur (e.g., adding a critical variable).
- Depends on data size: as the training set grows, variance shrinks toward zero and bias dominates (see the sketch below).
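The data-size point in the last bullet can be checked by tracking the spread of a deliberately over-complex fit as the training set grows. A rough sketch under the same simulation setup as above (degree 15 and the sample sizes are illustrative; prediction_variance is a hypothetical helper):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
x_grid = np.linspace(0, 10, 50).reshape(-1, 1)

def prediction_variance(n, degree=15, n_sims=100):
    # Average variance of the fitted curve over n_sims fresh training sets.
    preds = np.empty((n_sims, len(x_grid)))
    for s in range(n_sims):
        x = rng.uniform(0, 10, (n, 1))
        y = np.sin(x).ravel() + rng.normal(0, 0.2, n)
        poly = PolynomialFeatures(degree)
        model = LinearRegression().fit(poly.fit_transform(x), y)
        preds[s] = model.predict(poly.transform(x_grid))
    return np.mean(preds.var(axis=0))

for n in (50, 200, 1000):
    print(f"n = {n:4d}: variance of degree-15 fit = {prediction_variance(n):.4f}")

# The variance of the over-complex model falls as n grows; with enough
# data even a flexible model stops chasing noise.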
Addressing the Trade-off
| Problem | Solution |
|---|---|
| High Bias (Underfitting) | Add features, increase model complexity, reduce regularization. |
| High Variance (Overfitting) | Use Cross-Validation, add Ridge Regression/Lasso Regression, reduce features, collect more data. |
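As a concrete example of the variance-side remedies, the sketch below compares ordinary least squares against Ridge on the same degree-15 features, scored by cross-validated MSE. The pipeline layout, the scaling step, and alpha=1.0 are illustrative choices, not a prescription:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, (100, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.2, 100)

for name, est in [("OLS", LinearRegression()), ("Ridge", Ridge(alpha=1.0))]:
    pipe = make_pipeline(PolynomialFeatures(15), StandardScaler(), est)
    scores = cross_val_score(pipe, X, y, cv=5, scoring="neg_mean_squared_error")
    print(f"{name:5s}: cross-validated MSE = {-scores.mean():.3f}")

# The penalty shrinks the degree-15 coefficients, accepting a little extra
# bias in exchange for a large drop in variance.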
Python Implementation
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Generate Data
np.random.seed(42)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y_true = np.sin(X).ravel()
y = y_true + np.random.normal(0, 0.2, 100)
# Fit Polynomials of Different Degrees
degrees = [1, 3, 15]
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for i, degree in enumerate(degrees):
    poly = PolynomialFeatures(degree=degree)
    X_poly = poly.fit_transform(X)
    model = LinearRegression().fit(X_poly, y)
    y_pred = model.predict(X_poly)
    mse = mean_squared_error(y, y_pred)
    axes[i].scatter(X, y, alpha=0.5, label='Data')
    axes[i].plot(X, y_pred, 'r-', label=f'Degree {degree}')
    axes[i].set_title(f"Degree {degree}\nMSE = {mse:.3f}")
    axes[i].legend()
plt.tight_layout()
plt.show()
# Degree 1:  High Bias (Underfitting) -- misses the sine shape entirely
# Degree 3:  Good Balance -- tracks the underlying curve
# Degree 15: High Variance (Overfitting) -- chases the noise
# Note: the MSE in each panel title is *training* error, so it keeps
# shrinking with degree; the overfitting shows in the wiggly curve.
R Implementation
set.seed(42)
# True function: sine wave
x <- seq(0, 10, length.out = 100)
y_true <- sin(x)
y <- y_true + rnorm(100, 0, 0.2)
# Fit Polynomial Models
par(mfrow = c(1, 3))
for (degree in c(1, 3, 15)) {
  model <- lm(y ~ poly(x, degree))
  y_pred <- predict(model)
  plot(x, y, main = paste("Degree", degree),
       xlab = "x", ylab = "y", pch = 16, col = "gray")
  lines(x, y_pred, col = "red", lwd = 2)
}
Interpretation Guide
| Scenario | Diagnosis | Action |
|---|---|---|
| Training Error = 0.01, Test Error = 0.50 | High Variance (Overfitting). | Regularize, simplify model, more data. |
| Training Error = 0.30, Test Error = 0.32 | High Bias (Underfitting). | Add features, increase complexity. |
| Training Error = 0.10, Test Error = 0.12 | Good fit. Bias and variance balanced. | Deploy model. |
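If helpful, the table can be folded into a crude rule of thumb. The helper below is hypothetical and its thresholds are arbitrary illustrations; sensible cutoffs depend on the error scale of the problem at hand:

def diagnose(train_err, test_err, gap_tol=0.05, high_tol=0.25):
    # Crude heuristic mirroring the table above; thresholds are illustrative.
    if test_err - train_err > gap_tol:
        return "High variance (overfitting): regularize, simplify, or add data"
    if train_err > high_tol:
        return "High bias (underfitting): add features or increase complexity"
    return "Balanced: bias and variance under control"

print(diagnose(0.01, 0.50))  # high variance
print(diagnose(0.30, 0.32))  # high bias
print(diagnose(0.10, 0.12))  # balanced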
Related Concepts
- Overfitting & Underfitting
- Cross-Validation - Detects overfitting.
- Ridge Regression / Lasso Regression - Reduce variance.
- Model Selection