Regularization

Definition

Core Statement

Regularization is a technique used to prevent overfitting by adding a penalty term to the model's loss function. The penalty discourages complex models (large coefficients) by "shrinking" estimates towards zero, accepting a small amount of Bias in exchange for a reduction in Variance.

Loss = Data Fit Error + λ × Complexity Penalty
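
A minimal sketch of this formula in Python, assuming a squared-error data-fit term and an L2 penalty (the names `penalized_loss` and `lam` are illustrative, not a library API):

```python
import numpy as np

def penalized_loss(beta, X, y, lam):
    """Loss = data-fit error + lambda * complexity penalty (L2 penalty used here)."""
    data_fit = np.sum((y - X @ beta) ** 2)   # how well the model fits the data
    penalty = lam * np.sum(beta ** 2)        # grows with the size of the coefficients
    return data_fit + penalty
```

A larger `lam` punishes complexity harder, pushing the optimizer towards smaller coefficients.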

Purpose

  1. Bias-Variance Trade-off: Intentionally introduce a small amount of Bias to achieve a large reduction in Variance.
  2. Generalization: Helps the model perform better on unseen data.
  3. Ill-Posed Problems: Solves problems where there are more features than observations (p > n); see the sketch below.
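
For the p > n case, a hedged sketch of why the penalty helps: the closed-form ridge estimate adds λI to XᵀX, making an otherwise singular system solvable (the sizes and λ below are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 50                                  # more features than observations (p > n)
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.0, 0.5]               # only a few features actually matter
y = X @ beta_true + 0.1 * rng.normal(size=n)

lam = 1.0
# X.T @ X is singular when p > n, so plain least squares has no unique solution;
# adding lam * I makes the system invertible and the estimate stable.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```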

Key Methods

| Method | Penalty | Effect | Usage |
| --- | --- | --- | --- |
| Ridge Regression | L2 (Σβ²) | Shrinks all coefficients; none exactly to zero. | Multicollinearity, dense data. |
| Lasso Regression | L1 (Σ\|β\|) | Shrinks some coefficients exactly to zero. | Feature selection, sparse models. |
| Elastic Net | L1 + L2 | Best of both worlds. | Correlated features, feature selection. |
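
A short sketch of the three methods, assuming scikit-learn is available (the alpha and l1_ratio values are arbitrary illustrations, not recommended settings):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
true_coef = np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0, 0, 0])   # only two informative features
y = X @ true_coef + 0.5 * rng.normal(size=100)

ridge = Ridge(alpha=1.0).fit(X, y)                     # L2: shrinks all coefficients, none exactly zero
lasso = Lasso(alpha=0.1).fit(X, y)                     # L1: drives some coefficients exactly to zero
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)   # mix of L1 and L2

print(np.round(ridge.coef_, 2))
print(np.round(lasso.coef_, 2))   # zeros here act as implicit feature selection
print(np.round(enet.coef_, 2))
```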

Conceptual Example: Polynomial Fitting

Fitting a Line to Noisy Data

Data: 10 points that roughly follow a line, but with noise.

  1. Linear Model: Underfits slightly.

  2. 10th-Degree Polynomial: Hits every single point perfectly. R² = 1.0.

    • Problem: The curve goes wild between points. Huge variance.
    • Coefficients: β₁₀ = 5,000,000.
  3. Regularized Polynomial: Fits the curve, but the penalty prevents coefficients like β₁₀ = 5,000,000.

    • Coefficients kept small. Curve is smooth.
    • Result: Good fit (R² = 0.9) and stable predictions (sketched below).
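
A hedged sketch of this comparison, fitting the same degree-10 polynomial with and without an L2 penalty (the degree, noise level, and alpha are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 10).reshape(-1, 1)            # 10 points roughly on a line
y = 2.0 * x.ravel() + 0.3 * rng.normal(size=10)     # plus noise

# 10th-degree polynomial with no penalty: chases every point, coefficients blow up
plain = make_pipeline(PolynomialFeatures(degree=10), LinearRegression()).fit(x, y)

# Same polynomial with an L2 penalty: coefficients stay small, curve stays smooth
regularized = make_pipeline(PolynomialFeatures(degree=10), Ridge(alpha=1e-3)).fit(x, y)

print(np.abs(plain.named_steps["linearregression"].coef_).max())   # typically huge
print(np.abs(regularized.named_steps["ridge"].coef_).max())        # stays modest
```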

When to Use

Always Consider Regularization When...

  • Model is Overfitting (Train score >> Test score); see the quick check below.
  • Sample size is small relative to number of features.
  • Collinearity is high.
  • You want a robust deployment model.
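
A quick way to see the first bullet in practice, as a sketch (the small-sample, many-feature data is simulated and the alpha value is an arbitrary assumption): compare train and test R² with and without a penalty.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 40))                       # small sample, many features
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=60)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for model in (LinearRegression(), Ridge(alpha=10.0)):
    model.fit(X_tr, y_tr)
    # A large gap between train and test R^2 signals overfitting;
    # the regularized model usually closes much of that gap.
    print(type(model).__name__,
          round(model.score(X_tr, y_tr), 2),
          round(model.score(X_te, y_te), 2))
```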