Outlier Analysis (Standardized Residuals)
Outlier Analysis (Standardized Residuals)
Overview
Definition
Outliers are data points that deviate significantly from the rest of the observations.
- Standardized Residuals: The model errors divided by their standard deviation. This puts errors on a Z-score scale.
Rule of Thumb
: Suspect. (Occurs 5% of time). : Outlier. (Occurs 0.3% of time).
1. Python Implementation
import numpy as np
# 1. Get Standardized Residuals
influence = model.get_influence()
standardized_residuals = influence.resid_studentized_internal
# 2. Filter Outliers (|Z| > 3)
outliers = np.where(np.abs(standardized_residuals) > 3)[0]
print(f"Outlier Indices: {outliers}")
2. R Implementation
# 1. Get Standardized Residuals (rstandard)
res_std <- rstandard(model)
# 2. Identify
outliers <- which(abs(res_std) > 3)
print(outliers)
# 3. Plot
plot(model, 3) # Scale-Location plot helps see them
3. Related Concepts
- Cook's Distance - Measuring Influence (Does the outlier matter?).
- Leverage (Hat Matrix)