Log Transformations
Overview
Definition
A log transformation applies the natural logarithm ($\ln x$) to a variable.
- Goal: Make skewed data more Normal.
- Goal: Stabilize Variance (Homoscedasticity).
- Goal: Linearize Multiplicative relationships.
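
As a rough illustration of the first two goals, the sketch below uses simulated data (an assumption; no real dataset is implied). It checks that the log removes the skew of lognormal values and that multiplicative noise, whose spread grows with the level of the series, has a roughly constant spread on the log scale.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Assumed toy data: lognormal values are right-skewed by construction.
x = rng.lognormal(mean=0.0, sigma=1.0, size=5000)
skew_raw = stats.skew(x)
skew_log = stats.skew(np.log(x))  # log of lognormal data is normal

print(f"skewness before log: {skew_raw:.2f}")
print(f"skewness after log:  {skew_log:.2f}")

# Multiplicative noise: y = level * exp(noise), so the spread scales with the level.
levels = (1.0, 10.0, 100.0)
level = np.repeat(levels, 2000)
y = level * np.exp(rng.normal(0.0, 0.3, size=level.size))

# Standard deviation within each level, on the raw and on the log scale.
sd_raw = [y[level == v].std() for v in levels]
sd_log = [np.log(y[level == v]).std() for v in levels]

print("sd on raw scale:", np.round(sd_raw, 2))
print("sd on log scale:", np.round(sd_log, 2))
```

On the raw scale the standard deviation grows with the level; on the log scale it stays near the noise standard deviation of 0.3 chosen for this simulation.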
Constraint
You cannot take the log of 0 or negative numbers.
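- Fix: Use $\log(x + 1)$ when the data contain zeros, or a Box-Cox transformation (which itself requires strictly positive data, so shift the data first if needed).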
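
A minimal sketch of the constraint and the $\log(x + 1)$ workaround, using a small made-up array. `np.log1p` and its inverse `np.expm1` are standard NumPy functions; the input values are purely illustrative.

```python
import numpy as np

x = np.array([0.0, 1e-10, 1.0, 99.0])  # illustrative values, including a zero

# Plain log: log(0) evaluates to -inf (NumPy would also emit a warning).
with np.errstate(divide="ignore"):
    plain = np.log(x)

# log1p computes log(1 + x): finite for every x > -1, and exactly 0 at x = 0.
shifted = np.log1p(x)

# expm1 computes exp(v) - 1, which undoes log1p.
back = np.expm1(shifted)

print(plain)
print(shifted)
print(back)
```

Note that $\log(x + 1)$ is not the same transformation as $\log(x)$: for small $x$ it is close to $x$ itself, so coefficients fitted on the shifted scale do not have the exact percentage interpretation.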
1. Interpretation
Using Log-Transformed Variables changes the interpretation of regression coefficients.
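| Model | Equation | Interpretation of $\beta_1$ |
|---|---|---|
| Log-Level | $\ln y = \beta_0 + \beta_1 x$ | 1 unit change in $x$ is associated with approximately a $100 \cdot \beta_1$ % change in $y$ |
| Level-Log | $y = \beta_0 + \beta_1 \ln x$ | 1% change in $x$ is associated with approximately a $\beta_1 / 100$ unit change in $y$ |
| Log-Log | $\ln y = \beta_0 + \beta_1 \ln x$ | 1% change in $x$ is associated with approximately a $\beta_1$ % change in $y$ (elasticity) |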
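
To make the log-log and log-level readings concrete, here is a small simulation. The data-generating process (a true elasticity of 0.8) and the coefficient value 0.05 are assumptions chosen for illustration; the fit uses `np.polyfit` rather than a full regression package.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical process: y = 3 * x^0.8 * noise, so the true elasticity is 0.8.
x = rng.uniform(1.0, 100.0, size=2000)
y = 3.0 * x**0.8 * np.exp(rng.normal(0.0, 0.1, size=x.size))

# Log-log fit: ln(y) = b0 + b1 * ln(x). polyfit returns the slope first.
b1, b0 = np.polyfit(np.log(x), np.log(y), deg=1)

# Exact effect on y of a 1% increase in x under the fitted model.
pct_change_y = (1.01**b1 - 1.0) * 100.0
print(f"estimated elasticity b1: {b1:.3f}")
print(f"1% increase in x -> {pct_change_y:.3f}% change in y")

# Log-level reading: 100 * beta is an approximation to the exact
# percentage change 100 * (exp(beta) - 1).
beta = 0.05  # illustrative coefficient
approx_pct = 100.0 * beta
exact_pct = 100.0 * (np.exp(beta) - 1.0)
print(f"log-level: approx {approx_pct:.2f}% vs exact {exact_pct:.2f}%")
```

The approximation is close for small coefficients and drifts as $|\beta_1|$ grows, which is why the table entries are stated as approximate.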
2. Python Implementation
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
# Skewed Data
data = np.random.exponential(size=1000)
# Log Transform
# log1p calculates log(1+x) to handle zeros safely
log_data = np.log1p(data)
# Visual Check
fig, ax = plt.subplots(1, 2)
ax[0].hist(data, bins=30); ax[0].set_title("Original (Skewed)")
ax[1].hist(log_data, bins=30); ax[1].set_title("Log-Transformed (Normal-ish)")
plt.show()
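# Box-Cox (estimates the power parameter lambda by maximum likelihood)
boxcox_data, lam = stats.boxcox(data + 0.001)  # Input must be strictly positive; the small shift guards against zeros
print(f"Optimal Lambda: {lam:.3f}")  # If lambda is close to 0, the log transform is a good choice.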
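
The link between Box-Cox and the log can be checked directly. The sketch below assumes simulated lognormal data, for which the estimated lambda should land near 0. It relies on `scipy.stats.boxcox` with a fixed `lmbda` and on `scipy.special.inv_boxcox` to map values back to the original scale.

```python
import numpy as np
from scipy import stats
from scipy.special import inv_boxcox

rng = np.random.default_rng(1)

# Assumed toy data: strictly positive and lognormal, so log is the "right" transform.
data = rng.lognormal(mean=2.0, sigma=0.5, size=2000)

# Let SciPy estimate lambda by maximum likelihood.
transformed, lam = stats.boxcox(data)
print(f"estimated lambda: {lam:.3f}")

# With lambda fixed at 0, Box-Cox reduces to the natural log.
log_version = stats.boxcox(data, lmbda=0)

# Map transformed values back to the original scale.
restored = inv_boxcox(transformed, lam)
```

Keeping `lam` around matters in practice: predictions made on the transformed scale need the same lambda to be converted back.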
3. R Implementation
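# 1. Log transform (requires income > 0; log(0) returns -Inf)
df$log_income <- log(df$income)
# 2. log(x + 1), safe when income contains zeros
df$log_income_safe <- log1p(df$income)
# 3. Box-Cox: plots the profile log-likelihood over lambda (response must be strictly positive)
library(MASS)
boxcox(lm(income ~ 1, data = df))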