Log Transformations

Overview

Definition

A log transformation replaces a variable x with its logarithm, usually the natural logarithm ln(x) or the base-10 logarithm log10(x).

  • Goal: Make right-skewed data more symmetric (closer to Normal).
  • Goal: Stabilize variance (Homoscedasticity).
  • Goal: Linearize multiplicative relationships (Y = aX^b becomes ln(Y) = ln(a) + b ln(X)).
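
As a rough illustration of the first goal, the sketch below draws right-skewed samples and compares the sample skewness before and after taking the log. The lognormal distribution, the seed, and the sample size are assumptions chosen for the demonstration.

```python
import numpy as np
from scipy import stats

# Assumed demo data: lognormal samples are strongly right-skewed,
# and their logarithm is exactly Normal by construction.
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

skew_raw = stats.skew(x)          # large positive value (long right tail)
skew_log = stats.skew(np.log(x))  # close to zero (roughly symmetric)

print(f"Skewness before log: {skew_raw:.2f}")
print(f"Skewness after log:  {skew_log:.2f}")
```

Real data will rarely become exactly Normal after the transform; a skewness closer to zero is the practical target.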
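
Constraint

The logarithm is undefined for zero and negative numbers.

  • Fix: Use ln(x+1) when the data contain zeros (it is defined for x > -1). The Box-Cox Transformation also requires strictly positive data, so shift the data by a constant first if needed.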
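
A minimal sketch of the zero problem and the ln(x+1) workaround, using a small made-up array. `np.log1p` and its inverse `np.expm1` are standard NumPy functions; the sample values are assumptions.

```python
import numpy as np

# Hypothetical values that include a zero
x = np.array([0.0, 0.5, 9.0, 99.0])

# Plain log: log(0) evaluates to -inf (NumPy warning silenced here)
with np.errstate(divide="ignore"):
    raw = np.log(x)

# log1p computes log(1 + x), so zero maps to zero
safe = np.log1p(x)

# expm1 computes exp(y) - 1 and undoes log1p
back = np.expm1(safe)

print(raw)
print(safe)
print(back)
```

Keep in mind that coefficients fitted on ln(x+1) no longer have the clean percentage interpretation of ln(x), especially when x is small.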

1. Interpretation

Using Log-Transformed Variables changes the interpretation of regression coefficients.

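Model       Equation            Interpretation of β
Log-Level   ln(Y) = βX          1 unit change in X → about β×100% change in Y
Level-Log   Y = β ln(X)         1% change in X → about β/100 unit change in Y
Log-Log     ln(Y) = β ln(X)     1% change in X → about β% change in Y (Elasticity)

These readings are approximations that hold for small changes. For the Log-Level model the exact change in Y per unit of X is (e^β − 1)×100%.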
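
The Log-Log and Log-Level rows can be checked numerically. The sketch below simulates a multiplicative relationship with assumed parameters (scale 3.0, elasticity 0.7, small multiplicative noise), fits a straight line on the log scale with `np.polyfit`, and then compares the approximate and exact Log-Level percentage changes. The helper `exact_pct_change` is a hypothetical name introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed data-generating process: Y = 3.0 * X^0.7 * exp(noise)
x = rng.uniform(1.0, 100.0, size=500)
noise = rng.normal(0.0, 0.05, size=500)
y = 3.0 * x**0.7 * np.exp(noise)

# Log-Log fit: ln(Y) = intercept + slope * ln(X); slope estimates the elasticity
slope, intercept = np.polyfit(np.log(x), np.log(y), deg=1)
print(f"Estimated elasticity: {slope:.3f}")
print(f"Estimated scale:      {np.exp(intercept):.3f}")


def exact_pct_change(beta):
    """Exact % change in Y per unit of X in a Log-Level model."""
    return np.expm1(beta) * 100.0


# The beta*100% rule of thumb is close for small beta and drifts for large beta
for beta in (0.05, 0.5):
    print(f"beta={beta}: approx {beta * 100:.1f}%, exact {exact_pct_change(beta):.2f}%")
```

For β = 0.05 the two figures nearly agree, while for β = 0.5 the rule of thumb understates the change noticeably, which is why the exact formula matters for large coefficients.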

2. Python Implementation

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Skewed data (seeded generator so the example is reproducible)
rng = np.random.default_rng(0)
data = rng.exponential(size=1000)

# Log Transform
# log1p calculates log(1+x) to handle zeros safely
log_data = np.log1p(data)

# Visual Check
fig, ax = plt.subplots(1, 2)
ax[0].hist(data, bins=30)
ax[0].set_title("Original (Skewed)")
ax[1].hist(log_data, bins=30)
ax[1].set_title("Log-Transformed (Less Skewed)")
plt.show()

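# Box-Cox (fits the power lambda by maximum likelihood)
# Input must be strictly positive; the small shift guards against exact zeros.
boxcox_data, lam = stats.boxcox(data + 0.001)
print(f"Optimal Lambda: {lam:.3f}")  # If lambda is close to 0, the log transform is a good choice.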
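
The link between Box-Cox and the log can be verified directly. This sketch assumes lognormal demo data (for which the fitted lambda should land near 0), confirms that Box-Cox with lambda fixed at 0 equals the natural log, and maps transformed values back to the original scale with `scipy.special.inv_boxcox`.

```python
import numpy as np
from scipy import stats
from scipy.special import inv_boxcox

# Assumed demo data: strictly positive and lognormal
rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=1.0, size=5_000)

# With lmbda fixed, stats.boxcox returns only the transformed array
bc_zero = stats.boxcox(x, lmbda=0.0)
same_as_log = np.allclose(bc_zero, np.log(x))

# With lmbda omitted, it also returns the maximum-likelihood lambda
bc_fit, lam = stats.boxcox(x)

# Invert the fitted transform to recover the original values
x_back = inv_boxcox(bc_fit, lam)

print(f"Box-Cox with lambda=0 equals log: {same_as_log}")
print(f"Fitted lambda: {lam:.3f}")
```

Storing the fitted lambda alongside the model is what makes predictions convertible back to the original units.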

3. R Implementation

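# 1. Log transform (income must be strictly positive; log(0) is -Inf)
df$log_income <- log(df$income)

# 2. log(x + 1), safe when income contains zeros
df$log_income_safe <- log1p(df$income)

# 3. Box-Cox: plots the profile log-likelihood over lambda
library(MASS)
bc <- boxcox(lm(income ~ 1, data = df))
lambda <- bc$x[which.max(bc$y)]  # lambda with the highest log-likelihood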