Log Transformations

Overview

Definition

A log transformation applies the natural logarithm, $\ln (x)$, or the base-10 logarithm to a variable.

Goal: Make right-skewed data closer to Normal (a log makes left-skewed data worse).
Goal: Stabilize Variance (Homoscedasticity).
Goal: Linearize Multiplicative relationships.
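
The first two goals can be checked numerically. The sketch below is illustrative only: the data are simulated, and the seed, sample sizes, and distribution parameters are assumptions chosen for the demo, not values from these notes. It compares skewness before and after the log, and compares the spread of two groups whose noise scales with their level.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# Goal 1: a right-skewed (log-normal) sample becomes roughly symmetric after log.
x = rng.lognormal(mean=0.0, sigma=1.0, size=5000)
skew_raw = stats.skew(x)
skew_log = stats.skew(np.log(x))

# Goal 2: two groups whose spread grows with their level (multiplicative noise).
low = 10 * rng.lognormal(sigma=0.3, size=5000)
high = 1000 * rng.lognormal(sigma=0.3, size=5000)
sd_ratio_raw = high.std() / low.std()                  # spread differs by orders of magnitude
sd_ratio_log = np.log(high).std() / np.log(low).std()  # spread is comparable after log

print(f"skewness: raw={skew_raw:.2f}, log={skew_log:.2f}")
print(f"sd ratio (high/low): raw={sd_ratio_raw:.1f}, log={sd_ratio_log:.2f}")
```

On the raw scale the high group is far more variable than the low group; on the log scale the two standard deviations are close, which is what variance stabilization means here.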

Constraint

You cannot take the log of 0 or negative numbers.

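Fix: Use $\ln (x + 1)$ when the data contain zeros (valid for $x > -1$). Box-Cox also requires strictly positive values, so shift the data by a positive constant before applying it.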
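
A minimal sketch of the constraint and the $\ln (x + 1)$ fix, using a small made-up array. It shows what NumPy returns for the log of zero, and that `np.expm1` undoes `np.log1p` so results can be mapped back to the original scale.

```python
import numpy as np

x = np.array([0.0, 1.0, 9.0])  # toy values, including a zero

with np.errstate(divide="ignore"):  # silence the divide-by-zero warning for the demo
    plain = np.log(x)               # log(0) evaluates to -inf

shifted = np.log1p(x)               # log(1 + x); log1p(0) is exactly 0
recovered = np.expm1(shifted)       # expm1 is the inverse of log1p

print(plain)
print(shifted)
print(recovered)
```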

1. Interpretation

Using Log-Transformed Variables changes the interpretation of regression coefficients.

Model	Equation	Interpretation of $β$
Log-Level	$\ln (Y) = β X$	1 unit change in $X$ $\to$ approximately $β \times 100$% change in $Y$ (exact: $(e^{β} - 1) \times 100$%).
Level-Log	$Y = β \ln (X)$	1% change in $X$ $\to$ approximately $β / 100$ unit change in $Y$.
Log-Log	$\ln (Y) = β \ln (X)$	1% change in $X$ $\to$ approximately $β$% change in $Y$ (Elasticity).
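
To make the table concrete, here is a hedged sketch on simulated data. The true elasticity of 0.8, the constant 3.0, the noise level, and the seed are all assumptions made up for the example. It fits a log-log line with `np.polyfit` and then compares the approximate log-level reading, $β \times 100$%, with the exact percentage change, $(e^{β} - 1) \times 100$%.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, chosen only for reproducibility

# Hypothetical multiplicative relationship: Y = 3 * X^0.8 * noise
true_elasticity = 0.8
x = rng.uniform(1.0, 100.0, size=2000)
y = 3.0 * x**true_elasticity * rng.lognormal(sigma=0.1, size=2000)

# Log-log fit: the slope of ln(Y) on ln(X) estimates the elasticity.
slope, intercept = np.polyfit(np.log(x), np.log(y), deg=1)
print(f"estimated elasticity: {slope:.3f}")

# Log-level reading: beta * 100 % is a small-beta approximation;
# the exact change for a one-unit increase in X is (exp(beta) - 1) * 100 %.
for beta in (0.05, 0.50):
    approx_pct = 100 * beta
    exact_pct = 100 * np.expm1(beta)
    print(f"beta={beta}: approx {approx_pct:.1f}%, exact {exact_pct:.1f}%")
```

The approximation is close for small coefficients and drifts as $β$ grows, which is why the percentage reading is usually quoted only for small $β$.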

2. Python Implementation

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Skewed Data (seeded generator so the example is reproducible)
rng = np.random.default_rng(42)
data = rng.exponential(size=1000)

# Log Transform
# log1p calculates log(1+x) to handle zeros safely
log_data = np.log1p(data)

# Visual Check
fig, ax = plt.subplots(1, 2)
ax[0].hist(data, bins=30)
ax[0].set_title("Original (Skewed)")
ax[1].hist(log_data, bins=30)
ax[1].set_title("Log-Transformed (Normal-ish)")
plt.show()

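# Box-Cox (maximum-likelihood estimate of the power parameter lambda)
# Input must be strictly positive; the 0.001 shift is an arbitrary guard against exact zeros
boxcox_data, lam = stats.boxcox(data + 0.001)
print(f"Optimal Lambda: {lam:.3f}")  # lambda near 0 means the log transform is a good fit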
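
The remark that lambda near 0 corresponds to the log can be verified directly. The sketch below uses a simulated log-normal sample (the parameters and seed are assumptions for illustration). It checks that Box-Cox with `lmbda=0` matches `np.log`, and that `scipy.special.inv_boxcox` maps transformed values back to the original scale.

```python
import numpy as np
from scipy import stats
from scipy.special import inv_boxcox

rng = np.random.default_rng(2)  # fixed seed, chosen only for reproducibility
x = rng.lognormal(mean=1.0, sigma=1.0, size=2000)  # strictly positive sample

# With lmbda given, boxcox returns only the transformed array.
bc_zero = stats.boxcox(x, lmbda=0)
same_as_log = np.allclose(bc_zero, np.log(x))

# With lmbda omitted, it also returns the maximum-likelihood lambda.
bc_fit, lam = stats.boxcox(x)
x_back = inv_boxcox(bc_fit, lam)  # undo the transform

print(same_as_log, round(lam, 3), np.allclose(x_back, x))
```

Because the sample is log-normal, the fitted lambda should land close to 0, which is the case where Box-Cox and the plain log agree.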

3. R Implementation

# 1. Log Transform (income must be > 0; log(0) gives -Inf, negatives give NaN)
df$log_income <- log(df$income)

# 2. Log(x+1)
df$log_income_safe <- log1p(df$income)

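# 3. Box-Cox (response must be strictly positive)
library(MASS)
bc <- boxcox(lm(income ~ 1, data = df))  # plots the profile log-likelihood over lambda
lambda <- bc$x[which.max(bc$y)]          # lambda with the highest log-likelihood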