Bonferroni Correction

Definition

Core Statement

The Bonferroni Correction is a method to control the Family-Wise Error Rate (FWER) when performing multiple hypothesis tests. It adjusts the significance threshold by dividing α by the number of tests:

α_adjusted = α / m

where m is the number of tests.
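
For example, with α = 0.05 and m = 10 tests, each individual test is evaluated at α_adjusted = 0.05 / 10 = 0.005.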


Purpose

  1. Prevent inflation of Type I error when running multiple tests.
  2. Ensure the probability of making at least one false positive across all tests stays at or below α (verified in the simulation sketch below).
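
As a quick sanity check on point 2, here is a minimal simulation sketch (the seed and simulation count are arbitrary choices): under the null hypothesis, p-values are Uniform(0, 1), so the FWER can be estimated directly.

import numpy as np

rng = np.random.default_rng(42)
m, alpha, n_sims = 20, 0.05, 100_000

# Simulate n_sims families of m null tests; each p-value is Uniform(0, 1)
p = rng.uniform(size=(n_sims, m))

# FWER = fraction of families with at least one rejection
fwer_uncorrected = (p < alpha).any(axis=1).mean()
fwer_bonferroni = (p < alpha / m).any(axis=1).mean()

print(f"FWER uncorrected: {fwer_uncorrected:.3f}")  # ~0.64
print(f"FWER Bonferroni:  {fwer_bonferroni:.3f}")   # ~0.05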

When to Use

Use Bonferroni When...

  • You are running multiple hypothesis tests (e.g., comparing many group pairs).
  • You want to strictly control the probability of any false positive.
  • Tests are pre-planned and not exploratory.

Limitations

  • Bonferroni is very conservative. With many tests, α_adjusted becomes tiny, reducing Power (the ability to detect true effects).
  • For exploratory analysis with many tests (e.g., genomics), consider False Discovery Rate (FDR) methods like Benjamini-Hochberg.


Theoretical Background

The Problem

If you run 20 tests at α=0.05:

P(at least 1 Type I Error) = 1 − (1 − 0.05)^20 ≈ 0.64

A 64% chance of at least one false positive.
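
The arithmetic behind this number, as a quick check:

m, alpha = 20, 0.05
print(1 - (1 - alpha) ** m)  # 0.6415... ≈ 64%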

The Solution

Bonferroni adjusts each test's threshold:

α_adjusted = 0.05 / 20 = 0.0025

Now, each p-value must be < 0.0025 to be considered significant.

Equivalently, you can multiply each p-value by m and compare the result to α = 0.05.
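
For example, with m = 20, a raw p-value of 0.002 becomes an adjusted p-value of 0.04, which still clears the 0.05 bar.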


Limitations

Pitfalls

  1. Overly Conservative: Increased risk of Type II errors (missing real effects).
  2. Calibrated for Independence: the correction still controls FWER when tests are correlated, but it then becomes even more conservative than necessary.
  3. Not Optimal for Large m: For genome-wide studies (m = 1 million), use FDR instead (see the comparison sketch below).
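
To illustrate pitfall 3, here is a minimal sketch contrasting Bonferroni with Benjamini-Hochberg via statsmodels; the five p-values are invented for illustration.

import numpy as np
from statsmodels.stats.multitest import multipletests

p_values = np.array([0.001, 0.008, 0.020, 0.035, 0.045])

reject_bonf, _, _, _ = multipletests(p_values, alpha=0.05, method='bonferroni')
reject_bh, _, _, _ = multipletests(p_values, alpha=0.05, method='fdr_bh')

print(f"Bonferroni rejects:         {reject_bonf}")  # only the two smallest
print(f"Benjamini-Hochberg rejects: {reject_bh}")    # all five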


Python Implementation

import numpy as np

# Example: 5 p-values from 5 tests
p_values = np.array([0.01, 0.04, 0.03, 0.002, 0.15])

# Bonferroni Correction: scale each p-value by m, capping at 1.0
m = len(p_values)
bonferroni_threshold = 0.05 / m
bonferroni_adjusted = np.minimum(p_values * m, 1.0)

print(f"Bonferroni Threshold: {bonferroni_threshold}")
print(f"Adjusted p-values: {bonferroni_adjusted}")
print(f"Significant (adjusted): {bonferroni_adjusted < 0.05}")

# Using statsmodels (handles the capping and the comparison in one call)
from statsmodels.stats.multitest import multipletests
reject, pvals_corrected, _, _ = multipletests(p_values, alpha=0.05, method='bonferroni')
print(f"Corrected p-values: {pvals_corrected}")
print(f"Reject null?: {reject}")

R Implementation

# Example p-values
p_values <- c(0.01, 0.04, 0.03, 0.002, 0.15)

# Bonferroni Adjustment (p.adjust multiplies by m and caps at 1)
adjusted <- p.adjust(p_values, method = "bonferroni")
print(adjusted)

# Which are still significant?
print(adjusted < 0.05)

Worked Numerical Example

Genetic Association Study

Objective: Test whether each of 100 genes is associated with a disease.
Significance Level (α): 0.05.
Number of Tests (m): 100.

Bonferroni Threshold: 0.05 / 100 = 0.0005.

Results:

  • Gene A: p = 0.003. Significant at α = 0.05, but not significant after correction (0.003 > 0.0005).
  • Gene B: p = 0.00001. Significant even after correction (0.00001 < 0.0005).

Implication: We discard the Gene A finding to prevent false positives, even though p = 0.003 looks "good" in isolation.
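
A short script reproducing these decisions (gene labels and p-values taken from the example above):

alpha, m = 0.05, 100
threshold = alpha / m  # 0.0005

for gene, p in [("Gene A", 0.003), ("Gene B", 0.00001)]:
    verdict = "significant" if p < threshold else "not significant"
    print(f"{gene}: p = {p} -> {verdict} after correction")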


Interpretation Guide

Original p | Adjusted p (m = 5) | Significant? | Edge Case Notes
0.01 | 0.05 | Borderline | p×m lands exactly on the 0.05 threshold.
0.002 | 0.01 | Yes | Robust finding.
0.04 | 0.20 | No | "Disappeared" after correction.
0.00001 | 0.00005 | Yes | Highly significant.
p×m > 1 | 1.0 | No | Formula gives > 1; capped at 1.0.
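
The adjusted values in the table correspond to m = 5. A snippet to reproduce them; the 0.30 input is a made-up value that triggers the capping case in the last row:

import numpy as np

p = np.array([0.01, 0.002, 0.04, 0.00001, 0.30])  # 0.30: hypothetical, p*m > 1
adjusted = np.minimum(p * 5, 1.0)                 # cap at 1.0, as in the table
print(adjusted)  # 0.05, 0.01, 0.2, 5e-05, 1.0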

Common Pitfall Example

The Correlation Trap

Scenario: Measuring 5 related outcomes (e.g., Anxiety score Day 1, Day 2, Day 3...).
Tests: 5 t-tests.

Bonferroni: α/5 = 0.05/5 = 0.01.

Why it's too harsh:

  • The outcomes are highly correlated (Day 1 Anxiety predicts Day 2).
  • The "effective" number of independent tests is < 5.
  • Bonferroni corrects as if they were 5 completely independent tests, which they are not.

Consequence: You lose Power. You fail to detect real effects.
Better Option: MANOVA or Mixed Models to handle correlation, or FDR.
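
To make the over-conservatism concrete, here is a minimal simulation sketch; the exchangeable correlation of 0.8 and the seed are illustrative assumptions. With all nulls true, the realized FWER under Bonferroni lands well below the nominal 0.05:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
m, alpha, rho, n_sims = 5, 0.05, 0.8, 50_000

# Exchangeable correlation: every pair of test statistics has correlation rho
cov = rho * np.ones((m, m)) + (1 - rho) * np.eye(m)
z = rng.multivariate_normal(np.zeros(m), cov, size=n_sims)
p = 2 * norm.sf(np.abs(z))  # two-sided p-values under the null

fwer = (p < alpha / m).any(axis=1).mean()
print(f"FWER with Bonferroni under correlation: {fwer:.3f}")  # noticeably below 0.05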