Bonferroni Correction
Definition
The Bonferroni Correction is a method to control the Family-Wise Error Rate (FWER) when performing multiple hypothesis tests. It adjusts the significance threshold by dividing alpha by the number of tests:

α_adjusted = α / m

where α is the desired overall significance level and m is the number of tests.
Purpose
- Prevent inflation of Type I error when running multiple tests.
- Ensure the probability of making at least one false positive across all tests remains below α.
When to Use
- You are running multiple hypothesis tests (e.g., comparing many group pairs).
- You want to strictly control the probability of any false positive.
- Tests are pre-planned and not exploratory.
When Not to Use
- Bonferroni is very conservative. With many tests, α/m becomes tiny, reducing Power (the ability to detect true effects).
- For exploratory analysis with many tests (e.g., genomics), consider False Discovery Rate (FDR) methods like Benjamini-Hochberg.
Theoretical Background
The Problem
If you run 20 tests at α = 0.05, the probability of at least one false positive is:

FWER = 1 - (1 - 0.05)^20 ≈ 0.64

That is a 64% chance of at least one false positive.
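This inflation is easy to check numerically; a minimal sketch of the independence formula above:
# Probability of at least one false positive across m independent tests
alpha, m = 0.05, 20
fwer = 1 - (1 - alpha) ** m
print(f"FWER for {m} tests at alpha={alpha}: {fwer:.2f}")  # ~0.64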
The Solution
Bonferroni adjusts each test's threshold:
- Original: α = 0.05
- Adjusted: α/m = 0.05/20 = 0.0025
Now, each p-value must be < 0.0025 to be considered significant.
Equivalently, you can multiply each p-value by m (capping at 1.0) and compare the result to α = 0.05.
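Plugging the adjusted threshold back into the same independence formula confirms the FWER falls back under the budget:
# FWER when each of the 20 tests uses the Bonferroni threshold 0.0025
alpha, m = 0.05, 20
adjusted = alpha / m
fwer_after = 1 - (1 - adjusted) ** m
print(f"FWER at adjusted threshold: {fwer_after:.4f}")  # ~0.0488, below 0.05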
Limitations
- Overly Conservative: Increased risk of Type II errors (missing real effects).
- Assumes Independence: If tests are correlated, Bonferroni is even more conservative than necessary.
- Not Optimal for Large m: For genome-wide studies (m ≈ 1 million), use FDR instead.
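The independence caveat can be seen by simulation; a minimal sketch, assuming equicorrelated null z-statistics built from a shared latent factor (rho and all other parameters here are illustrative):
from scipy import stats
import numpy as np

rng = np.random.default_rng(0)
m, alpha, n_sims, rho = 20, 0.05, 5000, 0.8
fwer_hits = 0
for _ in range(n_sims):
    shared = rng.standard_normal()
    # Each pair of z-statistics has correlation rho; each has unit variance
    z = np.sqrt(rho) * shared + np.sqrt(1 - rho) * rng.standard_normal(m)
    p = 2 * stats.norm.sf(np.abs(z))  # two-sided p-values under the null
    fwer_hits += np.any(p < alpha / m)
print(f"Empirical FWER under correlation: {fwer_hits / n_sims:.4f}")
The empirical rate lands well below the 0.05 budget, which is exactly the over-conservativeness described above.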
Python Implementation
import numpy as np
# Example: 5 p-values from 5 tests
p_values = np.array([0.01, 0.04, 0.03, 0.002, 0.15])
# Bonferroni Correction
m = len(p_values)
bonferroni_threshold = 0.05 / m
bonferroni_adjusted = np.minimum(p_values * m, 1.0)
print(f"Bonferroni Threshold: {bonferroni_threshold}")
print(f"Adjusted p-values: {bonferroni_adjusted}")
print(f"Significant (adjusted): {bonferroni_adjusted < 0.05}")
# Using statsmodels
from statsmodels.stats.multitest import multipletests
reject, pvals_corrected, _, _ = multipletests(p_values, method='bonferroni')
print(f"Corrected p-values: {pvals_corrected}")
print(f"Reject null hypotheses: {reject}")
R Implementation
# Example p-values
p_values <- c(0.01, 0.04, 0.03, 0.002, 0.15)
# Bonferroni Adjustment
adjusted <- p.adjust(p_values, method = "bonferroni")
print(adjusted)
# Which are still significant?
print(adjusted < 0.05)
Worked Numerical Example
Objective: Test whether 100 genes are related to a disease.
Significance Level (α): 0.05
Number of Tests (m): 100
Bonferroni Threshold: 0.05 / 100 = 0.0005
Results:
- Gene A: p = 0.001. (Significant at 0.05, but not significant after correction.)
- Gene B: p = 0.0001. (Significant even after correction.)
Implication: We discard the Gene A finding to prevent false positives, even though its unadjusted p-value was below 0.05.
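The same decisions in code, using the illustrative gene p-values above:
alpha, m = 0.05, 100
threshold = alpha / m  # 0.0005
gene_p = {"Gene A": 0.001, "Gene B": 0.0001}
for gene, p in gene_p.items():
    print(f"{gene}: p = {p}, significant after Bonferroni: {p < threshold}")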
Interpretation Guide
The adjusted p-values below assume m = 5 tests (adjusted p = original p × 5, capped at 1.0).
| Original p | Adjusted p | Significant at α = 0.05? | Edge Case Notes |
|---|---|---|---|
| 0.01 | 0.05 | Borderline | Adjusted p equals α exactly; just at the threshold. |
| 0.002 | 0.01 | Yes | Robust finding. |
| 0.04 | 0.20 | No | "Disappeared" after correction. |
| 0.00001 | 0.00005 | Yes | Highly significant. |
| > 0.20 | 1.0 | No | Formula gives > 1, capped at 1.0. |
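The adjusted column (including the cap) can be reproduced with the multiply-and-cap rule; the 0.30 entry below is an illustrative original p that triggers the cap:
import numpy as np

m = 5
original_p = np.array([0.01, 0.002, 0.04, 0.00001, 0.30])
adjusted_p = np.minimum(original_p * m, 1.0)  # 0.30 * 5 = 1.5, capped at 1.0
print(adjusted_p)         # 0.05, 0.01, 0.20, 5e-05, 1.0
print(adjusted_p < 0.05)  # note: 0.05 is not strictly below 0.05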
Common Pitfall Example
Scenario: Measuring 5 related outcomes (e.g., Anxiety score Day 1, Day 2, Day 3...).
Tests: 5 t-tests.
Bonferroni: α/m = 0.05/5 = 0.01 per test.
Why it's too harsh:
- The outcomes are highly correlated (Day 1 Anxiety predicts Day 2).
- The "effective" number of independent tests is < 5.
- Bonferroni acts as if they are 5 totally random, independent coin flips.
Consequence: You lose Power. You fail to detect real effects.
Better Option: MANOVA or Mixed Models to handle correlation, or FDR.
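To see the difference in practice, a small comparison sketch using statsmodels (the five p-values are hypothetical):
from statsmodels.stats.multitest import multipletests
import numpy as np

# Hypothetical p-values for the 5 correlated anxiety-score tests
p_values = np.array([0.012, 0.019, 0.024, 0.041, 0.048])
for method in ("bonferroni", "fdr_bh"):
    reject, p_adj, _, _ = multipletests(p_values, alpha=0.05, method=method)
    print(f"{method}: reject = {reject}")
# Here Bonferroni rejects nothing (every p > 0.01),
# while Benjamini-Hochberg rejects all five.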
Related Concepts
- Multiple Comparisons Problem
- Tukey's HSD - Alternative for all pairwise comparisons.
- False Discovery Rate (FDR) - Less conservative alternative.
- One-Way ANOVA