Tukey's HSD (Honestly Significant Difference)
Definition
Tukey's HSD (Honestly Significant Difference) is a post-hoc test used after a significant One-Way ANOVA to determine which specific group pairs are significantly different. It controls the Family-Wise Error Rate (FWER), preventing inflation of Type I error from multiple comparisons.
Purpose
- ANOVA tells you that at least one group is different; Tukey tells you which groups differ.
- Make all pairwise comparisons while maintaining an overall alpha level (e.g., 0.05).
When to Use
- ANOVA result is significant ($p < 0.05$).
- You want to compare all pairs of groups.
- Groups have equal sample sizes (or use Tukey-Kramer for unequal).
- If you have specific hypotheses (not all pairs): Use planned contrasts or Bonferroni Correction.
- If ANOVA assumptions are violated: Use Kruskal-Wallis Test followed by Dunn's test.
Theoretical Background
How It Works
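Tukey's HSD calculates a critical difference based on the Studentized Range Distribution:
$$HSD = q_{\alpha,\,k,\,N-k} \sqrt{\frac{MS_W}{n}}$$
where $q_{\alpha,\,k,\,N-k}$ is the critical value of the studentized range distribution for $k$ groups and $N - k$ error degrees of freedom, $MS_W$ is the within-group mean square from the ANOVA, and $n$ is the sample size per group.
Decision Rule: Two groups are significantly different if $|\bar{x}_i - \bar{x}_j| > HSD$.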
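The critical difference can be computed by hand to see where each term comes from. The following is a minimal sketch, assuming equal group sizes and SciPy 1.7 or later (for `scipy.stats.studentized_range`); the helper name `tukey_hsd_critical_diff` and the data are made up for illustration.

```python
import math
from itertools import combinations

from scipy.stats import studentized_range  # SciPy >= 1.7


def tukey_hsd_critical_diff(groups, alpha=0.05):
    """Return (HSD, q_crit) for equal-sized groups given as lists of numbers.

    Hypothetical helper for illustration; not part of any library.
    """
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    df_within = k * n - k  # N - k error degrees of freedom

    # Within-group mean square (MS_W): pooled sum of squares / df
    ss_within = 0.0
    for g in groups:
        mean_g = sum(g) / n
        ss_within += sum((x - mean_g) ** 2 for x in g)
    ms_within = ss_within / df_within

    # Upper-alpha critical value of the studentized range distribution
    q_crit = float(studentized_range.ppf(1 - alpha, k, df_within))
    return q_crit * math.sqrt(ms_within / n), q_crit


# Illustrative data only (three groups of five observations)
groups = [
    [10.0, 12.0, 11.0, 13.0, 14.0],
    [15.0, 17.0, 16.0, 18.0, 19.0],
    [13.0, 15.0, 14.0, 16.0, 17.0],
]
hsd, q_crit = tukey_hsd_critical_diff(groups)
means = [sum(g) / len(g) for g in groups]

# Decision rule: a pair differs if |mean_i - mean_j| exceeds HSD
significant = {
    (i, j): abs(means[i] - means[j]) > hsd
    for i, j in combinations(range(len(groups)), 2)
}
print(f"q_crit = {q_crit:.3f}, HSD = {hsd:.3f}")
print(significant)
```

In practice a library routine such as `pairwise_tukeyhsd` would normally be preferred; the manual version is only meant to make the formula concrete.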
Family-Wise Error Rate (FWER)
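Without adjustment, running $m = k(k-1)/2$ pairwise tests, each at level $\alpha$, inflates the chance of at least one false positive to roughly $1 - (1 - \alpha)^m$ (about 14% for 3 comparisons and 40% for 10 comparisons at $\alpha = 0.05$).
Tukey adjusts the critical value so the overall error rate stays at $\alpha$.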
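To get a feel for how fast the unadjusted error rate grows with the number of groups, here is a small sketch. It uses the common approximation that treats the pairwise tests as independent, which real pairwise comparisons are not, so the numbers are indicative only; `unadjusted_fwer` is a hypothetical helper name.

```python
from math import comb


def unadjusted_fwer(k, alpha=0.05):
    """Approximate FWER when all pairs among k groups are tested at level alpha.

    Assumes independent tests, which is a simplification.
    """
    m = comb(k, 2)  # number of pairwise comparisons, k(k-1)/2
    return 1 - (1 - alpha) ** m


for k in (3, 5, 10):
    print(f"k={k:2d}  comparisons={comb(k, 2):2d}  FWER~{unadjusted_fwer(k):.3f}")
```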
Assumptions
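Inherits assumptions from One-Way ANOVA:
- Independence of observations.
- Approximately normal residuals within each group.
- Homogeneity of variance across groups.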
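Two of the usual One-Way ANOVA assumptions, normal residuals and equal group variances, can be screened before running Tukey. The sketch below uses `scipy.stats.shapiro` and `scipy.stats.levene`; the function name `check_anova_assumptions`, the 0.05 cutoff, and the data are illustrative assumptions, and independence has to be judged from the study design rather than from a test.

```python
from scipy.stats import levene, shapiro


def check_anova_assumptions(groups, alpha=0.05):
    """Screen normality of residuals and homogeneity of variance.

    Hypothetical helper; `groups` is a list of lists of observations.
    """
    # Residuals = each observation minus its own group mean
    residuals = []
    for g in groups:
        mean_g = sum(g) / len(g)
        residuals.extend(x - mean_g for x in g)

    _, p_normal = shapiro(residuals)      # H0: residuals are normal
    _, p_equal_var = levene(*groups)      # H0: group variances are equal
    return {
        "p_normal": float(p_normal),
        "p_equal_var": float(p_equal_var),
        "normality_ok": bool(p_normal >= alpha),
        "equal_variance_ok": bool(p_equal_var >= alpha),
    }


# Illustrative data only
groups = [
    [10.0, 12.0, 11.0, 13.0, 14.0],
    [15.0, 17.0, 16.0, 18.0, 19.0],
    [13.0, 15.0, 14.0, 16.0, 17.0],
]
print(check_anova_assumptions(groups))
```

With small samples these tests have little power, so a non-significant result is weak reassurance at best; residual plots are a useful complement.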
Limitations
- Conservative with Unequal n: For very unequal sample sizes, Tukey-Kramer is used but may be conservative.
- Only for Pairwise Comparisons: Does not handle complex contrasts (e.g., Group A+B vs Group C).
- Requires Significant ANOVA: Running Tukey without a significant F-test is discouraged (fishing for differences).
Python Implementation
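import matplotlib.pyplot as plt
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd
# Assume df is a pandas DataFrame with a numeric 'value' column and a 'group' label column
tukey = pairwise_tukeyhsd(endog=df['value'], groups=df['group'], alpha=0.05)
# Summary table with one row per pair: meandiff, p-adj, lower, upper, reject
print(tukey)
# Visualization: one interval per group; non-overlapping intervals differ significantly
tukey.plot_simultaneous()
plt.show()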
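A self-contained variant of the call above, using synthetic data so it can be run as-is. The group labels, means, spread, and seed are assumptions chosen for illustration; the result attributes used (`groupsunique`, `meandiffs`, `confint`, `reject`) are those exposed by the statsmodels Tukey result object.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Synthetic data: three groups of 20 observations with well-separated means
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": np.repeat(["A", "B", "C"], 20),
    "value": np.concatenate(
        [rng.normal(loc=m, scale=2.0, size=20) for m in (50, 55, 60)]
    ),
})

tukey = pairwise_tukeyhsd(endog=df["value"], groups=df["group"], alpha=0.05)

# Pairs are reported in the order (g0, g1), (g0, g2), (g1, g2), ...
labels = list(tukey.groupsunique)
pairs = [(a, b) for i, a in enumerate(labels) for b in labels[i + 1:]]

for (a, b), diff, (lower, upper), reject in zip(
    pairs, tukey.meandiffs, tukey.confint, tukey.reject
):
    # meandiff is mean(second group) - mean(first group)
    print(f"{b} - {a}: diff={diff:.2f}, CI=[{lower:.2f}, {upper:.2f}], reject={reject}")
```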
R Implementation
# 1. Fit ANOVA
model <- aov(Value ~ Group, data = df)
# 2. Tukey HSD
tukey_result <- TukeyHSD(model)
print(tukey_result)
# 3. Plot Confidence Intervals
plot(tukey_result)
# If the interval crosses 0, the difference is NOT significant.
Worked Numerical Example
ANOVA: Significant ($p < 0.05$).
Means: A=50, B=55, C=60.
HSD Critical Diff: 4.5.
Comparisons:
- |A - B| = 5: $5 > 4.5$. Significant.
- |B - C| = 5: $5 > 4.5$. Significant.
- |A - C| = 10: $10 > 4.5$. Significant.
Result: All groups are distinct. A < B < C.
(If Critical Diff were 6.0, then neither A vs B nor B vs C would be significant; only A vs C, with a difference of 10, would be.)
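The comparisons in this example can be reproduced in a few lines. This sketch takes the group means and the critical difference as given (it does not compute the HSD itself); `significant_pairs` is a hypothetical helper name.

```python
from itertools import combinations

# Group means from the worked example
means = {"A": 50, "B": 55, "C": 60}


def significant_pairs(means, critical_diff):
    """Return the pairs whose absolute mean difference exceeds critical_diff."""
    return [
        (a, b)
        for a, b in combinations(means, 2)
        if abs(means[a] - means[b]) > critical_diff
    ]


print(significant_pairs(means, 4.5))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(significant_pairs(means, 6.0))  # [('A', 'C')]
```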
Interpretation Guide
| Comparison | Diff | p-adj | Interpretation | Edge Case Notes |
|---|---|---|---|---|
| G2 - G1 | 5.2 | 0.002 | Significant difference ($p < 0.05$). | CI excludes 0. |
| G3 - G1 | 1.5 | 0.45 | NOT significant. | CI includes 0. |
| 95% CI | [-1, 4] | - | Range includes 0. No evidence of diff. | |
| p-adj = 0.051 | - | 0.051 | Not Significant. | Strict thresholding applies. |
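Plot Interpretation:
- In R's plot(TukeyHSD(...)), each horizontal interval is a pairwise difference: if it does not cross 0, the difference is significant.
- In statsmodels' plot_simultaneous(), each horizontal interval belongs to one group: two groups differ significantly if their intervals do not overlap.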
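The reading rules in this guide reduce to two checks, sketched below with the values from the table. The function names are hypothetical, and the strict `<` comparison mirrors the table's treatment of p-adj = 0.051 as not significant.

```python
def significant_by_p(p_adj, alpha=0.05):
    """Significant only if the adjusted p-value is strictly below alpha."""
    return p_adj < alpha


def significant_by_ci(lower, upper):
    """Significant only if the confidence interval for the difference excludes 0."""
    return not (lower <= 0 <= upper)


print(significant_by_p(0.002))    # True
print(significant_by_p(0.45))     # False
print(significant_by_p(0.051))    # False
print(significant_by_ci(-1, 4))   # False
```

For a given comparison, the two checks should agree when the interval and the adjusted p-value come from the same procedure at the same alpha.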
Common Pitfall Example
Scenario: ANOVA p-value = 0.08 (Not significant).
Analyst: "Let me run Tukey anyway, maybe Group 1 vs Group 5 is different."
Result: Tukey shows G1-G5 difference is significant.
Problem:
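- You violated the protected-testing logic (omnibus F-test first, post-hoc comparisons only if it is significant), the idea behind Fisher's "Protected Least Significant Difference".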
- By ignoring the non-significant ANOVA, you are capitalizing on chance.
- This is p-hacking.
Rule: If the ANOVA F-test p-value is > 0.05, STOP. Do not run post-hoc tests.
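One way to make this rule hard to skip is to put the gate in code, so the post-hoc test only runs when the omnibus test passes. This is a sketch of the workflow described here, not a library feature; `anova_then_tukey` and the sample data are hypothetical.

```python
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd


def anova_then_tukey(samples, alpha=0.05):
    """Run One-Way ANOVA, then Tukey's HSD only if the F-test is significant.

    `samples` maps group label -> list of observations.
    Returns (anova_p_value, tukey_result_or_None).
    """
    _, p_anova = f_oneway(*samples.values())
    if p_anova >= alpha:
        # Gate closed: no post-hoc comparisons are run
        return float(p_anova), None

    values = np.asarray([x for obs in samples.values() for x in obs], dtype=float)
    labels = np.asarray([name for name, obs in samples.items() for _ in obs])
    return float(p_anova), pairwise_tukeyhsd(endog=values, groups=labels, alpha=alpha)


# Illustrative data only: identical groups, then clearly separated groups
flat = {"G1": [1, 2, 3], "G2": [1, 2, 3], "G3": [1, 2, 3]}
separated = {"G1": [1, 2, 3], "G2": [11, 12, 13], "G3": [21, 22, 23]}

p_flat, tukey_flat = anova_then_tukey(flat)
p_sep, tukey_sep = anova_then_tukey(separated)
print(p_flat, tukey_flat)  # the gate stays closed, so no Tukey result
print(p_sep)
print(tukey_sep)
```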
Related Concepts
- One-Way ANOVA
- Bonferroni Correction - More conservative alternative.
- Kruskal-Wallis Test - Non-parametric ANOVA.
- Multiple Comparisons Problem