Kruskal-Wallis Test

Definition

Core Statement

The Kruskal-Wallis H Test is the non-parametric alternative to One-Way ANOVA. It compares the rank distributions of three or more independent groups to determine if at least one group differs.


Purpose

  1. Test for differences among 3+ groups when data is ordinal or non-normal.
  2. Extend Mann-Whitney to multiple groups.

When to Use

Use Kruskal-Wallis When...

  • Outcome is ordinal or non-normal continuous.
  • There are three or more independent groups.
  • ANOVA assumptions (normality, equal variance) are violated.

Limitations

  • Like ANOVA, a significant result only tells you a difference exists, not which groups differ.
  • Requires post-hoc tests (Dunn's Test) for pairwise comparisons.


Theoretical Background

The H Statistic

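$$H = \frac{12}{N(N+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(N+1)$$

where $R_j$ is the sum of ranks in group $j$, $n_j$ is the sample size of group $j$, $k$ is the number of groups, and $N$ is the total sample size.

Under $H_0$, $H$ approximately follows a chi-squared distribution with $k - 1$ degrees of freedom.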
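The formula can be sketched directly in code. This is a minimal illustration, not a replacement for a library routine: the function name `kruskal_h` is hypothetical, and the sketch deliberately omits the tie correction that statistical packages usually apply.

```python
import numpy as np
from scipy import stats


def kruskal_h(*groups):
    """Return (H, p) from the uncorrected H formula and the chi-squared approximation."""
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    # Rank the pooled data; tied values receive the average of their ranks.
    ranks = stats.rankdata(np.concatenate(groups))
    # Split the pooled ranks back into per-group chunks and sum each chunk.
    split_points = np.cumsum(sizes)[:-1]
    rank_sums = [chunk.sum() for chunk in np.split(ranks, split_points)]
    h = 12.0 / (n_total * (n_total + 1)) * sum(
        r ** 2 / n for r, n in zip(rank_sums, sizes)
    ) - 3 * (n_total + 1)
    # Upper-tail probability of a chi-squared variable with k - 1 degrees of freedom.
    p = stats.chi2.sf(h, df=len(groups) - 1)
    return h, p


h, p = kruskal_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f"H = {h:.3f}, p = {p:.4f}")
```

For these three fully separated groups the rank sums are 6, 15, and 24, which gives H = 7.2.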


Worked Example: Pain Relief Study

Problem

Comparing 3 drugs for pain relief (Scale 1-10, Ordinal).

  • Drug A: [2, 3, 3, 4] (Low pain)
  • Drug B: [5, 6, 5, 7] (Medium pain)
  • Drug C: [8, 9, 8, 10] (High pain)

Question: Is there a difference in effectiveness?

Solution:

  1. Rank all data (N=12):

    • A: [1, 2.5, 2.5, 4], $R_A = 10$.
    • B: [5.5, 7, 5.5, 8], $R_B = 26$.
    • C: [9.5, 11, 9.5, 12], $R_C = 42$.
  2. Calculate H Statistic:

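    $$H = \frac{12}{12(13)}\left(\frac{10^2}{4} + \frac{26^2}{4} + \frac{42^2}{4}\right) - 3(13)$$
    $$H = \frac{1}{13}(25 + 169 + 441) - 39 = \frac{635}{13} - 39 \approx 48.85 - 39 = 9.85$$
  3. Result:

    • $df = k - 1 = 2$. Critical $\chi^2_{0.05,\,2} = 5.99$.
    • $9.85 > 5.99$. Reject $H_0$.
    • Conclusion: Pain scores differ significantly among the three drugs. Descriptively, A has the lowest rank sum and C the highest; confirming which pairs differ requires a post-hoc test.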
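The hand calculation can be checked with a short script. This sketch assumes SciPy is available and reuses the example data. Note that `scipy.stats.kruskal` applies a tie correction, so its H is slightly larger than the uncorrected hand value.

```python
from scipy import stats

drug_a = [2, 3, 3, 4]
drug_b = [5, 6, 5, 7]
drug_c = [8, 9, 8, 10]

# Rank the pooled scores (ties get average ranks), then sum the ranks per drug.
ranks = stats.rankdata(drug_a + drug_b + drug_c)
rank_sums = [ranks[0:4].sum(), ranks[4:8].sum(), ranks[8:12].sum()]
print("Rank sums:", rank_sums)

# Hand formula without tie correction (N = 12, n_j = 4 for every group).
n_total = 12
h_manual = 12 / (n_total * (n_total + 1)) * sum(r ** 2 / 4 for r in rank_sums) - 3 * (n_total + 1)
print(f"H (no tie correction): {h_manual:.3f}")

# Library value: H divided by a tie-correction factor, plus the chi-squared p-value.
h_scipy, p_scipy = stats.kruskal(drug_a, drug_b, drug_c)
print(f"H (scipy, tie-corrected): {h_scipy:.3f}, p = {p_scipy:.4f}")
```

Because the scores 3, 5, and 8 each appear twice, the corrected statistic comes out near 9.95 rather than 9.85. The conclusion at the 0.05 level is unchanged.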

Assumptions


Limitations

Pitfalls

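  1. "One-Shot" Fallacy: Reporting a significant Kruskal-Wallis test isn't enough. The omnibus result does not say which groups differ; run Dunn's Test to show that a specific pair (e.g., A vs B) differs.
  2. Weak for small samples: With n=3 per group, power is low and the chi-squared approximation is poor, so significance is very hard to reach.
  3. Shape assumption: If distribution shapes vary widely (one bimodal, one normal), a significant result cannot be read as a simple shift in location, which makes the comparison harder to interpret.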
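For the small-sample pitfall, one option the text above does not cover is to replace the chi-squared approximation with a permutation p-value. The sketch below is a Monte Carlo version under stated assumptions: the function name `kruskal_permutation_p`, the resample count, and the seed are all illustrative choices, not part of any library API.

```python
import numpy as np
from scipy import stats


def kruskal_permutation_p(groups, n_resamples=2000, seed=0):
    """Monte Carlo permutation p-value for the Kruskal-Wallis H statistic."""
    rng = np.random.default_rng(seed)
    sizes = [len(g) for g in groups]
    pooled = np.concatenate(groups)
    split_points = np.cumsum(sizes)[:-1]
    observed = stats.kruskal(*groups).statistic
    hits = 0
    for _ in range(n_resamples):
        # Shuffle the pooled values and re-cut them into groups of the original sizes.
        shuffled = rng.permutation(pooled)
        h_perm = stats.kruskal(*np.split(shuffled, split_points)).statistic
        # Small tolerance guards against floating-point noise in equal statistics.
        if h_perm >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


small = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
p_perm = kruskal_permutation_p(small)
p_chi2 = stats.kruskal(*small).pvalue
print(f"permutation p = {p_perm:.4f}, chi-squared p = {p_chi2:.4f}")
```

With only three observations per group the two p-values can differ noticeably, which is the practical meaning of the approximation being poor at this sample size.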


Python Implementation

from scipy import stats
import scikit_posthocs as sp

group1 = [5, 6, 7, 8]
group2 = [10, 12, 14, 16]
group3 = [20, 22, 24, 26]

# Kruskal-Wallis Test
h_stat, p_val = stats.kruskal(group1, group2, group3)
print(f"H-statistic: {h_stat:.2f}, p-value: {p_val:.4f}")

# Post-Hoc: Dunn's Test (requires scikit-posthocs)
# posthoc_dunn accepts a list of samples and returns a matrix of adjusted
# pairwise p-values, with groups in the order they are passed.
dunn = sp.posthoc_dunn([group1, group2, group3], p_adjust='bonferroni')
print(dunn)

R Implementation

# Assumes df is a data frame with a numeric column Value and a factor column Group
# Kruskal-Wallis Test
kruskal.test(Value ~ Group, data = df)

# Post-Hoc: Dunn's Test
library(FSA)
dunnTest(Value ~ Group, data = df, method = "bonferroni")

Interpretation Guide

Output                    Interpretation
------------------------  ------------------------------------------------------------
H = 9.85, p = 0.007       Reject $H_0$. Ranks are not randomly distributed across groups.
High H value              Large separation between the groups' rank sums (e.g., the mean rank of A differs markedly from that of B).
Dunn p-adj < 0.05         The specific pair (e.g., A vs C) is significantly different.
Effect size ($\eta^2_H$)  Share of the variance in ranks explained by group membership.
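The source does not give a formula for the effect size. One commonly used definition, assumed here, is $\eta^2_H = (H - k + 1) / (N - k)$. The helper name `eta_squared_h` is hypothetical, and the input is the uncorrected H from the worked example.

```python
def eta_squared_h(h, k, n_total):
    """Eta-squared based on H: (H - k + 1) / (N - k)."""
    return (h - k + 1) / (n_total - k)


# Worked example: H = 128/13 (about 9.85), k = 3 groups, N = 12 observations.
eta2 = eta_squared_h(h=128 / 13, k=3, n_total=12)
print(f"eta-squared (H) = {eta2:.3f}")
```

For the pain relief data this comes out near 0.87, meaning group membership accounts for most of the variation in ranks.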