Mann-Whitney U Test

Definition

Core Statement

The Mann-Whitney U Test (also called Wilcoxon Rank-Sum Test) is a non-parametric test that compares the distributions of two independent groups. It tests whether one group tends to have larger values (higher ranks) than the other.


Purpose

  1. Compare two groups when the normality assumption of the t-test is violated.
  2. Analyze ordinal data or highly skewed continuous data.
  3. Provide a robust alternative to the independent samples t-test.

When to Use

Use Mann-Whitney When...

  • Outcome is ordinal or continuous but non-normal.
  • Sample size is small and normality cannot be assumed.
  • Data has outliers that would distort the mean.

Limitations

  • Slightly less powerful than the t-test when normality holds (asymptotic relative efficiency of about 0.955).
  • Does not compare means directly; compares ranks/distributions.


Theoretical Background

What It Tests

Mann-Whitney tests whether the probability that a randomly selected value from Group A is greater than a randomly selected value from Group B is different from 0.5.

If the two distributions are additionally assumed to have the same shape and differ only in location, this amounts to testing for a location shift (one group being stochastically higher).
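
A minimal sketch of the quantity being tested: the share of all (A, B) pairs in which the A value is larger. The sample values and the helper name `prob_a_greater` are illustrative assumptions, not part of the original text. Counting ties as one half is a common convention, also assumed here.

```python
import numpy as np

def prob_a_greater(a, b):
    """Estimate P(A > B) from all pairwise comparisons; ties count as 0.5."""
    a = np.asarray(a, dtype=float)[:, None]   # column vector, shape (n_a, 1)
    b = np.asarray(b, dtype=float)[None, :]   # row vector, shape (1, n_b)
    wins = (a > b).sum()                      # pairs where the A value is larger
    ties = (a == b).sum()                     # pairs with equal values
    return (wins + 0.5 * ties) / (a.size * b.size)

# Hypothetical samples, chosen only for illustration
group_a = [3, 5, 8, 9]
group_b = [4, 6, 7, 10, 12]

p_hat = prob_a_greater(group_a, group_b)
print(f"Estimated P(A > B) = {p_hat:.2f}")
```

An estimate near 0.5 is what the null hypothesis predicts. Values far from 0.5 in either direction are evidence against it.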

Procedure

  1. Combine all observations from both groups.
  2. Rank all observations from lowest to highest.
  3. Sum the ranks for each group.
  4. The U statistic is calculated from these rank sums.
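
The four steps can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: the function name `mann_whitney_u` and the sample data are hypothetical, and U is computed as n_a·n_b + n(n+1)/2 − R, the convention used in the worked example in this document. Some software (SciPy, for instance) reports R − n(n+1)/2 for the first sample instead, which is the complementary value.

```python
import numpy as np
from scipy.stats import rankdata

def mann_whitney_u(a, b):
    """Return (U_a, U_b, rank_sum_a, rank_sum_b) following the four steps."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = len(a), len(b)

    combined = np.concatenate([a, b])   # step 1: pool both groups
    ranks = rankdata(combined)          # step 2: rank low to high (ties get average ranks)
    r_a = ranks[:n_a].sum()             # step 3: rank sum per group
    r_b = ranks[n_a:].sum()

    # step 4: U from the rank sums (convention assumed as described above)
    u_a = n_a * n_b + n_a * (n_a + 1) / 2 - r_a
    u_b = n_a * n_b + n_b * (n_b + 1) / 2 - r_b
    return u_a, u_b, r_a, r_b

# Hypothetical data in which every B value exceeds every A value
u_a, u_b, r_a, r_b = mann_whitney_u([5, 8, 10, 12, 15], [20, 22, 25, 28, 30])
print(f"R_a={r_a}, R_b={r_b}, U_a={u_a}, U_b={u_b}")
```

The two U values always sum to n_a·n_b, which is a quick check on the arithmetic.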

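Hypotheses

  • $H_0$: $P(A > B) = 0.5$. A randomly selected value from Group A is equally likely to be larger or smaller than one from Group B.
  • $H_1$: $P(A > B) \neq 0.5$ (two-sided). One group tends to have larger values than the other.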

Worked Example: Salary Disparity

Problem

You compare weekly bonuses in two departments.

  • Dept A: [100, 110, 105, 120, 5000] (Includes CEO outlier).
  • Dept B: [150, 160, 155, 170, 165] (Consistent).

Issue: A t-test compares means. Dept A's mean ($1087) is far above Dept B's ($160) only because of the single outlier, which also inflates the variance. The t-test might therefore find no difference, or wrongly suggest A > B.
Task: Use Mann-Whitney to check if distributions differ.
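
To make the issue concrete, the following sketch runs a t-test on the two departments. Welch's variant (`equal_var=False`) is an assumption here, since the text does not say which t-test is meant. The variable names are illustrative.

```python
import numpy as np
from scipy import stats

dept_a = np.array([100, 110, 105, 120, 5000])   # includes the outlier
dept_b = np.array([150, 160, 155, 170, 165])

mean_a, mean_b = dept_a.mean(), dept_b.mean()
sd_a, sd_b = dept_a.std(ddof=1), dept_b.std(ddof=1)   # sample standard deviations

# Welch's t-test: does not assume equal variances (an assumption of this sketch)
t_stat, t_p = stats.ttest_ind(dept_a, dept_b, equal_var=False)

print(f"Mean A = {mean_a:.0f}, SD A = {sd_a:.0f}")
print(f"Mean B = {mean_b:.0f}, SD B = {sd_b:.1f}")
print(f"t = {t_stat:.2f}, p = {t_p:.3f}")
```

The outlier pulls Dept A's mean above Dept B's, yet its spread is so large that the p-value stays well above 0.05. The t-test therefore reports no significant difference and gives no hint that most of Dept B earns more.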

Solution:

  1. Rank All Observations (Low to High):

    • 100 (1), 105 (2), 110 (3), 120 (4), 150 (5), 155 (6), 160 (7), 165 (8), 170 (9), 5000 (10).
  2. Sum the Ranks by Group:

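    • Dept A Ranks: 1, 2, 3, 4, 10. Sum: $R_A = 20$.
    • Dept B Ranks: 5, 6, 7, 8, 9. Sum: $R_B = 35$.
  3. Calculate U:

    • $n_A = 5$, $n_B = 5$.
    • $U_A = n_A n_B + \frac{n_A(n_A+1)}{2} - R_A = 25 + 15 - 20 = 20$.
    • $U_B = n_A n_B + \frac{n_B(n_B+1)}{2} - R_B = 25 + 15 - 35 = 5$.
    • $U = \min(U_A, U_B) = 5$.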
  4. Result:

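    • The critical U for $n_A = n_B = 5$ at $\alpha = 0.05$ (two-sided) is 2, and the null hypothesis is rejected only if $U \le 2$. Since 5 > 2, the result is not significant at 0.05. Note, however, that every Dept B value exceeds every Dept A value except the outlier. With larger samples showing the same pattern, B would likely test as significantly higher.
    • The rank test captures that all five values in B exceed four of the five values in A (20 of the 25 possible pairs), while the outlier counts only as the highest rank.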
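
The hand calculation can be cross-checked with SciPy. This is a sketch assuming SciPy 1.7 or later, where `mannwhitneyu` accepts `method="exact"`. Note that SciPy reports U for the first sample as the number of pairs in which the first value is larger, so it returns the smaller of the two U values here.

```python
from scipy import stats

dept_a = [100, 110, 105, 120, 5000]
dept_b = [150, 160, 155, 170, 165]

# Exact two-sided test; valid here because the samples are small and have no ties
res = stats.mannwhitneyu(dept_a, dept_b, alternative="two-sided", method="exact")

n_a, n_b = len(dept_a), len(dept_b)
u_first = res.statistic                  # pairs where an A value exceeds a B value
u_other = n_a * n_b - u_first            # pairs where a B value exceeds an A value
u_min = min(u_first, u_other)
share_b_higher = u_other / (n_a * n_b)   # estimated P(B > A)

print(f"U (min) = {u_min}, exact p = {res.pvalue:.4f}")
print(f"Share of pairs with B > A = {share_b_higher:.2f}")
```

The exact p-value is about 0.15, consistent with the table lookup (not significant at 0.05), while 80% of the pairwise comparisons favor Dept B.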

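Assumptions

  • The two groups are independent of each other, and observations within each group are independent.
  • The outcome is at least ordinal, so that observations can be ranked.
  • To interpret the result as a difference in medians, the two distributions must have the same shape.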

Limitations

Pitfalls

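  1. Testing Medians? Only sometimes. Mann-Whitney tests whether $P(X > Y) \neq 0.5$. This is a test of medians only if the two distributions have the same shape. If one is skewed right and the other left, you can get a significant U even though the medians are equal.
  2. Ties: Tied values receive the average of the ranks they span. Many ties (e.g., Likert-scale data) reduce the power of the test, and the standard exact tables assume no ties, so a tie-corrected normal approximation is used instead.
  3. Sample Size: For larger samples (a common rule of thumb is n > 20 per group), the distribution of U is approximated by a normal distribution and a Z-score is computed. For small samples, use exact tables or an exact p-value.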
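
The normal approximation in pitfall 3 can be sketched as follows. Under the null hypothesis and without ties, U has mean n_x·n_y/2 and variance n_x·n_y·(n_x + n_y + 1)/12. The simulated data, the seed, and the variable names are assumptions made for illustration. The continuity correction is switched off so the manual Z-score can be compared directly with SciPy's asymptotic method.

```python
import numpy as np
from scipy import stats

# Simulated continuous data (no ties); sizes and shift are arbitrary choices
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=30)
y = rng.normal(loc=0.5, scale=1.0, size=30)

n_x, n_y = len(x), len(y)
ranks = stats.rankdata(np.concatenate([x, y]))
u_x = ranks[:n_x].sum() - n_x * (n_x + 1) / 2   # U for x, in SciPy's convention

mu_u = n_x * n_y / 2                                   # mean of U under H0
sigma_u = np.sqrt(n_x * n_y * (n_x + n_y + 1) / 12)    # SD of U under H0 (no ties)
z = (u_x - mu_u) / sigma_u
p_manual = 2 * stats.norm.sf(abs(z))                   # two-sided p-value

res = stats.mannwhitneyu(x, y, alternative="two-sided",
                         method="asymptotic", use_continuity=False)

print(f"z = {z:.3f}, manual p = {p_manual:.4f}, scipy p = {res.pvalue:.4f}")
```

With ties present, the variance needs a tie correction, which SciPy's asymptotic method applies automatically. This sketch omits it.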


Python Implementation

from scipy import stats
import numpy as np

group_A = np.array([5, 8, 10, 12, 15])
group_B = np.array([20, 22, 25, 28, 30])

# Mann-Whitney U Test
u_stat, p_val = stats.mannwhitneyu(group_A, group_B, alternative='two-sided')

print(f"U-statistic: {u_stat}")
print(f"p-value: {p_val:.4f}")

if p_val < 0.05:
    print("Significant difference in distributions.")

R Implementation

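# Example data (same values as the Python example)
group_A <- c(5, 8, 10, 12, 15)
group_B <- c(20, 22, 25, 28, 30)

# Wilcoxon Rank-Sum (equivalent to Mann-Whitney)
wilcox.test(group_A, group_B, paired = FALSE)

# For exact p-value (small samples without ties)
wilcox.test(group_A, group_B, paired = FALSE, exact = TRUE)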

Interpretation Guide

Output                     Interpretation
-------------------------  ------------------------------------------------------------
p < 0.05                   Reject $H_0$. The two distributions differ.
Rank Sum A < Rank Sum B    Group A tends to have smaller values than Group B (when group sizes are equal).
Probability Shift          $P(\text{Value}_A > \text{Value}_B) \neq 0.5$. Stochastic dominance.
Robustness                 Outliers transformed to ranks lose their leverage (5000 becomes just "Highest").