Mann-Whitney U Test
Definition
The Mann-Whitney U Test (also called Wilcoxon Rank-Sum Test) is a non-parametric test that compares the distributions of two independent groups. It tests whether one group tends to have larger values (higher ranks) than the other.
Purpose
- Compare two groups when the normality assumption of the t-test is violated.
- Analyze ordinal data or highly skewed continuous data.
- Provide a robust alternative to the independent samples t-test.
When to Use
- Outcome is ordinal or continuous but non-normal.
- Sample size is small and normality cannot be assumed.
- Data has outliers that would distort the mean.
- Caveat: it is less powerful than the t-test when normality actually holds.
- Caveat: it does not compare means directly; it compares ranks/distributions.
Theoretical Background
What It Tests
Mann-Whitney tests whether the probability that a randomly selected value from Group A is greater than a randomly selected value from Group B is different from 0.5.
If the two distributions are assumed to have the same shape, this amounts to testing for a location shift (one group being stochastically higher).
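The probability described above can be estimated directly by comparing every value in one group with every value in the other. The sketch below is illustrative only: the helper name `prob_a_greater` and the sample values are assumptions, not part of the original note. It counts ties as half a win and shows that the estimate equals the U statistic divided by the number of pairs, assuming a SciPy version (1.7 or later) in which `mannwhitneyu` returns the U statistic of the first sample.

```python
import numpy as np
from scipy import stats


def prob_a_greater(a, b):
    """Estimate P(A > B) from all pairwise comparisons (ties count as 0.5)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = a[:, None] - b[None, :]  # every A value minus every B value
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return wins / (a.size * b.size)


# Hypothetical data, chosen only to illustrate the calculation
group_a = [3, 5, 8, 9]
group_b = [1, 4, 6, 7, 10]

p_hat = prob_a_greater(group_a, group_b)
u_a = stats.mannwhitneyu(group_a, group_b, alternative="two-sided").statistic

print(f"Estimated P(A > B): {p_hat:.2f}")
print(f"U_A / (n_A * n_B):  {u_a / (len(group_a) * len(group_b)):.2f}")
```

A value near 0.5 means neither group tends to dominate; values near 0 or 1 indicate that one group is stochastically higher.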
Procedure
- Combine all observations from both groups.
- Rank all observations from lowest to highest.
- Sum the ranks for each group.
- The U statistic is calculated from these rank sums.
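The four steps above can be sketched by hand with NumPy and `scipy.stats.rankdata`. This is a minimal illustration, not a replacement for a library routine: the function name `mann_whitney_u` is an assumption, and it uses the convention in which each group's U is its rank sum minus n(n+1)/2 (other texts define U the other way around, which swaps the two values).

```python
import numpy as np
from scipy.stats import rankdata


def mann_whitney_u(a, b):
    """Return (U_a, U_b) computed from the rank sums of the pooled data."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = a.size, b.size

    # Steps 1-2: pool the observations and rank them (ties get average ranks)
    ranks = rankdata(np.concatenate([a, b]))

    # Step 3: sum the ranks belonging to each group
    r_a = ranks[:n_a].sum()
    r_b = ranks[n_a:].sum()

    # Step 4: convert rank sums to U statistics
    u_a = r_a - n_a * (n_a + 1) / 2
    u_b = r_b - n_b * (n_b + 1) / 2
    return u_a, u_b


u_a, u_b = mann_whitney_u([5, 8, 10, 12, 15], [20, 22, 25, 28, 30])
print(u_a, u_b)  # the two values always add up to n_a * n_b
```

Because every value in the first group lies below every value in the second, the first group's U is 0 and the second's is the maximum possible, 25.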
Hypotheses
- $H_0$: The distributions of the two groups are identical.
- $H_1$: The distributions differ (one tends to have higher values).
Worked Example: Salary Disparity
You compare weekly bonuses in two departments.
- Dept A: [100, 110, 105, 120, 5000] (Includes CEO outlier).
- Dept B: [150, 160, 155, 170, 165] (Consistent).
Issue: A t-test would see a huge mean for Dept A (\$1087) versus Dept B (\$160). Because of Dept A's huge variance it might find no difference, or it might wrongly suggest A > B.
Task: Use Mann-Whitney to check if distributions differ.
Solution:
- Rank All Observations (Low to High):
  - 100 (1), 105 (2), 110 (3), 120 (4), 150 (5), 155 (6), 160 (7), 165 (8), 170 (9), 5000 (10).
- Assign Ranks:
  - Dept A Ranks: 1, 2, 3, 4, 10. Sum: $R_A = 20$.
  - Dept B Ranks: 5, 6, 7, 8, 9. Sum: $R_B = 35$.
- Calculate U:
  - $U_A = R_A - \frac{n_A(n_A+1)}{2} = 20 - 15 = 5$.
  - $U_B = R_B - \frac{n_B(n_B+1)}{2} = 35 - 15 = 20$.
  - Check: $U_A + U_B = 25 = n_A n_B$.
  - Min U = 5.
- Result:
  - Critical U for $n_A = n_B = 5$ ($\alpha = 0.05$, two-tailed) is 2. Since $U = 5 > 2$, the result is not significant at 0.05, but notice that $B > A$ consistently except for the outlier. With a larger sample showing the same pattern, B would likely come out significantly higher.
  - The rank test captures that all five people in B earn more than four of the five people in A; the outlier counts only as the single highest rank.
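The hand calculation in the worked example can be cross-checked in SciPy. The sketch below assumes SciPy 1.7 or later, where `mannwhitneyu` accepts `method="exact"` and returns the U statistic of the first sample; the Welch t-test is included only to illustrate the issue described above, and the variable names are illustrative.

```python
from scipy import stats

dept_a = [100, 110, 105, 120, 5000]  # includes the CEO outlier
dept_b = [150, 160, 155, 170, 165]

# Exact Mann-Whitney test (appropriate for 5 vs 5 observations without ties)
mw = stats.mannwhitneyu(dept_a, dept_b, alternative="two-sided", method="exact")
print(f"U for Dept A: {mw.statistic}")    # matches the hand-computed U of 5
print(f"Exact p-value: {mw.pvalue:.4f}")  # above 0.05, so not significant

# Welch t-test for comparison: the outlier inflates both mean and variance
tt = stats.ttest_ind(dept_a, dept_b, equal_var=False)
print(f"t = {tt.statistic:.3f}, p = {tt.pvalue:.3f}")
```

The t statistic is positive (the mean of A is pulled above B by one value) while the rank-based statistic points the other way, which is the disagreement the example is meant to expose.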
Assumptions
Limitations
- Testing Medians? Only sometimes. Mann-Whitney tests if $P(A > B) \neq 0.5$. This is only a test of medians if the distributions have the same shape. If one is skewed right and one left, you can have a significant U but equal medians.
- Ties: Many tied values (e.g., Likert scale) reduce the power of the test.
- Sample Size: For larger samples (a common rule of thumb is more than about 20 observations per group), the U statistic is approximated by a Normal Z-score. For tiny samples, use exact tables.
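The normal approximation mentioned under Sample Size can be written out explicitly. The sketch below is a simplified version under stated assumptions: simulated data with no ties, no tie correction, and no continuity correction. It compares the hand-computed p-value with SciPy's `method="asymptotic"` result when `use_continuity=False`; the seed, sample sizes, and variable names are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=30)  # simulated group 1
y = rng.normal(loc=0.5, scale=1.0, size=30)  # simulated group 2, shifted upward
n1, n2 = x.size, y.size

res = stats.mannwhitneyu(
    x, y, alternative="two-sided", method="asymptotic", use_continuity=False
)

# Under H0, U has mean n1*n2/2 and (without ties) variance n1*n2*(n1+n2+1)/12
mu_u = n1 * n2 / 2
sigma_u = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (res.statistic - mu_u) / sigma_u
p_manual = 2 * stats.norm.sf(abs(z))

print(f"z = {z:.3f}")
print(f"manual p = {p_manual:.4f}, scipy p = {res.pvalue:.4f}")
```

With tied values the variance needs a tie correction, which library routines apply automatically; the plain formula here is only valid for tie-free data.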
Python Implementation
from scipy import stats
import numpy as np
group_A = np.array([5, 8, 10, 12, 15])
group_B = np.array([20, 22, 25, 28, 30])
# Mann-Whitney U Test
u_stat, p_val = stats.mannwhitneyu(group_A, group_B, alternative='two-sided')
print(f"U-statistic: {u_stat}")
print(f"p-value: {p_val:.4f}")
if p_val < 0.05:
    print("Significant difference in distributions.")
R Implementation
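# Example data (same values as the Python example)
group_A <- c(5, 8, 10, 12, 15)
group_B <- c(20, 22, 25, 28, 30)
# Wilcoxon Rank-Sum (equivalent to Mann-Whitney)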
wilcox.test(group_A, group_B, paired = FALSE)
# For exact p-value (small samples)
wilcox.test(group_A, group_B, paired = FALSE, exact = TRUE)
Interpretation Guide
| Output | Interpretation |
|---|---|
| p < 0.05 | Reject $H_0$: the two distributions differ. |
| Rank Sum A < Rank Sum B | Group A tends to have smaller values than Group B. |
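| Probability Shift | $P(A > B) \neq 0.5$: a randomly selected value from one group tends to exceed a randomly selected value from the other. |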
| Robustness | Outliers transformed to ranks lose their leverage (5000 becomes just "Highest"). |
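The robustness row can be checked numerically: replacing the extreme value with any other value that is still the largest observation leaves the ranks, and therefore U, unchanged. In the sketch below the replacement value 171 is hypothetical, picked only because it still ranks highest; the variable names are likewise illustrative.

```python
import numpy as np
from scipy import stats

dept_b = [150, 160, 155, 170, 165]
dept_a_outlier = [100, 110, 105, 120, 5000]  # original data with the outlier
dept_a_mild = [100, 110, 105, 120, 171]      # hypothetical: still the top rank

u_outlier = stats.mannwhitneyu(dept_a_outlier, dept_b, alternative="two-sided").statistic
u_mild = stats.mannwhitneyu(dept_a_mild, dept_b, alternative="two-sided").statistic

mean_outlier = np.mean(dept_a_outlier)
mean_mild = np.mean(dept_a_mild)

print(f"U with outlier: {u_outlier}, U with mild value: {u_mild}")
print(f"Mean with outlier: {mean_outlier}, mean with mild value: {mean_mild}")
```

The mean changes by almost an order of magnitude between the two versions, while the rank-based statistic is identical.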
Related Concepts
- Student's T-Test - Parametric alternative.
- Wilcoxon Signed-Rank Test - For paired (dependent) samples.
- Kruskal-Wallis Test - For 3+ groups.