Spearman's Rank Correlation
Definition
Core Statement
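Spearman's Rank Correlation ($\rho$ or $r_s$) is a non-parametric measure of the strength and direction of the monotonic relationship between two variables. It is equivalent to the Pearson correlation computed on the ranks of the data.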
Purpose
- Measure association when the relationship is monotonic but not necessarily linear.
- Analyze ordinal data (e.g., rankings, Likert scales).
- Provide a robust alternative to Pearson when outliers are present.
When to Use
Use Spearman When...
- Data is ordinal.
- The relationship is monotonic (always increasing or always decreasing), but not necessarily linear.
- Outliers are present.
- The normality assumption behind Pearson-based inference is not met.
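The outlier point can be illustrated with a small sketch. The data below are made up for illustration: nine points that rise in step, plus one extreme value that runs against the trend. Because Spearman only sees ranks, the outlier can move at most one rank position, whereas Pearson is pulled by its magnitude.

```python
import numpy as np
from scipy import stats

# Illustrative data (not from any real study): y tracks x, except the last point.
x = np.arange(1, 11, dtype=float)                               # 1, 2, ..., 10
y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, -50], dtype=float)     # last value is an extreme outlier

pearson_r, _ = stats.pearsonr(x, y)
spearman_rho, _ = stats.spearmanr(x, y)

# Pearson is dominated by the single extreme value; Spearman only sees its rank.
print(f"Pearson r:    {pearson_r:.3f}")
print(f"Spearman rho: {spearman_rho:.3f}")
```

In this sketch the single outlier is enough to flip the sign of Pearson's $r$, while Spearman's $\rho$ stays positive.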
Monotonic vs Linear
- Linear: $Y = a + bX$ (straight line).
- Monotonic: $Y$ increases as $X$ increases (curve OK). E.g., $Y = X^2$ for $X > 0$.
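As a rough illustration of the distinction, the sketch below uses an assumed curve, $Y = X^2$ on positive $X$, which always increases but is not a straight line. The values are illustrative only.

```python
import numpy as np
from scipy import stats

x = np.arange(1, 11, dtype=float)   # 1, 2, ..., 10 (all positive)
y = x ** 2                          # monotonic increasing on x > 0, but curved

pearson_r, _ = stats.pearsonr(x, y)
spearman_rho, _ = stats.spearmanr(x, y)

# The ranks of x and y are identical, so Spearman treats the relationship
# as perfect; Pearson is penalized for the curvature.
print(f"Pearson r:    {pearson_r:.3f}")
print(f"Spearman rho: {spearman_rho:.3f}")
```

Spearman reports a perfect monotonic relationship here, while Pearson falls slightly short of 1 because the points do not lie on a line.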
Theoretical Background
Calculation
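- Rank all $X$ values (1 = smallest). Rank all $Y$ values.
- Calculate Pearson correlation on the ranks.

When there are no ties, this simplifies to:
$$\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$
where $d_i$ is the difference between the two ranks of observation $i$ and $n$ is the number of observations.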
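The two routes to the coefficient can be sketched as follows. The helper names `spearman_by_ranks` and `spearman_by_d2` are hypothetical, and the data are illustrative values chosen to contain no ties, which is the case where the shortcut $1 - 6\sum d_i^2 / (n(n^2 - 1))$ agrees with the rank-then-Pearson definition.

```python
import numpy as np
from scipy import stats

def spearman_by_ranks(x, y):
    """Rank each variable, then take the Pearson correlation of the ranks."""
    rx = stats.rankdata(x)   # 1 = smallest; tied values would get their average rank
    ry = stats.rankdata(y)
    return np.corrcoef(rx, ry)[0, 1]

def spearman_by_d2(x, y):
    """Shortcut 1 - 6*sum(d^2) / (n*(n^2 - 1)); exact only when there are no ties."""
    d = stats.rankdata(x) - stats.rankdata(y)   # rank differences d_i
    n = len(d)
    return 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))

# Illustrative paired observations with no tied values
x = [10.2, 3.5, 7.1, 8.8, 1.0]
y = [20.0, 5.0, 9.0, 30.0, 2.0]

rho_scipy, _ = stats.spearmanr(x, y)
print(spearman_by_ranks(x, y), spearman_by_d2(x, y), rho_scipy)
```

All three values should agree up to floating-point error for tie-free data like this.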
Interpretation
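Same scale as Pearson: $\rho$ ranges from -1 to +1. A value of +1 means a perfectly increasing monotonic relationship, -1 a perfectly decreasing one, and 0 no monotonic association.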
Assumptions
Limitations
Pitfalls
- Ties reduce precision: Many tied values can distort $\rho$.
- Does not capture non-monotonic relationships: If the relationship changes direction (e.g., U-shaped), Spearman can be close to zero even when the variables are strongly related.
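The non-monotonic pitfall is easy to reproduce. The sketch below assumes a made-up, perfectly U-shaped relationship, $y = x^2$ over a range symmetric around zero, so $y$ is fully determined by $x$ yet has no overall direction.

```python
import numpy as np
from scipy import stats

x = np.arange(-5, 6, dtype=float)   # -5, -4, ..., 5 (symmetric around 0)
y = x ** 2                          # U-shaped: falls, then rises

rho, p_val = stats.spearmanr(x, y)

# The decreasing left half and increasing right half cancel out in the ranks.
print(f"Spearman rho: {rho:.3f}")
```

A coefficient near zero here reflects the absence of a monotonic trend, not the absence of a relationship, so a scatter plot is worth checking before concluding the variables are unrelated.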
Python Implementation
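```python
from scipy import stats

# x and y are equal-length sequences of paired observations
# (here: the two judges' rankings from the worked example in this note)
x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]

rho, p_val = stats.spearmanr(x, y)
print(f"Spearman rho: {rho:.3f}")
print(f"p-value: {p_val:.4f}")
```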
R Implementation
```r
# x and y are numeric vectors of paired observations
cor.test(x, y, method = "spearman")
```
Worked Numerical Example
Contest Rankings
Data: 5 Participants.
- Judge A Ranks: [1, 2, 3, 4, 5]
- Judge B Ranks: [1, 3, 2, 5, 4]
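- Differences ($d_i$ = Judge A rank minus Judge B rank): [0, -1, 1, -1, 1]
- Squared diffs ($d_i^2$): [0, 1, 1, 1, 1]. Sum = 4.
Calculation:
$$\rho = 1 - \frac{6 \times 4}{5(5^2 - 1)} = 1 - \frac{24}{120} = 0.8$$
Interpretation: Strong positive agreement ($\rho = 0.8$) between the two judges.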
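The hand calculation can be checked with a short script. This is a minimal sketch using the two judges' rankings from the example; the variable names are illustrative.

```python
from scipy import stats

judge_a = [1, 2, 3, 4, 5]
judge_b = [1, 3, 2, 5, 4]

# Manual route: rank differences, their squares, then the no-ties formula
d = [a - b for a, b in zip(judge_a, judge_b)]
sum_d2 = sum(di ** 2 for di in d)
n = len(d)
rho_manual = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Library route for comparison
rho_scipy, p_val = stats.spearmanr(judge_a, judge_b)

print(f"sum of squared differences: {sum_d2}")
print(f"manual rho: {rho_manual:.3f}, scipy rho: {rho_scipy:.3f}")
```

Because the rankings contain no ties, the manual and library values should match. With only five participants the p-value carries little weight, so the coefficient is best read descriptively.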
Interpretation Guide
| Scenario | Interpretation | Edge Case Notes |
|---|---|---|
| $\rho$ close to +1 | Strong positive monotonic relationship. | As X increases, Y tends to increase. |
| $\rho$ moderately negative | Moderate negative monotonic relationship. | As X increases, Y tends to decrease. |
| Pearson $r$ noticeably lower than Spearman $\rho$ | Relationship is monotonic but non-linear (e.g., exponential). | Spearman is the better metric here. |
| $\rho \approx 0$ | No monotonic relationship. | Could still be non-monotonic (U-shape). |
Common Pitfall Example
The "Ties" Trap
Scenario: Analyzing Customer Satisfaction (1-5 scale).
Data: Thousands of customers, only 5 possible values (many ties).
Problem:
- The standard formula $\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$ assumes no ties.
- With heavy ties, this formula is inaccurate.
Solution:
- Use software (Python/R), which assigns tied values their average rank and computes the correlation on those ranks, giving the tie-corrected result automatically.
- Do not calculate manually using the simplified formula for Likert scale data.
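The gap between the two approaches can be seen on a small, hypothetical set of 1-5 scores. The arrays `q1` and `q2` below are invented for illustration and are far smaller than a real survey; the size of the discrepancy depends on how the ties fall in the data.

```python
import numpy as np
from scipy import stats

# Hypothetical 1-5 satisfaction scores for two survey questions (many ties)
q1 = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 5])
q2 = np.array([2, 1, 3, 3, 2, 4, 4, 5, 4, 5])

# Simplified formula applied to average ranks (not valid with ties)
r1 = stats.rankdata(q1)
r2 = stats.rankdata(q2)
n = len(q1)
rho_simplified = 1 - 6 * np.sum((r1 - r2) ** 2) / (n * (n ** 2 - 1))

# Tie-aware result: Pearson correlation of the average ranks
rho_tie_aware, _ = stats.spearmanr(q1, q2)

print(f"simplified formula: {rho_simplified:.4f}")
print(f"tie-aware (scipy):  {rho_tie_aware:.4f}")
```

The two numbers differ because tied ranks shrink the variance of the ranks, which the simplified formula does not account for.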
Related Concepts
- Pearson Correlation - Parametric, linear.
- Kendall's Tau - Alternative rank-based measure, often preferred for small samples.