Welch's T-Test
Definition
Welch's T-Test is a modification of the Student's t-test that is reliable even when the two groups have unequal variances and/or unequal sample sizes. It does not pool variances, making it robust to heteroscedasticity.
Purpose
- Compare means of two independent groups when the homogeneity of variance assumption is violated.
- Serve as a safer default for two-sample mean comparisons.
When to Use
Modern statistical advice recommends using Welch's test by default. It performs nearly as well as Student's t-test when variances are equal, and much better when they are not.
The t.test() function in R uses Welch's t-test by default (var.equal = FALSE).
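A small simulation can make this concrete. The sketch below is illustrative only: the group sizes, standard deviations, seed, and number of simulated experiments are arbitrary assumptions, not values from any study. Both groups share the same true mean, so every p < 0.05 is a false positive.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary fixed seed for reproducibility
n_sims = 5000                   # number of simulated experiments (assumed)

# Same true mean in both groups, so any rejection is a false positive.
# The small group has the larger spread, the hardest case for pooling.
small_noisy = rng.normal(loc=0.0, scale=5.0, size=(n_sims, 10))
large_quiet = rng.normal(loc=0.0, scale=1.0, size=(n_sims, 50))

# axis=1 runs one test per row (one test per simulated experiment).
_, p_student = stats.ttest_ind(small_noisy, large_quiet, axis=1, equal_var=True)
_, p_welch = stats.ttest_ind(small_noisy, large_quiet, axis=1, equal_var=False)

student_rate = np.mean(p_student < 0.05)
welch_rate = np.mean(p_welch < 0.05)

print(f"Student false-positive rate: {student_rate:.3f}")
print(f"Welch false-positive rate:   {welch_rate:.3f}")
```

Under these assumed settings, the pooled test should reject far more often than the nominal 5%, while Welch's test should stay close to it.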
Theoretical Background
Differences from Student's T
| Feature | Student's T | Welch's T |
|---|---|---|
| Variance Pooling | Yes ($s_p^2$) | No (separate $s_1^2$, $s_2^2$) |
| Degrees of Freedom | $n_1 + n_2 - 2$ | Welch-Satterthwaite (non-integer) |
| Assumption | Equal variances | No assumption about variances |
Welch-Satterthwaite Degrees of Freedom
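The degrees of freedom are approximated from the sample variances $s_1^2$, $s_2^2$ and the sample sizes $n_1$, $n_2$:

$$
\nu \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}
$$

This formula yields a non-integer $\nu$, which is used as the degrees of freedom of the reference t-distribution.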
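The Welch-Satterthwaite calculation can be sketched by hand and checked against SciPy. `welch_t_and_df` is a hypothetical helper written for this illustration, not a library function; the data are the two groups used in the Python example in this note.

```python
import numpy as np
from scipy import stats

def welch_t_and_df(x, y):
    """Return (t, df) for Welch's test from two 1-D samples (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Variance of each group mean: s^2 / n, with the sample variance (ddof=1).
    vx = x.var(ddof=1) / x.size
    vy = y.var(ddof=1) / y.size
    # Unpooled standard error in the denominator of the t-statistic.
    t = (x.mean() - y.mean()) / np.sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (x.size - 1) + vy ** 2 / (y.size - 1))
    return t, df

group_A = [10, 12, 11, 14, 15]
group_B = [18, 19, 21, 18, 20, 17, 19, 22, 25]

t_manual, df_manual = welch_t_and_df(group_A, group_B)
# Two-sided p-value from the t distribution with the fractional df.
p_manual = 2 * stats.t.sf(abs(t_manual), df_manual)

t_scipy, p_scipy = stats.ttest_ind(group_A, group_B, equal_var=False)
print(f"manual: t = {t_manual:.3f}, df = {df_manual:.2f}, p = {p_manual:.5f}")
print(f"scipy:  t = {t_scipy:.3f}, p = {p_scipy:.5f}")
```

The manual and SciPy results should agree to floating-point precision, and the degrees of freedom land between $\min(n_1, n_2) - 1$ and $n_1 + n_2 - 2$.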
Assumptions
Limitations
- Still Sensitive to Outliers: The mean is affected by outliers. For robust comparisons, consider Mann-Whitney U Test or bootstrapping.
- Requires Normality: For severely non-normal data with small $n$, use non-parametric tests.
Python Implementation
from scipy import stats
import numpy as np
group_A = np.array([10, 12, 11, 14, 15])
group_B = np.array([18, 19, 21, 18, 20, 17, 19, 22, 25]) # Different n, different var
# Welch's T-Test (equal_var=False)
t_stat, p_val = stats.ttest_ind(group_A, group_B, equal_var=False)
print(f"Welch t-statistic: {t_stat:.3f}")
print(f"p-value: {p_val:.4f}")
R Implementation
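# Same data as the Python example
group_A <- c(10, 12, 11, 14, 15)
group_B <- c(18, 19, 21, 18, 20, 17, 19, 22, 25)
# Welch's T-Test (default in R)
t.test(group_A, group_B) # var.equal = FALSE is default
# Note: df will be a non-integer (about 9.77 for these data)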
Worked Numerical Example
Data:
- Group A (Interns): Mean = 40k, SD = 5k, n=50
- Group B (CEOs): Mean = 5M, SD = 2M, n=20
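Student's t-test (Wrong): Assumes equal variances and computes a pooled SD. Depending on the sample sizes, it can either miss a real difference or exaggerate its significance; here the smaller group has the far larger variance, so pooling understates the standard error.
Welch's t-test (Correct):
- Accounts for the SD = 5k vs SD = 2,000k difference.
- The df calculation penalizes the small n of the high-variance group: df ≈ 19, essentially $n_B - 1$.
- Result: $t \approx 11.1$, $p < 0.001$ (computed from the summary statistics above).
- Conclusion: Differences are real despite noise.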
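The comparison can be reproduced from the summary statistics alone with `scipy.stats.ttest_ind_from_stats`. This is a minimal sketch: the variable names are illustrative, and salaries are assumed to be in plain currency units (40k = 40,000, 5M = 5,000,000).

```python
from scipy import stats

# Summary statistics from the worked example.
intern_mean, intern_sd, intern_n = 40_000, 5_000, 50
ceo_mean, ceo_sd, ceo_n = 5_000_000, 2_000_000, 20

# Student's t-test: pools the two variances (inappropriate here).
t_student, p_student = stats.ttest_ind_from_stats(
    ceo_mean, ceo_sd, ceo_n,
    intern_mean, intern_sd, intern_n,
    equal_var=True,
)

# Welch's t-test: keeps the variances separate.
t_welch, p_welch = stats.ttest_ind_from_stats(
    ceo_mean, ceo_sd, ceo_n,
    intern_mean, intern_sd, intern_n,
    equal_var=False,
)

print(f"Student: t = {t_student:.2f}, p = {p_student:.2e}")
print(f"Welch:   t = {t_welch:.2f}, p = {p_welch:.2e}")
```

Both tests reject here, but the pooled version reports a noticeably larger t-statistic because it averages the tiny intern variance into the standard error.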
Interpretation Guide
| Output | Interpretation | Edge Case Notes |
|---|---|---|
| Non-integer df (e.g., 23.45) | Welch-Satterthwaite correction applied. | The closer df is to (N1+N2-2), the more similar variances are. |
| p < 0.05 | Means are significantly different. | Valid even if SD1 = 1 and SD2 = 1000. |
| Larger SE than Student's | Welch is being conservative. | The price of robustness. |
| df drops drastically | Severe heteroscedasticity. | E.g., N = 100 total, but df = 15. This means the variance is driven by the small group. |
Common Pitfall Example
Old School Workflow:
- Run Levene's Test.
- If significant, use Welch's.
- If not, use Student's.
Modern Best Practice:
- Just use Welch's.
- Why? Testing for variance equality first is a "conditional procedure" that affects error rates.
- Welch's test loses very little power even if variances are equal.
- Recommendation: Set `equal_var=False` (Python) or rely on the `t.test` default (R) and forget about it.
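The claim that Welch's test gives up little power when variances really are equal can be checked with a rough simulation. The sample sizes, effect size, seed, and simulation count below are arbitrary assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary fixed seed
n_sims = 5000                   # number of simulated experiments (assumed)

# Equal variances (SD = 1 in both groups) and a true mean difference of 0.8.
group_1 = rng.normal(loc=0.0, scale=1.0, size=(n_sims, 15))
group_2 = rng.normal(loc=0.8, scale=1.0, size=(n_sims, 30))

# One test per row; the same simulated data feeds both tests.
_, p_student = stats.ttest_ind(group_1, group_2, axis=1, equal_var=True)
_, p_welch = stats.ttest_ind(group_1, group_2, axis=1, equal_var=False)

# Power = share of experiments in which the real difference is detected.
power_student = np.mean(p_student < 0.05)
power_welch = np.mean(p_welch < 0.05)

print(f"Student power: {power_student:.3f}")
print(f"Welch power:   {power_welch:.3f}")
```

In this setup the two power estimates should differ by only a few percentage points, which is the small cost of skipping the Levene's pre-test.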
Related Concepts
- Student's T-Test - Assumes equal variances.
- Levene's Test - Diagnoses unequal variances.
- Mann-Whitney U Test - Non-parametric alternative.