F-Distribution
Definition
The F-Distribution is a continuous probability distribution that arises as the ratio of two independent chi-square random variables, each divided by its own degrees of freedom. It is the foundation for ANOVA, regression F-tests, and variance ratio tests.
Purpose
- Test equality of variances (Levene's Test uses a related statistic).
- Test overall significance of regression models.
- Compare variance explained by groups in One-Way ANOVA.
- Basis for F-statistics that test several coefficients or group effects jointly.
When to Use
- One-Way ANOVA - Testing if group means differ.
- Multiple Linear Regression - Overall model F-test.
- Variance Ratio Test - Comparing two sample variances.
- Welch's ANOVA - Modified F-test for unequal variances.
Theoretical Background
Definition
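If $U \sim \chi^2_{d_1}$ and $V \sim \chi^2_{d_2}$ are independent, then
$$F = \frac{U / d_1}{V / d_2} \sim F(d_1, d_2)$$
The F-distribution has two degrees of freedom parameters:
- $d_1$: Numerator degrees of freedom.
- $d_2$: Denominator degrees of freedom.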
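The ratio construction can be checked by simulation. The sketch below is illustrative only: the seed, sample size, and the choice of 5 and 20 degrees of freedom are assumptions made for the example, not values from the definition.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # fixed seed so the run is repeatable
df1, df2 = 5, 20                 # illustrative degrees of freedom
n = 200_000                      # number of simulated ratios

# Two independent chi-square samples, each scaled by its degrees of freedom
u = rng.chisquare(df1, size=n) / df1
v = rng.chisquare(df2, size=n) / df2
ratio = u / v

# Compare simulated quantiles with the theoretical F(df1, df2) quantiles
probs = np.array([0.50, 0.90, 0.95])
simulated = np.quantile(ratio, probs)
theoretical = stats.f.ppf(probs, df1, df2)

for p, s, t in zip(probs, simulated, theoretical):
    print(f"q={p:.2f}  simulated={s:.3f}  theoretical={t:.3f}")
```

The simulated and theoretical quantiles should agree to within sampling error, which shrinks as `n` grows.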
Properties
| Property | Value |
|---|---|
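| Mean | $\frac{d_2}{d_2 - 2}$ for $d_2 > 2$ |
| Mode | $\frac{d_1 - 2}{d_1} \cdot \frac{d_2}{d_2 + 2}$ for $d_1 > 2$ |
| Support | $[0, \infty)$ |
| Skewness | Right-skewed, approaches symmetry as $d_1, d_2 \to \infty$ |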
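As a quick sanity check, the standard closed forms for the mean and mode (written out in the comments) can be compared against SciPy. The degrees of freedom and the grid used to locate the density peak are arbitrary choices for this sketch.

```python
import numpy as np
from scipy import stats

df1, df2 = 5, 20  # illustrative; the mean needs df2 > 2, the mode needs df1 > 2
dist = stats.f(df1, df2)

# Closed forms: mean = df2 / (df2 - 2), mode = ((df1 - 2) / df1) * (df2 / (df2 + 2))
mean_formula = df2 / (df2 - 2)
mode_formula = ((df1 - 2) / df1) * (df2 / (df2 + 2))

# Numerical counterparts: SciPy's mean, and the grid point where the pdf peaks
mean_scipy = dist.mean()
grid = np.linspace(0.001, 5, 200_001)
mode_grid = grid[np.argmax(dist.pdf(grid))]

# Skewness is defined for df2 > 6 and is positive (right-skewed)
skewness = float(dist.stats(moments="s"))

print(f"mean: formula={mean_formula:.4f}, scipy={mean_scipy:.4f}")
print(f"mode: formula={mode_formula:.4f}, grid search={mode_grid:.4f}")
print(f"skewness: {skewness:.3f}")
```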
Shape
- Low df: Extremely right-skewed.
- High df (both large): Concentrates around 1 and approaches a Normal Distribution.
- Asymmetry: $F(d_1, d_2) \neq F(d_2, d_1)$. Order matters!
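A small sketch of why the order of the degrees of freedom matters: swapping them changes the critical value, and the two orderings are linked through a reciprocal. The values 2 and 10 are chosen only for illustration.

```python
from scipy import stats

df1, df2 = 2, 10  # illustrative degrees of freedom

q_ab = stats.f.ppf(0.95, df1, df2)  # upper 5% point of F(2, 10)
q_ba = stats.f.ppf(0.95, df2, df1)  # upper 5% point of F(10, 2)

# Reciprocal identity: if X ~ F(df1, df2) then 1/X ~ F(df2, df1), so the
# upper 5% point of F(df1, df2) equals 1 / (lower 5% point of F(df2, df1))
q_recip = 1.0 / stats.f.ppf(0.05, df2, df1)

print(f"F(2, 10) upper 5% point:  {q_ab:.3f}")
print(f"F(10, 2) upper 5% point:  {q_ba:.3f}")
print(f"1 / F(10, 2) lower 5% pt: {q_recip:.3f}")
```

The first and third numbers match, while the second is much larger, so the numerator and denominator degrees of freedom cannot be swapped when looking up a table value.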
Relationship to T-Distribution
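The square of a t-statistic with $\nu$ degrees of freedom follows an F-distribution with $(1, \nu)$ degrees of freedom: $t_\nu^2 \sim F(1, \nu)$.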
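The link between the two distributions can be verified numerically: a squared two-sided t critical value should equal the one-sided F critical value with 1 numerator degree of freedom, and the p-values should match. The degrees of freedom and the observed t value below are hypothetical.

```python
from scipy import stats

df = 27      # illustrative denominator degrees of freedom
t_obs = 2.3  # hypothetical observed t-statistic

# Critical values: two-sided 5% for t, one-sided 5% for F(1, df)
t_crit = stats.t.ppf(0.975, df)
f_crit = stats.f.ppf(0.95, 1, df)

# P-values: two-sided t-test vs. upper-tail F-test on t squared
p_t = 2 * stats.t.sf(abs(t_obs), df)
p_f = stats.f.sf(t_obs**2, 1, df)

print(f"t_crit^2 = {t_crit**2:.4f}, F_crit = {f_crit:.4f}")
print(f"p (t-test) = {p_t:.5f}, p (F-test) = {p_f:.5f}")
```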
Worked Example: Comparing Diet Plans
A researcher compares weight loss from 3 diet plans (A, B, C).
- Between-Group Variability (Signal): Mean Square Between ($MS_B$) = 50.
- Within-Group Variability (Noise): Mean Square Error ($MS_E$) = 10.
- Degrees of Freedom: $d_1 = 2$ (3 groups - 1), $d_2 = 27$ (30 subjects - 3).
Question: Is there a significant difference between diets? ($\alpha = 0.05$)
Solution:
- Calculate F-Statistic: $F = MS_B / MS_E = 50 / 10 = 5.0$.
- Critical Value:
  - Lookup $F_{0.05}(2, 27)$.
  - Table value $\approx 3.35$.
- Decision:
  - Since $5.0 > 3.35$, we reject $H_0$.
Conclusion: The between-group variability is 5 times larger than the within-group noise. At least one diet's mean weight loss differs significantly from the others.
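Intuition:
- If $F \approx 1$: the group means vary about as much as random noise alone would produce, so there is no evidence of a diet effect.
- If $F \gg 1$: the group means vary more than noise can explain, which points to a real difference.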
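The diet example can be reproduced in a few lines. This is a minimal sketch that takes the two mean squares as given and assumes a significance level of 0.05; the variable names are chosen for the example.

```python
from scipy import stats

ms_between, ms_error = 50.0, 10.0  # mean squares from the example
k, n = 3, 30                       # number of groups, total subjects
alpha = 0.05                       # assumed significance level

df1, df2 = k - 1, n - k            # 2 and 27
f_stat = ms_between / ms_error     # signal-to-noise ratio
f_crit = stats.f.ppf(1 - alpha, df1, df2)
p_value = stats.f.sf(f_stat, df1, df2)  # upper-tail probability
reject = f_stat > f_crit

print(f"F = {f_stat:.2f}, critical value = {f_crit:.2f}, p = {p_value:.4f}")
print("Reject H0" if reject else "Fail to reject H0")
```

Comparing the statistic with the critical value and comparing the p-value with `alpha` are equivalent decisions; the p-value additionally shows how far into the tail the observed F lies.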
Assumptions
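F-tests assume:
- Independence: Observations are independent within and across groups.
- Normality: Residuals (or each group's population) are normally distributed.
- Homogeneity of Variance: Groups share a common variance (homoscedasticity).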
Limitations
- Heteroscedasticity Trap: If group variances are unequal (e.g., one group has huge spread), the standard F-test can give false positives. Always check Levene's Test first. If it is significant, use Welch's F (ANOVA) or Heteroscedasticity-Consistent Standard Errors (Regression).
- Non-Normality: The F-test is fairly robust to non-normality in large samples, but can be unreliable for small, skewed samples.
- Post-Hoc Amnesia: A significant F only says "Something is different." It doesn't say "A > B". You MUST run post-hoc tests (Tukey's HSD) to find where the difference is.
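The "check Levene's Test first" step might look like the sketch below. The three groups are simulated, hypothetical data with equal means and one deliberately inflated spread; `scipy.stats.levene` and `scipy.stats.f_oneway` are used for the variance check and the classic one-way F-test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed for repeatability

# Hypothetical data: equal means, but group C has a much larger spread
group_a = rng.normal(loc=10, scale=1, size=40)
group_b = rng.normal(loc=10, scale=1, size=40)
group_c = rng.normal(loc=10, scale=6, size=40)

# Step 1: test equality of variances (median-centered Levene variant)
levene_stat, levene_p = stats.levene(group_a, group_b, group_c, center="median")

# Step 2: classic one-way ANOVA F-test, which assumes equal variances
f_stat, f_p = stats.f_oneway(group_a, group_b, group_c)

print(f"Levene: W = {levene_stat:.2f}, p = {levene_p:.4g}")
print(f"ANOVA:  F = {f_stat:.2f}, p = {f_p:.4g}")
if levene_p < 0.05:
    print("Variances look unequal: prefer Welch's ANOVA over the classic F-test.")
else:
    print("No evidence of unequal variances: the classic F-test is reasonable.")
```

When Levene's test rejects, the ANOVA p-value from the second step should not be taken at face value.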
Python Implementation
from scipy.stats import f
import numpy as np
import matplotlib.pyplot as plt
# F-Distribution with df1=5, df2=20
df1, df2 = 5, 20
dist = f(df1, df2)
# Critical Value (95th percentile, one-tailed)
critical_value = dist.ppf(0.95)
print(f"F Critical Value (df1={df1}, df2={df2}, α=0.05): {critical_value:.3f}")
# P-value for observed F-statistic
observed_f = 3.2
p_value = dist.sf(observed_f)  # survival function: same as 1 - cdf, more accurate in the far tail
print(f"P-value for F = {observed_f}: {p_value:.4f}")
# Visualize Different df Combinations
x = np.linspace(0, 5, 500)
for d1, d2 in [(2, 10), (5, 20), (10, 50)]:  # separate names so df1, df2 above are not overwritten
    plt.plot(x, f(d1, d2).pdf(x), label=f'df1={d1}, df2={d2}')
plt.xlabel('F')
plt.ylabel('Density')
plt.title('F-Distribution for Various df')
plt.legend()
plt.grid(alpha=0.3)
plt.show()
R Implementation
# Critical Value (df1=5, df2=20, α=0.05)
qf(0.95, df1 = 5, df2 = 20)
# P-value for observed F-statistic
observed_f <- 3.2
pf(observed_f, df1 = 5, df2 = 20, lower.tail = FALSE)
# Visualize
curve(df(x, df1 = 2, df2 = 10), from = 0, to = 5, col = "red", lwd = 2,
      ylab = "Density", xlab = "F", main = "F-Distributions")
curve(df(x, df1 = 5, df2 = 20), add = TRUE, col = "blue", lwd = 2)
curve(df(x, df1 = 10, df2 = 50), add = TRUE, col = "green", lwd = 2)
legend("topright",
       legend = c("(2,10)", "(5,20)", "(10,50)"),
       col = c("red", "blue", "green"), lwd = 2, title = "(df1, df2)")
Interpretation Guide
| Output | Interpretation |
|---|---|
| F ≈ 1.0 | Signal ≈ Noise. No evidence of an effect. |
| F < 1.0 | Noise > Signal. Possible model misspecification or insufficient data. |
| F $\gg$ 1.0 | Strong Signal. The groups/model explain significant variation. |
| P-value < 0.05 | Reject $H_0$. |
Related Concepts
- One-Way ANOVA - Primary application.
- Chi-Square Distribution - F is the ratio of two chi-squares.
- T-Distribution - Related via $t_\nu^2 \sim F(1, \nu)$.
- Levene's Test - Uses F-like statistic for variance equality.