T-Distribution
Definition
The Student's t-distribution is a probability distribution used when estimating the mean of a normally distributed population in situations where the sample size is small ($n < 30$) and the population standard deviation ($\sigma$) is unknown.
Purpose
- Constructing Confidence Intervals for the mean when $\sigma$ is unknown.
- Hypothesis Testing (Student's T-Test, Welch's T-Test) comparing means.
- Regression Inference - The standardized coefficient estimates in Simple Linear Regression are t-distributed under $H_0$.
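The regression use case can be illustrated with a minimal sketch. The data below are hypothetical (simulated for illustration), and the check assumes the usual test of $H_0$: slope $= 0$ with $n - 2$ degrees of freedom. It recomputes the slope's p-value by hand from the t-distribution and compares it with the value reported by `scipy.stats.linregress`.

```python
import numpy as np
from scipy import stats

# Hypothetical toy data: y depends linearly on x plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.arange(12, dtype=float)
y = 1.5 + 0.8 * x + rng.normal(scale=2.0, size=x.size)

fit = stats.linregress(x, y)

# t-statistic for H0: slope = 0, referred to a t-distribution with n - 2 df
n = x.size
t_slope = fit.slope / fit.stderr
p_manual = 2 * stats.t.sf(abs(t_slope), df=n - 2)

print(f"slope = {fit.slope:.3f}, SE = {fit.stderr:.3f}, t = {t_slope:.3f}")
print(f"p (manual, df = {n - 2}) = {p_manual:.6f}")
print(f"p (linregress)       = {fit.pvalue:.6f}")
```

The two p-values should agree up to floating-point error, which is the sense in which the slope estimate is "t-distributed under $H_0$".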
When to Use
- Sample size is small ($n < 30$).
- Population standard deviation ($\sigma$) is unknown and estimated by sample SD ($s$).
- Data is approximately normal (or $n$ is large enough for CLT to apply).
Modern software uses the t-distribution by default for mean comparisons, even for large $n$, where it is practically identical to the Normal.
Theoretical Background
Degrees of Freedom ($df$)
The shape of the t-distribution is controlled by the degrees of freedom, typically $df = n - 1$ when estimating a single mean.
| $df$ | Behavior |
|---|---|
| Small (e.g., 3) | Very heavy tails; accounts for high uncertainty. |
| Medium (e.g., 15) | Moderately heavy tails. |
| Large (e.g., $\geq 30$) | Almost indistinguishable from Normal. |
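To put numbers on "heavy tails", the sketch below compares the two-sided tail probability beyond an arbitrary threshold for several degrees of freedom against the standard Normal. The threshold of 3 and the chosen df values are illustrative assumptions, not values from the text.

```python
from scipy import stats

threshold = 3.0  # arbitrary cut-off chosen for illustration

# Two-sided tail probability P(|T| > threshold) for several df
tail_probs = {df: 2 * stats.t.sf(threshold, df=df) for df in (3, 15, 30, 1000)}
normal_tail = 2 * stats.norm.sf(threshold)

for df, p in tail_probs.items():
    ratio = p / normal_tail
    print(f"df = {df:4d}: P(|T| > {threshold}) = {p:.4f}  ({ratio:.1f}x Normal)")
print(f"Normal   : P(|Z| > {threshold}) = {normal_tail:.4f}")
```

The tail probability shrinks toward the Normal value as df grows, which is the pattern the table describes.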
Mathematical Definition
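If $Z \sim N(0, 1)$ and $V \sim \chi^2_{df}$ are independent, then $T = \dfrac{Z}{\sqrt{V / df}}$ follows a t-distribution with $df$ degrees of freedom. For a sample of size $n$ from a normal population, the statistic $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$ follows a t-distribution with $n - 1$ degrees of freedom.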
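One standard construction of a t-distributed variable is the ratio $Z / \sqrt{V / df}$, with $Z$ standard normal and $V$ an independent chi-square variable with $df$ degrees of freedom. The following is a rough simulation sketch of that construction; the choice of $df = 5$, the seed, and the number of draws are arbitrary assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
df = 5             # illustrative degrees of freedom
n_draws = 200_000  # number of simulated values

z = rng.standard_normal(n_draws)       # Z ~ N(0, 1)
v = rng.chisquare(df, size=n_draws)    # V ~ chi-square(df), drawn independently of Z
t_sim = z / np.sqrt(v / df)            # ratio construction of a t variable

# Compare simulated quantiles with the theoretical t quantiles
probs = [0.90, 0.975, 0.995]
empirical = np.quantile(t_sim, probs)
theoretical = stats.t.ppf(probs, df=df)

for p, e, t in zip(probs, empirical, theoretical):
    print(f"q = {p:.3f}: simulated = {e:.3f}, theoretical = {t:.3f}")
```

The simulated and theoretical quantiles should match closely, with the largest discrepancy in the far tail where fewer draws land.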
Comparison: T vs Normal
| Feature | Normal Distribution ($Z$) | T-Distribution ($t$) |
|---|---|---|
| When $\sigma$ is | Known | Unknown (Estimated by $s$) |
| Parameter | None (Standard) | Degrees of Freedom ($df$) |
| Tail Thickness | Thin | Heavy (More extreme outcomes) |
| Critical Value (95%, Two-Tailed) | 1.96 (Always) | Varies: 2.57 ($df = 5$), 2.04 ($df = 30$) |
Worked Example: Confidence Interval
A factory produces screws with a target length of 50 mm. A quality engineer takes a random sample of $n$ screws and records:
- Sample Mean ($\bar{x}$): 50.02 mm
- Sample Standard Deviation ($s$): 0.05 mm
Task: Construct a 95% Confidence Interval for the true mean length.
Solution:
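- Identify Parameters:
  - Sample size $n$, so $df = n - 1$. $\alpha = 0.05$ (95% confidence).
  - Since $n < 30$ and $\sigma$ is unknown, we use the t-distribution.
- Find Critical Value ($t_{\alpha/2,\,df}$):
  - From t-table for $df = n - 1$ and two-tailed $\alpha = 0.05$.
- Calculate Margin of Error (ME): $ME = t_{\alpha/2,\,df} \cdot \dfrac{s}{\sqrt{n}} \approx 0.04$ mm
- Construct Interval: $\bar{x} \pm ME = 50.02 \pm 0.04 = (49.98,\ 50.06)$ mm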
Interpretation:
We are 95% confident that the true mean length of the screws is between 49.98 mm and 50.06 mm. Since the 50 mm target lies inside this interval, the data are consistent with the process being on target.
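The calculation can be reproduced in a few lines of Python. The sample size is not stated in this example, so the sketch below assumes $n = 10$ purely for illustration; with that assumption the result rounds to the interval quoted above. The mean and standard deviation are taken from the example.

```python
import math
from scipy import stats

x_bar = 50.02      # sample mean (mm), from the example
s = 0.05           # sample standard deviation (mm), from the example
n = 10             # ASSUMPTION: sample size is not given; 10 is illustrative only
confidence = 0.95

df = n - 1
alpha = 1 - confidence
t_crit = stats.t.ppf(1 - alpha / 2, df=df)   # two-tailed critical value
margin = t_crit * s / math.sqrt(n)           # margin of error
ci_low, ci_high = x_bar - margin, x_bar + margin

print(f"t_crit = {t_crit:.3f}, ME = {margin:.4f} mm")
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f}) mm")

# Cross-check with SciPy's interval helper (same df, centre and standard error)
lo, hi = stats.t.interval(confidence, df, loc=x_bar, scale=s / math.sqrt(n))
```

Changing `n` changes both the critical value and the standard error, so the interval width is sensitive to the assumed sample size.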
Assumptions
Limitations
- Skewed Data in Small Samples: If $n < 30$ and the data is highly skewed (e.g., reaction times), the t-distribution is a poor approximation and CIs may not achieve their nominal coverage. Use bootstrapping or Wilcoxon tests.
- The "Z instead of T" Error: A common mistake is using the $Z$-score (1.96) for small samples. This produces an interval that is too narrow, leading to false confidence (underestimating uncertainty).
- Outliers: Since the standard deviation $s$ is not robust to outliers, the t-statistic can be severely distorted by a single extreme value.
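The "Z instead of T" error can be checked empirically. The sketch below is a small simulation under assumed conditions (a standard normal population, $n = 5$, and 20,000 repeated samples, all chosen for illustration): it builds nominal 95% intervals with both critical values and counts how often each one contains the true mean.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, n_trials = 5, 20_000          # small sample, many repeated experiments
true_mean, true_sd = 0.0, 1.0    # hypothetical normal population

samples = rng.normal(true_mean, true_sd, size=(n_trials, n))
means = samples.mean(axis=1)
std_errs = samples.std(axis=1, ddof=1) / np.sqrt(n)  # estimated standard errors


def coverage(crit):
    """Fraction of intervals mean +/- crit * SE that contain the true mean."""
    return np.mean(np.abs(means - true_mean) <= crit * std_errs)


z_cov = coverage(stats.norm.ppf(0.975))          # using 1.96
t_cov = coverage(stats.t.ppf(0.975, df=n - 1))   # using the t critical value

print(f"Coverage with Z critical value: {z_cov:.3f}")
print(f"Coverage with t critical value: {t_cov:.3f}")
```

Under these assumptions the t-based intervals should cover the true mean close to 95% of the time, while the Z-based intervals fall noticeably short.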
Python Implementation
```python
from scipy import stats
import numpy as np
import matplotlib.pyplot as plt

# --- Critical Values ---
# 95% Confidence, Two-Tailed
df_values = [5, 15, 30, 100]
for df in df_values:
    t_crit = stats.t.ppf(0.975, df=df)
    print(f"df = {df:3}: t_critical = {t_crit:.3f}")

# Compare: Normal
print(f"Z_critical (Normal): {stats.norm.ppf(0.975):.3f}")

# --- Visualization ---
x = np.linspace(-4, 4, 500)
plt.plot(x, stats.norm.pdf(x), 'k-', lw=2, label='Normal (Z)')
for df in [2, 5, 30]:
    plt.plot(x, stats.t.pdf(x, df=df), '--', label=f't (df={df})')
plt.legend()
plt.title("T-Distribution vs Normal: Tail Comparison")
plt.show()
```
R Implementation
```r
# --- Critical Values ---
# 95% Confidence, Two-Tailed
df_values <- c(5, 15, 30, 100)
for (df in df_values) {
  t_crit <- qt(0.975, df = df)
  cat(sprintf("df = %3d: t_critical = %.3f\n", df, t_crit))
}

# Compare: Normal
cat(sprintf("Z_critical (Normal): %.3f\n", qnorm(0.975)))

# --- Visualization ---
curve(dnorm(x), from = -4, to = 4, lwd = 2, col = "black",
      ylab = "Density", main = "T vs Normal")
curve(dt(x, df = 2), add = TRUE, col = "red", lwd = 2, lty = 2)
curve(dt(x, df = 5), add = TRUE, col = "blue", lwd = 2, lty = 2)
curve(dt(x, df = 30), add = TRUE, col = "green", lwd = 2, lty = 2)
legend("topright",
       legend = c("Normal", "t(df=2)", "t(df=5)", "t(df=30)"),
       col = c("black", "red", "blue", "green"), lwd = 2, lty = c(1, 2, 2, 2))
```
Interpretation Guide
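| Value | Interpretation |
|---|---|
| $\lvert t \rvert > t_{\text{crit}}$ | The observed t-statistic exceeds the critical value; reject $H_0$ at the chosen significance level. |
| Small $df$ | Fat tails. Much higher chance of extreme values than Normal. Critical values are large (e.g., > 2.5). |
| Large $df$ | T $\approx$ Z; the t-distribution is almost indistinguishable from the Normal. |
| Confidence Interval | Wider than Z-interval, reflecting uncertainty about $\sigma$. |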
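As a sketch of how a t-statistic is compared with a critical value in practice, the example below runs a two-sided one-sample t-test by hand and cross-checks it against `scipy.stats.ttest_1samp`. The measurements, the hypothesized mean `mu_0`, and the significance level are hypothetical values invented for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements (mm), invented for illustration only
data = np.array([50.1, 49.9, 50.3, 50.2, 50.0, 50.4, 50.1, 50.2])
mu_0 = 50.0    # hypothesized mean under H0 (assumed)
alpha = 0.05   # significance level (assumed)

n = data.size
std_err = data.std(ddof=1) / np.sqrt(n)
t_stat = (data.mean() - mu_0) / std_err

t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)        # two-tailed critical value
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)      # two-sided p-value
reject = abs(t_stat) > t_crit

print(f"t = {t_stat:.3f}, t_crit = {t_crit:.3f}, p = {p_value:.4f}")
print("Reject H0" if reject else "Fail to reject H0")

# Cross-check against SciPy's built-in one-sample t-test
res = stats.ttest_1samp(data, popmean=mu_0)
```

Comparing $|t|$ with the critical value and comparing the p-value with $\alpha$ are two views of the same decision, so they always agree.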
Related Concepts
- Normal Distribution - The ideal scenario (known $\sigma$).
- Student's T-Test - Primary application.
- Confidence Intervals - Calculated using t-scores.
- Welch's T-Test - Robust version when variances differ.
- Degrees of Freedom - The shape parameter.