Standard Error
Definition
Core Statement
Standard Error (SE) is the standard deviation of the sampling distribution of a statistic. It quantifies how precisely a sample statistic (usually the mean) estimates the corresponding population parameter: the smaller the SE, the more precise the estimate.
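To make the definition concrete, the following minimal simulation sketch draws many samples from a population with a known SD, computes the mean of each, and compares the standard deviation of those means with the population SD divided by the square root of the sample size. The population parameters, the seed, and names such as `n_samples` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is reproducible

pop_mean, pop_sd = 50.0, 10.0  # hypothetical population parameters
n = 25                         # size of each sample
n_samples = 20_000             # number of repeated samples

# Each row is one sample of size n; take the mean of every row.
samples = rng.normal(loc=pop_mean, scale=pop_sd, size=(n_samples, n))
sample_means = samples.mean(axis=1)

empirical_se = sample_means.std(ddof=1)  # SD of the simulated sampling distribution
theoretical_se = pop_sd / np.sqrt(n)     # population SD / sqrt(n)

print(f"Empirical SE:   {empirical_se:.3f}")
print(f"Theoretical SE: {theoretical_se:.3f}")
```

The two values should agree closely; the small remaining gap is simulation noise.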
Purpose
- Quantify uncertainty in sample estimates.
- Construct Confidence Intervals.
- Calculate test statistics (t-statistic, z-statistic).
- Distinguish between variability in data (SD) and variability in estimates (SE).
When to Use
Use Standard Error When...
- Reporting the precision of a sample mean.
- Constructing confidence intervals or hypothesis tests.
- Comparing the reliability of different samples.
Theoretical Background
Formula (Standard Error of the Mean)
$$SE = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}}$$
where:
- $s$ = sample Standard Deviation
- $n$ = sample size
- $\sigma$ = population SD (usually unknown)
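A short derivation shows where the square root of the sample size comes from. It assumes the observations $X_1, \dots, X_n$ are independent with a common variance $\sigma^2$; the symbols $X_i$ and $\bar{X}$ (the sample mean) are introduced here only for illustration.

$$\operatorname{Var}(\bar{X}) = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{\sigma^2}{n}, \qquad SE = \sqrt{\operatorname{Var}(\bar{X})} = \frac{\sigma}{\sqrt{n}}$$

The middle step relies on independence; with correlated observations the covariance terms do not vanish and this simple form no longer holds.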
Key Insight
SE Decreases with Sample Size
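- Because SE is proportional to $1/\sqrt{n}$, doubling precision (halving SE) requires 4 times the sample size.
- SE is closely tied to the Central Limit Theorem (CLT): as $n$ increases, the sampling distribution of the mean narrows and approaches a normal shape.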
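The "4 times the sample size" rule can be turned into a quick planning calculation. The sketch below inverts the SE formula to find the smallest sample size that reaches a target SE; `required_n` is a hypothetical helper, and the SD of 10 is an assumed value, not something taken from data.

```python
import math

def required_n(sd: float, target_se: float) -> int:
    """Smallest n with sd / sqrt(n) <= target_se (hypothetical helper)."""
    return math.ceil((sd / target_se) ** 2)

sd = 10.0  # assumed standard deviation of the data

for target in (2.0, 1.0, 0.5):
    # Each halving of the target SE multiplies the required n by 4.
    print(f"target SE = {target}: n >= {required_n(sd, target)}")
```

In practice the SD is itself an estimate, so the result is best read as a rough planning figure.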
SD vs SE
| Measure | What It Describes | Changes With Sample Size? |
|---|---|---|
| Standard Deviation (SD) | Variability in the data (individual observations). | No. More data doesn't change the population spread; the sample SD only settles around it. |
| Standard Error (SE) | Variability in the sample mean (precision of the estimate). | Yes. SE decreases as $n$ increases, in proportion to $1/\sqrt{n}$. |
Common Confusion
Researchers sometimes report SE when they should report SD (to describe data), or vice versa. Be clear about what you're measuring.
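The difference is easy to see numerically. In this minimal sketch (the simulated population, seed, and sample sizes are assumptions for illustration), the sample SD stays near the population value as the sample grows, while the SE keeps shrinking.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # fixed seed for reproducibility
population_sd = 10.0                 # hypothetical population spread

results = {}
for n in (25, 400, 6400):
    sample = rng.normal(loc=50.0, scale=population_sd, size=n)
    sd = sample.std(ddof=1)   # describes the spread of the data
    se = sd / np.sqrt(n)      # describes the precision of the mean
    results[n] = (sd, se)
    print(f"n={n:5d}  SD={sd:6.2f}  SE={se:6.3f}")
```

Reporting the SD tells a reader how much individual observations vary; reporting the SE tells them how well the mean is pinned down.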
Assumptions
Limitations
Pitfalls
- SE alone is not informative: Always report it with the mean and sample size.
- Assumes IID data: Clustered or dependent data violate this assumption and typically make the naive SE too small; use robust SE or multilevel models.
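The effect of clustering can be illustrated with simulated data. The sketch below is not a robust-SE or multilevel implementation; it uses a simpler stand-in, treating each cluster mean as one independent observation, which is reasonable when clusters are equal in size. The design (10 clusters of 20) and the variance components are assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n_clusters, per_cluster = 10, 20  # hypothetical design

# Each cluster shares a random offset, so observations within it are correlated.
cluster_effects = rng.normal(loc=0.0, scale=5.0, size=(n_clusters, 1))
noise = rng.normal(loc=0.0, scale=1.0, size=(n_clusters, per_cluster))
data = 50.0 + cluster_effects + noise  # shape: (n_clusters, per_cluster)

# Naive SE: pretends all 200 observations are independent.
naive_se = data.std(ddof=1) / np.sqrt(data.size)

# Cluster-level SE: one mean per cluster, then the usual formula on those means.
cluster_means = data.mean(axis=1)
cluster_se = cluster_means.std(ddof=1) / np.sqrt(n_clusters)

print(f"Naive SE:         {naive_se:.3f}")
print(f"Cluster-level SE: {cluster_se:.3f}")
```

The naive figure is noticeably smaller because it counts 200 independent observations where the data effectively contain about 10.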
Python Implementation
import numpy as np
from scipy import stats
data = np.array([23, 25, 27, 22, 24, 26, 28, 21])
# Mean
mean = np.mean(data)
# Standard Error
se = stats.sem(data) # Equivalent to: np.std(data, ddof=1) / np.sqrt(len(data))
print(f"Mean: {mean:.2f}")
print(f"SE: {se:.2f}")
# 95% Confidence Interval using SE
ci = stats.t.interval(confidence=0.95, df=len(data)-1, loc=mean, scale=se)
print(f"95% CI: [{ci[0]:.2f}, {ci[1]:.2f}]")
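The same SE also feeds the test statistics mentioned under Purpose. As a minimal sketch, the one-sample t-statistic below is the distance between the sample mean and a hypothesized mean, measured in SE units; the null value `mu_0 = 23` is an assumption chosen for illustration, and the result is cross-checked against `scipy.stats.ttest_1samp`.

```python
import numpy as np
from scipy import stats

data = np.array([23, 25, 27, 22, 24, 26, 28, 21])
mu_0 = 23.0  # hypothetical null-hypothesis mean

se = stats.sem(data)                  # s / sqrt(n)
t_manual = (data.mean() - mu_0) / se  # distance from mu_0 in SE units

# Cross-check against SciPy's one-sample t-test.
result = stats.ttest_1samp(data, popmean=mu_0)

print(f"t (manual): {t_manual:.3f}")
print(f"t (scipy):  {result.statistic:.3f}, p = {result.pvalue:.3f}")
```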
R Implementation
data <- c(23, 25, 27, 22, 24, 26, 28, 21)
# Mean
mean(data)
# Standard Error (manual)
se <- sd(data) / sqrt(length(data))
print(se)
# Alternatively, use plotrix package
library(plotrix)
std.error(data)
# 95% CI
t.test(data)$conf.int
Interpretation Guide
| Scenario | Interpretation |
|---|---|
| Mean = 50, SE = 2 | An approximate 95% confidence interval for the population mean is 50 ± 4 (roughly 2 SE), i.e. 46 to 54. |
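| SE = 0.5 with n=100 vs SE = 2 with n=25 | The first estimate is four times as precise. The larger sample accounts for a factor of 2 ($\sqrt{100/25}$); the rest reflects a smaller SD (5 vs 10). |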
| SE decreases by half | Precision has doubled (requires 4x sample size). |
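The "roughly 2 SE" reading in the first row can be checked against the normal-based interval. This sketch uses the mean and SE from that row and assumes the sampling distribution is approximately normal (for small samples a t-based interval would be wider).

```python
from scipy import stats

mean, se = 50.0, 2.0  # values from the first table row

# Rule of thumb: mean plus or minus 2 SE.
rough_ci = (mean - 2 * se, mean + 2 * se)

# Normal-based 95% interval: mean plus or minus z * SE, with z close to 1.96.
z = stats.norm.ppf(0.975)
normal_ci = (mean - z * se, mean + z * se)

print(f"Rough 95% CI:  [{rough_ci[0]:.2f}, {rough_ci[1]:.2f}]")
print(f"Normal 95% CI: [{normal_ci[0]:.2f}, {normal_ci[1]:.2f}]")
```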
Related Concepts
- Standard Deviation - Variability in data.
- Central Limit Theorem (CLT) - Why SE exists.
- Confidence Intervals - Built from SE.
- T-Distribution - Used with SE when $\sigma$ is unknown.