Standard Error

Definition

Core Statement

Standard Error (SE) is the standard deviation of the sampling distribution of a statistic. It quantifies how precisely a sample statistic (usually the mean) estimates the corresponding population parameter: the smaller the SE, the more precise the estimate.
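
The definition can be checked empirically. The sketch below draws many samples from a normal population and compares the standard deviation of their means with σ/√n. The population parameters, seed, and number of repetitions are illustrative assumptions, not values from this note.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

sigma = 10.0          # assumed population SD (illustrative)
n = 25                # size of each sample
n_samples = 20_000    # number of repeated samples (illustrative)

# Draw many independent samples and record the mean of each one.
samples = rng.normal(loc=50.0, scale=sigma, size=(n_samples, n))
sample_means = samples.mean(axis=1)

# The SD of the sample means is the empirical standard error of the mean.
empirical_se = sample_means.std(ddof=1)
theoretical_se = sigma / np.sqrt(n)

print(f"Empirical SE:   {empirical_se:.3f}")
print(f"Theoretical SE: {theoretical_se:.3f}")
```

The two values should agree closely, and the agreement tightens as the number of repeated samples grows.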


Purpose

  1. Quantify uncertainty in sample estimates.
  2. Construct Confidence Intervals.
  3. Calculate test statistics (t-statistic, z-statistic).
  4. Distinguish between variability in data (SD) and variability in estimates (SE).
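
As an illustration of the third purpose, a one-sample t-statistic is the distance between the sample mean and a hypothesized mean, measured in units of SE. The sketch below computes it by hand and compares it with `scipy.stats.ttest_1samp`. The null value `mu0` is a hypothetical choice made only for this example.

```python
import numpy as np
from scipy import stats

data = np.array([23, 25, 27, 22, 24, 26, 28, 21])
mu0 = 22.0  # hypothetical null-hypothesis mean, chosen for illustration

mean = data.mean()
se = data.std(ddof=1) / np.sqrt(len(data))  # standard error of the mean

# t-statistic: how many standard errors the sample mean lies from mu0.
t_manual = (mean - mu0) / se

# Cross-check against SciPy's one-sample t-test.
result = stats.ttest_1samp(data, popmean=mu0)

print(f"Manual t: {t_manual:.3f}")
print(f"SciPy t:  {result.statistic:.3f}")
```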

When to Use

Use Standard Error When...

  • Reporting the precision of a sample mean.
  • Constructing confidence intervals or hypothesis tests.
  • Comparing the reliability of different samples.


Theoretical Background

Formula (Standard Error of the Mean)

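$$\mathrm{SE} = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}}$$

where:

  • σ is the population standard deviation,
  • s is the sample standard deviation, used in place of σ when σ is unknown,
  • n is the sample size.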

Key Insight

SE Decreases with Sample Size

  • Doubling precision requires 4 times the sample size.
  • SE measures the width of the sampling distribution of the mean, which narrows as n increases. By the Central Limit Theorem (CLT), that distribution is also approximately normal for large n.
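
The square-root relationship can be made concrete with a little arithmetic. In this minimal sketch, the helper names `standard_error` and `required_n` and the value of `s` are hypothetical, introduced only for illustration.

```python
import math

def standard_error(s: float, n: int) -> float:
    """Standard error of the mean for sample SD s and sample size n."""
    return s / math.sqrt(n)

def required_n(s: float, target_se: float) -> int:
    """Smallest n whose standard error does not exceed target_se."""
    return math.ceil((s / target_se) ** 2)

s = 10.0  # assumed sample SD (illustrative)

# Each 4x increase in n halves the SE.
for n in (25, 100, 400):
    print(f"n = {n:>3}: SE = {standard_error(s, n):.2f}")

# Going from SE = 1.0 to SE = 0.5 takes 4x the observations.
print(required_n(s, 1.0), required_n(s, 0.5))
```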

SD vs SE

| Measure | What It Describes | Changes With n? |
|---|---|---|
| Standard Deviation (SD) | Variability in the data (individual observations). | No. More data doesn't change population spread. |
| Standard Error (SE) | Variability in the sample mean (precision of estimate). | Yes. SE ∝ 1/√n. |

Common Confusion

Researchers sometimes report SE when they should report SD (to describe data), or vice versa. Be clear about what you're measuring.
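
A short simulation can make the distinction visible. It assumes a normal population with an SD of 5, chosen purely for illustration. As n grows, the sample SD settles near the population value while the SE keeps shrinking.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the run is repeatable
population_sd = 5.0             # assumed population SD (illustrative)

results = {}
for n in (10, 100, 1_000, 10_000):
    sample = rng.normal(loc=100.0, scale=population_sd, size=n)
    sd = sample.std(ddof=1)   # describes the spread of the observations
    se = sd / np.sqrt(n)      # describes the precision of the sample mean
    results[n] = (sd, se)
    print(f"n = {n:>6}: SD = {sd:.2f}, SE = {se:.3f}")
```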


Assumptions


Limitations

Pitfalls

  1. SE alone is not informative: Always report it with the mean and sample size.
  2. Assumes IID data: Clustered or dependent data violates assumptions; use robust SE or multilevel models.
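
To see why the IID assumption matters, the sketch below simulates clustered data and compares the naive SE with one computed from cluster means. Aggregating to cluster means is a simple stand-in used here for illustration. It is not a full cluster-robust or multilevel analysis, and all simulation parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed so the run is repeatable
n_clusters, cluster_size = 20, 50

# Observations in a cluster share a random offset, so they are correlated.
cluster_effects = rng.normal(0.0, 3.0, size=n_clusters)
noise = rng.normal(0.0, 1.0, size=(n_clusters, cluster_size))
data = 50.0 + cluster_effects[:, None] + noise

# Naive SE: treats all 1000 observations as independent.
naive_se = data.std(ddof=1) / np.sqrt(data.size)

# Cluster-level SE: with equal cluster sizes the grand mean equals the mean
# of the cluster means, and the cluster means are independent of each other.
cluster_means = data.mean(axis=1)
cluster_se = cluster_means.std(ddof=1) / np.sqrt(n_clusters)

print(f"Naive SE:         {naive_se:.3f}")
print(f"Cluster-level SE: {cluster_se:.3f}")
```

With strong within-cluster correlation, the naive SE is several times too small, which would make confidence intervals too narrow.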

Python Implementation

import numpy as np
from scipy import stats

data = np.array([23, 25, 27, 22, 24, 26, 28, 21])

# Mean
mean = np.mean(data)

# Standard Error
se = stats.sem(data)  # Equivalent to: np.std(data, ddof=1) / np.sqrt(len(data))

print(f"Mean: {mean:.2f}")
print(f"SE: {se:.2f}")

# 95% Confidence Interval using SE
ci = stats.t.interval(confidence=0.95, df=len(data)-1, loc=mean, scale=se)
print(f"95% CI: [{ci[0]:.2f}, {ci[1]:.2f}]")

R Implementation

data <- c(23, 25, 27, 22, 24, 26, 28, 21)

# Mean
mean(data)

# Standard Error (manual)
se <- sd(data) / sqrt(length(data))
print(se)

# Alternatively, use plotrix package
library(plotrix)
std.error(data)

# 95% CI
t.test(data)$conf.int

Interpretation Guide

| Scenario | Interpretation |
|---|---|
| Mean = 50, SE = 2 | The true population mean is likely within ±4 of 50 (roughly 2 SE). |
| SE = 0.5 with n=100 vs SE = 2 with n=25 | Larger sample gives more precise estimate. |
| SE decreases by half | Precision has doubled (requires 4x sample size). |
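
The "roughly 2 SE" rule in the first scenario comes from the normal distribution. The sketch below computes the interval with the exact 97.5th percentile, assuming the sampling distribution of the mean is approximately normal (a large-sample assumption). For small samples, a t-based interval is wider.

```python
from statistics import NormalDist

mean, se = 50.0, 2.0  # values from the first scenario

# Two-sided 95% critical value of the standard normal (about 1.96).
z = NormalDist().inv_cdf(0.975)

lower, upper = mean - z * se, mean + z * se
print(f"z = {z:.2f}")
print(f"Approximate 95% interval: [{lower:.2f}, {upper:.2f}]")
```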