Standard Error

Definition

Core Statement

Standard Error (SE) is the standard deviation of a statistic's sampling distribution. It quantifies how precisely a sample statistic (usually the mean) estimates the corresponding population parameter. A smaller SE means a more precise estimate.

Purpose

Quantify uncertainty in sample estimates.
Construct Confidence Intervals.
Calculate test statistics (t-statistic, z-statistic).
Distinguish between variability in data (SD) and variability in estimates (SE).

When to Use

Use Standard Error When...

Reporting the precision of a sample mean.
Constructing confidence intervals or hypothesis tests.
Comparing the reliability of different samples.

Theoretical Background

Formula (Standard Error of the Mean)

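$$SE = \frac{σ}{\sqrt{n}} \approx \frac{s}{\sqrt{n}}$$

The first form is the exact SE of the mean; the second is its estimate, obtained by substituting $s$ for the unknown $σ$.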

where:

$s$ = sample Standard Deviation
$n$ = sample size
$σ$ = population SD (usually unknown)
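
As a worked example, take the eight values 23, 25, 27, 22, 24, 26, 28, 21 used in the implementation sections of this note. Their mean is 24.5 and the squared deviations from the mean sum to 42, so:

$$
s = \sqrt{\frac{42}{8 - 1}} = \sqrt{6} \approx 2.449, \qquad SE = \frac{s}{\sqrt{n}} = \frac{2.449}{\sqrt{8}} \approx 0.866
$$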

Key Insight

SE Decreases with Sample Size

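Doubling precision (halving SE) requires 4 times the sample size, because SE shrinks with $\sqrt{n}$ rather than $n$.
The sampling distribution of the mean narrows as $n$ increases because its variance is $σ^2 / n$. The Central Limit Theorem (CLT) adds that its shape approaches a normal distribution, which is what makes SE-based intervals and tests work.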
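
The $1/\sqrt{n}$ scaling can be checked with a small simulation. The sketch below is illustrative only: the normal population, its mean of 50, its SD of 10, the seed, and the helper name `empirical_se` are all assumptions chosen for the example, not part of the note's data.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
sigma = 10.0                    # assumed population SD (illustrative)
n_reps = 5_000                  # simulated samples per sample size

def empirical_se(n):
    """SD of the sample mean across n_reps simulated samples of size n."""
    samples = rng.normal(loc=50.0, scale=sigma, size=(n_reps, n))
    return samples.mean(axis=1).std(ddof=1)

results = {}
for n in (25, 100, 400):
    # Store (simulated SE, theoretical sigma / sqrt(n)) for comparison
    results[n] = (empirical_se(n), sigma / np.sqrt(n))
    print(f"n={n:4d}  empirical SE={results[n][0]:.3f}  sigma/sqrt(n)={results[n][1]:.3f}")
```

Each fourfold increase in $n$ should roughly halve both columns, up to simulation noise.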

SD vs SE

Measure	What It Describes	Changes With $n$ ?
Standard Deviation (SD)	Variability in the data (individual observations).	No systematic change. The sample SD fluctuates but settles near the population SD as $n$ grows.
Standard Error (SE)	Variability in the sample mean (precision of estimate).	Yes. SE $\propto 1 / \sqrt{n}$ .
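
The contrast in the table can be seen by computing both measures on growing subsets of one data set. This is a minimal sketch on simulated data; the normal distribution with mean 50 and SD 10, the seed, and the variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed for reproducibility
draws = rng.normal(loc=50.0, scale=10.0, size=10_000)  # simulated observations

summary = []
for n in (10, 100, 1_000, 10_000):
    sample = draws[:n]              # first n observations
    sd = sample.std(ddof=1)         # spread of individual observations
    se = sd / np.sqrt(n)            # precision of the sample mean
    summary.append((n, sd, se))
    print(f"n={n:6d}  SD={sd:6.2f}  SE={se:6.3f}")
```

The SD column should hover near 10 at every sample size, while the SE column keeps shrinking.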

Common Confusion

Researchers sometimes report SE when they should report SD (to describe data), or vice versa. Be clear about what you're measuring.

Assumptions

Random Sampling: Sample is representative of the population.
Independence: Observations are independent.
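Normality for small $n$: SE-based intervals that use the T-Distribution assume the data are approximately normal. For large $n$, the CLT makes this assumption less critical.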
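
To see why the t-distribution matters for small samples, the critical multipliers applied to SE in a 95% interval can be compared directly. This sketch uses `scipy.stats.norm.ppf` and `scipy.stats.t.ppf`; the sample sizes are arbitrary illustrative choices.

```python
from scipy import stats

# Two-sided 95% interval: use the 97.5th percentile as the multiplier of SE
z_crit = stats.norm.ppf(0.975)

# t multiplier for several sample sizes (df = n - 1)
t_crits = {n: stats.t.ppf(0.975, df=n - 1) for n in (5, 8, 30, 100)}

print(f"z multiplier: {z_crit:.3f}")
for n, t_crit in t_crits.items():
    print(f"n={n:3d}  t multiplier: {t_crit:.3f}")
```

For $n = 8$ the multiplier is about 2.365 rather than 1.960, so an interval built with the normal multiplier would be too narrow. The gap closes as $n$ grows.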

Limitations

Pitfalls

SE alone is not informative: Always report it with the mean and sample size.

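Assumes IID data: Clustered or otherwise dependent observations violate this assumption and, when observations within a cluster are positively correlated, make the usual SE too small; use cluster-robust SE or multilevel models.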
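
The effect of clustering can be illustrated with simulated data. The sketch below does not implement robust SE or a multilevel model; as a simpler stand-in it computes the SE from cluster means, which is a reasonable approach when clusters are equal in size. The cluster count, cluster size, variances, and seed are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed for reproducibility
n_clusters, per_cluster = 30, 20

# Each cluster shares a random shift, so observations within it are correlated
cluster_effects = rng.normal(0.0, 1.0, size=n_clusters)
noise = rng.normal(0.0, 1.0, size=(n_clusters, per_cluster))
data = 50.0 + cluster_effects[:, None] + noise

# Naive SE: treats all 600 observations as independent
naive_se = data.std(ddof=1) / np.sqrt(data.size)

# Cluster-level SE: treats the 30 cluster means as the independent units
cluster_means = data.mean(axis=1)
cluster_se = cluster_means.std(ddof=1) / np.sqrt(n_clusters)

print(f"Naive SE:         {naive_se:.3f}")
print(f"Cluster-level SE: {cluster_se:.3f}")
```

Under these assumptions the naive SE comes out several times smaller than the cluster-level SE, which is the overconfidence the pitfall warns about.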

Python Implementation

import numpy as np
from scipy import stats

data = np.array([23, 25, 27, 22, 24, 26, 28, 21])

# Mean
mean = np.mean(data)

# Standard Error
se = stats.sem(data)  # Equivalent to: np.std(data, ddof=1) / np.sqrt(len(data))

print(f"Mean: {mean:.2f}")
print(f"SE: {se:.2f}")

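# 95% Confidence Interval using SE (t-distribution, df = n - 1)
# Note: the keyword is `confidence` in SciPy >= 1.9; older versions call it `alpha`
ci = stats.t.interval(confidence=0.95, df=len(data)-1, loc=mean, scale=se)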
print(f"95% CI: [{ci[0]:.2f}, {ci[1]:.2f}]")

R Implementation

data <- c(23, 25, 27, 22, 24, 26, 28, 21)

# Mean
mean(data)

# Standard Error (manual)
se <- sd(data) / sqrt(length(data))
print(se)

# Alternatively, use plotrix package
library(plotrix)
std.error(data)

# 95% CI
t.test(data)$conf.int

Interpretation Guide

Scenario	Interpretation
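Mean = 50, SE = 2	An approximate 95% confidence interval for the population mean is 50 ± 4 (about 2 SE), i.e. 46 to 54.
SE = 0.5 with n=100 vs SE = 2 with n=25	The first estimate is four times as precise. The larger sample accounts for a factor of 2; the rest reflects a smaller SD (implied $s$ = 5 vs 10).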
SE decreases by half	Precision has doubled (requires 4x sample size).
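
The last row can be turned into a quick planning calculation: solving $SE = s / \sqrt{n}$ for $n$ gives $n = (s / SE)^2$. The helper below is a hypothetical illustration (the name `required_n` and the SD of 10 are assumptions), and it treats the SD as known in advance, which in practice is only an estimate.

```python
import math

def required_n(sd, target_se):
    """Smallest n for which sd / sqrt(n) is at most target_se."""
    return math.ceil((sd / target_se) ** 2)

# Halving the target SE quadruples the required sample size
for target in (2.0, 1.0, 0.5):
    print(f"target SE={target:.1f}  required n={required_n(10.0, target)}")
```

With an SD of 10, target SEs of 2, 1, and 0.5 require 25, 100, and 400 observations respectively.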

Related Concepts

Standard Deviation - Variability in data.
Central Limit Theorem (CLT) - Why the sampling distribution of the mean is approximately normal.
Confidence Intervals - Built from SE.
T-Distribution - Used with SE when $σ$ is unknown.