Confidence Intervals

Definition

Core Statement

A Confidence Interval (CI) is a range of values, derived from sample data, produced by a procedure that captures the true population parameter in a specified proportion of repeated samples (the confidence level, e.g., 95%). It quantifies the uncertainty in an estimate.

Purpose

Provide a range estimate instead of a single point estimate.
Communicate the precision of an estimate.
Support hypothesis testing (if the CI excludes the null value, reject $H_{0}$ at the matching significance level).

When to Use

Use Confidence Intervals When...

Reporting estimates of means, proportions, or regression coefficients.
Communicating uncertainty to stakeholders.
Comparing groups (CIs for difference or ratio).

Theoretical Background

General Formula (For a Mean)

$$CI = \bar{X} \pm Z_{α/2} \cdot \frac{σ}{\sqrt{n}}$$

When $σ$ is unknown (almost always), use the T-Distribution:

$$CI = \bar{X} \pm t_{α/2,\, n-1} \cdot \frac{s}{\sqrt{n}}$$
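To make the t-based formula concrete, the sketch below evaluates each term by hand for a small illustrative sample. The variable names (`x_bar`, `t_crit`, `margin`) are chosen here for readability and are not part of any library; the sample values are only an example.

```python
import numpy as np
from scipy import stats

# Illustrative sample; any one-dimensional numeric sample would do
data = np.array([12, 15, 14, 10, 13, 11, 16, 14])

n = data.size
x_bar = data.mean()
s = data.std(ddof=1)              # sample standard deviation (n - 1 denominator)
alpha = 0.05                      # 1 - confidence level

# Critical value t_{alpha/2, n-1}: upper alpha/2 quantile of the t-distribution
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)

margin = t_crit * s / np.sqrt(n)  # half-width of the interval
ci_lower, ci_upper = x_bar - margin, x_bar + margin

print(f"mean = {x_bar:.3f}, s = {s:.3f}, t_crit = {t_crit:.3f}")
print(f"95% CI: [{ci_lower:.3f}, {ci_upper:.3f}]")
```

Swapping `stats.t.ppf(..., df=n - 1)` for `stats.norm.ppf(...)` and `s` for a known $σ$ would give the Z-based version of the formula.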

Interpretation

Correct Interpretation

"If we repeated this experiment many times, 95% of the calculated confidence intervals would contain the true population parameter."
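The repeated-sampling statement can be checked with a small simulation. This is a sketch under stated assumptions: the population is taken to be normal with a hypothetical mean of 50 and standard deviation of 10, and the sample size and number of repetitions are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)   # fixed seed so the run is reproducible

true_mean, true_sd = 50.0, 10.0       # hypothetical population parameters
n, n_experiments = 30, 10_000

# Each row is one simulated experiment with n observations
samples = rng.normal(true_mean, true_sd, size=(n_experiments, n))

means = samples.mean(axis=1)
ses = samples.std(axis=1, ddof=1) / np.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)

lower = means - t_crit * ses
upper = means + t_crit * ses

# Fraction of intervals that contain the (normally unknown) true mean
coverage = np.mean((lower <= true_mean) & (true_mean <= upper))
print(f"Empirical coverage: {coverage:.3f}")
```

The printed coverage should land close to 0.95. Each individual interval either contains 50 or it does not; the 95% describes the long-run behaviour of the procedure.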

Common Misinterpretation

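~~"There is a 95% probability the true parameter is within this interval."~~ (That statement describes a Bayesian credible interval. In the frequentist framing the parameter is fixed, so a computed interval either contains it or does not.)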

CI Width Factors

| Factor | Effect on CI Width |
| --- | --- |
| Larger $n$ | Narrower (more precise). |
| Larger Confidence Level (99% vs 95%) | Wider. |
| Larger Variability ($s$) | Wider. |
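Each row of the table follows directly from the interval formula. The sketch below uses the Z-based formula for simplicity, with a hypothetical helper `ci_width` and made-up baseline values (standard deviation 10, $n = 25$); it relies only on the Python standard library.

```python
from math import sqrt
from statistics import NormalDist


def ci_width(sd: float, n: int, confidence: float = 0.95) -> float:
    """Full width of the z-based interval: 2 * z_{alpha/2} * sd / sqrt(n)."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * z_crit * sd / sqrt(n)


baseline = ci_width(sd=10, n=25, confidence=0.95)
larger_n = ci_width(sd=10, n=100, confidence=0.95)      # 4x the sample size
higher_level = ci_width(sd=10, n=25, confidence=0.99)   # 99% instead of 95%
more_spread = ci_width(sd=20, n=25, confidence=0.95)    # 2x the variability

print(f"baseline (sd=10, n=25, 95%): {baseline:.2f}")
print(f"n=100                      : {larger_n:.2f}")
print(f"99% confidence             : {higher_level:.2f}")
print(f"sd=20                      : {more_spread:.2f}")
```

Because the width scales with $1/\sqrt{n}$, quadrupling the sample size only halves the interval, while doubling the variability doubles it.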

CI and Hypothesis Testing

95% CI for a difference (e.g., $μ_{1} - μ_{2}$): If the CI excludes 0, the difference is significant at $α = 0.05$.
95% CI for an Odds Ratio: If the CI excludes 1, the effect is significant at $α = 0.05$.
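The link between a CI for a difference and a two-sided test can be sketched as follows. The two groups are hypothetical data, and the interval is built with Welch's unequal-variance approach, which is one common choice rather than the only one; it is paired with Welch's t-test so that both use the same standard error and degrees of freedom.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for two independent groups
group_a = np.array([12, 15, 14, 10, 13, 11, 16, 14])
group_b = np.array([16, 18, 15, 17, 19, 14, 18, 17])

alpha = 0.05
diff = group_a.mean() - group_b.mean()

# Variance of each group mean, s^2 / n
va = group_a.var(ddof=1) / group_a.size
vb = group_b.var(ddof=1) / group_b.size
se_diff = np.sqrt(va + vb)

# Welch-Satterthwaite approximation to the degrees of freedom
df = (va + vb) ** 2 / (va ** 2 / (group_a.size - 1) + vb ** 2 / (group_b.size - 1))

t_crit = stats.t.ppf(1 - alpha / 2, df)
ci = (diff - t_crit * se_diff, diff + t_crit * se_diff)

# Welch's t-test on the same data, for comparison
p_value = stats.ttest_ind(group_a, group_b, equal_var=False).pvalue

excludes_zero = not (ci[0] <= 0 <= ci[1])
print(f"Difference: {diff:.3f}, 95% CI: [{ci[0]:.3f}, {ci[1]:.3f}]")
print(f"CI excludes 0: {excludes_zero}, p < 0.05: {p_value < alpha}")
```

When the interval and the test are built from the same standard error and reference distribution, "CI excludes 0" and "p < α" always agree.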

Python Implementation

from scipy import stats
import numpy as np

data = np.array([12, 15, 14, 10, 13, 11, 16, 14])
n = len(data)
mean = np.mean(data)
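se = stats.sem(data)  # Standard error of the mean, s / sqrt(n) (ddof=1 by default)

# 95% CI from the t-distribution with n - 1 degrees of freedom
# (the `confidence` keyword needs SciPy >= 1.9; older releases call it `alpha`)
ci = stats.t.interval(confidence=0.95, df=n-1, loc=mean, scale=se)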
print(f"Mean: {mean:.2f}")
print(f"95% CI: [{ci[0]:.2f}, {ci[1]:.2f}]")

R Implementation

data <- c(12, 15, 14, 10, 13, 11, 16, 14)

# t.test provides mean and 95% CI
t.test(data)$conf.int

# For regression coefficients
# (df is a placeholder for a data frame with columns Y and X)
model <- lm(Y ~ X, data = df)
confint(model)  # 95% CIs by default

Interpretation Guide

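| Output | Interpretation |
| --- | --- |
| 95% CI: [50, 70] | We are 95% confident the true mean is between 50 and 70 (in the repeated-sampling sense). |
| CI for OR: [1.2, 3.5] | The interval excludes 1, so the odds ratio is significantly greater than 1 at the matching $α$. |
| CI for Diff: [-2, 5] | The interval includes 0, so the difference is NOT significant at the matching $α$. |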
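For the odds-ratio row, the sketch below shows one way such an interval could be obtained and read. It assumes a hypothetical 2x2 table of counts and uses the Wald interval on the log scale, a common large-sample approximation; the note itself does not specify which method produced its example interval.

```python
from math import exp, log, sqrt
from statistics import NormalDist

# Hypothetical 2x2 table of counts:
#               outcome   no outcome
# exposed          a=30         b=70
# unexposed        c=15         d=85
a, b, c, d = 30, 70, 15, 85

odds_ratio = (a * d) / (b * c)

# Wald interval: built on the log scale, then exponentiated back
se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
z_crit = NormalDist().inv_cdf(0.975)          # two-sided 95% critical value
ci_lower = exp(log(odds_ratio) - z_crit * se_log_or)
ci_upper = exp(log(odds_ratio) + z_crit * se_log_or)

excludes_one = ci_lower > 1 or ci_upper < 1
print(f"OR = {odds_ratio:.2f}, 95% CI: [{ci_lower:.2f}, {ci_upper:.2f}]")
print(f"CI excludes 1: {excludes_one}")
```

The interval is asymmetric around the point estimate because it is symmetric on the log scale, which is why the null value for a ratio is 1 rather than 0.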

Related Concepts

Hypothesis Testing (P-Value & CI)
Standard Error
T-Distribution
Bayesian Statistics - Credible Intervals
