Confidence Intervals

Definition

Core Statement

A Confidence Interval (CI) is a range of values, computed from sample data, that estimates a population parameter. The procedure that builds it captures the true parameter in a specified proportion of repeated samples (the confidence level, e.g., 95%). It quantifies the uncertainty in an estimate.


Purpose

  1. Provide a range estimate instead of a single point estimate.
  2. Communicate the precision of an estimate.
  3. Support hypothesis testing (if a 95% CI excludes the null value, reject H0 at the 5% level in a two-sided test).

When to Use

Use Confidence Intervals When...

  • Reporting estimates of means, proportions, or regression coefficients.
  • Communicating uncertainty to stakeholders.
  • Comparing groups (CIs for difference or ratio).


Theoretical Background

General Formula (For a Mean)

$$ CI = \bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} $$

When σ is unknown (almost always), estimate it with the sample standard deviation s and use the t-distribution with n - 1 degrees of freedom:

$$ CI = \bar{X} \pm t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}} $$
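
As a worked check of the t-based formula, the sketch below computes each term by hand for the eight-value sample used in the implementation sections of these notes. The variable names (`t_crit`, `margin`, `ci_low`, `ci_high`) are illustrative choices, not part of any library.

```python
import math
from scipy import stats

# Same sample as in the implementation sections of these notes
data = [12, 15, 14, 10, 13, 11, 16, 14]
n = len(data)
mean = sum(data) / n

# Sample standard deviation s (divides by n - 1)
s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

alpha = 0.05
# Critical value t_{alpha/2, n-1}: upper alpha/2 quantile of the t-distribution
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)

# Margin of error: t * s / sqrt(n)
margin = t_crit * s / math.sqrt(n)
ci_low, ci_high = mean - margin, mean + margin

print(f"mean={mean:.3f}, s={s:.3f}, t_crit={t_crit:.3f}")
print(f"95% CI: [{ci_low:.2f}, {ci_high:.2f}]")
```

The result should agree with `stats.t.interval` and with R's `t.test` on the same data, since both apply the same formula.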

Interpretation

Correct Interpretation

"If we repeated this experiment many times, 95% of the calculated confidence intervals would contain the true population parameter."

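Common Misinterpretation

"There is a 95% probability the true parameter is within this interval." In the frequentist framework the parameter is fixed, so a computed interval either contains it or does not. A direct probability statement about the parameter describes a Bayesian credible interval instead.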

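The repeated-sampling reading of the confidence level can be checked by simulation. The sketch below is a minimal illustration, assuming normally distributed data with a known σ so that the z-based formula applies; the function name `coverage` and all parameter values are hypothetical choices made for the demonstration.

```python
import random
from statistics import NormalDist, fmean

def coverage(true_mean=50.0, sigma=10.0, n=30, confidence=0.95,
             trials=5000, seed=0):
    """Fraction of z-intervals (known sigma) that contain true_mean."""
    rng = random.Random(seed)
    # Critical value Z_{alpha/2} from the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # Draw a fresh sample and build its interval around the sample mean
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - margin <= true_mean <= xbar + margin:
            hits += 1
    return hits / trials

print(f"Empirical coverage: {coverage():.3f}")
```

The printed fraction should land close to the nominal 0.95. Any single interval in the loop either contains `true_mean` or misses it; the 95% describes the long-run behavior of the procedure.
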
CI Width Factors

| Factor | Effect on CI Width |
|---|---|
| Larger n | Narrower (more precise). |
| Larger Confidence Level (99% vs 95%) | Wider. |
| Larger Variability (s) | Wider. |
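
Each of these effects follows from the t-based formula, where the full width is 2 · t · s / √n. The sketch below varies one factor at a time; `ci_width` is a hypothetical helper and the baseline values (s = 2, n = 25) are arbitrary illustrations.

```python
import math
from scipy import stats

def ci_width(s, n, confidence=0.95):
    """Full width of the t-based CI for a mean: 2 * t * s / sqrt(n)."""
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
    return 2 * t_crit * s / math.sqrt(n)

base = ci_width(s=2.0, n=25)
print(f"baseline (s=2, n=25, 95%): {base:.3f}")
print(f"larger n (n=100):          {ci_width(s=2.0, n=100):.3f}")
print(f"higher level (99%):        {ci_width(s=2.0, n=25, confidence=0.99):.3f}")
print(f"more variability (s=4):    {ci_width(s=4.0, n=25):.3f}")
```

Because the width scales with 1/√n, quadrupling the sample size roughly halves the interval, while doubling s doubles it exactly.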

CI and Hypothesis Testing
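
A two-sided test at level α rejects H0 exactly when the null value falls outside the (1 - α) confidence interval built from the same data. The sketch below illustrates this for a one-sample t-test, reusing the sample from the implementation sections; the null values 13.0 and 16.0 are hypothetical and chosen only to show one case on each side.

```python
import numpy as np
from scipy import stats

data = np.array([12, 15, 14, 10, 13, 11, 16, 14])
alpha = 0.05

# (1 - alpha) t-interval for the mean
low, high = stats.t.interval(1 - alpha, df=len(data) - 1,
                             loc=data.mean(), scale=stats.sem(data))
print(f"95% CI: [{low:.2f}, {high:.2f}]")

for mu0 in (13.0, 16.0):  # hypothetical null values
    p_value = stats.ttest_1samp(data, popmean=mu0).pvalue
    inside = low <= mu0 <= high
    print(f"H0: mu = {mu0}: p = {p_value:.4f}, "
          f"inside CI = {inside}, reject = {p_value < alpha}")
```

In each case "inside CI" and "reject" should disagree: a null value inside the interval is not rejected, and one outside it is.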


Python Implementation

from scipy import stats
import numpy as np

data = np.array([12, 15, 14, 10, 13, 11, 16, 14])
n = len(data)
mean = np.mean(data)
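se = stats.sem(data)  # Standard error of the mean: s / sqrt(n), with ddof=1 by default

# 95% CI from the t-distribution with n-1 degrees of freedom
ci = stats.t.interval(confidence=0.95, df=n-1, loc=mean, scale=se)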
print(f"Mean: {mean:.2f}")
print(f"95% CI: [{ci[0]:.2f}, {ci[1]:.2f}]")

R Implementation

data <- c(12, 15, 14, 10, 13, 11, 16, 14)

# t.test provides mean and 95% CI
t.test(data)$conf.int

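# For regression coefficients
# (df is a placeholder for a data frame with columns Y and X)
model <- lm(Y ~ X, data = df)
confint(model)  # 95% CIs by default; change with the level argument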

Interpretation Guide

| Output | Interpretation |
|---|---|
| 95% CI: [50, 70] | We are 95% confident the true mean is between 50 and 70. |
| CI for OR: [1.2, 3.5] | The interval excludes 1, so the odds ratio is significantly greater than 1 at the matching level. |
| CI for Diff: [-2, 5] | The interval includes 0, so the difference is NOT significant at the matching level. |
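
To show how a "CI for Diff" row might arise, the sketch below builds a 95% interval for a difference in means and checks whether it includes 0. It assumes Welch's unequal-variance approach, which these notes do not prescribe; `welch_ci` is a hypothetical helper and both groups are made-up illustrative data.

```python
import math
from scipy import stats

# Hypothetical measurements for two groups (illustrative only)
group_a = [12, 15, 14, 10, 13, 11, 16, 14]
group_b = [11, 14, 12, 13, 15, 10, 12, 13]

def welch_ci(a, b, confidence=0.95):
    """CI for mean(a) - mean(b) without assuming equal variances."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se = math.sqrt(va / na + vb / nb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va / na + vb / nb) ** 2 / (
        (va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1)
    )
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df)
    diff = ma - mb
    return diff - t_crit * se, diff + t_crit * se

low, high = welch_ci(group_a, group_b)
includes_zero = low <= 0 <= high
print(f"95% CI for difference: [{low:.2f}, {high:.2f}]")
print("Not significant at 5%" if includes_zero else "Significant at 5%")
```

For ratio measures such as an odds ratio, the same check applies with 1 as the null value instead of 0.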