Confidence Intervals
Definition
Core Statement
A Confidence Interval (CI) is a range of values, computed from sample data by a procedure that captures the true population parameter in a specified proportion of repeated samples (the confidence level, e.g., 95%). It quantifies the uncertainty in an estimate.
Purpose
- Provide a range estimate instead of a single point estimate.
- Communicate the precision of an estimate.
- Support hypothesis testing (if the CI excludes the null value, reject $H_0$).
When to Use
Use Confidence Intervals When...
- Reporting estimates of means, proportions, or regression coefficients.
- Communicating uncertainty to stakeholders.
- Comparing groups (CIs for difference or ratio).
Theoretical Background
General Formula (For a Mean)
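For a sample of size $n$ with mean $\bar{x}$ and sample standard deviation $s$:
$$\bar{x} \pm t_{\alpha/2,\,n-1} \cdot \frac{s}{\sqrt{n}}$$
When the population standard deviation $\sigma$ is known, replace $s$ with $\sigma$ and $t_{\alpha/2,\,n-1}$ with the normal critical value $z_{\alpha/2}$.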
Interpretation
Correct Interpretation
"If we repeated this experiment many times, 95% of the calculated confidence intervals would contain the true population parameter."
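The repeated-sampling reading can be checked with a small simulation. The sketch below assumes a normal population with a made-up mean of 50 and standard deviation of 10 (both hypothetical, chosen only for illustration), draws many samples, builds a 95% t-interval from each, and counts how often the interval contains the true mean.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)          # fixed seed so the run is repeatable
true_mean, true_sd = 50.0, 10.0          # hypothetical population values
n, n_experiments = 30, 2000              # sample size and number of repeats

# One row per simulated experiment
samples = rng.normal(true_mean, true_sd, size=(n_experiments, n))
means = samples.mean(axis=1)
ses = samples.std(axis=1, ddof=1) / np.sqrt(n)   # standard error per sample

# Two-sided 95% critical value from the t-distribution
t_crit = stats.t.ppf(0.975, df=n - 1)
lower = means - t_crit * ses
upper = means + t_crit * ses

# Did each interval capture the true mean?
covered = (lower <= true_mean) & (true_mean <= upper)
coverage = covered.mean()
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should land close to 0.95. Any single interval either contains the true mean or it does not; the 95% describes the long-run behavior of the procedure.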
Common Misinterpretation
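"There is a 95% probability the true parameter is within this interval." In the frequentist framework the parameter is fixed, so a computed interval either contains it or does not. The probability statement describes a Bayesian credible interval instead.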
CI Width Factors
| Factor | Effect on CI Width |
|---|---|
| Larger sample size ($n$) | Narrower (more precise). |
| Larger Confidence Level (99% vs 95%) | Wider. |
| Larger Variability ($\sigma$) | Wider. |
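Each of these effects follows from the width of a t-based interval for a mean, which is twice the critical value times the standard error. The helper `ci_width` below is a hypothetical name introduced for illustration, and the sample sizes and standard deviations are arbitrary.

```python
import numpy as np
from scipy import stats

def ci_width(n, sd, confidence=0.95):
    """Full width of a t-based CI for a mean: 2 * t_crit * sd / sqrt(n)."""
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
    return 2 * t_crit * sd / np.sqrt(n)

baseline = ci_width(n=25, sd=10)                      # reference case
larger_n = ci_width(n=100, sd=10)                     # larger sample
higher_conf = ci_width(n=25, sd=10, confidence=0.99)  # higher confidence level
more_var = ci_width(n=25, sd=20)                      # more variable data

print(f"baseline:          {baseline:.2f}")
print(f"larger n:          {larger_n:.2f}  (narrower)")
print(f"99% confidence:    {higher_conf:.2f}  (wider)")
print(f"larger variability: {more_var:.2f}  (wider)")
```

Doubling the standard deviation doubles the width, while quadrupling the sample size roughly halves it (slightly more than halves, because the critical value also shrinks).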
CI and Hypothesis Testing
- 95% CI for a difference (e.g., $\mu_1 - \mu_2$): If the CI excludes 0, the difference is significant at $\alpha = 0.05$.
- 95% CI for an Odds Ratio: If the CI excludes 1, the effect is significant at $\alpha = 0.05$.
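The link between a CI for a difference and a two-sided test can be illustrated with a short sketch. It assumes two independent groups and uses Welch's unequal-variance interval, which is one reasonable choice rather than the only one. `welch_diff_ci` is a hypothetical helper, and the values in `group_b` are made up for the example.

```python
import numpy as np
from scipy import stats

def welch_diff_ci(a, b, confidence=0.95):
    """CI for mean(a) - mean(b) using Welch's unequal-variance approach."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    va = a.var(ddof=1) / a.size          # squared standard error of mean(a)
    vb = b.var(ddof=1) / b.size          # squared standard error of mean(b)
    se = np.sqrt(va + vb)
    # Welch-Satterthwaite degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df)
    diff = a.mean() - b.mean()
    return diff - t_crit * se, diff + t_crit * se

group_a = [12, 15, 14, 10, 13, 11, 16, 14]
group_b = [18, 17, 20, 16, 19, 21, 17, 18]   # made-up comparison group

low, high = welch_diff_ci(group_a, group_b)
excludes_zero = not (low <= 0 <= high)

# Welch's t-test on the same data should agree with the CI
p_value = stats.ttest_ind(group_a, group_b, equal_var=False).pvalue

print(f"95% CI for difference: [{low:.2f}, {high:.2f}]")
print(f"CI excludes 0: {excludes_zero}, p < 0.05: {p_value < 0.05}")
```

Because the interval and the test use the same standard error and degrees of freedom, the 95% CI excludes 0 exactly when the two-sided p-value is below 0.05.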
Python Implementation
```python
from scipy import stats
import numpy as np

data = np.array([12, 15, 14, 10, 13, 11, 16, 14])
n = len(data)
mean = np.mean(data)
se = stats.sem(data)  # Standard Error (uses the sample standard deviation, ddof=1)

# 95% CI from the t-distribution with n-1 degrees of freedom
ci = stats.t.interval(confidence=0.95, df=n-1, loc=mean, scale=se)

print(f"Mean: {mean:.2f}")
print(f"95% CI: [{ci[0]:.2f}, {ci[1]:.2f}]")
```
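As a cross-check, the same interval can be built by hand as the mean plus or minus the t critical value times the standard error. This is a minimal sketch on the same data; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

data = np.array([12, 15, 14, 10, 13, 11, 16, 14])
n = data.size
mean = data.mean()
s = data.std(ddof=1)                 # sample standard deviation
se = s / np.sqrt(n)                  # standard error of the mean

# Two-sided 95% interval: 2.5% in each tail
t_crit = stats.t.ppf(0.975, df=n - 1)
margin = t_crit * se
lower, upper = mean - margin, mean + margin

print(f"Mean: {mean:.3f}, margin of error: {margin:.3f}")
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```

The result should match `stats.t.interval` up to floating-point rounding, since that function applies the same critical value and standard error.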
R Implementation
```r
data <- c(12, 15, 14, 10, 13, 11, 16, 14)

# t.test provides mean and 95% CI
t.test(data)$conf.int

# For regression coefficients
# (df is assumed to be a data frame with columns Y and X)
model <- lm(Y ~ X, data = df)
confint(model)
```
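A rough Python counterpart for coefficient intervals in simple linear regression can be sketched with `scipy.stats.linregress`, which reports standard errors for the slope and intercept. The `x` and `y` values below are made up for illustration, and the interval uses a t critical value with n - 2 degrees of freedom.

```python
import numpy as np
from scipy import stats

# Made-up data for illustration
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])

fit = stats.linregress(x, y)

# Simple regression estimates two parameters, leaving n - 2 degrees of freedom
dof = x.size - 2
t_crit = stats.t.ppf(0.975, dof)

slope_ci = (fit.slope - t_crit * fit.stderr,
            fit.slope + t_crit * fit.stderr)
intercept_ci = (fit.intercept - t_crit * fit.intercept_stderr,
                fit.intercept + t_crit * fit.intercept_stderr)

print(f"Slope: {fit.slope:.3f}, 95% CI: [{slope_ci[0]:.3f}, {slope_ci[1]:.3f}]")
print(f"Intercept: {fit.intercept:.3f}, "
      f"95% CI: [{intercept_ci[0]:.3f}, {intercept_ci[1]:.3f}]")
```

For models with several predictors, a dedicated regression library is a better fit than this single-predictor sketch.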
Interpretation Guide
| Output | Interpretation |
|---|---|
| 95% CI: [50, 70] | We are 95% confident the true mean is between 50 and 70. |
| CI for OR: [1.2, 3.5] | CI excludes 1; the odds ratio is significantly greater than 1 at $\alpha = 0.05$. |
| CI for Diff: [-2, 5] | CI includes 0; difference is NOT significant. |
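To show where an odds-ratio interval like the one in the table might come from, here is a sketch using the log (Woolf) method on a 2x2 table. This is one common approximation and an assumption of this example, not a method stated above. The function name `odds_ratio_ci` and the cell counts are hypothetical.

```python
import math
from scipy import stats

def odds_ratio_ci(a, b, c, d, confidence=0.95):
    """Odds ratio and CI from a 2x2 table via the log (Woolf) method.

    a, b: exposed group with / without the outcome
    c, d: unexposed group with / without the outcome
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = stats.norm.ppf(1 - (1 - confidence) / 2)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)

# Made-up counts for illustration
or_est, low, high = odds_ratio_ci(a=30, b=70, c=15, d=85)

print(f"OR: {or_est:.2f}, 95% CI: [{low:.2f}, {high:.2f}]")
print("Excludes 1 (significant at 0.05):", not (low <= 1 <= high))
```

The interval is symmetric on the log scale, so it is not symmetric around the odds ratio itself. The null value is 1 rather than 0 because an odds ratio of 1 means no association.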
Related Concepts
- Hypothesis Testing (P-Value & CI)
- Standard Error
- T-Distribution
- Bayesian Statistics - Credible Intervals.