Hypothesis Testing (P-Value & CI)
Definition
Core Statement
Hypothesis Testing is a statistical method used to make inferences about a population based on sample data. It involves formulating a Null Hypothesis ($H_0$) and an Alternative Hypothesis ($H_1$), then assessing whether the sample provides enough evidence to reject $H_0$ in favor of $H_1$.
Purpose
- To determine whether observed data is consistent with a specific claim (e.g., "The drug has no effect").
- To provide a framework for making decisions under uncertainty.
- To quantify the strength of evidence against $H_0$ via the p-value.
When to Use
Use Hypothesis Testing When...
- You have a specific claim to test (e.g., "The mean is 50").
- You want to decide between two competing hypotheses.
- You need a standardized framework for scientific inquiry.
Limitations of NHST
- It does not tell you the probability that $H_0$ is true.
- A significant p-value does not imply a large or important effect.
- Over-reliance can lead to "p-hacking" and reproducibility issues.
Theoretical Background
The Hypotheses
| Hypothesis | Symbol | Description |
|---|---|---|
| Null Hypothesis | $H_0$ | The default assumption; typically "no effect" or "no difference." |
| Alternative Hypothesis | $H_1$ | The claim we are testing for; "there is an effect." |
The P-Value
Critical Definition
The p-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming $H_0$ is true.
Interpretation:
- Small p-value ($p \le \alpha$): The observed data is unlikely under $H_0$. This is surprising. Reject $H_0$.
- Large p-value ($p > \alpha$): The observed data is consistent with $H_0$. Not surprising. Fail to reject $H_0$.
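To make the definition concrete, here is a minimal simulation sketch, assuming the coin scenario used in the implementation sections below (60 heads in 100 tosses, $H_0\!: p = 0.5$): it estimates the two-sided p-value as the fraction of experiments, simulated under $H_0$, whose result is at least as extreme as the observed one.

```python
import numpy as np

rng = np.random.default_rng(42)

n, observed_heads, p_null = 100, 60, 0.5
n_sim = 100_000

# Simulate the number of heads in many experiments where H0 (fair coin) is true
sim_heads = rng.binomial(n=n, p=p_null, size=n_sim)

# "At least as extreme": at least as far from the expected 50 heads as the observed 60
is_extreme = np.abs(sim_heads - n * p_null) >= abs(observed_heads - n * p_null)

print(f"Simulated two-sided p-value: {is_extreme.mean():.4f}")
# Compare with the exact test: stats.binomtest(k=60, n=100, p=0.5).pvalue
```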
Significance Level ($\alpha$)
The pre-defined threshold for rejecting $H_0$; by convention, $\alpha = 0.05$ is the most common choice.
Confidence Intervals (CI)
A Confidence Interval is a range of plausible values for the population parameter.
- A 95% CI means: "If we repeated this experiment many times, 95% of the calculated intervals would contain the true parameter."
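A minimal simulation sketch, assuming a normal population with a known true mean of 10 (illustrative values, not from this note), shows this repeated-sampling interpretation: across many samples, roughly 95% of the computed t-based intervals cover the true mean.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_mean, sigma, n, n_experiments = 10.0, 2.0, 30, 10_000

covered = 0
for _ in range(n_experiments):
    sample = rng.normal(loc=true_mean, scale=sigma, size=n)
    # 95% t-based confidence interval for the mean
    ci_low, ci_high = stats.t.interval(0.95, df=n - 1, loc=sample.mean(), scale=stats.sem(sample))
    covered += ci_low <= true_mean <= ci_high

print(f"Empirical coverage: {covered / n_experiments:.3f}")  # should be close to 0.95
```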
CI and P-Value Relationship
- If the 95% CI for a difference contains 0, the result is not significant at $\alpha = 0.05$.
- If the 95% CI for an Odds Ratio contains 1, the result is not significant.
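This correspondence can be checked numerically. The sketch below uses made-up paired differences and a one-sample t-test (illustrative data, not from this note) to show that $p < 0.05$ coincides with the 95% CI excluding 0.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical paired differences (e.g., post - pre measurements)
diff = rng.normal(loc=0.3, scale=1.0, size=25)

t_res = stats.ttest_1samp(diff, popmean=0)
ci_low, ci_high = stats.t.interval(0.95, df=len(diff) - 1, loc=diff.mean(), scale=stats.sem(diff))

print(f"p-value: {t_res.pvalue:.4f}")
print(f"95% CI for the mean difference: [{ci_low:.3f}, {ci_high:.3f}]")
# The two criteria agree: p < 0.05  <=>  the 95% CI excludes 0
print("p < 0.05:", t_res.pvalue < 0.05, "| CI excludes 0:", not (ci_low <= 0 <= ci_high))
```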
Decision Errors
| Decision | $H_0$ is True | $H_0$ is False |
|---|---|---|
| Reject $H_0$ | Type I Error ($\alpha$): False Positive | Correct Decision: True Positive (Power = $1 - \beta$) |
| Fail to Reject $H_0$ | Correct Decision: True Negative | Type II Error ($\beta$): False Negative |
- Type I Error ($\alpha$): Finding an effect that doesn't exist.
- Type II Error ($\beta$): Missing an effect that does exist.
- Power ($1 - \beta$): The probability of correctly detecting a true effect. Aim for Power $\ge 0.80$.
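Both error rates, and power, can be estimated by simulation. The sketch below assumes a two-sample t-test with $n = 50$ per group and a hypothetical true effect of 0.5 standard deviations; these numbers are illustrative, not taken from this note.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_per_group, alpha, n_sim = 50, 0.05, 5_000

def rejection_rate(true_effect):
    """Fraction of simulated experiments in which H0 (no difference) is rejected."""
    rejections = 0
    for _ in range(n_sim):
        group_a = rng.normal(0.0, 1.0, n_per_group)
        group_b = rng.normal(true_effect, 1.0, n_per_group)
        rejections += stats.ttest_ind(group_a, group_b).pvalue < alpha
    return rejections / n_sim

print(f"Type I error rate (no true effect): {rejection_rate(0.0):.3f}")  # close to alpha = 0.05
print(f"Power (true effect = 0.5 SD):       {rejection_rate(0.5):.3f}")  # roughly 0.70 here
```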
Assumptions
Limitations
Common Pitfalls
- P-value is NOT $P(H_0 \mid \text{data})$. It is $P(\text{data} \mid H_0)$. These are not the same (Prosecutor's Fallacy).
- Statistical Significance $\neq$ Practical Significance. A p-value of 0.001 for a tiny effect (e.g., 0.001 kg weight loss) is meaningless. Always report Effect Size Measures.
- Multiple Comparisons Problem: Running 20 tests at $\alpha = 0.05$ yields roughly 1 false positive on average by chance ($20 \times 0.05 = 1$). Use Bonferroni Correction or FDR (see the simulation sketch after this list).
- Dichotomization: Treating $p < 0.05$ as "significant" and $p \ge 0.05$ as "not significant" ignores the inherent uncertainty.
Python Implementation
from scipy import stats

# Scenario: We test if a coin is biased (H0: p = 0.5)
# Observed: 60 heads in 100 tosses.

# Exact binomial test
result = stats.binomtest(k=60, n=100, p=0.5, alternative='two-sided')

print(f"P-value: {result.pvalue:.4f}")
print(f"95% CI for p: {result.proportion_ci(confidence_level=0.95)}")

if result.pvalue < 0.05:
    print("Reject H0: The coin is likely biased.")
else:
    print("Fail to Reject H0: Could be fair.")
R Implementation
# Scenario: 60 Heads, 100 Tosses, H0: p = 0.5
test_res <- binom.test(x = 60, n = 100, p = 0.5)
print(test_res)
# Output:
# - p-value
# - 95% Confidence Interval
# - Sample estimate of p
# Interpretation
if (test_res$p.value < 0.05) {
  cat("Reject H0: Coin is biased.\n")
} else {
  cat("Fail to Reject H0: Coin may be fair.\n")
}
Interpretation Guide
| Scenario | Interpretation |
|---|---|
| p = 0.03 | Evidence against $H_0$; significant at $\alpha = 0.05$. |
| p = 0.15 | Not enough evidence against $H_0$; fail to reject at $\alpha = 0.05$. |
| 95% CI = [1.2, 3.5] for OR | The effect is significant (doesn't contain 1) and the OR is between 1.2 and 3.5. |
| 95% CI = [-0.5, 0.8] for Diff | The effect is NOT significant (contains 0). |
Related Concepts
- Type I & Type II Errors
- Power Analysis
- Confidence Intervals
- Effect Size Measures
- Bonferroni Correction
- Bayesian Statistics - An alternative framework.