# Type I & Type II Errors
## Definition

### Core Statement

- **Type I Error ($\alpha$):** Rejecting a null hypothesis that is actually true (a false positive).
- **Type II Error ($\beta$):** Failing to reject a null hypothesis that is actually false (a false negative).
## Purpose
- Understand the tradeoffs in hypothesis testing.
- Inform decisions about significance levels and power.
- Evaluate real-world consequences of statistical decisions.
## Theoretical Background

### Decision Matrix
| Decision | $H_0$ True | $H_0$ False |
|---|---|---|
| Reject $H_0$ | Type I Error ($\alpha$): false positive | Correct decision: true positive (Power $= 1 - \beta$) |
| Fail to reject $H_0$ | Correct decision: true negative | Type II Error ($\beta$): false negative |
### Key Relationships

- $\alpha$ (Significance Level): Probability of a Type I error. Typically 0.05.
- $\beta$: Probability of a Type II error. Typically 0.20.
- Power ($1 - \beta$): Probability of correctly detecting a true effect. Typically 0.80.
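In symbols, all three quantities are conditional probabilities:

$$
\begin{aligned}
\alpha &= P(\text{reject } H_0 \mid H_0 \text{ true}) \\
\beta &= P(\text{fail to reject } H_0 \mid H_0 \text{ false}) \\
\text{Power} &= 1 - \beta = P(\text{reject } H_0 \mid H_0 \text{ false})
\end{aligned}
$$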
## Tradeoffs

### The Alpha-Beta Tradeoff

- Lowering $\alpha$ (a stricter significance threshold) increases $\beta$ (more false negatives); see the sketch below.
- Increasing power (lowering $\beta$) requires a larger sample size $n$ or accepting a higher $\alpha$.
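A minimal sketch of the first point using `statsmodels`, assuming a two-sample t-test with effect size $d = 0.5$ and $n = 50$ per group (illustrative values, not from this section):

```python
# Sketch: beta rises as alpha falls, with the design held fixed
# (two-sample t-test, d = 0.5, n = 50 per group; illustrative values).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for alpha in [0.10, 0.05, 0.01]:
    power = analysis.solve_power(effect_size=0.5, nobs1=50, alpha=alpha)
    print(f"alpha = {alpha:.2f} -> beta = {1 - power:.2f}")
```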
### Real-World Consequences
| Error Type | Medical Example | Court Example |
|---|---|---|
| Type I (False Positive) | Diagnosing a healthy person as sick. (Unnecessary treatment). | Convicting an innocent person. |
| Type II (False Negative) | Missing a real disease. (No treatment for sick patient). | Acquitting a guilty person. |
### Context Determines Priority
- Medical Screening: Minimize Type II (don't miss disease).
- Criminal Justice: Minimize Type I ("beyond reasonable doubt").
- Drug Trials: Balance both (FDA requires rigorous evidence).
## Worked Example: A/B Testing

### E-Commerce Experiment
A company tests a new checkout button color to increase conversion rate.
- $H_0$: The new button has the same conversion rate as the old one.
- $H_1$: The new button has a different conversion rate.
- Significance Level ($\alpha$): 0.05
- Power ($1 - \beta$): 0.80 (see the sample-size sketch below)
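To make the design concrete, here is a hedged sketch of the required sample size using `statsmodels`. The 10% baseline conversion rate is an assumed figure, not stated in the example; the 5% relative lift matches the scenario described next.

```python
# Sketch: visitors needed per variant for alpha = 0.05 and power = 0.80.
# The 10% baseline conversion rate is an illustrative assumption;
# a 5% relative lift would take it to 10.5%.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

h = proportion_effectsize(0.105, 0.10)  # Cohen's h for the two proportions
n = NormalIndPower().solve_power(effect_size=h, alpha=0.05, power=0.80)
print(f"Visitors needed per variant: {n:,.0f}")
```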
Scenarios (both are simulated in the sketch after this list):

- **Type I Error (False Alarm):**
  - Reality: The button makes no difference.
  - Test Result: Significant ($p < 0.05$).
  - Consequence: Engineers waste time implementing a useless change. We think we improved, but we didn't. (5% chance of this.)
- **Type II Error (Missed Opportunity):**
  - Reality: The button increases sales by 5%.
  - Test Result: Not significant ($p \geq 0.05$).
  - Consequence: We discard a winning idea because our sample size was too small to detect it. (20% chance of this.)
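A minimal simulation sketch, assuming a 10% baseline conversion rate and 2,000 visitors per variant (both made-up numbers, deliberately underpowered for a 5% relative lift) and a two-proportion z-test:

```python
# Sketch: Monte Carlo estimates of both error rates for the A/B test.
# The 10% baseline rate and n = 2,000 per variant are illustrative assumptions.
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(42)
n, base, lifted, alpha, trials = 2_000, 0.10, 0.105, 0.05, 2_000

def rejects_h0(p_a, p_b):
    """Simulate one experiment; return True if H0 is rejected at level alpha."""
    counts = [rng.binomial(n, p_a), rng.binomial(n, p_b)]
    _, p_value = proportions_ztest(counts, [n, n])
    return p_value < alpha

type1 = np.mean([rejects_h0(base, base) for _ in range(trials)])        # H0 true
type2 = np.mean([not rejects_h0(base, lifted) for _ in range(trials)])  # H0 false
print(f"Type I rate:  {type1:.3f}  (should sit near alpha = 0.05)")
print(f"Type II rate: {type2:.3f}  (high: the sample is too small)")
```

At this sample size the Type II rate comes out far above the designed 0.20, which is exactly the "sample too small to detect it" failure described above.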
## Controlling Errors

| Strategy | Effect on Type I ($\alpha$) | Effect on Type II ($\beta$) |
|---|---|---|
| Decreasing $\alpha$ | Decreases | Increases |
| Increasing Sample Size ($n$) | No change (fixed by design) | Decreases |
| One-Tailed Test | No change | Decreases (if the effect is in the hypothesized direction) |
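The last two rows of the table can be checked with the same power machinery, again assuming $d = 0.5$ and $\alpha = 0.05$ purely for illustration:

```python
# Sketch: beta falls as n grows, and a one-tailed test lowers beta
# at the same alpha (two-sample t-test, d = 0.5; illustrative values).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for n in [20, 50, 100]:
    power = analysis.solve_power(effect_size=0.5, nobs1=n, alpha=0.05)
    print(f"two-sided, n = {n:>3}: beta = {1 - power:.2f}")

one_tailed = analysis.solve_power(effect_size=0.5, nobs1=50, alpha=0.05,
                                  alternative="larger")
print(f"one-sided, n =  50: beta = {1 - one_tailed:.2f}")
```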
## Limitations & Pitfalls

### Common Traps

- **P-Hacking (Alpha Inflation):** Running 20 different tests and reporting the one that "worked". This inflates the family-wise Type I error rate to nearly 64% ($1 - 0.95^{20} \approx 0.64$); see the sketch below.
- **Overpowering:** With massive sample sizes, even tiny, irrelevant differences become "statistically significant" (low Type II error, but no practical significance).
- **Ignoring Power:** Running an underpowered study with a small sample is often a waste of time, because a Type II error is nearly guaranteed if the effect is small.
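The 64% figure is just the complement rule; a quick check, which also shows why the Bonferroni correction listed under Related Concepts restores control:

```python
# Sketch: family-wise Type I error across m = 20 independent tests.
alpha, m = 0.05, 20

fwer = 1 - (1 - alpha) ** m
print(f"P(at least one false positive) = {fwer:.2f}")  # ~0.64

# Bonferroni: test each hypothesis at alpha/m instead.
fwer_bonf = 1 - (1 - alpha / m) ** m
print(f"With Bonferroni correction:      {fwer_bonf:.3f}")  # ~0.049
```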
## Python Implementation

```python
# Alpha and beta are controlled by study design, not calculated directly.
# Use power analysis to understand the tradeoff.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Given: effect size d = 0.5, n = 50 per group, alpha = 0.05.
# What is the Type II error rate (beta)?
power = analysis.solve_power(effect_size=0.5, nobs1=50, alpha=0.05)
beta = 1 - power

print(f"Power: {power:.2f}")
print(f"Beta (Type II Error): {beta:.2f}")
```
## R Implementation

```r
library(pwr)

# Calculate power for a two-sample t-test: d = 0.5, n = 50 per group.
result <- pwr.t.test(d = 0.5, n = 50, sig.level = 0.05)
power <- result$power
beta <- 1 - power

cat("Power:", round(power, 3), "\n")
cat("Beta (Type II Error):", round(beta, 3), "\n")
```
## Interpretation Guide

| Scenario | Implication |
|---|---|
| $\alpha = 0.05$ | We accept a 1 in 20 chance of a false positive. |
| Power = 0.80 | We accept a 1 in 5 chance of missing a real effect. |
| Significant result with large $n$ | Likely a real effect, but check the effect size for practical relevance. |
| Non-significant result with small $n$ | Inconclusive: could be no effect, or could be a Type II error. |
## Related Concepts
- Hypothesis Testing (P-Value & CI)
- Power Analysis
- Bonferroni Correction - Controls family-wise Type I error.