Type I & Type II Errors


Definition

Core Statement

Type I Error (α): Rejecting a true null hypothesis (False Positive).
Type II Error (β): Failing to reject a false null hypothesis (False Negative).
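
To make the two definitions concrete, here is a minimal simulation sketch: it runs many two-sample t-tests, once under a true null and once under a real effect, and counts how often each kind of error occurs. All numbers in it (n = 50 per group, effect size 0.5, 5,000 repetitions, α = 0.05) are illustrative assumptions rather than values taken from this page.

# Minimal simulation sketch: estimate Type I and Type II error rates empirically.
# n, the effect size, and the repetition count are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha, n, effect, reps = 0.05, 50, 0.5, 5000

false_pos = 0  # rejected H0 although H0 was true (Type I)
false_neg = 0  # failed to reject H0 although H0 was false (Type II)

for _ in range(reps):
    # H0 true: both groups come from the same distribution.
    a, b = rng.normal(0, 1, n), rng.normal(0, 1, n)
    if stats.ttest_ind(a, b).pvalue < alpha:
        false_pos += 1

    # H0 false: the second group is shifted by `effect` standard deviations.
    a, b = rng.normal(0, 1, n), rng.normal(effect, 1, n)
    if stats.ttest_ind(a, b).pvalue >= alpha:
        false_neg += 1

print(f"Empirical Type I error rate:  {false_pos / reps:.3f} (should sit near alpha = {alpha})")
print(f"Empirical Type II error rate: {false_neg / reps:.3f} (beta for this particular design)")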


Purpose

  1. Understand the tradeoffs in hypothesis testing.
  2. Inform decisions about significance levels and power.
  3. Evaluate real-world consequences of statistical decisions.

Theoretical Background

Decision Matrix

                    | H0 is True                        | H0 is False
Reject H0           | Type I Error (α): False Positive  | Correct Decision: True Positive (Power = 1 - β)
Fail to Reject H0   | Correct Decision: True Negative   | Type II Error (β): False Negative

Key Relationships

  • α = P(Type I error) = P(Reject H0 | H0 is true)
  • β = P(Type II error) = P(Fail to reject H0 | H0 is false)
  • Power = 1 - β = P(Reject H0 | H0 is false)

The Alpha-Beta Tradeoff

  • Lowering α (a stricter significance threshold) increases β, producing more false negatives.
  • Increasing power (lowering β) requires a larger sample size n or accepting a higher α. The sketch below quantifies this tradeoff.
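
As a rough illustration, here is a minimal power-analysis sketch: holding the effect size and per-group sample size fixed (d = 0.5 and n = 50, both illustrative assumptions), tightening α from 0.05 to 0.01 lowers power and therefore raises β.

# Sketch: a stricter alpha lowers power (raises beta) when d and n are held fixed.
# d = 0.5 and n = 50 per group are illustrative assumptions.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for alpha in (0.05, 0.01):
    # Leaving `power` unspecified tells solve_power to solve for it.
    power = analysis.solve_power(effect_size=0.5, nobs1=50, alpha=alpha)
    print(f"alpha = {alpha:.2f} -> power = {power:.2f}, beta = {1 - power:.2f}")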


Real-World Consequences

Error Type               | Medical Example                                               | Court Example
Type I (False Positive)  | Diagnosing a healthy person as sick (unnecessary treatment).  | Convicting an innocent person.
Type II (False Negative) | Missing a real disease (no treatment for a sick patient).     | Acquitting a guilty person.

Context Determines Priority

  • Medical Screening: Minimize Type II (don't miss disease).
  • Criminal Justice: Minimize Type I ("beyond reasonable doubt").
  • Drug Trials: Balance both (FDA requires rigorous evidence).


Worked Example: A/B Testing

E-Commerce Experiment

A company tests a new checkout button color to increase conversion rate.

  • H0: New button has same conversion rate as old.
  • H1: New button has different conversion rate.
  • Significance Level (α): 0.05
  • Power (1 - β): 0.80

Scenarios:

  1. Type I Error (False Alarm):

    • Reality: The button makes no difference.
    • Test Result: Significant (p<0.05).
    • Consequence: Engineers waste time implementing a useless change. We think we improved, but we didn't. (5% chance of this).
  2. Type II Error (Missed Opportunity):

    • Reality: The button increases sales by 5%.
    • Test Result: Not Significant (p>0.05).
    • Consequence: We discard a winning idea because our sample size was too small to detect it. (20% chance of this; the power sketch below shows how sample size drives this risk.)
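
To see how the missed-opportunity scenario comes about, here is a minimal sketch of the sample size the test would need. It assumes a baseline conversion rate of 10% and reads the "5%" as a relative lift (10.0% to 10.5%); both assumptions are illustrative, not taken from the scenario.

# Sketch: per-group sample size needed to detect the lift at alpha = 0.05 with 80% power.
# Assumptions (illustrative, not from the scenario): baseline conversion = 10%,
# and "5%" interpreted as a relative lift, i.e. 10.0% -> 10.5%.
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

effect = proportion_effectsize(0.105, 0.10)  # Cohen's h for the two conversion rates
n_per_group = NormalIndPower().solve_power(effect_size=effect, alpha=0.05, power=0.80)
print(f"Required n per group: {n_per_group:,.0f}")

Under these assumptions the required sample size runs into the tens of thousands per group, so a test with only a few hundred visitors per arm would be very likely to miss the effect.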

Controlling Errors

Strategy                    | Effect on Type I (α)        | Effect on Type II (β)
Decreasing α (0.05 → 0.01)  | Decreases                   | Increases (harder to detect real effects)
Increasing sample size (n)  | No change (fixed by design) | Decreases (power increases)
One-tailed test             | No change                   | Decreases (for that direction only)
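
The last two rows of the table can be checked with a quick power calculation. The sketch below compares a two-sided and a one-sided test at the same n, then solves for the per-group sample size that lifts power to 0.80; the effect size d = 0.5 is an illustrative assumption.

# Sketch: how the table's strategies move beta, holding alpha at 0.05.
# d = 0.5 is an illustrative assumption.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# One-tailed vs two-tailed test at the same n: beta drops for the favored direction.
for alt in ("two-sided", "larger"):
    power = analysis.solve_power(effect_size=0.5, nobs1=50, alpha=0.05, alternative=alt)
    print(f"{alt:>9}: beta = {1 - power:.2f}")

# Larger n: solve for the per-group sample size that brings power up to 0.80.
n_needed = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"n per group for 80% power: {n_needed:.0f}")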

Limitations & Pitfalls

Common Traps

  1. P-Hacking (Alpha Inflation): Running 20 different tests and reporting the one that worked. This inflates the family-wise Type I error rate to roughly 64% (1 - 0.95^20 ≈ 0.64; see the quick check after this list).
  2. Overpowering: With massive sample sizes (n=1,000,000), even tiny, irrelevant differences become "statistically significant" (Low Type II error, but practical insignificance).
  3. Ignoring Power: Running a study with n=10 is often a waste of time because Type II error is nearly guaranteed if the effect is small.
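
A one-line check of the alpha-inflation arithmetic from point 1: the chance of at least one false positive across m independent tests at level α is 1 - (1 - α)^m.

# Quick check of the alpha-inflation arithmetic from point 1.
alpha, m = 0.05, 20
family_wise_rate = 1 - (1 - alpha) ** m
print(f"P(at least one false positive in {m} tests) = {family_wise_rate:.2f}")  # ~0.64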


Python Implementation

# Alpha is chosen by the study design; beta then follows from the effect size,
# sample size, and alpha. Use power analysis to quantify the tradeoff.

from statsmodels.stats.power import TTestIndPower
analysis = TTestIndPower()

# Given: d=0.5, n=50, alpha=0.05
# What is the Type II error rate (beta)?
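# Leaving `power` unspecified tells solve_power to solve for it.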
power = analysis.solve_power(effect_size=0.5, nobs1=50, alpha=0.05)
beta = 1 - power
print(f"Power: {power:.2f}")
print(f"Beta (Type II Error): {beta:.2f}")

R Implementation

library(pwr)

# Calculate Power
result <- pwr.t.test(d = 0.5, n = 50, sig.level = 0.05)
power <- result$power
beta <- 1 - power

cat("Power:", round(power, 3), "\n")
cat("Beta (Type II Error):", round(beta, 3), "\n")

Interpretation Guide

Scenario          | Implication
α = 0.05          | We accept a 1 in 20 chance of a false positive.
Power = 0.80      | We accept a 1 in 5 chance of missing a real effect.
Large n, small p  | Likely a real effect, but check the effect size for practical relevance.
Small n, high p   | Inconclusive: could be no effect, or could be a Type II error.