Type I & Type II Errors


Definition

Core Statement

Type I Error (α): Rejecting a true null hypothesis (False Positive).
Type II Error (β): Failing to reject a false null hypothesis (False Negative).
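
To make the two definitions concrete, here is a minimal simulation sketch: it runs many two-sample t-tests, once under a true null and once under a real effect, and counts how often each kind of error occurs. All numbers in it (n = 50 per group, effect size 0.5, 5,000 repetitions, α = 0.05) are illustrative assumptions rather than values taken from this page.

# Minimal simulation sketch: estimate Type I and Type II error rates empirically.
# n, the effect size, and the repetition count are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha, n, effect, reps = 0.05, 50, 0.5, 5000

false_pos = 0  # rejected H0 although H0 was true (Type I)
false_neg = 0  # failed to reject H0 although H0 was false (Type II)

for _ in range(reps):
    # H0 true: both groups come from the same distribution.
    a, b = rng.normal(0, 1, n), rng.normal(0, 1, n)
    if stats.ttest_ind(a, b).pvalue < alpha:
        false_pos += 1

    # H0 false: the second group is shifted by `effect` standard deviations.
    a, b = rng.normal(0, 1, n), rng.normal(effect, 1, n)
    if stats.ttest_ind(a, b).pvalue >= alpha:
        false_neg += 1

print(f"Empirical Type I error rate:  {false_pos / reps:.3f} (should sit near alpha = {alpha})")
print(f"Empirical Type II error rate: {false_neg / reps:.3f} (beta for this particular design)")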


Purpose

  1. Understand the tradeoffs in hypothesis testing.
  2. Inform decisions about significance levels and power.
  3. Evaluate real-world consequences of statistical decisions.

Theoretical Background

Decision Matrix

                    | H0 is True                        | H0 is False
Reject H0           | Type I Error (α): False Positive  | Correct Decision: True Positive (Power = 1 - β)
Fail to Reject H0   | Correct Decision: True Negative   | Type II Error (β): False Negative

Key Relationships

  • α = P(Type I error) = P(Reject H0 | H0 is true)
  • β = P(Type II error) = P(Fail to reject H0 | H0 is false)
  • Power = 1 - β = P(Reject H0 | H0 is false)

The Alpha-Beta Tradeoff

  • Lowering α (a stricter significance threshold) increases β, producing more false negatives.
  • Increasing power (lowering β) requires a larger sample size n or accepting a higher α. The sketch below quantifies this tradeoff.
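
As a rough illustration, here is a minimal power-analysis sketch: holding the effect size and per-group sample size fixed (d = 0.5 and n = 50, both illustrative assumptions), tightening α from 0.05 to 0.01 lowers power and therefore raises β.

# Sketch: a stricter alpha lowers power (raises beta) when d and n are held fixed.
# d = 0.5 and n = 50 per group are illustrative assumptions.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for alpha in (0.05, 0.01):
    # Leaving `power` unspecified tells solve_power to solve for it.
    power = analysis.solve_power(effect_size=0.5, nobs1=50, alpha=alpha)
    print(f"alpha = {alpha:.2f} -> power = {power:.2f}, beta = {1 - power:.2f}")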


Real-World Consequences

Error Type               | Medical Example                                               | Court Example
Type I (False Positive)  | Diagnosing a healthy person as sick (unnecessary treatment).  | Convicting an innocent person.
Type II (False Negative) | Missing a real disease (no treatment for a sick patient).     | Acquitting a guilty person.

Context Determines Priority

  • Medical Screening: Minimize Type II (don't miss disease).
  • Criminal Justice: Minimize Type I ("beyond reasonable doubt").
  • Drug Trials: Balance both (FDA requires rigorous evidence).


Worked Example: A/B Testing

E-Commerce Experiment

A company tests a new checkout button color to increase conversion rate.

  • H0: New button has same conversion rate as old.
  • H1: New button has different conversion rate.
  • Significance Level (α): 0.05
  • Power (1 - β): 0.80

Scenarios:

  1. Type I Error (False Alarm):

    • Reality: The button makes no difference.
    • Test Result: Significant (p<0.05).
    • Consequence: Engineers waste time implementing a useless change. We think we improved, but we didn't. (5% chance of this).
  2. Type II Error (Missed Opportunity):

    • Reality: The button increases sales by 5%.
    • Test Result: Not Significant (p>0.05).
    • Consequence: We discard a winning idea because our sample size was too small to detect it. (20% chance of this; the power sketch below shows how sample size drives this risk.)
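
To see how the missed-opportunity scenario comes about, here is a minimal sketch of the sample size the test would need. It assumes a baseline conversion rate of 10% and reads the "5%" as a relative lift (10.0% to 10.5%); both assumptions are illustrative, not taken from the scenario.

# Sketch: per-group sample size needed to detect the lift at alpha = 0.05 with 80% power.
# Assumptions (illustrative, not from the scenario): baseline conversion = 10%,
# and "5%" interpreted as a relative lift, i.e. 10.0% -> 10.5%.
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

effect = proportion_effectsize(0.105, 0.10)  # Cohen's h for the two conversion rates
n_per_group = NormalIndPower().solve_power(effect_size=effect, alpha=0.05, power=0.80)
print(f"Required n per group: {n_per_group:,.0f}")

Under these assumptions the required sample size runs into the tens of thousands per group, so a test with only a few hundred visitors per arm would be very likely to miss the effect.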

Controlling Errors

Strategy                    | Effect on Type I (α)        | Effect on Type II (β)
Decreasing α (0.05 → 0.01)  | Decreases                   | Increases (harder to detect real effects)
Increasing sample size (n)  | No change (fixed by design) | Decreases (power increases)
One-tailed test             | No change                   | Decreases (for that direction only)
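
The last two rows of the table can be checked with a quick power calculation. The sketch below compares a two-sided and a one-sided test at the same n, then solves for the per-group sample size that lifts power to 0.80; the effect size d = 0.5 is an illustrative assumption.

# Sketch: how the table's strategies move beta, holding alpha at 0.05.
# d = 0.5 is an illustrative assumption.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# One-tailed vs two-tailed test at the same n: beta drops for the favored direction.
for alt in ("two-sided", "larger"):
    power = analysis.solve_power(effect_size=0.5, nobs1=50, alpha=0.05, alternative=alt)
    print(f"{alt:>9}: beta = {1 - power:.2f}")

# Larger n: solve for the per-group sample size that brings power up to 0.80.
n_needed = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"n per group for 80% power: {n_needed:.0f}")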

Limitations & Pitfalls

Common Traps

  1. P-Hacking (Alpha Inflation): Running 20 different tests and reporting the one that worked. This inflates the family-wise Type I error rate to roughly 64% (1 - 0.95^20 ≈ 0.64; see the quick check after this list).
  2. Overpowering: With massive sample sizes (n=1,000,000), even tiny, irrelevant differences become "statistically significant" (Low Type II error, but practical insignificance).
  3. Ignoring Power: Running a study with n=10 is often a waste of time because Type II error is nearly guaranteed if the effect is small.
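
A one-line check of the alpha-inflation arithmetic from point 1: the chance of at least one false positive across m independent tests at level α is 1 - (1 - α)^m.

# Quick check of the alpha-inflation arithmetic from point 1.
alpha, m = 0.05, 20
family_wise_rate = 1 - (1 - alpha) ** m
print(f"P(at least one false positive in {m} tests) = {family_wise_rate:.2f}")  # ~0.64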


Python Implementation

# Alpha is chosen by the study design; beta then follows from the effect size,
# sample size, and alpha. Use power analysis to quantify the tradeoff.

from statsmodels.stats.power import TTestIndPower
analysis = TTestIndPower()

# Given: d=0.5, n=50, alpha=0.05
# What is the Type II error rate (beta)?
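# Leaving `power` unspecified tells solve_power to solve for it.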
power = analysis.solve_power(effect_size=0.5, nobs1=50, alpha=0.05)
beta = 1 - power
print(f"Power: {power:.2f}")
print(f"Beta (Type II Error): {beta:.2f}")

R Implementation

library(pwr)

# Calculate Power
result <- pwr.t.test(d = 0.5, n = 50, sig.level = 0.05)
power <- result$power
beta <- 1 - power

cat("Power:", round(power, 3), "\n")
cat("Beta (Type II Error):", round(beta, 3), "\n")

Interpretation Guide

Scenario          | Implication
α = 0.05          | We accept a 1 in 20 chance of a false positive.
Power = 0.80      | We accept a 1 in 5 chance of missing a real effect.
Large n, small p  | Likely a real effect, but check the effect size for practical relevance.
Small n, high p   | Inconclusive: could be no effect, or could be a Type II error.