Fisher's Exact Test

Definition

Core Statement

Fisher's Exact Test assesses whether there is a significant association between two categorical variables. It is designed for small sample sizes, where the Chi-Square test's approximations are unreliable. It calculates the exact probability of observing a table at least as extreme as the one observed, under the null hypothesis of independence.


Purpose

  1. Test for association when expected cell counts are < 5.
  2. Provide exact p-values without relying on asymptotic approximations.

When to Use

Use Fisher's Exact Test When...

  • Contingency table is 2x2.
  • Any expected cell count < 5.
  • Sample size is small.
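
The expected-count rule in the list above can be checked directly from a table's margins. The following is a minimal sketch, assuming SciPy is installed; the helper name `needs_fisher`, its `threshold` argument, and the example counts are illustrative and not part of any library.

```python
import numpy as np
from scipy.stats.contingency import expected_freq


def needs_fisher(table, threshold=5.0):
    """Return True if any expected cell count under independence is below threshold."""
    expected = expected_freq(np.asarray(table))
    return bool((expected < threshold).any())


# Illustrative counts: rows are groups, columns are outcomes.
small = [[1, 9], [5, 5]]      # expected counts are row total * column total / N
large = [[30, 70], [50, 50]]

print(expected_freq(np.asarray(small)))
print("small table needs Fisher:", needs_fisher(small))
print("large table needs Fisher:", needs_fisher(large))
```

The threshold of 5 is the rule of thumb stated above; it is a convention rather than a hard boundary.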

Modern Practice

Fisher's Exact Test can be computationally intensive for large tables, but modern computers handle it easily. It is often used as a default for 2x2 tables regardless of cell counts.


Theoretical Background

Worked Example: Rare Side Effect

Problem

You test a new drug vs placebo.

  • Treatment (n=10): 9 Healthy, 1 Side Effect.
  • Placebo (n=10): 5 Healthy, 5 Side Effects.

Question: Is the drug safer? (Does it reduce side effects?)
The Chi-Square approximation is unreliable here because the expected count in each Side Effect cell is 10 × 6 / 20 = 3, which is below 5. We need Fisher's.

Solution:

Table:

            Side Effect   Healthy   Total
Drug        1 (a)         9 (b)     10
Placebo     5 (c)         5 (d)     10
Total       6             14        20

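  1. Calculate Probability of Observed Table:
    Using the hypergeometric probability formula:

    $$p_{\text{obs}} = \frac{\binom{1+9}{1}\binom{5+5}{5}}{\binom{20}{6}} = \frac{\binom{10}{1}\binom{10}{5}}{\binom{20}{6}} = \frac{10 \times 252}{38760} = \frac{2520}{38760} \approx 0.065$$
  2. Calculate More Extreme Tables:

    • Table with 0 Side Effects in Drug group (Treatment even better).
    • $P(0\ \text{SE}) = \frac{\binom{10}{0}\binom{10}{6}}{\binom{20}{6}} = \frac{1 \times 210}{38760} \approx 0.0054$.
  3. Total One-Sided P-Value:
    $p = p_{\text{obs}} + p_{\text{extreme}} = 0.065 + 0.0054 \approx 0.0704$.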

Conclusion: The one-sided p ≈ 0.07 exceeds α = 0.05, so we fail to reject H0. Even though 1 vs 5 side effects looks like a large difference, with only 10 patients per group it is compatible with chance.
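
The arithmetic in this worked example can be reproduced in a few lines. The following is a minimal sketch, assuming SciPy is installed; the variable names are illustrative. It computes the two table probabilities with binomial coefficients and compares their sum with SciPy's one-sided result.

```python
from math import comb
from scipy.stats import fisher_exact

# Rows: Drug, Placebo. Columns: Side Effect, Healthy.
table = [[1, 9], [5, 5]]

denom = comb(20, 6)                              # ways to place 6 side effects among 20 patients
p_obs = comb(10, 1) * comb(10, 5) / denom        # observed table: 1 side effect on drug
p_extreme = comb(10, 0) * comb(10, 6) / denom    # more extreme table: 0 side effects on drug
p_manual = p_obs + p_extreme

# alternative="less" tests whether the odds of a side effect are lower in row 1 (Drug).
_, p_scipy = fisher_exact(table, alternative="less")

print(f"p_obs = {p_obs:.4f}, p_extreme = {p_extreme:.4f}")
print(f"one-sided p (manual) = {p_manual:.4f}")
print(f"one-sided p (SciPy)  = {p_scipy:.4f}")
```

Which tail counts as "more extreme" depends on how the table is oriented, so the `alternative` argument has to match the row and column order used here.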


Theoretical Background

Hypergeometric Distribution

Fisher's test assumes the row and column totals are fixed. The probability of observing exactly a successes in the first group is given by the probability mass function of the Hypergeometric distribution:

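$$P(X = a) = \frac{\binom{n_1}{a}\binom{n_2}{k-a}}{\binom{N}{k}}$$

Where:

  • $a$ is the number of successes in the first group.
  • $n_1$ and $n_2$ are the sizes of the first and second groups (the row totals).
  • $k$ is the total number of successes (the first column total).
  • $N = n_1 + n_2$ is the grand total.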
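
The probability mass function above can be evaluated for every table that shares the same margins. The following is a minimal sketch, assuming SciPy is installed; `table_probability` is a hypothetical helper, and the margins (two groups of 10 with 6 successes overall) are illustrative.

```python
from math import comb
from scipy.stats import hypergeom


def table_probability(a, n1, n2, k):
    """P(X = a): probability of a successes in group 1, given fixed margins."""
    return comb(n1, a) * comb(n2, k - a) / comb(n1 + n2, k)


n1, n2, k = 10, 10, 6
# Values of a that are feasible given the fixed margins.
support = range(max(0, k - n2), min(n1, k) + 1)
pmf = {a: table_probability(a, n1, n2, k) for a in support}

# SciPy's parameterization: hypergeom(M=population size, n=success states, N=draws).
scipy_pmf = hypergeom(n1 + n2, k, n1).pmf(1)

print(f"P(X = 1) by formula: {pmf[1]:.6f}")
print(f"P(X = 1) via SciPy:  {scipy_pmf:.6f}")
print(f"sum over support:    {sum(pmf.values()):.6f}")
```

Because the margins are fixed, the probabilities over all feasible values of a sum to 1, which is what makes exact tail sums possible.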

Odds Ratio (Conditional MLE)

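Some implementations of Fisher's test (for example, R's fisher.test) report the Conditional Maximum Likelihood Estimate (CMLE) of the Odds Ratio, which conditions on the fixed margins and generally differs from the simple sample odds ratio ($ad/bc$). The sample OR is 0 or ∞ if a cell is zero. The CMLE is also 0 or ∞ in that case, so it is the exact confidence interval, not the point estimate, that remains informative.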
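
The conditional estimate has no closed form, so it is found numerically. The following is a minimal sketch, assuming SciPy's `brentq` root finder; `conditional_mle_or` is a hypothetical helper, not a library function. It solves for the odds ratio at which the expected top-left count, under Fisher's noncentral hypergeometric distribution with the observed margins, equals the observed count. The search bounds and the float arithmetic are assumptions that suit small tables only.

```python
from math import comb, exp
from scipy.optimize import brentq


def conditional_mle_or(table):
    """Conditional MLE of the odds ratio for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n1, n2, k = a + b, c + d, a + c
    lo, hi = max(0, k - n2), min(n1, k)   # feasible range of a given the margins
    if lo == hi:
        return float("nan")               # margins allow only one table
    if a == lo:
        return 0.0                        # observed count at the lower boundary
    if a == hi:
        return float("inf")               # observed count at the upper boundary
    xs = range(lo, hi + 1)
    coef = [comb(n1, x) * comb(n2, k - x) for x in xs]

    def mean_minus_a(log_psi):
        # Weights are scaled relative to x = a to stay within floating-point range.
        w = [cf * exp(log_psi * (x - a)) for cf, x in zip(coef, xs)]
        return sum(x * wi for x, wi in zip(xs, w)) / sum(w) - a

    # The conditional mean increases with the odds ratio, so there is a single root.
    return exp(brentq(mean_minus_a, -30.0, 30.0))


table = [[8, 2], [1, 5]]
sample_or = (8 * 5) / (2 * 1)
cmle = conditional_mle_or(table)
print(f"sample OR = {sample_or:.2f}, conditional MLE = {cmle:.2f}")
```

For this table the conditional estimate comes out smaller than the sample odds ratio of 20. Both boundary branches return 0 or infinity, which mirrors the zero-cell behaviour of the sample odds ratio.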


Assumptions


Limitations

Pitfalls

  1. Computationally Intensive for Large Tables: For tables larger than 2x2 with large counts, computation can be slow.
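  2. Conservative: Because the test statistic is discrete, only certain p-values are attainable, so the actual Type I error rate is often below the nominal α (p-values somewhat larger than necessary).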
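
The conservativeness in item 2 can be made concrete by listing every attainable one-sided p-value for a fixed set of margins. The following is a minimal sketch using only the standard library; `lower_tail_pvalues` is a hypothetical helper, and the margins (two groups of 10 with 6 events overall) are illustrative.

```python
from math import comb


def lower_tail_pvalues(n1, n2, k):
    """Map each feasible count a to (P(X = a), one-sided p-value P(X <= a))."""
    lo, hi = max(0, k - n2), min(n1, k)
    total = comb(n1 + n2, k)
    out, cum = {}, 0.0
    for a in range(lo, hi + 1):
        prob = comb(n1, a) * comb(n2, k - a) / total
        cum += prob
        out[a] = (prob, cum)
    return out


alpha = 0.05
dist = lower_tail_pvalues(10, 10, 6)
for a, (prob, p) in dist.items():
    print(f"a = {a}: P(X = a) = {prob:.4f}, one-sided p = {p:.4f}")

# Rejection probability under H0: total mass of outcomes whose p-value is <= alpha.
actual_size = sum(prob for prob, p in dist.values() if p <= alpha)
print(f"nominal alpha = {alpha}, actual size = {actual_size:.4f}")
```

With these margins only the most extreme table has a p-value below 0.05, so the test rejects far less often than the nominal level suggests.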


Python Implementation

from scipy.stats import fisher_exact
import numpy as np

# 2x2 Table
#          Disease+  Disease-
# Exposed+     8        2
# Exposed-     1        5
table = np.array([[8, 2], [1, 5]])

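# Two-sided test by default (alternative="two-sided").
# SciPy reports the sample (unconditional) odds ratio a*d / (b*c), here 8*5 / (2*1) = 20.
odds_ratio, p_val = fisher_exact(table)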

print(f"Odds Ratio: {odds_ratio:.2f}")
print(f"p-value: {p_val:.4f}")

R Implementation

# 2x2 Table
tbl <- matrix(c(8, 2, 1, 5), nrow = 2, byrow = TRUE)

result <- fisher.test(tbl)
print(result)

# Output includes:
# - p-value
# - Odds Ratio (conditional MLE, so it differs from SciPy's sample odds ratio)
# - 95% CI for OR

Interpretation Guide

Output                  Interpretation
p < 0.05                The association is statistically significant at α = 0.05.
OR = 20                 Exposed group has 20x the odds of disease compared to unexposed.
OR = ∞ (or 0)           One cell is zero (e.g., no cases in treatment group). Perfect separation.
OR 95% CI excludes 1    The effect is statistically significant.