Poisson Distribution
Definition
Core Statement
The Poisson Distribution models the probability of a given number of events occurring in a fixed interval of time or space, assuming these events occur with a known constant mean rate and independently of the time since the last event.
Purpose
- Model counts of rare events (e.g., car accidents, emails per hour, typos per page).
- Baseline model for Count Regression (Poisson Regression).
- Approximation for Binomial Distribution when $n$ is large and $p$ is small.
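The binomial approximation can be checked numerically. The sketch below compares the two PMFs directly; the values `n = 1000` and `p = 0.005` are illustrative assumptions chosen so that `n * p = 5`, and the helper names are hypothetical.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), using the direct formula."""
    return lam**k * math.exp(-lam) / math.factorial(k)

# Illustrative values (assumptions): many trials, small success probability.
n, p = 1000, 0.005
lam = n * p  # matching Poisson rate

# Largest pointwise gap between the two PMFs over k = 0..20.
max_abs_diff = max(
    abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(21)
)
print(f"largest PMF gap for k = 0..20: {max_abs_diff:.5f}")
```

With a smaller `n` and a larger `p` (same product), the gap should grow, which is why the approximation is usually reserved for rare events over many trials.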
When to Use
Use Poisson When...
- Counting discrete events ($k = 0, 1, 2, \dots$).
- Events are independent.
- The average rate ($\lambda$) is constant.
- Two events cannot occur at the exact same instant.
Theoretical Background
Notation
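$$X \sim \text{Poisson}(\lambda)$$
where $\lambda > 0$ is the average number of events per interval (the rate).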
Probability Mass Function (PMF)
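$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \dots$$
where $e \approx 2.71828$ (Euler's number) and $k!$ is the factorial of $k$.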
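A minimal sketch of the PMF $\lambda^k e^{-\lambda} / k!$ in plain Python follows. It evaluates the formula in log space via `math.lgamma`, a design choice made here because $\lambda^k$ and $k!$ overflow floating point for large $k$. The function name and the rate `lam = 5.0` are assumptions for illustration.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) = lam^k * e^(-lam) / k!, evaluated in log space."""
    if k < 0:
        return 0.0
    # log(k!) equals lgamma(k + 1), which avoids building a huge factorial.
    log_p = k * math.log(lam) - lam - math.lgamma(k + 1)
    return math.exp(log_p)

lam = 5.0  # illustrative rate (assumption)
probs = [poisson_pmf(k, lam) for k in range(60)]
total = sum(probs)  # should be very close to 1 for this range of k

print(f"P(X=3) = {probs[3]:.4f}, sum over k < 60 = {total:.6f}")
```

The sum over a wide enough range of $k$ acting as a sanity check is worth keeping in any hand-rolled PMF.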
Properties
| Property | Formula |
|---|---|
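| Mean | $\lambda$ |
| Variance | $\lambda$ |
| Standard Deviation | $\sqrt{\lambda}$ |
| Mode | $\lfloor \lambda \rfloor$ (and also $\lambda - 1$ when $\lambda$ is an integer) |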
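These properties can be confirmed with `scipy.stats.poisson`. The sketch below assumes a non-integer rate of 4.2 (an arbitrary choice that avoids a tie between two modes) and finds the mode by scanning the PMF rather than by formula.

```python
import math
from scipy.stats import poisson

lam = 4.2  # illustrative, non-integer rate (assumption)
dist = poisson(mu=lam)

mean = dist.mean()
variance = dist.var()
std_dev = dist.std()

# Locate the mode by scanning the PMF over a range that covers the bulk of the mass.
ks = range(0, 50)
mode = max(ks, key=dist.pmf)

print(f"mean={mean:.3f}, variance={variance:.3f}, std={std_dev:.3f}, mode={mode}")
```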
Equidispersion
A key property of Poisson is that Mean = Variance ($E[X] = \operatorname{Var}(X) = \lambda$).
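Why the mean and variance coincide can be seen from a short derivation. This is a sketch that assumes the standard PMF $P(X = k) = \lambda^k e^{-\lambda} / k!$ and uses the series $\sum_{j \ge 0} \lambda^j / j! = e^{\lambda}$:

$$
\begin{aligned}
E[X] &= \sum_{k=1}^{\infty} k \, \frac{\lambda^k e^{-\lambda}}{k!}
      = \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!}
      = \lambda, \\
E[X(X-1)] &= \sum_{k=2}^{\infty} k(k-1) \, \frac{\lambda^k e^{-\lambda}}{k!}
      = \lambda^2 e^{-\lambda} \sum_{k=2}^{\infty} \frac{\lambda^{k-2}}{(k-2)!}
      = \lambda^2, \\
\operatorname{Var}(X) &= E[X(X-1)] + E[X] - \left(E[X]\right)^2
      = \lambda^2 + \lambda - \lambda^2
      = \lambda.
\end{aligned}
$$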
Worked Example: Fast Food Drive-Thru
Problem
A fast food drive-thru gets an average of $\lambda = 5$ cars per minute.
Questions:
- What is the probability of exactly 3 cars arriving in a minute?
- What is the probability of 0 cars (a quiet minute)?
Solution:
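1. Exactly 3 cars:
$$P(X = 3) = \frac{5^3 e^{-5}}{3!} = \frac{125 \times 0.006738}{6} \approx 0.1404$$
Result: ~14% chance.
2. Zero cars:
$$P(X = 0) = \frac{5^0 e^{-5}}{0!} = e^{-5} \approx 0.0067$$
Result: ~0.67% chance (very rare to be empty).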
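Both results can be cross-checked by simulation. The sketch below assumes a rate of 5 cars per minute (the value that the ~14% and ~0.67% figures correspond to) and draws a large number of simulated minutes. The seed and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable
lam = 5              # average cars per minute (assumed rate for this example)
n_minutes = 200_000  # number of simulated minutes (arbitrary choice)

# One Poisson draw per simulated minute.
arrivals = rng.poisson(lam=lam, size=n_minutes)

# Fraction of minutes with exactly 3 cars, and with no cars at all.
p3_est = np.mean(arrivals == 3)
p0_est = np.mean(arrivals == 0)

print(f"estimated P(X=3): {p3_est:.4f}")
print(f"estimated P(X=0): {p0_est:.4f}")
```

The estimates should land close to the analytic values, with the remaining gap shrinking as `n_minutes` grows.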
Assumptions
Limitations
Pitfalls
- Overdispersion: Real data often has variance > mean (clumping). Using Poisson here yields falsely small standard errors.
- Zero-Inflation: If you have many more zeros than predicted (e.g., store is closed), use Zero-Inflated Poisson (ZIP).
- Variable Rate: If the rate changes over time (rush hour vs night), use a non-homogeneous Poisson process.
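A quick diagnostic for the overdispersion pitfall is the dispersion index (sample variance divided by sample mean), which should sit near 1 for Poisson data. The sketch below runs it on synthetic data only; the function name, seed, and negative binomial parameters are assumptions chosen so that both samples have mean 5.

```python
import numpy as np

def dispersion_index(counts) -> float:
    """Sample variance divided by sample mean; about 1 for Poisson data."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()

rng = np.random.default_rng(seed=1)

# Synthetic Poisson counts with mean 5 (variance should also be about 5).
poisson_counts = rng.poisson(lam=5, size=50_000)

# Synthetic clumped counts: negative binomial with n=2, p=2/7
# has mean n(1-p)/p = 5 and variance n(1-p)/p^2 = 17.5.
clumped_counts = rng.negative_binomial(n=2, p=2 / 7, size=50_000)

d_poisson = dispersion_index(poisson_counts)
d_clumped = dispersion_index(clumped_counts)

print(f"dispersion index, Poisson sample: {d_poisson:.2f}")
print(f"dispersion index, clumped sample: {d_clumped:.2f}")
```

An index well above 1 on real data suggests that a Negative Binomial model is worth considering before trusting Poisson standard errors.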
Python Implementation
from scipy.stats import poisson
import numpy as np
import matplotlib.pyplot as plt
# Lambda = 5
lam = 5
dist = poisson(mu=lam)
# P(X = 3)
p_3 = dist.pmf(3)
print(f"P(X=3): {p_3:.4f}")
# Visualize
x = np.arange(0, 15)
plt.bar(x, dist.pmf(x), alpha=0.7)
plt.title(f"Poisson Distribution (λ={lam})")
plt.xlabel("Number of Events")
plt.ylabel("Probability")
plt.show()
R Implementation
# Lambda = 5
lam <- 5
# P(X = 3)
dpois(3, lambda = lam)
# P(X <= 2)
ppois(2, lambda = lam)
# Random Sample
rpois(10, lambda = lam)
Interpretation Guide
| Scenario | Interpretation |
|---|---|
| Small $\lambda$ | Rare events. Distribution is right-skewed. Mode at 0 or 1. |
| Large $\lambda$ | Frequent events. Distribution looks symmetric (approximately Normal). |
| Var > Mean | Overdispersion. Model assumption violated. |
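The shift from right-skewed to roughly symmetric can be quantified through the skewness, which for a Poisson distribution equals $1 / \sqrt{\lambda}$. The sketch below reads it from `scipy.stats.poisson.stats`; the list of rates is an illustrative assumption.

```python
from scipy.stats import poisson

# Illustrative rates (assumptions), from rare to frequent events.
rates = [0.5, 2, 10, 50]

# moments="s" asks scipy for the skewness only.
skewness = {lam: float(poisson.stats(lam, moments="s")) for lam in rates}

for lam, s in skewness.items():
    print(f"lambda={lam:>4}: skewness={s:.3f}")
```

Skewness shrinking toward 0 as the rate grows is what makes the Normal approximation reasonable for frequent events.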
Related Concepts
- Poisson Regression - Predicting counts with covariates.
- Exponential Distribution - Time between Poisson events.
- Negative Binomial Regression - For overdispersed counts.
- Binomial Distribution - Poisson is the limit as $n \to \infty$ and $p \to 0$ with $np = \lambda$ held fixed.
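The link to the Exponential Distribution can be illustrated by building a Poisson process from exponential gaps and then counting events per unit interval. This is a sketch under assumed values (rate 5, 20,000 intervals, fixed seed), not a general-purpose simulator.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
lam = 5.0         # events per unit time (illustrative assumption)
horizon = 20_000  # number of unit-length intervals to simulate

# Gaps between consecutive events are Exponential with mean 1 / lam.
# Draw 20% more gaps than expected so the event times pass the horizon.
n_gaps = int(lam * horizon * 1.2)
event_times = np.cumsum(rng.exponential(scale=1 / lam, size=n_gaps))
event_times = event_times[event_times < horizon]

# Count events falling in each interval [0, 1), [1, 2), ...
counts = np.bincount(event_times.astype(int), minlength=horizon)

count_mean = counts.mean()
count_var = counts.var(ddof=1)
print(f"mean count: {count_mean:.3f}, variance: {count_var:.3f}")
```

The per-interval counts should show a mean and variance both close to the rate, matching the Poisson properties.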