Poisson Distribution

Definition

Core Statement

The Poisson Distribution models the probability of a given number of events occurring in a fixed interval of time or space, assuming these events occur with a known constant mean rate and independently of the time since the last event.


Purpose

  1. Model counts of rare events (e.g., car accidents, emails per hour, typos per page).
  2. Baseline model for Count Regression (Poisson Regression).
  3. Approximation for the Binomial Distribution when n is large and p is small (with λ = np).
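The Binomial approximation in item 3 can be checked numerically. The sketch below is illustrative only: the values n = 1000 and p = 0.005 are assumptions chosen so that λ = np = 5, and the helper names `binomial_pmf` and `poisson_pmf` are hypothetical, not part of any library.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

# Assumed illustrative values: large n, small p, so lam = n * p = 5.
n, p = 1000, 0.005
lam = n * p

# Compare the two PMFs side by side for small counts.
for k in range(0, 11):
    b = binomial_pmf(k, n, p)
    q = poisson_pmf(k, lam)
    print(f"k={k:2d}  binomial={b:.5f}  poisson={q:.5f}  diff={abs(b - q):.5f}")

# Largest pointwise gap over k = 0..30.
max_abs_diff = max(
    abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(0, 31)
)
print(f"max |binomial - poisson| = {max_abs_diff:.5f}")
```

Shrinking p while holding np fixed should make the gap smaller still, which is the sense in which the approximation holds.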

When to Use

Use Poisson When...

  • Counting discrete events (k = 0, 1, 2, …).
  • Events are independent.
  • The average rate (λ) is constant.
  • Two events cannot occur at the exact same instant.
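One way to see why these conditions lead to Poisson counts is to simulate them. The sketch below assumes events are separated by independent exponential waiting times with a constant rate, which is the standard way to generate a constant-rate process. The function name `simulate_counts`, the rate of 5 per interval, and the seed are illustrative assumptions.

```python
import random
from statistics import mean, variance

def simulate_counts(rate: float, n_intervals: int, seed: int = 0) -> list[int]:
    """Count events in n_intervals unit-length intervals.

    Events are separated by independent exponential waiting times
    with the given constant rate (assumed setup for this sketch).
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_intervals):
        t = rng.expovariate(rate)  # time of the first event in this interval
        count = 0
        while t < 1.0:
            count += 1
            t += rng.expovariate(rate)  # waiting time until the next event
        counts.append(count)
    return counts

counts = simulate_counts(rate=5.0, n_intervals=10_000, seed=42)
sample_mean = mean(counts)
sample_var = variance(counts)
print(f"sample mean = {sample_mean:.3f}, sample variance = {sample_var:.3f}")
```

If the conditions hold, both the sample mean and the sample variance should land close to the rate.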


Theoretical Background

Notation

$$X \sim \text{Poisson}(\lambda)$$

where λ (lambda) is the average number of events per interval (λ>0).

Probability Mass Function (PMF)

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots$$
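The PMF translates directly into code. The following is a minimal sketch using only the standard library. The function name `poisson_pmf` is hypothetical, and the cutoff of 60 terms is an assumption that is more than enough for λ = 5.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), computed straight from the formula."""
    if k < 0:
        return 0.0
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 5.0

# The probabilities over all k should sum to 1; the tail beyond k = 59
# is negligible for lam = 5, so a truncated sum is used here.
total = sum(poisson_pmf(k, lam) for k in range(0, 60))
print(f"sum of P(X=k) for k=0..59: {total:.6f}")
print(f"P(X=3) with lam=5: {poisson_pmf(3, lam):.4f}")
```

For large λ or k the direct formula can overflow, so working in log space (for example with `math.lgamma`) is usually the safer choice.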

Properties

Property              Formula
Mean                  E[X] = λ
Variance              Var(X) = λ
Standard Deviation    σ = √λ
Mode                  ⌊λ⌋ (and also λ − 1 when λ is an integer)

Equidispersion

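A key property of the Poisson distribution is equidispersion: the mean equals the variance (both are λ). If the variance exceeds the mean, the data are overdispersed, and a Negative Binomial model is usually a better fit.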
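The mean-equals-variance property can be confirmed numerically by computing both moments from the PMF. This is a sketch under the assumption that truncating the sum at k = 99 loses nothing meaningful for λ = 5. The variable names are illustrative.

```python
import math

lam = 5.0
ks = range(0, 100)  # truncation point is an assumption; the tail is negligible

# Poisson probabilities for each k in the truncated range.
pmf = [lam**k * math.exp(-lam) / math.factorial(k) for k in ks]

# First moment and second central moment, computed from the PMF.
mean_x = sum(k * p for k, p in zip(ks, pmf))
var_x = sum((k - mean_x) ** 2 * p for k, p in zip(ks, pmf))

print(f"mean = {mean_x:.6f}, variance = {var_x:.6f}")
```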


Worked Example: Fast Food Drive-Thru

Problem

A fast food drive-thru gets an average of λ=5 cars per minute during lunch.

Questions:

  1. What is the probability of exactly 3 cars arriving in a minute?
  2. What is the probability of 0 cars (a quiet minute)?

Solution:

1. Exactly 3 cars:

$$P(X = 3) = \frac{5^3 e^{-5}}{3!} = \frac{125 \times 0.0067}{6} \approx 0.140$$

Result: ~14% chance.

2. Zero cars:

$$P(X = 0) = \frac{5^0 e^{-5}}{0!} = \frac{1 \times 0.0067}{1} \approx 0.0067$$

Result: ~0.67% chance (very rare to be empty).


Assumptions


Limitations

Pitfalls

  1. Overdispersion: Real data often has variance > mean (clumping). Fitting a Poisson model to such data understates the standard errors.
  2. Zero-Inflation: If the data contain many more zeros than the model predicts (e.g., the store is closed), use a Zero-Inflated Poisson (ZIP) model.
  3. Variable Rate: If the rate changes over time (rush hour vs night), use a non-homogeneous Poisson process.
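The first two pitfalls can be screened for with simple summary statistics before fitting anything. The sketch below is a rough diagnostic, not a formal test. The function name `poisson_diagnostics` is hypothetical and the `counts` data are made up for illustration.

```python
import math
from statistics import mean, variance

def poisson_diagnostics(counts: list[int]) -> dict[str, float]:
    """Summaries that hint at overdispersion or excess zeros."""
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    return {
        "mean": m,
        "variance": v,
        # Roughly 1 for Poisson data; well above 1 suggests overdispersion.
        "dispersion_index": v / m,
        "observed_zero_fraction": counts.count(0) / len(counts),
        # Under a Poisson model with this mean, P(X = 0) = exp(-mean).
        "expected_zero_fraction": math.exp(-m),
    }

# Made-up counts with many zeros and a few large values.
counts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 9, 0, 0, 12, 1, 0, 7, 0, 0, 4, 0]
diag = poisson_diagnostics(counts)
for name, value in diag.items():
    print(f"{name}: {value:.3f}")
```

A dispersion index far above 1, or an observed zero fraction far above the expected one, is a signal to consider the alternatives listed above.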


Python Implementation

from scipy.stats import poisson
import numpy as np
import matplotlib.pyplot as plt

# Lambda = 5
lam = 5
dist = poisson(mu=lam)

# P(X = 3)
p_3 = dist.pmf(3)
print(f"P(X=3): {p_3:.4f}")

# Visualize
x = np.arange(0, 15)
plt.bar(x, dist.pmf(x), alpha=0.7)
plt.title(f"Poisson Distribution (λ={lam})")
plt.xlabel("Number of Events")
plt.ylabel("Probability")
plt.show()

R Implementation

# Lambda = 5
lam <- 5

# P(X = 3)
dpois(3, lambda = lam)

# P(X <= 2)
ppois(2, lambda = lam)

# Random Sample
rpois(10, lambda = lam)

Interpretation Guide

Scenario      Interpretation
λ = 1         Rare events. Distribution is right-skewed. Modes at 0 and 1 (equal probability).
λ = 20        Frequent events. Distribution looks roughly symmetric (approximately Normal).
Var > Mean    Overdispersion. Model assumption violated.
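The contrast between λ = 1 and λ = 20 can be quantified through skewness, which for a Poisson distribution equals 1/√λ. The sketch below computes it numerically from the PMF. The function names and the truncation at k = 200 are assumptions made for illustration.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Poisson PMF evaluated in log space to avoid overflow for large k."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def skewness(lam: float, k_max: int = 200) -> float:
    """Skewness of Poisson(lam), from moments of the truncated PMF."""
    ks = range(k_max + 1)
    pmf = [poisson_pmf(k, lam) for k in ks]
    mean_x = sum(k * p for k, p in zip(ks, pmf))
    var_x = sum((k - mean_x) ** 2 * p for k, p in zip(ks, pmf))
    third = sum((k - mean_x) ** 3 * p for k, p in zip(ks, pmf))
    return third / var_x**1.5

for lam in (1.0, 20.0):
    print(f"lam={lam:4.1f}  skewness={skewness(lam):.4f}  1/sqrt(lam)={lam ** -0.5:.4f}")
```

The skewness shrinks as λ grows, which is why the distribution looks increasingly symmetric.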