Binomial Distribution
Definition
The Binomial Distribution models the number of successes in a fixed number of independent trials, where each trial has the same probability of success. It answers: "If I flip a coin 10 times, what's the probability of getting exactly 6 heads?"
Purpose
- Model discrete outcomes with two possibilities (success/failure).
- Calculate probabilities for quality control, polling, and A/B testing.
- Foundation for proportion tests and confidence intervals for proportions.
When to Use
- Fixed number of independent trials ($n$).
- Each trial has two outcomes (success or failure).
- Constant probability of success ($p$) across trials.
- Trials are independent.
Theoretical Background
Notation
$$X \sim \text{Binomial}(n, p)$$
where:
- $n$ = number of trials
- $p$ = probability of success on each trial
- $k$ = number of successes
Probability Mass Function (PMF)
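$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$
where $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$ is the binomial coefficient, the number of ways to choose which $k$ of the $n$ trials are successes.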
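As a minimal sketch, the PMF can be evaluated directly from its closed form (binomial coefficient times $p^k (1-p)^{n-k}$) using only the standard library. The function name `binomial_pmf` is an illustrative choice, not part of any library; the example reuses the coin-flip question from the definition.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p), from the closed-form PMF."""
    if not 0 <= k <= n:
        return 0.0  # k lies outside the support {0, ..., n}
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 6 heads in 10 fair coin flips.
prob_six_heads = binomial_pmf(6, 10, 0.5)
print(f"P(X = 6) = {prob_six_heads:.4f}")

# Sanity check: the probabilities over k = 0..n should sum to 1.
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))
print(f"Sum over support = {total:.4f}")
```

For large $n$ the intermediate terms can underflow in floating point; a library routine such as `scipy.stats.binom.pmf` is the safer choice there.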
Properties
| Property | Formula |
|---|---|
| Mean | $\mu = np$ |
| Variance | $\sigma^2 = np(1-p)$ |
| Standard Deviation | $\sigma = \sqrt{np(1-p)}$ |
| Skewness | $\dfrac{1-2p}{\sqrt{np(1-p)}}$ |
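The closed-form moments of a binomial variable (mean $np$, variance $np(1-p)$, skewness $(1-2p)/\sqrt{np(1-p)}$) can be checked numerically by summing over the PMF. The sketch below assumes $0 < p < 1$ so the variance is non-zero; the helper names and the values $n = 10$, $p = 0.3$ are illustrative only.

```python
from math import comb, sqrt

def binomial_moments(n: int, p: float) -> dict:
    """Closed-form mean, variance, standard deviation, and skewness."""
    variance = n * p * (1 - p)
    return {
        "mean": n * p,
        "variance": variance,
        "sd": sqrt(variance),
        "skewness": (1 - 2 * p) / sqrt(variance),
    }

def moments_from_pmf(n: int, p: float) -> dict:
    """The same quantities obtained by summing over the PMF (numerical check)."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
    third = sum((k - mean) ** 3 * pk for k, pk in enumerate(pmf))
    return {
        "mean": mean,
        "variance": variance,
        "sd": sqrt(variance),
        "skewness": third / variance**1.5,
    }

closed = binomial_moments(10, 0.3)
summed = moments_from_pmf(10, 0.3)
for name in closed:
    print(f"{name:>9}: closed-form {closed[name]:.4f}, from PMF {summed[name]:.4f}")
```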
Approximations
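Normal approximation: For large $n$, $X \approx \mathcal{N}\big(np,\ np(1-p)\big)$.
Rule of thumb: Valid if both $np$ and $n(1-p)$ are large enough; commonly cited thresholds are 5 or 10.
Poisson approximation: For large $n$ and small $p$, $X \approx \text{Poisson}(\lambda = np)$.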
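To get a feel for how close the normal and Poisson approximations are, the sketch below compares each against the exact binomial value using only the standard library. The parameter choices ($n = 100$, $p = 0.5$ and $n = 1000$, $p = 0.002$) are illustrative, and the 0.5 continuity correction in the normal case is a common refinement added here as an assumption, not something the text prescribes.

```python
from math import comb, erf, exp, factorial, sqrt

def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k: int, n: int, p: float) -> float:
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def poisson_pmf(k: int, lam: float) -> float:
    return exp(-lam) * lam**k / factorial(k)

# Normal approximation to P(X <= 55) with n = 100, p = 0.5.
n, p = 100, 0.5
exact_cdf = binom_cdf(55, n, p)
normal_approx = normal_cdf(55 + 0.5, n * p, sqrt(n * p * (1 - p)))  # continuity correction
print(f"P(X <= 55): exact {exact_cdf:.4f}, normal {normal_approx:.4f}")

# Poisson approximation to P(X = 3) with n = 1000, p = 0.002 (lambda = np = 2).
n_rare, p_rare = 1000, 0.002
exact_pmf = binom_pmf(3, n_rare, p_rare)
poisson_approx = poisson_pmf(3, n_rare * p_rare)
print(f"P(X = 3):   exact {exact_pmf:.4f}, Poisson {poisson_approx:.4f}")
```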
Worked Example: Quality Control
A factory produces light bulbs with a defect rate of 2% ($p = 0.02$).
A quality inspector randomly selects a batch of 20 bulbs ($n = 20$).
Questions:
- What is the probability that exactly 2 bulbs are defective?
- What is the probability that at least 1 bulb is defective?
Solution:
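1. Probability of exactly 2 defects ($k = 2$):
$$P(X = 2) = \binom{20}{2}(0.02)^2(0.98)^{18} = 190 \times 0.0004 \times 0.6951 \approx 0.0528$$
Result: ~5.3% chance of finding exactly 2 bad bulbs.
2. Probability of at least 1 defect ($X \geq 1$):
It's easier to calculate the complement, the probability of no defects at all, and subtract it from 1:
$$P(X \geq 1) = 1 - P(X = 0) = 1 - (0.98)^{20} \approx 1 - 0.6676 = 0.3324$$
Result: ~33.2% chance of finding at least one bad bulb in the batch.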
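The two results can be reproduced in a few lines. This is a standard-library sketch using the example's values (20 bulbs, 2% defect rate); the helper name `pmf` is illustrative.

```python
from math import comb

n, p = 20, 0.02  # batch size and defect rate from the example

def pmf(k: int) -> float:
    """P(X = k) for the 20-bulb batch."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_exactly_two = pmf(2)
p_at_least_one = 1 - pmf(0)  # complement of "no defective bulbs"

print(f"P(X = 2)  = {p_exactly_two:.4f}")
print(f"P(X >= 1) = {p_at_least_one:.4f}")
```

With SciPy available, `binom.pmf(2, 20, 0.02)` and `binom.sf(0, 20, 0.02)` should give the same two values.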
Assumptions
Limitations
- Independence Violation (Clustering): If defects happen in clusters (e.g., a machine breaks down and produces 10 bad bulbs in a row), independence is violated. The Binomial model will underestimate the probability of extreme outcomes.
- Overdispersion: If the observed variance is significantly larger than $np(1-p)$, the data is overdispersed. Use the Beta-Binomial Distribution instead.
- Variable $p$: If the defect rate changes during the day, a simple Binomial model is invalid.
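One rough way to screen for overdispersion is to compare the sample variance of observed counts with the variance a binomial model would imply, $n\hat{p}(1-\hat{p})$. The sketch below does this for equally sized batches. The function name `dispersion_ratio` and both data sets are made up for illustration, and a ratio well above 1 is only a hint, not a formal test.

```python
from statistics import mean, variance

def dispersion_ratio(counts: list, n: int) -> float:
    """Sample variance of the counts divided by the binomial variance n*p_hat*(1-p_hat)."""
    p_hat = mean(counts) / n  # estimated per-trial success probability
    expected_var = n * p_hat * (1 - p_hat)
    if expected_var == 0:
        raise ValueError("all counts are 0 or all equal n; ratio is undefined")
    return variance(counts) / expected_var

# Hypothetical defect counts for 10 batches of 20 bulbs each.
steady = [0, 1, 0, 0, 1, 0, 2, 0, 1, 0]     # defects scattered across batches
clustered = [0, 0, 0, 7, 0, 0, 0, 1, 0, 0]  # one batch hit by a machine fault

steady_ratio = dispersion_ratio(steady, n=20)
clustered_ratio = dispersion_ratio(clustered, n=20)
print(f"steady:    {steady_ratio:.2f}")
print(f"clustered: {clustered_ratio:.2f}")
```

A ratio near 1 is consistent with the binomial variance, while a much larger value suggests clustering or a varying defect rate.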
Python Implementation
from scipy.stats import binom
import numpy as np
import matplotlib.pyplot as plt
# Binomial(n=10, p=0.5)
n, p = 10, 0.5
dist = binom(n, p)
# P(X = 6)
prob_6 = dist.pmf(6)
print(f"P(X = 6 | n={n}, p={p}): {prob_6:.4f}")
# P(X <= 7)
prob_le_7 = dist.cdf(7)
print(f"P(X ≤ 7): {prob_le_7:.4f}")
# Visualize PMF
x = np.arange(0, n+1)
plt.bar(x, dist.pmf(x), alpha=0.7, edgecolor='black')
plt.xlabel('Number of Successes (k)')
plt.ylabel('P(X = k)')
plt.title(f'Binomial Distribution (n={n}, p={p})')
plt.xticks(x)
plt.grid(axis='y', alpha=0.3)
plt.show()
R Implementation
# Binomial(n=10, p=0.5)
n <- 10
p <- 0.5
# P(X = 6)
dbinom(6, size = n, prob = p)
# P(X <= 7)
pbinom(7, size = n, prob = p)
# Random sample
rbinom(20, size = n, prob = p)
# Visualize
x <- 0:n
plot(x, dbinom(x, size = n, prob = p), type = "h", lwd = 3,
xlab = "Number of Successes", ylab = "Probability",
main = paste("Binomial(n=", n, ", p=", p, ")", sep=""))
points(x, dbinom(x, size = n, prob = p), pch = 16, col = "blue")
Interpretation Guide
| Scenario | Interpretation |
|---|---|
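| $n = 100$, $p = 0.5$ | Mean = 50. Typical coin flip scenario. Symmetric distribution. |
| Large $n$, very small $p$ | Rare Events. Distribution is highly right-skewed. Better modeled by Poisson. |
| $P(X \geq 1) = 1 - (1-p)^n$ | "At Least One" Risk. Even with low $p$, the chance of at least one success grows quickly as $n$ increases. |
| Spread vs Mean | Standard Deviation $\sqrt{np(1-p)}$ grows like $\sqrt{n}$ while the mean $np$ grows like $n$, so the relative spread shrinks as $n$ increases. |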
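The "At Least One" risk in the guide above follows from the complement rule, $P(X \geq 1) = 1 - (1-p)^n$. As a small sketch, the snippet below tabulates it for a fixed 2% per-trial probability (the defect rate from the worked example) and a few illustrative values of $n$.

```python
def at_least_one(n: int, p: float) -> float:
    """P(X >= 1) for X ~ Binomial(n, p), via the complement of zero successes."""
    return 1 - (1 - p) ** n

p = 0.02  # per-trial probability, taken from the worked example
risks = {n: at_least_one(n, p) for n in (1, 20, 50, 100)}

for n, risk in risks.items():
    print(f"n = {n:>3}: P(X >= 1) = {risk:.4f}")
```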
Related Concepts
- Normal Distribution - Approximation for large $n$.
- Poisson Distribution - Approximation for rare events.
- Hypergeometric Distribution - For sampling without replacement.
- Bernoulli Distribution - Special case with $n = 1$.