Binomial Distribution

Definition

Core Statement

The Binomial Distribution models the number of successes in a fixed number of independent trials, where each trial has the same probability of success. It answers: "If I flip a coin 10 times, what's the probability of getting exactly 6 heads?"

Purpose

Model the number of successes in repeated trials that each have two possible outcomes (success/failure).
Calculate probabilities for quality control, polling, and A/B testing.
Foundation for proportion tests and confidence intervals for proportions.

When to Use

Use Binomial Distribution When...

Fixed number of trials ($n$).
Each trial has two outcomes (success or failure).
Constant probability of success ($p$) across trials.
Trials are independent.

Theoretical Background

Notation

X \sim \text{Binomial}(n, p)

where:

$n$ = number of trials
$p$ = probability of success on each trial
$X$ = number of successes

Probability Mass Function (PMF)

P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n - k}, \quad k = 0, 1, \dots, n

where $\binom{n}{k} = \frac{n!}{k!\,(n - k)!}$ is the binomial coefficient.
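As a minimal sketch, the PMF can be evaluated directly with Python's standard library (`math.comb`, Python 3.8+) instead of the SciPy `binom` object used later in this note. The function name `binomial_pmf` is hypothetical. The snippet answers the coin-flip question from the definition and checks that the probabilities over the support sum to 1.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p), straight from the PMF formula."""
    if not 0 <= k <= n:
        return 0.0  # k lies outside the support {0, 1, ..., n}
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# The coin-flip question: exactly 6 heads in 10 fair flips.
print(binomial_pmf(6, 10, 0.5))  # 0.205078125 (= 210 / 1024)

# Sanity check: the probabilities over the whole support sum to 1.
print(sum(binomial_pmf(k, 10, 0.5) for k in range(11)))
```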

Properties

Property	Formula
Mean	$μ = n p$
Variance	$σ^{2} = n p (1 - p)$
Standard Deviation	$σ = \sqrt{n p (1 - p)}$
Skewness	$\frac{1 - 2 p}{\sqrt{n p (1 - p)}}$
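One way to convince yourself of the closed forms in the table is to compute the mean, variance, and skewness by summing directly over the PMF and compare. This is an illustrative sketch using only the standard library; the helper names and the parameter pairs are arbitrary choices, not from the text.

```python
from math import comb, isclose, sqrt


def pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def moments_by_summation(n, p):
    """Mean, variance and skewness computed by summing over the PMF."""
    ks = range(n + 1)
    probs = [pmf(k, n, p) for k in ks]
    mean = sum(k * q for k, q in zip(ks, probs))
    var = sum((k - mean) ** 2 * q for k, q in zip(ks, probs))
    third = sum((k - mean) ** 3 * q for k, q in zip(ks, probs))
    return mean, var, third / var**1.5  # skewness = third central moment / sigma^3


def moments_by_formula(n, p):
    """Closed forms from the Properties table."""
    var = n * p * (1 - p)
    return n * p, var, (1 - 2 * p) / sqrt(var)


for n, p in ((10, 0.3), (20, 0.02)):  # illustrative parameter choices
    summed = moments_by_summation(n, p)
    closed = moments_by_formula(n, p)
    print(n, p, all(isclose(a, b, rel_tol=1e-9, abs_tol=1e-9) for a, b in zip(summed, closed)))
```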

Approximations

Normal Approximation

For large $n$ and $p$ not near 0 or 1:

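\text{Binomial}(n, p) \approx \mathcal{N}\big(n p,\; n p (1 - p)\big)

Here the second parameter of the normal distribution is the variance, not the standard deviation.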

Rule of thumb: the approximation is reasonable if $n p \geq 5$ and $n (1 - p) \geq 5$.
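To see the approximation at work, the sketch below compares the exact PMF with the normal density evaluated at each integer $k$ (integers are one unit apart, so the density at $k$ roughly stands in for $P(X = k)$). It also encodes the rule of thumb. This is a point-wise comparison under that assumption, with no continuity correction; the function names are hypothetical and only the standard library is used.

```python
from math import comb, exp, pi, sqrt


def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_pdf(x, mu, sigma):
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))


def normal_approx_ok(n, p):
    """Rule of thumb from the text: np >= 5 and n(1 - p) >= 5."""
    return n * p >= 5 and n * (1 - p) >= 5


n, p = 100, 0.5
mu = n * p
sigma = sqrt(n * p * (1 - p))  # N's second parameter is a variance; take its root
print(normal_approx_ok(n, p))  # True
for k in (40, 45, 50, 55, 60):
    print(k, round(binomial_pmf(k, n, p), 4), round(normal_pdf(k, mu, sigma), 4))
```

The light-bulb example later in this note ($n = 20$, $p = 0.02$, so $n p = 0.4$) fails this check, which is why it is worked with the exact PMF.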

Poisson Approximation

For large $n$ and small $p$ (rare events):

\text{Binomial}(n, p) \approx \text{Poisson}(λ = n p)
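A small sketch comparing the two PMFs in a rare-event setting ($n = 1000$, $p = 0.001$, so $λ = 1$), again with standard-library functions only; the helper names are hypothetical.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)


n, p = 1000, 0.001  # rare-event setting, as in the Interpretation Guide
lam = n * p         # lambda = np = 1
for k in range(5):
    print(k, round(binomial_pmf(k, n, p), 5), round(poisson_pmf(k, lam), 5))
```

The two columns agree to within a few ten-thousandths, which is what the approximation promises when $p$ is small and $n p$ is moderate.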

Worked Example: Quality Control

Problem

A factory produces light bulbs with a defect rate of 2% ($p = 0.02$).
A quality inspector randomly selects a batch of 20 bulbs ($n = 20$).

Questions:

What is the probability that exactly 2 bulbs are defective?
What is the probability that at least 1 bulb is defective?

Solution:

1. Probability of exactly 2 defects ($X = 2$):

P(X = 2) = \binom{20}{2} (0.02)^{2} (0.98)^{18}

\binom{20}{2} = \frac{20 \times 19}{2} = 190

P(X = 2) \approx 190 \times 0.0004 \times 0.6951 \approx 0.0528

Result: ~5.3% chance of finding exactly 2 bad bulbs.

2. Probability of at least 1 defect ($X \geq 1$):
By the complement rule, it is easier to calculate $1 - P(X = 0)$ than to sum $P(X = 1)$ through $P(X = 20)$.

P(X = 0) = \binom{20}{0} (0.02)^{0} (0.98)^{20} = 1 \times 1 \times 0.6676 \approx 0.6676

P(X \geq 1) = 1 - 0.6676 = 0.3324

Result: ~33.2% chance of finding at least one bad bulb in the batch.
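The two hand calculations can be checked in a few lines. This sketch uses only `math.comb` from the standard library; the helper name `pmf` is hypothetical.

```python
from math import comb

n, p = 20, 0.02  # batch size and defect rate from the problem


def pmf(k):
    """P(X = k) for X ~ Binomial(20, 0.02)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_exactly_2 = pmf(2)
p_at_least_1 = 1 - pmf(0)  # complement rule: P(X >= 1) = 1 - P(X = 0)

print(f"P(X = 2)  = {p_exactly_2:.4f}")   # 0.0528
print(f"P(X >= 1) = {p_at_least_1:.4f}")  # 0.3324
```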

Assumptions

Fixed $n$: The number of trials is predetermined.
Binary Outcomes: Each trial results in success or failure.
Independence: Trials do not affect each other.
Constant $p$: The probability of success is the same for all trials.

Limitations

Pitfalls

Independence Violation (Clustering): If defects happen in clusters (e.g., a machine breaks down and produces 10 bad bulbs in a row), independence is violated. The Binomial model will underestimate the probability of extreme outcomes.
Overdispersion: If the observed variance is significantly larger than $n p (1 - p)$, the data are overdispersed relative to the Binomial model. Consider the Beta-Binomial Distribution instead, which lets $p$ vary from batch to batch.
Variable $p$ : If the defect rate changes during the day, a simple Binomial model is invalid.
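A simulation sketch makes the overdispersion pitfall concrete. It assumes a hypothetical process in which each batch of 20 bulbs gets its own defect rate drawn from a Beta(0.2, 9.8) distribution (mean 0.02, so the average defect rate matches the example); the Beta parameters, the seed, and all names are illustrative assumptions. With a fixed $p$ the variance of the counts should sit near $n p (1 - p) = 0.392$; with batch-to-batch variation, Beta-Binomial theory predicts it inflates by a factor of $(a + b + n)/(a + b + 1) = 30/11 \approx 2.7$.

```python
import random
from statistics import mean, pvariance

N_BULBS, P_DEFECT = 20, 0.02
# Hypothetical Beta(A, B) spread of per-batch defect rates; its mean A / (A + B) = 0.02.
A, B = 0.2, 9.8
BINOMIAL_VAR = N_BULBS * P_DEFECT * (1 - P_DEFECT)  # np(1 - p) = 0.392


def simulate_counts(reps, clustered, seed=0):
    """Defect counts for `reps` simulated batches of N_BULBS bulbs."""
    rng = random.Random(seed)
    counts = []
    for _ in range(reps):
        # Clustered: every batch gets its own defect rate drawn from Beta(A, B).
        batch_p = rng.betavariate(A, B) if clustered else P_DEFECT
        counts.append(sum(rng.random() < batch_p for _ in range(N_BULBS)))
    return counts


ratios = {}
for clustered in (False, True):
    counts = simulate_counts(20_000, clustered)
    ratios[clustered] = pvariance(counts) / BINOMIAL_VAR
    print(f"clustered={clustered}: mean={mean(counts):.3f}, "
          f"variance / np(1-p) = {ratios[clustered]:.2f}")
```

Both scenarios have roughly the same mean defect count, so the mean alone cannot reveal the problem; the variance ratio can.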

Python Implementation

from scipy.stats import binom
import numpy as np
import matplotlib.pyplot as plt

# Binomial(n=10, p=0.5)
n, p = 10, 0.5
dist = binom(n, p)

# P(X = 6)
prob_6 = dist.pmf(6)
print(f"P(X = 6 | n={n}, p={p}): {prob_6:.4f}")

# P(X <= 7)
prob_le_7 = dist.cdf(7)
print(f"P(X ≤ 7): {prob_le_7:.4f}")

# Visualize PMF
x = np.arange(0, n+1)
plt.bar(x, dist.pmf(x), alpha=0.7, edgecolor='black')
plt.xlabel('Number of Successes (k)')
plt.ylabel('P(X = k)')
plt.title(f'Binomial Distribution (n={n}, p={p})')
plt.xticks(x)
plt.grid(axis='y', alpha=0.3)
plt.show()

R Implementation

# Binomial(n=10, p=0.5)
n <- 10
p <- 0.5

# P(X = 6)
dbinom(6, size = n, prob = p)

# P(X <= 7)
pbinom(7, size = n, prob = p)

# Random sample
rbinom(20, size = n, prob = p)

# Visualize
x <- 0:n
plot(x, dbinom(x, size = n, prob = p), type = "h", lwd = 3,
     xlab = "Number of Successes", ylab = "Probability",
     main = paste("Binomial(n=", n, ", p=", p, ")", sep=""))
points(x, dbinom(x, size = n, prob = p), pch = 16, col = "blue")

Interpretation Guide

Scenario	Interpretation
$n = 100$, $p = 0.5$	Mean = 50, $σ = 5$. Classic fair-coin scenario; the distribution is symmetric.
$n = 1000$, $p = 0.001$	Rare events. Strongly right-skewed (skewness ≈ 1). Well approximated by Poisson($λ = 1$).
$P(X \geq 1)$	"At least one" risk. Even with small $p$, a large $n$ makes at least one success (e.g., one defect) likely, since $P(X \geq 1) = 1 - (1 - p)^{n}$.
Spread vs Mean	The standard deviation grows as $σ \propto \sqrt{n}$, but the relative spread $σ/μ = \sqrt{(1 - p)/(n p)}$ shrinks as $1/\sqrt{n}$.
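The last two rows of the table can be checked numerically. This is a minimal sketch with hypothetical helper names, using only the standard library.

```python
from math import sqrt


def prob_at_least_one(n, p):
    """P(X >= 1) = 1 - (1 - p)^n."""
    return 1 - (1 - p) ** n


def relative_spread(n, p):
    """Coefficient of variation sigma / mu = sqrt((1 - p) / (n p))."""
    return sqrt(n * p * (1 - p)) / (n * p)


# Rare events: p = 0.001 per trial, yet over 1000 trials "at least one" is likely.
print(f"{prob_at_least_one(1000, 0.001):.3f}")  # 0.632

# Multiplying n by 100 divides the relative spread by 10.
for n in (100, 10_000):
    print(n, relative_spread(n, 0.5))  # 0.1, then 0.01
```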

Related Distributions

Normal Distribution - Approximation for large $n$.
Poisson Distribution - Approximation for rare events.
Hypergeometric Distribution - For sampling without replacement.
Bernoulli Distribution - Special case with $n = 1$.
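The Binomial model treats each draw as independent, which is exact only when sampling with replacement. The sketch below contrasts it with the hypergeometric PMF for drawing 20 bulbs without replacement from a lot of $N$ bulbs containing $K$ defects. The two lot sizes are hypothetical, both with a 2% defect rate: for a small lot the distributions differ noticeably, while for a large lot they nearly coincide. Names are illustrative and only `math.comb` is used.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(k defects) when n items are drawn without replacement from N items, K of them defective."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 20
# Hypothetical lot sizes, both with a 2% defect rate.
for N, K in ((100, 2), (100_000, 2_000)):
    p = K / N
    hyper_probs = [round(hypergeom_pmf(k, N, K, n), 4) for k in range(4)]
    binom_probs = [round(binomial_pmf(k, n, p), 4) for k in range(4)]
    print(f"N={N}: hypergeometric {hyper_probs} vs binomial {binom_probs}")
```

Note that `math.comb(K, k)` returns 0 when `k > K`, so the small lot correctly assigns zero probability to more than 2 defects.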