Bayesian Statistics
Definition
Core Statement
Bayesian Statistics is an approach to statistical inference that treats probability as a measure of belief. It combines prior knowledge with observed data to produce a posterior distribution: an updated belief about parameters after seeing the evidence.
Purpose
- Incorporate prior information into analysis.
- Provide probabilistic statements about parameters (e.g., "90% probability the effect is positive").
- Enable sequential updating as new data arrives.
- Offer coherent inference for small samples.
When to Use
Use Bayesian Methods When...
- You have meaningful prior information (from experts, past studies).
- Sample size is small and frequentist methods are unreliable.
- You want credible intervals (Bayesian equivalent of CIs) with direct probability interpretation.
- You need to update beliefs as data accumulates.
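The sequential-updating point can be made concrete with a conjugate Beta-Binomial model: today's posterior becomes tomorrow's prior, and updating in batches gives the same answer as updating once on the pooled data. This is a minimal sketch; the flip counts are made up for illustration.

```python
from scipy import stats

# Sequential Beta-Binomial updating, starting from a Beta(1, 1) prior.
# (Batch sizes and counts below are illustrative, not from the text.)
alpha, beta = 1.0, 1.0

# Batch 1: 7 heads in 10 flips -> posterior becomes the new prior
alpha, beta = alpha + 7, beta + 3
# Batch 2: 5 heads in 10 flips
alpha, beta = alpha + 5, beta + 5

sequential = stats.beta(alpha, beta)       # Beta(13, 9)
pooled = stats.beta(1 + 12, 1 + 8)         # one update on 12 heads, 8 tails
print(sequential.mean(), pooled.mean())    # identical: order does not matter
```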
Challenges
- Computationally intensive (often requires MCMC or other approximate sampling).
- Prior choice can be subjective and controversial.
- Steeper learning curve.
Theoretical Background
Bayes' Theorem

$$P(\theta \mid Data) = \frac{P(Data \mid \theta) \, P(\theta)}{P(Data)}$$

| Term | Name | Meaning |
|---|---|---|
| $P(\theta \mid Data)$ | Posterior | Updated belief about $\theta$ after seeing the data. |
| $P(Data \mid \theta)$ | Likelihood | Probability of the data given a parameter value. |
| $P(\theta)$ | Prior | Belief about $\theta$ before seeing the data. |
| $P(Data)$ | Evidence | Normalizing constant. |
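Bayes' theorem can be applied numerically by evaluating each term on a grid of candidate parameter values. The sketch below uses a flat prior and illustrative coin-flip data (6 heads in 9 flips, not from the text); with a flat prior the posterior mode coincides with the maximum-likelihood estimate.

```python
import numpy as np

# Bayes' theorem on a grid of candidate theta values (illustrative data)
theta = np.linspace(0.001, 0.999, 999)       # parameter grid
prior = np.ones_like(theta)                  # flat prior, P(theta)
likelihood = theta**6 * (1 - theta)**3       # P(Data | theta): 6 heads, 3 tails
evidence = np.sum(likelihood * prior)        # normalizing constant, P(Data)
posterior = likelihood * prior / evidence    # P(theta | Data), sums to 1
print(theta[np.argmax(posterior)])           # posterior mode ~ MLE = 6/9
```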
The Bayesian Mantra

$$\text{Posterior} \propto \text{Likelihood} \times \text{Prior}$$
Prior Types
| Prior Type | Description | When to Use |
|---|---|---|
| Non-Informative | Vague/flat prior (e.g., Uniform). | When you have no prior knowledge. |
| Weakly Informative | Mildly constraining (e.g., Normal(0, 10)). | Regularize without strong beliefs. |
| Informative | Strong prior based on domain expertise. | When prior studies or theory exists. |
Credible Interval vs Confidence Interval
| Bayesian Credible Interval | Frequentist Confidence Interval |
|---|---|
| "There is a 95% probability that the parameter lies in this interval." | "If we repeated the experiment many times, 95% of such intervals would contain the true value." |
| Direct probability statement. | Frequentist coverage property. |
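For a conjugate Beta posterior, an equal-tailed credible interval can be read directly off the posterior distribution. A minimal sketch, assuming a Beta(1, 1) prior and 60 successes out of 100 trials (illustrative numbers, chosen to match the PyMC example further below):

```python
from scipy import stats

# Equal-tailed 95% credible interval for a proportion
posterior = stats.beta(1 + 60, 1 + 40)   # conjugate Beta posterior
lo, hi = posterior.interval(0.95)        # central 95% of posterior mass
print(f"95% credible interval: ({lo:.3f}, {hi:.3f})")
```

Because this is a genuine probability distribution over the parameter, the interval supports the direct statement in the left column: 95% of the posterior probability lies between `lo` and `hi`.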
Worked Example: Is the Coin Fair?
Problem
You find a strange coin. You want to estimate the probability of Heads ($\theta$).
- Prior: You have no strong reason to think it's biased, but you aren't sure. You choose a Beta(2, 2) prior (weakly centered around 0.5).
- Prior Mean = $\frac{2}{2 + 2} = 0.5$.
- Data: You flip the coin 10 times and get 9 Heads.
- Update: Calculate the posterior.
Solution (Conjugate Priors):
Since Beta is conjugate to Binomial:
- $\text{Posterior} = \text{Beta}(\alpha + \text{heads}, \ \beta + \text{tails})$
New Parameters:
- $\alpha = 2 + 9 = 11$, $\beta = 2 + 1 = 3$
Posterior Distribution:
- $\text{Beta}(11, 3)$
New Belief (Posterior Mean):
- $\frac{11}{11 + 3} = \frac{11}{14} \approx 0.786$
Conclusion: Before the data, you guessed a 50% chance of Heads. The evidence of 9/10 heads shifted your belief to ~79%. You are now suspicious the coin is biased towards Heads, but not 100% certain (since 10 flips leave the posterior with considerable spread).
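The conjugate update in the worked example can be checked numerically with `scipy.stats`:

```python
from scipy import stats

# Worked example check: Beta(2, 2) prior, 9 heads in 10 flips
posterior = stats.beta(2 + 9, 2 + 1)   # conjugate update -> Beta(11, 3)
print(posterior.mean())                # 11 / 14, about 0.786
print(posterior.interval(0.95))        # 95% credible interval for theta
```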
Pitfalls
- Prior Sensitivity: In small samples, the choice of prior matters heavily. If you used a Beta(100, 100) prior, 9 heads would barely move the needle.
- The "Flat Prior" Trap: Using a Uniform(0,
) prior on a variance parameter is often improper (integral doesn't converge) or biased. - Label Switching: In complex mixture models, the sampler might swap class labels, confusing the output.
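The prior-sensitivity pitfall can be demonstrated with the coin data from the worked example: the same 9 heads in 10 flips produce very different posteriors under a weak Beta(2, 2) prior versus a strongly informative Beta(100, 100) prior.

```python
from scipy import stats

# Same data (9 heads in 10 flips), two different priors
weak = stats.beta(2 + 9, 2 + 1)          # Beta(2, 2) prior  -> mean ~ 0.786
strong = stats.beta(100 + 9, 100 + 1)    # Beta(100, 100) prior -> mean ~ 0.519
print(weak.mean(), strong.mean())        # the strong prior barely moves
```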
Python Implementation (PyMC)
```python
import pymc as pm
import arviz as az

# Example: Estimating a proportion
with pm.Model() as model:
    # Prior
    theta = pm.Beta("theta", alpha=1, beta=1)  # Uniform prior
    # Likelihood
    y = pm.Binomial("y", n=100, p=theta, observed=60)  # 60 successes
    # Sample
    trace = pm.sample(2000, return_inferencedata=True)

# Summary
print(az.summary(trace, hdi_prob=0.95))
# Plot posterior
az.plot_posterior(trace)
```
R Implementation (rstanarm or brms)
```r
library(rstanarm)

# Bayesian regression with weakly informative priors
model <- stan_glm(Y ~ X1 + X2, data = df, family = gaussian(),
                  prior = normal(0, 2.5),
                  prior_intercept = normal(0, 10))

# Summary
print(summary(model), digits = 3)
# Posterior intervals
posterior_interval(model, prob = 0.95)
# Diagnostics
plot(model, "trace")
```
Interpretation Guide
| Output | Interpretation |
|---|---|
| Posterior Mean = 0.6 | The expected value of the parameter given the data. |
| 95% Credible Interval | There is a 95% probability the true parameter falls in this range (unlike a frequentist CI, which makes no direct probability claim about the parameter). |
| Bayes Factor > 10 | Strong evidence in favor of the hypothesis over the null. |
| Wide Posterior | Data was insufficient to overcome prior uncertainty. Need more data. |
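The Bayes-factor row can be illustrated with the Beta-Binomial model, where the marginal likelihood is available in closed form. This sketch compares a model with $\theta \sim$ Beta(1, 1) against a point null $\theta = 0.5$, using the 9-heads-in-10-flips data from the worked example (the choice of a Beta(1, 1) prior here is an assumption for illustration).

```python
from scipy import special

# Bayes factor for 9 heads in 10 flips:
# M1: theta ~ Beta(1, 1)   vs   M0: theta fixed at 0.5
n, k = 10, 9
# Marginal likelihood under M1: C(n, k) * B(1 + k, 1 + n - k) / B(1, 1)
m1 = special.comb(n, k) * special.beta(1 + k, 1 + n - k) / special.beta(1, 1)
# Likelihood under the point null M0
m0 = special.comb(n, k) * 0.5**n
print(m1 / m0)   # about 9.3: notable evidence, just under the "10" threshold
```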
Related Concepts
- Bayes' Theorem - The mathematical foundation.
- Maximum Likelihood Estimation (MLE) - Frequentist alternative.
- Probabilistic Programming - Tools like PyMC, Stan.
- Hypothesis Testing (P-Value & CI) - Frequentist framework.