Bayesian Statistics
Definition
Core Statement
Bayesian Statistics is an approach to statistical inference that treats probability as a measure of belief. It combines prior knowledge with observed data to produce a posterior distribution: an updated belief about parameters after seeing the evidence.
Purpose
- Incorporate prior information into analysis.
- Provide probabilistic statements about parameters (e.g., "90% probability the effect is positive").
- Enable sequential updating as new data arrives.
- Offer coherent inference for small samples.
When to Use
Use Bayesian Methods When...
- You have meaningful prior information (from experts, past studies).
- Sample size is small and frequentist methods are unreliable.
- You want credible intervals (Bayesian equivalent of CIs) with direct probability interpretation.
- You need to update beliefs as data accumulates.
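The sequential-updating point can be made concrete with a conjugate Beta-Binomial model: today's posterior becomes tomorrow's prior, and updating in batches gives the same answer as updating once on the pooled data. This is a minimal sketch; the flip counts are made up for illustration.

```python
from scipy import stats

# Sequential Beta-Binomial updating, starting from a Beta(1, 1) prior.
# (Batch sizes and counts below are illustrative, not from the text.)
alpha, beta = 1.0, 1.0

# Batch 1: 7 heads in 10 flips -> posterior becomes the new prior
alpha, beta = alpha + 7, beta + 3
# Batch 2: 5 heads in 10 flips
alpha, beta = alpha + 5, beta + 5

sequential = stats.beta(alpha, beta)       # Beta(13, 9)
pooled = stats.beta(1 + 12, 1 + 8)         # one update on 12 heads, 8 tails
print(sequential.mean(), pooled.mean())    # identical: order does not matter
```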
Challenges
- Computationally intensive (often requires MCMC or other approximate sampling).
- Prior choice can be subjective and controversial.
- Steeper learning curve.
Theoretical Background
Bayes' Theorem

$$P(\theta \mid Data) = \frac{P(Data \mid \theta) \, P(\theta)}{P(Data)}$$

| Term | Name | Meaning |
|---|---|---|
| $P(\theta \mid Data)$ | Posterior | Updated belief about $\theta$ after seeing the data. |
| $P(Data \mid \theta)$ | Likelihood | Probability of the data given a parameter value. |
| $P(\theta)$ | Prior | Belief about $\theta$ before seeing the data. |
| $P(Data)$ | Evidence | Normalizing constant. |
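Bayes' theorem can be applied numerically by evaluating each term on a grid of candidate parameter values. The sketch below uses a flat prior and illustrative coin-flip data (6 heads in 9 flips, not from the text); with a flat prior the posterior mode coincides with the maximum-likelihood estimate.

```python
import numpy as np

# Bayes' theorem on a grid of candidate theta values (illustrative data)
theta = np.linspace(0.001, 0.999, 999)       # parameter grid
prior = np.ones_like(theta)                  # flat prior, P(theta)
likelihood = theta**6 * (1 - theta)**3       # P(Data | theta): 6 heads, 3 tails
evidence = np.sum(likelihood * prior)        # normalizing constant, P(Data)
posterior = likelihood * prior / evidence    # P(theta | Data), sums to 1
print(theta[np.argmax(posterior)])           # posterior mode ~ MLE = 6/9
```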
The Bayesian Mantra

$$\text{Posterior} \propto \text{Likelihood} \times \text{Prior}$$
Prior Types
| Prior Type | Description | When to Use |
|---|---|---|
| Non-Informative | Vague/flat prior (e.g., Uniform). | When you have no prior knowledge. |
| Weakly Informative | Mildly constraining (e.g., Normal(0, 10)). | Regularize without strong beliefs. |
| Informative | Strong prior based on domain expertise. | When prior studies or theory exists. |
Credible Interval vs Confidence Interval
| Bayesian Credible Interval | Frequentist Confidence Interval |
|---|---|
| "There is a 95% probability that the parameter lies in this interval." | "If we repeated the experiment many times, 95% of such intervals would contain the true value." |
| Direct probability statement. | Frequentist coverage property. |
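For a conjugate Beta posterior, an equal-tailed credible interval can be read directly off the posterior distribution. A minimal sketch, assuming a Beta(1, 1) prior and 60 successes out of 100 trials (illustrative numbers, chosen to match the PyMC example further below):

```python
from scipy import stats

# Equal-tailed 95% credible interval for a proportion
posterior = stats.beta(1 + 60, 1 + 40)   # conjugate Beta posterior
lo, hi = posterior.interval(0.95)        # central 95% of posterior mass
print(f"95% credible interval: ({lo:.3f}, {hi:.3f})")
```

Because this is a genuine probability distribution over the parameter, the interval supports the direct statement in the left column: 95% of the posterior probability lies between `lo` and `hi`.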
Worked Example: Is the Coin Fair?
Problem
You find a strange coin. You want to estimate the probability of Heads ($\theta$).
- Prior: You have no strong reason to think it's biased, but you aren't sure. You choose a Beta(2, 2) prior (weakly centered around 0.5).
- Prior Mean = $\frac{2}{2 + 2} = 0.5$.
- Data: You flip the coin 10 times and get 9 Heads.
- Update: Calculate the posterior.
Solution (Conjugate Priors):
Since Beta is conjugate to Binomial:
- $\text{Posterior} = \text{Beta}(\alpha + \text{heads}, \ \beta + \text{tails})$
New Parameters:
- $\alpha = 2 + 9 = 11$, $\beta = 2 + 1 = 3$
Posterior Distribution:
- $\text{Beta}(11, 3)$
New Belief (Posterior Mean):
- $\frac{11}{11 + 3} = \frac{11}{14} \approx 0.786$
Conclusion: Before the data, you guessed a 50% chance of Heads. The evidence of 9/10 heads shifted your belief to ~79%. You are now suspicious the coin is biased towards Heads, but not 100% certain (since 10 flips leave the posterior with considerable spread).
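The conjugate update in the worked example can be checked numerically with `scipy.stats`:

```python
from scipy import stats

# Worked example check: Beta(2, 2) prior, 9 heads in 10 flips
posterior = stats.beta(2 + 9, 2 + 1)   # conjugate update -> Beta(11, 3)
print(posterior.mean())                # 11 / 14, about 0.786
print(posterior.interval(0.95))        # 95% credible interval for theta
```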
Pitfalls
- Prior Sensitivity: In small samples, the choice of prior matters heavily. If you used a Beta(100, 100) prior, 9 heads would barely move the needle.
- The "Flat Prior" Trap: Using a Uniform(0,
) prior on a variance parameter is often improper (integral doesn't converge) or biased. - Label Switching: In complex mixture models, the sampler might swap class labels, confusing the output.
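The prior-sensitivity pitfall can be demonstrated with the coin data from the worked example: the same 9 heads in 10 flips produce very different posteriors under a weak Beta(2, 2) prior versus a strongly informative Beta(100, 100) prior.

```python
from scipy import stats

# Same data (9 heads in 10 flips), two different priors
weak = stats.beta(2 + 9, 2 + 1)          # Beta(2, 2) prior  -> mean ~ 0.786
strong = stats.beta(100 + 9, 100 + 1)    # Beta(100, 100) prior -> mean ~ 0.519
print(weak.mean(), strong.mean())        # the strong prior barely moves
```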
Python Implementation (PyMC)
```python
import pymc as pm
import arviz as az

# Example: Estimating a proportion
with pm.Model() as model:
    # Prior
    theta = pm.Beta("theta", alpha=1, beta=1)  # Uniform prior
    # Likelihood
    y = pm.Binomial("y", n=100, p=theta, observed=60)  # 60 successes
    # Sample
    trace = pm.sample(2000, return_inferencedata=True)

# Summary
print(az.summary(trace, hdi_prob=0.95))
# Plot posterior
az.plot_posterior(trace)
```
R Implementation (rstanarm or brms)
```r
library(rstanarm)

# Bayesian regression with weakly informative priors
model <- stan_glm(Y ~ X1 + X2, data = df, family = gaussian(),
                  prior = normal(0, 2.5),
                  prior_intercept = normal(0, 10))

# Summary
print(summary(model), digits = 3)
# Posterior intervals
posterior_interval(model, prob = 0.95)
# Diagnostics
plot(model, "trace")
```
Interpretation Guide
| Output | Interpretation |
|---|---|
| Posterior Mean = 0.6 | The expected value of the parameter given the data. |
| 95% Credible Interval | There is a 95% probability the true parameter falls in this range (unlike a frequentist CI, which makes no direct probability claim about the parameter). |
| Bayes Factor > 10 | Strong evidence in favor of the hypothesis over the null. |
| Wide Posterior | Data was insufficient to overcome prior uncertainty. Need more data. |
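The Bayes-factor row can be illustrated with the Beta-Binomial model, where the marginal likelihood is available in closed form. This sketch compares a model with $\theta \sim$ Beta(1, 1) against a point null $\theta = 0.5$, using the 9-heads-in-10-flips data from the worked example (the choice of a Beta(1, 1) prior here is an assumption for illustration).

```python
from scipy import special

# Bayes factor for 9 heads in 10 flips:
# M1: theta ~ Beta(1, 1)   vs   M0: theta fixed at 0.5
n, k = 10, 9
# Marginal likelihood under M1: C(n, k) * B(1 + k, 1 + n - k) / B(1, 1)
m1 = special.comb(n, k) * special.beta(1 + k, 1 + n - k) / special.beta(1, 1)
# Likelihood under the point null M0
m0 = special.comb(n, k) * 0.5**n
print(m1 / m0)   # about 9.3: notable evidence, just under the "10" threshold
```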
Related Concepts
- Bayes' Theorem - The mathematical foundation.
- Maximum Likelihood Estimation (MLE) - Frequentist alternative.
- Probabilistic Programming - Tools like PyMC, Stan.
- Hypothesis Testing (P-Value & CI) - Frequentist framework.