Naive Bayes
Definition
Core Statement
Naive Bayes is a family of probabilistic classifiers based on applying Bayes' Theorem with the "naive" assumption of conditional independence between every pair of features given the class label.
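In symbols, the independence assumption collapses the joint likelihood into a product of per-feature terms (this is the standard formulation, stated here for reference):

```latex
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

The predicted class is the y that maximizes the right-hand side.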
Purpose
- Text Classification: Spam detection, Sentiment Analysis (The "Hello World" of NLP).
- Real-Time Prediction: Extremely fast to train and predict.
- Baseline: Often the first model to try alongside Logistic Regression.
Why "Naive"?
It assumes that features are independent of each other, given the class.
- Example: In text, it assumes the word "Bank" appears independently of the word "Account".
- Reality: This is false (context matters).
- Surprise: It still works exceptionally well because we only care about the ranking of probabilities (is Spam > Ham?), not the exact numbers (see the sketch below).
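The ranking point can be seen directly: any monotone transformation of the class scores, such as normalizing them or taking logs, leaves the predicted class unchanged. A minimal sketch with invented scores:

```python
import math

# Invented unnormalized class scores.
scores = {"Spam": 0.008, "Ham": 0.0002}

# Normalizing changes the values but never the argmax...
total = sum(scores.values())
probs = {c: s / total for c, s in scores.items()}

# ...and neither does comparing in log space (standard practice,
# since it avoids floating-point underflow on long documents).
logs = {c: math.log(s) for c, s in scores.items()}

print(max(scores, key=scores.get))  # Spam
print(max(probs, key=probs.get))    # Spam
print(max(logs, key=logs.get))      # Spam
```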
Worked Example: Spam Filter
Problem
Classify email: "Free Money".
Priors (assumed equal here, purely for illustration):
- P(Spam) = 0.5
- P(Ham) = 0.5
Likelihoods (from training data):
- "Free" | Spam: 0.4
- "Money" | Spam: 0.2
- "Free" | Ham: 0.01 (Rare)
- "Money" | Ham: 0.02 (Financials)
Calculation (multiply the prior by each word's likelihood):
- Score(Spam) = 0.5 × 0.4 × 0.2 = 0.04
- Score(Ham) = 0.5 × 0.01 × 0.02 = 0.0001

Conclusion:
Score(Spam) is 400× larger than Score(Ham), so the email is classified as Spam.
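The same arithmetic in a few lines of Python (the 0.5 priors are the assumption stated above, not values from real data):

```python
p_spam, p_ham = 0.5, 0.5               # assumed equal priors
score_spam = p_spam * 0.4 * 0.2        # P("Free"|Spam) * P("Money"|Spam)
score_ham = p_ham * 0.01 * 0.02        # P("Free"|Ham)  * P("Money"|Ham)

print(score_spam, score_ham)                   # 0.04 vs 0.0001
print(score_spam / (score_spam + score_ham))   # ~0.9975, normalized posterior
```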
Types of Naive Bayes
| Variant | Data Type | Usage |
|---|---|---|
| Multinomial NB | Counts (Integers) | Text Classification (Word counts). |
| Bernoulli NB | Binary (0/1) | Short text / Keyword presence. |
| Gaussian NB | Continuous | Normal data (e.g., Iris species). |
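A quick sketch of matching each variant to its data type, using scikit-learn's built-in Iris data for the Gaussian case; the binary keyword matrix is invented for illustration:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB, BernoulliNB

# Gaussian NB: continuous measurements (sepal/petal lengths).
X_iris, y_iris = load_iris(return_X_y=True)
print(GaussianNB().fit(X_iris, y_iris).score(X_iris, y_iris))  # training accuracy

# Bernoulli NB: binary keyword-presence features (toy data).
X_bin = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])  # 1 = keyword present
y_bin = ["Spam", "Ham", "Spam"]
print(BernoulliNB().fit(X_bin, y_bin).predict([[1, 0, 0]]))
```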
Limitations
Pitfalls
- Zero Frequency Problem: If a word like "Casino" never appears in the training Ham emails, then P("Casino" | Ham) = 0 and the whole product becomes 0, no matter what the other words say. Fix: Laplace Smoothing (add 1 to all counts; see the sketch after this list).
- Correlated Features: If you have "Money" and "Cash" (synonyms), NB double-counts the evidence, becoming over-confident.
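A minimal sketch of how Laplace smoothing removes the zero, using a toy vocabulary and invented counts (none of these numbers come from the example above):

```python
# Toy word counts in the Ham class; "casino" was never seen in Ham.
ham_counts = {"money": 2, "account": 3, "casino": 0}
total = sum(ham_counts.values())   # 5
vocab_size = len(ham_counts)       # 3

# Unsmoothed likelihood: 0/5 = 0, which kills the whole product.
p_unsmoothed = ham_counts["casino"] / total

# Laplace (add-one) smoothing: add 1 to every count, so nothing is zero.
alpha = 1
p_smoothed = (ham_counts["casino"] + alpha) / (total + alpha * vocab_size)

print(p_unsmoothed)  # 0.0
print(p_smoothed)    # 1/8 = 0.125 -- small, but non-zero
```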
Python Implementation
```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

# Corpus
X_train = ["Free money now", "Hi mom", "Win cash prize"]
y_train = ["Spam", "Ham", "Spam"]

# 1. Vectorize (convert text to word counts)
vec = CountVectorizer()
X_mat = vec.fit_transform(X_train)

# 2. Fit model
clf = MultinomialNB(alpha=1.0)  # alpha=1 is Laplace Smoothing
clf.fit(X_mat, y_train)

# 3. Predict
test = ["Free cash"]
test_vec = vec.transform(test)
print(f"Prediction: {clf.predict(test_vec)[0]}")  # Likely Spam
```
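If you need the class probabilities rather than just the label, scikit-learn's predict_proba returns the normalized posteriors (keeping in mind the over-confidence caveat from the Pitfalls section):

```python
print(clf.classes_)                 # column order, e.g. ['Ham' 'Spam']
print(clf.predict_proba(test_vec))  # one row of posteriors per test document
```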
Related Concepts
- Bayes' Theorem - The math.
- Logistic Regression - Discriminative counterpart (often better if enough data).
- Bag of Words - The text representation used.
- Smoothing - Handling zeros.