Confusion Matrix

Definition

Core Statement

A Confusion Matrix is a table that summarizes the performance of a classification model by comparing predicted labels to actual labels. It forms the basis for calculating metrics like Accuracy, Precision, Recall, and F1-Score.


Purpose

  1. Visualize classification performance.
  2. Calculate key metrics for model evaluation.
  3. Identify types of errors (False Positives vs False Negatives).

When to Use

Use Confusion Matrix When...

  • Evaluating any classification model (Binary or Multi-class).
  • You need to understand what kind of errors the model makes.
  • Accuracy alone is misleading (imbalanced classes).


Theoretical Background

The Matrix (Binary Classification)

                    | Predicted Negative (0)             | Predicted Positive (1)
Actual Negative (0) | True Negative (TN)                 | False Positive (FP), Type I Error
Actual Positive (1) | False Negative (FN), Type II Error | True Positive (TP)
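
For the binary case, the four cells can be pulled straight out of scikit-learn's confusion_matrix. A minimal sketch with hypothetical labels:

from sklearn.metrics import confusion_matrix

# Hypothetical labels for illustration
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]

# For labels [0, 1], sklearn lays the matrix out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 2 2 1 3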

Key Metrics

Metric                    | Formula                                         | Interpretation
Accuracy                  | (TP + TN) / Total                               | Overall correctness. (Misleading if imbalanced.)
Precision                 | TP / (TP + FP)                                  | Of all predicted positives, how many are correct? (Avoid false alarms.)
Recall (Sensitivity, TPR) | TP / (TP + FN)                                  | Of all actual positives, how many were found? (Avoid missing cases.)
Specificity (TNR)         | TN / (TN + FP)                                  | Of all actual negatives, how many were correctly identified?
F1-Score                  | 2 * (Precision * Recall) / (Precision + Recall) | Harmonic mean of Precision and Recall. Balances both.
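
A minimal sketch computing these metrics with scikit-learn's built-in functions, reusing the hypothetical labels from the sketch above (specificity is just recall with the negative class treated as positive):

from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# Hypothetical labels for illustration (TN=2, FP=2, FN=1, TP=3)
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]

print("Accuracy:   ", accuracy_score(y_true, y_pred))              # (TP + TN) / Total = 0.625
print("Precision:  ", precision_score(y_true, y_pred))             # TP / (TP + FP)    = 0.60
print("Recall:     ", recall_score(y_true, y_pred))                # TP / (TP + FN)    = 0.75
print("Specificity:", recall_score(y_true, y_pred, pos_label=0))   # TN / (TN + FP)    = 0.50
print("F1-Score:   ", f1_score(y_true, y_pred))                    # 2PR / (P + R)     = 0.667 (rounded)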

Precision vs Recall Trade-off

Context Matters

  • High Precision Priority: Spam detection. (False Positives = real emails sent to the spam folder.)
  • High Recall Priority: Disease screening. (False Negatives = sick patients who are missed.)
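
A minimal sketch of how this trade-off can be inspected with scikit-learn's precision_recall_curve, using hypothetical labels and scores; each candidate threshold yields one (precision, recall) pair:

import numpy as np
from sklearn.metrics import precision_recall_curve

# Hypothetical true labels and predicted scores for illustration
y_true  = np.array([0, 0, 1, 0, 1, 1, 0, 1])
y_score = np.array([0.15, 0.40, 0.35, 0.60, 0.70, 0.80, 0.20, 0.90])

# Moving along the curve trades precision against recall
precision, recall, thresholds = precision_recall_curve(y_true, y_score)
for p, r, t in zip(precision, recall, thresholds):
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")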


Limitations

Pitfalls

  1. Accuracy Paradox: In imbalanced data (e.g., 99% negative), a model predicting all negative gets 99% accuracy but is useless.
  2. Threshold Dependent: Metrics change with the classification threshold. Use ROC & AUC for threshold-independent evaluation.
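
As a minimal sketch of point 2, ROC AUC summarizes ranking quality across all possible thresholds (hypothetical labels and probabilities for illustration):

import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical true labels and predicted probabilities
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
y_prob = np.array([0.15, 0.40, 0.35, 0.60, 0.70, 0.80, 0.20, 0.90])

# AUC is threshold-independent: it scores how well positives rank above negatives
print("ROC AUC:", roc_auc_score(y_true, y_prob))  # 0.875

# roc_curve returns one (FPR, TPR) point per candidate threshold
fpr, tpr, thresholds = roc_curve(y_true, y_prob)
print("Thresholds evaluated:", len(thresholds))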


Python Implementation

from sklearn.metrics import confusion_matrix, classification_report
import seaborn as sns
import matplotlib.pyplot as plt

# y_true: actual labels, y_pred: predicted labels (example data; replace with your own)
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)

# Heatmap Visualization
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Pred 0', 'Pred 1'], yticklabels=['Actual 0', 'Actual 1'])
plt.title("Confusion Matrix")
plt.show()

# Full Report: precision, recall, F1 per class plus accuracy
print(classification_report(y_true, y_pred))
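
For multi-class problems the same functions apply. A sketch using ConfusionMatrixDisplay (assuming scikit-learn >= 1.0) builds and plots the matrix in one call, with hypothetical three-class labels:

from sklearn.metrics import ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# Hypothetical multi-class labels for illustration
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 2, 2, 2, 1, 0, 1, 1]

# Computes the confusion matrix and draws the heatmap in one step
ConfusionMatrixDisplay.from_predictions(y_true, y_pred, cmap='Blues')
plt.show()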

R Implementation

library(caret)

# Example labels (replace with your own)
predicted_labels <- c(0, 0, 1, 1, 1, 1, 1, 0)
actual_labels    <- c(0, 0, 0, 0, 1, 1, 1, 1)

# Create factors with matching levels
pred   <- factor(predicted_labels, levels = c(0, 1))
actual <- factor(actual_labels, levels = c(0, 1))

# Confusion Matrix (declare which class counts as "positive")
confusionMatrix(data = pred, reference = actual, positive = "1")

# Output: Accuracy, Sensitivity, Specificity, Precision, etc.

Worked Numerical Example

Rare Disease Detection (Imbalanced)

Data: 100 Patients (95 Healthy, 5 Sick).
Model Prediction: Predicts everyone is "Healthy" (All Negative).

Confusion Matrix:

  • TP = 0, FN = 5 (Missed all sick people!)
  • TN = 95, FP = 0

Metrics:

  • Accuracy: (0 + 95) / 100 = 95% (looks amazing!)
  • Recall: 0 / (0 + 5) = 0% (missed everyone!)
  • Precision: undefined (0/0), conventionally reported as 0.

Conclusion: The model is useless despite 95% accuracy.
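
This example can be reproduced with scikit-learn in a few lines (a sketch; note that precision is reported as 0 with a warning when there are no predicted positives):

import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score, recall_score

# 5 sick (1) and 95 healthy (0) patients; the model predicts "healthy" for all
y_true = np.array([1] * 5 + [0] * 95)
y_pred = np.zeros(100, dtype=int)

print(confusion_matrix(y_true, y_pred))             # [[95  0]
                                                    #  [ 5  0]]  -> TN=95, FP=0, FN=5, TP=0
print("Accuracy:", accuracy_score(y_true, y_pred))  # 0.95
print("Recall:  ", recall_score(y_true, y_pred))    # 0.0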


Interpretation Guide

Output                     | Interpretation                                                            | Edge Case Notes
High Accuracy, Low Recall  | Accuracy Paradox.                                                         | Common in imbalanced data (fraud, disease). Ignore accuracy.
High Recall, Low Precision | Casts the net too wide; many false alarms.                               | Okay for screening tests (cheap filters).
High Precision, Low Recall | Very picky; misses many cases, but positive predictions are trustworthy. | Good for spam filters (don't delete real email).
F1 = 0.9                   | Strong balance of Precision and Recall.                                  | Excellent model.

Common Pitfall Example

Threshold Setting Blindness

Scenario: Logistic Regression outputs probabilities (0.1, 0.6, 0.9, ...).
Default: Classify as 1 if p>0.5.

Problem:

  • If "Positive" is Cancer, p>0.5 might be too strict.
  • You miss patients with p=0.4 who might have cancer.

Solution:

  • Tune the Threshold.
  • Lowering the threshold to 0.2 increases Recall (finds more cancer cases) but decreases Precision (more false alarms); see the sketch after this list.
  • Use the ROC Curve to make this decision intentionally.
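
A minimal sketch of this threshold tuning, using hypothetical probabilities and true labels; lowering the cut-off from 0.5 to 0.2 raises Recall and lowers Precision:

import numpy as np
from sklearn.metrics import precision_score, recall_score

# Hypothetical predicted probabilities and true labels for illustration
y_prob = np.array([0.10, 0.40, 0.35, 0.80, 0.65, 0.20, 0.90, 0.45])
y_true = np.array([0,    1,    0,    1,    1,    0,    1,    0])

for threshold in [0.5, 0.2]:
    # Convert probabilities to hard labels at the chosen cut-off
    y_pred = (y_prob >= threshold).astype(int)
    print(f"threshold={threshold}: "
          f"recall={recall_score(y_true, y_pred):.2f}, "
          f"precision={precision_score(y_true, y_pred):.2f}")

# threshold=0.5: recall=0.75, precision=1.00
# threshold=0.2: recall=1.00, precision=0.57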