ROC & AUC
Definition
ROC (Receiver Operating Characteristic) Curve plots the True Positive Rate (Recall) against the False Positive Rate at all possible classification thresholds. AUC (Area Under the Curve) summarizes the ROC into a single number representing the model's ability to discriminate between classes.
Purpose
- Evaluate classifier performance across all thresholds.
- Compare multiple models using a single metric (AUC).
- Diagnose trade-offs between sensitivity and specificity.
When to Use
- Comparing binary classifiers.
- You need a threshold-independent metric.
- Classes are reasonably balanced.
- Not when classes are heavily imbalanced: ROC can be overly optimistic there; use Precision-Recall AUC instead.
Theoretical Background
Axes of the ROC Curve
- Y-axis: True Positive Rate (TPR, Recall): TPR = TP / (TP + FN)
- X-axis: False Positive Rate (FPR): FPR = FP / (FP + TN)
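To make the two axes concrete, here is a minimal sketch that computes a single (FPR, TPR) point at one threshold, using made-up labels and scores purely for illustration:

import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels and scores, purely illustrative
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.45, 0.60, 0.70, 0.90])

threshold = 0.5
y_pred = (scores >= threshold).astype(int)

# Binary confusion matrix layout in sklearn: [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

tpr = tp / (tp + fn)  # one point on the ROC y-axis
fpr = fp / (fp + tn)  # one point on the ROC x-axis
print(f"Threshold {threshold}: TPR = {tpr:.2f}, FPR = {fpr:.2f}")

Sweeping the threshold from high to low traces out the full ROC curve, one (FPR, TPR) point per threshold.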
Interpreting the Curve
| AUC Value | Interpretation |
|---|---|
| 1.0 | Perfect classifier. |
| 0.9 - 1.0 | Excellent discrimination. |
| 0.8 - 0.9 | Good discrimination. |
| 0.7 - 0.8 | Fair discrimination. |
| 0.5 | Random guessing (diagonal line). |
| < 0.5 | Worse than random (the model's ranking is inverted; flipping its predictions would beat chance). |
Geometric Interpretation
AUC is the probability that a randomly chosen positive instance is ranked higher than a randomly chosen negative instance.
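A quick way to check this interpretation is to estimate the pairwise ranking probability by brute force and compare it with roc_auc_score; a minimal sketch on synthetic scores (all names and numbers here are illustrative):

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# Synthetic scores: positives tend to score higher than negatives
pos_scores = rng.normal(1.0, 1.0, 500)
neg_scores = rng.normal(0.0, 1.0, 500)

y_true = np.concatenate([np.ones(500), np.zeros(500)])
scores = np.concatenate([pos_scores, neg_scores])

# P(random positive outranks random negative); ties count as half
diff = pos_scores[:, None] - neg_scores[None, :]
rank_prob = np.mean(diff > 0) + 0.5 * np.mean(diff == 0)

print(f"Pairwise ranking probability: {rank_prob:.3f}")
print(f"roc_auc_score:                {roc_auc_score(y_true, scores):.3f}")

The two printed numbers match, which is exactly the geometric claim above.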
Limitations
- Imbalanced Data: With rare positives, a model can have high AUC but low precision. Use Precision-Recall AUC.
- Ignores Calibration: AUC measures ranking, not probability correctness. Use Brier Score for calibration.
- Does not pick threshold: AUC tells you how good the model is overall, but you still need to choose a threshold for deployment.
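Since AUC itself does not pick an operating point, one common heuristic (not the only one) is Youden's J statistic, which selects the threshold maximizing TPR - FPR. A minimal sketch on synthetic data; the dataset and model here are placeholders:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve

# Synthetic data and model, purely for illustration
X, y = make_classification(n_samples=2000, weights=[0.7, 0.3], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = clf.predict_proba(X_test)[:, 1]

fpr, tpr, thresholds = roc_curve(y_test, probs)
best = np.argmax(tpr - fpr)  # Youden's J = TPR - FPR
print(f"Threshold by Youden's J: {thresholds[best]:.3f} (TPR={tpr[best]:.2f}, FPR={fpr[best]:.2f})")

In practice the deployed threshold should reflect the actual costs of false positives versus false negatives, not just a geometric rule.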
Python Implementation
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt
# 'model' is assumed to be an already-fitted classifier; X_test, y_test are held-out data
# Get predicted probabilities for the positive class (not class labels)
probs = model.predict_proba(X_test)[:, 1]
# Calculate AUC
auc = roc_auc_score(y_test, probs)
print(f"AUC: {auc:.3f}")
# Plot ROC Curve
fpr, tpr, thresholds = roc_curve(y_test, probs)
plt.plot(fpr, tpr, label=f'AUC = {auc:.2f}')
plt.plot([0, 1], [0, 1], 'k--', label='Random')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate (Recall)')
plt.title('ROC Curve')
plt.legend()
plt.show()
R Implementation
library(pROC)
# actual_labels: true 0/1 labels; predicted_probs: predicted probabilities from a fitted model
roc_obj <- roc(actual_labels, predicted_probs)
# Get AUC
auc(roc_obj)
# Plot
plot(roc_obj, main = "ROC Curve", print.auc = TRUE)
Worked Numerical Example
Scenario: Comparing two models for detecting spam emails.
- Model A: AUC = 0.92
- Model B: AUC = 0.85
Interpretation:
- If you pick a random Spam email and a random Non-Spam email:
- Model A has a 92% chance of assigning a higher "Spam Score" to the actual spam email.
- Model A separates the classes better than Model B.
BUT: Model A might be worse at low False Positive Rates (e.g., it blocks too many real emails). You must check the curve shape, not just the AUC number.
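One way to "check the curve shape" numerically is to compare the TPR each model achieves at a fixed low FPR by interpolating its ROC curve; a sketch with two stand-in models on synthetic data (the 1% FPR budget is an assumed business constraint):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, roc_auc_score

# Synthetic stand-ins for Model A and Model B
X, y = make_classification(n_samples=5000, n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "Model A": RandomForestClassifier(random_state=0).fit(X_tr, y_tr),
    "Model B": LogisticRegression(max_iter=1000).fit(X_tr, y_tr),
}

target_fpr = 0.01  # assumed budget: block at most 1% of legitimate emails
for name, m in models.items():
    p = m.predict_proba(X_te)[:, 1]
    fpr, tpr, _ = roc_curve(y_te, p)
    tpr_at_budget = np.interp(target_fpr, fpr, tpr)  # TPR achievable at FPR = 1%
    print(f"{name}: AUC = {roc_auc_score(y_te, p):.3f}, TPR at 1% FPR = {tpr_at_budget:.2f}")

The model with the higher overall AUC is not necessarily the one with the higher TPR inside your FPR budget.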
Interpretation Guide
| Scenario | Interpretation | Edge Case Notes |
|---|---|---|
| AUC = 0.95 | Excellent discrimination. | Check for target leakage (an AUC at or near 1.0 is suspicious). |
| AUC = 0.50 | Random guessing. | Model provides no value. |
| AUC < 0.50 | Worse than random. | Inverted labels? Maybe 0=True and 1=False in code? |
| Curves Cross | Models trade off performance. | One model is better in the low-FPR region, the other at high recall. Pick based on the business goal. |
Common Pitfall Example
Scenario: Fraud Detection (0.1% Fraud).
Model:
- AUC = 0.90 (Looks good!).
- Precision at Recall 50% = 5%.
The Problem:
- ROC looks high because True Negatives (non-fraud) flood the calculation.
- In reality, to catch 50% of fraud, you flag roughly 19 false alarms for every 1 real fraud (that is what 5% precision means).
- The model might be practically useless despite "high" AUC.
Solution: Use Precision-Recall Curve (PR-AUC).
- It ignores True Negatives and shows directly how many False Positives you incur for each True Positive.
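A minimal sketch of that contrast on a heavily imbalanced synthetic dataset (about 1% positives, loosely mirroring the fraud scenario above; exact numbers will vary):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, average_precision_score

# ~1% positives: rare-event setting where ROC-AUC can look flattering
X, y = make_classification(n_samples=50000, weights=[0.99, 0.01],
                           n_informative=4, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
probs = clf.predict_proba(X_te)[:, 1]

print(f"ROC-AUC: {roc_auc_score(y_te, probs):.3f}")            # typically looks strong
print(f"PR-AUC : {average_precision_score(y_te, probs):.3f}")  # typically much lower

average_precision_score is scikit-learn's standard summary of the Precision-Recall curve; the gap between the two printed numbers is the pitfall described above.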
Related Concepts
- Confusion Matrix - Metrics at a single threshold.
- Precision-Recall Curve - Better for imbalanced data.
- Binary Logistic Regression