Heatmap
Definition
Core Statement
A Heatmap is a 2D data visualization in which each value in a matrix is represented as a color. It is a standard tool for visualizing Correlation Matrices and for finding patterns in high-dimensional or time-series data.
Purpose
- Correlation Analysis: Quickly spot highly correlated variables (strongly positive or strongly negative).
- Missing Data Patterns: Visualize where NaN values occur (are they random or structural?).
- Clustering Results: When rows/cols are reordered by similarity (Clustered Heatmap), groups emerge.
- Time Series: Weekday vs Hour of Day grids (Traffic patterns).
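Two of the uses above, missing-data patterns and weekday-by-hour grids, can be sketched as follows. This is a minimal illustration on synthetic data: the frame `df`, the column names, the start date, and the Poisson counts are all assumptions made for the example, not part of any real dataset.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# --- Missing-data pattern (synthetic frame with a deliberate NaN block) ---
df = pd.DataFrame(rng.normal(size=(50, 5)), columns=list("ABCDE"))
df.iloc[10:20, 2] = np.nan           # structural gap: rows 10-19 of column "C"
missing = df.isna().astype(int)      # 1 = missing, 0 = present

# --- Weekday x hour grid (synthetic hourly counts over 4 weeks) ---
n_hours = 24 * 7 * 4
ts = pd.Timestamp("2024-01-01") + pd.to_timedelta(np.arange(n_hours), unit="h")
events = pd.DataFrame({
    "weekday": ts.dayofweek,         # 0 = Monday ... 6 = Sunday
    "hour": ts.hour,
    "count": rng.poisson(lam=5, size=n_hours),
})
# Rows are weekdays, columns are hours, cells are the mean count
grid = events.pivot_table(index="weekday", columns="hour",
                          values="count", aggfunc="mean")

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
sns.heatmap(missing, cbar=False, ax=axes[0])
axes[0].set_title("Missing values (1 = NaN)")
sns.heatmap(grid, cmap="viridis", ax=axes[1])
axes[1].set_title("Mean count by weekday and hour")
```

A solid vertical stripe in the left panel suggests a structural gap, while scattered cells suggest randomness. Call `plt.show()` to display the figure in a script.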
When to Use
Use a Heatmap When...
- Comparing many-to-many relationships (e.g., 20 variables correlation).
- You want to identify "hot spots" (high activity/value).
- Displaying a Confusion Matrix for multi-class classification.
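As a sketch of the confusion-matrix case, the snippet below builds the matrix with `pd.crosstab` and draws it with integer annotations. The class names and the label lists are made up for illustration; in practice they would come from a classifier's output.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical labels for a 3-class problem (illustrative values only)
classes = ["cat", "dog", "bird"]
y_true = ["cat", "cat", "dog", "dog", "dog", "bird", "bird", "cat", "bird", "dog"]
y_pred = ["cat", "dog", "dog", "dog", "cat", "bird", "bird", "cat", "dog", "dog"]

# Rows are the true class, columns the predicted class.
# reindex fixes the class order and fills unseen combinations with 0.
cm = (
    pd.crosstab(pd.Series(y_true, name="True"),
                pd.Series(y_pred, name="Predicted"))
    .reindex(index=classes, columns=classes, fill_value=0)
)

fig, ax = plt.subplots(figsize=(5, 4))
# A sequential colormap fits counts, which have no meaningful midpoint
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", cbar=False, ax=ax)
ax.set_title("Confusion Matrix")
```

Bright off-diagonal cells point to the class pairs the model confuses most often.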
Do NOT Use When...
- Precise value reading is required (use a Table).
- Colorblind accessibility is a concern and you cannot control the colormap (if you can, use a perceptually uniform colormap like viridis or cividis, NOT jet).
Key Types
1. Correlation Heatmap
- Input: Correlation Matrix.
- Use: Feature selection (drop redundant features).
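One way to turn the correlation matrix into a feature-selection step is to drop one member of every highly correlated pair. The sketch below assumes a cutoff of 0.9 on the absolute correlation and uses synthetic columns (`a`, `a_copy`, `b`, `c`); both the threshold and the data are illustrative choices, not a recommendation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
base = rng.normal(size=200)
data = pd.DataFrame({
    "a": base,
    "a_copy": base + rng.normal(scale=0.05, size=200),  # near-duplicate of "a"
    "b": rng.normal(size=200),
    "c": rng.normal(size=200),
})

THRESHOLD = 0.9  # assumed cutoff; tune for your data

corr = data.corr().abs()
# Keep only the strict upper triangle so each pair is checked once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
# Drop the later column of any pair whose |r| exceeds the cutoff
to_drop = [col for col in upper.columns if (upper[col] > THRESHOLD).any()]
reduced = data.drop(columns=to_drop)

print("Dropped:", to_drop)
print("Kept:", list(reduced.columns))
```

This rule always keeps the earlier column of a pair, so column order influences which feature survives.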
2. Clustered Heatmap (Hierarchical)
- Input: Raw Data Matrix.
- Action: Reorders rows and columns so similar items are next to each other.
- Use: Gene expression analysis, Customer segmentation.
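The reordering step can be sketched directly with SciPy's hierarchical clustering. This is a simplified illustration, not a reproduction of any particular plotting library's internals: the average linkage, the Euclidean metric, and the two synthetic row groups are assumptions chosen for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

rng = np.random.default_rng(0)

# Two hypothetical groups of rows with clearly different means, then shuffled
group_a = rng.normal(loc=0.0, scale=0.1, size=(5, 4))
group_b = rng.normal(loc=5.0, scale=0.1, size=(5, 4))
matrix = np.vstack([group_a, group_b])
labels = np.array(["a"] * 5 + ["b"] * 5)
perm = rng.permutation(10)
matrix, labels = matrix[perm], labels[perm]

# Cluster rows and columns separately; leaves_list gives the dendrogram order
row_order = leaves_list(linkage(matrix, method="average", metric="euclidean"))
col_order = leaves_list(linkage(matrix.T, method="average", metric="euclidean"))

# Apply both orderings so similar rows and similar columns sit together
reordered = matrix[np.ix_(row_order, col_order)]
reordered_labels = labels[row_order]
print("Before:", "".join(labels))
print("After: ", "".join(reordered_labels))
```

After reordering, rows from the same group are contiguous, which is what makes blocks visible in a clustered heatmap.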
Assumptions
Python Implementation
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Mock Data
data = pd.DataFrame(np.random.rand(10, 10), columns=[f'Var{i}' for i in range(10)])
corr = data.corr()
# 1. Basic Correlation Heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(corr,
            annot=True,        # Show numbers
            fmt=".2f",         # 2 decimal places
            cmap='coolwarm',   # Diverging colormap
            center=0,          # Center at 0
            vmin=-1, vmax=1)   # Fixed range for correlation
plt.title("Feature Correlation Matrix")
plt.show()
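# 2. Clustermap (reorders rows/cols by hierarchical clustering)
# standard_scale=1 rescales each column to the 0-1 range.
# iloc[:20] caps the input at 20 rows (the mock data above has only 10).
sns.clustermap(data.iloc[:20, :], cmap='viridis', standard_scale=1)
plt.show()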
Limitations
Pitfalls
- Symmetry Trap: In correlation matrices, the top triangle is a mirror of the bottom. It's often cleaner to mask (hide) the upper triangle.
- Color Perception: Humans struggle to judge absolute differences in color. Don't use heatmaps for precise comparisons.
- Overcrowding: With 100+ variables, labels become unreadable. Hide the tick labels or use a clustered heatmap so that the group structure carries the message.
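The symmetry pitfall above can be handled with the `mask` argument of `sns.heatmap`, which hides every cell where the mask is `True`. A minimal sketch on random mock data (the frame and its column names are placeholders):

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.random((50, 6)), columns=[f"Var{i}" for i in range(6)])
corr = data.corr()

# True marks cells to hide: the strict upper triangle (k=1 keeps the diagonal)
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, mask=mask, annot=True, fmt=".2f",
            cmap="coolwarm", center=0, vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation Matrix (lower triangle only)")
```

Using `k=0` instead would also hide the diagonal, which carries no information in a correlation matrix since every entry there is 1.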
Related Concepts
- Pearson Correlation - The metric usually visualized.
- Principal Component Analysis (PCA) - Alternative for high-dim data.
- Confusion Matrix - Often visualized as a heatmap.