Q-Q Plot
Q-Q Plot (Quantile-Quantile)
Definition
Core Statement
A Q-Q Plot (Quantile-Quantile Plot) is a graphical tool to compare the distribution of a sample against a theoretical distribution (usually the Normal Distribution). It plots sample quantiles against theoretical quantiles. If the data follows the theoretical distribution, points will fall on a straight diagonal line.
Purpose
- Assess normality of data or residuals.
- Diagnose deviations: skewness, heavy tails, outliers.
- Complement statistical tests (e.g., Shapiro-Wilk Test) with visual evidence.
When to Use
Use Q-Q Plot When...
- Checking normality assumption for t-tests, ANOVA, or regression residuals.
- Sample size is moderate to large (where histograms are also useful).
- You want to understand how normality is violated (skew? tails?).
Theoretical Background
Construction
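- Sort the data.
- Calculate sample quantiles (percentiles).
- Calculate theoretical quantiles from the reference distribution (usually the standard normal, $N(0, 1)$).
- Plot pairs: (Theoretical quantile $z$, Sample Value).
- Add a reference line. A 45-degree line is only appropriate when the sample has been standardized; otherwise use a line fitted to the points (e.g., through the quartiles).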
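The construction steps can be sketched by hand with NumPy and SciPy. This is a minimal illustration, not a reference implementation: the helper name `qq_points`, the simulated sample, and the plotting positions `(i - 0.5) / n` are assumptions (several other plotting-position conventions are in common use).

```python
import numpy as np
from scipy import stats

def qq_points(sample):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    x = np.sort(np.asarray(sample, dtype=float))  # sort the data
    n = x.size
    # Plotting positions: one probability per ordered observation (assumed convention).
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = stats.norm.ppf(probs)           # quantiles of N(0, 1)
    # Standardize the sample so that the 45-degree line y = x is a valid reference.
    observed = (x - x.mean()) / x.std(ddof=1)
    return theoretical, observed

# Simulated data, used only for illustration.
rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=2.0, size=200)
theo, obs = qq_points(sample)

# How closely the points follow a straight line.
r = np.corrcoef(theo, obs)[0, 1]
print(f"correlation between theoretical and observed quantiles: {r:.4f}")
```

Scattering `obs` against `theo` and drawing the line y = x gives the plot described above; without the standardization step the points would still be roughly linear for normal data, but with slope and intercept equal to the sample's scale and location rather than 1 and 0.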
Reading the Plot
| Pattern | Interpretation | Example |
|---|---|---|
| Straight Line | Data is approximately Normal. | Normal residuals. |
| S-Shape (ends flatten) | Light tails (e.g., Uniform). | Limited variation. |
| Curve (Convex, bends upward) | Right Skew. | Income data. Use Log Transformations. |
| Curve (Concave, bends downward) | Left Skew. | Ceiling effects. |
| Upper Tail Departs Upward | Heavy right tail (Outliers). | Check for extreme values. |
| Lower Tail Departs Downward | Heavy left tail. | Negative outliers. |
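One way to build intuition for these patterns is to simulate samples with known shapes and check where the points sit relative to the reference line. The sketch below assumes theoretical quantiles on the x-axis and standardized sample values on the y-axis; the helper `line_deviations`, the chosen distributions, and the sample size are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def line_deviations(sample):
    """Observed minus theoretical quantile at the lower end, middle, and upper end."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    probs = (np.arange(1, n + 1) - 0.5) / n
    theo = stats.norm.ppf(probs)
    obs = (x - x.mean()) / x.std(ddof=1)  # standardized, so the reference line is y = x
    diff = obs - theo
    return diff[0], diff[n // 2], diff[-1]

rng = np.random.default_rng(123)
n = 2000
samples = {
    "right skew (exponential)": rng.exponential(size=n),
    "left skew (negated exponential)": -rng.exponential(size=n),
    "heavy tails (t, 3 df)": rng.standard_t(3, size=n),
    "light tails (uniform)": rng.uniform(size=n),
}

results = {name: line_deviations(x) for name, x in samples.items()}
for name, (lo, mid, hi) in results.items():
    print(f"{name:32s} lower={lo:+.2f} middle={mid:+.2f} upper={hi:+.2f}")
```

A positive value means the point lies above the reference line. Right-skewed data sit above the line at both ends and below it in the middle, left-skewed data do the opposite, heavy-tailed data fall below at the lower end and above at the upper end, and light-tailed data reverse that.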
Assumptions
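A Q-Q plot is a diagnostic tool, not a test, so it carries no formal assumptions. However, interpretation assumes the data are continuous; discrete or heavily rounded data produce a staircase pattern that is hard to read.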
Limitations
Pitfalls
- Subjectivity: "Straight enough" is a judgment call, especially for small $n$.
- Small samples: Random variation can look like deviation.
- Not a formal test: Use Shapiro-Wilk Test for statistical confirmation.
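The small-sample pitfall and the pairing with a formal test can both be illustrated with simulated data. In this sketch the helper `qq_correlation`, the sample sizes, and the number of repetitions are assumptions chosen for illustration; `scipy.stats.probplot` and `scipy.stats.shapiro` are the SciPy routines for Q-Q coordinates and the Shapiro-Wilk test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def qq_correlation(x):
    """Correlation between ordered sample values and normal quantiles."""
    (_, _), (_, _, r) = stats.probplot(x, dist="norm")
    return r

# Every sample here is truly normal; only the sample size differs.
r_small = np.array([qq_correlation(rng.normal(size=10)) for _ in range(200)])
r_large = np.array([qq_correlation(rng.normal(size=200)) for _ in range(200)])
print(f"n=10 : mean r = {r_small.mean():.3f}, worst r = {r_small.min():.3f}")
print(f"n=200: mean r = {r_large.mean():.3f}, worst r = {r_large.min():.3f}")

# Pair the visual check with a formal test on a clearly skewed sample.
skewed = rng.exponential(size=500)
w_stat, p_value = stats.shapiro(skewed)
print(f"Shapiro-Wilk on skewed data: W = {w_stat:.3f}, p = {p_value:.2e}")
```

Even though every simulated sample is normal, the Q-Q correlation varies much more at n = 10 than at n = 200, which is why a wobbly plot from a small sample is weak evidence of non-normality.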
Python Implementation
import statsmodels.api as sm
import matplotlib.pyplot as plt
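# Create Q-Q plot; `data` is assumed to be a 1-D array of observations.
# fit=True standardizes the sample, which makes the 45-degree line a valid reference.
sm.qqplot(data, line='45', fit=True)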
plt.title("Q-Q Plot")
plt.show()
# For regression residuals; `model` is assumed to be a fitted statsmodels results object
sm.qqplot(model.resid, line='45', fit=True)
plt.title("Q-Q Plot of Residuals")
plt.show()
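The snippet above assumes that `data` and `model` already exist. For a version that runs on its own, the following sketch simulates a simple linear relationship, fits it with `np.polyfit`, and computes the Q-Q coordinates of the residuals with `scipy.stats.probplot`. Note that this swaps SciPy in for statsmodels; the simulated data and all variable names are assumptions for illustration.

```python
import numpy as np
from scipy import stats

# Simulated regression data with normal errors (illustrative only).
rng = np.random.default_rng(7)
x = rng.uniform(0, 10, size=300)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=300)

# Ordinary least-squares fit of a straight line.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# probplot returns the Q-Q coordinates and a least-squares line through them.
(osm, osr), (qq_slope, qq_intercept, r) = stats.probplot(residuals, dist="norm")
print(f"Q-Q line: slope = {qq_slope:.3f}, intercept = {qq_intercept:.3f}, r = {r:.4f}")
```

Because the residuals are not standardized here, the fitted Q-Q line has a slope close to the residual standard deviation rather than 1. Passing `plot=plt` to `stats.probplot` (with `matplotlib.pyplot` imported as `plt`) draws the points and the fitted line.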
R Implementation
# Q-Q Plot for a vector
qqnorm(data, main = "Q-Q Plot")
qqline(data, col = "red", lwd = 2)
# For regression residuals
model <- lm(Y ~ X, data = df)
qqnorm(resid(model))
qqline(resid(model), col = "red")
# Or use built-in plot for lm objects
plot(model, which = 2) # Diagnostic plot 2 is the Normal Q-Q plot of standardized residuals
Interpretation Guide
| Visual | Interpretation |
|---|---|
| Points on line | Data is approximately normal. |
| Points curve up at right | Right skew; consider log transform. |
| Points flatten at tails | Light tails (platykurtic). |
| Points steepen at tails | Heavy tails (leptokurtic). |
| Single point far off | Outlier. Investigate. |
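The suggestion to consider a log transform for right-skewed data can be checked numerically. This sketch uses simulated lognormal values as a stand-in for income-like data (an assumption, not real data) and compares the straightness of the Q-Q plot before and after taking logs.

```python
import numpy as np
from scipy import stats

# Simulated right-skewed, strictly positive data (illustrative only).
rng = np.random.default_rng(1)
income = rng.lognormal(mean=10.0, sigma=1.0, size=1000)

# The third fit value from probplot is the correlation of the Q-Q points.
(_, _), (_, _, r_raw) = stats.probplot(income, dist="norm")
(_, _), (_, _, r_log) = stats.probplot(np.log(income), dist="norm")

print(f"Q-Q correlation, raw values: {r_raw:.3f}")
print(f"Q-Q correlation, log values: {r_log:.3f}")
```

A correlation much closer to 1 after the transform indicates that the points now follow the reference line far more closely. The log transform only applies to strictly positive values, and it is an exact fix here only because the simulated data are lognormal by construction.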
Related Concepts
- Shapiro-Wilk Test - Statistical test for normality.
- Normal Distribution - The reference distribution.
- Log Transformations - Fix for skewness.