Support Vector Machines (SVM)
Definition
Core Statement
Support Vector Machines (SVM) are supervised learning models used for classification and regression. The core idea is to find the hyperplane that best separates the two classes with the maximum margin (widest gap).
The "Support Vectors" are the data points closest to the hyperplane that define the margin.
Purpose
- High-Dimensional Classification: Effective when number of dimensions > number of samples.
- Complex Boundaries: Can model non-linear boundaries using the Kernel Trick.
- Robustness: The solution depends only on the support vectors (the "hard" cases near the boundary), so points far from the boundary on their correct side have no influence on it.
When to Use
Use SVM When...
- Small to Medium datasets (it doesn't scale well to millions of rows).
- High dimensionality (e.g., Text Classification, Gene Expression); see the sketch after this list.
- You need a robust boundary.
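A minimal sketch of the text-classification case, assuming scikit-learn's built-in 20 Newsgroups corpus (my own choice of example data, not from the text): TF-IDF features are extremely high-dimensional and sparse, which is exactly the regime where a linear SVM does well.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
# Two newsgroups as a binary text-classification task (downloaded on first use)
data = fetch_20newsgroups(subset='train', categories=['sci.space', 'rec.autos'])
# TF-IDF turns each document into a sparse vector with tens of thousands of dimensions
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(data.data, data.target)
print(model.score(data.data, data.target))  # training accuracy on this toy setup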
Limitations
- Slow Training: Training complexity is roughly O(n²) to O(n³) in the number of samples, which makes it painfully slow for large n.
- Noise Sensitivity: If classes overlap heavily, the hard-margin concept breaks down (a soft margin is required).
- No Probability: Outputs are signed distances to the hyperplane, not probabilities (unlike Logistic Regression). Probabilities require probability=True (Platt scaling), which is slow; see the sketch below.
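A minimal sketch of the distance-vs-probability point, on a synthetic dataset of my own choosing: decision_function is always available and returns signed distances, while predict_proba only exists when the model is refit with probability=True (which runs an internal cross-validation for Platt scaling).
from sklearn.svm import SVC
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=200, random_state=42)
# Default SVC: only signed distances to the decision boundary are available
clf = SVC(kernel='rbf').fit(X, y)
print(clf.decision_function(X[:3]))    # signed distances, not probabilities
# Platt scaling: refit with probability=True (slower, internal cross-validation)
clf_prob = SVC(kernel='rbf', probability=True, random_state=42).fit(X, y)
print(clf_prob.predict_proba(X[:3]))   # each row sums to 1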
Theoretical Background
The Margin
- Hyperplane: A line (in 2D) or flat surface (in 3D) separating classes.
- Margin: The distance between the hyperplane and the nearest points (Support Vectors).
- Goal: Maximize the Margin. (A wide road is safer than a narrow one; see the sketch below.)
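A minimal sketch on two hand-made clusters (the points are illustrative, not from the text): after fitting a linear SVM, the support vectors and the margin width 2/||w|| can be read off directly.
import numpy as np
from sklearn.svm import SVC
# Two tiny, linearly separable clusters
X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = SVC(kernel='linear', C=1e6)  # very large C approximates a hard margin
clf.fit(X, y)
print(clf.support_vectors_)        # the points that define the margin
w = clf.coef_[0]
print(2 / np.linalg.norm(w))       # margin width = 2 / ||w||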
The Kernel Trick
What if data isn't linearly separable? (e.g., a red circle inside a blue ring).
- Idea: Project data into a Higher Dimension where it is separable.
- Kernel: A math function that computes dot products in that higher-dimensional space without ever transforming the data (a computational shortcut; see the sketch after this list).
- Linear Kernel: Standard straight line.
- RBF (Radial Basis Function): Creates circular/blobby boundaries; corresponds to an implicit feature space of infinite dimensions.
- Polynomial: Curved boundaries.
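A minimal sketch of the shortcut using a degree-2 polynomial kernel on two illustrative vectors of my own: the kernel value (x·z)² equals the dot product of the explicitly transformed features, but it never builds those features.
import numpy as np
def phi(x):
    # Explicit degree-2 feature map for 2D input: (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])
x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])
explicit = phi(x) @ phi(z)   # dot product after actually transforming the data
kernel = (x @ z) ** 2        # same value computed entirely in the original space
print(explicit, kernel)      # identical -- the kernel skips the transformation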
Worked Example: 1D Separation
Problem
Data: Red dots at x = -2 and Blue dots at x = 2 on a number line.
Goal: Separate them.
Linear SVM:
- Hyperplane (Point): x = 0.
- Margin: Distance from 0 to -2 is 2. Distance from 0 to 2 is 2. Margin = 4.
Non-Linear Data:
- Red: x = -2 and x = 2. Blue: x = 0. (Blue is sandwiched between the Reds.)
- Problem: No single cut point separates them.
- Kernel Trick: Map x → x².
- Red: 4 and 4. Blue: 0.
- Now separable! Cut anywhere between 0 and 4, e.g. at x² = 2 (checked in the sketch below).
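A minimal sketch checking the 1D example numerically (same points as above, labels encoded as 0 = Red, 1 = Blue): a linear SVM on the raw x cannot separate the sandwiched layout, while the same model on the transformed feature x² can.
import numpy as np
from sklearn.svm import SVC
X = np.array([[-2.0], [0.0], [2.0]])   # Red, Blue, Red on the number line
y = np.array([0, 1, 0])                # Blue is sandwiched between the Reds
linear = SVC(kernel='linear', C=1e6).fit(X, y)
print(linear.score(X, y))              # < 1.0: no single cut point works in 1D
mapped = SVC(kernel='linear', C=1e6).fit(X ** 2, y)  # explicit map x -> x^2
print(mapped.score(X ** 2, y))         # 1.0: a cut between 0 and 4 separates them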
Python Implementation
from sklearn.svm import SVC
from sklearn.datasets import make_circles
import matplotlib.pyplot as plt
import numpy as np
# Complex non-linear data: an inner cluster surrounded by an outer ring
X, y = make_circles(noise=0.1, factor=0.5, random_state=42)
# 1. Linear Kernel (fails on circles -- no straight line separates a ring from its core)
# clf = SVC(kernel='linear')
# 2. RBF Kernel (works!)
clf = SVC(kernel='rbf', C=1.0, gamma='scale')
clf.fit(X, y)
# Visualization: predict over a grid and draw the decision regions
xx, yy = np.meshgrid(np.linspace(-1.5, 1.5, 200), np.linspace(-1.5, 1.5, 200))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')
plt.show()
# Result: a circular boundary separating the inner class.
Key Parameters
| Parameter | Meaning | Effect |
|---|---|---|
| C (Regularization) | Penalty for misclassification. | High C: strict (hard margin, fits training data closely, risks overfitting). Low C: loose (soft margin, wider road, better generalization). |
| Gamma (Kernel width) | How far the influence of a single training point reaches. | High gamma: points become isolated islands (overfitting). Low gamma: broad, smooth blobs (underfitting). |
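A minimal sketch of tuning both knobs together with GridSearchCV (the grid values are illustrative, not prescriptive), reusing the make_circles data from the implementation above:
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_circles
X, y = make_circles(noise=0.1, factor=0.5, random_state=42)
param_grid = {
    'C': [0.1, 1, 10, 100],        # soft margin -> hard margin
    'gamma': [0.01, 0.1, 1, 10],   # broad blobs -> isolated islands
}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)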
Related Concepts
- Logistic Regression - Alternative linear classifier (outputs probabilities).
- Kernel Density Estimation - Related to RBF.
- Overfitting & Underfitting - Controlled by C and Gamma.