Support Vector Machines (SVM)

Definition

Core Statement

Support Vector Machines (SVM) are supervised learning models used for classification and regression. For classification, the core idea is to find the hyperplane that separates the two classes with the maximum margin (the widest possible gap).
The "Support Vectors" are the data points closest to the hyperplane that define the margin.


Purpose

  1. High-Dimensional Classification: Effective when number of dimensions > number of samples.
  2. Complex Boundaries: Can model non-linear boundaries using the Kernel Trick.
  3. Robustness: Because it relies only on the support vectors (the "hard" cases near the boundary), it is resistant to outliers far from the boundary, as demonstrated in the sketch after this list.
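A quick way to see this (a minimal sketch; the make_blobs data and the large C value are illustrative): fit a near-hard-margin linear SVM, then refit on the support vectors alone and observe that the learned hyperplane is essentially unchanged, because the other points never enter the solution.

from sklearn.svm import SVC
from sklearn.datasets import make_blobs

# Two well-separated blobs (illustrative data)
X, y = make_blobs(n_samples=60, centers=2, cluster_std=0.6, random_state=0)

# A very large C approximates a hard margin
clf = SVC(kernel='linear', C=1000).fit(X, y)
print(len(clf.support_), "of", len(X), "points are support vectors")

# Refit on the support vectors only: near-identical hyperplane
sv = clf.support_
clf2 = SVC(kernel='linear', C=1000).fit(X[sv], y[sv])
print(clf.coef_, clf2.coef_)  # the two weight vectors should match closely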

When to Use

Use SVM When...

  • Small to Medium datasets (it doesn't scale well to millions of rows).
  • High dimensionality (e.g., Text Classification, Gene Expression); see the text-classification sketch after this list.
  • You need a robust boundary.
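For example, text classification: TF-IDF features routinely give tens of thousands of dimensions from far fewer documents. A minimal sketch (the toy corpus and labels are invented; LinearSVC is chosen here because it scales better than kernel SVC on sparse text):

from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy corpus (hypothetical labels: 1 = sports, 0 = tech)
docs = ["the match went to penalties", "the new GPU is fast",
        "the striker scores twice", "benchmark the compiler build"]
labels = [1, 0, 1, 0]

# TF-IDF produces sparse, high-dimensional vectors; a linear SVM
# handles dimensions > samples gracefully.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(docs, labels)
print(clf.predict(["the striker scores again"]))  # -> [1] (sports vocabulary)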

Limitations

  • Slow Training: O(N³) worst-case complexity makes it painfully slow for large N.
  • Noise Sensitivity: If classes overlap heavily, the margin concept starts to fail (requires soft margin).
  • No Probability: Outputs are signed distances, not probabilities (unlike Logistic Regression). Getting probabilities requires probability=True (Platt scaling), which is slow; see the sketch after this list.
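What probability=True actually does (a minimal sketch; the synthetic dataset is illustrative):

from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, random_state=0)

# probability=True triggers Platt scaling: an extra cross-validated
# logistic model is fit on the decision values, which slows training.
clf = SVC(kernel='rbf', probability=True).fit(X, y)

print(clf.decision_function(X[:2]))  # raw signed distances to the hyperplane
print(clf.predict_proba(X[:2]))      # calibrated class probabilities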


Theoretical Background

The Margin
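For a separating hyperplane w · x + b = 0, the nearest points of each class sit on the parallel planes w · x + b = ±1, and the gap between those planes (the margin) has width 2/‖w‖. Maximizing that width is the standard training objective (a brief statement of the textbook formulation):

$$\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1 \ \ \text{for all } i$$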

The Kernel Trick

What if the data isn't linearly separable (e.g., a red circle inside a blue ring)? The trick is to implicitly map the data into a higher-dimensional feature space where it becomes linearly separable. A kernel function computes inner products in that space directly, so the mapping never has to be constructed explicitly.
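The most common kernel in practice, and the one used in the implementation below, is the RBF (Gaussian) kernel:

$$K(x, x') = \exp\left(-\gamma\,\lVert x - x' \rVert^2\right)$$

Here gamma controls how far each training point's influence reaches (see Key Parameters below).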


Worked Example: 1D Separation

Problem

Data: Red dots at x = [-3, -2]. Blue dots at x = [2, 3].
Goal: Separate them.

Linear SVM:

  • Hyperplane (Point): x=0.
  • Margin: Distance from 0 to -2 is 2. Distance from 0 to 2 is 2. Margin = 4.

Non-Linear Data:

  • Red: x = [-3, 3]. Blue: x = [-1, 1]. (Blue is sandwiched between the reds.)
  • Problem: No single point separates them.
  • Kernel Trick: Map x → x².
    • Red: 9, 9. Blue: 1, 1.
    • Now separable! Cut at x² = 5 (see the sketch below).
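The same trick in code (a minimal sketch; it fits a linear SVM on the hand-squared feature rather than using a kernel, purely to make the mapping visible):

import numpy as np
from sklearn.svm import SVC

# 1-D data from the example above: red (1) outside, blue (0) inside
X = np.array([[-3.0], [-1.0], [1.0], [3.0]])
y = np.array([1, 0, 0, 1])

# No linear cut works on X itself, but squaring maps
# red -> 9, 9 and blue -> 1, 1, which a single threshold separates.
clf = SVC(kernel='linear').fit(X**2, y)
print(clf.predict([[2.5**2], [0.5**2]]))  # -> [1 0]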

Python Implementation

from sklearn.svm import SVC
from sklearn.datasets import make_circles
from sklearn.inspection import DecisionBoundaryDisplay  # scikit-learn >= 1.1
import matplotlib.pyplot as plt

# Complex non-linear data: an inner circle inside an outer ring
X, y = make_circles(noise=0.1, factor=0.5, random_state=42)

# 1. Linear kernel (fails on circles: no straight line separates them)
# clf = SVC(kernel='linear')

# 2. RBF kernel (works!)
clf = SVC(kernel='rbf', C=1.0, gamma='scale')
clf.fit(X, y)

# Visualization: a roughly circular boundary enclosing the inner class
disp = DecisionBoundaryDisplay.from_estimator(clf, X, response_method="predict", alpha=0.3)
disp.ax_.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
plt.show()

Key Parameters

| Parameter | Meaning | Effect of Increasing |
|---|---|---|
| C (regularization) | Penalty for misclassification. | High C: strict (hard margin, fits training data closely, risk of overfitting). Low C: loose (soft margin, wider "road", better generalization). |
| gamma (kernel width) | How far the influence of a single point reaches. | High gamma: points become isolated islands (overfitting). Low gamma: broad, smooth blobs (underfitting). |
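C and gamma interact, so they are usually tuned together. A minimal grid-search sketch (the parameter ranges are illustrative):

from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_circles

X, y = make_circles(noise=0.1, factor=0.5, random_state=42)

param_grid = {
    'C': [0.1, 1, 10, 100],        # soft -> hard margin
    'gamma': [0.01, 0.1, 1, 10],   # broad -> narrow influence
}

# 5-fold cross-validation over all 16 combinations
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)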