Support Vector Machine (SVM) Algorithm Part 1 - Define Primal Optimization Goal
The primary goal of Support Vector Machines (SVM) is to maximize the margin between the two classes. This is achieved by finding the largest possible gap between two parallel hyperplanes, one for each class. Let's break down the algorithm in detail.
Visualizing the Problem
Consider a 2D plot with two classes: positive (+1) and negative (-1). The SVM aims to find a hyperplane h0 that separates the two classes while maximizing the margin. The margin is the distance between two parallel hyperplanes, h1 (for the positive class) and h2 (for the negative class), which pass through the data points of each class that lie closest to h0; these closest points are called support vectors.
The goal of the SVM algorithm is to find the optimal hyperplane h0 that maximizes this margin, i.e., the distance between h1 and h2.
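To make the picture concrete, here is a minimal plotting sketch (my own, not from the original figure) that draws a hypothetical h0 together with its margin hyperplanes h1 and h2. The values of w, b, and the scattered points are arbitrary illustrative choices, not the result of any fit.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical parameters chosen only for illustration: h0 is w . x + b = 0,
# and h1 / h2 are the parallel hyperplanes where w . x + b equals +1 / -1.
w = np.array([1.0, 1.0])
b = -3.0

xs = np.linspace(0.0, 5.0, 100)
for c, style, label in [(0, "-", "h0"), (1, "--", "h1 (+1)"), (-1, "--", "h2 (-1)")]:
    # Solve w[0]*x1 + w[1]*x2 + b = c for x2 to draw each line
    plt.plot(xs, (c - b - w[0] * xs) / w[1], style, label=label)

# A few illustrative points on each side of the margin
plt.scatter([3.5, 4.0], [2.0, 3.0], marker="+", label="positive class (+1)")
plt.scatter([0.5, 1.0], [1.0, 0.5], marker="o", label="negative class (-1)")
plt.legend()
plt.show()
```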
Deriving the Distance Formula
For every point x lying on the hyperplane h0, the projection of x onto the normal vector w is the same value: the (signed) distance from the origin to h0. Let's derive the formula for this distance step by step.
1. Geometric Intuition:
To understand this distance, we need to recognize the geometry of the hyperplane; it is the blue line in picture 1. The vector w is perpendicular (normal) to the hyperplane, so for any point x on the hyperplane, the distance from the origin to the hyperplane equals the length of the projection of x onto w. In terms of the angle θ between the vectors x and w, this projection is:

∥x∥ cos θ = (x · w) / ∥w∥
Here, ∥x∥ is the magnitude (or length) of the vector x, and θ is the angle between x and w.
Since all points x on the hyperplane h0 lie at the same distance from the origin along w, this projection must be the same constant for every such x. This means we set:

(x · w) / ∥w∥ = c, or equivalently x · w = c∥w∥
where c is a constant value (the same for every point on h0). This simply expresses that all points of the hyperplane share the same projection onto w.
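As a quick numerical sanity check (my own sketch, with an arbitrarily chosen w, b, and point x), the projection ∥x∥ cos θ can be computed as (x · w)/∥w∥, and the same quantity with the bias folded in gives the signed distance of x from h0:

```python
import numpy as np

# Hypothetical hyperplane h0: w . x + b = 0, and an arbitrary point x
w = np.array([2.0, 1.0])
b = -4.0
x = np.array([3.0, 2.0])

# Projection of x onto the direction of w, i.e. ||x|| * cos(theta)
projection = np.dot(x, w) / np.linalg.norm(w)

# Signed distance from x to the hyperplane h0 (zero when x lies on h0)
distance_to_h0 = (np.dot(x, w) + b) / np.linalg.norm(w)

print(projection)      # (x . w) / ||w||      ~= 3.58
print(distance_to_h0)  # (x . w + b) / ||w||  ~= 1.79
```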
2. Equation for the Hyperplane h0:
To satisfy this condition, we can replace c∥w∥ with -b (b being the bias term of the hyperplane equation), resulting in the familiar equation for the hyperplane h0:

w · x + b = 0
3. Equation for the Support Vectors h1 and h2:
For the support vectors, which lie closest to h0, the equations are shifted by ±1:

h1: w · x + b = +1
h2: w · x + b = -1
These two hyperplanes define the margin around the optimal hyperplane h0.
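Given some candidate w and b, the points lying exactly on h1 or h2 can be identified by checking where w · x + b evaluates to ±1. A small sketch with made-up numbers:

```python
import numpy as np

# Hypothetical parameters: with w = (1, 1) and b = -3, h1 is x1 + x2 = 4
# and h2 is x1 + x2 = 2.
w = np.array([1.0, 1.0])
b = -3.0

points = np.array([[3.0, 1.0], [4.0, 3.0], [1.0, 1.0], [0.0, 1.0]])
scores = points @ w + b            # w . x + b for every point

# Support vectors sit exactly on h1 (+1) or h2 (-1)
print(points[np.isclose(scores, 1.0)])    # [[3. 1.]] -> lies on h1
print(points[np.isclose(scores, -1.0)])   # [[1. 1.]] -> lies on h2
```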
Maximizing the Margin
The margin is the distance between h1 and h2, measured along the unit normal w / ∥w∥. Taking one support vector from the positive class (x+) and one from the negative class (x-), and subtracting their hyperplane equations (w · x+ + b = +1 and w · x- + b = -1) gives w · (x+ - x-) = 2, so the margin is:

margin = (w / ∥w∥) · (x+ - x-) = 2 / ∥w∥
Thus, the margin is 2/∥w∥, and to maximize it we need to minimize ∥w∥, leading to the optimization problem:

min over w, b of (1/2) ∥w∥²

(the factor 1/2 and the square are used for mathematical convenience and do not change the minimizer).
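The 2/∥w∥ result is easy to verify numerically. Continuing the hypothetical w and b from the sketches above, with one point on h1 and one on h2, the projection of their difference onto the unit normal equals 2/∥w∥:

```python
import numpy as np

# Hypothetical example: w = (1, 1), b = -3
w = np.array([1.0, 1.0])
x_plus = np.array([3.0, 1.0])    # lies on h1: w . x + b = +1
x_minus = np.array([1.0, 1.0])   # lies on h2: w . x + b = -1

# Distance between h1 and h2 measured along the unit normal w / ||w||
margin = np.dot(x_plus - x_minus, w) / np.linalg.norm(w)
print(margin, 2.0 / np.linalg.norm(w))   # both print ~1.414, i.e. 2 / ||w||
```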
But the problem is not fully defined yet: we still need constraints.
Constraints for Optimization
The SVM optimization problem must also ensure that every training point (xi, yi) is correctly classified and lies on or beyond its class's margin hyperplane. This leads to the following constraints:

w · xi + b ≥ +1 for points with yi = +1
w · xi + b ≤ -1 for points with yi = -1
These two constraints can be combined into one unified form:

yi (w · xi + b) ≥ 1 for all i = 1, …, n
This condition ensures that all data points are correctly classified with the desired margin.
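In code, the combined constraint is just a vectorized inequality check. Here is a small sketch (using the same hypothetical w, b, and toy points as above) that verifies yi (w · xi + b) ≥ 1 for every training point:

```python
import numpy as np

# Hypothetical candidate solution and a small labeled dataset
w = np.array([1.0, 1.0])
b = -3.0
X = np.array([[3.0, 1.0], [4.0, 3.0], [1.0, 1.0], [0.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Combined constraint: y_i * (w . x_i + b) >= 1 for every point
margins = y * (X @ w + b)
print(margins)                 # [1. 4. 1. 2.]
print(np.all(margins >= 1.0))  # True -> (w, b) satisfies all constraints
```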
Primal Optimization Goal of SVM
Therefore, the primal optimization goal of SVM is to find the weight vector w and bias b that maximize the margin between the two classes while classifying every training point correctly. This is achieved by solving the following optimization problem:

min over w, b of (1/2) ∥w∥²
subject to the constraints:

yi (w · xi + b) ≥ 1, for i = 1, …, n
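To see the primal problem end to end, here is a minimal sketch (mine, not from the article) that feeds the objective (1/2)∥w∥² and the constraints yi (w · xi + b) ≥ 1 to a generic constrained solver, scipy.optimize.minimize, on a tiny hand-made separable dataset. In practice dedicated QP or SVM solvers are used instead, so treat this purely as an illustration of the problem statement:

```python
import numpy as np
from scipy.optimize import minimize

# Tiny hypothetical linearly separable dataset (labels are +1 / -1)
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Optimization variable theta = [w1, w2, b]
def objective(theta):
    w = theta[:2]
    return 0.5 * np.dot(w, w)              # (1/2) * ||w||^2

# One inequality constraint per point: y_i * (w . x_i + b) - 1 >= 0
constraints = [
    {"type": "ineq",
     "fun": lambda theta, xi=xi, yi=yi: yi * (np.dot(theta[:2], xi) + theta[2]) - 1.0}
    for xi, yi in zip(X, y)
]

# Start from a feasible guess so the solver has an easy time
result = minimize(objective, x0=np.array([1.0, 1.0, -3.0]), constraints=constraints)
w, b = result.x[:2], result.x[2]
print("w =", w, "b =", b, "margin =", 2.0 / np.linalg.norm(w))
```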
Conclusion
This article provided a mathematical breakdown of the Support Vector Machine (SVM) algorithm and defined its primal optimization problem. However, SVM goes further by using Lagrange multipliers to transform the problem into a more convenient dual form. Tomorrow, I will delve into how this transformation works mathematically and why it provides significant advantages in SVM optimization.