Support Vector Machine (SVM) Algorithm Part 1 - Define Primal Optimization Goal

The primary goal of Support Vector Machines (SVM) is to maximize the margin between the two classes. This is achieved by finding the largest possible gap between two parallel hyperplanes, one for each class. Let's break down the algorithm in detail.

Visualizing the Problem

Consider a 2D plot with two classes: positive (+1) and negative (−1). The SVM aims to find a hyperplane h0 that separates the two classes while maximizing the margin. The margin is the distance between two parallel hyperplanes, h1 and h2, each passing through the data points of its class that lie closest to the decision boundary; these closest points are called support vectors.

  • h1: The hyperplane through the support vectors of the positive class.
  • h0: The optimal hyperplane (decision boundary).
  • h2: The hyperplane through the support vectors of the negative class.

The goal of the SVM algorithm is to find the optimal hyperplane h0 that maximizes the margin between the support vectors of the two classes, h1 and h2.
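
To make this setup concrete, here is a minimal sketch that fits a linear SVM to a toy 2D dataset with scikit-learn. Note that scikit-learn's SVC solves a soft-margin formulation internally, so a large C value is used to approximate the hard-margin case discussed in this article; the dataset and parameter values are illustrative assumptions.

```python
# A minimal sketch of the setup described above, using scikit-learn's SVC.
# SVC solves a soft-margin formulation internally, so a large C is used here
# to approximate the hard-margin case discussed in this article.
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable 2D data: positive class clustered high, negative low.
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=[2.0, 2.0], scale=0.4, size=(20, 2))
X_neg = rng.normal(loc=[-2.0, -2.0], scale=0.4, size=(20, 2))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 20 + [-1] * 20)

clf = SVC(kernel="linear", C=1e6)  # large C ~ hard margin
clf.fit(X, y)

w = clf.coef_[0]       # normal vector of the decision boundary h0
b = clf.intercept_[0]  # bias term
print("w =", w, "b =", b)
print("support vectors (points lying on h1 and h2):")
print(clf.support_vectors_)
```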

Deriving the Distance Formula

For the hyperplane h0 to be well defined, every point x lying on it must project onto the normal vector w with the same length: that common projection length is exactly the distance from the origin to the hyperplane. Let's derive the formula for this distance step by step.

1. Geometric Intuition:

To understand the distance, we need to recognize the geometry of the hyperplane: the weight vector w is perpendicular (normal) to it. For any point x, the length of the projection of x onto w, expressed in terms of the angle θ between x and w, is:

∥x∥ cos θ = (w · x) / ∥w∥

Here, ∥x∥ is the magnitude (or length) of the vector x, and θ is the angle between x and w.
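
As a quick numeric sanity check of this identity, the sketch below computes the projection length both ways, via the angle and via the dot product, for arbitrary example vectors:

```python
# Numeric sanity check: the projection length of x onto w computed via the
# angle theta equals the dot-product form (w . x) / ||w||.
# The vectors here are arbitrary illustrative values.
import numpy as np

w = np.array([3.0, 4.0])
x = np.array([2.0, 1.0])

cos_theta = np.dot(w, x) / (np.linalg.norm(w) * np.linalg.norm(x))
proj_via_angle = np.linalg.norm(x) * cos_theta   # ||x|| cos(theta)
proj_via_dot = np.dot(w, x) / np.linalg.norm(w)  # (w . x) / ||w||

print(proj_via_angle, proj_via_dot)  # both 2.0
```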

For every point x lying on the hyperplane h0, this projection must equal the same constant c, the distance from the origin to the hyperplane. This means we set:

(w · x) / ∥w∥ = c, or equivalently, w · x = c∥w∥

where c is a constant value (the same for all points on the hyperplane). This ensures that the projection of any point on the hyperplane onto w is consistent, which is what we need in order to measure the margin between the classes.

2. Equation for the Hyperplane h0:

To satisfy this condition, we can replace c∥w∥ with −b (where b is the bias term of the hyperplane equation), resulting in the following equation for the hyperplane h0:

w · x + b = 0

3. Equation for the Support Vectors h1 and h2:

For the support vectors, which lie closest to the hyperplane, the equations are modified as follows:

  • For the positive class (y = +1): w · x + b = 1
  • For the negative class (y = −1): w · x + b = −1

These two hyperplanes define the margin around the optimal hyperplane h0.
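
The following small check illustrates these two equations with a hand-picked hyperplane (w = (1, 1), b = 0) and points chosen to lie exactly on h1 and h2; the specific values are illustrative assumptions, not outputs of any fitting procedure:

```python
# Checking the support-vector equations with a hand-picked hyperplane
# (w = (1, 1), b = 0) and points chosen to lie exactly on h1 and h2.
# All values are illustrative assumptions.
import numpy as np

w = np.array([1.0, 1.0])
b = 0.0
x_pos = np.array([0.5, 0.5])    # assumed support vector on h1
x_neg = np.array([-0.5, -0.5])  # assumed support vector on h2

print(np.dot(w, x_pos) + b)  # +1.0 -> satisfies w . x + b = +1
print(np.dot(w, x_neg) + b)  # -1.0 -> satisfies w . x + b = -1
```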


Maximizing the Margin

The margin is the distance between the two support-vector hyperplanes. Take one support vector from the positive class (x+) and one from the negative class (x−). Since w · x+ + b = 1 and w · x− + b = −1, subtracting the two equations gives w · (x+ − x−) = 2. Projecting this difference vector onto the unit normal w/∥w∥ then yields the margin:

margin = w · (x+ − x−) / ∥w∥ = 2/∥w∥

Thus, the margin is 2/∥w∥, and to maximize the margin, we need to minimize ∥w∥, leading to the optimization problem:

minimize (1/2)∥w∥² over w and b

(Minimizing (1/2)∥w∥² is equivalent to minimizing ∥w∥; the square and the factor 1/2 simply make the later math cleaner.)
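
Before moving on, here is a quick numeric check of the 2/∥w∥ margin formula, using the same hand-picked values as in the previous sketch:

```python
# Numeric check of the margin formula using the same hand-picked values:
# the projected distance between the two support vectors equals 2 / ||w||.
import numpy as np

w = np.array([1.0, 1.0])
x_pos = np.array([0.5, 0.5])
x_neg = np.array([-0.5, -0.5])

margin_via_projection = np.dot(w, x_pos - x_neg) / np.linalg.norm(w)
margin_via_formula = 2.0 / np.linalg.norm(w)

print(margin_via_projection, margin_via_formula)  # both ~1.4142
```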

But we have not finished defining the problem yet: there are constraints.

Constraints for Optimization

The SVM optimization problem involves ensuring that all data points are correctly classified with a margin of at least 1 from the hyperplane. This leads to the following constraints:

  • For positive samples (y = +1): w · x + b ≥ 1
  • For negative samples (y = −1): w · x + b ≤ −1

These two constraints can be combined into one unified form by multiplying each inequality by its corresponding label y:

yᵢ(w · xᵢ + b) ≥ 1 for all samples i

This condition ensures that all data points are correctly classified with the desired margin.
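
In code, this unified condition becomes a single vectorized check. The w, b, and data values below are illustrative assumptions:

```python
# Vectorized check of the unified constraint y_i (w . x_i + b) >= 1 on a
# toy dataset; w, b, and the data are illustrative assumptions.
import numpy as np

w = np.array([1.0, 1.0])
b = 0.0
X = np.array([[0.5, 0.5], [2.0, 1.0], [-0.5, -0.5], [-1.5, -2.0]])
y = np.array([1, 1, -1, -1])

margins = y * (X @ w + b)
print(margins)               # [1.  3.  1.  3.5]
print(np.all(margins >= 1))  # True: every point satisfies the constraint
```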

Primal Optimization Goal of SVM

Therefore, the primal optimization goal of SVM is to find the weight vector w and bias b that maximize the margin between the two classes while correctly classifying every training point. This is achieved by solving the following optimization problem:

minimize (1/2)∥w∥² over w and b

subject to the constraints:

yᵢ(w · xᵢ + b) ≥ 1 for all samples i
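
To make the primal problem concrete, here is a naive sketch that hands it directly to SciPy's general-purpose SLSQP solver. Real SVM implementations solve the dual form instead (the subject of Part 2), so this is purely illustrative; the toy dataset is an assumption:

```python
# Naive sketch: hand the primal problem directly to SciPy's general-purpose
# SLSQP solver. Real SVM libraries solve the dual form instead; this only
# makes the primal formulation concrete. The toy dataset is an assumption.
import numpy as np
from scipy.optimize import minimize

X = np.array([[0.5, 0.5], [2.0, 1.0], [-0.5, -0.5], [-1.5, -2.0]])
y = np.array([1, 1, -1, -1])

def objective(params):
    w = params[:-1]
    return 0.5 * np.dot(w, w)  # (1/2) ||w||^2

def margin_constraints(params):
    w, b = params[:-1], params[-1]
    return y * (X @ w + b) - 1.0  # each element must be >= 0

result = minimize(
    objective,
    x0=np.zeros(X.shape[1] + 1),  # start from w = 0, b = 0
    method="SLSQP",
    constraints=[{"type": "ineq", "fun": margin_constraints}],
)
w_opt, b_opt = result.x[:-1], result.x[-1]
print("w =", w_opt, "b =", b_opt)
print("margins:", y * (X @ w_opt + b_opt))  # all >= 1 at the optimum
```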


Conclusion

This article provided a mathematical breakdown of the Support Vector Machine (SVM) algorithm and defined the primal optimization problem. However, SVM takes this further by leveraging Lagrange multipliers to transform the problem into a more efficient dual form. Tomorrow, I will delve into the details of how this transformation works mathematically and why it provides significant advantages in SVM optimization.


Source

https://www.youtube.com/watch?v=6-ntMIaJpm0&t=632s

https://www.youtube.com/watch?v=ny1iZ5A8ilA&t=571s

