Support Vector Machine (SVM) Algorithm Part 1 - Define Primal Optimization Goal
The primary goal of Support Vector Machines (SVM) is to maximize the margin between the two classes. This is achieved by finding the largest possible gap between two parallel hyperplanes, one for each class. Let's break down the algorithm in detail.
Visualizing the Problem
Consider a 2D plot with two classes: positive (+1) and negative (-1). The SVM aims to find a hyperplane h0 that separates the two classes while maximizing the margin. The margin is the distance between two parallel hyperplanes, h1 (for the positive class) and h2 (for the negative class), which pass through the data points of each class that lie closest to h0; these closest points are called support vectors.
The goal of the SVM algorithm is to find the optimal hyperplane h0 that maximizes this margin, i.e., the distance between h1 and h2.
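To make the picture concrete, here is a minimal plotting sketch (my own, not from the original figure) that draws a hypothetical h0 together with its margin hyperplanes h1 and h2. The values of w, b, and the scattered points are arbitrary illustrative choices, not the result of any fit.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical parameters chosen only for illustration: h0 is w . x + b = 0,
# and h1 / h2 are the parallel hyperplanes where w . x + b equals +1 / -1.
w = np.array([1.0, 1.0])
b = -3.0

xs = np.linspace(0.0, 5.0, 100)
for c, style, label in [(0, "-", "h0"), (1, "--", "h1 (+1)"), (-1, "--", "h2 (-1)")]:
    # Solve w[0]*x1 + w[1]*x2 + b = c for x2 to draw each line
    plt.plot(xs, (c - b - w[0] * xs) / w[1], style, label=label)

# A few illustrative points on each side of the margin
plt.scatter([3.5, 4.0], [2.0, 3.0], marker="+", label="positive class (+1)")
plt.scatter([0.5, 1.0], [1.0, 0.5], marker="o", label="negative class (-1)")
plt.legend()
plt.show()
```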
Deriving the Distance Formula
For every point x lying on the hyperplane h0, the projection of x onto the normal vector w is the same value: the (signed) distance from the origin to h0. Let's derive the formula for this distance step by step.
1. Geometric Intuition:
To understand this distance, we need to recognize the geometry of the hyperplane; it is the blue line in picture 1. The vector w is perpendicular (normal) to the hyperplane, so for any point x on the hyperplane, the distance from the origin to the hyperplane equals the length of the projection of x onto w. In terms of the angle θ between the vectors x and w, this projection is:

∥x∥ cos θ = (x · w) / ∥w∥
Here, ∥x∥ is the magnitude (or length) of the vector x, and θ is the angle between x and w.
Since all points x on the hyperplane h0 lie at the same distance from the origin along w, this projection must be the same constant for every such x. This means we set:

(x · w) / ∥w∥ = c, or equivalently x · w = c∥w∥
where c is a constant value (the same for every point on h0). This simply expresses that all points of the hyperplane share the same projection onto w.
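As a quick numerical sanity check (my own sketch, with an arbitrarily chosen w, b, and point x), the projection ∥x∥ cos θ can be computed as (x · w)/∥w∥, and the same quantity with the bias folded in gives the signed distance of x from h0:

```python
import numpy as np

# Hypothetical hyperplane h0: w . x + b = 0, and an arbitrary point x
w = np.array([2.0, 1.0])
b = -4.0
x = np.array([3.0, 2.0])

# Projection of x onto the direction of w, i.e. ||x|| * cos(theta)
projection = np.dot(x, w) / np.linalg.norm(w)

# Signed distance from x to the hyperplane h0 (zero when x lies on h0)
distance_to_h0 = (np.dot(x, w) + b) / np.linalg.norm(w)

print(projection)      # (x . w) / ||w||      ~= 3.58
print(distance_to_h0)  # (x . w + b) / ||w||  ~= 1.79
```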
2. Equation for the Hyperplane h0:
To satisfy this condition, we can replace c∥w∥ with -b (b being the bias term of the hyperplane equation), resulting in the familiar equation for the hyperplane h0:

w · x + b = 0
3. Equation for the Support Vectors h1 and h2:
For the support vectors, which lie closest to h0, the equations are shifted by ±1:

h1: w · x + b = +1
h2: w · x + b = -1
These two hyperplanes define the margin around the optimal hyperplane h0.
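Given some candidate w and b, the points lying exactly on h1 or h2 can be identified by checking where w · x + b evaluates to ±1. A small sketch with made-up numbers:

```python
import numpy as np

# Hypothetical parameters: with w = (1, 1) and b = -3, h1 is x1 + x2 = 4
# and h2 is x1 + x2 = 2.
w = np.array([1.0, 1.0])
b = -3.0

points = np.array([[3.0, 1.0], [4.0, 3.0], [1.0, 1.0], [0.0, 1.0]])
scores = points @ w + b            # w . x + b for every point

# Support vectors sit exactly on h1 (+1) or h2 (-1)
print(points[np.isclose(scores, 1.0)])    # [[3. 1.]] -> lies on h1
print(points[np.isclose(scores, -1.0)])   # [[1. 1.]] -> lies on h2
```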
Maximizing the Margin
The margin is the distance between h1 and h2, measured along the unit normal w / ∥w∥. Taking one support vector from the positive class (x+) and one from the negative class (x-), and subtracting their hyperplane equations (w · x+ + b = +1 and w · x- + b = -1) gives w · (x+ - x-) = 2, so the margin is:

margin = (w / ∥w∥) · (x+ - x-) = 2 / ∥w∥
Thus, the margin is 2/∥w∥, and to maximize it we need to minimize ∥w∥, leading to the optimization problem:

min over w, b of (1/2) ∥w∥²

(the factor 1/2 and the square are used for mathematical convenience and do not change the minimizer).
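The 2/∥w∥ result is easy to verify numerically. Continuing the hypothetical w and b from the sketches above, with one point on h1 and one on h2, the projection of their difference onto the unit normal equals 2/∥w∥:

```python
import numpy as np

# Hypothetical example: w = (1, 1), b = -3
w = np.array([1.0, 1.0])
x_plus = np.array([3.0, 1.0])    # lies on h1: w . x + b = +1
x_minus = np.array([1.0, 1.0])   # lies on h2: w . x + b = -1

# Distance between h1 and h2 measured along the unit normal w / ||w||
margin = np.dot(x_plus - x_minus, w) / np.linalg.norm(w)
print(margin, 2.0 / np.linalg.norm(w))   # both print ~1.414, i.e. 2 / ||w||
```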
But the problem is not fully defined yet: we still need constraints.
Constraints for Optimization
The SVM optimization problem must also ensure that every training point (xi, yi) is correctly classified and lies on or beyond its class's margin hyperplane. This leads to the following constraints:

w · xi + b ≥ +1 for points with yi = +1
w · xi + b ≤ -1 for points with yi = -1
These two constraints can be combined into one unified form:

yi (w · xi + b) ≥ 1 for all i = 1, …, n
This condition ensures that all data points are correctly classified with the desired margin.
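In code, the combined constraint is just a vectorized inequality check. Here is a small sketch (using the same hypothetical w, b, and toy points as above) that verifies yi (w · xi + b) ≥ 1 for every training point:

```python
import numpy as np

# Hypothetical candidate solution and a small labeled dataset
w = np.array([1.0, 1.0])
b = -3.0
X = np.array([[3.0, 1.0], [4.0, 3.0], [1.0, 1.0], [0.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Combined constraint: y_i * (w . x_i + b) >= 1 for every point
margins = y * (X @ w + b)
print(margins)                 # [1. 4. 1. 2.]
print(np.all(margins >= 1.0))  # True -> (w, b) satisfies all constraints
```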
Primal Optimization Goal of SVM
Therefore, the primal optimization goal of SVM is to find the weight vector w and bias b that maximize the margin between the two classes while classifying every training point correctly. This is achieved by solving the following optimization problem:

min over w, b of (1/2) ∥w∥²
subject to the constraints:

yi (w · xi + b) ≥ 1, for i = 1, …, n
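To see the primal problem end to end, here is a minimal sketch (mine, not from the article) that feeds the objective (1/2)∥w∥² and the constraints yi (w · xi + b) ≥ 1 to a generic constrained solver, scipy.optimize.minimize, on a tiny hand-made separable dataset. In practice dedicated QP or SVM solvers are used instead, so treat this purely as an illustration of the problem statement:

```python
import numpy as np
from scipy.optimize import minimize

# Tiny hypothetical linearly separable dataset (labels are +1 / -1)
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Optimization variable theta = [w1, w2, b]
def objective(theta):
    w = theta[:2]
    return 0.5 * np.dot(w, w)              # (1/2) * ||w||^2

# One inequality constraint per point: y_i * (w . x_i + b) - 1 >= 0
constraints = [
    {"type": "ineq",
     "fun": lambda theta, xi=xi, yi=yi: yi * (np.dot(theta[:2], xi) + theta[2]) - 1.0}
    for xi, yi in zip(X, y)
]

# Start from a feasible guess so the solver has an easy time
result = minimize(objective, x0=np.array([1.0, 1.0, -3.0]), constraints=constraints)
w, b = result.x[:2], result.x[2]
print("w =", w, "b =", b, "margin =", 2.0 / np.linalg.norm(w))
```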
Conclusion
This article provided a mathematical breakdown of the Support Vector Machine (SVM) algorithm and defined its primal optimization problem. However, SVM goes further by using Lagrange multipliers to transform the problem into a more convenient dual form. Tomorrow, I will delve into how this transformation works mathematically and why it provides significant advantages in SVM optimization.