Support Vector Machine — an overview

The Support Vector Machine, originally developed by Vladimir Vapnik and Alexey Chervonenkis in the 1960s, remains one of the most popular machine learning classifiers. Support Vector Machines (SVMs) are a particularly powerful and flexible class of supervised algorithms, used for both classification and regression, and they can also be applied to outlier detection.

The objective of the Support Vector Machine is to find the best splitting boundary between data. In a two-dimensional space, you can think of the splitting boundary as the best-fit 'line' that divides your dataset. With a Support Vector Machine we are working in a vector space, so the separating line is really a separating 'hyperplane'. The best separating hyperplane is the one that leaves the widest margin to the nearest points of each class (the support vectors). The hyperplane may also be referred to as a decision boundary.
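
In the standard linear formulation, the hyperplane is the set of points x satisfying w · x + b = 0 for a weight vector w and offset b, and the margin width works out to 2 / ||w||; training therefore amounts to minimizing ||w|| subject to every labelled point (x_i, y_i), with y_i in {-1, +1}, satisfying y_i (w · x_i + b) >= 1.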

In this section, we will develop the intuition behind support vector machines and their use in classification problems. Here we consider discriminative classification, in which we simply find a line or curve (in two dimensions) or a manifold (in higher dimensions) that divides the classes from each other.

As an example of this, consider the simple case of a classification task, in which the two classes of points are well separated:

[Figure: scatter plot of two well-separated classes of points]

A linear discriminative classifier would attempt to draw a straight line separating the two sets of data, and thereby create a model for classification. For two-dimensional data like that shown here, this is a task we could do by hand. But immediately we see a problem: there is more than one possible dividing line that can perfectly discriminate between the two classes!

[Figure: three different straight lines, each of which perfectly separates the two classes, with a new point marked 'X']

These are three very different separators that, nevertheless, perfectly discriminate between these samples. Depending on which you choose, a new data point (for example, the one marked by the ‘X’ in the above figure) will be assigned a different label! Evidently our simple intuition of ‘drawing a line between classes’ is not enough and we need to think deeper.

Support Vector Machines offer one way to improve on this: rather than simply drawing a zero-width line between the classes, we can draw around each line a margin of some width, extending to the nearest point.

[Figure: the candidate separators, each drawn with a margin extending to the nearest point]

In support vector machines, the line that maximizes this margin is the one we choose as the optimal model; support vector machines are therefore an example of a maximum-margin estimator.

Fitting a support vector machine

Let’s see the result of an actual fit to this data: we will use Scikit-Learn’s support vector classifier to train an SVM model on this data.

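A minimal sketch of what such a fit might look like; the make_blobs dataset below is an assumption standing in for the scatter plot shown earlier:

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters of 2-D points (a stand-in for the data in the
# figures above)
X, y = make_blobs(n_samples=200, centers=2, random_state=0, cluster_std=0.60)

# Linear support vector classifier; a very large C approximates a hard margin
model = SVC(kernel='linear', C=1E10)
model.fit(X, y)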

To better visualize what’s happening here, let’s plot SVM decision boundaries:
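
A sketch of how such a plot might be produced with Matplotlib, continuing from the fit above (the plotting details are one possible choice, not the only one):

import numpy as np
import matplotlib.pyplot as plt

# Scatter the training points, coloured by class
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='autumn')

# Evaluate the decision function on a grid and contour it at the levels
# -1, 0 and +1: the two margin edges and the decision boundary itself
ax = plt.gca()
xlim, ylim = ax.get_xlim(), ax.get_ylim()
xx, yy = np.meshgrid(np.linspace(*xlim, 30), np.linspace(*ylim, 30))
P = model.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
ax.contour(xx, yy, P, levels=[-1, 0, 1], colors='k',
           linestyles=['--', '-', '--'])

# Circle the support vectors
ax.scatter(model.support_vectors_[:, 0], model.support_vectors_[:, 1],
           s=200, facecolors='none', edgecolors='k')
plt.show()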

[Figure: the maximum-margin separator, with the margin edges drawn as dashed lines and the support vectors circled]

This is the dividing line that maximizes the margin between the two sets of points. Notice that a few of the training points just touch the margin; they are indicated by the black circles in the figure above. These points are the pivotal elements of the fit: they are known as the support vectors, and they give the algorithm its name. In Scikit-Learn, the identity of these points is stored in the support_vectors_ attribute of the classifier.

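For example, continuing with the model fitted above:

# Coordinates of the points that lie on the margin
print(model.support_vectors_)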

A key to this classifier's success is that, for the fit, only the positions of the support vectors matter; any points further from the margin that are on the correct side do not modify the fit at all. Technically, this is because such points do not contribute to the loss function used to fit the model, so their position and number do not matter as long as they do not cross the margin.

We can see this, for example, if we plot the model learned from the first 60 points and first 120 points of the dataset:

[Figure: left panel, the SVM fit and support vectors for the first 60 training points; right panel, the fit for the first 120 points, with an unchanged decision boundary and support vectors]

In the left panel, we see the model and support vectors for 60 training points. In the right panel, we have doubled the number of training points, but the model has not changed: the three support vectors from the left panel are still the support vectors in the right panel. This insensitivity to the exact behaviour of distant points is one of the strengths of the SVM model.
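
A quick way to check this on our running example (assuming the X, y arrays from the fit above, which contain at least 120 points):

from sklearn.svm import SVC

# Fit on the first 60 points and on the first 120 points
model60 = SVC(kernel='linear', C=1E10).fit(X[:60], y[:60])
model120 = SVC(kernel='linear', C=1E10).fit(X[:120], y[:120])

# If none of the extra points cross the old margin, the support vectors
# (and hence the decision boundary) come out identical
print(model60.support_vectors_)
print(model120.support_vectors_)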

Kernel SVM: When the data is not linearly separable

SVM becomes far more powerful when it is combined with kernels. To motivate the need for kernels, let's look at some data that is not linearly separable:

[Figure: scatter plot in which one class forms a ring around a central clump of the other class]

It is clear that no linear discriminator will ever be able to separate this data. But we can think about how we might project the data into a higher dimension such that a linear separator would be sufficient. For example, one simple projection we could use is to compute a radial basis function centered on the middle clump. We can visualize this extra data dimension using a three-dimensional plot:

[Figure: three-dimensional plot of the data, with the radial basis function value r as the third axis]

We can see that with this additional dimension, the data becomes trivially linearly separable, by drawing a separating plane at, say, r=0.7.
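
A sketch of this projection, assuming the data were generated with Scikit-Learn's make_circles helper:

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older Matplotlib
from sklearn.datasets import make_circles

# One class forming a ring around a central clump of the other class
X, y = make_circles(n_samples=100, factor=0.1, noise=0.1, random_state=0)

# Radial basis function centred on the middle clump: close to 1 near the
# centre and decaying towards the outer ring
r = np.exp(-(X ** 2).sum(axis=1))

# Plot the two original features plus the new dimension r
ax = plt.figure().add_subplot(projection='3d')
ax.scatter(X[:, 0], X[:, 1], r, c=y, cmap='autumn')
ax.set_xlabel('x'); ax.set_ylabel('y'); ax.set_zlabel('r')
plt.show()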

Here we had to choose and carefully tune our projection: if we had not centered our radial basis function (RBF) on the middle clump, we would not have seen such clean, linearly separable results. In general, the need to make such a choice is a problem: we would like to find the best basis functions to use automatically.

One strategy to this end is to compute a basis function centered at every point in the dataset and let the SVM algorithm sift through the results. This type of basis function transformation is known as a kernel transformation.

A potential problem with this strategy — projecting N points into N dimensions — is that it might become very computationally intensive as N grows large. However, because of a neat little procedure known as the kernel trick, a fit on kernel transformed data can be done implicitly — that is, without ever building the full N-dimensional representation of the kernel projection! This kernel trick is built into the SVM, and is one of the reasons the method is so powerful.
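
Concretely, the trick relies on the fact that the fit only ever needs inner products between projected points, and a kernel function returns those inner products directly; for the RBF kernel used below, k(x, x') = exp(-gamma * ||x - x'||^2), and the projection itself is never computed explicitly.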

In Scikit-Learn, we can apply a kernelized SVM simply by changing our linear kernel to an RBF (radial basis function) kernel, using the kernel hyperparameter of the model:

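A minimal sketch, reusing the circular data from the projection example above (the exact value of C is an assumption):

from sklearn.svm import SVC

# Same data as before, but with a radial basis function kernel; the kernel
# trick lets the fit happen in the implicit high-dimensional space
clf = SVC(kernel='rbf', C=1E6)
clf.fit(X, y)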

Using this kernelized support vector machine, we learn a suitable nonlinear decision boundary. This kernel transformation strategy is used often in machine learning to turn fast linear methods into fast nonlinear methods, especially for models in which the kernel trick can be used.

Softening the margins

So far we have discussed very clean datasets, in which a perfect decision boundary exists. But what if your data has some amount of overlap?

[Figure: scatter plot of two classes that overlap to some degree]

To handle this case, the SVM implementation has a feature that softens the margin: it allows some of the points to creep into the margin if that gives a better fit. The hardness of the margin is controlled by a tuning parameter, most often known as C. For a very large C, the margin is hard and points cannot lie inside it. For a smaller C, the margin is softer and can grow to encompass some points.
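
A small sketch of the effect (the overlapping make_blobs data and the two C values are illustrative assumptions):

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping data, standing in for the figure above
X, y = make_blobs(n_samples=100, centers=2, random_state=0, cluster_std=1.2)

# Hard-ish margin: a large C penalizes margin violations heavily
hard = SVC(kernel='linear', C=10.0).fit(X, y)

# Soft margin: a small C lets the margin grow and absorb some points
soft = SVC(kernel='linear', C=0.1).fit(X, y)

# More points typically end up on or inside the softer margin
print(len(hard.support_vectors_), len(soft.support_vectors_))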

The plot shown below gives a visual picture of how a changing C parameter affects the final fit via the softening of the margin:

[Figure: SVM fits with a large and a small value of C, showing the margin hardening and softening]

The optimal value of the C parameter will depend on your dataset, and should be tuned via cross-validation or a similar procedure.
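
For instance, a minimal cross-validation sketch using Scikit-Learn's GridSearchCV (the grid of C values, and reusing the overlapping X, y from above, are assumptions):

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Try a handful of candidate C values with 5-fold cross-validation
param_grid = {'C': [0.1, 1, 10, 100]}
grid = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_)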

Pros and Cons associated with SVM

Pros:

  1. SVM is effective in high-dimensional spaces.
  2. It remains effective in cases where the number of dimensions is greater than the number of samples.
  3. It uses only a subset of the training points in the decision function (the support vectors), so it is also memory efficient.
  4. It works well when there is a clear margin of separation between the classes.

Cons:

  1. SVMs don’t directly provide probability estimates; when required, these are calculated using an expensive five-fold cross-validation.
  2. SVMs don’t perform well on large datasets, because the required training time is high.

