Support Vector Machine (SVM)
Waqas Ali - FCMA, CAIS, ADMA, CDS
Chief AI Scientist @ BRB Group | AI Strategy, Team Leadership
Like logistic regression, a Support Vector Machine (SVM) is a linear classifier, meaning that it produces a hyperplane in vector space that attempts to separate the two classes of the dataset.
The difference between logistic regression and SVMs is the loss function. Logistic regression uses a log-likelihood loss that penalizes every point in proportion to the error in its probability estimate, even points on the correct side of the hyperplane.
An SVM, on the other hand, uses the hinge loss, which penalizes only points on the wrong side of the hyperplane, or on the correct side but within the margin.
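To make the contrast concrete, here is a minimal sketch in plain NumPy (the labels y in {-1, +1} and the raw scores are made up for illustration) comparing the two losses on the same inputs:

```python
import numpy as np

def hinge_loss(y, score):
    # Zero penalty once a point is correctly classified beyond the
    # margin (y * score >= 1); grows linearly inside the margin or
    # on the wrong side of the hyperplane.
    return np.maximum(0.0, 1.0 - y * score)

def logistic_loss(y, score):
    # Log-likelihood loss: strictly positive for every point, so even
    # confidently correct points still contribute a small penalty.
    return np.log(1.0 + np.exp(-y * score))

scores = np.array([-2.0, 0.0, 0.5, 1.0, 3.0])  # raw scores for label y = +1
print(hinge_loss(+1, scores))     # [3.  1.  0.5 0.  0. ]
print(logistic_loss(+1, scores))  # every entry > 0, even for score = 3.0
```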
What is a Support Vector Machine?
The SVM (Support Vector Machine) classifier attempts to find the maximum margin hyperplane separating the two classes, where margin indicates the distance between the separation plane and the closest data points on either side.
When the data is not linearly separable, points that violate the margin are penalized in proportion to their distance from it. The figure below shows a concrete example: the two classes are represented by white and black dots respectively, the solid line is the separating plane, and the dotted lines are the margins.
The square points are the support vectors; that is, those which provide a non-zero contribution to the loss function.
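As a short scikit-learn sketch (with toy two-cluster data invented for illustration), you can fit a linear SVM and inspect which training points end up as support vectors:

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two loosely separated Gaussian clusters (made-up values).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Only the support vectors contribute to the decision function; the
# remaining points lie safely outside the margin and can be discarded.
print(clf.support_vectors_.shape)  # (n_support_vectors, 2)
print(clf.n_support_)              # support vector count per class
```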
How Does an SVM Work?
To classify a new data point x, we simply determine which side of the plane x falls on. If we want a real-valued score, we can compute the signed distance from x to the separating plane and apply a sigmoid to map it to [0, 1].
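Here is a self-contained sketch of that scoring step in scikit-learn (the tiny training set and the new point are hypothetical, just to obtain a fitted model):

```python
import numpy as np
from sklearn.svm import SVC

# Tiny hypothetical training set, just to obtain a fitted model.
X = np.array([[-2.0, -1.5], [-1.0, -2.0], [1.5, 2.0], [2.0, 1.0]])
y = np.array([0, 0, 1, 1])
clf = SVC(kernel="linear").fit(X, y)

x_new = np.array([[0.5, -0.3]])             # hypothetical new point
signed_dist = clf.decision_function(x_new)  # sign picks the side of the plane
label = clf.predict(x_new)

# Squash the signed distance to (0, 1) with a sigmoid for a soft score
# (a score, not a calibrated probability -- see the calibration note below).
soft_score = 1.0 / (1.0 + np.exp(-signed_dist))
print(label, signed_dist, soft_score)
```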
The real power of SVMs comes from the kernel trick. At a high level, a kernel implicitly maps the data into a higher-dimensional feature space; a hyperplane in that space corresponds to a nonlinear decision boundary in the original space, so the SVM stays linear internally while the boundary it draws becomes nonlinear.
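An illustrative sketch of what the kernel trick buys, using scikit-learn and a synthetic concentric-circles dataset: an RBF kernel separates classes that defeat a linear SVM.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles: no straight line in 2-D can separate the classes.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.08, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

linear = SVC(kernel="linear").fit(X_tr, y_tr)
rbf = SVC(kernel="rbf", gamma=2.0).fit(X_tr, y_tr)

# The RBF kernel implicitly maps the points into a space where they are
# linearly separable; the boundary is nonlinear in the original 2-D space.
print("linear:", linear.score(X_te, y_te))  # near chance level
print("rbf:   ", rbf.score(X_te, y_te))     # close to 1.0
```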
Support Vector Machine: Benefits and Limitations
SVMs have shown very good performance in practice, especially in high-dimensional spaces, and the fact that the decision function depends only on the support vectors leads to efficient implementations for scoring new data points.
However, the cost of training a kernel SVM grows quadratically with the number of training samples, so for training sets larger than a few million points kernels are rarely used and the decision boundary is kept linear.
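A common workaround at that scale, sketched below with scikit-learn and a synthetic dataset, is a purely linear solver such as LinearSVC, which never forms the kernel matrix:

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Larger synthetic dataset, where a kernel SVM would already be slow.
X, y = make_classification(n_samples=100_000, n_features=50, random_state=0)

# LinearSVC fits a linear SVM (squared hinge loss by default) without ever
# building the n x n kernel matrix, so training scales with the sample count.
clf = LinearSVC(C=1.0).fit(X, y)
print(clf.score(X, y))
```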
Another drawback is that the scores produced by SVMs cannot be interpreted as probabilities; converting scores to probabilities requires additional computation and cross-validation, for example using Platt scaling or isotonic regression.
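One way to do this in scikit-learn is CalibratedClassifierCV, which wraps the SVM and fits the calibration map via cross-validation; method="sigmoid" corresponds to Platt scaling and method="isotonic" to isotonic regression. A sketch on synthetic data:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, random_state=0)

# Cross-validated Platt scaling: fit a sigmoid that maps raw SVM scores
# to probabilities on held-out folds, then average the calibrated models.
calibrated = CalibratedClassifierCV(SVC(kernel="rbf"), method="sigmoid", cv=5)
calibrated.fit(X, y)
print(calibrated.predict_proba(X[:3]))  # each row sums to 1
```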