Understanding Support Vector Machine
Support Vector Machine: An Introduction
I have talked about Linear Regression and Classification in my prior articles. Before we dive into the Support Vector Machine (aka SVM), let’s first understand what it does. The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space (N being the number of features) that distinctly classifies the data points. It is highly preferred by many as it produces significant accuracy with less computation power. SVM can be used for both regression and classification tasks, but it is most widely used for classification.
As seen in the first representation in the above figure, there could be multiple ways to choose a hyperplane that separates the data points. However, our objective is to identify the hyperplane with the maximum margin, i.e. the one whose distance to the nearest data points of both classes is largest compared to the other hyperplanes.
How does it work?
Let’s understand the functionality of SVM using different scenarios:
Scenario 1 – From the figure below, it is clear that there are three hyperplanes that could classify the data points; however, hyperplane B separates the two classes best.
Scenario 2 – As we can see in the figure below, all three hyperplanes classify the data correctly. However, A and B have very small margins, i.e. they are very close to the data points. If the data point values fluctuate even slightly, these hyperplanes may misclassify them. Hyperplane C, on the other hand, has a significant margin, as it has the maximum distance from the data points of both classes.
Scenario 3 – In the scenario below, we can see one data point that can be considered an outlier. Hyperplane B has a larger margin; however, SVM still chooses hyperplane A, because it first identifies the hyperplane that classifies the data points correctly and only then compares the margins.
Scenario 4 – I mentioned in the previous scenario that SVM first tries to identify the hyperplane that classifies the data correctly and then considers the margin. However, in the example below we cannot classify the data points with a straight line. Hence, SVM ignores the outlier and selects the hyperplane that classifies the remaining data points with the maximum margin.
Scenario 5 – So far we have seen only linear classification with SVM; however, in the example below it is not possible to find a linear hyperplane that can separate the data points.
SVM solves this problem easily for us by adding another feature to the dataset:
Z = X^2 + Y^2
and plotting the data points on the X-Z axis instead of the X-Y axis. Now that we have a completely different picture of the dataset, we can easily identify a separating hyperplane on the X-Z axis; a small code sketch of this transformation follows the list below.
There are a few things we can notice in the above figure:
- The values on the Z-axis will always be positive, as Z is the sum of the squares of the X and Y values.
- In the original dataset the circles (black) are closer to the origin of the X axis, while the squares (blue) are comparatively far from it. Hence, on the Z-axis the circles will always lie closer to the origin, with the squares above them.
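To make this concrete, here is a minimal sketch of the Z = X^2 + Y^2 transformation on made-up data: circles generated near the origin, squares on an outer ring, so a single threshold on Z separates the two classes.

```python
import numpy as np

rng = np.random.default_rng(42)

# "Circles": points within a small radius of the origin.
theta = rng.uniform(0, 2 * np.pi, 50)
r_inner = rng.uniform(0, 2, 50)
circles = np.c_[r_inner * np.cos(theta), r_inner * np.sin(theta)]

# "Squares": points on an outer ring, farther from the origin.
theta = rng.uniform(0, 2 * np.pi, 50)
r_outer = rng.uniform(4, 6, 50)
squares = np.c_[r_outer * np.cos(theta), r_outer * np.sin(theta)]

X = np.vstack([circles, squares])
y = np.array([0] * 50 + [1] * 50)

# The new feature: Z = X^2 + Y^2 (squared distance from the origin).
Z = X[:, 0] ** 2 + X[:, 1] ** 2

# In the X-Z view the classes become linearly separable: circles have
# Z <= 4 and squares have Z >= 16, so any threshold in between works.
print(Z[y == 0].max(), Z[y == 1].min())  # e.g. ~4.0 vs ~16.0
```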
Data Preparation for SVM
This section lists some suggestions for how to best prepare your training data when learning an SVM model.
· Numerical Inputs: SVM assumes that the inputs are numeric. If we have categorical inputs, we may need to convert them to binary dummy variables (one variable for each category), as in the sketch after this list.
· Binary Classification: The basic SVM described in this post is intended for binary (two-class) classification problems, although extensions have been developed for regression and multi-class classification.
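As a sketch of the first point, categorical inputs can be converted to binary dummy variables with pandas; the column names below are hypothetical.

```python
import pandas as pd

# Hypothetical data with one categorical and one numeric column.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"],
                   "size": [1.0, 2.5, 0.7, 1.8]})

# One binary column per category; SVM can then treat them as numeric.
X = pd.get_dummies(df, columns=["color"])
print(X)
```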
SVM Tuning Parameters
In real-world applications, finding the perfect separating hyperplane for millions of training examples takes a lot of time. There are some parameters that can be tuned while using this algorithm to achieve better accuracy.
Kernel
The learning of the hyperplane in linear SVM is done by transforming the problem using some linear algebra. This is where the kernel plays its role.
For the linear kernel, the prediction for a new input is calculated using the dot product between the input (x) and each support vector (xi) as follows:
f(x) = B0 + sum(ai * (x · xi))
This equation involves calculating the inner products of a new input vector (x) with all support vectors in the training data. The coefficients B0 and ai (one for each support vector) must be estimated from the training data by the learning algorithm.
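Here is a small sketch of this prediction equation in NumPy. The support vectors, the coefficients ai, and the intercept B0 below are made-up values; in practice the learning algorithm estimates them from the training data.

```python
import numpy as np

support_vectors = np.array([[1.0, 2.0], [3.0, 1.0]])  # xi (made up)
a = np.array([0.5, -0.3])                             # ai per support vector (made up)
b0 = 0.1                                              # intercept B0 (made up)

def predict(x):
    # f(x) = B0 + sum(ai * (x . xi)); the sign of f(x) gives the class.
    return b0 + np.sum(a * (support_vectors @ x))

print(predict(np.array([2.0, 2.0])))  # positive -> one class, negative -> the other
```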
The polynomial kernel can be written as K(x, xi) = (1 + sum(x * xi))^d, and the exponential (RBF) kernel as K(x, xi) = exp(-gamma * sum((x − xi)^2)). [Source for this excerpt: https://machinelearningmastery.com/]
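In sklearn these kernels are selected through the kernel argument of SVC. A sketch on toy data follows; the dataset and parameter values are illustrative, not from the article.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic data, only for illustration.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

for kernel in ("linear", "poly", "rbf"):
    # degree applies only to 'poly'; gamma only to 'poly' and 'rbf'.
    clf = SVC(kernel=kernel, degree=3, gamma="scale")
    clf.fit(X, y)
    print(kernel, clf.score(X, y))
```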
Regularization
The regularization parameter (often referred to as the C parameter in Python’s sklearn library) tells the SVM optimization how much you want to avoid misclassifying each training example.
For large values of C, the optimization will choose a smaller-margin hyperplane if that hyperplane does a better job of getting all the training points classified correctly. Conversely, a very small value of C will cause the optimizer to look for a larger-margin separating hyperplane, even if that hyperplane misclassifies more points.
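A sketch of this trade-off on synthetic data; the C values are chosen only for illustration. Larger C typically yields fewer support vectors, a sign of a tighter margin.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic, slightly overlapping classes (illustrative only).
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           class_sep=0.8, random_state=1)

for C in (0.01, 1, 100):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Fewer support vectors usually indicates a smaller margin.
    print(f"C={C}: support vectors={clf.n_support_.sum()}, "
          f"train accuracy={clf.score(X, y):.2f}")
```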
Gamma
The gamma parameter defines how far the influence of a single training example reaches, with low values meaning ‘far’ and high values meaning ‘close’. In other words, with low gamma, points far away from the plausible separation line are considered in the calculation of the separation line, whereas with high gamma only the points close to the plausible line are considered.
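A sketch of gamma’s reach with the RBF kernel on synthetic data; the gamma values are chosen only for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-moons: a classic non-linear toy problem.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

for gamma in (0.1, 1, 100):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)
    # With very high gamma each point only influences its immediate
    # neighborhood, so nearly every point becomes a support vector
    # and the model tends to overfit.
    print(f"gamma={gamma}: support vectors={clf.n_support_.sum()}, "
          f"train accuracy={clf.score(X, y):.2f}")
```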
Margin
And finally, a last but very important characteristic of the SVM classifier: at its core, SVM tries to achieve a good margin. A good margin is one where the separation is large for both classes. The images below give a visual example of a good and a bad margin. A good margin allows the points to stay in their respective classes without crossing over to the other class.
Implementing SVM using Python
In Python, the SVM algorithm is available through the sklearn (scikit-learn) library, as in the sketch below.
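A minimal end-to-end sketch, assuming the built-in iris dataset and illustrative parameter choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Scaling matters for SVM, since it works with distances and dot products.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Kernel, C, and gamma chosen for illustration; tune them for real data.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

print("Test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```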
Suggested readings:
- Linear regression (https://www.dhirubhai.net/pulse/linear-regression-gautam-k/)
- Classification (https://www.dhirubhai.net/pulse/classification-data-science-gautam-k/)