Understanding Support Vector Machine


Support Vector Machine: An Introduction

I have talked about Linear Regression and Classification in my prior articles. Before we take on the Support Vector Machine (aka SVM), let's first understand what it does. The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space (N being the number of features) that distinctly classifies the data points. It is highly preferred by many as it produces significant accuracy with less computation power. SVM can be used for both regression and classification tasks, but it is most widely used for classification.

[Figure: several possible separating hyperplanes compared with the maximum-margin hyperplane]

As seen in the first representation in the above figure, there could be multiple ways to choose a hyperplane to separate the data points. However, our objective is to identify the hyperplane with the maximum margin, i.e. the optimal hyperplane is the one for which the distance to the nearest data points of both classes is maximum when compared to other hyperplanes.

How does it work?

Let’s understand the functionality of SVM using different scenarios:

Scenario 1 – As is pretty obvious from the below figure, there are three hyperplanes that could classify the data points; however, hyperplane B separates the two classes best.

[Figure: Scenario 1 – three candidate hyperplanes A, B and C]

Scenario 2 – As we can see in the below figure, all three hyperplanes classify the data points correctly. However, A and B have very little margin for error, i.e. they are very close to the data points; if there is any fluctuation in the data point values, points may end up misclassified. Hyperplane C, on the other hand, has a significant margin for error, as it has the maximum distance from the data points of both classes.

[Figure: Scenario 2 – hyperplanes A, B and C all separate the classes; C has the largest margin]

Scenario 3 – In the below scenario, we can see that there is one data point which can be considered an outlier. Hyperplane B has a larger margin for error; however, SVM still chooses hyperplane A, because it first identifies the hyperplane which classifies the data points correctly and only then compares the margins.

[Figure: Scenario 3 – hyperplanes A and B with one outlying data point]

Scenario 4 – I mentioned in the previous scenario that SVM first tries to identify the hyperplane which classifies the data correctly and only then thinks about the margin. However, in the below example we cannot classify the data points with a straight line while keeping every point on the correct side. Hence, SVM ignores the outlier and selects the hyperplane which classifies the remaining data points with the maximum margin.

[Figure: Scenario 4 – data with an outlier that no straight line can classify perfectly]

Scenario 5 – We have seen only linear classification with SVM so far; however, in the below example it is not possible to have a linear hyperplane that can separate the two classes.

[Figure: Scenario 5 – classes arranged in concentric circles on the X-Y axes]

SVM solves this problem for us by adding another feature to the dataset:

Z = X^2 + Y^2

and then plotting the data points on the X-Z axes instead of the X-Y axes.

[Figure: the same data plotted on the X-Z axes, now linearly separable]

Now that we have a completely different picture of the dataset, we can easily identify the hyperplane on the X-Z axes.

There are a few things that we can notice in the above figure:

  • The values on the Z-axis will always be positive, as Z is the sum of the squares of the X and Y values.
  • In the actual dataset, the circles (black) lie closer to the origin of the X-Y plane while the squares (blue) are comparatively far from it. Hence, the circles will always be lower on the Z-axis and the squares above them.
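To make this concrete, here is a minimal sketch of the same trick in Python (the concentric-circles data from sklearn's make_circles is a stand-in for the dataset in the figure):

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric classes: not linearly separable in the X-Y plane
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=42)

# Add the engineered feature Z = X^2 + Y^2
Z = (X ** 2).sum(axis=1).reshape(-1, 1)
X_with_z = np.hstack([X, Z])

# A linear SVM struggles in 2-D but separates the classes easily once Z is added
print(SVC(kernel="linear").fit(X, y).score(X, y))
print(SVC(kernel="linear").fit(X_with_z, y).score(X_with_z, y))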

Data Preparation for SVM

This section lists some suggestions for how to best prepare your training data when learning an SVM model.

  • Numerical Inputs: SVM assumes that the inputs are numeric. If we have categorical inputs, we may need to convert them to binary dummy variables, one variable for each category (see the sketch after this list).

  • Binary Classification: Basic SVM, as described in this post, is intended for binary (two-class) classification problems, although extensions have been developed for regression and multi-class classification.
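As a quick illustration of the first point, categorical inputs can be converted to dummy variables with pandas (the column names here are hypothetical):

import pandas as pd

# Hypothetical dataset with one categorical and one numeric input
df = pd.DataFrame({"color": ["red", "green", "blue", "green"],
                   "size": [1.2, 0.8, 1.5, 1.1]})

# One binary dummy variable per category
df_encoded = pd.get_dummies(df, columns=["color"])
print(df_encoded.columns.tolist())
# ['size', 'color_blue', 'color_green', 'color_red']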

SVM Tuning Parameters

In real-world applications, finding the optimal hyperplane for millions of training samples can take a lot of time. There are some parameters which can be tuned while using this algorithm to arrive at a better accuracy.

Kernel

The learning of the hyperplane in linear SVM is done by transforming the problem using some linear algebra, and this is where the kernel plays its role.

For the linear kernel, the prediction for a new input x is calculated using the dot product between the input (x) and each support vector (xi) as follows:

f(x) = B0 + sum(ai * (x · xi))

This is an equation that involves calculating the inner product of a new input vector (x) with all support vectors in the training data. The coefficients B0 and ai (one for each training example) must be estimated from the training data by the learning algorithm.

The polynomial kernel can be written as K(x, xi) = (1 + sum(x * xi))^d, and the exponential (radial basis function) kernel as K(x, xi) = exp(-gamma * sum((x - xi)^2)). [Source for this excerpt: https://machinelearningmastery.com/]
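To make these formulas concrete, here is a small sketch computing each kernel by hand for two vectors (the values of gamma and d are arbitrary):

import numpy as np

x = np.array([1.0, 2.0])
xi = np.array([0.5, -1.0])
gamma, d = 0.1, 2

linear = np.dot(x, xi)                        # x · xi
poly = (1 + np.dot(x, xi)) ** d               # (1 + sum(x * xi))^d
rbf = np.exp(-gamma * np.sum((x - xi) ** 2))  # exp(-gamma * sum((x - xi)^2))
print(linear, poly, rbf)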

Regularization

The regularization parameter (often termed the C parameter in Python's sklearn library) tells the SVM optimization how much you want to avoid misclassifying each training example.

For large values of C, the optimization will choose a smaller-margin hyperplane if that hyperplane does a better job of getting all the training points classified correctly. Conversely, a very small value of C will cause the optimizer to look for a larger-margin separating hyperplane, even if that hyperplane misclassifies more points.
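One rough way to see this trade-off (with purely illustrative values of C) is to watch how many support vectors a linear SVM keeps as C changes:

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two slightly overlapping classes
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

for C in (0.01, 1.0, 100.0):
    model = SVC(kernel="linear", C=C).fit(X, y)
    # A small C tolerates misclassifications and keeps a wider margin,
    # so more points end up inside the margin as support vectors
    print(f"C={C}: {model.n_support_.sum()} support vectors")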

Gamma

The gamma parameter defines how far the influence of a single training example reaches, with low values meaning 'far' and high values meaning 'close'. In other words, with low gamma, points far away from the plausible separation line are considered in the calculation for the separation line, whereas with high gamma only the points close to the plausible line are considered.
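We can see this effect empirically with an RBF kernel (the gamma values below are only illustrative):

from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.25, random_state=1)

for gamma in (0.01, 1.0, 100.0):
    # Very low gamma lets far-away points influence the boundary (underfitting);
    # very high gamma makes the influence extremely local (overfitting)
    scores = cross_val_score(SVC(kernel="rbf", gamma=gamma), X, y, cv=5)
    print(f"gamma={gamma}: mean CV accuracy = {scores.mean():.3f}")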

Margin

And finally, the last but very important characteristic of the SVM classifier: at its core, SVM tries to achieve a good margin. A good margin is one where the separation is large for both of the classes, allowing the points to stay in their respective classes without crossing over to the other class.

Implementing SVM using Python

In Python, the SVM algorithm is available through the sklearn (scikit-learn) library.

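A minimal end-to-end sketch might look like the following (the Iris dataset and the parameter values are just for illustration):

from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load a small benchmark dataset
X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Scale the features: SVM is sensitive to feature magnitudes
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Train an RBF-kernel SVM with the tuning parameters discussed above
model = SVC(kernel="rbf", C=1.0, gamma="scale")
model.fit(X_train, y_train)

print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))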


Suggested readings:

  • Linear regression (https://www.dhirubhai.net/pulse/linear-regression-gautam-k/)
  • Classification (https://www.dhirubhai.net/pulse/classification-data-science-gautam-k/)

 

