Understanding Support Vector Machine
Support Vector Machine: An Introduction
I have talked about Linear Regression and Classification in my prior articles. Before we dive into the Support Vector Machine (aka SVM), let’s first understand what it does. The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space (N being the number of features) that distinctly classifies the data points. It is highly preferred by many as it produces significant accuracy with less computation power. SVM can be used for both regression and classification tasks, but it is most widely used for classification.
As seen in the first representation in the above figure, there could be multiple ways to choose a hyperplane that separates the data points. However, our objective is to identify the hyperplane with the maximum margin, i.e. the one whose distance to the nearest data points of both classes is largest compared to the other hyperplanes.
How does it work?
Let’s understand the functionality of SVM using different scenarios:
Scenario 1 – From the figure below, it is clear that there are three hyperplanes that could classify the data points; however, hyperplane B separates the two classes best.
Scenario 2 – As we can see in the figure below, all three hyperplanes classify the data correctly. However, A and B have very small margins, i.e. they are very close to the data points. If the data point values fluctuate even slightly, these hyperplanes may misclassify them. Hyperplane C, on the other hand, has a significant margin, as it has the maximum distance from the data points of both classes.
Scenario 3 – In the scenario below, we can see one data point that can be considered an outlier. Hyperplane B has a larger margin; however, SVM still chooses hyperplane A, because it first identifies the hyperplane that classifies the data points correctly and only then compares the margins.
Scenario 4 – I mentioned in the previous scenario that SVM first tries to identify the hyperplane that classifies the data correctly and then considers the margin. However, in the example below we cannot classify the data points with a straight line. Hence, SVM ignores the outlier and selects the hyperplane that classifies the remaining data points with the maximum margin.
Scenario 5 – So far we have seen only linear classification with SVM; however, in the example below it is not possible to find a linear hyperplane that can separate the data points.
SVM solves this problem easily for us by adding another feature to the dataset:
Z = X^2 + Y^2
and plotting the data points on the X-Z axis instead of the X-Y axis. Now that we have a completely different picture of the dataset, we can easily identify a separating hyperplane on the X-Z axis; a small code sketch of this transformation follows the list below.
There are a few things we can notice in the above figure:
- The values on the Z-axis will always be positive, as Z is the sum of the squares of the X and Y values.
- In the original dataset the circles (black) are closer to the origin of the X axis, while the squares (blue) are comparatively far from it. Hence, on the Z-axis the circles will always lie closer to the origin, with the squares above them.
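To make this concrete, here is a minimal sketch of the Z = X^2 + Y^2 transformation on made-up data: circles generated near the origin, squares on an outer ring, so a single threshold on Z separates the two classes.

```python
import numpy as np

rng = np.random.default_rng(42)

# "Circles": points within a small radius of the origin.
theta = rng.uniform(0, 2 * np.pi, 50)
r_inner = rng.uniform(0, 2, 50)
circles = np.c_[r_inner * np.cos(theta), r_inner * np.sin(theta)]

# "Squares": points on an outer ring, farther from the origin.
theta = rng.uniform(0, 2 * np.pi, 50)
r_outer = rng.uniform(4, 6, 50)
squares = np.c_[r_outer * np.cos(theta), r_outer * np.sin(theta)]

X = np.vstack([circles, squares])
y = np.array([0] * 50 + [1] * 50)

# The new feature: Z = X^2 + Y^2 (squared distance from the origin).
Z = X[:, 0] ** 2 + X[:, 1] ** 2

# In the X-Z view the classes become linearly separable: circles have
# Z <= 4 and squares have Z >= 16, so any threshold in between works.
print(Z[y == 0].max(), Z[y == 1].min())  # e.g. ~4.0 vs ~16.0
```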
Data Preparation for SVM
This section lists some suggestions for how to best prepare your training data when learning an SVM model.
· Numerical Inputs: SVM assumes that the inputs are numeric. If we have categorical inputs, we may need to convert them to binary dummy variables (one variable for each category), as in the sketch after this list.
· Binary Classification: The basic SVM described in this post is intended for binary (two-class) classification problems, although extensions have been developed for regression and multi-class classification.
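As a sketch of the first point, categorical inputs can be converted to binary dummy variables with pandas; the column names below are hypothetical.

```python
import pandas as pd

# Hypothetical data with one categorical and one numeric column.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"],
                   "size": [1.0, 2.5, 0.7, 1.8]})

# One binary column per category; SVM can then treat them as numeric.
X = pd.get_dummies(df, columns=["color"])
print(X)
```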
SVM Tuning Parameters
In real-world applications, finding the perfect separating hyperplane for millions of training examples takes a lot of time. There are some parameters that can be tuned while using this algorithm to achieve better accuracy.
Kernel
The learning of the hyperplane in linear SVM is done by transforming the problem using some linear algebra. This is where the kernel plays its role.
For the linear kernel, the prediction for a new input is calculated using the dot product between the input (x) and each support vector (xi) as follows:
f(x) = B0 + sum(ai * (x · xi))
This equation involves calculating the inner products of a new input vector (x) with all support vectors in the training data. The coefficients B0 and ai (one for each support vector) must be estimated from the training data by the learning algorithm.
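Here is a small sketch of this prediction equation in NumPy. The support vectors, the coefficients ai, and the intercept B0 below are made-up values; in practice the learning algorithm estimates them from the training data.

```python
import numpy as np

support_vectors = np.array([[1.0, 2.0], [3.0, 1.0]])  # xi (made up)
a = np.array([0.5, -0.3])                             # ai per support vector (made up)
b0 = 0.1                                              # intercept B0 (made up)

def predict(x):
    # f(x) = B0 + sum(ai * (x . xi)); the sign of f(x) gives the class.
    return b0 + np.sum(a * (support_vectors @ x))

print(predict(np.array([2.0, 2.0])))  # positive -> one class, negative -> the other
```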
The polynomial kernel can be written as K(x, xi) = (1 + sum(x * xi))^d, and the exponential (RBF) kernel as K(x, xi) = exp(-gamma * sum((x − xi)^2)). [Source for this excerpt: https://machinelearningmastery.com/]
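In sklearn these kernels are selected through the kernel argument of SVC. A sketch on toy data follows; the dataset and parameter values are illustrative, not from the article.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic data, only for illustration.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

for kernel in ("linear", "poly", "rbf"):
    # degree applies only to 'poly'; gamma only to 'poly' and 'rbf'.
    clf = SVC(kernel=kernel, degree=3, gamma="scale")
    clf.fit(X, y)
    print(kernel, clf.score(X, y))
```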
Regularization
The regularization parameter (often referred to as the C parameter in Python’s sklearn library) tells the SVM optimization how much you want to avoid misclassifying each training example.
For large values of C, the optimization will choose a smaller-margin hyperplane if that hyperplane does a better job of getting all the training points classified correctly. Conversely, a very small value of C will cause the optimizer to look for a larger-margin separating hyperplane, even if that hyperplane misclassifies more points.
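A sketch of this trade-off on synthetic data; the C values are chosen only for illustration. Larger C typically yields fewer support vectors, a sign of a tighter margin.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic, slightly overlapping classes (illustrative only).
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           class_sep=0.8, random_state=1)

for C in (0.01, 1, 100):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Fewer support vectors usually indicates a smaller margin.
    print(f"C={C}: support vectors={clf.n_support_.sum()}, "
          f"train accuracy={clf.score(X, y):.2f}")
```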
Gamma
The gamma parameter defines how far the influence of a single training example reaches, with low values meaning ‘far’ and high values meaning ‘close’. In other words, with low gamma, points far away from the plausible separation line are considered in the calculation of the separation line, whereas with high gamma only the points close to the plausible line are considered.
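A sketch of gamma’s reach with the RBF kernel on synthetic data; the gamma values are chosen only for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-moons: a classic non-linear toy problem.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

for gamma in (0.1, 1, 100):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)
    # With very high gamma each point only influences its immediate
    # neighborhood, so nearly every point becomes a support vector
    # and the model tends to overfit.
    print(f"gamma={gamma}: support vectors={clf.n_support_.sum()}, "
          f"train accuracy={clf.score(X, y):.2f}")
```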
Margin
And finally, a last but very important characteristic of the SVM classifier: at its core, SVM tries to achieve a good margin. A good margin is one where the separation is large for both classes. The images below give a visual example of a good and a bad margin. A good margin allows the points to stay in their respective classes without crossing over to the other class.
Implementing SVM using Python
In Python, the SVM algorithm is available through the sklearn (scikit-learn) library, as in the sketch below.
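A minimal end-to-end sketch, assuming the built-in iris dataset and illustrative parameter choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Scaling matters for SVM, since it works with distances and dot products.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Kernel, C, and gamma chosen for illustration; tune them for real data.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

print("Test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```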
Suggested readings:
- Linear regression (https://www.dhirubhai.net/pulse/linear-regression-gautam-k/)
- Classification (https://www.dhirubhai.net/pulse/classification-data-science-gautam-k/)