SVM (Support Vector Machine)
An internet search for the top Machine Learning algorithms will list SVM in the top five (if not the top three) of most listings. One reason is that SVM can be used as a classification as well as a regression algorithm, and it handles both linear and non-linear problems.
What makes it so powerful is the flexibility of its tuning parameters, such as the Kernel, Cost and Gamma. Tuning them helps in overcoming the over-fitting problem and makes the model robust to outliers.
Introduction:
In the SVM algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. We then perform classification by finding the hyper-plane that best separates the classes. As the first diagram shows, all three lines divide the two categories well, but (as shown in the second diagram) we choose the line that gives the maximum margin. The distance between the positive and negative hyperplanes is called the margin, and SVM selects the separator that maximizes it.
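As a minimal sketch of this idea (assuming Python with scikit-learn, whose SVC class exposes the Kernel, Cost and Gamma parameters discussed below), the snippet fits a linear SVM to a toy 2-D dataset and reports the support vectors that sit on the margin:

```python
# Minimal sketch: fit a linear SVM on a toy 2-D dataset (assumes scikit-learn).
import numpy as np
from sklearn.svm import SVC

# Two small clusters, one per class.
X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [6, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# Only the support vectors (points on the margin boundaries)
# determine the separating hyperplane.
print("Support vectors:\n", clf.support_vectors_)

# For a linear kernel, the margin width is 2 / ||w||.
w = clf.coef_[0]
print("Margin width:", 2 / np.linalg.norm(w))
```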
Detailing:
- Underlying Model - To use SVM for regression, we use SVR (Support Vector Regression) instead of the SVM classifier, as sketched after this list.
- Tuning - There are primarily three parameters for model tuning: Kernel, Cost and Gamma.
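A rough illustration of the regression variant (again assuming scikit-learn, where the class is called SVR) on a noisy sine curve:

```python
# Sketch: SVR, the regression counterpart of SVC (assumes scikit-learn).
import numpy as np
from sklearn.svm import SVR

# A noisy sine curve as a toy regression target.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, 40)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 40)

reg = SVR(kernel="rbf", C=1.0, gamma="scale")
reg.fit(X, y)
print(reg.predict([[2.5]]))  # should be close to sin(2.5) ≈ 0.6
```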
2.1 Kernel Function:
It gives the option to use a non-linear decision boundary. The kernel options available for tuning include "linear", "poly" and "rbf"; "rbf" and "poly" are useful for a non-linear hyper-plane, and the default value is "rbf". The figures below clearly demarcate the difference between linear and non-linear kernels.
The kernel helps to transform the data into a higher dimension, for example from 1D to 2D or from 2D to 3D. The two diagrams below show these scenarios. If the data is in, say, n dimensions, the SVM lifts it into a higher dimension (n+1 or more) where a separating hyper-plane can be found.
It is tough to visualize higher-dimensional graphs; the figure below attempts this to some extent.
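To see the kernel choice in action, here is a small comparison (using scikit-learn's make_circles as an illustrative toy dataset; it generates two concentric rings that no straight line can separate):

```python
# Sketch: linear vs. RBF kernel on non-linearly separable data
# (assumes scikit-learn).
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not separable by any straight line.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X, y)
    print(kernel, "training accuracy:", clf.score(X, y))
# The linear kernel struggles; the RBF kernel separates the rings easily.
```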
2.2 Gamma Parameter:
It defines how far the influence of a single training example reaches.
2.2.1 High Gamma: The higher the value of Gamma, the closer the reach, implying that only the points close to the demarcation line influence it. With a high gamma, the model tries to fit the training data set exactly, which increases generalization error and causes over-fitting.
2.2.2 Low Gamma: The lower the value of Gamma, the farther the reach, implying that far-away points also influence the line. As the figure suggests, the pull of distant points makes the boundary smoother and closer to linear.
The comparison below gives a good gist of the implications of varying the Gamma parameter.
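A quick numeric comparison (assuming scikit-learn's RBF SVC, with make_moons as an illustrative noisy toy dataset) shows the over-fitting pattern described above:

```python
# Sketch: effect of gamma on train vs. test accuracy (assumes scikit-learn).
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for gamma in (0.1, 1, 100):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X_tr, y_tr)
    print(f"gamma={gamma}: train={clf.score(X_tr, y_tr):.2f}, "
          f"test={clf.score(X_te, y_te):.2f}")
# A very high gamma fits the training set almost perfectly while the
# test score drops -- the over-fitting described above.
```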
2.3 Cost Parameter 'C':
This parameter trades off getting every training point classified correctly against getting the points it does classify 'very' right (i.e., with a wide margin).
- With a high C, the demarcation hyperplane tries to classify every training point correctly, even at the cost of a narrow margin.
- With a medium C, the aim is a larger separation between the classes, even if training accuracy decreases a bit.
- With a low C, the priority is maximizing the margin.
The Cost parameter C is used for error control: it governs the trade-off between a smooth decision boundary and classifying the training points correctly.
A low C makes the decision surface smooth, while a high C aims at classifying all training examples correctly by giving the model freedom to select more samples as support vectors.
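The same toy setup (make_moons data again, assuming scikit-learn) makes the trade-off visible:

```python
# Sketch: effect of C on the strictness of the decision boundary
# (assumes scikit-learn).
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)

for C in (0.01, 1, 100):
    clf = SVC(kernel="rbf", C=C).fit(X, y)
    print(f"C={C}: support vectors={clf.n_support_.sum()}, "
          f"train accuracy={clf.score(X, y):.2f}")
# As C grows, training accuracy climbs because the boundary bends to
# fit every point; the support-vector counts show the margin changing.
```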
These model tuning parameters make SVM a very powerful Machine Learning algorithm.
Steps to Implement an SVM:
1. Split the data into training and test datasets
2. Fit the SVM to the training dataset
3. Predict on the test dataset
4. Evaluate the model
5. Perform model tuning based on the above parameters (all five steps are sketched in code below)
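A sketch of the five steps end to end (assuming scikit-learn and its built-in iris dataset as a stand-in for real data; the grid of tuning values is illustrative, not prescriptive):

```python
# Sketch: the five implementation steps (assumes scikit-learn).
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# 1. Split the data into training and test sets.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# 2. Fit an SVM to the training set.
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)

# 3. Predict on the test set.
y_pred = clf.predict(X_te)

# 4. Evaluate the model.
print("Test accuracy:", accuracy_score(y_te, y_pred))

# 5. Tune kernel, C and gamma via cross-validated grid search.
grid = GridSearchCV(SVC(), {"kernel": ["linear", "rbf"],
                            "C": [0.1, 1, 10],
                            "gamma": ["scale", 0.1, 1]})
grid.fit(X_tr, y_tr)
print("Best parameters:", grid.best_params_)
```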
To implement SVM in R, use the e1071 package.
Applications of SVM:
1. Spam Classification
2. Text Categorization
3. Handwriting Recognition
4. Image Classification
5. Speaker Identification
6. Face Detection
7. Bioinformatics
Disadvantages:
- In SVM, we do not get a view of each feature's weight or its individual impact on the model, so the feature influence is a black box
- Since the final model is not easy to interpret, we cannot make small calibrations to it, making it tough to incorporate business logic. This issue is aggravated because we cannot see the impact of the weights
- SVM also does not perform well when the data set has a lot of noise