Support vector machine classifier with regularization
Jakub Polec
20+ yrs in Tech & Finance & Quant | ex-Microsoft/Oracle/CERN | IT / Cloud Architecture Leader | AI/ML Data Scientist | SaaS & Fintech
The SVM is a powerful, yet intuitive machine learning model that excels in binary classification tasks. It operates on a simple principle: find the best boundary that separates classes of data with the maximum margin. But what makes SVM stand out is its versatility — by incorporating different kernel functions, it can tackle both linear and nonlinear datasets.
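To make the kernel point concrete, here is a minimal sketch, separate from the main example below, that fits an RBF-kernel SVM on scikit-learn's make_moons toy dataset, which no straight line can separate. The dataset choice and parameters are illustrative assumptions, not part of the original example.
from sklearn.datasets import make_moons
from sklearn.svm import SVC
# A toy nonlinear dataset: two interleaving half-circles
X_moons, y_moons = make_moons(n_samples=200, noise=0.15, random_state=0)
# An RBF kernel lets the SVM learn a curved decision boundary
rbf_clf = SVC(kernel="rbf", C=1.0, gamma="scale")
rbf_clf.fit(X_moons, y_moons)
print(rbf_clf.score(X_moons, y_moons))  # training accuracy, typically close to 1.0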
Our focus here is on the linear SVM with L2 regularization, which brings an added twist: it not only strives for the best classification boundary but also balances model complexity and generalization to prevent overfitting. The 'hinge loss' component, a staple of SVM models, penalizes misclassified points as well as correctly classified points that fall inside the margin, nudging the boundary away from the data points to maintain a buffer zone, the 'margin'.
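As a rough sketch of what that objective looks like, the regularized hinge loss minimized by a linear SVM with an L2 penalty can be written in a few lines of NumPy. The helper below and its assumption of labels encoded as -1/+1 are illustrative, not code from the example that follows.
import numpy as np
def regularized_hinge_loss(w, b, X, y, C=1.0):
    # y is assumed to be encoded as -1/+1; a margin >= 1 incurs no penalty
    margins = y * (X @ w + b)
    hinge = np.maximum(0, 1 - margins)
    # L2 penalty on the weights plus C times the total hinge loss
    return 0.5 * np.dot(w, w) + C * np.sum(hinge)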
Basically:
L1 regularization (Lasso) adds the sum of the absolute values of the weights to the loss function, which encourages sparsity—meaning some weights can become exactly zero. This can be beneficial when you have many features but believe that only a few are actually important, as it helps with feature selection.
L2 regularization (Ridge) adds the sum of the squares of the weights to the loss function. It tends to spread the error among all the weights, shrinking them closer to zero but rarely to zero. This is helpful when most features have some influence on the output and you want to keep all of them in the model.
In essence, use L1 when you want to reduce the number of features, and L2 when you want to penalize large weights more severely without discarding features entirely.
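A quick way to see the difference is to fit LinearSVC with each penalty on a synthetic dataset and count the zeroed-out coefficients. The dataset parameters below are arbitrary choices for illustration; note that in LinearSVC the L1 penalty requires loss="squared_hinge" and dual=False.
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC
# Synthetic data: 20 features, only a handful actually informative
X_demo, y_demo = make_classification(n_samples=300, n_features=20,
                                     n_informative=4, random_state=0)
l1_svc = LinearSVC(penalty="l1", loss="squared_hinge", dual=False,
                   C=1.0, max_iter=10000).fit(X_demo, y_demo)
l2_svc = LinearSVC(penalty="l2", loss="squared_hinge", dual=False,
                   C=1.0, max_iter=10000).fit(X_demo, y_demo)
print("zero coefficients with L1:", (l1_svc.coef_ == 0).sum())  # typically several
print("zero coefficients with L2:", (l2_svc.coef_ == 0).sum())  # typically none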
Let's see the code with an L2 penalty and hinge loss, building the model as svm_clf:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
# We'll use the Iris dataset for demonstration
# SVMs are binary classifiers, so we need to filter the data to only use two classes
iris = datasets.load_iris()
X = iris["data"][:, (2, 3)] # we only take petal length and width for simplicity
y = (iris["target"] == 2).astype(np.float64) # Iris-Virginica
# Set the regularization strength C (smaller C means stronger regularization)
C = 0.5
# Create a LinearSVC model with L2 penalty and hinge loss
# (the hinge loss requires the dual formulation in LinearSVC)
svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=C, loss="hinge", penalty="l2", dual=True, max_iter=10000))
])
# Fit the model
svm_clf.fit(X, y)
# Get the fitted parameters (note: these live in the scaled feature space)
beta = svm_clf.named_steps["linear_svc"].coef_[0]
intercept = svm_clf.named_steps["linear_svc"].intercept_[0]
# Visualize the dataset and the decision boundary
plt.figure(figsize=(10, 6))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.bwr)
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()
# Create grid to evaluate model
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
# Evaluate the decision function through the pipeline so the scaler is applied to the grid
Z = svm_clf.decision_function(xy).reshape(XX.shape)
# Plot the decision boundary (decision function = 0) and the margins (= -1, +1)
contours = ax.contour(XX, YY, Z, levels=[-1, 0, 1], linestyles=["--", "-", "--"], colors="black")
plt.xlabel("Petal length")
plt.ylabel("Petal width")
plt.title("SVM Classifier with L2 Regularization and Hinge Loss")
plt.show()
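Once the pipeline is fitted, it can be used directly for predictions; the petal measurements below are made-up values purely for illustration:
# Classify two new flowers given (petal length, petal width) in cm
new_samples = np.array([[5.5, 1.7], [4.0, 1.2]])
print(svm_clf.predict(new_samples))            # 1.0 = Iris-Virginica, 0.0 = not Virginica
print(svm_clf.decision_function(new_samples))  # signed distance from the decision boundary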
And the output:
The visualization paints a clear picture: data points on one side belong to one class, and those on the other side to another. The separation is so distinct that even a layperson can appreciate the underlying patterns the SVM has unearthed.
To achieve this, we leveraged Python's rich ecosystem, particularly sklearn, to build and train our SVM model. The process was straightforward: scale the features, fit the model, and draw the boundary. Despite the simplicity in implementation, the outcome is profound — a testament to the power of combining robust algorithms with effective data scaling.