SVM Parameter Optimization with Python: A Step-by-Step Guide
Usama Zafar
PhD Aspirant | Volunteer Teacher @ iCodeGuru | Software & Machine Learning Engineer
Support Vector Machines (SVM) are widely used in machine learning for classification and regression tasks. However, the performance of an SVM model depends heavily on its parameter settings, such as the kernel type, the penalty parameter C, and the kernel coefficient gamma. Therefore, optimizing these parameters is critical for achieving better accuracy and generalization. In this article, we will discuss the importance of parameter optimization, the different ways to optimize SVM models, and how to implement them in Python.
Why We Need Parameter Optimization:
Training an SVM means finding the hyperplane that maximizes the margin between the two classes while minimizing the classification error. Hyperparameters such as the kernel type, C, and gamma are not learned during training; they are set beforehand and control how that hyperplane is found. Selecting them well is challenging and usually requires systematic trial and error. The default settings may not be optimal for a given dataset, which can lead to poor performance, overfitting, or underfitting. Parameter optimization fine-tunes the SVM model to improve its predictive power.
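To make the dependence on hyperparameters concrete, here is a minimal sketch that cross-validates an RBF-kernel SVM under three settings. The specific values of C and gamma are hypothetical, chosen only to illustrate the spread, and the feature scaling step is an assumption added here because SVMs are sensitive to feature scale.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Hypothetical settings, picked only to show how much the score can move.
settings = [
    {"C": 1.0, "gamma": "scale"},    # scikit-learn defaults for the RBF kernel
    {"C": 1.0, "gamma": 10.0},       # very large gamma
    {"C": 0.001, "gamma": "scale"},  # very small C
]

results = {}
for params in settings:
    # Scaling inside the pipeline is refit on each training fold.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", **params))
    scores = cross_val_score(model, X, y, cv=5)
    results[(params["C"], params["gamma"])] = scores.mean()
    print(params, "mean CV accuracy: %.3f" % scores.mean())
```

On this dataset the two extreme settings would be expected to score noticeably lower than the defaults, which is the gap that parameter optimization tries to close.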
Importance of Optimization:
Optimizing the SVM model's parameters has several benefits, including better accuracy and better generalization to unseen data.
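How Many Ways to Optimize an SVM Model?
There are several ways to optimize an SVM model, including grid search, random search, Bayesian optimization, and genetic algorithms. Each is shown in turn.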
Grid Search:
Grid search evaluates every combination of the listed parameter values using cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target
# Define parameter grid (gamma is ignored by the linear kernel)
param_grid = {'C': [0.1, 1, 10, 100],
              'gamma': [0.01, 0.1, 1, 10],
              'kernel': ['rbf', 'linear', 'poly']}
# Perform grid search with 5-fold cross-validation
svc = SVC()
grid_search = GridSearchCV(svc, param_grid, cv=5)
grid_search.fit(X, y)
# Print best parameters and mean cross-validated accuracy
print('Best parameters:', grid_search.best_params_)
print('Best accuracy:', grid_search.best_score_)
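The grid above also searches gamma for the linear kernel, which ignores it, so some fits are redundant. One possible refinement, sketched here as an assumption rather than as part of the original example, is to pass `GridSearchCV` a list of grids (one per kernel) and to scale the features inside a `Pipeline`. The step names `scale` and `svc` are arbitrary labels chosen for this sketch.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Scaling lives inside the pipeline so it is refit on each training fold.
pipeline = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# One grid per kernel: gamma is only searched where it has an effect.
param_grid = [
    {"svc__kernel": ["linear"], "svc__C": [0.1, 1, 10, 100]},
    {"svc__kernel": ["rbf"], "svc__C": [0.1, 1, 10, 100],
     "svc__gamma": [0.01, 0.1, 1, 10]},
]

grid_search = GridSearchCV(pipeline, param_grid, cv=5)
grid_search.fit(X, y)

print("Candidates evaluated:", len(grid_search.cv_results_["params"]))
print("Best parameters:", grid_search.best_params_)
print("Best mean CV accuracy: %.3f" % grid_search.best_score_)
```

Parameters of a pipeline step are addressed as `<step name>__<parameter>`, which is why the keys carry the `svc__` prefix.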
Random Search:
Random search samples a fixed number of parameter combinations from the given distributions instead of trying every combination.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC
from scipy.stats import uniform
# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target
# Define parameter distributions
# Note: uniform(loc, scale) samples from [loc, loc + scale]
param_dist = {'C': uniform(0.1, 100),
              'gamma': uniform(0.01, 10),
              'kernel': ['rbf', 'linear', 'poly']}
# Perform random search: 50 sampled candidates, 5-fold cross-validation
svc = SVC()
random_search = RandomizedSearchCV(svc, param_distributions=param_dist, cv=5, n_iter=50)
random_search.fit(X, y)
# Print best parameters and mean cross-validated accuracy
print('Best parameters:', random_search.best_params_)
print('Best accuracy:', random_search.best_score_)
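Because C and gamma span several orders of magnitude, a uniform distribution spends most of its samples near the upper end of the range. A common alternative, sketched here as an assumption and not as the article's own code, is to sample both on a log scale with `scipy.stats.loguniform`. The scaling pipeline, the restriction to the RBF kernel, and `n_iter=20` are illustrative choices.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

pipeline = Pipeline([("scale", StandardScaler()), ("svc", SVC(kernel="rbf"))])

# loguniform(a, b) draws values in [a, b] so that each decade is equally likely.
param_dist = {
    "svc__C": loguniform(0.1, 100),
    "svc__gamma": loguniform(0.01, 10),
}

random_search = RandomizedSearchCV(
    pipeline,
    param_distributions=param_dist,
    n_iter=20,       # illustrative budget
    cv=5,
    random_state=0,  # fixed seed so the sampled candidates are reproducible
)
random_search.fit(X, y)

print("Best parameters:", random_search.best_params_)
print("Best mean CV accuracy: %.3f" % random_search.best_score_)
```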
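Bayesian Optimization:
Bayesian optimization fits a surrogate model to the scores seen so far and uses it to choose the next parameters to try.
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC
from skopt import gp_minimize
from skopt.space import Real, Categorical
from skopt.utils import use_named_args
from sklearn.model_selection import cross_val_score
# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target
# Define parameter space
param_space = [Real(0.1, 100, 'log-uniform', name='C'),
               Real(0.01, 10, 'log-uniform', name='gamma'),
               Categorical(['rbf', 'linear', 'poly'], name='kernel')]
# Define objective function
# NOTE: the original listing is cut off after "svc ="; everything from here on
# is an assumed reconstruction based on the imports above.
@use_named_args(param_space)
def objective(**params):
    svc = SVC(**params)
    scores = cross_val_score(svc, X, y, cv=5)
    # gp_minimize minimizes, so return the negative mean accuracy
    return -scores.mean()
# Perform Bayesian optimization (n_calls=50 is an assumed budget)
result = gp_minimize(objective, param_space, n_calls=50, random_state=0)
# Print best parameters and accuracy
print('Best parameters:', dict(zip(['C', 'gamma', 'kernel'], result.x)))
print('Best accuracy:', -result.fun)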
Genetic Algorithm:
A genetic algorithm evolves a population of candidate parameter sets, selecting, recombining, and mutating those with the best cross-validated accuracy.
import random
import numpy as np
from deap import algorithms, base, creator, tools
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target
# Define fitness function: mean 5-fold accuracy (DEAP expects a tuple)
def evaluate(individual):
    svc = SVC(C=individual[0], gamma=individual[1], kernel=individual[2])
    scores = cross_val_score(svc, X, y, cv=5)
    return scores.mean(),
# Define genetic operators
def create_individual():
    # C in [0.1, 100] and gamma in [0.01, 10], both sampled on a log scale
    return [10 ** random.uniform(-1, 2), 10 ** random.uniform(-2, 1), random.choice(['rbf', 'linear', 'poly'])]
def mutate(individual):
    # Resample one randomly chosen gene (0: C, 1: gamma, 2: kernel)
    index = random.randint(0, 2)
    if index == 0:
        individual[index] = 10 ** random.uniform(-1, 2)
    elif index == 1:
        individual[index] = 10 ** random.uniform(-2, 1)
    else:
        individual[index] = random.choice(['rbf', 'linear', 'poly'])
    return individual,
# Define genetic algorithm
creator.create('FitnessMax', base.Fitness, weights=(1.0,))
creator.create('Individual', list, fitness=creator.FitnessMax)
toolbox = base.Toolbox()
toolbox.register('individual', tools.initIterate, creator.Individual, create_individual)
toolbox.register('population', tools.initRepeat, list, toolbox.individual)
toolbox.register('evaluate', evaluate)
toolbox.register('mate', tools.cxTwoPoint)
toolbox.register('mutate', mutate)
toolbox.register('select', tools.selTournament, tournsize=3)
pop = toolbox.population(n=50)
hof = tools.HallOfFame(1)
# Collect per-generation statistics over the fitness values
stats = tools.Statistics(lambda ind: ind.fitness.values)
stats.register('mean', np.mean)
stats.register('std', np.std)
stats.register('min', np.min)
stats.register('max', np.max)
pop, log = algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2, ngen=50, stats=stats, halloffame=hof, verbose=True)
# Print best parameters and accuracy
best_individual = hof[0]
print('Best parameters:', {'C': best_individual[0], 'gamma': best_individual[1], 'kernel': best_individual[2]})
svc = SVC(C=best_individual[0], gamma=best_individual[1], kernel=best_individual[2])
scores = cross_val_score(svc, X, y, cv=5)
print('Best accuracy:', scores.mean())
These are some of the most popular techniques used for SVM parameter optimization. Depending on the problem and the size of the parameter space, one of these methods may be more suitable than the others. It's always a good idea to try multiple techniques and compare their results to find the best hyperparameters for your model.
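One way to run such a comparison, sketched under stated assumptions, is to hold out a test set, run each search on the training portion only, and score the refitted best model on the held-out data. The 75/25 split, the scaling pipeline, the RBF-only search space, and the search budgets are illustrative choices, not taken from the examples above.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import (GridSearchCV, RandomizedSearchCV,
                                     train_test_split)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
# Hold out data that no search ever sees during tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

def make_model():
    # Fresh pipeline per search so the searches do not share state.
    return Pipeline([("scale", StandardScaler()), ("svc", SVC(kernel="rbf"))])

searches = {
    "grid": GridSearchCV(
        make_model(),
        {"svc__C": [0.1, 1, 10, 100], "svc__gamma": [0.01, 0.1, 1, 10]},
        cv=5),
    "random": RandomizedSearchCV(
        make_model(),
        {"svc__C": loguniform(0.1, 100), "svc__gamma": loguniform(0.01, 10)},
        n_iter=16, cv=5, random_state=0),
}

test_scores = {}
for name, search in searches.items():
    search.fit(X_train, y_train)
    # score() uses the best estimator refit on the full training set.
    test_scores[name] = search.score(X_test, y_test)
    print(name, search.best_params_,
          "CV: %.3f" % search.best_score_,
          "test: %.3f" % test_scores[name])
```

Comparing held-out scores rather than the cross-validation scores used for tuning gives a less optimistic estimate of how each tuned model generalizes.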