Image classification using SVM (92% accuracy): quick reference for beginners
Ashish Mohan
AI/ML Expert | Digital Transformation & Generative AI Specialist | Responsible AI & Ethics Advocate | Corporate Trainer & Fintech Educator | Empowering the Future of Tech in 2025
An SVM can be an efficient model for image classification, even though it is rarely used for image processing and modelling.
Tips for using SVM for image classification
- The image data must be 2D rather than 4D. scikit-learn's SVM expects an input of shape (n_samples, n_features), so each image has to be flattened into a single row, as shown later in this notebook.
- An SVM is a good choice when the dataset is small.
- With a large amount of image data, a CNN model is usually the better option.
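A minimal numpy sketch of the first tip, using a made-up batch of blank images (the array named images is hypothetical): a stack of height × width × channel images is 4D, and reshape(n, -1) flattens every image into one row so the result is 2D.

```python
import numpy as np

# Hypothetical batch: 4 RGB images of 100 x 100 pixels, shape (n, height, width, channels)
images = np.zeros((4, 100, 100, 3), dtype=np.uint8)

# SVC expects a 2D matrix of shape (n_samples, n_features), so flatten each image into one row
X_2d = images.reshape(len(images), -1)

print(images.ndim, images.shape)  # 4 (4, 100, 100, 3)
print(X_2d.ndim, X_2d.shape)      # 2 (4, 30000)
```

Each row holds 100 × 100 × 3 = 30000 pixel values, which is exactly the feature count the notebook ends up with.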
Dataset info
The dataset is 'Color Classification', created by Aydin Ayanzadeh. It contains images of different colors, each labeled with its color name (red, blue, etc.). Link: https://www.kaggle.com/ayanzadeh93/color-classification
Importing the dataset
In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All"
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session
Importing basic packages
In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import cv2
import os
from tqdm import tqdm
Next, we store the data directory in the DATADIR variable and the color labels in the CATEGORIES variable for later use. IMG_SIZE is the size every image will be resized to.
In [3]:
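DATADIR = '../input/color-classification/ColorClassification'
# One folder per color; a class's numeric label is its index in this list
CATEGORIES = ['orange', 'Violet', 'red', 'Blue', 'Green', 'Black', 'Brown', 'White']
# Every image is resized to IMG_SIZE x IMG_SIZE pixels
IMG_SIZE = 100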
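Because each label is simply the position of the color in CATEGORIES, the numeric labels in the model output can be translated back to color names with a lookup table. A small sketch; the dictionaries name_to_label and label_to_name are hypothetical helpers, not part of the original notebook.

```python
CATEGORIES = ['orange', 'Violet', 'red', 'Blue', 'Green', 'Black', 'Brown', 'White']

# Each class label is the folder's position in CATEGORIES (what CATEGORIES.index(category) returns)
name_to_label = {name: i for i, name in enumerate(CATEGORIES)}
label_to_name = dict(enumerate(CATEGORIES))

print(name_to_label['Green'])  # 4
print(label_to_name[7])        # White
```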
Displaying one sample image
In [4]:
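for category in CATEGORIES:
    path = os.path.join(DATADIR, category)
    for img in os.listdir(path):
        img_array = cv2.imread(os.path.join(path, img))
        # OpenCV loads images in BGR order; convert to RGB so matplotlib shows the true colors
        plt.imshow(cv2.cvtColor(img_array, cv2.COLOR_BGR2RGB))
        plt.show()
        break  # show only the first image in the folder
    break  # and only for the first category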
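cv2.imread returns pixels in BGR order while plt.imshow assumes RGB, so without a conversion a red sample is drawn blue, which matters for a color dataset. A minimal numpy sketch of the channel swap on a made-up 1 × 2 image; reversing the last axis has the same effect as cv2.cvtColor(img, cv2.COLOR_BGR2RGB).

```python
import numpy as np

# Hypothetical 1 x 2 image as OpenCV would load it: channel order is B, G, R
bgr = np.array([[[0, 0, 255],     # pure red in BGR order
                 [255, 0, 0]]],   # pure blue in BGR order
               dtype=np.uint8)

# Reversing the last axis swaps B and R, giving the RGB order matplotlib expects
rgb = bgr[..., ::-1]

print(rgb[0, 0].tolist())  # [255, 0, 0] -> red
print(rgb[0, 1].tolist())  # [0, 0, 255] -> blue
```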
Performing the preprocessing steps
In [5]:
training_data = []

def create_training_data():
    for category in CATEGORIES:
        path = os.path.join(DATADIR, category)
        class_num = CATEGORIES.index(category)  # numeric label for this color
        for img in os.listdir(path):
            try:
                img_array = cv2.imread(os.path.join(path, img))
                new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
                training_data.append([new_array, class_num])
            except Exception as e:
                # Skip files OpenCV cannot read (imread returns None and resize then raises)
                pass

create_training_data()
In [6]:
print(len(training_data))
107
Storing the number of images for later use
In [7]:
lenofimage = len(training_data)
To train on the images, we have to convert them into arrays the model can work with.
X must have shape (training_data_length, -1), i.e. one flattened row per image, because the SVM takes a 2D input to train.
In [8]:
X = []
y = []
for features, label in training_data:
    X.append(features)
    y.append(label)
# Stack into shape (n, 100, 100, 3), then flatten each image into one row of 30000 values
X = np.array(X).reshape(lenofimage, -1)
##X = tf.keras.utils.normalize(X, axis = 1)
In [9]:
X.shape
Out[9]:
(107, 30000)
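The 30000 columns are 100 × 100 × 3 pixel values per image. The same X and y can also be built without an explicit loop by unzipping the (image, label) pairs; a sketch on a hypothetical stand-in for training_data made of five blank images.

```python
import numpy as np

IMG_SIZE = 100

# Hypothetical stand-in for training_data: 5 (image, label) pairs of blank 100 x 100 x 3 images
training_data = [[np.zeros((IMG_SIZE, IMG_SIZE, 3), dtype=np.uint8), i % 8] for i in range(5)]

# Unzip the pairs into a tuple of images and a tuple of labels
images, labels = zip(*training_data)

# Stack into a 4D array, then flatten each image into a row of 100 * 100 * 3 = 30000 values
X = np.array(images).reshape(len(images), -1)
y = np.array(labels)

print(X.shape, y.shape)  # (5, 30000) (5,)
```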
Scaling the pixel values to the range [0, 1] (the images were already flattened in the previous step)
In [10]:
X = X/255.0
Example of one flattened, scaled image vector
In [11]:
X[1]
Out[11]:
array([1., 1., 1., ..., 1., 1., 1.])
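OpenCV images are 8-bit, so each value lies between 0 and 255; dividing by 255.0 maps them to floats in [0, 1], which keeps all features on the same scale (SVMs are sensitive to feature scale). A sketch on a made-up row of three pixel values.

```python
import numpy as np

# Hypothetical uint8 pixel row (8-bit images hold values in 0..255)
pixels = np.array([[0, 128, 255]], dtype=np.uint8)

# Dividing by 255.0 produces float64 values in [0, 1]
scaled = pixels / 255.0

print(scaled.dtype)                # float64
print(scaled.min(), scaled.max())  # 0.0 1.0
```

A value of 1. in the scaled vector therefore corresponds to an original pixel value of 255.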
Note: convert y to a NumPy array as well, so that it matches X (scikit-learn also accepts lists, but an array keeps both inputs consistent).
In [12]:
y=np.array(y)
In [13]:
y.shape
Out[13]:
(107,)
With the independent features (X) and the dependent labels (y) ready, it is time for data modelling.
Splitting the data into training and test sets with train_test_split
In [14]:
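from sklearn.model_selection import train_test_split

# Default test_size=0.25 holds out 27 of the 107 images; without random_state the split changes on every run
X_train, X_test, y_train, y_test = train_test_split(X, y)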
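The call above uses neither random_state nor stratify, so in this run class 0 (orange) ended up with no test images (support 0 in the report below). A sketch of a reproducible, stratified split on synthetic data with the notebook's sizes (107 samples, 8 classes); the feature values and labels are made up, and stratify/random_state are suggestions rather than what the original notebook does.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data with the notebook's sizes: 107 samples, 8 classes
rng = np.random.default_rng(0)
X = rng.random((107, 10))
y = np.arange(107) % 8

# Default test_size is 0.25, so ceil(0.25 * 107) = 27 samples go to the test set;
# stratify=y keeps class proportions similar in both parts, and random_state fixes the split
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

print(len(y_train), len(y_test))  # 80 27
print(sorted(set(y_test.tolist())) == list(range(8)))  # True
```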
Fitting the SVM model to our data
In [15]:
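from sklearn.svm import SVC

# Linear kernel; gamma only affects the 'rbf', 'poly' and 'sigmoid' kernels, so it is ignored here
svc = SVC(kernel='linear', gamma='auto')
svc.fit(X_train, y_train)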
Out[15]:
SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0, decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear', max_iter=-1, probability=False, random_state=None, shrinking=True, tol=0.001, verbose=False)
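Since gamma='auto' has no effect with a linear kernel, the hyperparameters worth tuning here are C and the kernel itself. A sketch of a cross-validated grid search with GridSearchCV; it runs on a synthetic dataset from make_classification standing in for the flattened images, and the grid values are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Hypothetical small dataset standing in for the flattened images
X, y = make_classification(n_samples=120, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)

# gamma is only searched for the 'rbf' kernel, where it actually matters
param_grid = [
    {'kernel': ['linear'], 'C': [0.1, 1, 10]},
    {'kernel': ['rbf'], 'C': [0.1, 1, 10], 'gamma': ['scale', 'auto']},
]

# 3-fold cross-validation over every combination in the grid
search = GridSearchCV(SVC(), param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

With only about 80 training images, cross-validation gives a steadier estimate than a single random split.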
Predicting the labels of X_test
In [16]:
y2 = svc.predict(X_test)
In [17]:
from sklearn.metrics import accuracy_score
print("Accuracy on unknown data is", accuracy_score(y_test, y2))
Accuracy on unknown data is 0.6666666666666666
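This run reaches about 66.7% accuracy (18 of the 27 test images). Because train_test_split is called without a random_state, the split, and therefore the score, changes from run to run, so a higher figure such as the 92% in the title may come from a different split.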
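Accuracy is just the fraction of test images whose predicted label equals the true label. As a check, it can be recomputed by hand from the 27 original/predicted pairs in the result table of this run, copied here as plain lists.

```python
# Labels copied from the result table of this run (27 test images)
y_test = [5, 7, 1, 6, 1, 2, 4, 1, 1, 3, 7, 2, 3, 2, 7, 2, 3, 1, 1, 3, 6, 6, 6, 7, 6, 2, 3]
y_pred = [1, 7, 1, 0, 1, 2, 4, 5, 1, 1, 7, 2, 3, 2, 7, 6, 3, 7, 1, 1, 6, 6, 6, 7, 6, 0, 5]

# Accuracy is the fraction of positions where prediction equals the true label
correct = sum(t == p for t, p in zip(y_test, y_pred))
accuracy = correct / len(y_test)

print(correct, len(y_test))  # 18 27
print(round(accuracy, 4))    # 0.6667
```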
Generating the classification report
In [18]:
from sklearn.metrics import classification_report
print("Classification report on unknown data:")
print(classification_report(y_test, y2))
Classification report on unknown data:
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         0
           1       0.57      0.67      0.62         6
           2       1.00      0.60      0.75         5
           3       1.00      0.40      0.57         5
           4       1.00      1.00      1.00         1
           5       0.00      0.00      0.00         1
           6       0.80      0.80      0.80         5
           7       0.80      1.00      0.89         4

    accuracy                           0.67        27
   macro avg       0.65      0.56      0.58        27
weighted avg       0.80      0.67      0.70        27

/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_classification.py:1272: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
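Each row of the report comes from counts of true positives (TP), false positives (FP) and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), and support is the number of true samples of that class. The hypothetical helper precision_recall below recomputes a few rows from the same 27 pairs of this run. Class 0 (orange) has support 0, which is what triggers the UndefinedMetricWarning; passing zero_division=0 to classification_report sets such values to 0 without the warning.

```python
# Same 27 test labels and predictions as in the result table of this run
y_test = [5, 7, 1, 6, 1, 2, 4, 1, 1, 3, 7, 2, 3, 2, 7, 2, 3, 1, 1, 3, 6, 6, 6, 7, 6, 2, 3]
y_pred = [1, 7, 1, 0, 1, 2, 4, 5, 1, 1, 7, 2, 3, 2, 7, 6, 3, 7, 1, 1, 6, 6, 6, 7, 6, 0, 5]


def precision_recall(label, y_true, y_hat):
    """Hypothetical helper: per-class precision and recall from two label lists."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_hat))
    predicted = sum(p == label for p in y_hat)  # TP + FP
    actual = sum(t == label for t in y_true)    # TP + FN, i.e. the support
    # Return 0.0 when a denominator is zero, mirroring zero_division=0
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


for label in (0, 1, 3):
    p, r = precision_recall(label, y_test, y_pred)
    print(label, round(p, 2), round(r, 2))
# 0 0.0 0.0
# 1 0.57 0.67
# 3 1.0 0.4
```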
In [19]:
result = pd.DataFrame({'original' : y_test,'predicted' : y2})
In [20]:
result
Out[20]:
    original  predicted
0          5          1
1          7          7
2          1          1
3          6          0
4          1          1
5          2          2
6          4          4
7          1          5
8          1          1
9          3          1
10         7          7
11         2          2
12         3          3
13         2          2
14         7          7
15         2          6
16         3          3
17         1          7
18         1          1
19         3          1
20         6          6
21         6          6
22         6          6
23         7          7
24         6          6
25         2          0
26         3          5
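To see which colors get confused with each other, the same 27 pairs can be arranged in a confusion matrix with sklearn.metrics.confusion_matrix and mapped back to the names in CATEGORIES. Rows are true classes and columns are predictions, so correct predictions sit on the diagonal and every off-diagonal cell is a mistake.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

CATEGORIES = ['orange', 'Violet', 'red', 'Blue', 'Green', 'Black', 'Brown', 'White']

# Same 27 test labels and predictions as in the result table of this run
y_test = [5, 7, 1, 6, 1, 2, 4, 1, 1, 3, 7, 2, 3, 2, 7, 2, 3, 1, 1, 3, 6, 6, 6, 7, 6, 2, 3]
y_pred = [1, 7, 1, 0, 1, 2, 4, 5, 1, 1, 7, 2, 3, 2, 7, 6, 3, 7, 1, 1, 6, 6, 6, 7, 6, 0, 5]

# labels= keeps all 8 rows/columns even for classes absent from y_test
cm = confusion_matrix(y_test, y_pred, labels=list(range(len(CATEGORIES))))

print(cm.trace(), cm.sum())  # 18 27 (correct predictions, total test images)

# List every off-diagonal cell as "true -> predicted: count"
for i, j in zip(*np.nonzero(cm)):
    if i != j:
        print(f"{CATEGORIES[i]} -> {CATEGORIES[j]}: {cm[i, j]}")
```

In this run the most frequent single mistake is Blue predicted as Violet (2 images).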
Most of the test images (18 of 27) are classified correctly with their labels. Classification on such a limited dataset is always a challenging task, and a linear SVM on raw pixel values gives a workable result here.