Creating a CNN Model for Image Classification with TensorFlow
Shanza Khan
Artificial neural networks are artificial intelligence models inspired by the functioning of the human brain. They consist of interconnected neurons that receive input data, process it, and produce output. This structure is organized in layers: each layer processes the raw data it receives, extracting more meaningful information that helps solve the problem at hand. An artificial neural network consists of three types of layers: the Input Layer, the Hidden Layer, and the Output Layer.
Input Layer: The layer through which data enters the neural network. No changes are made to the input data in this layer; the values of the input observations are passed on to the next layer, the Hidden Layer.
Hidden Layer: The learning process takes place in this layer. The information received is processed here.
Output Layer: The layer where the network's result appears. What the network should learn from the input data is compared with the values it has actually learned, and once these operations are completed, the result of the process emerges in the output layer.
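As a minimal sketch of this three-layer structure in Keras (the layer sizes here are arbitrary and chosen only for illustration):
from tensorflow.keras import layers, models

toy_net = models.Sequential([
    layers.Input(shape=(4,)),              # Input Layer: passes the 4 input values through unchanged
    layers.Dense(8, activation="relu"),    # Hidden Layer: where the learning takes place
    layers.Dense(3, activation="softmax")  # Output Layer: produces the final result
])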
In this article, I discuss the structure of Convolutional Neural Networks (CNNs) and the steps for creating a CNN model.
A Convolutional Neural Network (CNN) is used in the field of deep learning for tasks such as image recognition, image classification, and object detection. The CNN algorithm, which classifies images by capturing the features that distinguish them, consists of several different layers.
Let’s examine the functions of the CNN layers:
Convolution Layer: This layer is where features are detected. It picks out the parts of an image that distinguish it from other images. Filtering operations are performed on the image in this layer.
Pooling Layer: It is used to reduce the size of the image while retaining its important features. Typically, operations such as max pooling or average pooling are used to preserve the most significant features. This way, a large image, for example, is represented with far fewer pixels, which reduces computational complexity and speeds up processing: the pooling layer shrinks the data while keeping the features that matter most.
Flattening Layer: It reshapes the incoming data to prepare it for the neural network: the feature-map matrix is converted into a one-dimensional vector.
Fully-Connected Layer: This is the step where the artificial neural network model itself is built on top of the extracted features. A short sketch of these layer types follows below.
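To make these roles concrete, the sketch below passes one dummy image through each layer type and prints how the shape changes (the filter counts are illustrative, not the final model):
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.uniform((1, 32, 32, 3))                # one dummy 32x32 RGB image
x = layers.Conv2D(32, (3, 3), activation="relu")(x)
print(x.shape)                                        # (1, 30, 30, 32): features detected by 32 filters
x = layers.MaxPooling2D((2, 2))(x)
print(x.shape)                                        # (1, 15, 15, 32): spatial size halved
x = layers.Flatten()(x)
print(x.shape)                                        # (1, 7200): the matrix becomes a vector
x = layers.Dense(10, activation="softmax")(x)
print(x.shape)                                        # (1, 10): one score per class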
Let’s look at the steps to create a CNN algorithm using a dataset.
1. Get to Know the Dataset
In practice, we will use the CIFAR-10 dataset, which is available on the Kaggle platform. CIFAR-10 is a popular dataset in the field of computer vision.
The CIFAR-10 dataset consists of 60,000 color images with a resolution of 32x32 pixels, divided into 10 different classes. The training set contains 50,000 images, while the test set contains 10,000 images.
The 10 classes are Plane, Automobile, Bird, Cat, Deer, Dog, Frog, Horse, Ship, and Truck.
TensorFlow and Keras are popular libraries for creating and training deep learning and artificial intelligence models, and they are often used together. Let’s install them (note that TensorFlow 2 already bundles Keras as tensorflow.keras, so the separate Keras install is optional).
pip install tensorflow
pip install keras
Let’s import the libraries we will use in the project.
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
Let’s load the dataset. The CIFAR-10 dataset is available within Keras. “xtrain” and “ytrain” contain the training images and labels, while “xtest” and “ytest” contain the test images and labels.
(xtrain, ytrain), (xtest, ytest) = datasets.cifar10.load_data()
ytest = ytest.reshape(-1,)
ytest
The expression “ytest.reshape(-1,)” reshapes the “ytest” variable into a one-dimensional array; the -1 tells NumPy to infer the dimension automatically. This way, the label data is formatted appropriately.
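A quick shape check illustrates the effect (the zeros array is a stand-in with the same shape as the CIFAR-10 test labels):
import numpy as np

labels = np.zeros((10000, 1), dtype=int)  # labels load as a column vector
print(labels.shape)                        # (10000, 1)
print(labels.reshape(-1,).shape)           # (10000,)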
Classifications are present in the dataset as numerical values, so let’s name them. Then let’s define a function to visualize a specific data point and its corresponding label.
classname = ["airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"]
def example(x, y, index):
    # x: dataset containing the images
    # y: dataset containing the labels; specifies the class of each image
    # index: index of the data point to be visualized
    plt.figure(figsize=(15, 2))
    plt.imshow(x[index])
    plt.xlabel(classname[y[index]])
example(xtest, ytest, 8)
The image at index 8 of the “xtest” dataset and its label from the “ytest” dataset will be visualized, and the output will be displayed.
2. Normalization Operations
Let’s set up the normalization steps and the layer properties of the neural network. To create a CNN (Convolutional Neural Network) model, we will use Keras’s Sequential model. The model we create is a deep learning model designed to classify the provided 32x32 color images.
It is a common practice to scale the pixel values of images in the dataset to the range [0, 1]. With the following code, we normalize the pixel values of the images, which are between 0 and 255, to values between 0 and 1.
xtrain = xtrain / 255
xtest = xtest / 255
from tensorflow.keras.layers import Input

model = models.Sequential([
    Input(shape=(32, 32, 3)),  # the input layer is defined here
    layers.Conv2D(filters=32, kernel_size=(3, 3), activation='relu'),
    layers.MaxPooling2D(2, 2),
    layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu'),
    layers.MaxPooling2D(2, 2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax")
])
The Conv2D layer extracts features from the images. The ReLU (Rectified Linear Unit) activation function is used, which sets the layer’s outputs to zero for negative input values and passes positive values through unchanged.
The MaxPooling2D layer is applied as the second step. A 2x2 max pooling operation reduces the size of the image by taking the maximum pixel value from each 2x2 pixel block, which helps preserve the important features.
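As a small standalone illustration (separate from the model code), here is 2x2 max pooling applied to a 4x4 feature map with NumPy:
import numpy as np

fmap = np.array([[1, 3, 2, 0],
                 [5, 6, 1, 2],
                 [7, 2, 9, 4],
                 [0, 8, 3, 1]])

# Keep only the maximum of each non-overlapping 2x2 block.
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6 2]
#  [8 9]]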
The same Conv2D and MaxPooling2D pair appears twice in the model. Stacking these operations transforms the features into higher-level features and further reduces the size.
The Flatten layer converts all feature maps into a one-dimensional vector. This allows the CNN to transition to the dense layers.
Dense layers come after the Flatten layer: first a dense layer with 64 neurons and a ReLU activation function, then another dense layer with 10 neurons for classification. The outputs of this last layer are computed with a softmax activation function, which gives the probabilities of the classes. For a binary classification problem, a sigmoid activation function could be used instead.
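To verify the architecture, model.summary() prints each layer’s output shape and parameter count; the shapes below follow from the 32x32x3 input (the exact layer names may differ between runs):
model.summary()
# Conv2D       -> (None, 30, 30, 32)
# MaxPooling2D -> (None, 15, 15, 32)
# Conv2D       -> (None, 13, 13, 64)
# MaxPooling2D -> (None, 6, 6, 64)
# Flatten      -> (None, 2304)
# Dense        -> (None, 64)
# Dense        -> (None, 10)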
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
“model.compile” configures the model for training. In the application, the Adam optimization algorithm is used as the optimizer. The “loss” parameter measures how far the model’s predictions are from the actual labels. The “accuracy” metric calculates the rate of correct classifications and measures the classification performance of the model.
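“sparse_categorical_crossentropy” is chosen because the CIFAR-10 labels are plain integers (0–9) rather than one-hot vectors. A quick standalone check of how it scores a single prediction:
import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
y_true = [2]                                                    # integer label, no one-hot encoding needed
y_pred = [[0.1, 0.1, 0.7, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]  # softmax-style probabilities
print(loss_fn(y_true, y_pred).numpy())                          # ~0.357, i.e. -log(0.7)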
3. Training the Model
The length of training is specified with epochs. An epoch is one complete pass of the model over the entire training dataset. For example, 30 epochs mean the model will go through the training dataset 30 times.
history = model.fit(xtrain, ytrain, epochs=30, validation_data=(xtest, ytest))
“model.evaluate” computes the loss value and the specified metric values of the model on the provided test dataset and labels. You can create the Loss and Accuracy charts with the following code.
loss, acc = model.evaluate(xtest, ytest, verbose=0)

# Draw both curves side by side in a single figure
plt.figure(figsize=(20, 5))
plt.subplot(1, 2, 1)
plt.plot(history.history["accuracy"], color="b", label="Training Accuracy")
plt.plot(history.history["val_accuracy"], color="r", label="Validation Accuracy")
plt.legend(loc="lower right")
plt.xlabel("Epoch", fontsize=16)
plt.ylabel("Accuracy", fontsize=16)
plt.ylim([min(plt.ylim()), 1])
plt.title("Training and Test Performance Graph", fontsize=16)
plt.subplot(1, 2, 2)
plt.plot(history.history["loss"], color="b", label="Training Loss")
plt.plot(history.history["val_loss"], color="r", label="Validation Loss")
plt.legend(loc="upper right")
plt.xlabel("Epoch", fontsize=16)
plt.ylabel("Loss", fontsize=16)
plt.ylim([0, max(plt.ylim())])
plt.title("Training and Test Loss Graph", fontsize=16)
plt.show()
ypred = model.predict(xtest)
ypred[:3]
The “np.argmax” function returns the index of the largest element in an array; in this case, it returns the index of the class with the highest probability.
ypred1 = [np.argmax(element) for element in ypred]
ypred1[:3]
Let’s assess how close the model’s predictions are to the actual data in order to evaluate the model’s performance. If “ytest” and “ypred1” largely agree, it can be said that the model performs well.
# Compare the first three true classes with the predicted classes
for true, pred in zip(ytest[:3], ypred1[:3]):
    print("True Class:", classname[true], "\tPredicted Class:", classname[pred])