Deep Learning/Convolutional Neural Networks. Creating AI/Image recognition algorithms series part 1.
Darko Medin
Data Scientist and a Biostatistician. Developer of ML/AI models. Researcher in the fields of Biology and Clinical Research. Helping companies with Digital products, Artificial intelligence, Machine Learning.
In previous articles, I talked about saving the basic Artificial Neural Network architecture and weights as a prerequisite for working with different Artificial Neural Networks. In this article I will move to a specific AI sphere, Computer Vision/Image Recognition, and show how to create Convolutional Neural Network algorithms.
First, remember to load all the packages and optimizers from the previous parts, like:
import tensorflow as tf
import numpy as np
from tensorflow.keras.optimizers import Adam  # Adam lives in keras.optimizers, not keras.models
import visualkeras as v_k  # used later to visualize the layer blueprint
As I said before, I will create an image recognition algorithm using TensorFlow, working with the Fashion MNIST dataset [1]. As its name says, this dataset is all about fashion images: predicting types of clothing based on the image.
Why did I choose this dataset? It's challenging enough and it's a good benchmarking dataset. Contrary to MNIST [2] (the classic handwritten digit dataset), which is easy to classify, with most AI models easily achieving 99% accuracy, Fashion MNIST (from Zalando Research) is harder to predict, with a 'good' model typically achieving between 90% and 96% accuracy. This is very important, as in data science one should never aim for easy tasks and easy-to-create models.
Having perfect data is never good for optimizing models. In reality, data tends to be imperfect and to have specific flaws, so good algorithms are developed to account for those flaws exactly by training on flawed data (makes sense, right...). Fashion MNIST is ideal for this: it has 60,000 training and an additional 10,000 test 28x28-pixel images, which is small in terms of image size but still carries enough information for models to achieve relatively high classification power (if the predictive algorithm is good enough, of course).
While exploring the data, it is very important to keep in mind that pixels are actually 2D numerical data, and to keep a statistical perspective...
Here is the code to load the dataset using Keras:
(trainI, trainL), (testI, testL) = tf.keras.datasets.fashion_mnist.load_data()
trainI is now an object containing the training images and trainL is a new object containing their labels. The same principle applies to testI and testL (the test images and their labels).
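As a quick sanity check, the shapes follow the dataset description above (a minimal sketch):

print(trainI.shape, trainL.shape)  # (60000, 28, 28) (60000,)
print(testI.shape, testL.shape)  # (10000, 28, 28) (10000,)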
First, a bit of pre-processing, as always...
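The pre-processing code itself is not shown here, so below is a minimal sketch of the typical steps: scaling the pixel values from [0, 255] to [0, 1] and adding a channel axis so the images fit a Conv2D input.

# scale pixel intensities from [0, 255] to [0, 1] and
# reshape to (samples, height, width, channels) for the convolutional layers
trainI = (trainI / 255.0).reshape(-1, 28, 28, 1)
testI = (testI / 255.0).reshape(-1, 28, 28, 1)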
Basic Dense/Fully Connected ANNs could easily reach 100% training accuracy, but the validation and testing accuracies, which are the most relevant, would probably be around 85-87%. Good Fashion MNIST predictive models should exceed 90% on the validation and test sets and typically range from 90.5% to 96.5%. In Artificial Intelligence, accuracy percentage points become more and more difficult to gain as the threshold increases. Above 90% accuracy, every percent is vital and makes a huge difference for a dataset like this!
To pass this threshold, I will add the specific layers we discussed before, Convolutional layers, using the Conv2D() function. Convolutional layers are typically complemented by pooling layers, so I will use the MaxPooling2D() function, again specifying the kernel (pool) size.
As can be seen, there are 2 main parts to this Convolutional Neural Network. The actual convolutions happen in the first part, where Conv2D is dominant, accompanied by Pooling and Normalization layers... But I augmented this unit with 4 dense feedforward layers of 100, 100, 100 and 200 neurons, a combination that will increase the power of the convolutions... I'll explain the whole Convolution/Augmentation process and the model complexity using a more intuitive visualization...
I'll use visualkeras to create a volume-, length- and funnel-based blueprint of the layers and their functions. This is very important to assess in the algorithm engineering part. But before that, I'll look at the parameter summary...
Here is the code...
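(The code originally appeared as a screenshot; the sketch below reconstructs the architecture from the description in this article: two convolutional units around a 100/100/100/200 dense "augmentation" unit, then Flatten and fully connected post-processing. The filter counts, kernel sizes and final dense sizes are my assumptions, so the exact parameter total may differ from the one reported below.)

from tensorflow.keras import layers, models

model = models.Sequential([
    # Unit 1: 2 convolutions + max pooling + batch normalization
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),  # filter count assumed
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),  # subsamples 2x2 pixel regions
    layers.BatchNormalization(),
    # Unit 2: dense "augmentation" layers (Dense on a 4D tensor applies per spatial position)
    layers.Dense(100, activation='relu'),
    layers.Dense(100, activation='relu'),
    layers.Dense(100, activation='relu'),
    layers.Dense(200, activation='relu'),
    # Unit 3: second convolutional unit, similar to the first
    layers.Conv2D(64, (3, 3), activation='relu'),  # filter count assumed
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.BatchNormalization(),
    # Unit 4: flatten + fully connected post-processing
    layers.Flatten(),
    layers.Dense(200, activation='relu'),  # size assumed
    layers.Dense(10, activation='softmax'),  # 10 clothing classes
])
model.summary()  # prints the layer-by-layer parameter summary discussed below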
And here is the summary...
As can be seen, there are 1,078,208 ANN parameters. So the complexity of the created ANN is just over 1M parameters (compare that to the models I created in previous articles, which had thousands rather than millions of parameters). From my experience, this is what I would call a medium-complexity Convolutional ANN, but with advanced augmented convolutional functions... which can be observed in the next visualkeras graph.
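The graph can be generated with the visualkeras package imported at the start (a minimal sketch; the legend flag is an optional extra):

# render a layered, volume-based blueprint of the model
v_k.layered_view(model, legend=True).show()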
From a Machine Learning Engineer's perspective, I need to segregate this architecture into functional parts... So the first 4 layers, composed of 2 convolutional layers and 1 max pooling layer with an additional Batch Normalization layer, form the first functional unit. They will subsample 2x2 pixel image parts and identify unique patterns... The next 4 layers, in blue, are feedforward fully connected layers and will augment the processing of the previous functional unit. The third functional unit is another convolutional unit, similar to the first one. Notice how I structured the fully connected augmentation unit between the 2 convolutional units (with their additional pooling layers). This should increase accuracy by 1-2% in contrast to just using convolutional units... The fourth unit is composed of a Flatten layer and fully connected neurons, generally larger in number compared to the first 3 units, and this is where most of the post-processing happens...
From experience, I know that these 4 functional units will increase accuracy by 7-9% compared to using just a fully connected feedforward Artificial Neural Network...
I need to think about overfitting problems all the time... So I will hold out a validation set within my training data to have a more validated metric and avoid too much overfitting; hopefully my validation metrics will be similar to the actual testing metrics. I'll use similar compiling options as in the previous parts, the Adam optimizer with a set learning rate, and start training the model.
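A sketch of the compile-and-train step, assuming a learning rate of 0.001 and a 10% validation split (both values are my assumptions; the article only states that Adam, a set learning rate and a validation set were used, and that 100 iterations were run):

model.compile(optimizer=Adam(learning_rate=0.001),  # learning rate assumed
              loss='sparse_categorical_crossentropy',  # integer labels 0-9
              metrics=['accuracy'])

# hold out 10% of the training data as a validation set
history = model.fit(trainI, trainL, epochs=100, validation_split=0.1)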
I'll also test the model in this step. As you can see, I like training and testing the model in one step. This speeds things up and makes the whole process logical from a validation perspective, contrary to training first without seeing the test result.
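Testing right after training is then a single extra call (a sketch using the model.evaluate() step mentioned below):

# evaluate on the 10,000 held-out test images
test_loss, test_acc = model.evaluate(testI, testL)
print(f'Test accuracy: {test_acc:.4f}')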
During the training phase, 100 iterations were performed, and as you can see they take a while to complete. So 100 iterations at 67-90s each take a bit more than an hour, and I used GPU-enabled computation here, so basic computation on a CPU could take hours to complete. This is completely normal for complex algorithms (over 1M parameters), and I will talk about it in the next tutorials.
Let me interpret the results for now. My CNN achieved around 97% train accuracy, but more importantly it achieved 0.94 validation accuracy (94% accurate). This is really high for a 1M-parameter CNN. To confirm the results, the model is tested on the 10,000 images separate from the training data using model.evaluate(), and it can be seen that the test accuracy is 0.9356, which means it was 93.56% accurate on the testing data. Now, as I said, a basic feedforward ANN would achieve around 85-87% and basic convolutional networks would be around 90-91%, so this augmented CNN outperformed basic feedforward networks by 6-8% and the basic convolutional architecture by around 3-3.5%. The best accuracies ever achieved on Fashion MNIST are around 96%, in very complex architectures, and I will talk about achieving this extremely high accuracy in the next tutorial. This example shows that between a good model and the best model, 2.5% accuracy might be all the difference. As I said, a Machine Learning Engineer should never be satisfied with anything short of a super result.
Until then, thanks for reading, and more on super-results in the next article/tutorial.
The article/tutorial's practical examples were created using the Python [3], Spyder IDE [4], TensorFlow [5] and Keras [6] platforms.
Links:
1. Han Xiao, Kashif Rasul, Roland Vollgraf. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv:1708.07747. https://github.com/zalandoresearch/fashion-mnist
2. LeCun, Yann; Cortes, Corinna; Burges, CJ. MNIST handwritten digit database. ATT Labs. https://yann.lecun.com/exdb/mnist
3. https://www.python.org/
4. https://www.spyder-ide.org/
5. https://www.tensorflow.org/
6. https://keras.io/