How to beat Google AutoML in Image Classification

TL;DR: We used Google AutoML to classify Airbus A380 vs B747 images. Here we will see how to do it manually with PyTorch and achieve a better result.

I tested AutoML last week. The write-up is in French, but the screen captures speak for themselves: it's easy to use and the result is pretty good.

So what's the point of hiring a Data Scientist if you can do it yourself with a few clicks in AutoML?

Let's see what we could achieve!

We will use the Fast.AI library, which is built on the PyTorch framework, backed by Facebook.

The code we will use is based on the one from the first lesson of part 1 of the Fast.AI courses, with some improvements.

We will not design an architecture ourselves, as researchers have already done it for us.

We will use the ResNeXt architecture, which is an evolution of ResNet.

Both are convolutional neural networks: instead of only layers of simple neurons, they contain convolutional layers. This type of layer is well suited to images, as it exploits 2D spatial information to recognize shapes.
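
To make this concrete, here is a minimal PyTorch comparison of a convolutional layer versus a fully connected one (the shapes and names are mine, not from the article):

import torch
import torch.nn as nn

# A batch of one 3-channel 224x224 image
x = torch.randn(1, 3, 224, 224)

# A convolutional layer slides small 2D filters over the image, so it
# sees local neighbourhoods and preserves the spatial structure.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
print(conv(x).shape)           # torch.Size([1, 16, 224, 224]) - still a 2D feature map

# A fully connected layer must flatten the image first,
# throwing away the 2D arrangement of the pixels.
fc = nn.Linear(3 * 224 * 224, 16)
print(fc(x.flatten(1)).shape)  # torch.Size([1, 16])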

See this video for a concrete example of what it means:

We do not have to implement ResNeXt50 ourselves, as it is already included in the Fast.AI library.
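
For reference, here is a sketch of the fastai 0.7 setup this assumes (PATH and the sizes are placeholders of mine); it creates the pre-trained learn object used in the training loop further below:

from fastai.conv_learner import *

PATH = 'data/planes/'          # hypothetical folder with train/ and valid/ subfolders
arch = resnext50               # ResNeXt50, shipped with the Fast.AI library
sz, bs = 64, 512
aug_tfms = transforms_side_on  # flips/rotations suited to side-view photos
tfms = tfms_from_model(arch, sz, aug_tfms=aug_tfms, max_zoom=0.5)
data = ImageClassifierData.from_paths(PATH, tfms=tfms, bs=bs)
# Fine-tuning: start from ImageNet weights rather than from scratch
learn = ConvLearner.pretrained(arch, data, ps=0.5)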

We will use classic techniques, which Google probably uses too, to improve the results:

  • Fine-tuning: we use a model pre-trained on ImageNet, so it already knows how to recognize elementary shapes and will learn faster.
  • Dropout: we randomly drop neurons during training to prevent over-fitting.
  • Data augmentation: we randomly modify the images to help the model generalize.
  • Different sizes: we begin training with small images, then bigger ones.
  • Learning rate annealing: we decrease the learning rate as training progresses.
  • Cyclical learning rate: the same as above, but restarting periodically with cycles of increasing duration (see the sketch after this list).
  • Save best model: we automatically save the model during training whenever its performance improves, because the model at the end of training is often not the best one seen along the way.
  • Loop over different parameters: instead of launching many trainings manually, we script them.
  • Saving the training and validation losses to see how our model learns.
  • Computing metrics such as the confusion matrix to see the performance we achieve.
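
To make the two learning-rate items above concrete, here is a minimal sketch of SGDR-style cosine annealing with restarts, the schedule behind fastai's cycle_len and cycle_mult arguments (all values here are illustrative):

import math

def sgdr_schedule(max_lr, n_cycles=3, cycle_len=1, cycle_mult=2, steps_per_epoch=100):
    lrs = []
    epochs = cycle_len
    for _ in range(n_cycles):
        total = epochs * steps_per_epoch
        for step in range(total):
            # Cosine decay from max_lr down to ~0 within the cycle...
            lrs.append(0.5 * max_lr * (1 + math.cos(math.pi * step / total)))
        # ...then restart at max_lr, with a cycle twice as long as the previous one
        epochs *= cycle_mult
    return lrs

schedule = sgdr_schedule(max_lr=1e-3)
print(len(schedule), max(schedule), min(schedule))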

It could seem complicated but, as often in Deep Learning, it does not take many lines of code:

# Assumes the fastai 0.7 setup sketched above: arch, aug_tfms, PATH and a
# pre-trained `learn` object are already defined in the notebook.
import collections
import numpy as np

learning_rate = 1e-3
dropout = [0.25, 0.5]
global_results = collections.OrderedDict([])
# Differential learning rates: smaller for the early layers, larger for the last ones
lr = np.array([learning_rate/10, learning_rate/5, learning_rate])
# Each entry: [batch size, image size, cycles, learning rates, crop type, dropout]
training_loop = [
    [512, 64, 5, lr, CropType.RANDOM, dropout],
    [256, 128, 5, lr, CropType.RANDOM, dropout],
    [123, 300, 5, lr, CropType.RANDOM, dropout],
    [123, 300, 5, lr, CropType.NO, [0.25, 0.5, 0.7]],
    [123, 300, 5, lr, CropType.RANDOM, 0.6],
    [123, 300, 5, lr, CropType.RANDOM, dropout],
    [123, 300, 5, lr, CropType.RANDOM, dropout],
]
i = 0
for bs, sz, cycle, lr, crop_type, ps in training_loop:
    i += 1
    # Data augmentation
    tfms = tfms_from_model(arch, sz, aug_tfms=aug_tfms, max_zoom=0.5,
                           crop_type=crop_type)
    # Load data
    data = ImageClassifierData.from_paths(PATH, tfms=tfms, bs=bs,
                                          num_workers=num_cpus())
    learn.set_data(data)  # point the learner at the new image size / batch size
    learn.ps = ps
    # SGDR: cycle_len=1 with cycle_mult=2 doubles the length of each cycle;
    # best_save_name checkpoints the model whenever validation accuracy improves
    vals_s2s, ep_vals_s2s = learn.fit(lr, cycle, cycle_len=1,
                                      cycle_mult=2, get_ep_vals=True,
                                      best_save_name=arch.__name__ + "-" + str(i) + "_clean_bestmodel")
    # Accumulate the per-epoch losses/metrics of every run
    if len(global_results) > 0:
        for v in ep_vals_s2s.values():
            global_results[len(global_results)] = v
    else:
        global_results = ep_vals_s2s

    print("After ", str(len(global_results)), " epochs, the accuracy is ",
          str(vals_s2s[1]*100)[:5], "%")
    print(" with hyperparameters : Batch size=", bs, " Drop out=", ps,
          " Learning rate=", lr, " Cycle=", cycle, " Images sizes=", sz)
    # "fichier" is French for "file": the accuracy is encoded in the weights file name
    fichier = ("acc" + str(vals_s2s[1]*100)[:5] + '_' + arch.__name__ +
               '_' + str(len(global_results)) + "_" + str(sz) + '_weights')
    print("Saving to ", fichier)
    learn.save(fichier)

Since training involves some randomness, simply re-running the same training can change the result by around plus or minus one point; when we are already at 97%, that can make the difference.

So don't forget to run the code several times to get the best result.

And if the resulting accuracy suddenly drops by more than 10 points, here is a pro tip: just delete the temp folder. That's why I add this line at the top of my notebook:

!rm -r {PATH}{tmp_name}

After many runs and parameter tuning we end up with many weights files, so we will make a script to automatically find the best model.
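
As a sketch (assuming the file-naming scheme from the loop above, where the accuracy is encoded at the start of each weights file name), the script can simply parse the names:

import re
from pathlib import Path

def best_model(models_dir):
    # Return (accuracy, file) of the best saved weights,
    # parsed from names like 'acc98.70_resnext50_..._weights'
    best_acc, best_file = 0.0, None
    for f in Path(models_dir).glob('acc*_weights*'):
        m = re.match(r'acc(\d+\.?\d*)_', f.name)
        if m and float(m.group(1)) > best_acc:
            best_acc, best_file = float(m.group(1)), f
    return best_acc, best_file

acc, path = best_model(PATH + 'models')  # fastai 0.7 saves weights under PATH/models
print("Best accuracy:", acc, "->", path)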

And we achieve this performance:

It means 2 A380s were misclassified as B747s (false negatives), and 2 B747s were misclassified as A380s (false positives).
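
For reference, here is how such a confusion matrix can be computed with fastai 0.7 and scikit-learn (a sketch; learn is the trained learner from the loop above):

import numpy as np
from sklearn.metrics import confusion_matrix

log_preds, y = learn.TTA()             # predictions with test-time augmentation
probs = np.mean(np.exp(log_preds), 0)  # average the augmented predictions
preds = np.argmax(probs, axis=1)       # predicted class for each validation image
print(confusion_matrix(y, preds))      # rows: true class, columns: predicted class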

Please compare it to the Google result:


To be fair, Google made only 18 errors, as there is a duplicate in the validation data-set.

So that's 18 errors on 309 images for Google (94.1% accuracy) vs 4 errors on 309 for us (98.7% accuracy).

In terms of number of errors, it's more than 4 times better!

It took us more time than with Google, but we have a solution that performs better and that we can tweak to our specific needs.

For example, in certain cases it could be useful to trade off one type of error against the other, favouring fewer false positives or fewer false negatives.
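
One simple way to do that (a sketch of mine, not the article's code) is to weight the classes in the loss function, so that one type of misclassification costs more than the other:

import torch
import torch.nn as nn

# Hypothetical weights: mistakes on the first class (say, A380) cost twice as much
class_weights = torch.tensor([2.0, 1.0])
criterion = nn.CrossEntropyLoss(weight=class_weights)
# In fastai 0.7 this could then replace the learner's default loss: learn.crit = criterion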

Or we could have a look at which areas the model uses to make its prediction:
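
Here is a sketch of how such a class activation map (CAM) can be produced with a plain torchvision ResNet (a standalone illustration; all names and values are assumed, not taken from the article's notebook):

import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(pretrained=True).eval()
features = {}

def hook(module, inp, out):
    features['maps'] = out  # feature maps of the last conv block: (1, 2048, 7, 7)

model.layer4.register_forward_hook(hook)

x = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed aircraft photo
with torch.no_grad():
    logits = model(x)
cls = logits.argmax(dim=1).item()

# CAM: weight the feature maps by the final linear layer's weights for that class
weights = model.fc.weight[cls]                               # (2048,)
cam = (features['maps'][0] * weights[:, None, None]).sum(0)  # (7, 7)
cam = F.interpolate(cam[None, None], size=(224, 224),
                    mode='bilinear', align_corners=False)[0, 0]
print(cam.shape)  # a 224x224 heat map of the regions driving the prediction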

The source is also available on my GitHub.

So AutoML is a good starting point, but hiring a Data Scientist remains a good option to improve reliability ;-)


Bartosz Telenczuk

machine/deep learning engineer

5 years ago

Hi Benoit, thanks for this nice comparison between AutoML and manual tuning with fast.AI. I agree that with the right expertise, humans (still) can outperform automatic hyperparameter searches. However, I think that even as the tools improve, data scientists will still be necessary. I see that the role of a data scientist is not only to find the best models, but also to ask the right questions, prepare the data and communicate the results of the model. I also recommend this blog post (it may be a bit extreme in its message, but many points are well taken): https://veekaybee.github.io/2019/02/13/data-science-is-different/
