How I Improved the Performance of My Computer Vision Model Twofold
Python is great for training deep learning models. The variety of supported platforms makes it easy for pretty much anyone to train their own custom neural network.
But what about inference? When it comes time to deploy the model, whether in a web app with the deep learning model running in the backend (e.g. the Cough Symptom Analysis Web App, https://cospect.konect-co.com/) or on an edge device, it's not only the accuracy of the model that matters but also its speed, especially when the hardware is a limitation.
From a quick glance at the TensorFlow Model Garden's detection model zoo below, it's clear that accuracy isn't the limiting factor when it comes to object detection. (Source: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md)
But in a time-sensitive or resource-constrained environment, such as the real-time models that let self-driving cars "see" the world, you certainly don't want a latency of 5 seconds; quick reaction time is necessary on the road.
So, let's dig deep into the problem. Here's where most deep learning tutorials stop:
Ok, great. You can see your object detection model in action, and the fact that you are now an AI developer brings a smile to your face. But how long does it take to run on your machine in Python? Let's have a look.
Baseline: Running a training graph
On an "Intel(R) Xeon(R) CPU @ 2.30GHz" on Google Colab, the model takes 81 ms to execute! This model is certainly not fit to be used for inferencing in time-sensitive environments. And what about the reported 30ms latency you saw advertised? Wherefore this drastic reduction in speed?
The main explanation is that the current graph is a training graph; it isn't optimized for execution. First, its weights are still treated as "variables" by TensorFlow rather than constants, which slows down operations. Second, the training graph contains redundant operations, such as "Identity" nodes, that serve no purpose during inference.
What is the solution? How can we get our model to a point where it's suitable for inferencing and can be deployed?
Solution 1: Freezing the Graph
One quick fix is to freeze the graph, that is, to convert all variables to constant tensors. The function to accomplish this is convert_variables_to_constants_v2, found in tensorflow.python.framework.convert_to_constants.
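As a rough sketch (assuming the same SavedModel and "serving_default" signature as above, with a placeholder path), the freezing step looks like this:

import numpy as np
import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

# Load the model and pick the concrete function to freeze (path is a placeholder)
model = tf.saved_model.load("ssd_mobilenet_v2/saved_model")
concrete_func = model.signatures["serving_default"]

# Bake all variable values into the graph as constants
frozen_func = convert_variables_to_constants_v2(concrete_func)

# The frozen function is called just like the original one
image = tf.constant(np.random.randint(0, 255, size=(1, 320, 320, 3), dtype=np.uint8))
detections = frozen_func(image)

If you also need a standalone .pb file, frozen_func.graph.as_graph_def() can be written out with tf.io.write_graph.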
Have a look at the improvement below! We went from ~80 to ~45 milliseconds. That's roughly a 44% reduction in latency, nearly twice as fast, just from converting variables to constants.
Solution 2: Performing Graph Optimizations
Beyond this simple step of converting variables to constants, another useful step is to perform optimizations on the model graph itself. TensorFlow ships with a set of graph optimizers (https://www.tensorflow.org/guide/graph_optimization) that improve the graph across several metrics, such as time taken, memory consumed, and power consumed.
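One way to experiment with these optimizers is through tf.config, which lets you toggle individual Grappler passes globally. Which combination actually helps depends on your model, so treat the snippet below as a sketch rather than a recommended configuration:

import tensorflow as tf

# Toggle individual Grappler passes globally (which ones help is model-dependent)
tf.config.optimizer.set_experimental_options({
    "constant_folding": True,        # pre-compute subgraphs that depend only on constants
    "arithmetic_optimization": True, # simplify and deduplicate arithmetic expressions
    "remapping": True,               # fuse common patterns such as Conv2D + BiasAdd + activation
    "dependency_optimization": True, # strip redundant control dependencies and Identity-style no-ops
})
print(tf.config.optimizer.get_experimental_options())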
There are also some application-specific optimizations. If you want to drastically reduce your model's memory footprint at the cost of a slight reduction in accuracy, you might want to look into quantization (https://www.tensorflow.org/lite/performance/post_training_quantization). However, if you are using your model in a setting where accuracy matters more than time or memory constraints, such as a Kaggle data science competition, then quantization would not be beneficial.
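When quantization does fit your use case, post-training dynamic-range quantization takes only a few lines with the TFLite converter. The SavedModel path below is a placeholder, and full-integer quantization would additionally require a representative dataset:

import tensorflow as tf

# Convert a SavedModel to TFLite with dynamic-range quantization (path is a placeholder)
converter = tf.lite.TFLiteConverter.from_saved_model("ssd_mobilenet_v2/saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # store weights as 8-bit integers
tflite_model = converter.convert()

with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)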
It's important to analyze which types of optimization are in your model's best interest: accuracy? latency? memory consumption? power consumption? (The last two are especially important for low-memory devices like microcontrollers.)
In fact, if you are targeting your machine learning model at microcontrollers, you might want to look into a compiler built for low-power, low-memory devices, such as DeepC (https://github.com/ai-techsystems/deepC).
Solution 3: Using the C API (or a C++ Wrapper)
One shortcoming of Python is that it's an interpreted language: no intermediate executable is generated; instead, all the code is interpreted on the fly. For this reason, Python programs tend to be slow and the interpreter has a high memory consumption.
If you're looking to interface your C++ project with a machine learning model and process the output directly as vectors, taking advantage of TensorFlow's C API may be the right choice (https://www.tensorflow.org/install/lang_c).
For C++ projects specifically, Cppflow (https://github.com/serizba/cppflow) is a simple and elegant solution that provides a C++ interface over the C API. Have a look at how easy it is to create a tensor, load in the data, and run the model.
auto x = new Tensor(model, "x"); // create a new input tensor
x->set_data(input_data, {1, 224, 224, 3}); // load the input data into the tensor
model.run({x}, {num_detections, detection_boxes, detection_classes, detection_scores}); // run the model
Let's have a quick look at the performance of this model.
Adjusting for the speed difference between Google Colab and my local machine, the Cppflow C++ implementation actually runs about 3x slower than the Python implementation, which is important to be aware of.
To recap, it's important to optimize your deep learning model before applying it in production; training is not the last step. Graph optimizations (with freezing variables to constants being the simplest of them) let you target your model to the appropriate production setting, which depends on the scenario in which it is used and the hardware it runs on.
All of the source code mentioned in this article is available at https://github.com/SRavit1/BoostingModelSpeed. Thanks for reading!