Intel Vision Technology and OpenVINO
Recently, I had the opportunity to attend an Intel workshop at the 2019 Embedded Vision Summit. The workshop was part of a larger conference focused on the practicalities of deploying computer vision models. The speakers dug into the low-level details of the hardware behind neural network computation, explaining how the OpenVINO toolkit can be used to optimize inference performance in edge computing.
One of the topics covered was inference, the phase in which a trained neural network makes predictions on new data. Training is the compute-intensive phase in which the parameters of the network are fine-tuned using gradient descent to minimize a loss; once training is done, the goal is to make inference as cheap as possible. The speakers described two ways to reduce the computation a trained model performs. The first is pruning: identifying activations that contribute nothing to the output and removing them from the network. The second is layer fusion: combining adjacent layers so they execute as a single operation. Convolution followed by ReLU is the classic case. Convolution is a compute-intensive process and may be offloaded to an accelerator such as the Movidius Neural Compute Stick; since ReLU is a very computationally inexpensive function, it can be applied on the stick immediately after the convolution, rather than shipping the intermediate data back to the CPU, which would take more time than the ReLU itself.
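To make the fusion idea concrete, here is a minimal NumPy sketch, not OpenVINO's actual fusion pass: the unfused version materializes the convolution output and then makes a second pass for the ReLU, while the fused version applies the ReLU as each output element is produced, so the intermediate result never has to travel back to another device.

```python
import numpy as np

def conv1d(x, w):
    # Naive valid-mode 1-D convolution (for illustration only).
    n = len(x) - len(w) + 1
    return np.array([np.dot(x[i:i + len(w)], w) for i in range(n)])

x = np.random.rand(1024).astype(np.float32)
w = np.random.randn(16).astype(np.float32)

# Unfused: two passes -- the intermediate convolution output is
# written out in full, then read back for the ReLU.
y = conv1d(x, w)
y = np.maximum(y, 0.0)

# Fused: ReLU applied as each element is produced, in a single pass.
y_fused = np.array([max(np.dot(x[i:i + len(w)], w), 0.0)
                    for i in range(len(x) - len(w) + 1)])

assert np.allclose(y, y_fused)
```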
This ties into another of the topics covered: heterogeneous computing, the practice of spreading computation across multiple processors or accelerators. The rationale is that certain operations run faster on certain devices than on others. Convolution is a great example: its highly parallel structure makes it much faster on a GPU than on a CPU, so scheduling it on the GPU improves the overall throughput of the network. The speakers introduced the OpenVINO toolkit, which lets programmers control which operations are performed on which hardware.
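As a sketch of what that control looks like in practice, the snippet below uses OpenVINO's HETERO plugin, which assigns each layer of a network to the first listed device that supports it. It assumes the IECore Python API from OpenVINO 2020-era releases; model.xml and model.bin are placeholder IR files.

```python
from openvino.inference_engine import IECore  # API as of OpenVINO 2020+

ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")  # placeholder IR files

# Ask which layers the GPU plugin can execute; anything unsupported
# will need to fall back to another device.
supported = ie.query_network(network=net, device_name="GPU")

# HETERO runs each layer on the first listed device that supports it,
# falling back to the CPU for everything else.
exec_net = ie.load_network(network=net, device_name="HETERO:GPU,CPU")
```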
They presented the hardware OpenVINO targets: CPUs, GPUs, FPGAs, and the Intel Movidius Neural Compute Stick, devices designed to speed up different parts of the computation. OpenVINO exposes a common API for heterogeneous execution across all of them, so programmers can directly control where each operation is carried out. Models built in standard frameworks are first converted into an Intermediate Representation (IR) that the inference engine can consume; the inference engine is then incorporated into an application to deploy the model (a sketch of this flow follows below). The toolkit also supports libraries commonly used in computer vision, including OpenCV and OpenVX. As a whole, OpenVINO demonstrates Intel's vertical integration: the company owns the entire stack, from the hardware accelerators mentioned above to the toolkit and API used to control the optimization and inference process.
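Putting the pieces together, a deployment might look roughly like the sketch below: the Model Optimizer converts a trained model into IR files, and the inference engine loads them onto a target device (here the Neural Compute Stick, exposed as "MYRIAD"). The file paths and the zero-filled input are placeholders, and the API names are those of the 2020-era releases.

```python
import numpy as np
from openvino.inference_engine import IECore

# Step 1 (offline, shell): convert the trained model to IR -- placeholder paths:
#   mo.py --input_model frozen_model.pb --output_dir ir/

# Step 2: load the IR and compile it for a device. "MYRIAD" targets the
# Movidius Neural Compute Stick; "CPU", "GPU", or "HETERO:..." also work.
ie = IECore()
net = ie.read_network(model="ir/frozen_model.xml", weights="ir/frozen_model.bin")
exec_net = ie.load_network(network=net, device_name="MYRIAD")

# Step 3: run inference on new data (a zero tensor stands in for a real image).
input_name = next(iter(net.input_info))
shape = net.input_info[input_name].input_data.shape
result = exec_net.infer(inputs={input_name: np.zeros(shape, dtype=np.float32)})
```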