Run your AI model fast in a cost-effective way
When deciding which resource types to run your AI model on, there are a few points worth thinking through. This is a summary of Microsoft Build session BRK3013.
What kind? Where? Hardware? Which framework?
- What kind of model? Is it pre-built or custom?
- Where will the model run? In the cloud or on the edge?
- Which hardware? CPU, GPU, or FPGA?
- Which framework? TensorFlow, PyTorch, scikit-learn, Keras, Chainer, or MXNet?
Hardware
Different hardware comes with different considerations. The illustration in [3] captures the differences:
CPU: general-purpose computing.
GPU: handles large amounts of data in parallel. GPUs are especially useful for analytics, deep learning, and machine learning workloads, and can run some calculations 10 to 100 times faster than the same calculations on a traditional CPU.
FPGA: very fast and re-configurable. AI models can be deployed to and run directly on the chip.
To compare GPU and FPGA, the speaker in this Microsoft Build session [2] used a "car repair shop" vs. "pit stop" analogy.
A GPU is suited to handling large batches of data and is effective at parallel processing.
Conversely, when you do not process a lot of data at once, some of its resources sit idle. It is like a car repair shop that can service 10 to 20 cars at a time: when only one car comes in, the other bays stand idle.
An FPGA is customized close to the hardware level, so it runs custom applications very fast, but it cannot process a lot of data at once. It is like a pit stop: once a car enters the station, the crew services it quickly and then takes the next one.
You can choose your hardware according to the type of task and its requirements.
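As a concrete illustration of choosing between CPU and GPU at runtime, here is a minimal PyTorch sketch (not from the session; the model and sizes are hypothetical, and it assumes the torch package is installed):

    # Minimal sketch: pick the GPU for large, parallel workloads when one
    # is available, otherwise fall back to the CPU.
    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Moving the model and data to the chosen device is all that is needed;
    # the same code then runs on either CPU or GPU.
    model = torch.nn.Linear(1024, 1024).to(device)
    batch = torch.randn(64, 1024, device=device)  # a batch of 64 inputs

    with torch.no_grad():
        output = model(batch)
    print(device, output.shape)

The point of the sketch is the car-repair-shop trade-off above: the GPU pays off when the batch keeps its many parallel units busy.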
Frameworks
The framework is also a concern. Should we use PyTorch, TensorFlow, or another framework? We also worry about compatibility and performance between the platform and the framework, and the complexity grows if you need to run your models on multiple platforms. Optimizing machine learning models for inference is difficult because you have to tune both the model and the inference library to make the most of the hardware's capabilities.
In response to this problem, Microsoft and a community of partners created ONNX as an open standard for representing machine learning models. Models from many frameworks, including TensorFlow, PyTorch, scikit-learn, Keras, Chainer, MXNet, and MATLAB, can be exported or converted to the standard ONNX format. Once the models are in the ONNX format, they can run on a variety of platforms and devices. [1] As the speaker put it [2], the conversion is like converting all kinds of files into PDF: the platform then only needs to optimize how it processes the PDF format. By using ONNX Runtime, you benefit from extensive production-level optimization, testing, and continuous improvement.
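To make the "convert to PDF" idea concrete, here is a minimal sketch of exporting a PyTorch model to ONNX and running it with ONNX Runtime. The toy model and the file name model.onnx are hypothetical; it assumes the torch and onnxruntime packages are installed:

    import torch
    import onnxruntime as ort

    # A toy network standing in for your real model.
    model = torch.nn.Sequential(
        torch.nn.Linear(4, 8),
        torch.nn.ReLU(),
        torch.nn.Linear(8, 2),
    )
    model.eval()

    # Export to the standard ONNX format ("convert to PDF").
    dummy_input = torch.randn(1, 4)
    torch.onnx.export(model, dummy_input, "model.onnx",
                      input_names=["input"], output_names=["output"])

    # Run the exported model with ONNX Runtime on whatever
    # platform or device you deploy to.
    session = ort.InferenceSession("model.onnx")
    outputs = session.run(None, {"input": dummy_input.numpy()})
    print(outputs[0])

The same model.onnx file can then be served from CPUs, GPUs, or FPGA-backed services without touching the original training code.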
Conclusion
Before deciding which resource type to use for your AI model, think through these points in advance. First, what kind of AI model will you run? If you use a pre-built model, you may not need heavy computing power for training. Second, where will it run: in the cloud or on the edge? If you need to react immediately on a device, or cannot tolerate internet latency, consider running the model on the edge; the resources allocated to cloud and edge will differ by scenario. Last, which hardware? That depends heavily on the type of task: a CPU for general-purpose computing, a GPU for image processing or parallel computing, or an FPGA for fast, low-latency, single-stream processing. On the framework side, you can convert your model to ONNX to run it efficiently on various platforms and devices.
References:
[1] https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-onnx
[2] https://mybuild.techcommunity.microsoft.com/sessions/76979?source=sessions#top-anchor
[3] https://bixbit.io/en/blog/post/fpga-for-mining-what-trends-will-prevail-in-2019