Run your AI model fast in a cost-effective way

When you consider which resource types to run your AI model on, you can think about the following points. This article is a summary of Microsoft Build session BRK3013.

What kind? Where? Hardware? Which framework?

  1. What kind of model? Is it pre-built or custom?
  2. Where will the model run? On the cloud or on the edge?
  3. Which hardware? CPU, GPU, or FPGA?
  4. Which framework? TensorFlow / PyTorch / SciKit-Learn / Keras / Chainer / MXNet?


Hardware

Different hardware types come with different considerations.

The illustration in [3] captures the comparison well; in short:


CPU: general-purpose computing.

GPU: Handles parallel processing of large amounts of data. GPUs are especially useful for analytics, deep learning, and machine learning algorithms, and allow some calculations to run 10 to 100 times faster than the same calculations on a traditional CPU.
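To make this concrete, here is a minimal timing sketch (my own illustration, not from the session; it assumes PyTorch is installed and a CUDA-capable GPU is available) that compares the same large matrix multiplication on a CPU and a GPU:

```python
import time

import torch

# A large matrix multiplication: highly parallel work that favors the GPU.
n = 4096
a = torch.randn(n, n)
b = torch.randn(n, n)

start = time.perf_counter()
a @ b  # runs on the CPU
cpu_time = time.perf_counter() - start
print(f"CPU: {cpu_time:.3f} s")

if torch.cuda.is_available():  # assumption: a CUDA-capable GPU is present
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()  # wait for host-to-device copies to finish
    start = time.perf_counter()
    a_gpu @ b_gpu  # runs on the GPU
    torch.cuda.synchronize()  # GPU kernels are asynchronous; wait for completion
    gpu_time = time.perf_counter() - start
    print(f"GPU: {gpu_time:.3f} s ({cpu_time / gpu_time:.0f}x speedup)")
```

The exact numbers depend on the hardware, but parallel workloads like this are where the 10x to 100x figure comes from.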

FPGA: Super fast and reconfigurable. AI models can be deployed to the chip and run on it directly.

Regarding the comparison between GPU and FPGA, the speaker in this Microsoft Build session [2] used a "car repair shop" vs. "pit stop" analogy.

A GPU is suitable for handling large batches of data and is effective at parallel processing.


Conversely, when you do not process a lot of data at the same time, some of the resources sit idle. For example, a car repair shop can service 10 to 20 cars at a time; when only one car comes in, the other slots stand idle.
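The idle-slot effect can be measured directly. The sketch below (again my own illustration, assuming PyTorch and a CUDA GPU) times the per-item cost of a small model at batch size 1 versus batch size 64; once the idle "repair slots" are filled, the cost per item drops sharply:

```python
import time

import torch

if torch.cuda.is_available():  # assumption: a CUDA-capable GPU is present
    model = torch.nn.Linear(1024, 1024).cuda().eval()
    for batch_size in (1, 64):
        x = torch.randn(batch_size, 1024, device="cuda")
        torch.cuda.synchronize()  # start timing from a clean state
        start = time.perf_counter()
        with torch.no_grad():
            for _ in range(100):  # run 100 inference passes
                model(x)
        torch.cuda.synchronize()  # wait for all queued kernels to finish
        elapsed = time.perf_counter() - start
        print(f"batch={batch_size:>2}: {elapsed / (100 * batch_size) * 1e6:.1f} us per item")
```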



An FPGA is customized close to the hardware level, so it can run custom applications very fast. Yet it cannot process a lot of data at once. It is like a pit stop: after a car enters the station, the crew services that car quickly and then takes the next one.

You can choose the hardware according to the type of tasks and your requirements.


Frameworks


The framework is also a concern. Should we use PyTorch? TensorFlow? Another framework? We also worry about compatibility and performance between the platform and the framework, and the complexity increases if you need to run your framework on various platforms. Optimizing machine learning models for inference is difficult, since you need to tune both the model and the inference library to make the most of the hardware's capabilities.

In response to this problem, Microsoft and a community of partners created ONNX as an open standard for representing machine learning models. Models from many frameworks, including TensorFlow, PyTorch, SciKit-Learn, Keras, Chainer, MXNet, and MATLAB, can be exported or converted to the standard ONNX format. Once the models are in the ONNX format, they can run on a variety of platforms and devices. [1] As the speaker described [2], the conversion is like converting all kinds of files into PDF: the platform then only needs to optimize how it processes the PDF format. By using ONNX Runtime, you benefit from extensive production-level optimization, testing, and continuous improvement.
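As a concrete illustration (a minimal sketch, not the session's demo; it assumes the torch, onnx, and onnxruntime packages are installed), here is how a small PyTorch model can be exported to ONNX and then served with ONNX Runtime:

```python
import numpy as np
import torch
import onnxruntime as ort

# A toy model standing in for whatever you trained in your framework of choice.
model = torch.nn.Linear(4, 2).eval()

# Export to the standard ONNX format -- the "convert everything to PDF" step.
dummy_input = torch.randn(1, 4)
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["output"])

# Any ONNX-capable runtime can now serve the model; here, ONNX Runtime on CPU.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
outputs = session.run(["output"], {"input": np.random.randn(1, 4).astype(np.float32)})
print(outputs[0])
```

The same model.onnx file can then be loaded with a GPU or other hardware-specific execution provider, without retraining or touching the original framework code.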


Conclusion

Before you decide what kind of resource type to use to run your AI model, think through the following points in advance. First, what kind of AI model will you run? It may not need heavy computing power for training if you decide to use a pre-built model. Second, where will it run: on the cloud or on the edge? For example, if you need to react immediately on an edge device, or you cannot tolerate internet latency, you may consider running your model on the edge; the resources allocated to cloud and edge will differ by scenario. Last, what kind of hardware? That is highly related to the type of tasks being processed: use a CPU for general computing, a GPU for image processing or parallel computing, or an FPGA for fast, low-latency processing of individual requests. For the framework part, you can convert your model to ONNX to run on various platforms and devices and increase efficiency.

Reference:

[1] https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-onnx

[2] https://mybuild.techcommunity.microsoft.com/sessions/76979?source=sessions#top-anchor

[3] https://bixbit.io/en/blog/post/fpga-for-mining-what-trends-will-prevail-in-2019
