Run your AI model fast in a cost-effective way

When you consider which resource types to run your AI model on, you can think about the following points. This article is a summary of Microsoft Build session BRK3013.

What kind? Where? Hardware? Which framework?

  1. What kind of model? Is it pre-built or custom?
  2. Where will the model run? On the cloud or on the edge?
  3. Which hardware? CPU, GPU, or FPGA?
  4. Which framework? TensorFlow / PyTorch / SciKit-Learn / Keras / Chainer / MXNet?


Hardware

Different hardware types come with different considerations.

The illustration in [3] captures the comparison well; in short:


CPU: general-purpose computing.

GPU: Handles parallel processing of large amounts of data. GPUs are especially useful for analytics, deep learning, and machine learning algorithms, and allow some calculations to run 10 to 100 times faster than the same calculations on a traditional CPU.
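To make this concrete, here is a minimal timing sketch (my own illustration, not from the session; it assumes PyTorch is installed and a CUDA-capable GPU is available) that compares the same large matrix multiplication on a CPU and a GPU:

```python
import time

import torch

# A large matrix multiplication: highly parallel work that favors the GPU.
n = 4096
a = torch.randn(n, n)
b = torch.randn(n, n)

start = time.perf_counter()
a @ b  # runs on the CPU
cpu_time = time.perf_counter() - start
print(f"CPU: {cpu_time:.3f} s")

if torch.cuda.is_available():  # assumption: a CUDA-capable GPU is present
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()  # wait for host-to-device copies to finish
    start = time.perf_counter()
    a_gpu @ b_gpu  # runs on the GPU
    torch.cuda.synchronize()  # GPU kernels are asynchronous; wait for completion
    gpu_time = time.perf_counter() - start
    print(f"GPU: {gpu_time:.3f} s ({cpu_time / gpu_time:.0f}x speedup)")
```

The exact numbers depend on the hardware, but parallel workloads like this are where the 10x to 100x figure comes from.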

FPGA: Super fast and reconfigurable. AI models can be deployed to the chip and run on it directly.

Regarding the comparison between GPU and FPGA, the speaker in this Microsoft Build session [2] used a "car repair shop" vs. "pit stop" analogy.

A GPU is suitable for handling large batches of data and is effective at parallel processing.


Conversely, when you do not process a lot of data at the same time, some of the resources sit idle. For example, a car repair shop can service 10 to 20 cars at a time; when only one car comes in, the other slots stand idle.
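The idle-slot effect can be measured directly. The sketch below (again my own illustration, assuming PyTorch and a CUDA GPU) times the per-item cost of a small model at batch size 1 versus batch size 64; once the idle "repair slots" are filled, the cost per item drops sharply:

```python
import time

import torch

if torch.cuda.is_available():  # assumption: a CUDA-capable GPU is present
    model = torch.nn.Linear(1024, 1024).cuda().eval()
    for batch_size in (1, 64):
        x = torch.randn(batch_size, 1024, device="cuda")
        torch.cuda.synchronize()  # start timing from a clean state
        start = time.perf_counter()
        with torch.no_grad():
            for _ in range(100):  # run 100 inference passes
                model(x)
        torch.cuda.synchronize()  # wait for all queued kernels to finish
        elapsed = time.perf_counter() - start
        print(f"batch={batch_size:>2}: {elapsed / (100 * batch_size) * 1e6:.1f} us per item")
```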



An FPGA is customized close to the hardware level, so it can run custom applications very fast. Yet it cannot process a lot of data at once. It is like a pit stop: after a car enters the station, the crew services that car quickly and then takes the next one.

You can choose the hardware according to the type of tasks and your requirements.


Frameworks


The framework is also a concern. Should we use PyTorch? TensorFlow? Another framework? We also worry about compatibility and performance between the platform and the framework, and the complexity increases if you need to run your framework on various platforms. Optimizing machine learning models for inference is difficult, since you need to tune both the model and the inference library to make the most of the hardware's capabilities.

In response to this problem, Microsoft and a community of partners created ONNX as an open standard for representing machine learning models. Models from many frameworks, including TensorFlow, PyTorch, SciKit-Learn, Keras, Chainer, MXNet, and MATLAB, can be exported or converted to the standard ONNX format. Once the models are in the ONNX format, they can run on a variety of platforms and devices. [1] As the speaker described [2], the conversion is like converting all kinds of files into PDF: the platform then only needs to optimize how it processes the PDF format. By using ONNX Runtime, you benefit from extensive production-level optimization, testing, and continuous improvement.
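As a concrete illustration (a minimal sketch, not the session's demo; it assumes the torch, onnx, and onnxruntime packages are installed), here is how a small PyTorch model can be exported to ONNX and then served with ONNX Runtime:

```python
import numpy as np
import torch
import onnxruntime as ort

# A toy model standing in for whatever you trained in your framework of choice.
model = torch.nn.Linear(4, 2).eval()

# Export to the standard ONNX format -- the "convert everything to PDF" step.
dummy_input = torch.randn(1, 4)
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["output"])

# Any ONNX-capable runtime can now serve the model; here, ONNX Runtime on CPU.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
outputs = session.run(["output"], {"input": np.random.randn(1, 4).astype(np.float32)})
print(outputs[0])
```

The same model.onnx file can then be loaded with a GPU or other hardware-specific execution provider, without retraining or touching the original framework code.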


Conclusion

Before you decide what kind of resource type to use to run your AI model, think through the following points in advance. First, what kind of AI model will you run? It may not need heavy computing power for training if you decide to use a pre-built model. Second, where will it run: on the cloud or on the edge? For example, if you need to react immediately on an edge device, or you cannot tolerate internet latency, you may consider running your model on the edge; the resources allocated to cloud and edge will differ by scenario. Last, what kind of hardware? That is highly related to the type of tasks being processed: use a CPU for general computing, a GPU for image processing or parallel computing, or an FPGA for fast, low-latency processing of individual requests. For the framework part, you can convert your model to ONNX to run on various platforms and devices and increase efficiency.

Reference:

[1] https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-onnx

[2] https://mybuild.techcommunity.microsoft.com/sessions/76979?source=sessions#top-anchor

[3] https://bixbit.io/en/blog/post/fpga-for-mining-what-trends-will-prevail-in-2019
