Serving deep learning models with Rust
tl;dr: Anubhab Bandyopadhyay and I have been experimenting with Rust for serving the inference pipelines of machine learning models, using tract-core, an ONNX runtime. It lets us run model inference in an optimised, carefully controlled way, while the training and experimentation side of deep learning remains in Python.
By the way, inference is just a fancy word for "getting predictions from" a model. The two-language problem of AI/ML is well known: most of the interesting, cutting-edge work happens in Python. Whatever your opinion of the language (I love it, just for the record), you can't avoid it. Pick any SOTA-beating research paper in the deep learning domain from the last five years, and the accompanying code will almost certainly be in Python, specifically PyTorch (sorry, TensorFlow).
The serving or app side where these models are consumed is an open world, though. How to serve an ML model is often dictated by a variety of factors: the ML knowledge of the engineers working on the app side, the amount of existing code already written in another language, the hardware, latency, and power-consumption requirements, and so on.
For a routine requirement, you can very well serve the model directly from Python using Flask or FastAPI. In my career, I have also seen attempts at rewriting the model algorithm in a target language like Java; heck, I even know of an effort in PHP. It is one thing to code up a regression model or a simple one- or two-layer neural network in your favorite language, but take my word, from a decade of experience with this: don't do it. Thankfully, deep learning models are sufficiently complicated that no one attempts these rewrites anymore, saving me the agony.
But if you squint hard enough, all ML models are a combination of lots of data (weights) flowing through a small set of computational primitives (see the toy sketch after this paragraph). There were solutions such as PMML, which saved ML models in an XML format so that a library in the target language could carry out the same computation. PMML was clumsy and never became mainstream (IMO!). In the last four or five years, the Open Neural Network Exchange (ONNX) has emerged as a worthy solution for the deep learning world. Created by the PyTorch team at Facebook, it is now a Linux Foundation project.
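To make the "weights flowing through primitives" point concrete, here is a toy, dependency-free Rust sketch of a single dense layer's forward pass. The numbers are made up; a real network is the same idea repeated over far bigger arrays and a richer set of primitives.

```rust
// Toy illustration: a layer's "prediction" is just stored weights
// flowing through fixed computational primitives (multiply-add + ReLU).
fn forward(weights: &[[f32; 2]; 2], bias: &[f32; 2], input: &[f32; 2]) -> [f32; 2] {
    let mut out = [0.0f32; 2];
    for i in 0..2 {
        let mut acc = bias[i]; // start from the bias term
        for j in 0..2 {
            acc += weights[i][j] * input[j]; // weighted sum of the inputs
        }
        out[i] = acc.max(0.0); // ReLU activation
    }
    out
}

fn main() {
    // Arbitrary numbers standing in for trained parameters.
    let w = [[0.5, -1.0], [2.0, 0.25]];
    let b = [0.1, -0.2];
    println!("{:?}", forward(&w, &b, &[1.0, 3.0])); // prints [0.0, 2.55]
}
```

A format like ONNX, in essence, serializes exactly these two ingredients: the arrays of numbers and the graph of primitives they flow through.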
At xAmbit AI, we train models in PyTorch or TensorFlow and export them to the ONNX format. On the inference side, a Rust library (Sonos' tract) carries out the inference; a sketch of what that looks like follows below. Rust might seem like an odd choice, so a few words of explanation. Once you have the weights and the network architecture from ONNX, the prediction step of a neural network is mostly a deterministic computation, and you want it to run as quickly and efficiently as possible. The combination of Rust and ONNX serves us well here. Rust also forces us to think deeply about the shape and format of the data at the model's inputs and outputs. I'm not going to lie: for someone like me, who grew up on freewheeling, dynamically typed languages like Python, R, and the Lisps, this process is excruciatingly painful at times. But the slowing down and attention to detail pay off in reduced errors and lower operational costs.
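To give a flavour of the inference side, here is a minimal sketch along the lines of tract's published examples. The model path, the input index, and the 1x3x224x224 f32 input shape are hypothetical placeholders; you would substitute your own model's facts (and note that tract's API has shifted slightly across versions).

```rust
use tract_onnx::prelude::*;

fn main() -> TractResult<()> {
    let model = tract_onnx::onnx()
        // Load the ONNX graph (hypothetical path).
        .model_for_path("model.onnx")?
        // Pin down the input's type and shape up front: this is where
        // the "think about your IO" discipline shows up.
        .with_input_fact(0, InferenceFact::dt_shape(f32::datum_type(), tvec!(1, 3, 224, 224)))?
        // Optimize the graph and turn it into a runnable plan.
        .into_optimized()?
        .into_runnable()?;

    // Dummy all-zeros input of the declared shape; in production this
    // would be your preprocessed image or feature tensor.
    let input: Tensor = tract_ndarray::Array4::<f32>::zeros((1, 3, 224, 224)).into();

    // The deterministic forward pass.
    let outputs = model.run(tvec!(input.into()))?;
    let scores = outputs[0].to_array_view::<f32>()?;
    println!("first output value: {:?}", scores.iter().next());
    Ok(())
}
```

That with_input_fact call is the typed IO contract in code form: the graph cannot be fully optimized until the input's datum type and shape are known, which is exactly the slowing down I mentioned above.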
One long-term motive for exploring Rust is also the embedded world. We believe that the most interesting things in deep learning will happen (or are already happening) at the intersection of software and hardware, on the edge: mobile devices or microcontrollers. This excursion helps us prepare for that future.
(For the more informed: yes, Sonos' tract does not support GPUs; its focus is on restricted hardware devices.)