A Metaflow Serverless Story
Joining a modern ML framework and a Serverless Engine
Overview
Metaflow is an open-source framework for building and deploying data science projects. Its mission is to make it easier for data scientists to build production-ready machine learning workflows by providing high-level abstractions for common data science tasks such as data preparation, model training, and deployment.
"Everything you need to develop data science and ML apps"
The goal of Metaflow is to help data scientists focus on the actual data science tasks, rather than getting bogged down in the details of infrastructure, data management and deployment. This is achieved by providing a set of tools and abstractions that simplify the development process and enable data scientists to create complex workflows with ease.
Running complex ML projects requires compute capabilities, which are often challenging to configure and maintain when such workflows need to be submitted at production scale.
Metaflow has made it easy for data scientists and machine learning engineers to run ML workflows both locally and utilising cloud resources (along with rapidly shifting between the two!). However, cloud resources have their own quirks and subtleties that aren’t always ideal for data science and ML:
Is there still space to further improve compute resources assignment when running large and complex data science projects, such as lowering the latency for executing tasks in the cloud?
Serverless is here…to help
Serverless platforms are, on paper, a perfect candidate to have your cake and eat it too: code execution starts fast (vs @batch), and the underlying computation is fully managed and abstracted away (vs @kube).
As the wise man said, “in theory there is no difference between theory and practice - but in practice, there is”: when using a cloud vendor’s serverless solution, there are some tradeoffs that should be taken into account:
Overall, Metaflow is a powerful and flexible tool for managing the entire data science workflow, from data ingestion to model deployment. It has been designed with portability and flexibility in mind to minimise the above-mentioned issues and risks, but it is true that on the compute side it still needs to rely on specific cloud provider services (AWS and/or Azure), and these are normally the components that offer the least flexibility and customisation.
We saw a crazy opportunity: can open source serverless provide a new backbone for open source ML pipelines? With Nuvolaris, we believe it can.
Nuvolaris 101
Nuvolaris started with the idea of implementing a portable and open platform, based on the Apache OpenWhisk serverless engine, to simplify the process of building and deploying cloud-native applications.
Nuvolaris provides solutions and tools to solve some of the above issues:
While OpenWhisk, at the core of Nuvolaris, is usually associated with microservices, there is no principled reason to stop there. And that’s why, after we started using Metaflow, we asked ourselves:
"If a serverless platform is capable of executing functions, what about executing a Metaflow @step as a function?"
It turns out we can, and we will explain how in the following section.
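To make the idea concrete: an OpenWhisk Python action exposes a `main(args)` entry point that receives parameters as a dict and returns a dict. Below is a minimal sketch of how a single Metaflow step could be wrapped as such an action; the parameter names (`flow_file`, `step_name`, `run_id`, `task_id`) and the exact step command are our illustrative assumptions, not the prototype's actual internals:

```python
# Hypothetical OpenWhisk Python action that runs one Metaflow step.
# OpenWhisk invokes main(args) with the action parameters as a dict.
import subprocess
import sys

def main(args):
    # Illustrative command: Metaflow exposes an internal CLI of the form
    # `python flow.py step <name> --run-id ... --task-id ...`; the real
    # @nuvolaris prototype decides what exactly is passed to the action.
    cmd = [
        sys.executable, args["flow_file"],
        "step", args["step_name"],
        "--run-id", args["run_id"],
        "--task-id", args["task_id"],
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    # OpenWhisk actions must return a JSON-serialisable dict.
    return {"returncode": result.returncode, "stdout": result.stdout}
```

The action result then travels back through the OpenWhisk activation record, which is what the scheduler polls for.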
The @nuvolaris decorator
Our @nuvolaris prototype implements a step decorator that the Metaflow "scheduler" can use to deploy and execute a serverless function inside an OpenWhisk runtime, as an alternative to @batch and @kube for remote execution.
The user experience is as seamless as with any other Metaflow feature: it is sufficient to add the @nuvolaris decorator to a @step function in a flow! For example, the following small pipeline runs a step in parallel through Nuvolaris:
from metaflow import FlowSpec, step, nuvolaris

class ForeachFlow(FlowSpec):

    @step
    def start(self):
        self.titles = ['Stranger Things',
                       'House of Cards',
                       'Narcos',
                       'Suburra',
                       'Star Trek',
                       'Mission Impossible',
                       'Mission Impossible 2',
                       'Mission Impossible 3',
                       'Rogue']
        self.next(self.a, foreach='titles')

    @nuvolaris(namespace="nuvolaris", action="each", memory=256, timeout=120000)
    @step
    def a(self):
        self.title = '%s processed' % self.input
        self.next(self.join)

    @step
    def join(self, inputs):
        self.results = [input.title for input in inputs]
        self.next(self.end)

    @step
    def end(self):
        print('\n'.join(self.results))

if __name__ == '__main__':
    ForeachFlow()
When the above code runs, Metaflow launches the parallel executions of the @nuvolaris functions, deploying behind the scenes an “action” with the specified parameters in OpenWhisk, and starts polling for completion:
2023-03-21 19:40:01.386 [1679427598783956/start/1 (pid 579034)] Foreach yields 9 child steps.
...
2023-03-21 19:40:04.932 [1679427598783956/a/9 (pid 579231)] creating action each with memory=256 and timeout=120000
...
2023-03-21 19:40:05.448 [1679427598783956/a/9 (pid 579231)] checking completion of nuvolaris activation a731a8fd14924f95b1a8fd14927f9527
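The "checking completion" lines above hint at a polling loop over OpenWhisk activation records (the REST API exposes them at `GET /api/v1/namespaces/{ns}/activations/{id}`). The sketch below shows what such a loop could look like; the HTTP call is abstracted behind a `fetch_activation` callable (our assumption, not the prototype's actual code) so the loop can be shown without a live OpenWhisk deployment:

```python
# Illustrative polling loop over an OpenWhisk activation record.
import time

def wait_for_activation(activation_id, fetch_activation,
                        poll_interval=1.0, timeout=120.0):
    """Poll until the activation record appears, then return its result.

    `fetch_activation(activation_id)` should return the activation record
    (a dict) once the action has finished, or None while it is still running
    (OpenWhisk returns 404 for an activation that has not completed yet).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        record = fetch_activation(activation_id)
        if record is not None:
            # Completed activations carry the action's output under
            # response.result in the activation record.
            return record["response"]["result"]
        time.sleep(poll_interval)
    raise TimeoutError(f"activation {activation_id} did not complete in time")
```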
As a video is worth a thousand log traces, we captured the complete execution of the above example in a video!
While preliminary, this prototype already shows off a fantastic property of OpenWhisk: the engine keeps accepting requests and executing them, even when there are not enough resources for full parallel execution, without affecting the overall execution flow.
Summing up our experience, we believe that Nuvolaris is a good fit for Metaflow, offering out of the box desirable features that particularly fit ML scenarios:
Where we are now and what’s next
Currently the Nuvolaris Metaflow plugin supports OpenWhisk customisation parameters for namespace, action name, memory, and timeout; it is implemented as a fork of Metaflow 2.7.14 and executes the actions via a custom Python OpenWhisk runtime with the dependencies required by the demo.
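To make those four parameters concrete: they map naturally onto the standard OpenWhisk CLI, where `wsk action update` accepts `--memory` (in MB) and `--timeout` (in ms). The helper below is a hypothetical illustration of that mapping, not the plugin's actual internals (the `code_file` and `kind` values are assumptions):

```python
# Hypothetical helper: map @nuvolaris decorator parameters onto a
# `wsk action update` command line (--memory is MB, --timeout is ms).
def wsk_action_update_cmd(namespace, action, memory, timeout,
                          code_file="__main__.py", kind="python:3"):
    return [
        "wsk", "action", "update", f"/{namespace}/{action}", code_file,
        "--kind", kind,             # runtime kind, e.g. a custom Python runtime
        "--memory", str(memory),    # memory limit in MB
        "--timeout", str(timeout),  # activation timeout in milliseconds
    ]

# For the example flow above, this yields the equivalent of:
# wsk action update /nuvolaris/each __main__.py --kind python:3 \
#     --memory 256 --timeout 120000
```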
Of course, this is not the last word on serverless @step functions, but it is a very good first step. The next steps we are already thinking about include the following:
If you want to try our prototype yourself, clone the repo, check the video, and don’t be shy: get in touch with us about anything, including taking some of the above next steps together.