USENIX OpML '20 - Session 5 - Model Deployment Strategies
Join us for the OpML '20 session on model deployment strategies for operational machine learning: an Ask-Me-Anything session with the authors, hosted in a channel of the USENIX OpML Slack workspace. It takes place Tuesday, August 4, from 9:00am to 10:30am PDT. To participate, join the free Slack workspace linked above and go to the session channel!
It doesn't matter how good your data science and machine learning engineering teams are if you can't run your models in production! Whether you're working with personalized customer data, real-time sensing, or search and ranking, rapidly changing features and combinatorial complexity often rule out just computing everything offline.
In this session, we have presenters from three top-tier internet companies (Intuit, Netflix, and Adobe), the United States Air Force, and Clarkson University, all sharing how they run and manage models in production. Topics include techniques for directly running PyTorch models as RESTful endpoints, trade-offs between model execution strategies and intermediate formats such as ONNX, an open source system enabling data scientists and others to bring their models to production without deep systems experience, and techniques for managing and running production models at scale.
FlexServe: Deployment of PyTorch Models as Flexible REST Endpoints
Edward Verenich, Clarkson University; Alvaro Velasquez, Air Force Research Laboratory; M. G. Sarwar Murshed and Faraz Hussain, Clarkson University
The integration of artificial intelligence capabilities into modern software systems is increasingly being simplified through the use of cloud-based machine learning services and representational state transfer (REST) architecture design. However, insufficient information regarding underlying model provenance and a lack of control over model evolution impede wider adoption of these services in operational environments with strict security requirements. Furthermore, although tools such as TensorFlow Serving allow models to be deployed as RESTful endpoints, they require the error-prone process of converting PyTorch models into the static computational graphs needed by TensorFlow. To enable rapid deployment of PyTorch models without intermediate transformations, we have developed FlexServe, a simple library for deploying multi-model ensembles with flexible batching.
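FlexServe's own interface isn't reproduced here, but a minimal sketch illustrates the underlying pattern the abstract describes: a native PyTorch model served directly as a REST endpoint, with no conversion to a static graph. The model choice, route, and payload format below are placeholders, assuming Flask, torch, and torchvision are installed:

```python
# Not FlexServe itself -- a minimal illustration of the pattern it automates:
# a native PyTorch model exposed as a REST endpoint, with no intermediate
# graph conversion. Model, route, and payload shape are arbitrary choices.
import torch
import torchvision.models as models
from flask import Flask, jsonify, request

app = Flask(__name__)
model = models.resnet18(pretrained=True)
model.eval()  # inference mode; no gradients needed

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body {"inputs": ...} with a nested list of shape [N, 3, 224, 224].
    batch = torch.tensor(request.get_json()["inputs"], dtype=torch.float32)
    with torch.no_grad():
        logits = model(batch)
    return jsonify({"classes": logits.argmax(dim=1).tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```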
Managing ML Models @ Scale - Intuit’s ML Platform
Srivathsan Canchi and Tobias Wenzel, Intuit Inc.
At Intuit, machine learning models are derived from huge, sensitive data sets that are continuously evolving, which in turn requires continuous model training and tuning with a high level of security and compliance. Intuit’s Machine Learning Platform provides model lifecycle management capabilities that are scalable and secure, built on GitOps, SageMaker, Kubernetes, and Argo Workflows.
In this talk, we’ll go over the model management problem statement at Intuit, contrast data science/MLE needs with Intuit’s enterprise needs, and introduce our model management interface and self-serve capabilities. The talk covers aspects of our platform such as feature management and processing, bill-backs, collaboration, and the separation of operational concerns between platform and model. These capabilities have enabled model publishing velocity increases of over 200%, and this talk will illustrate how we got there.
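As a rough illustration only, not Intuit's actual platform code, the SageMaker piece of such a lifecycle can be driven programmatically, which is what makes GitOps-style automation (e.g., an Argo Workflows step launched from a Git change) possible. All names, ARNs, and S3 paths below are placeholders:

```python
# Hypothetical sketch only -- not Intuit's platform code. It shows how an
# automated workflow step could launch a SageMaker training job via the
# AWS API. Every name, ARN, and S3 path here is a placeholder.
import boto3

sm = boto3.client("sagemaker")

sm.create_training_job(
    TrainingJobName="example-model-2020-08-04",
    AlgorithmSpecification={
        "TrainingImage": "123456789012.dkr.ecr.us-west-2.amazonaws.com/train:latest",
        "TrainingInputMode": "File",
    },
    RoleArn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    InputDataConfig=[{
        "ChannelName": "train",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://example-bucket/train/",
            "S3DataDistributionType": "FullyReplicated",
        }},
    }],
    OutputDataConfig={"S3OutputPath": "s3://example-bucket/models/"},
    ResourceConfig={"InstanceType": "ml.m5.xlarge",
                    "InstanceCount": 1,
                    "VolumeSizeInGB": 50},
    StoppingCondition={"MaxRuntimeInSeconds": 3600},
)
```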
Edge Inference on Unknown Models at Adobe Target
Georgiana Copil, Iulian Radu, and Akash Maharaj, Adobe Systems
Customer’s data scientist: “I know my business much better than you do; just give me the data and I’ll create the model you should run.” Turning this into reality in our production systems comes with many challenges, which we discuss in this talk.
In today’s world, more and more companies are building their own data science/ML departments. To run their custom models on different systems, the models must be converted to other frameworks or to a format defined by a representation standard for machine learning models (e.g., ONNX). In this talk we discuss challenges and approaches to using such models in real-time, low-latency systems. We cover the limitations of existing frameworks, scoring runtimes, and model representations, and the existing solutions for overcoming them. We then discuss how these methods can be used today to build a solution that provides real-time scoring for high-throughput workloads.
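As a hedged sketch of that interchange path (the model, file name, and shapes below are placeholders, assuming torch, torchvision, and onnxruntime are installed): a model is exported once to ONNX, then scored by a lightweight runtime that is independent of the training framework:

```python
# Sketch of the ONNX interchange path: export a trained PyTorch model once,
# then score it with onnxruntime. Model and file name are placeholders.
import numpy as np
import torch
import torchvision.models as models
import onnxruntime as ort

# One-time export from the training framework to the ONNX format.
model = models.resnet18(pretrained=True).eval()
dummy = torch.randn(1, 3, 224, 224)  # example input that fixes tensor shapes
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["input"], output_names=["logits"],
                  dynamic_axes={"input": {0: "batch"}})  # allow variable batch size

# Scoring side: a lightweight runtime with no dependency on PyTorch.
session = ort.InferenceSession("model.onnx")
batch = np.random.rand(4, 3, 224, 224).astype(np.float32)
outputs = session.run(["logits"], {"input": batch})
print(outputs[0].shape)  # (4, 1000)
```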
More Data Science, Less Engineering: A Netflix Original
Savin Goyal, Netflix
Data science usage at Netflix goes well beyond our eponymous recommendation systems. It touches almost all aspects of our business, from optimizing content delivery to making our infrastructure more resilient to failures and beyond. Our unique culture affords our data scientists extraordinary freedom of choice in ML tools and libraries, which results in an ever-expanding set of interesting problem statements and a diverse set of ML approaches to tackle them. At the same time, our data scientists are expected to build, deploy, and operate complex ML workflows autonomously, without needing significant experience in systems or data engineering.
In this talk, we discuss the infrastructure available to our data scientists, focused on providing an improved development and deployment experience for ML workflows. We focus on Metaflow (now open source at metaflow.org), our ML framework, which offers delightful abstractions for managing the model lifecycle end to end, and on how our culture and focus on human-centric design affect our data scientists’ velocity.
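A minimal Metaflow flow (the steps below are placeholders, not Netflix production code) shows the shape of those abstractions: a workflow is an ordinary Python class, each @step runs as an isolated task, and artifacts assigned to self are versioned and persisted automatically:

```python
# A minimal Metaflow flow (metaflow.org). The steps are placeholders that
# stand in for real data loading and training.
from metaflow import FlowSpec, step

class TrainingFlow(FlowSpec):

    @step
    def start(self):
        # Artifacts assigned to self are snapshotted and versioned by Metaflow.
        self.data = list(range(100))
        self.next(self.train)

    @step
    def train(self):
        # Stand-in for actual model training.
        self.model = sum(self.data) / len(self.data)
        self.next(self.end)

    @step
    def end(self):
        print(f"trained model artifact: {self.model}")

if __name__ == "__main__":
    TrainingFlow()
```

Running `python training_flow.py run` executes the flow locally; the same code can be scaled out to remote compute (e.g., via Metaflow's @batch decorator) without restructuring.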
We hope to see you at the session!
Joel Young and Nisha Talagala, USENIX OpML '20 Co-Chairs