USENIX OpML '20 - Session 6 - Applications and Experiences
Monterey California - Joel Young

Join us for the OpML '20 session on operational machine learning from the point of view of practitioners solving real problems. The Ask-Me-Anything session with the authors is hosted on a channel in the USENIX OpML Slack workspace on Wednesday, August 5, from 9:00 am to 10:30 am PDT. To take part, just join the free Slack workspace and go to the channel!

Operational ML is a wide-ranging space where many technologies, platforms, and approaches are available. Some challenges cannot be seen until you actually put ML into production. The solutions of early adopters are great design patterns for others to follow and learn from!

This session contains four such examples of ML Ops in real life, with talks from Mercari, Google, VMware, and NVIDIA. You will learn how real-world ML Ops works at scale in applications ranging from e-commerce to self-driving cars, along with the experiences, best practices, and solutions behind them!

Auto Content Moderation in C2C e-Commerce

Shunya Ueta, Suganprabu Nagaraja, and Mizuki Sango, Mercari, Inc.

Consumer-to-consumer (C2C) e-Commerce is a large and growing industry with millions of monthly active users. In this paper, we propose auto content moderation for C2C e-Commerce to moderate items using Machine Learning (ML). We will also discuss practical knowledge gained from our auto content moderation system. The system has been deployed to production at Mercari since late 2017 and has significantly reduced the operational cost of detecting items that violate our policies. This system has increased coverage by 554.8% over a rule-based approach.

Inside NVIDIA’s AI Infrastructure for Self-driving Cars

Clement Farabet, NVIDIA

We'll discuss Project MagLev, NVIDIA's internal end-to-end AI platform for developing its self-driving car software, DRIVE. We'll explore the platform that supports continuous data ingest from multiple cars producing terabytes of data per hour. We'll also cover how the platform enables autonomous AI designers to iterate training of new neural network designs across thousands of GPU systems and validate the behavior of these designs over multi-petabyte-scale data sets. We will talk about our overall architecture for everything from data center deployment to AI pipeline automation, as well as large-scale AI dataset management, AI training, and testing.

Challenges and Experiences with MLOps for Performance Diagnostics in Hybrid-Cloud Enterprise Software Deployments

Amitabha Banerjee, Chien-Chia Chen, Chien-Chun Hung, Xiaobo Huang, Yifang Wang, and Razvan Chevesaran, VMware Inc.

This paper presents how VMware addressed the following challenges in operationalizing our ML-based performance diagnostics solution in enterprise hybrid-cloud environments: data governance, model serving and deployment, dealing with system performance drifts, selecting model features, centralizing the model training pipeline, setting the appropriate alarm threshold, and explainability. We also share the lessons and experiences we gained over the past four years in deploying ML operations at scale for enterprise customers.

Automating Operations with ML

Todd Underwood and Steven Ross, Google

Engineers have been attracted to the idea of using Machine Learning to control their applications and infrastructure. Unfortunately, the majority of proposed uses of ML for production engineering are unsuited for the stated purpose. They generally fail to account for several structural limitations of the proposed application, including error rate, cost versus failure, and, most generally, an insufficient number of labeled examples.
We will review the commonly proposed applications of Machine Learning to production control, including anomaly detection, monitoring/alerting, capacity prediction, security, and resource scaling. For each, we will draw on experience to demonstrate the limitations of ML modeling techniques. We will identify the one application with the best results.
We will end with specific recommendations for how organizations can get ready to take advantage of ML for their production operations in the future.

We hope to see you at the session!

Joel Young and Nisha Talagala, USENIX OpML '20 Co-Chairs
