USENIX OpML '20 - Session 4 - Algorithms
Join us for the OpML '20 session on algorithms for operational machine learning, hosted in a channel on the USENIX OpML Slack workspace as an Ask-Me-Anything session with the authors. It takes place Friday, July 31, from 9:00am to 10:30am PDT. To join, simply sign up for the free Slack workspace above and head to the channel!
Why are we covering new algorithms in a Production ML conference? Simple - new ML techniques emerge daily, and in our field, new innovations make it into production reality in record time!
This session covers new algorithmic innovations, from scalable AutoML with Ray to real-time incremental learning, in a practical and production context. From LinkedIn, Intel, and the University of California, Merced, the talks cover how to improve ML scale, iteration, and speed; how to increase automation and reduce reliance on humans, and thereby deploy new models faster; how to make ML adaptive with incremental learning; and more!
RIANN: Real-time Incremental Learning with Approximate Nearest Neighbor on Mobile Devices
Jiawen Liu and Zhen Xie, University of California, Merced; Dimitrios Nikolopoulos, Virginia Tech; Dong Li, University of California, Merced
Approximate nearest neighbor (ANN) algorithms are the foundation for many applications on mobile devices, and real-time incremental learning with ANN on mobile devices is emerging. However, incremental learning with current ANN algorithms on mobile devices is difficult: because data is generated dynamically and incrementally on the device, it is hard to meet strict timing and recall requirements for indexing and search. Meeting the timing requirements is critical on mobile devices because of short user response times and limited battery life.
We introduce RIANN, an indexing and search system for graph-based ANN on mobile devices. By constructing the graph with dynamic construction properties, RIANN gains the flexibility needed to meet the timing and recall requirements of incremental learning. To select an optimal construction property, RIANN incorporates a statistical prediction model, and it further offers a novel analytical performance model to avoid runtime overhead and interaction with the mobile device. In our experiments, RIANN significantly outperforms the state-of-the-art ANN (2.42x speedup) on a Samsung S9 mobile phone without compromising search time or recall. Also, when incrementally indexing 100 batches of data, the state-of-the-art ANN satisfies the timing requirement for 55.33% of batches on average, while RIANN satisfies it for 96.67% with minimal impact on recall.
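To make the setting concrete, here is a minimal sketch of graph-based ANN with incremental insertion, the family of algorithms the talk builds on. This is not RIANN itself: the `GraphANN` class, its single construction property `M` (max links per node), and the plain greedy search are all simplified illustrations of how such an index grows one point at a time.

```python
import math

def dist(a, b):
    # Euclidean distance between two points (tuples of floats)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class GraphANN:
    """Toy graph-based ANN index: each point links to its M closest known points.
    M is one example of a "construction property" that trades indexing time,
    search time, and recall against each other."""

    def __init__(self, M=4):
        self.M = M          # max neighbors linked per inserted node
        self.points = []    # vector data
        self.edges = []     # adjacency lists (one list per point)

    def _greedy_search(self, query, start):
        # Walk the graph, always moving to the closest neighbor, until the
        # current node is closer to the query than all of its neighbors.
        current = start
        while True:
            best = min(self.edges[current] + [current],
                       key=lambda i: dist(self.points[i], query))
            if best == current:
                return current
            current = best

    def insert(self, point):
        # Incremental indexing: greedily find a nearby node, then link the
        # new point to the M closest candidates in that neighborhood.
        idx = len(self.points)
        self.points.append(point)
        if idx == 0:
            self.edges.append([])
            return
        near = self._greedy_search(point, start=0)
        candidates = set([near] + self.edges[near])
        links = sorted(candidates,
                       key=lambda i: dist(self.points[i], point))[:self.M]
        self.edges.append(links)
        for l in links:                      # make links bidirectional
            if idx not in self.edges[l]:
                self.edges[l].append(idx)

    def search(self, query):
        return self._greedy_search(query, start=0)

# Incrementally index ten 1-D points, then query the graph.
index = GraphANN(M=2)
for i in range(10):
    index.insert((float(i),))
nearest = index.search((4.4,))
```

Real systems like RIANN must additionally decide, per batch, which construction properties keep indexing within the timing budget without hurting recall; this sketch fixes them up front.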
Rise of the Machines: Removing the Human-in-the-Loop
Viral Gupta and Yunbo Ouyang, LinkedIn
Most large-scale online recommender systems, such as notification recommendation, newsfeed ranking, people recommendations, and job recommendations, must simultaneously optimize multiple utilities or metrics. Machine learning models, each trained to optimize a single utility, are combined through parameters to generate the final ranking function, and these combination parameters drive business metrics. Finding the right parameter values is typically done through online A/B experimentation, which can be incredibly complex and time-consuming, especially given the non-linear effects of these parameters on the metrics of interest. In this talk we will present how we built a generic solution to solve this problem at scale.
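The core setup can be sketched in a few lines. This is a hypothetical toy, not LinkedIn's system: the item scores, the guardrail threshold, and the single combination parameter `alpha` are invented for illustration, and the "offline sweep" stands in for whatever automated tuning replaces manual A/B exploration.

```python
# Hypothetical per-item outputs from two single-utility models:
# one predicts clicks, one predicts unsubscribes.
items = [
    {"p_click": 0.30, "p_unsub": 0.05},
    {"p_click": 0.10, "p_unsub": 0.00},
    {"p_click": 0.25, "p_unsub": 0.02},
]

def ranking_score(item, alpha):
    # Final ranking function: single-utility model outputs combined through
    # a tunable parameter alpha (the kind of knob usually tuned via A/B tests).
    return item["p_click"] - alpha * item["p_unsub"]

def expected_metrics(alpha):
    # Toy offline simulator: "send" only the top-ranked item and read off
    # the business metrics it would produce.
    top = max(items, key=lambda it: ranking_score(it, alpha))
    return top["p_click"], top["p_unsub"]

# Automated sweep over the combination parameter: maximize clicks subject
# to an unsubscribe guardrail, with no human picking values by hand.
best = None
for alpha in [0.0, 1.0, 2.0, 4.0, 8.0]:
    clicks, unsubs = expected_metrics(alpha)
    if unsubs <= 0.03 and (best is None or clicks > best[1]):
        best = (alpha, clicks)
```

Even in this toy, the unconstrained best item violates the guardrail, so the sweep must raise `alpha` until a compliant item ranks first; the talk addresses doing this kind of search efficiently when the parameter space is large and the metric response is non-linear.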
Cluster Serving: Distributed Model Inference using Big Data Streaming in Analytics Zoo
Jiaming Song, Dongjie Shi, QiYuan Gong, Lei Xia, and Jason Dai, Intel
As deep learning projects evolve from experimentation to production, there is increasing demand to deploy deep learning models for large-scale, real-time distributed inference. While many tools are available for relevant tasks (such as model optimization, serving, cluster scheduling, workflow management, etc.), it remains challenging for many deep learning engineers and scientists to develop and deploy distributed inference workflows that scale out to large clusters in a transparent fashion.
To address this challenge, we have developed Cluster Serving, an automated and distributed serving solution that supports a wide range of deep learning models (such as TensorFlow, PyTorch, Caffe, BigDL, and OpenVINO). It provides simple publish-subscribe (pub/sub) and REST APIs, through which users can easily send their inference requests to the input queue using simple Python or HTTP APIs. Cluster Serving will then automatically manage the scale-out and real-time model inference across a large cluster, using distributed Big Data streaming frameworks (such as Apache Spark Streaming and Apache Flink).
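The pub/sub serving pattern itself is easy to illustrate in miniature. The sketch below is not the Analytics Zoo Cluster Serving API: `fake_model`, `serving_worker`, and `predict` are invented stand-ins showing how clients publish requests to an input queue while a streaming worker drains it in micro-batches and runs one batched inference per drain, which is the pattern Cluster Serving implements at cluster scale on Spark Streaming or Flink.

```python
import queue
import threading

def fake_model(batch):
    # Stand-in for a real model: "predict" the sum of each input vector.
    return [sum(x) for x in batch]

input_queue = queue.Queue()

def serving_worker(batch_size=4):
    # Drain up to batch_size requests, run one batched inference, reply
    # on each request's private reply queue.
    while True:
        req = input_queue.get()
        if req is None:                  # shutdown signal
            return
        batch = [req]
        while len(batch) < batch_size and not input_queue.empty():
            nxt = input_queue.get()
            if nxt is None:
                input_queue.put(None)    # re-enqueue shutdown for outer loop
                break
            batch.append(nxt)
        preds = fake_model([payload for payload, _ in batch])
        for (_, reply_q), pred in zip(batch, preds):
            reply_q.put(pred)

def predict(payload):
    # Client-side "publish": enqueue the request and block on the reply.
    reply_q = queue.Queue()
    input_queue.put((payload, reply_q))
    return reply_q.get()

worker = threading.Thread(target=serving_worker, daemon=True)
worker.start()
result = predict([1.0, 2.0, 3.0])
input_queue.put(None)                    # stop the worker
```

In the real system the in-process queue is replaced by a durable pub/sub layer and the worker by a distributed streaming job, but the batching-behind-a-queue design choice is the same: it decouples request arrival from model execution and lets throughput scale with the cluster.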
In this talk, we will present the architecture design of Cluster Serving, and discuss the underlying design patterns and tradeoffs of deploying deep learning models on distributed Big Data streaming frameworks in production. In addition, we will share real-world experience and "war stories" from users who have adopted Cluster Serving to develop and deploy distributed inference workflows.
Scalable AutoML for Time Series Forecasting using Ray
Shengsheng Huang and Jason Dai, Intel
Time series forecasting is widely used in real-world applications, such as network quality analysis in telcos, log analysis for data center operations, and predictive maintenance for high-value equipment. Recently there has been a trend toward applying machine learning and deep learning methods to such problems, and there is evidence that they can outperform traditional methods (such as autoregression and exponential smoothing) in several well-known competitions and real-world use cases.
However, building machine learning applications for time series forecasting can be a laborious and knowledge-intensive process. To provide an easy-to-use time series forecasting toolkit, we have applied Automated Machine Learning (AutoML) to time series forecasting. The toolkit is built on top of Ray (a distributed framework for emerging AI applications, open-sourced by UC Berkeley RISELab), so as to automate the process of feature generation and selection, model selection, and hyper-parameter tuning in a distributed fashion. In this talk we will share how we built the AutoML toolkit for time series forecasting, as well as real-world experience and takeaways from early users.
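The inner loop of such a toolkit, stripped to one hyperparameter, looks like the sketch below. This is a single-process illustration under invented assumptions (a toy series, a naive moving-average forecaster, a random search over one `window` parameter), not the actual toolkit, which also searches features and model choices and runs trials in parallel across a Ray cluster.

```python
import random

# Toy series: linear trend plus a deterministic weekly bump.
series = [0.5 * t + (1 if t % 7 == 0 else 0) for t in range(60)]
train, valid = series[:48], series[48:]

def forecast(history, window):
    # Naive forecaster: predict the mean of the last `window` points.
    return sum(history[-window:]) / window

def validation_error(window):
    # Rolling one-step-ahead evaluation on the hold-out segment.
    history = list(train)
    err = 0.0
    for actual in valid:
        err += abs(forecast(history, window) - actual)
        history.append(actual)
    return err / len(valid)

# The AutoML loop: sample a hyperparameter, evaluate a trial, keep the best.
# In the real toolkit each trial is a remote task scheduled by Ray, so the
# search parallelizes across a cluster; here trials run sequentially.
random.seed(0)
best_window, best_err = None, float("inf")
for _ in range(20):
    window = random.randint(1, 24)
    err = validation_error(window)
    if err < best_err:
        best_window, best_err = window, err
```

Because every trial is independent, distributing this loop is mostly a scheduling problem, which is exactly what building on Ray buys: the same search logic scales from a laptop to a cluster without restructuring.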
We hope to see you at the session!
Joel Young and Nisha Talagala, USENIX OpML '20 Co-Chairs