USENIX OpML '20 - Session 4 - Algorithms
Join us for the OpML '20 session on algorithms for operational machine learning, hosted in a channel on the USENIX OpML Slack workspace as an Ask-Me-Anything session with the authors. It takes place Friday, July 31, from 9:00am to 10:30am PDT. To join, simply sign up for the free Slack workspace above and head to the channel!
Why are we covering new algorithms in a Production ML conference? Simple - new ML techniques emerge daily, and in our field, new innovations make it into production reality in record time!
This session covers new algorithmic innovations, from scalable AutoML with Ray to real-time incremental learning, in a practical and production context. From LinkedIn, Intel, and the University of California, Merced, the talks cover how to improve ML scale, iteration, and speed; how to increase automation and reduce reliance on humans, and thereby deploy new models faster; how to make ML adaptive with incremental learning; and more!
RIANN: Real-time Incremental Learning with Approximate Nearest Neighbor on Mobile Devices
Jiawen Liu and Zhen Xie, University of California, Merced; Dimitrios Nikolopoulos, Virginia Tech; Dong Li, University of California, Merced
Approximate nearest neighbor (ANN) algorithms are the foundation for many applications on mobile devices, and real-time incremental learning with ANN on mobile devices is emerging. However, incremental learning with current ANN algorithms on mobile devices is difficult: because data is generated dynamically and incrementally on the device, it is hard to meet strict timing and recall requirements for indexing and search. Meeting the timing requirements is critical on mobile devices because of short user response times and limited battery life.
We introduce RIANN, an indexing and search system for graph-based ANN on mobile devices. By constructing the graph with dynamic construction properties, RIANN gains the flexibility needed to meet the timing and recall requirements of incremental learning. To select an optimal construction property, RIANN incorporates a statistical prediction model, and it further offers a novel analytical performance model to avoid runtime overhead and interaction with the mobile device. In our experiments, RIANN significantly outperforms the state-of-the-art ANN (2.42x speedup) on a Samsung S9 mobile phone without compromising search time or recall. Also, when incrementally indexing 100 batches of data, the state-of-the-art ANN satisfies the timing requirement for 55.33% of batches on average, while RIANN satisfies it for 96.67% with minimal impact on recall.
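To make the setting concrete, here is a minimal sketch of graph-based ANN with incremental insertion, the family of algorithms the talk builds on. This is not RIANN itself: the `GraphANN` class, its single construction property `M` (max links per node), and the plain greedy search are all simplified illustrations of how such an index grows one point at a time.

```python
import math

def dist(a, b):
    # Euclidean distance between two points (tuples of floats)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class GraphANN:
    """Toy graph-based ANN index: each point links to its M closest known points.
    M is one example of a "construction property" that trades indexing time,
    search time, and recall against each other."""

    def __init__(self, M=4):
        self.M = M          # max neighbors linked per inserted node
        self.points = []    # vector data
        self.edges = []     # adjacency lists (one list per point)

    def _greedy_search(self, query, start):
        # Walk the graph, always moving to the closest neighbor, until the
        # current node is closer to the query than all of its neighbors.
        current = start
        while True:
            best = min(self.edges[current] + [current],
                       key=lambda i: dist(self.points[i], query))
            if best == current:
                return current
            current = best

    def insert(self, point):
        # Incremental indexing: greedily find a nearby node, then link the
        # new point to the M closest candidates in that neighborhood.
        idx = len(self.points)
        self.points.append(point)
        if idx == 0:
            self.edges.append([])
            return
        near = self._greedy_search(point, start=0)
        candidates = set([near] + self.edges[near])
        links = sorted(candidates,
                       key=lambda i: dist(self.points[i], point))[:self.M]
        self.edges.append(links)
        for l in links:                      # make links bidirectional
            if idx not in self.edges[l]:
                self.edges[l].append(idx)

    def search(self, query):
        return self._greedy_search(query, start=0)

# Incrementally index ten 1-D points, then query the graph.
index = GraphANN(M=2)
for i in range(10):
    index.insert((float(i),))
nearest = index.search((4.4,))
```

Real systems like RIANN must additionally decide, per batch, which construction properties keep indexing within the timing budget without hurting recall; this sketch fixes them up front.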
Rise of the Machines: Removing the Human-in-the-Loop
Viral Gupta and Yunbo Ouyang, LinkedIn
Most large-scale online recommender systems, such as notification recommendation, newsfeed ranking, people recommendations, and job recommendations, must simultaneously optimize multiple utilities or metrics. Machine learning models, each trained to optimize a single utility, are combined through parameters to generate the final ranking function, and these combination parameters drive business metrics. Finding the right parameter values is typically done through online A/B experimentation, which can be incredibly complex and time-consuming, especially given the non-linear effects of these parameters on the metrics of interest. In this talk we will present how we built a generic solution to solve this problem at scale.
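The core setup can be sketched in a few lines. This is a hypothetical toy, not LinkedIn's system: the item scores, the guardrail threshold, and the single combination parameter `alpha` are invented for illustration, and the "offline sweep" stands in for whatever automated tuning replaces manual A/B exploration.

```python
# Hypothetical per-item outputs from two single-utility models:
# one predicts clicks, one predicts unsubscribes.
items = [
    {"p_click": 0.30, "p_unsub": 0.05},
    {"p_click": 0.10, "p_unsub": 0.00},
    {"p_click": 0.25, "p_unsub": 0.02},
]

def ranking_score(item, alpha):
    # Final ranking function: single-utility model outputs combined through
    # a tunable parameter alpha (the kind of knob usually tuned via A/B tests).
    return item["p_click"] - alpha * item["p_unsub"]

def expected_metrics(alpha):
    # Toy offline simulator: "send" only the top-ranked item and read off
    # the business metrics it would produce.
    top = max(items, key=lambda it: ranking_score(it, alpha))
    return top["p_click"], top["p_unsub"]

# Automated sweep over the combination parameter: maximize clicks subject
# to an unsubscribe guardrail, with no human picking values by hand.
best = None
for alpha in [0.0, 1.0, 2.0, 4.0, 8.0]:
    clicks, unsubs = expected_metrics(alpha)
    if unsubs <= 0.03 and (best is None or clicks > best[1]):
        best = (alpha, clicks)
```

Even in this toy, the unconstrained best item violates the guardrail, so the sweep must raise `alpha` until a compliant item ranks first; the talk addresses doing this kind of search efficiently when the parameter space is large and the metric response is non-linear.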
Cluster Serving: Distributed Model Inference using Big Data Streaming in Analytics Zoo
Jiaming Song, Dongjie Shi, QiYuan Gong, Lei Xia, and Jason Dai, Intel
As deep learning projects evolve from experimentation to production, there is increasing demand to deploy deep learning models for large-scale, real-time distributed inference. While many tools are available for relevant tasks (such as model optimization, serving, cluster scheduling, workflow management, etc.), it remains challenging for many deep learning engineers and scientists to develop and deploy distributed inference workflows that scale out to large clusters in a transparent fashion.
To address this challenge, we have developed Cluster Serving, an automated and distributed serving solution that supports a wide range of deep learning models (such as TensorFlow, PyTorch, Caffe, BigDL, and OpenVINO). It provides simple publish-subscribe (pub/sub) and REST APIs, through which users can easily send their inference requests to the input queue using simple Python or HTTP APIs. Cluster Serving will then automatically manage the scale-out and real-time model inference across a large cluster, using distributed Big Data streaming frameworks (such as Apache Spark Streaming and Apache Flink).
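The pub/sub serving pattern itself is easy to illustrate in miniature. The sketch below is not the Analytics Zoo Cluster Serving API: `fake_model`, `serving_worker`, and `predict` are invented stand-ins showing how clients publish requests to an input queue while a streaming worker drains it in micro-batches and runs one batched inference per drain, which is the pattern Cluster Serving implements at cluster scale on Spark Streaming or Flink.

```python
import queue
import threading

def fake_model(batch):
    # Stand-in for a real model: "predict" the sum of each input vector.
    return [sum(x) for x in batch]

input_queue = queue.Queue()

def serving_worker(batch_size=4):
    # Drain up to batch_size requests, run one batched inference, reply
    # on each request's private reply queue.
    while True:
        req = input_queue.get()
        if req is None:                  # shutdown signal
            return
        batch = [req]
        while len(batch) < batch_size and not input_queue.empty():
            nxt = input_queue.get()
            if nxt is None:
                input_queue.put(None)    # re-enqueue shutdown for outer loop
                break
            batch.append(nxt)
        preds = fake_model([payload for payload, _ in batch])
        for (_, reply_q), pred in zip(batch, preds):
            reply_q.put(pred)

def predict(payload):
    # Client-side "publish": enqueue the request and block on the reply.
    reply_q = queue.Queue()
    input_queue.put((payload, reply_q))
    return reply_q.get()

worker = threading.Thread(target=serving_worker, daemon=True)
worker.start()
result = predict([1.0, 2.0, 3.0])
input_queue.put(None)                    # stop the worker
```

In the real system the in-process queue is replaced by a durable pub/sub layer and the worker by a distributed streaming job, but the batching-behind-a-queue design choice is the same: it decouples request arrival from model execution and lets throughput scale with the cluster.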
In this talk, we will present the architecture design of Cluster Serving, and discuss the underlying design patterns and tradeoffs of deploying deep learning models on distributed Big Data streaming frameworks in production. In addition, we will share real-world experience and "war stories" from users who have adopted Cluster Serving to develop and deploy distributed inference workflows.
Scalable AutoML for Time Series Forecasting using Ray
Shengsheng Huang and Jason Dai, Intel
Time series forecasting is widely used in real-world applications, such as network quality analysis in telcos, log analysis for data center operations, and predictive maintenance for high-value equipment. Recently there has been a trend toward applying machine learning and deep learning methods to such problems, and there is evidence that they can outperform traditional methods (such as autoregression and exponential smoothing) in several well-known competitions and real-world use cases.
However, building machine learning applications for time series forecasting can be a laborious and knowledge-intensive process. To provide an easy-to-use time series forecasting toolkit, we have applied Automated Machine Learning (AutoML) to time series forecasting. The toolkit is built on top of Ray (a distributed framework for emerging AI applications, open-sourced by UC Berkeley RISELab), so as to automate the process of feature generation and selection, model selection, and hyper-parameter tuning in a distributed fashion. In this talk we will share how we built the AutoML toolkit for time series forecasting, as well as real-world experience and takeaways from early users.
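The inner loop of such a toolkit, stripped to one hyperparameter, looks like the sketch below. This is a single-process illustration under invented assumptions (a toy series, a naive moving-average forecaster, a random search over one `window` parameter), not the actual toolkit, which also searches features and model choices and runs trials in parallel across a Ray cluster.

```python
import random

# Toy series: linear trend plus a deterministic weekly bump.
series = [0.5 * t + (1 if t % 7 == 0 else 0) for t in range(60)]
train, valid = series[:48], series[48:]

def forecast(history, window):
    # Naive forecaster: predict the mean of the last `window` points.
    return sum(history[-window:]) / window

def validation_error(window):
    # Rolling one-step-ahead evaluation on the hold-out segment.
    history = list(train)
    err = 0.0
    for actual in valid:
        err += abs(forecast(history, window) - actual)
        history.append(actual)
    return err / len(valid)

# The AutoML loop: sample a hyperparameter, evaluate a trial, keep the best.
# In the real toolkit each trial is a remote task scheduled by Ray, so the
# search parallelizes across a cluster; here trials run sequentially.
random.seed(0)
best_window, best_err = None, float("inf")
for _ in range(20):
    window = random.randint(1, 24)
    err = validation_error(window)
    if err < best_err:
        best_window, best_err = window, err
```

Because every trial is independent, distributing this loop is mostly a scheduling problem, which is exactly what building on Ray buys: the same search logic scales from a laptop to a cluster without restructuring.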
We hope to see you at the session!
Joel Young and Nisha Talagala, USENIX OpML '20 Co-Chairs