snap.ml: making large scale machine learning accessible to individual data scientists and departments
Scott Soutter
HPC and AI executive responsible for global strategic programs. Former distributed AI product manager and product management leader.
For you data geeks: an announcement.
tl;dr: IBM figured out how to do with 4 servers in 1.5 minutes something that takes Google cloud systems about 90 servers and 70 minutes to accomplish.
For the past two years, as the product manager for PowerAI, I have been focused on making deep learning work better, which is a fantastic problem to solve and helps push applied AI into the mainstream. But the reality is that most organizations are not ready for deep learning and have been using very traditional machine learning techniques to act on data. Unlike deep learning, these ML approaches often scale poorly, plateau quickly in accuracy, and generally do not take advantage of the huge performance available through accelerated computing. This changes now.
With a new set of compute libraries (called snap.ml), IBM Research in Zurich has figured out how to solve three of the most commonly used machine learning functions (logistic regression, linear regression, and support vector machines) using accelerated systems. They will be coming to PowerAI as a technology preview in a couple of months.
IBM Research was able to take a workload that on the Google cloud runs on 60 machine learning nodes and 29 “feeder” nodes (preprocessing and serving data) and execute it on 4 systems: 178 CPUs down to 8 CPUs and 16 GPUs. We also took the compute time down substantially, from 70 minutes on the Google cloud to 91.5 seconds on our four-server cluster.
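For the data geeks who want to check the math, here is a rough back-of-the-envelope sketch in Python that uses only the figures quoted above. The variable names are mine, and the "server-time" ratio at the end is a naive illustration that ignores differences in hardware, cost, and energy per machine; it is not a benchmark result.

```python
# Figures quoted in this post (nothing here is measured by this script).
google_ml_nodes = 60        # machine learning nodes on the Google cloud
google_feeder_nodes = 29    # "feeder" nodes (preprocessing and serving data)
google_minutes = 70.0       # reported run time on the Google cloud
ibm_servers = 4             # systems in the IBM cluster
ibm_seconds = 91.5          # reported run time on the IBM cluster

# Total node count on the Google side.
google_nodes = google_ml_nodes + google_feeder_nodes

# Wall-clock speedup: convert minutes to seconds, then divide.
speedup = (google_minutes * 60) / ibm_seconds

# How many Google nodes each IBM server stands in for.
node_ratio = google_nodes / ibm_servers

# Naive server-time ratio (nodes x seconds). Assumption: every machine is
# counted as one unit, which ignores GPUs, CPU types, and power draw.
server_time_ratio = (google_nodes * google_minutes * 60) / (ibm_servers * ibm_seconds)

print(f"Google nodes:        {google_nodes}")
print(f"Wall-clock speedup:  {speedup:.1f}x")
print(f"Node reduction:      {node_ratio:.2f}x")
print(f"Server-time ratio:   {server_time_ratio:.0f}x (naive)")
```

The node count works out to 89, which is where the "about 90 servers" in the tl;dr comes from, and 91.5 seconds is the "1.5 minutes".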
From the standpoint of energy savings and compute efficiency, this is a great big deal: with what was literally tons of compute now available in about one quarter of a rack, IBM is making these machine learning functions accessible to individual data scientists and departments.
And we’re just getting started. Stay tuned as we learn more from our work and are able to speed up other common analytic and machine learning problems.
So, PowerAI gets bigger this year. I’m not taking my eye off the ball on deep learning, but I am very pleased to be able to add this new capability to the platform.
https://arxiv.org/abs/1803.06333