Scaling AI on HPC

(HPC and AI, Part 2)

In the HPC DevCon keynote, we highlighted some of the opportunities created by running AI applications on HPC infrastructure (AI-on-HPC), as well as the equally promising results of augmenting HPC usages with AI techniques (HPC-on-AI). Let’s start by discussing AI-on-HPC.

AI Deep Learning (DL) applications excel at knowledge discovery: ingesting large volumes of mostly unstructured data, identifying structure and patterns, and classifying clusters and features within a large multi-dimensional space. All of this is done with moderate to no supervision, and the parallel processing inherent to Neural Network structures scales almost without limit when coupled with the bandwidth provided by HPC.
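As a loose illustration (not from the keynote), the sketch below learns compact features from unlabeled data with a small autoencoder and then groups the learned embeddings into clusters. Every dimension, layer size, and hyperparameter here is a hypothetical placeholder.

```python
# Minimal sketch of unsupervised pattern discovery: no labels are used anywhere.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

# Placeholder for a large, unlabeled, high-dimensional dataset.
data = torch.randn(10_000, 512)

encoder = nn.Sequential(nn.Linear(512, 64), nn.ReLU())   # compress to 64-d features
decoder = nn.Linear(64, 512)                             # reconstruct the input
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3
)

for epoch in range(10):                                   # reconstruction loss only
    recon = decoder(encoder(data))
    loss = nn.functional.mse_loss(recon, data)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Cluster the learned low-dimensional features to expose structure in the data.
with torch.no_grad():
    features = encoder(data)
clusters = KMeans(n_clusters=8, n_init=10).fit_predict(features.numpy())
```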

The underlying infrastructure of HPC is built for aggregation and scale. HPC delivers the highest levels of system compute performance, massive memory pools, and an optimized communication fabric with best-in-class cross-node bandwidth and throughput. These crucial capabilities allow Deep Learning to scale and solve the largest and most complex challenges.
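To make the bandwidth point concrete, here is a minimal sketch of the communication step at the heart of data-parallel DL training: on every iteration, each node's gradients are averaged across all nodes with an allreduce, and the interconnect determines how quickly that exchange completes. The buffer size and MPI setup are illustrative assumptions, not details from the keynote.

```python
# Illustrative allreduce of gradients across nodes; run with e.g. `mpirun -np 4 python ...`
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Placeholder: each rank computes gradients on its own shard of the data.
local_grads = np.random.randn(25_000_000).astype(np.float32)   # ~100 MB per iteration

# Sum the gradients from all ranks, then divide to get the global average.
global_grads = np.empty_like(local_grads)
comm.Allreduce(local_grads, global_grads, op=MPI.SUM)
global_grads /= size

# Every rank now applies the same averaged update, keeping its model replica in sync.
```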

An example of the DL scalability achievable with HPC infrastructure is the work done by Prabhat and team at the US Department of Energy Office of Science and Berkeley Lab, in collaboration with Intel Labs. The team put together a 15-PetaFLOP Deep Learning system for solving scientific pattern classification problems. The system scales training of a single model to ~9,600 Intel® Xeon Phi™ processor-based nodes on the Cori supercomputer to effectively extract weather patterns from a 15 TB climate dataset [1]. Their results demonstrate the advantages of optimizing and scaling DL structures onto many-core HPC systems.
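For context, a generic synchronous data-parallel training loop is sketched below using PyTorch's DistributedDataParallel: data is sharded across ranks and gradients are averaged every step. The Cori work used its own optimized software stack and a far larger model and dataset, so this is only an illustration of the scaling pattern, with placeholder shapes and hyperparameters.

```python
# Generic data-parallel training sketch; typically launched with torchrun or mpirun.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

dist.init_process_group(backend="gloo")   # NCCL or MPI backends are also common on HPC

# Placeholder dataset and model standing in for the climate data and network.
dataset = TensorDataset(torch.randn(100_000, 256), torch.randint(0, 4, (100_000,)))
sampler = DistributedSampler(dataset)      # each rank sees a distinct shard of the data
loader = DataLoader(dataset, batch_size=256, sampler=sampler)

model = DDP(torch.nn.Linear(256, 4))       # DDP averages gradients across ranks
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(5):
    sampler.set_epoch(epoch)               # reshuffle the shards each epoch
    for x, y in loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()    # backward triggers the cross-node allreduce
        optimizer.step()
```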

Another major benefit of using HPC infrastructure for Deep Learning is the greatly improved response time for DL training iterations. Developing a DL network solution is an iterative process that involves data scientists or researchers and compute-intensive experimentation. The ability to explore, examine, and optimize the network quickly materially shortens the time to train models and can contribute to higher-quality results.

The next post will discuss the great potential of applying Deep Learning capabilities and techniques to significantly enhance key HPC usages (HPC-on-AI).

Thanks for your interest,

Gadi

You can watch the portion of the HPC DevCon keynote relevant to this article here.


1. Kurth, Thorsten, et al.: "Deep Learning at 15PF: Supervised and Semi-Supervised Classification for Scientific Data," Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '17), Article No. 7. https://dl.acm.org/citation.cfm?doid=3126908.3126916


 
