The future of AI Compute Servers? The answer is Blowin' In The Wind.

Commodity compute server buyers, beware the winds of change!

Read about server trends today, and you will find that public cloud is on the rise, and that hyperscalers and manufacturers of "white box" commodity servers - like Super Micro, Huawei, and Inspur - seem to be gaining traction.

The uptick in white box commodity servers follows the trend towards "good enough". It has long been acknowledged that x86-based commodity servers lack the scale, performance, reliability and security of mainframes and RISC-based systems, but the prevailing belief has been that, for most workloads, they are "good enough".

Cue the winds of change

A new trend is upon us. Researchers are now pointing out that servers used by hyperscalers, and x86-based servers in general, may not have the compute architecture needed to support AI workloads. This is especially the case for Deep Neural Networks, image processing and natural language processing, which require high-bandwidth interconnects between CPU and GPU memory because terabytes of information are used during training and inference of AI models.

Gartner’s ‘Market Guide for Compute Platforms 2018’ specifically recommends that IT leaders build on-premises IT infrastructures with support for business-critical artificial intelligence and in-memory applications, by including servers that can exploit technologies such as accelerators and persistent memory. Another Gartner research document, ‘Market Guide for Machine Learning infrastructure 2018’, states that due to factors including total cost of ownership (TCO), data gravity, ease of use and a lack of data scientists across end users, a majority of organizations leverage on-premises ecosystems for building machine learning models.

 Read my blog, ‘Looking under the hood of your AI infrastructure’ for more details. It talks about how x86 servers and the IBM Power AC922 were used for model training and evaluation, highlighting the significant difference in performance between them.

Jim McGregor, Principal Analyst at TIRIAS Research, writes in a Forbes article entitled "The Winds Are Changing In Servers -- AI Leads To Opportunity for IBM & Power" that "IBM appears the best positioned to benefit from the tremendous interest in AI." With various stumbles by Intel, there's "...a renewed opportunity for IBM with its Power architecture."

The article states that IBM Power is the only processor architecture with NVIDIA’s NVLink interface built directly into the processor itself, which significantly improves performance. As Jim McGregor, Gartner, and other researchers have emphasized, while GPUs and FPGAs provide the raw compute power required for AI workloads, their effectiveness is severely limited unless there are high-bandwidth interconnects between these powerful processors and memory.

 It goes beyond just hardware

It is commonly understood among those who study machine learning and infrastructure that, while popular deep learning frameworks including TensorFlow, Caffe, Torch and Chainer can efficiently leverage multiple GPUs in a single system, scaling to multiple clustered servers with GPUs is difficult at best.

 Let’s take an example. A powerful convolutional neural network, called ResNet 101, was trained on IBM Power Servers for image classification on a dataset of 7.5 million images. Not only was the IBM model more accurate, but it took just 7 hours vs. 10 days for x86 servers (published by Microsoft) for the same task. Much of the efficiency in training was due to software called Distributed Deep Learning (DDL), which was able to scale training across 256 GPUs with 95% efficiency!
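To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes that "scaling efficiency" means measured speedup divided by ideal linear speedup, which is a common definition but not one the sources quoted here spell out. The function names are my own, purely for illustration, and the two training runs did not necessarily use the same hardware counts, so treat the wall-clock ratio as indicative only.

```python
# Back-of-the-envelope arithmetic for the ResNet-101 figures quoted above.
# Assumption (not stated in the source): scaling efficiency is defined as
# measured speedup divided by the ideal, perfectly linear speedup.


def wall_clock_speedup(baseline_hours: float, new_hours: float) -> float:
    """Ratio of the two training times (how many times faster)."""
    if baseline_hours <= 0 or new_hours <= 0:
        raise ValueError("training times must be positive")
    return baseline_hours / new_hours


def effective_speedup(num_gpus: int, efficiency: float) -> float:
    """Speedup over one GPU implied by a given scaling efficiency."""
    if num_gpus < 1 or not 0.0 < efficiency <= 1.0:
        raise ValueError("need num_gpus >= 1 and 0 < efficiency <= 1")
    return num_gpus * efficiency


if __name__ == "__main__":
    x86_hours = 10 * 24   # "10 days" from the text
    power_hours = 7       # "7 hours" from the text

    ratio = wall_clock_speedup(x86_hours, power_hours)
    print(f"Wall-clock ratio: about {ratio:.1f}x faster")

    gpus, eff = 256, 0.95  # "256 GPUs with 95% efficiency" from the text
    print(f"{gpus} GPUs at {eff:.0%} efficiency behave like "
          f"about {effective_speedup(gpus, eff):.1f} ideal GPUs")
```

In other words, under this definition, 95% efficiency on 256 GPUs means losing the equivalent of only about 13 GPUs to communication overhead, which is why the interconnect and the distribution software matter as much as the raw GPU count.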

 The Forbes article mentions not just DDL, but Machine Learning libraries like Snap ML, which further optimize the training of neural networks. Some of these libraries were developed specifically for image and video recognition tasks, key components in the AI field of computer vision.

 And speaking of software:

IBM PowerAI Enterprise: IBM’s enterprise software distribution, which combines popular open source deep learning frameworks, efficient AI development tools, and accelerated IBM Power Systems. The popular deep learning frameworks mentioned above can be deployed swiftly from pre-built binaries, so data scientists do not have to spend time and effort downloading them and their dependencies (no easy task).

 IBM PowerAI Vision: IBM’s computer vision technology which can be used to train object detection and image classification models, as well as perform video analytics, without deep learning or coding expertise. The software is so simple to use that we actually gave it to school children to learn how to train models!

 A final note

Did you know that Google deployed the IBM POWER9 chip (the ‘Zaius’ platform) in its datacenter for production workloads? Some reasons given were more cores and threads for Google Search and more memory bandwidth for RNN machine learning execution. And of course, you will have heard about Summit, the world’s most powerful supercomputer, which is also based on the IBM POWER9 AC922.

As businesses race to drive digital transformation and improved client experience with technologies like AI and Big Data & Analytics, commodity servers will not be sufficient. The Google story should be ample proof: they chose to use Power servers in their hyperscale datacenters rather than x86-based systems.

“Cheap and good” is not good enough anymore. Beware, commodity compute makers: the winds of change are here!

Please share your experiences with accelerated compute and AI models.

Be Happy,

Eric
