Supercomputing Part 1
Sachin Chheda
Driving the next generation of IT, Cloud, and AI services and innovation
This is a long-overdue series of posts, but going back to the #Supercomputing Conference felt so much like a homecoming that I couldn’t not write about it!
The first time I attended the ACM/IEEE Supercomputing conference was when it was held in Pittsburgh in the ’90s. It was a modest affair and one of the first industry conferences I had attended – as a student you don’t really get a chance to go unless you are presenting a paper, but this one was local, and the Pittsburgh Supercomputing Center and my alma mater were heavily involved. I remember being in awe of the performance of some of the fastest computer systems in the world. The fastest system on the November 1996 Top 500 list was built by Hitachi and delivered performance measured in hundreds of gigaFLOPS – mind you, this was the ’90s, when clock speeds were still measured in the low hundreds of megahertz. That Hitachi system had 2048 PA-RISC-based processors and a whopping 128GB of memory. I also remember how it uniquely used a 3-D crossbar interconnect. Looking back, I found out that it stayed in service for over a decade, helping researchers solve some challenging particle physics problems.
By the early 2000s, the Supercomputing community felt familiar. At the time I was working at HP on what was then the HP Integrity server family, based on the Intel Itanium 2 microprocessor. Those systems were regulars on the Top 500 list, including a top-10 entry delivering teraflops-scale performance. Systems back then used Myrinet, Quadrics, and the emerging InfiniBand interconnects to string together thousands of cores. Storage was a simple file system spread across thousands of spindles of spinning rust, focused on sequential performance. The problems these systems solved were familiar too – astrophysics, CFD, crash simulation, and so on.
Here we are in 2022, decades after the first time I attended the show. This year’s top entry – i.e. the fastest supercomputer in the world – is the OLCF-5, aka Frontier, capable of delivering exaFLOPS of performance. Frontier has tens of thousands of GPUs, each with 128GB of memory! That’s the same amount of memory the top supercomputer had in total back in 1996, except now it’s per GPU. Most of the systems in the Top 500 now use InfiniBand or Ethernet networking, with per-port throughput in the hundreds of Gb/s. Storage has come a long way as well. While spinning rust, aka the HDD, can still be seen in data centers, NVMe (flash on the PCIe bus) delivers the performance. The scale has grown too – these computers are connected to petabytes and even exabytes of storage. The workloads have evolved as well – in addition to increasingly complex traditional high-performance computing workflows, these systems are being used in medical and life sciences (drug discovery, protein folding, and so much more) as well as the red-hot field of artificial intelligence, including machine learning and deep learning.
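To put that jump in scale into rough numbers, here is a quick back-of-the-envelope sketch in Python. The 1996 figures are for the Hitachi system described above (CP-PACS); the 2022 figures are approximate public numbers for Frontier, and the GPU count in particular is a rough assumption rather than an exact spec.

```python
# Back-of-the-envelope comparison of the two systems mentioned above.
# Figures are approximate (public Top 500 data); treat them as illustrative only.

GFLOPS = 1e9
EFLOPS = 1e18

cp_pacs_flops = 368 * GFLOPS        # ~368 gigaFLOPS (Hitachi CP-PACS, Nov 1996 list)
cp_pacs_memory_gb = 128             # 128 GB of memory for the whole machine

frontier_flops = 1.1 * EFLOPS       # ~1.1 exaFLOPS (Frontier, 2022)
frontier_gpu_count = 37_000         # tens of thousands of GPUs (rough assumption)
frontier_memory_gb = frontier_gpu_count * 128  # 128 GB per GPU

print(f"Performance growth: ~{frontier_flops / cp_pacs_flops:,.0f}x")
print(f"Memory growth:      ~{frontier_memory_gb / cp_pacs_memory_gb:,.0f}x")
```

Even with these loose numbers, peak performance has grown by roughly a factor of a few million, and aggregate memory by tens of thousands of times.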
In the next post, I’ll share how I’ve seen the annual Supercomputing conference itself evolve.