A decade of transformational system architecture
A look back – and a look forward
What a difference a decade makes! A few days ago, IBM’s Summit and Sierra systems became the world’s fastest computers, on the heels of the publication documenting the latest innovations in the IBM POWER9 systems. Just a decade ago, innovation in systems architecture had essentially ceased with POWER6, and the high-profile defection of Apple as a partner and user of the Power architecture had left the IBM systems business dispirited.
The recent success with Summit and Sierra marks the completion of a remarkable transformation of the most storied systems in the history of computing, Power enterprise servers. The transformation began a decade ago when IBM asked me to take charge of the company’s system architectures as Chief Architect and Senior Manager for all its system lines. The dual roles of line manager and technical lead would give me the authority needed to reverse the sagging fortunes of IBM’s enterprise architecture line. With POWER6, we had fallen behind the competition across the board. It was clear that reviving the Power line would require innovation in every aspect of the system.
The first new system we launched was POWER7, with an emphasis on innovations that markedly improved application performance and I/O. A low-power, laptop-class out-of-order core, for which I had been a design lead and chief microarchitect just before taking on this new role, was the perfect CPU for this new direction: low-power execution enabled a many-core design with up to four application threads executing simultaneously on each core (SMT4), massively boosting system performance. I also created a new integrated floating point and vector architecture (VSX). The integrated design made the system easier to program and reduced chip area, allowing us to double the number of floating point units in each core and to make better use of them in software. Finally, a new I/O architecture made use of industry-standard PCIe devices while enhancing system reliability.
For the next-generation POWER8, the goal was to improve performance while emphasizing ease of use, flexibility and code portability. As the many-core revolution took hold, making efficient use of the increased parallelism across many cores was becoming increasingly difficult for programmers. A new technology called transactional memory promised to lighten the programmers’ burden of writing error-free parallel code while improving performance. My team developed the new transactional memory technology for both server brands (that is, both the mainframe "z" servers and the enterprise Power system lines), and we were the first to bring the revolutionary technique to market.
To increase flexibility, our systems needed to be more open, but "not invented here" (NIH) was still a pervasive attitude at IBM. While IBM had created many innovations over the years, user needs were diverse and no company could address all these needs alone. To create more flexibility, we created a way to attach programmable accelerators, a new breed of computing engines that my team had created a few years earlier for the PlayStation 3 and that were taking the industry by storm. Based on our experience with coherent accelerators, we created the coherent accelerator processor interface (CAPI) to enable anybody to connect efficiently to our new systems.
The final challenge was software portability, and it came down to how to count bits. (Really!) Since the invention of computing, there had been two ways to number bits: with numbers ascending left to right, or right to left. The debates over the right way of counting were as futile as the arguments over whether to open an egg at the big or the little end, as told in the classic “Gulliver’s Travels”, and fittingly the two ways of numbering the bits were called big-endian (counting bits left to right) and little-endian (right to left).
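To make the portability problem concrete, here is a minimal sketch in Python of how the same 32-bit value is laid out in memory under each convention, and what happens when data written under one convention is read under the other. The value `0x0A0B0C0D` and the helper `hex_bytes` are illustrative choices, not anything taken from the Power software stack.

```python
# Illustrative sketch: one 32-bit value, two storage orders.
value = 0x0A0B0C0D

# Big-endian stores the most significant byte first,
# little-endian stores the least significant byte first.
big = value.to_bytes(4, byteorder="big")
little = value.to_bytes(4, byteorder="little")


def hex_bytes(data: bytes) -> str:
    """Format a byte string as space-separated hex pairs (hypothetical helper)."""
    return " ".join(f"{b:02x}" for b in data)


print(hex_bytes(big))     # 0a 0b 0c 0d
print(hex_bytes(little))  # 0d 0c 0b 0a

# Reading big-endian data with a little-endian assumption silently
# produces a different number; this is the kind of bug that forces
# code to be rewritten and retested for each environment.
misread = int.from_bytes(big, byteorder="little")
print(hex(misread))       # 0xd0c0b0a
```

Code that writes such values to files, network packets or device registers has to agree with its reader on the order, which is why a mismatch between systems affects applications and peripherals alike.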
While there was no clear winner, there were clear losers: the programmers who needed to rewrite and retest every program to work in both environments, and the users who were stuck with incompatible systems and applications. Personal computers were little-endian. Enterprise systems were big-endian. And with the ongoing convergence of these systems, this dissonance was growing rather than shrinking, and it was becoming an obstacle to entering markets such as cloud computing. When our company’s CEO needed to be briefed on why the numbering of bits mattered to our ability to offer attractive solutions to our customers, things had clearly gotten out of hand.
I was asked to lead a small team defining a new environment that allowed our systems to run little-endian code efficiently. After all, the increasingly popular Linux operating system was being developed primarily as a little-endian system; internet powerhouses like Google, Yahoo and Facebook were developing all their solutions using little-endian programming; and the massive volumes of personal computers and cell phones meant that all new peripherals were now developed to use the same little-endian numbering.
The solution was clear: create a new environment capable of dealing with little-endian data, to give users the choice between our own systems and those based on little-endian Intel CPUs. To create certainty and clarity, this transition needed to be done quickly and decisively. In 2013, I led a small team to define the new environment. This was not a traditional “big company project” but rather a small startup with the CEO’s support and a focus on complete and rapid transformation. Within a few months, the team had defined the new environment and its compilers, enabling a commercial launch less than a year after the project had started, concurrently with the launch of POWER8-based systems and with Ubuntu 14.04 as our launch partner.
This transformation had far-reaching implications: an environment that could easily use little-endian programs was more interesting to many system developers and internet companies, and it enabled us to launch OpenPOWER as a way to create an alliance around a more open system. It also enabled the integration of GPU accelerators, originally designed to accelerate game graphics in PCs and game consoles, as processing engines using Nvidia’s NVLink protocol with an approach similar to the open CAPI interconnect. (These are the same GPUs that now power Summit and Sierra, the world’s fastest computers.)
The final generation in this transformation is POWER9. After addressing performance, ease of use and portability for application programs and making Linux applications a flagship for the new environment, it was left to POWER9 to optimize the system architecture for the Linux operating system itself and for the cloud, with a new virtual memory system optimized for Linux based on radix page tables. The new specification implemented in POWER9 was adopted by the OpenPOWER Foundation as the specification for Power ISA 3.0, marking the completion of a remarkable transformation in the space of a decade and the start of the next generation of systems.
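For readers unfamiliar with the term, a radix page table translates a virtual address by slicing it into fixed-width index fields and walking one tree level per field. The sketch below illustrates only that general idea. The four levels of 9 bits each and the 4 KiB page size are assumptions chosen for readability, not the POWER9 layout, and the names `split_address`, `map_page` and `translate` are hypothetical.

```python
# Illustrative radix page-table walk. The geometry below is assumed
# for the example and does not reproduce the actual POWER9 layout.
LEVEL_BITS = (9, 9, 9, 9)  # index width per tree level, top level first
PAGE_SHIFT = 12            # 4 KiB pages: the low 12 bits are the page offset


def split_address(vaddr):
    """Split a virtual address into per-level indices and a page offset."""
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    rest = vaddr >> PAGE_SHIFT
    indices = []
    # Peel index fields off from the lowest level up, then reverse.
    for bits in reversed(LEVEL_BITS):
        indices.append(rest & ((1 << bits) - 1))
        rest >>= bits
    indices.reverse()
    return indices, offset


def map_page(root, vaddr, frame):
    """Install a mapping from the page containing vaddr to a frame number."""
    indices, _ = split_address(vaddr)
    node = root
    for idx in indices[:-1]:
        node = node.setdefault(idx, {})  # create intermediate levels on demand
    node[indices[-1]] = frame


def translate(root, vaddr):
    """Walk the tree; return the physical address, or None if unmapped."""
    indices, offset = split_address(vaddr)
    node = root
    for idx in indices[:-1]:
        node = node.get(idx)
        if node is None:
            return None
    frame = node.get(indices[-1])
    if frame is None:
        return None
    return (frame << PAGE_SHIFT) | offset


root = {}
map_page(root, 0x7F1234567000, 0x42)
print(hex(translate(root, 0x7F1234567ABC)))  # 0x42abc
print(translate(root, 0x1000))               # None
```

Nested dictionaries stand in for the in-memory tables here; real hardware walks fixed-size arrays of entries, but the index arithmetic follows the same pattern.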
It is this new environment that is the foundation for Summit and Sierra. In a final transformative move, I started PowerAI, a new project to adopt open source AI technology and GPU acceleration and to create an enterprise application suite for developing new AI applications, one that continues to transform how Power enterprise servers are used today.