IBM's POWER10 chip is too small !!
Thanks to Mathew S. and Unsplash

Yes! That’s right, POWER10 is IBM’s first 7nm chip, with 18 billion transistors. IBM showcased the POWER10 chip, with an amazing set of innovations and enhancements, at the Hot Chips conference last week (17 August 2020). Here are 10 big innovations and enhancements in POWER10 that are sure to excite you:

1.    The POWER10 core adheres to the open-source POWER Instruction Set Architecture (“ISA”) specification, version 3.1. You can find details of the ISA in this e-book: https://ibm.ent.box.com/s/hhjfw0x0lrbtyzmiaffnbxh2fuo0fog0

2.    The POWER10 core adds over 200 new instructions, aimed at making the microarchitecture more efficient at existing functions and at enabling new ones.

3.    There are 16 cores on the POWER10 die, but IBM is activating 15 of them in SMT8 mode (or 30 in SMT4 mode) for applications to use. With one or two POWER10 chips per socket, that means up to 30 SMT8 cores or 60 SMT4 cores are planned, for a total of 240 threads per socket.

Note: SMT stands for simultaneous multithreading, a processor technology that allows multiple instruction streams (or threads) to run concurrently on the same physical processor, improving overall throughput. To the operating system, each hardware thread is treated as an independent logical processor. POWER9 supports SMT as well. SMT4 is 4 threads per core and SMT8 is 8 threads per core. SMT modes can be changed dynamically.
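The thread count in item 3 is easy to check by hand. Here is a minimal Python sketch of that arithmetic, assuming the two-chips-per-socket configuration; the function and variable names are made up for illustration and are not from IBM:

```python
def threads_per_socket(active_cores_per_chip: int,
                       threads_per_core: int,
                       chips_per_socket: int) -> int:
    """Hardware threads visible to the OS as logical processors on one socket."""
    return active_cores_per_chip * threads_per_core * chips_per_socket


# Figures from the article: 15 active SMT8 cores (or 30 SMT4 cores) per chip.
# Assumption: two chips in the socket (the upper end of "one or two").
smt8_threads = threads_per_socket(active_cores_per_chip=15, threads_per_core=8, chips_per_socket=2)
smt4_threads = threads_per_socket(active_cores_per_chip=30, threads_per_core=4, chips_per_socket=2)

print(smt8_threads, smt4_threads)
```

Both modes land on the same 240 threads, because an SMT8 core is counted as two SMT4 cores.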

4.    The SMT8 version of the POWER10 core has a 2 MB L2 cache segment on the core, which is four times as large as the L2 cache on the POWER9 core. There are twice as many load and store units and four times as many memory management units, and these feed into the SIMD and matrix math units, which are 2X and 4X as big as on POWER9, respectively. Initial tests show some impressive results: normalized to 4 GHz, a POWER10 socket delivers ~3.25X the integer performance, ~3X the floating point or Java performance, and ~2.25X the memory bandwidth of a POWER9 socket.

5.    The power consumption per core has been cut roughly in half. That’s a 2.6X improvement in performance per watt.
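As a back-of-the-envelope check, the two figures in item 5 together imply a per-core performance gain. The sketch below only multiplies the article's own numbers; the derived value is an inference from them, not a figure quoted in this article, and the variable names are illustrative:

```python
# Figures from the article.
power_ratio = 0.5         # POWER10 core power relative to POWER9 ("cut roughly in half")
perf_per_watt_gain = 2.6  # POWER10 performance per watt relative to POWER9

# performance = (performance per watt) * watts, so the ratios multiply.
implied_perf_gain = perf_per_watt_gain * power_ratio

print(f"Implied per-core performance vs POWER9: {implied_perf_gain:.2f}x")
```

In other words, if both figures hold, each core would be doing roughly 30% more work on about half the power.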

6.    The chip has 128 MB of L3 cache, and it appears IBM will not be using an embedded DRAM design (as on POWER9) but rather regular SRAM.

7.    Each socket can support roughly 1 TB/s of bandwidth through its PowerAXON interface, and the same 1 TB/s through the OMI.

Notes: PowerAXON refers to a collection of high-speed I/O interconnects used by the POWER9 chip, which are also part of POWER10.

The Open Memory Interface (OMI) provides a technology-agnostic, low-latency means of connecting accelerators and memory to a CPU.

8.    One mind-blowing feature of POWER10 will be its ability to aggregate memory across a cluster of POWER servers. The idea is that a processor can map its local memory to neighbouring processors, so workloads that need more memory can use a neighbour's memory instead of swapping pages to much slower storage. Local memory is attached over the low-latency, high-bandwidth OMI connection, for up to 4 TB of memory without the limitations and expense of high-bandwidth memory. The PowerAXON interface can be used to connect to 16 other POWER10 sockets.

9.    An AI-optimized ISA is part of POWER10. The plan is to use POWER10 chips for inference without the need for accelerators. The core will have the bandwidth to incorporate AI-related compute, and 4X to 32X matrix math acceleration is expected compared to POWER9. POWER10 will support two flavours of 16-bit floating point: the IEEE float-16 half-precision format and Google’s Bfloat-16 alternative, which has a greater dynamic range than FP16. Bfloat-16 sacrifices some precision without giving up dynamic range. For AI this is a trade-off that has very little impact on machine learning training and inference results, and it allows more data to be run through the math units in any given time, which has a huge impact on machine learning throughput. The POWER10 chip will also support 4-bit, 8-bit, and 16-bit integer operations, which are mostly important for machine learning inference workloads.
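To make the FP16 versus Bfloat-16 trade-off concrete, here is a small Python sketch using only the standard library. It emulates Bfloat-16 by keeping the top 16 bits of a 32-bit float (truncation, which is a simplification; real hardware typically rounds), and it uses the `struct` module's half-precision format for IEEE float-16. The helper names are hypothetical, and this says nothing about how POWER10 implements either format:

```python
import struct


def to_bfloat16(x: float) -> float:
    """Emulate bfloat16 by truncating a float32 to its top 16 bits
    (1 sign bit, 8 exponent bits, 7 mantissa bits)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def to_float16(x: float) -> float:
    """Round-trip through IEEE half precision
    (1 sign bit, 5 exponent bits, 10 mantissa bits); overflow becomes infinity."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return float("inf") if x > 0 else float("-inf")


for value in (1.001, 70000.0):
    print(value, "-> float16:", to_float16(value), " bfloat16:", to_bfloat16(value))
```

A value such as 70000.0 overflows float-16 (its largest finite value is 65504) but stays finite in Bfloat-16, while a value such as 1.001 keeps more of its fractional part in float-16 than in Bfloat-16. That is the range-for-precision trade described above.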

10. The POWER10 core is intended to improve accelerated encryption performance, with ~4X the number of AES encryption engines compared to POWER9. It also brings capabilities to improve container security, such as hardware-enforced container protection and isolation.

Further Reading:

1. IBM Brings An Architecture Gun To A Chip Knife Fight: https://www.nextplatform.com/2020/08/18/ibm-brings-an-architecture-gun-to-a-chip-knife-fight/

2. Hot Chips 2020 Live Blog on POWER10: https://www.anandtech.com/show/15985/hot-chips-2020-live-blog-ibms-power10-processor-on-samsung-7nm-1000am-pt

3. IBM POWER10 Mega Chip For Hybrid Cloud Is Revealed: https://www.forbes.com/sites/tiriasresearch/2020/08/17/ibm-power10-mega-chip-for-hybrid-cloud-is-revealed/#23f80cc46d13

Abhishek Kumar

Welcome to Absolute

3 years ago

Deepak C Shetty

Principal, Learning Content Development, IBM Power, WW | Pre-Sales Technical Consulting - Hybrid Cloud / AI / CognitiveOfferings | IBM Redbooks Author | Developer Advocate | OpenShift Advocacy | Opensource Enthusiast

4 years ago

Great summary, thanks for the nice article Gerard Suren Saverimuthu

Sunny Panjabi

CISSP | Data & AI | AWS, GCP & Azure Certified

4 years ago

Intel did not catch up yet with these levels of capabilities
