How to Understand InfiniBand: To Infini(Band)ty and Beyond
Fancy Wang
Helping Global Enterprises Optimize Network Performance | Ethernet Card & Switch Solutions
April 26, 2021
The following is from Gilad Shainer, Senior Vice President of Marketing.
The heart of a data center is the network that connects all the compute and storage elements together. In order to get these elements working together and form a supercomputer (for research, cloud or deep learning), the network must be highly efficient and extremely fast. InfiniBand is an industry standard technology that was (and continues to be) developed with the vision of forming a highly scalable, pure software-defined network (SDN). Back in 2003, it connected one of the top three supercomputers in the world. The June 2020 TOP500 supercomputing list stated that InfiniBand connects seven of the top ten supercomputers in the world. InfiniBand is strongly adopted for deep learning infrastructures, and is increasingly being used for hyperscale cloud data centers such as Microsoft Azure and others. The performance, scalability, and efficiency advantages of InfiniBand continue to drive the growing and strong adoption of InfiniBand, as it is the ideal technology for compute and data intensive applications.
InfiniBand provides key advantages: It is a full-transport offload network, which means that all the network operations are managed by the network and not by the CPU or the GPU; it enables the most efficient data traffic, which means that more data gets transported with less overhead; it is the only 200 gigabit-per-second high-performance end-to-end network today; it has the lowest latency compared to any other standard or proprietary network; and most importantly, it incorporates data processing engines inside the network that accelerate data processing for deep learning and high-performance computing.
The answer to why InfiniBand presents these advantages and continuously maintains a one-generation-ahead technology leadership can be found in the four main InfiniBand technology fundamentals:
A very smart endpoint – an endpoint that can execute and manage all of the network functions (unlike Ethernet or proprietary networks), and can therefore increase the CPU or GPU time that can be dedicated to the real applications. Since the endpoint is located near CPU/GPU memory, it can also manage memory operations in a very effective and efficient way, for example via RDMA, GPUDirect RDMA, or GPUDirect Storage (a minimal sketch follows after these four points).
A switch network that is designed for scale – it is a pure software-defined network (SDN). InfiniBand switches, for example, do not require an embedded server within every switch appliance for managing the switch and for running its operating system (as needed in the case of other networks). This makes InfiniBand a leading cost-performance network fabric compared to Ethernet or any proprietary network out there. It also enables unique technology innovations such as In-Network Computing, which means that data calculations get performed on the data as it is being transferred in the network. An important example is the Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)™ technology, which has demonstrated great performance improvements for scientific and deep learning application frameworks.
Centralized management – you can manage, control and operate the InfiniBand network from a single place. You can also design and build any sort of network topology and customize and optimize the data center network for its target applications. There is no need to create multiple and different switch boxes for the different parts of the network, and there is no need to deal with so many complex network algorithms. The philosophy behind InfiniBand technology is to improve performance on the one side and to reduce OPEX on the other.
Last but not least, InfiniBand is a standard technology that ensures backward and forward compatibility, and it is open source with open APIs. By carrying software from one generation to the next, you protect your investments. And unlike proprietary networks, which require reinventing the same wheel over and over again, InfiniBand enjoys the support of a large software ecosystem and a rich set of software frameworks.
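To make the "smart endpoint" idea a bit more concrete, here is a minimal sketch of posting a one-sided RDMA write with the libibverbs API. It assumes a queue pair, a registered memory region, and the peer's remote address and rkey have already been created and exchanged out of band; post_rdma_write and the parameter names are placeholders for this illustration, not code from any particular vendor example.

```c
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* Sketch: post a one-sided RDMA WRITE so the adapter moves the data
 * directly into the peer's memory with no remote CPU involvement.
 * Assumes qp, mr, local_buf, remote_addr and rkey were set up earlier. */
static int post_rdma_write(struct ibv_qp *qp, struct ibv_mr *mr,
                           void *local_buf, size_t len,
                           uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,  /* local source buffer   */
        .length = (uint32_t)len,
        .lkey   = mr->lkey,              /* local protection key  */
    };

    struct ibv_send_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof(wr));
    wr.opcode              = IBV_WR_RDMA_WRITE;   /* one-sided write   */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;   /* request a CQE     */
    wr.wr.rdma.remote_addr = remote_addr;         /* peer's buffer     */
    wr.wr.rdma.rkey        = rkey;                /* peer's remote key */

    /* The NIC executes the transfer; the host CPU only posts the work
     * request and later polls the completion queue. */
    return ibv_post_send(qp, &wr, &bad_wr);
}
```

The point of the pattern is that the host merely describes the transfer; the adapter executes it, and the remote CPU never touches the data path.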
InfiniBand-connected data centers can of course be easily connected to external Ethernet networks via low-latency (200-nanosecond) InfiniBand-to-Ethernet gateways. InfiniBand also offers long-reach connectivity spanning tens to thousands of miles, enabling remote data centers to connect to each other.
The InfiniBand Trade Association (IBTA) has just released an update to the InfiniBand roadmap, calling out the future generations of InfiniBand illustrated in Figure 1.
A typical InfiniBand adapter or switch port includes 4 differential serial pairs, also referred to as an InfiniBand 4X port. The latest InfiniBand roadmap specifies NDR 400 gigabits per second (Gb/s) for an InfiniBand 4X port as the next speed, followed by XDR 800Gb/s and then GDR 1.6 terabits per second (1600Gb/s). This is the most aggressive interconnect roadmap in the industry, aiming to sustain the generation-ahead advantage and to provide the data speeds needed for future compute- and data-intensive applications.
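Since a 4X port aggregates four lanes, dividing each roadmap speed by four gives a rough per-lane data rate. The short sketch below only performs that arithmetic; it ignores signaling and encoding overheads, and the generation list simply restates the roadmap figures above.

```c
#include <stdio.h>

/* Rough per-lane data rates for an InfiniBand 4X port (4 lanes).
 * This is just port-rate-divided-by-four arithmetic on the roadmap
 * numbers; encoding and signaling overheads are ignored. */
int main(void)
{
    struct { const char *gen; int port_gbps; } roadmap[] = {
        { "HDR", 200 }, { "NDR", 400 }, { "XDR", 800 }, { "GDR", 1600 },
    };
    const int lanes = 4;

    for (int i = 0; i < 4; i++)
        printf("%s: %4d Gb/s per 4X port -> %4d Gb/s per lane\n",
               roadmap[i].gen, roadmap[i].port_gbps,
               roadmap[i].port_gbps / lanes);
    return 0;
}
```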
A key technology enabling high supercomputing performance and scalability is In-Network Computing. In-Network Computing engines are pre-configured or programmable computing engines located on the data path of network adapters or switches. These engines can process data or perform pre-defined algorithmic tasks on the data as it is transferred within the network. Two examples of such engines are InfiniBand hardware MPI tag matching and the InfiniBand Scalable Hierarchical Aggregation and Reduction Protocol (SHARP).
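For a sense of what SHARP offloads, the code below is a plain MPI_Allreduce, the kind of collective reduction that a SHARP-enabled fabric can aggregate inside the switches rather than staging it through the hosts. The application code itself is unchanged standard MPI; the vector length here is arbitrary and chosen only for illustration.

```c
#include <mpi.h>
#include <stdio.h>

/* A plain MPI_Allreduce: every rank contributes a vector of partial
 * sums and receives the global sum. This is the type of collective
 * that In-Network Computing engines such as SHARP can aggregate in
 * the fabric; the application-level call does not change. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local[1024], global[1024];
    for (int i = 0; i < 1024; i++)
        local[i] = (double)rank;          /* each rank's partial data */

    /* Element-wise sum across all ranks, result delivered to everyone. */
    MPI_Allreduce(local, global, 1024, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global[0] = %f\n", global[0]);

    MPI_Finalize();
    return 0;
}
```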
SHARP has been described in multiple earlier publications, including recently at ISC’20 in a paper titled “Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) Streaming-Aggregation Hardware Design and Evaluation” by Richard L. Graham, Lion Levi, Devendar Burredy, Gil Bloch, Gilad Shainer, David Cho, George Elias, Daniel Klein, Joshua Ladd, Ophir Maor, Ami Marelli, Valentin Petrov, Evyatar Romlet, Yong Qin, and Ido Zemah.
The InfiniBand hardware MPI tag matching technology is illustrated in Figure 2.
The Message Passing Interface (MPI) standard allows messages to be received based on tags embedded in the message. Processing every message to evaluate whether its tags match the conditions of interest can be time-consuming and wasteful.
MPI send/receive operations require matching source and destination message parameters in order to deliver data to the correct destination, and the matching must follow the order in which sends and receives are posted. The key challenges in providing efficient tag-matching support include managing the metadata needed for tag matching, making temporary copies of data to minimize the latency between tag matching and data delivery, keeping track of posted receives that have not yet been matched, handling unexpected message arrivals, and overlapping tag matching and the associated data delivery with ongoing computation.
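As a generic illustration of what the matching logic has to do, the sketch below posts two receives that differ only by tag; each incoming message is delivered to the first posted receive whose source, tag, and communicator match. The tag values and buffer names are arbitrary and not tied to any specific benchmark.

```c
#include <mpi.h>

/* Illustration of MPI tag matching: two receives are posted with
 * different tags, and each incoming send is steered to the receive
 * whose (source, tag, communicator) triple matches, in the order
 * the receives were posted. Run with at least 2 ranks. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int ctrl = 0, data = 0;

    if (rank == 0) {
        MPI_Request reqs[2];
        /* Two pre-posted receives distinguished only by their tags. */
        MPI_Irecv(&ctrl, 1, MPI_INT, 1, /*tag=*/100, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(&data, 1, MPI_INT, 1, /*tag=*/200, MPI_COMM_WORLD, &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    } else if (rank == 1) {
        int one = 1, two = 2;
        /* Sent in the "wrong" order; the tags, not arrival order,
         * decide which posted receive each message lands in. */
        MPI_Send(&two, 1, MPI_INT, 0, /*tag=*/200, MPI_COMM_WORLD);
        MPI_Send(&one, 1, MPI_INT, 0, /*tag=*/100, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```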
Support for asynchronous hardware-based tag matching and data delivery is provided by HDR InfiniBand ConnectX-6 network adapters and beyond. Network hardware-based tag matching reduces the latency of multiple MPI operations while also increasing the overlap between computation and MPI communication, as shown in Figure 3.
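The overlap being described follows the familiar nonblocking pattern sketched below: the receive is posted early, independent computation proceeds, and the application waits only when the data is actually needed. With hardware tag matching, the adapter can match and deliver the message while that computation runs; independent_compute is a made-up placeholder for the unrelated work.

```c
#include <mpi.h>

static void independent_compute(double *x, int n)
{
    /* Placeholder for application work that does not depend on the
     * message being transferred (and does not touch its buffers). */
    for (int i = 0; i < n; i++)
        x[i] = x[i] * 2.0 + 1.0;
}

/* Pattern that benefits from hardware tag matching: post the receive,
 * keep computing, and wait only when the incoming data is needed. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double msg[4096]  = { 0 };    /* buffer being sent / received    */
    double work[4096] = { 0 };    /* unrelated data used for compute */
    MPI_Request req = MPI_REQUEST_NULL;

    if (rank == 0)
        MPI_Irecv(msg, 4096, MPI_DOUBLE, 1, /*tag=*/7, MPI_COMM_WORLD, &req);
    else if (rank == 1)
        MPI_Isend(msg, 4096, MPI_DOUBLE, 0, /*tag=*/7, MPI_COMM_WORLD, &req);

    independent_compute(work, 4096);   /* overlaps with the transfer */

    MPI_Wait(&req, MPI_STATUS_IGNORE); /* data needed from here on   */

    MPI_Finalize();
    return 0;
}
```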
The Ohio State University MVAPICH team has demonstrated a 1.8X performance increase with InfiniBand hardware tag matching. The team has also demonstrated a 1.4X performance increase for 3DStencil applications at 128 nodes on the Texas Advanced Computing Center's Frontera supercomputer.
The suite of InfiniBand In-Network Computing engines does not exist in any other network, whether long-established Ethernet or proprietary networks such as Omni-Path, Aries, or Slingshot (referred to as “HPC Ethernet” for marketing purposes). So while InfiniBand delivers many advantages such as high data throughput, extremely low latency, and advanced adaptive routing and congestion control mechanisms, it is InfiniBand’s In-Network Computing technology – which transforms the InfiniBand network into a data processing unit – that is the main reason for the growing use of InfiniBand in supercomputing, deep learning, and large-scale cloud platforms.
Engineer & Manufacturer | Internet Bonding routers to Video Servers | Network equipment production | ISP Independent IP address provider | Customized Packet level Encryption & Security | On-premises Cloud
1y · Fancy Wang: You talked about InfiniBand's remarkable capabilities, especially in the context of supercomputing and data-intensive applications. Given the rapid advancements in AI and machine learning, how do you see InfiniBand evolving to address the unique networking demands of future quantum computing applications? Quantum computing presents distinct challenges and opportunities, and I'm curious about your insights on how InfiniBand might play a role in this emerging field.