Top 10 Advantages of InfiniBand
InfiniBand (abbreviated as IB) is a computer network communication standard for high-performance computing that provides extremely high throughput and low latency for computer-to-computer data interconnection.
In the latest Top 500 list of the world’s most powerful supercomputers, InfiniBand networks once again led the supercomputer interconnect category in both number of systems and performance, a significant increase over the previous list. Three trends can be drawn from this list.
NVIDIA Selene, the best-performing enterprise supercomputer available, and Microsoft’s Azure public cloud both rely on InfiniBand networks to deliver their business performance.
Advances in data communication technology, Internet technology, and visual presentation all depend on more powerful computing, larger and more secure storage, and more efficient networks. An InfiniBand-based cluster architecture not only provides higher network bandwidth, but also reduces the computing resources consumed by network transmission, lowers latency, and integrates HPC with data centers.
Why are InfiniBand networks so highly valued in the Top 500? Their performance benefits play a decisive role. NADDOD summarizes the top 10 advantages of InfiniBand as follows.
1. Simple Network Management
InfiniBand is the first network architecture designed natively for SDN, and it is managed by a subnet manager.
The subnet manager configures the local subnet and keeps it running. All channel adapters and switches must implement a Subnet Management Agent (SMA) that works with the subnet manager to handle management traffic. Each subnet must have at least one subnet manager for initial setup and for reconfiguring the subnet whenever a link comes up or goes down. An arbitration mechanism selects one subnet manager as the master, while the other subnet managers run in standby mode (each standby subnet manager keeps a backup of the subnet's topology information and verifies that the subnet is operational). If the master subnet manager fails, a standby subnet manager takes over management of the subnet to ensure uninterrupted operation.
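The master/standby arrangement described above can be sketched in a few lines of Python. This is an illustrative model only, not how any real subnet manager is implemented: the `SubnetManager` class, the `elect_master` helper, and the sample GUIDs and priorities are all hypothetical, and the arbitration rule used here (highest priority wins, lowest GUID breaks ties) is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SubnetManager:
    guid: int            # hypothetical identifier of the node running this SM
    priority: int        # assumption: a higher value is preferred
    alive: bool = True


def elect_master(managers: List[SubnetManager]) -> Optional[SubnetManager]:
    """Pick the master among live SMs: highest priority, then lowest GUID."""
    live = [sm for sm in managers if sm.alive]
    if not live:
        return None
    return min(live, key=lambda sm: (-sm.priority, sm.guid))


sms = [
    SubnetManager(guid=0x0002, priority=10),
    SubnetManager(guid=0x0001, priority=10),
    SubnetManager(guid=0x0003, priority=5),
]

master = elect_master(sms)       # GUID 0x0001 wins the tie at priority 10
master.alive = False             # simulate a failure of the master
new_master = elect_master(sms)   # a standby (GUID 0x0002) takes over
```

The point of the sketch is that the standby managers need no extra coordination at failover time: rerunning the same selection over the surviving managers yields the new master.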
2. High Bandwidth
Since InfiniBand was introduced, its link rates have long advanced faster than Ethernet's, mainly because InfiniBand is used to interconnect servers in high-performance computing, which demands higher bandwidth.
The abbreviations for each rate are as follows:
3. CPU Offload
A key technology for accelerated computing is CPU offload, and the InfiniBand network architecture allows data to be transferred with minimal CPU resources, which is accomplished by:
It is also possible to use GPUDirect technology, which accesses data in GPU memory directly and transfers it from GPU memory to other nodes. This can accelerate computational applications such as AI and deep learning.
4. Low Latency
The comparison has two parts: the switch and the NIC.
On the switch side, Ethernet switches, as a Layer 2 technology in the network transport model, generally use MAC table lookups for addressing and store-and-forward switching (some products have borrowed InfiniBand’s cut-through technology). Because they must also handle complex services such as IP, MPLS, and QinQ, Ethernet switch processing is long, typically several microseconds (more than 200 ns even with cut-through), while InfiniBand switches do very simple processing at Layer 2.
At the NIC level, as mentioned earlier, RDMA means the NIC does not need to go through the CPU to forward messages, which greatly reduces the time spent on encapsulation and decapsulation. The typical send and receive latency (write, send) of an InfiniBand NIC is 600 ns, while TCP and UDP applications over Ethernet see a send/receive latency of about 10 µs, a difference of more than ten times.
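As a quick check of the "more than ten times" claim, the NIC-level figures quoted above can be compared directly. The snippet uses only the two numbers given in the text; the constant names are made up for this example.

```python
# Figures quoted in the text above, converted to nanoseconds.
IB_NIC_LATENCY_NS = 600        # InfiniBand NIC send/receive (write, send)
ETH_TCP_LATENCY_NS = 10_000    # Ethernet TCP/UDP send/receive, about 10 µs

ratio = ETH_TCP_LATENCY_NS / IB_NIC_LATENCY_NS
print(f"Ethernet TCP/UDP latency is about {ratio:.1f}x the InfiniBand NIC latency")
```

With these inputs the ratio comes out at roughly 16.7, which is consistent with the statement in the text.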
5. Scalability and Flexibility
A major advantage of an IB network is that a single subnet can hold up to 48,000 nodes, forming one huge Layer 2 network. Moreover, IB networks do not rely on broadcast mechanisms such as ARP, so they do not generate broadcast storms or waste additional bandwidth.
Multiple IB subnets can also be connected via routers and switches.
IB supports multiple network topologies.
At small scale, a 2-layer fat-tree is recommended. Larger deployments can use a 3-layer fat-tree topology. Above a certain scale, a Dragonfly+ topology can be used to save some cost.
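To get a feel for when a 2-layer fat-tree stops being enough, the following sketch estimates how many hosts a non-blocking fat-tree can attach. It relies on the textbook formula for a fat-tree built from identical switches (2 × (radix/2)^levels hosts), which is an assumption brought in for this example and not something stated in the article. The switch radices 40 and 64 are hypothetical values chosen for illustration, and `fat_tree_hosts` is a made-up helper name.

```python
def fat_tree_hosts(radix: int, levels: int) -> int:
    """Maximum hosts in a non-blocking fat-tree of `radix`-port switches.

    Textbook formula: 2 * (radix / 2) ** levels.
    """
    if radix < 2 or radix % 2:
        raise ValueError("radix must be an even number >= 2")
    if levels < 1:
        raise ValueError("levels must be >= 1")
    return 2 * (radix // 2) ** levels


SUBNET_NODE_LIMIT = 48_000  # single-subnet figure quoted in the text

for radix in (40, 64):          # hypothetical switch port counts
    for levels in (2, 3):
        hosts = fat_tree_hosts(radix, levels)
        fits = hosts <= SUBNET_NODE_LIMIT
        print(f"radix={radix} levels={levels}: {hosts} hosts, "
              f"fits in one subnet: {fits}")
```

Under these assumptions a 2-layer tree of 40-port switches tops out at 800 hosts, while adding a third layer raises that to 16,000, which illustrates why the layer count grows with cluster size.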
6. QoS
How does an IB network provide QoS support if several different applications are running on the same subnet and some of them need higher priority than others?
QoS is the ability to provide different priority services for different applications, users or data flows. High-priority applications can be mapped to different port queues, and messages in the queue can be sent first.
InfiniBand implements QoS using Virtual Lanes (VLs), which are discrete logical communication links that share a single physical link. Each physical link can support up to 15 standard virtual lanes plus one management lane (VL15).
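A toy model can make the idea of several lanes sharing one link more concrete. The sketch below is a deliberate simplification and an assumption throughout: the service-level to VL mapping, the set of high-priority lanes, and the strict "management first, then high priority, then the rest" ordering are all hypothetical choices for this example, and real hardware arbitration is more elaborate than strict priority. The `Link` class and its method names are invented for illustration.

```python
from collections import deque

NUM_DATA_VLS = 15     # standard virtual lanes, numbered 0..14 here
MANAGEMENT_VL = 15    # VL15, the management lane

# Hypothetical service-level -> virtual-lane mapping for this example.
SL_TO_VL = {0: 0, 1: 1, 2: 2}
# Hypothetical choice: these lanes are drained before other data lanes.
HIGH_PRIORITY_VLS = {2}


class Link:
    """One physical link with a separate queue per virtual lane."""

    def __init__(self):
        self.queues = {vl: deque() for vl in range(NUM_DATA_VLS + 1)}

    def enqueue(self, service_level, packet):
        self.queues[SL_TO_VL[service_level]].append(packet)

    def enqueue_management(self, packet):
        self.queues[MANAGEMENT_VL].append(packet)

    def dequeue(self):
        """Next packet: management lane, then high-priority lanes, then the rest."""
        order = [MANAGEMENT_VL] + sorted(HIGH_PRIORITY_VLS) + [
            vl for vl in range(NUM_DATA_VLS) if vl not in HIGH_PRIORITY_VLS
        ]
        for vl in order:
            if self.queues[vl]:
                return self.queues[vl].popleft()
        return None


link = Link()
link.enqueue(0, "bulk-1")
link.enqueue(2, "latency-sensitive")
link.enqueue(1, "bulk-2")
link.enqueue_management("management")
sent = [link.dequeue() for _ in range(4)]
```

Even though the latency-sensitive packet was queued after `bulk-1`, it leaves the link earlier because it sits in a lane that is served first.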
7. Network Stability and Resilience
Ideally, the network is very stable and free of failures. But long-running networks inevitably experience some failures. How does InfiniBand handle these failures and recover quickly?
NVIDIA IB solutions provide a mechanism called Self-Healing Networking, a hardware capability that is based on IB switches. Self-Healing Networking allows link failures to be recovered in just 1 millisecond, which is 5000x faster than normal recovery times.
8. Optimized Load Balancing
A very important requirement inside a high-performance data center is how to improve the utilization of the network. One way is using load balancing.
Load balancing is a routing strategy that allows traffic to be sent over multiple available ports.
Adaptive Routing is one such feature that allows traffic to be distributed evenly across switch ports. AR is supported in hardware on the switch and is managed by Adaptive Routing Manager.
When AR is on, the Queue Manager on the switch monitors traffic on all group exit ports, equalizes the load across the queues, and directs traffic to underutilized ports. AR supports dynamic load balancing to avoid network congestion and maximize network bandwidth utilization.
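The core decision described above, sending each packet to the least-loaded port of an exit group, can be sketched as follows. This is a simplified software model and not the switch's hardware logic: queue depth stands in for port load, ties go to the lowest port number, and the function names `pick_exit_port` and `forward` are hypothetical.

```python
from typing import Dict, List


def pick_exit_port(queue_depth: Dict[int, int], group: List[int]) -> int:
    """Return the port in `group` with the shallowest queue (lowest port wins ties)."""
    if not group:
        raise ValueError("exit group must not be empty")
    return min(group, key=lambda port: (queue_depth.get(port, 0), port))


def forward(packets: int, queue_depth: Dict[int, int], group: List[int]) -> Dict[int, int]:
    """Send `packets` packets one by one, each to the currently least-loaded port."""
    for _ in range(packets):
        port = pick_exit_port(queue_depth, group)
        queue_depth[port] = queue_depth.get(port, 0) + 1
    return queue_depth


# Port 1 starts out congested; ports 2 and 3 are underutilized.
depths = {1: 5, 2: 0, 3: 2}
forward(6, depths, group=[1, 2, 3])
```

In this run all six packets go to ports 2 and 3 and none to the congested port 1, so the queues drift toward equal depth.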
9. Network Computing - SHARP
IB switches also support an in-network computing technology, SHARP (Scalable Hierarchical Aggregation and Reduction Protocol).
SHARP is a centrally managed software package that builds on the switch hardware.
SHARP can offload collective communication that previously ran on CPUs and GPUs to the switch. This optimizes collective communication, avoids sending the same data between nodes multiple times, and reduces the amount of data that must cross the network. As a result, SHARP can greatly improve the performance of accelerated computing in MPI-based applications such as AI and machine learning.
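The effect of aggregating inside the network can be illustrated with a small two-level reduction tree. This is only a conceptual sketch in plain Python and does not use or represent the SHARP software: the helper names `reduce_sum` and `in_network_reduce`, the two-level layout, and the sample vectors are all assumptions for the example.

```python
from typing import List, Sequence

Vector = List[float]


def reduce_sum(vectors: Sequence[Vector]) -> Vector:
    """Element-wise sum of equal-length vectors."""
    return [sum(values) for values in zip(*vectors)]


def in_network_reduce(hosts_per_leaf: List[List[Vector]]) -> Vector:
    """Two-level aggregation: each leaf switch sums its hosts, the root sums the leaves."""
    leaf_results = [reduce_sum(hosts) for hosts in hosts_per_leaf]
    return reduce_sum(leaf_results)


# Two leaf switches with two hosts each; every host contributes one vector.
hosts = [
    [[1.0, 2.0], [3.0, 4.0]],   # hosts under leaf switch A
    [[5.0, 6.0], [7.0, 8.0]],   # hosts under leaf switch B
]
result = in_network_reduce(hosts)

host_vectors = sum(len(leaf) for leaf in hosts)  # vectors entering the network
uplink_vectors = len(hosts)                      # vectors sent from leaves to root
```

Four host vectors enter the network but only two partial results travel from the leaf switches to the root, which is the kind of traffic reduction the text describes.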
10. Support a Variety of Network Topologies
InfiniBand networks can support a very large number of topologies, such as:
Supporting different network topologies makes it possible to meet different needs, such as:
InfiniBand, with its unparalleled technical advantages, greatly simplifies high-performance network architecture and reduces latency caused by multi-level architectural hierarchies, providing strong support for the smooth upgrade of access bandwidth for critical computing nodes. The trend is for InfiniBand networks to enter more and more usage scenarios.