Top 10 Advantages of InfiniBand

InfiniBand (abbreviated as IB) is a computer network communication standard for high-performance computing that provides extremely high throughput and low latency for computer-to-computer data interconnection.

In the latest TOP500 list of the world's most powerful supercomputers, InfiniBand networks once again topped the interconnect rankings, in both absolute numbers and performance, a significant increase over the previous list. Three trends can be read from this list.

  • Supercomputers based on InfiniBand networks lead all other network technologies by a wide margin, with 197 systems. InfiniBand is especially dominant among the Top 100 systems and has become the standard for performance-conscious supercomputers.
  • NVIDIA networking products are the dominant interconnect in the TOP500: more than two-thirds of the systems use NVIDIA networking, and NVIDIA's performance and technology leadership in networking is widely recognized.
  • It is also worth noting that InfiniBand networks are widely used not only in traditional HPC workloads but also in enterprise-class data centers and public clouds. NVIDIA Selene, the top-performing enterprise supercomputer, and Microsoft's Azure public cloud both leverage InfiniBand networks to deliver superb business performance.

Whether it is the evolution of data communication technology, the innovation of Internet technology, or the upgrade of visual presentation, all of it rests on more powerful computing, larger and more secure storage, and more efficient networks. A cluster architecture built on InfiniBand not only provides higher-bandwidth network services, it also reduces the computing resources consumed by the network transport load, lowers latency, and integrates HPC seamlessly with the data center.

Why are InfiniBand networks so highly valued in the TOP500? Their performance benefits play a decisive role. NADDOD summarizes the top 10 advantages of InfiniBand as follows.

1. Simple Network Management

InfiniBand is the first network architecture that is truly designed natively for SDN and is managed by a subnet manager.

The subnet manager configures the local subnet and ensures its continuous operation. All channel adapters and switches must implement a subnet management agent (SMA) that works with the subnet manager to process management traffic. Each subnet needs at least one subnet manager for initial setup and for reconfiguration whenever a link comes up or goes down. An arbitration mechanism selects one subnet manager as the master, while the others run in standby mode; each standby subnet manager keeps a backup copy of the subnet's topology and verifies that the subnet is operational. If the master subnet manager fails, a standby subnet manager takes over, ensuring uninterrupted operation.
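
As a toy illustration of that arbitration logic (the names, priorities, and election rule below are hypothetical; the real procedure is defined by the InfiniBand specification), a minimal sketch in C might look like this:

```c
/* Hypothetical sketch of subnet manager arbitration: the highest-
 * priority SM becomes master, the rest stand by; on master failure
 * a re-run of the election promotes a standby SM. */
#include <stdio.h>

enum sm_state { SM_STANDBY, SM_MASTER };

struct subnet_manager {
    int id;
    int priority;              /* higher value wins the election */
    enum sm_state state;
};

static void arbitrate(struct subnet_manager *sm, int n)
{
    int master = 0;
    for (int i = 1; i < n; i++)
        if (sm[i].priority > sm[master].priority)
            master = i;
    for (int i = 0; i < n; i++)   /* standby SMs keep topology backups */
        sm[i].state = (i == master) ? SM_MASTER : SM_STANDBY;
}

int main(void)
{
    struct subnet_manager sm[] = { {1, 10, SM_STANDBY},
                                   {2, 20, SM_STANDBY},
                                   {3, 15, SM_STANDBY} };
    arbitrate(sm, 3);
    sm[1].priority = -1;          /* simulate failure of the master */
    arbitrate(sm, 3);             /* a standby SM takes over */
    for (int i = 0; i < 3; i++)
        printf("SM %d: %s\n", sm[i].id,
               sm[i].state == SM_MASTER ? "master" : "standby");
    return 0;
}
```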

2. High Bandwidth

Since the birth of InfiniBand, its network data rate has long advanced faster than Ethernet's, mainly because InfiniBand is used for server-to-server interconnects in high-performance computing, where the bandwidth demands are higher.

The abbreviations for each rate generation are as follows (link speeds are the commonly quoted figures for a 4x port):

  • SDR - Single Data Rate (10 Gb/s)
  • DDR - Double Data Rate (20 Gb/s)
  • QDR - Quad Data Rate (40 Gb/s)
  • FDR - Fourteen Data Rate (56 Gb/s)
  • EDR - Enhanced Data Rate (100 Gb/s)
  • HDR - High Data Rate (200 Gb/s)
  • NDR - Next Data Rate (400 Gb/s)
  • XDR - eXtreme Data Rate (800 Gb/s)

3. CPU Offload

A key technology for accelerated computing is CPU offload: the InfiniBand network architecture transfers data with minimal use of CPU resources, which it accomplishes through the following (a minimal code sketch follows the list):

  • Hardware offload of the entire transport layer protocol stack
  • Bypass kernel, zero copy
  • RDMA, which writes data from one server’s memory directly to another’s memory without CPU involvement
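
As a rough illustration of what this looks like at the API level, here is a minimal sketch of an RDMA write using libibverbs. It assumes a connected RC queue pair, a registered memory region, and the peer's buffer address and rkey already exchanged out of band; all of that setup is omitted.

```c
/* Minimal RDMA-write sketch with libibverbs (link with -libverbs).
 * Setup (device, PD, CQ, QP creation and connection) is omitted. */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

int rdma_write(struct ibv_qp *qp, struct ibv_mr *mr, void *local_buf,
               size_t len, uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof(wr));
    wr.opcode              = IBV_WR_RDMA_WRITE; /* remote CPU not involved */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED; /* completion goes to the CQ */
    wr.wr.rdma.remote_addr = remote_addr;       /* peer's virtual address */
    wr.wr.rdma.rkey        = rkey;              /* peer's memory key */

    /* The NIC performs the transfer; the kernel is bypassed entirely. */
    return ibv_post_send(qp, &wr, &bad_wr);
}
```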

GPUDirect technology can also be used to access data in GPU memory directly and transfer it from GPU memory to other nodes, accelerating computational applications such as AI and deep learning.
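
A minimal sketch of how GPUDirect RDMA is typically wired up, assuming a CUDA-capable GPU and the nvidia-peermem kernel module (so the verbs stack can register device memory); error handling is trimmed:

```c
/* GPUDirect RDMA sketch: register GPU memory with the verbs stack so
 * the NIC can DMA to/from it without staging through host memory.
 * Assumes nvidia-peermem is loaded. Link with -lcudart -libverbs. */
#include <cuda_runtime.h>
#include <infiniband/verbs.h>

struct ibv_mr *register_gpu_buffer(struct ibv_pd *pd, size_t len)
{
    void *gpu_buf = NULL;
    if (cudaMalloc(&gpu_buf, len) != cudaSuccess)  /* buffer in GPU memory */
        return NULL;

    /* The GPU pointer is registered like a host pointer; afterwards RDMA
     * reads/writes target GPU memory directly. */
    return ibv_reg_mr(pd, gpu_buf, len,
                      IBV_ACCESS_LOCAL_WRITE |
                      IBV_ACCESS_REMOTE_WRITE |
                      IBV_ACCESS_REMOTE_READ);
}
```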

4. Low Latency

Latency can be compared at two levels. At the switch level: as a Layer 2 device in the network transport model, an Ethernet switch typically uses MAC-table lookup and store-and-forward (some products have borrowed InfiniBand's cut-through technique). Because it must also handle complex services such as IP, MPLS, and QinQ, its processing pipeline is long, generally several microseconds (still above 200 ns even with cut-through support), whereas InfiniBand switches perform very simple Layer 2 processing.

At the NIC level: as mentioned earlier, RDMA lets the NIC forward messages without going through the CPU, which greatly shortens the encapsulation and decapsulation delay. A typical InfiniBand NIC send/receive latency (write, send) is about 600 ns, while the send/receive latency of Ethernet-based TCP/UDP applications is around 10 µs, a difference of more than tenfold.
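
As a back-of-envelope check using the figures above, plus assumed per-hop switch latencies (~200 ns for an InfiniBand hop, ~2 µs for an Ethernet store-and-forward hop; neither is quoted exactly in this article), a three-hop path compares roughly as follows:

```c
/* Rough latency-budget comparison using the numbers quoted in the text:
 * ~600 ns InfiniBand NIC send/receive vs ~10 us for a TCP/UDP stack,
 * plus assumed per-hop switch latencies. Illustrative only. */
#include <stdio.h>

int main(void)
{
    const int hops = 3;                           /* example fat-tree path */
    double ib_us  = (600.0   + hops * 200.0)  / 1000.0;
    double eth_us = (10000.0 + hops * 2000.0) / 1000.0;
    printf("InfiniBand: ~%.1f us, Ethernet TCP/UDP: ~%.1f us (%.0fx)\n",
           ib_us, eth_us, eth_us / ib_us);
    return 0;
}
```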

5. Scalability and Flexibility

A major advantage of the IB network is that a single subnet can scale to around 48,000 nodes in one huge Layer 2 domain. Moreover, IB networks do not rely on broadcast mechanisms such as ARP, so they produce neither broadcast storms nor the extra bandwidth waste that comes with them.

Multiple IB subnets can also be connected via routers and switches.

IB supports multiple network topologies.

When the scale is small, a two-level fat tree is recommended; at larger scale, a three-level fat-tree topology can be used; beyond a certain size, a Dragonfly+ topology can save some cost.
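
For intuition about the scales involved: a non-blocking fat tree built from radix-k switches reaches k²/2 hosts with two levels and k³/4 with three. The radix of 40 below (e.g. a 40-port HDR switch) is only an illustrative assumption.

```c
/* Fat-tree capacity sketch: k^2/2 hosts at two levels, k^3/4 at three,
 * for non-blocking trees built from radix-k switches. */
#include <stdio.h>

int main(void)
{
    const int k = 40;                            /* switch radix (ports) */
    printf("radix %d, 2-level fat tree: %d hosts\n", k, k * k / 2);
    printf("radix %d, 3-level fat tree: %d hosts\n", k, k * k * k / 4);
    return 0;
}
```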

6. QoS

How does an IB network provide QoS support if several different applications are running on the same subnet and some of them need higher priority than others?

QoS is the ability to provide different priority levels of service for different applications, users, or data flows. High-priority applications can be mapped to different port queues, and the messages in those queues are transmitted first.

InfiniBand implements QoS using virtual lanes (VLs): discrete logical communication links that share a single physical link. Each physical link can support up to 15 standard virtual lanes (VL0-VL14) plus one management lane (VL15).
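
From an application's point of view, selecting a priority class amounts to choosing a service level (SL) on the queue pair; the subnet's SL-to-VL mapping tables then translate that SL into a virtual lane. A partial verbs sketch (most of the other attributes required for the RTR transition are elided):

```c
/* Partial sketch: set the InfiniBand service level on an RC queue pair.
 * The SL is carried in every packet and mapped to a virtual lane by the
 * switches' SL-to-VL tables. Other mandatory RTR attributes (path MTU,
 * destination QPN, PSN, ...) are omitted here for brevity. */
#include <infiniband/verbs.h>
#include <string.h>

int set_service_level(struct ibv_qp *qp, uint8_t sl, uint16_t dlid)
{
    struct ibv_qp_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.qp_state         = IBV_QPS_RTR;
    attr.ah_attr.sl       = sl;       /* 0-15: selects the priority class */
    attr.ah_attr.dlid     = dlid;     /* destination local identifier */
    attr.ah_attr.port_num = 1;

    /* Real code must OR in IBV_QP_PATH_MTU, IBV_QP_DEST_QPN, IBV_QP_RQ_PSN,
     * etc., and set the matching fields, for the transition to succeed. */
    return ibv_modify_qp(qp, &attr, IBV_QP_STATE | IBV_QP_AV);
}
```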

7. Network Stability and Resilience

Ideally, the network is very stable and free of failures. But long-running networks inevitably experience some failures. How does InfiniBand handle these failures and recover quickly?

NVIDIA IB solutions provide a mechanism called Self-Healing Networking, a hardware capability built into IB switches. Self-Healing Networking allows a link failure to be recovered in just 1 millisecond, 5,000 times faster than normal recovery times.

8. Optimized Load Balancing

A very important requirement inside a high-performance data center is improving the utilization of the network. One way to do this is load balancing.

Load balancing is a routing strategy that allows traffic to be sent over multiple available ports.

Adaptive Routing (AR) is one such feature: it distributes traffic evenly across switch ports. AR is supported in the switch hardware and managed by the Adaptive Routing Manager.

When AR is enabled, the Queue Manager on the switch monitors the traffic on all group exit ports, equalizes the load across the queues, and steers traffic toward underutilized ports. AR provides dynamic load balancing that avoids network congestion and maximizes network bandwidth utilization.

9. Network Computing - SHARP

IB switches also support in-network computing through SHARP, the Scalable Hierarchical Aggregation and Reduction Protocol.

SHARP is a centrally managed software package that builds on the switch hardware.

SHARP offloads collective (aggregation) communication that would otherwise run on CPUs and GPUs into the switches, optimizing the collectives, avoiding repeated data transfers between nodes, and reducing the amount of data that must cross the network. SHARP can therefore greatly improve the performance of accelerated computing built on MPI applications, such as AI and machine learning, without any change to the application itself, as the sketch below illustrates.
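
From the application side, the collective that SHARP accelerates is an ordinary MPI reduction. When SHARP is enabled in the MPI stack, the summation below is performed inside the switch fabric instead of on the hosts, with no source change:

```c
/* Ordinary MPI all-reduce; with SHARP enabled in the MPI stack the
 * aggregation is offloaded to the switches. Build with mpicc, run
 * with mpirun. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = (double)rank, global = 0.0;
    /* Sum `local` across all ranks into `global` on every rank. */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d ranks = %g\n", size, global);
    MPI_Finalize();
    return 0;
}
```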

10. Support a Variety of Network Topologies

InfiniBand networks support a wide variety of network topologies, such as:

  • Fat Tree
  • Torus
  • Dragonfly+
  • Hypercube
  • HyperX

Supporting different network topologies lets the fabric meet different needs, such as:

  • Easy network scaling
  • Reduced TCO
  • Maximizing blocking ratio
  • Minimizing latency
  • Maximizing transmission distance

InfiniBand, with its unparalleled technical advantages, greatly simplifies high-performance network architecture, reduces the latency introduced by multi-level architectural hierarchies, and provides strong support for smoothly upgrading the access bandwidth of critical computing nodes. The trend is for InfiniBand networks to enter more and more usage scenarios.
