Top 10 Advantages of InfiniBand

Serenity H

Published: December 30, 2022

InfiniBand (abbreviated as IB) is a computer network communication standard for high-performance computing that provides extremely high throughput and low latency for computer-to-computer data interconnection.

In the latest Top500 list of the world’s most powerful supercomputers, InfiniBand once again led all supercomputer interconnects in both system count and performance, a significant increase from the previous list. Three trends stand out from this list.

InfiniBand-based supercomputers lead all other network technologies by a wide margin, with 197 systems. InfiniBand is especially dominant in the Top 100 systems and has become the standard interconnect for performance-conscious supercomputers.
NVIDIA networking products are the dominant interconnects in the Top500, used by more than two-thirds of the listed supercomputers, and their performance and technology leadership is widely recognized.
InfiniBand is also widely used beyond traditional HPC, in enterprise-class data centers and public clouds. NVIDIA Selene, the top-performing enterprise supercomputer, and Microsoft’s Azure public cloud both use InfiniBand networks to deliver strong business performance.

Advances in data communication, Internet technology, and visual presentation all depend on more powerful computing, larger and more secure storage, and more efficient networks. An InfiniBand-based cluster architecture provides higher network bandwidth, reduces the computing resources consumed by network transmission, lowers latency, and integrates HPC with the data center.

Why is InfiniBand so highly valued in the Top500? Its performance benefits play a decisive role. NADDOD summarizes the top 10 advantages of InfiniBand as follows.

1. Simple Network Management

InfiniBand is the first network architecture that is truly designed natively for SDN and is managed by a subnet manager.

The subnet manager configures the local subnet and keeps it running. All channel adapters and switches must implement a Subnet Management Agent (SMA) that works with the subnet manager to handle management traffic. Each subnet must have at least one subnet manager for initial setup and for reconfiguring the subnet when a link comes up or goes down. An arbitration mechanism selects one subnet manager as the master, while the others run in standby mode; each standby subnet manager keeps a backup of the subnet’s topology information and verifies that the subnet is operational. If the master subnet manager fails, a standby subnet manager takes over management of the subnet to ensure uninterrupted operation.
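The master/standby behavior described above can be sketched in a few lines. The article does not state the arbitration rule, so this sketch assumes the commonly described one: the highest priority wins, and ties go to the lowest GUID. The `SubnetManager` class and `elect_master` function are hypothetical names used for illustration and are not part of any real subnet manager API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SubnetManager:
    name: str
    priority: int   # assumption: higher priority wins the election
    guid: int       # assumption: lower GUID wins when priorities tie
    alive: bool = True


def elect_master(managers):
    """Return the live subnet manager that should act as master, or None."""
    candidates = [sm for sm in managers if sm.alive]
    if not candidates:
        return None
    # Sort key: highest priority first, then lowest GUID.
    return min(candidates, key=lambda sm: (-sm.priority, sm.guid))


sms = [
    SubnetManager("sm-a", priority=10, guid=0x0002),
    SubnetManager("sm-b", priority=10, guid=0x0001),
    SubnetManager("sm-c", priority=5, guid=0x0003),
]

master = elect_master(sms)
print(master.name)  # sm-b

# Simulate a failure of the master: a standby takes over.
survivors = [replace(sm, alive=False) if sm is master else sm for sm in sms]
standby = elect_master(survivors)
print(standby.name)  # sm-a
```

In a real subnet the standbys would also poll the master and hold a copy of the topology; the sketch only covers the selection and takeover step.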

2. High Bandwidth

Since InfiniBand was introduced, its link rates have long advanced faster than Ethernet’s, mainly because InfiniBand interconnects servers in high-performance computing, which demands higher bandwidth.

The abbreviations for each rate are as follows:

SDR - Single Data Rate
DDR - Double Data Rate
QDR - Quad Data Rate
FDR - Fourteen Data Rate
EDR - Enhanced Data Rate
HDR - High Data Rate
NDR - Next Data Rate
XDR - eXtreme Data Rate
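To put numbers on these generations, the sketch below multiplies a nominal per-lane rate by the lane count of a link. The per-lane figures are rounded nominal values that do not appear in the article and are included here as an assumption; encoding overhead is ignored, and `link_rate_gbps` is a hypothetical helper name.

```python
# Nominal per-lane rates in Gb/s (rounded; an assumption, not from the article).
LANE_RATE_GBPS = {
    "SDR": 2.5,
    "DDR": 5,
    "QDR": 10,
    "FDR": 14,
    "EDR": 25,
    "HDR": 50,
    "NDR": 100,
    "XDR": 200,
}


def link_rate_gbps(generation, lanes=4):
    """Nominal link rate for a generation; 4 lanes (4x) is the common port width."""
    return LANE_RATE_GBPS[generation] * lanes


for generation in LANE_RATE_GBPS:
    print(f"{generation}: {link_rate_gbps(generation):g} Gb/s over a 4x link")
```

With four lanes this gives the familiar port speeds, for example 100 Gb/s for EDR, 200 Gb/s for HDR, and 400 Gb/s for NDR.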

3. CPU Offload

A key technology for accelerated computing is CPU offload, and the InfiniBand network architecture allows data to be transferred with minimal CPU resources, which is accomplished by:

Hardware offload of the entire transport layer protocol stack
Bypass kernel, zero copy
RDMA, which writes data from one server’s memory directly to another’s memory without CPU involvement

GPUDirect technology can also be used to access data in GPU memory directly and transfer it from GPU memory to other nodes. This accelerates computational applications such as AI and deep learning.

4. Low Latency

Latency can be compared in two places: the switch and the NIC.

On the switch, Ethernet is a layer 2 technology in the network transport model, and Ethernet switches generally use MAC table lookup and store-and-forward (some products have borrowed InfiniBand’s cut-through technology). Because they must also handle complex services such as IP, MPLS, and QinQ, Ethernet switches have a long processing pipeline, generally several µs (more than 200 ns with cut-through support), whereas InfiniBand switches do very simple layer 2 processing.

At the NIC level, as mentioned earlier, RDMA lets the NIC forward messages without going through the CPU, which greatly reduces the delay of encapsulating and decapsulating messages. A typical InfiniBand NIC send/receive latency (write, send) is 600 ns, while Ethernet-based TCP or UDP applications see a send/receive latency of about 10 µs, a difference of more than ten times.
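The "more than ten times" figure follows directly from the two NIC latencies quoted above. A quick check, using only those two numbers (the variable names are illustrative):

```python
NS_PER_US = 1_000

ib_nic_latency_ns = 600              # InfiniBand NIC send/receive latency from the text
tcp_udp_latency_ns = 10 * NS_PER_US  # about 10 µs for TCP/UDP over Ethernet, from the text

ratio = tcp_udp_latency_ns / ib_nic_latency_ns
print(f"{ratio:.1f}x")  # 16.7x
```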

5. Scalability and Flexibility

A major advantage of an IB network is that a single subnet can hold up to 48,000 nodes, forming a huge Layer 2 network. Moreover, IB networks do not rely on broadcast mechanisms such as ARP, so they do not generate broadcast storms or waste additional bandwidth.

Multiple IB subnets can also be connected via routers and switches.

IB supports multiple network topologies.

At small scale, a 2-layer fat-tree is recommended. Larger deployments can use a 3-layer fat-tree topology. Above a certain scale, a Dragonfly+ topology can be used to save some cost.
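To see where the 2-layer and 3-layer boundary falls, the sketch below applies the standard capacity formula for a non-blocking fat-tree built from identical switches: with radix k, two layers reach k²/2 end nodes and three layers reach k³/4. The 40-port radix in the example is an assumption chosen for illustration, and `fat_tree_max_nodes` is a hypothetical helper name.

```python
def fat_tree_max_nodes(radix, levels):
    """Maximum end nodes in a non-blocking fat-tree of identical switches.

    Each switch below the top layer uses half its ports downward and half
    upward, which gives 2 * (radix / 2) ** levels end nodes.
    """
    if radix < 2 or radix % 2:
        raise ValueError("radix must be an even number of ports")
    if levels < 1:
        raise ValueError("levels must be at least 1")
    return 2 * (radix // 2) ** levels


# Assumed switch radix for illustration only.
print(fat_tree_max_nodes(40, 2))  # 800
print(fat_tree_max_nodes(40, 3))  # 16000
```

Under this assumption, a cluster of a few hundred nodes fits in two layers, while anything beyond 800 nodes needs a third layer or a different topology.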

6. QoS

How does an IB network provide QoS support if several different applications are running on the same subnet and some of them need higher priority than others?

QoS is the ability to provide different priority services for different applications, users or data flows. High-priority applications can be mapped to different port queues, and messages in the queue can be sent first.

InfiniBand implements QoS using Virtual Lanes (VLs), which are discrete logical communication links that share a physical link. Each physical link can support up to 15 standard virtual lanes and one management lane (VL15).
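A minimal sketch of the idea: each packet is mapped to a virtual lane, and the port sends from higher-priority lanes first. The service-level-to-VL table, the strict priority order, and the `EgressPort` class are simplifying assumptions made for illustration; real VL arbitration is weighted and configured by the subnet manager.

```python
from collections import deque

MANAGEMENT_VL = 15

# Hypothetical mapping: service level 1 is latency-sensitive, 0 is bulk traffic.
SL_TO_VL = {0: 0, 1: 1}

# Assumed strict order: management lane first, then VL1, then VL0.
ARBITRATION_ORDER = [MANAGEMENT_VL, 1, 0]


class EgressPort:
    """One physical link with a separate queue per virtual lane."""

    def __init__(self):
        self.queues = {vl: deque() for vl in ARBITRATION_ORDER}

    def enqueue_data(self, packet, service_level):
        self.queues[SL_TO_VL[service_level]].append(packet)

    def enqueue_management(self, packet):
        self.queues[MANAGEMENT_VL].append(packet)

    def next_packet(self):
        """Return (vl, packet) from the highest-priority non-empty lane."""
        for vl in ARBITRATION_ORDER:
            if self.queues[vl]:
                return vl, self.queues[vl].popleft()
        return None


port = EgressPort()
port.enqueue_data("bulk-1", service_level=0)
port.enqueue_data("mpi-1", service_level=1)
port.enqueue_management("sm-probe")

order = []
while (item := port.next_packet()) is not None:
    order.append(item)
print(order)  # [(15, 'sm-probe'), (1, 'mpi-1'), (0, 'bulk-1')]
```

Although the bulk packet arrived first, it leaves last, because the lanes are independent queues sharing one link.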

7. Network Stability and Resilience

Ideally, the network is very stable and free of failures. But long-running networks inevitably experience some failures. How does InfiniBand handle these failures and recover quickly?

NVIDIA IB solutions provide a mechanism called Self-Healing Networking, a hardware capability that is based on IB switches. Self-Healing Networking allows link failures to be recovered in just 1 millisecond, which is 5000x faster than normal recovery times.

8. Optimized Load Balancing

A very important requirement inside a high-performance data center is how to improve the utilization of the network. One way is using load balancing.

Load balancing is a routing strategy that allows traffic to be sent over multiple available ports.

Adaptive Routing is one such feature that allows traffic to be distributed evenly across switch ports. AR is supported in hardware on the switch and is managed by Adaptive Routing Manager.

When AR is on, the Queue Manager on the switch monitors traffic on all GROUP EXIT ports, equalizes the load on each queue, and directs traffic to underutilized ports. AR supports dynamic load balancing to avoid network congestion and maximize network bandwidth utilization.
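The port-selection step can be illustrated with a greedy sketch: for each new flow, pick the exit port in the group with the shallowest queue. This is a toy model of the behavior described above, not the switch's actual algorithm; `pick_exit_port` and `route_flows` are hypothetical names, and flow sizes stand in for queue occupancy.

```python
def pick_exit_port(queue_depths):
    """Choose the least-loaded port; ties go to the lowest port number."""
    return min(queue_depths, key=lambda port: (queue_depths[port], port))


def route_flows(flow_sizes, ports):
    """Assign each flow to the currently least-loaded port in the group."""
    depths = {port: 0 for port in ports}
    assignment = []
    for size in flow_sizes:
        port = pick_exit_port(depths)
        depths[port] += size
        assignment.append(port)
    return assignment, depths


assignment, depths = route_flows([5, 1, 1, 1, 4], ports=[1, 2, 3])
print(assignment)  # [1, 2, 3, 2, 3]
print(depths)      # {1: 5, 2: 2, 3: 5}
```

A static scheme that always hashed the large flows onto the same port would leave the other ports idle; steering by current load spreads the traffic across the group.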

9. Network Computing - SHARP

IB switches also support the network computing technology, SHARP - Scalable Hierarchical Aggregation and Reduction Protocol.

SHARP is a centrally managed software package that builds on the switch hardware.

SHARP can offload collective communication that would otherwise run on CPUs and GPUs to the switch. This optimizes collective operations, avoids sending the same data between nodes multiple times, and reduces the amount of data that must cross the network. As a result, SHARP can greatly improve the performance of accelerated computing, particularly MPI-based applications such as AI and machine learning.
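The data reduction can be shown with a toy model of a sum across hosts in a two-level tree. Each leaf switch adds up the vectors from its own hosts and forwards a single partial result upward, so the uplinks carry one vector per leaf switch instead of one per host. This simulates the idea only and does not use the real SHARP API; `elementwise_sum` and `in_network_reduce` are hypothetical names.

```python
def elementwise_sum(vectors):
    """Sum equal-length vectors position by position."""
    return [sum(values) for values in zip(*vectors)]


def in_network_reduce(leaf_groups):
    """Reduce host vectors in the switches.

    leaf_groups holds one list of host vectors per leaf switch.
    Returns the final sum and the number of vectors sent on uplinks.
    """
    partials = [elementwise_sum(hosts) for hosts in leaf_groups]
    total = elementwise_sum(partials)
    return total, len(partials)


leaf_groups = [
    [[1, 2], [3, 4]],   # two hosts under leaf switch 0
    [[5, 6], [7, 8]],   # two hosts under leaf switch 1
]
total, uplink_vectors = in_network_reduce(leaf_groups)
host_count = sum(len(hosts) for hosts in leaf_groups)

print(total)                       # [16, 20]
print(uplink_vectors, host_count)  # 2 4
```

Here four host vectors become two on the uplinks; the saving grows with the number of hosts attached to each switch.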

10. Support a Variety of Network Topologies

InfiniBand networks support a wide range of topologies, such as:

Fat Tree
Torus
Dragonfly+
Hypercube
HyperX

Supporting different topologies lets a network meet different needs, such as:

Easy network scaling
Reduced TCO
Maximizing blocking ratio
Minimizing latency
Maximizing transmission distance

InfiniBand, with its unparalleled technical advantages, greatly simplifies high-performance network architecture and reduces latency caused by multi-level architectural hierarchies, providing strong support for the smooth upgrade of access bandwidth for critical computing nodes. The trend is for InfiniBand networks to enter more and more usage scenarios.
