The Gaudi 3 (Intel AI) Cluster is Pretty Neat

Tony Grayson

Defense, Business, and Technology Executive | VADM Stockdale Leadership Award Recipient | Ex-Submarine Captain | LinkedIn Top Voice | Author | Top 10 Datacenter Influencer | Veteran Advocate |

Published: April 17, 2024

AI accelerators such as Intel's Gaudi 3 are crucial for AI training and inference, but their effectiveness depends heavily on the architecture of the clusters they are part of. The Gaudi team's decision to build on Ethernet, enhanced with RDMA through the RoCE protocol, marks a strategic divergence from traditional InfiniBand usage.

Traditionally, InfiniBand has been the go-to choice for building high-performance computing environments because of its low latency and high throughput. Intel's choice of Ethernet with RoCE for the Gaudi 3 accelerators rests on several strategic factors:

Cost-Effectiveness: Ethernet hardware and management tools are less expensive than those for InfiniBand, which can reduce overall deployment costs.
Broader Compatibility and Flexibility: Ethernet is ubiquitous in data centers, and using it allows for greater flexibility in integrating with existing network infrastructures without the need for specialized hardware.
Advanced Ethernet Capabilities: With advances in Ethernet technology, including the introduction of RDMA over Converged Ethernet (RoCE), Ethernet now supports many of the high-performance features traditionally exclusive to InfiniBand, such as low latency and lossless data transfer.

Each Gaudi 3 node is an eight-accelerator system capable of delivering up to 14.7 petaflops at FP8 precision. The nodes connect to the network over OSFP links, which need retimers to handle the doubled link speeds. Each Gaudi 3 accelerator has 24 Ethernet ports, 21 of which are dedicated to a dense, all-to-all network that provides high-bandwidth communication between the accelerators in a node.
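The port counts above imply a simple per-accelerator budget. The sketch below works through that arithmetic. It assumes the 21 all-to-all ports are spread evenly over the seven other accelerators in the node, which the figures suggest but the article does not state outright. The function and variable names are illustrative only.

```python
# Figures quoted in the article.
ACCELERATORS_PER_NODE = 8
PORTS_PER_ACCELERATOR = 24
ALL_TO_ALL_PORTS = 21
NODE_FP8_PETAFLOPS = 14.7


def node_port_budget(accelerators=ACCELERATORS_PER_NODE,
                     ports=PORTS_PER_ACCELERATOR,
                     all_to_all=ALL_TO_ALL_PORTS):
    """Split each accelerator's ports into intra-node and remaining groups.

    Assumption (not stated in the article): the all-to-all ports are
    divided evenly among the other accelerators in the same node.
    """
    peers = accelerators - 1
    links_per_peer, unpaired = divmod(all_to_all, peers)
    remaining = ports - all_to_all
    return {
        "peers": peers,
        "links_per_peer": links_per_peer,
        "unpaired_ports": unpaired,
        "remaining_ports_per_accelerator": remaining,
        "remaining_ports_per_node": remaining * accelerators,
    }


budget = node_port_budget()

# Peak FP8 throughput per accelerator implied by the node-level figure.
per_accelerator_pflops = NODE_FP8_PETAFLOPS / ACCELERATORS_PER_NODE

if __name__ == "__main__":
    for name, value in budget.items():
        print(f"{name}: {value}")
    print(f"per_accelerator_pflops: {per_accelerator_pflops:.2f}")
```

Under that even-split assumption, each pair of accelerators in a node shares three direct links, and the ports not used for the all-to-all mesh are what remain for traffic leaving the node.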

When scaling up, these nodes are grouped into sub-clusters. A typical sub-cluster might consist of sixteen Gaudi 3 nodes. The networking within these sub-clusters employs high-performance switches like Broadcom's Tomahawk 5 StrataXGS, which supports up to 51.2 Tb/sec. Each switch's capacity is divided into two halves: one interfaces directly with the servers at 800 Gb/sec, and the other connects upward to the spine network, ensuring robust scalability and redundancy.
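The half-and-half split of a leaf switch can be turned into port counts with a little arithmetic. The sketch below is a rough illustration: it assumes the uplinks run at the same 800 Gb/sec as the server-facing ports (the article gives a speed only for the server side) and that downlinks are shared evenly across the sixteen nodes. All names are hypothetical.

```python
# Figures quoted in the article, expressed in Gb/sec.
SWITCH_CAPACITY_GBPS = 51_200   # 51.2 Tb/sec
PORT_SPEED_GBPS = 800           # server-facing port speed
NODES_PER_SUB_CLUSTER = 16


def leaf_port_split(capacity_gbps=SWITCH_CAPACITY_GBPS,
                    port_speed_gbps=PORT_SPEED_GBPS):
    """Return (down_ports, up_ports) for a switch split into two halves.

    Assumption: uplinks use the same port speed as the downlinks.
    """
    total_ports = capacity_gbps // port_speed_gbps
    down_ports = total_ports // 2
    up_ports = total_ports - down_ports
    return down_ports, up_ports


down_ports, up_ports = leaf_port_split()

# Assumption: each leaf spreads its downlinks evenly over the nodes.
downlinks_per_node_per_leaf = down_ports // NODES_PER_SUB_CLUSTER

if __name__ == "__main__":
    print(f"down ports: {down_ports}, up ports: {up_ports}")
    print(f"downlinks per node per leaf: {downlinks_per_node_per_leaf}")
```

Equal capacity facing down and up means the leaf layer is not oversubscribed under these assumptions, which fits the article's emphasis on scalability.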

For larger deployments, the network architecture expands into multiple sub-clusters. To scale to 4,096 Gaudi 3 accelerators across 512 server nodes, the design links 32 sub-clusters. This is achieved by interconnecting 96 leaf switches with three banks of sixteen spine switches. The arrangement provides multiple paths for inter-node communication, which is critical for maintaining high availability and consistent performance across large computing tasks.
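The cluster-level numbers can be cross-checked against the per-node and per-sub-cluster figures quoted earlier in the article. The sketch below does only that bookkeeping. The aggregate FP8 figure is a simple peak multiplication, not a measured result, and the variable names are illustrative.

```python
# Figures quoted in the article.
SUB_CLUSTERS = 32
NODES_PER_SUB_CLUSTER = 16
ACCELERATORS_PER_NODE = 8
LEAF_SWITCHES = 96
SPINE_BANKS = 3
SPINES_PER_BANK = 16
NODE_FP8_PETAFLOPS = 14.7

# Derived totals.
nodes = SUB_CLUSTERS * NODES_PER_SUB_CLUSTER
accelerators = nodes * ACCELERATORS_PER_NODE
leaves_per_sub_cluster, leftover_leaves = divmod(LEAF_SWITCHES, SUB_CLUSTERS)
spine_switches = SPINE_BANKS * SPINES_PER_BANK

# Theoretical peak only: node peak multiplied by node count.
peak_fp8_petaflops = nodes * NODE_FP8_PETAFLOPS

if __name__ == "__main__":
    print(f"nodes: {nodes}")
    print(f"accelerators: {accelerators}")
    print(f"leaf switches per sub-cluster: {leaves_per_sub_cluster}")
    print(f"spine switches: {spine_switches}")
    print(f"peak FP8 petaflops: {peak_fp8_petaflops:.1f}")
```

The totals agree with the 512 nodes and 4,096 accelerators given in the text. The three leaf switches per sub-cluster match the three spine banks, which hints that each leaf may uplink to its own bank, although the article does not say so.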

In the context of inference, where rapid response times are crucial, integrating Ethernet with RoCE-based RDMA in Gaudi 3 accelerators increases data throughput and reduces latency, directly affecting the performance of real-time AI applications. This network setup allows efficient data exchange across nodes, which matters for deploying models that require real-time inference, like those used in video analysis and online transaction systems.

Furthermore, the Gaudi 3 has demonstrated significant advantages over Nvidia's H100 in performance comparisons. For instance, in training complex AI models like Llama2 and GPT-3, the Gaudi 3 shows improvements ranging from 1.4X to 1.7X. These gains underscore the effective use of Ethernet in enhancing data flow between nodes, which is critical for tasks that require extensive data sharing, such as training large AI models.

By integrating advanced Ethernet capabilities instead of relying on InfiniBand, Intel's Gaudi 3 AI accelerators reflect a strategic adaptation to modern data centers' evolving demands and infrastructures. This approach ensures compatibility with broader network environments and enhances the cost-effectiveness and scalability of AI operations, paving the way for more widespread adoption and deployment of AI technologies.

Datacenters, Network, and More

5,107 followers

Ken C.

7 months ago

From an article elsewhere: "Intel's Gaudi 3 may be a potentially attractive alternative to the H100 if Intel can hit an ideal price (which Intel has not provided, but an H100 reportedly costs around $30,000–$40,000) and maintain adequate production. AMD also manufactures a competitive range of AI chips, such as the AMD Instinct MI300 Series, that sell for around $10,000–$15,000." https://arstechnica.com/information-technology/2024/04/intels-gaudi-3-ai-accelerator-chip-may-give-nvidias-h100-a-run-for-the-money/ Those prices need to be chopped by AT LEAST an order of magnitude if there is to be any hope of widespread involvement from truly academic researchers (and not just the academic PIs [Principal Investigators] at the top of the grant-funding pile). Absent that, the societal impact of this tech will be decided exclusively in well-funded tech firms. Can anyone think of any adverse consequences of tech giants deciding widespread societal impact?
