AI Requirements for Datacenter Networking
Author: Pete Welcher. Coauthor: Brad Gregory.
This blog is a sequel to Brad Gregory's introduction to Typical AI Network Traffic Patterns. Brad's blog covers, at a high level, what AI training and inferencing each need from datacenter network infrastructure. Think of it as an executive summary.
This blog gets more technical, while briefly covering what the major datacenter switch vendors recommend for infrastructure.
The most stringent near-term datacenter demands come from the high performance needed for LLM training. The recommended designs can support inferencing in the datacenter as well, at least in the short term.
The reason I said "in the short term" is that, as Brad notes, future edge inferencing for real-time applications such as AI control and agentic AI may require fast, low-latency WAN connections, per various sources. Other uses of AI models may not have such stringent timing requirements.
Fun fact: Concerning low-latency WAN/edge, the speed of light is a limiting factor. Stock market trading networks shifted to wireless/microwave rather than fiber optic networking because light travels faster through air than through fiber (roughly 294,000 km/sec versus 200,000 km/sec, per a Google search).
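To put rough numbers on that, here is a quick back-of-the-envelope calculation using the speeds above. The 1,000 km path is just an illustrative round number, not any specific trading route:

```python
# One-way propagation delay over an illustrative 1,000 km path.
# Speeds are the approximate figures cited above.
C_FIBER_KM_S = 200_000  # approx. speed of light in fiber, km/sec
C_AIR_KM_S = 294_000    # approx. speed of light in air (microwave), km/sec

distance_km = 1_000
fiber_ms = distance_km / C_FIBER_KM_S * 1_000
air_ms = distance_km / C_AIR_KM_S * 1_000
print(f"fiber: {fiber_ms:.1f} ms, microwave: {air_ms:.1f} ms one-way")
# Output: fiber: 5.0 ms, microwave: 3.4 ms one-way
```

Roughly 1.6 ms saved each way, which is an eternity in high-frequency trading.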
For what it’s worth, Google search does come up with multiple article titles such as “The Future of Inferencing is at the Edge”.
This blog is a bit long, and is structured as follows: Why Build Your Own?, DeepSeek's Impact, Merged Vendor Notes, Per Vendor Highlights, Design Diagrams Galore, Conclusions, and Links.
For your visual amusement, here’s an image ChatGPT/DALL-E generated from the prompt “Create image showing a datacenter network supporting AI training”.
Why Build Your Own?
Buying AI services from a major vendor (cloud or AI vendor) is the fastest way to get started, especially given how costly and time-consuming it can be to get an AI section of a datacenter built. Even more so when you consider that datacenter power and cooling seem to perennially lag demand, as those requirements continue their rapid growth. I've seen enough sparsely populated racks over the years due to such issues, and due to weight considerations too!
So buying AI datacenter capacity may be needed in the short term anyway, while your datacenter buildout takes place.
It may make good sense to fund AI as OpEx in the short term, to get a better handle on your organization’s needs, funding, etc. Aka “stall for better data”.
How big a role do you think AI will play for your organization going forward? Do the executives agree, and will they fund it?
In the long run, it may be less costly to build your own capacity. But it's a heck of a commitment: AI datacenters and networking are NOT cheap, and they take time to build. Although your needs may not require building or acquiring an entire nuclear power plant, as some of the AI and cloud providers are reportedly doing.
DeepSeek’s Impact
DeepSeek has potentially impacted the business case.
The latest I’ve seen says DeepSeek may or may not have fudged the performance specs. I saw some coverage stating why there was suspicion, but haven’t seen anything I regard as definitive. Perhaps because only the Chinese know for sure?
My impression is that DeepSeek still may have been able to get comparable results at significantly lower cost, just not as spectacularly lower as first claimed. (Cf. online threads starting around 2/2/2025.)
If LLMs can be trained much more cheaply, does that make it more attractive for businesses to do AI in-house rather than with a provider (CapEx vs. OpEx)?
It does reduce the barrier to entry and to competition, as articles have noted.
AI-as-a-Service may still have a lot of impact for those who can't or don't want to hire staff for in-house training, or who don't want to fund the infrastructure for in-house training when they only need inferencing.
Did the US AI firms not optimize, viewing the high cost as a barrier to entry for competitors, plus (here's the cynical part) a way to attract larger amounts of capital, thereby funding later work to optimize performance?
There's a thought I had, then saw more or less reflected in someone's comment about DeepSeek. The thought: will AI discover modular, small, targeted LLMs to replace the one giant LLM that does it all? Apologies, I can't find the blog or article I'd read that said that.
Merged Vendor Notes
There were a lot of similarities across vendors as to AI datacenter requirements. The main difference was each vendor touting its own relevant datacenter switches.
Even if you prefer one vendor, reading the literature from the others might still be informative!
Here are the important common themes, broken out into topical areas (a small congestion management sketch follows the list):
- AI/ML infrastructure requirements
- Use of RoCEv2 (RDMA over Converged Ethernet v2)
- Building lossless networks
- Congestion management
- Network visibility and automation
- Network design examples
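To make the congestion management theme a bit more concrete, here is a minimal, illustrative Python sketch of ECN-style marking of the kind RoCEv2 fabrics rely on (DCQCN being the common scheme): when a switch queue crosses configurable depth thresholds, packets get ECN-marked rather than dropped, and senders cut their rate. The thresholds, probabilities, and rate steps below are made-up example numbers, not any vendor's defaults, and the sender logic is deliberately simplified:

```python
import random

# Illustrative WRED/ECN-style marking: below K_MIN, never mark; above K_MAX,
# always mark; in between, mark with linearly rising probability.
# These are made-up example values, not any vendor's defaults.
K_MIN = 100   # queue depth (packets) where marking begins
K_MAX = 400   # queue depth where every packet is marked
P_MAX = 0.2   # marking probability just below K_MAX

def should_mark_ecn(queue_depth: int) -> bool:
    """Return True if this packet should be ECN-marked (CE bit set)."""
    if queue_depth <= K_MIN:
        return False
    if queue_depth >= K_MAX:
        return True
    p = P_MAX * (queue_depth - K_MIN) / (K_MAX - K_MIN)
    return random.random() < p

def sender_rate_update(rate_gbps: float, saw_mark: bool) -> float:
    """Crude DCQCN-flavored reaction: cut rate on a mark, else ramp back up."""
    if saw_mark:
        return rate_gbps * 0.5           # multiplicative decrease on congestion
    return min(rate_gbps + 10.0, 400.0)  # additive increase toward line rate

if __name__ == "__main__":
    rate = 400.0
    for depth in (50, 150, 250, 350, 450, 300, 120):
        marked = should_mark_ecn(depth)
        rate = sender_rate_update(rate, marked)
        print(f"queue={depth:3d}  marked={marked!s:5}  rate={rate:6.1f} Gbps")
```

Real implementations (DCQCN on the NICs, plus PFC as the last-resort pause mechanism on the switches) are considerably more involved, but the mark-then-back-off feedback loop is the core idea.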
Per Vendor Highlights
Vendor-specific congestion avoidance mechanisms were noted above.
Other per-vendor highlights follow.
Arista:
Arista noted that Meta is using Arista for AI deployments.
Noteworthy: Arista recently announced their "distributed switch" technology, which virtually stacks smaller switches. This seems like a very interesting approach, apparently allowing you to grow AI compute clusters, and it does not require stacking cables. (Juniper also has virtual stacking, but with dedicated cabling between stack members.)
See the Arista 7060X6 switch link below for lots of technical details.
See also the AI Network WP (White Paper) link. It loosely sketches out several fabric designs with different switch sizes and scales.
Cisco:
Cisco's writeups were arguably a bit more deeply technical than the others', though all had a good amount of detail. Cisco Nexus 9000 switches support the above features with intelligent buffering and telemetry.
Cisco had the most design detail, going into sample switch models, port counts, and speeds for a couple of redundant non-blocking Clos fabric designs. I call this "fabric ports and bandwidth math", and intend to write a follow-on blog demonstrating it.
HPE:
HPE didn't seem to have much technical depth to offer about high-speed networking for AI. Their marketing literature has some good business-level discussion of general requirements, e.g. quality of data, security, etc. That is, less technical, more management-directed content. I poked around their website some but did not find more technical content; perhaps I just missed it. Their documents ultimately ended up with more of a compute/storage focus, e.g. their "data fabric". That's outside the present document's scope and perhaps not very relevant to AI, other than for storing massive amounts of AI training materials.
Juniper:
I treated Juniper as separate from HPE, since various pundits think the HPE acquisition will fall through.
Juniper had good details re congestion mechanisms, perhaps in a bit more depth than the others.
Juniper slammed InfiniBand, positioning RoCE/Ethernet as the open technology.
They stated that their solution has three foci. Juniper then elaborated on these, mentioning their chipsets, congestion controls, and Apstra blueprints, and noting Apstra's support for Nvidia rail-optimized designs.
I have provided a second Juniper link below, for a document that goes into a lot more detail, including what I call “fabric ports and bandwidth math”, worked out in detail. That does get rather technical, lots of details!
NOTE: Juniper presented a lot of great AI datacenter content at Cloud Field Day 20, where I was a delegate. Their linked documents below cover some of the same material, but the recorded videos go deeper. Highly recommended: follow the link below.
Nokia:
Nokia was unique in supporting both InfiniBand and Ethernet, likely based on their customer base. The others recommend Ethernet as open, simpler, and more flexible. And it's what they have in their inventory! Some also mention InfiniBand in passing.
Nokia on InfiniBand: it is the traditional RDMA transport, but RoCEv2 provides RDMA over Ethernet by carrying the InfiniBand transport headers and payload in Ethernet/UDP frames, with Ultra Ethernet as a potential future transport.
Nokia also mentioned direct GPU-to-GPU fabrics (Nvidia's NVLink, and AMD's similar Infinity Fabric), versus frames going up to a leaf switch and back (higher latency but cheaper).
Nokia also discussed rail-optimized versus Clos topologies.
Nokia advised that technology is evolving fast in this space: collaborate with friends. Also avoid snowflake designs; look for holistically optimized designs.
Nokia went briefly into some of the non-network planning and design topics, including: preparation for buildout, purchase, power, land, cooling, ease of operations, staffing, automation, toolchain, future growth.
Design Diagrams Galore
To their credit, all the vendors got down to brass tacks with topology diagrams, albeit in varying degrees of detail. As noted above, some even showed various size datacenters with specific switch models (Arista, Cisco).
The good news there is that some (Cisco) even did the port counting for how many spine and leaf switches and how many links between them. I call this “port and bandwidth math”.
Since this blog is already getting long, I'm considering covering "port/bandwidth math" in a follow-on blog. The details are important! Meanwhile, the sketch below gives a taste of it.
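Here is a minimal, illustrative Python sketch of the sizing arithmetic for a two-tier non-blocking leaf-spine fabric. The 64-port switch and 1024 GPU figures are made-up example parameters, not any particular vendor's models:

```python
import math

def size_leaf_spine(gpus: int, ports_per_switch: int) -> dict:
    """Size a non-blocking two-tier leaf-spine (Clos) fabric.

    Assumes all switch ports run at the same speed and each leaf splits
    its ports 50/50: half down to GPU NICs, half up to spines, which is
    what makes the fabric non-blocking (1:1 oversubscription).
    """
    down_per_leaf = ports_per_switch // 2           # GPU-facing ports per leaf
    up_per_leaf = ports_per_switch - down_per_leaf  # spine-facing uplinks

    leaves = math.ceil(gpus / down_per_leaf)
    # One uplink from each leaf to each spine, so spines = uplinks per leaf,
    # and each spine must have a port for every leaf.
    spines = up_per_leaf
    if leaves > ports_per_switch:
        raise ValueError("Each spine needs a port per leaf; "
                         "use bigger spines or add a third tier.")
    return {
        "leaves": leaves,
        "spines": spines,
        "fabric_links": leaves * up_per_leaf,
    }

# Example: 1024 GPUs on 64-port switches (say, 64 x 400G).
print(size_leaf_spine(gpus=1024, ports_per_switch=64))
# -> {'leaves': 32, 'spines': 32, 'fabric_links': 1024}
```

Note how fast the link count grows: a non-blocking fabric needs as many leaf-spine links as GPU-facing ports, which is a big part of why AI back-end networks are expensive.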
Conclusions
Traditional networking and switch vendors prefer fabric topologies and high-speed Ethernet switches. Little surprise there. The key point is consistent, simple design that scales up well.
The Cisco Blueprint link below provides extensive discussion of most of the factors mentioned above, along with several fabric design diagrams. The Arista AI Networking document below has some good topology diagrams and discussion.
Latency considerations mean that at most a two-layer spine-leaf network topology should be used, unless tremendous scale is needed, of course.
Where possible, using a single (possibly very large) switch for the back-end network minimizes latency: a single hop between XPUs.
The front end needs to support control/management traffic and communications into the training cluster, unless out-of-band management is used, of course.
Arista’s virtual stacking technology provides an interesting design alternative to using big chassis switches for the spine role.
Nokia is a bit more agnostic regarding datacenter technologies (InfiniBand versus high-speed and Ultra Ethernet), and perhaps favors chip/hardware-centric approaches as a high-performance alternative. Note their historical telecom focus. The Nokia link below is fairly generic, but their presentation deck goes into a lot more good detail, including fabric diagrams.
Concerning InfiniBand versus high-speed Ethernet: personally, I think simplicity is good. The fewer different technologies used, the simpler buildout and operations will be.
Note: For both Arista and Cisco, I did a small experiment after taking summary notes about their documents. ChatGPT did a very good job of succinctly summarizing the key document in both cases. For Brad’s predecessor blog, we saw a ChatGPT summary that got the main points right but was way too verbose.
Links
Note: The first Nokia link above has links to short videos, including ones about topics like RoCEv2 and PFC/ECN for congestion control.
Somewhat Related Links
Miscellany
Reminder: you may want to check back on my articles on LinkedIn to review any comments or comment threads. They can be a quick way to have a discussion, correct me, or share your perspectives on technology.
Hashtags: #PeterWelcher #BradGregory #CCIE1773 #AINetworking #AIDataCenter
FTC disclosure statement: https://www.dhirubhai.net/pulse/ftc-disclosure-statement-peter-welcher-y8wle/
Twitter: @pjwelcher
LinkedIn: Peter Welcher, https://www.dhirubhai.net/in/pjwelcher/
Mastodon: @[email protected]