To InfiniBand, maybe beyond?
https://www.youtube.com/watch?v=pKXDVsWZmUU


NVIDIA's latest roadmap was teased at Computex in Taiwan last month. Whilst details such as PFLOPS and TDP were a little light for both the GPU and the CPU, we did get some interesting information about the next-gen products.


  • GPU: Rubin (HBM3e to HBM4 memory) - TSMC N3 process
  • CPU: Vera (NVIDIA's 2nd gen ARM processor) - TSMC N3 process
  • Interconnect: NVLink6 (2x performance to 3600 GB/sec)
  • NIC: ConnectX9 (2x speed to 1.6Tb/sec)
  • Switch: Spectrum-X1600 (2x speed to support CX9 NICs)
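
To put those "2x" figures on a common footing, here is a rough sketch of the arithmetic. It only uses the numbers quoted in the list above; the helper names are my own, and the "previous generation" values are simply what a 2x step implies rather than figures taken from the presentation. Note that NVLink is quoted in gigabytes per second while the NIC is quoted in terabits per second.

```python
BITS_PER_BYTE = 8

def implied_previous(new_value: float, multiplier: float = 2.0) -> float:
    """Value a '2x' claim implies for the prior generation (assumption, not a quoted spec)."""
    return new_value / multiplier

def tbps_to_gbytes_per_sec(tbps: float) -> float:
    """Convert terabits per second to gigabytes per second (decimal units)."""
    return tbps * 1000 / BITS_PER_BYTE

nvlink6_gb_s = 3600.0  # GB/sec, from the roadmap list
cx9_tb_s = 1.6         # Tb/sec, from the roadmap list

prev_nvlink_gb_s = implied_previous(nvlink6_gb_s)
prev_nic_tb_s = implied_previous(cx9_tb_s)
cx9_gb_s = tbps_to_gbytes_per_sec(cx9_tb_s)

print(f"Implied previous NVLink: {prev_nvlink_gb_s:.0f} GB/sec")
print(f"Implied previous NIC:    {prev_nic_tb_s:.1f} Tb/sec")
print(f"ConnectX9 in bytes:      {cx9_gb_s:.0f} GB/sec")
print(f"NVLink6 vs ConnectX9:    {nvlink6_gb_s / cx9_gb_s:.0f}x")
```

The unit conversion is the useful part: once both are expressed in GB/sec, the gap between the in-node interconnect and a single network port becomes obvious.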


NVIDIA appear to have moved to a tick-tock approach to releases, something Intel famously developed before their own fabs got stuck on 14nm for 6 years (2016 to 2021).

https://en.wikipedia.org/wiki/Tick%E2%80%93tock_model


Essentially, that means a new architecture every two years, with an improvement (node reduction, memory upgrade, both, or something else) that they are calling Ultra squeezed into the years in between.

  • For Hopper, the H200 didn't get that nomenclature, but it is essentially Hopper-Ultra thanks to its memory improvements (141GB memory and 4.8 TB/sec bandwidth).
  • For Blackwell, Blackwell-Ultra increases the memory stacks from 8Hi to 12Hi, so expect ~50% more memory and another increase in bandwidth.
  • For Rubin, memory moves to HBM4 in 8Hi stacks; Rubin-Ultra increases that to 12Hi, so assume a similar ~50% increase in memory capacity and another increase in bandwidth.
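
The ~50% figure in the bullets above falls straight out of the stack heights. Here is a minimal sketch of that arithmetic, assuming the number of stacks and the capacity of each DRAM layer stay the same between the base and Ultra parts (my assumption, not something confirmed in the roadmap); the function name is hypothetical.

```python
def capacity_increase(old_height: int, new_height: int) -> float:
    """Fractional capacity gain from taller HBM stacks.

    Assumes the stack count and per-layer die capacity are unchanged,
    so capacity scales linearly with the number of layers.
    """
    if old_height <= 0 or new_height <= 0:
        raise ValueError("stack heights must be positive")
    return new_height / old_height - 1.0

# 8Hi -> 12Hi, as described for Blackwell-Ultra and Rubin-Ultra
gain = capacity_increase(8, 12)
print(f"8Hi -> 12Hi capacity gain: {gain:.0%}")
```

If the per-layer die density also changes between generations, the real gain will differ from this simple ratio.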


Now, whilst most (including me!) are looking at Rubin and Vera, I noticed something on the networking side that doesn't appear to have gotten any coverage. Let's look at that switch and network card…

SWITCHES

Ethernet - Spectrum-X800 (2024) @ 400G with BlueField3 DPU

Ethernet - Spectrum-X800 Ultra (2025) @ 800G with ConnectX8 NIC

IB/Ethernet - Spectrum-X1600 (2026) @ 1600G with ConnectX9 NIC

That last one is noteworthy. There's no next-gen successor to Quantum-2 or BlueField3.


Is NVIDIA converging their InfiniBand and Ethernet switches into one, and abandoning BlueField?


NETWORK CARDS

What happened to BlueField-3X and 4?

Another piece of the puzzle is that Jensen's presentation doesn't have a roadmap for the BlueField DPU beyond the current BlueField3, announced at GTC in 2021.

A little light research doesn't yield much for a next-gen BlueField, other than what Wikipedia expects (BlueField-4 @ 800G): https://en.wikipedia.org/wiki/Nvidia_BlueField

However, this slide from Patrick Kennedy at ServeTheHome shows that there are (or were) plans for a BlueField-3X and 4, with the speed pegged at 400G and 'only' improvements to on-device processing.

https://www.servethehome.com/nvidia-shows-dpu-roadmap-combining-arm-cores-gpu-and-networking/

An updated slide from Dec 2023, apparently from NVIDIA and published on Wccftech, pushes BlueField and Quantum updates out to 2H 2024.

https://wccftech.com/nvidia-vera-rubin-next-gen-hpc-ai-gpu-architecture-2025/


Since we're midway through 2024, with an updated presentation sans DPU and dedicated InfiniBand switch, it's possible these have been abandoned for a Spectrum+ConnectX future...


Thanks for staying informed with our latest insights on Infrastructure as a Newsletter. You can also join the conversation on my podcast, Tech Insider, available on YouTube and wherever you get your pods from.


If you have any questions or would like to discuss solutions for your specific project, connect with me directly.

