OCP Global Summit 2024 Series

It’s been a busy conference season, with the AI Hardware and Edge AI Summit, Yotta 2024, and OCP’s Global Summit all taking place in the past month or so. The OCP Global Summit has become a personal favorite of mine; the diversity of presenters and industry verticals is unmatched, along with more focus on technical and engineering talks, rather than sales pitches.

As in previous years, I’ve been reflecting on the conference and poring over the 22 tracks and more than 430 presentations — thousands of hours of content!

Having spent the week prior at Yotta in Vegas, it was great to dig into the engineering and technical depth of the workshops and expo hall, where OCP really shines.

Attendance this year jumped from 4,500 to 7,500, and booths in the expo went from 70 to 100. It was BIG, and while there's plenty that I find interesting, I'm merely scratching the surface on topics and depth.

The themes, like many conferences in the past two years, were AI with generous servings of power, heat, and cooling/sustainability.

Given the wealth of content, I’ve decided to share my reflections across two articles.

In this first part, I’ll share my initial impressions of the keynotes, and in the next, I’ll dive deeper across power challenges, cooling updates, and (for me) highly anticipated interconnectivity news, specifically UALink and UEC (Ultra Ethernet Consortium).


Keynotes Highlights

Meta’s Advancements in AI Infrastructure

Omar Baldonado from Meta kicked things off by sharing their latest AI infrastructure initiatives. Meta is harnessing AMD's MI300X inference systems, built on their Grand Teton platform. They're also introducing a new NIC ASIC in collaboration with Marvell, aiming to boost network performance and efficiency. On the networking front, they're utilizing 51.2 Tbps switches powered by Broadcom's Tomahawk and Cisco technologies, coupled with lossless fabric solutions from Arista Networks and Broadcom's Ramon and Jericho chips (respectively). It's clear that Meta is pushing the envelope to support the massive computational demands of AI workloads, and it's awesome that they continue to share their experiences with the wider OCP community.


NVIDIA’s NVL72 Design Contribution

Ian Buck from NVIDIA took the stage to unveil their NVL72 design contribution. This includes a hefty 1,400-amp bus bar, a blind-mate manifold, and a modular compute tray design. These innovations are geared toward enhancing power delivery and cooling efficiency, critical factors as we scale up AI and high-performance computing systems. The modularity and robustness of their design signal NVIDIA's commitment to solving the practical challenges that come with increasing computational densities.


An Unlikely Alliance: Intel and AMD

In what felt like a moment when hell froze over, Intel and AMD announced the formation of an x86 Ecosystem Advisory Group. This collaboration is particularly intriguing given their historic rivalry. However, with the rising tide of ARM-based chips like NVIDIA's Grace and offerings from Ampere exerting pressure on the x86 stronghold, it seems they've recognized the need to join forces. This alliance could be a strategic move to bolster the x86 ecosystem against the growing momentum of ARM architectures in data centers.


GEICO’s Cloud Repatriation Journey

A real-world, data-driven presentation came from GEICO, who discussed their decision to reverse the "All-In" cloud strategy that began back in 2014. They undertook a thorough assessment of 30,000 instances, accounting for $300 million in spend. By repatriating certain workloads back on-premises, they've managed to save over 50%: a whopping $150 million in cost reductions. GEICO's experience underscores the importance of continually evaluating cloud strategies and finding the right balance between cloud and on-premises solutions to optimize costs and performance. Like I've been saying, and seeing (at least in small numbers), hyperscaler customers are starting to vote with their wallets, and AI has them materially altering their strategies.


Silicon Innovations from Microsoft and Google

Peering behind the curtain, Microsoft and Google revealed a little more about their silicon journeys. Microsoft shared updates on their custom silicon efforts, aiming to enhance performance and efficiency across their vast array of services. Google showcased how they're leveraging robotics within their data centers, highlighting advancements in automation that improve operational efficiency and reduce the potential for human error. Seeing both delve deeper into hardware innovation emphasizes the critical role of silicon optimization in meeting today's computational demands, though no new silicon was announced.


It's clear this is an industry in rapid evolution—driven by efficiency, performance, and scalability in the AI era.


Stay tuned for part two later this week!

Nick Hume

Transformative Digital Infrastructure Executive | Expert in Sustainable AI & Liquid Cooling | Podcast Host | ex-AMZN | ex-MSFT
