TL;DR - 2023 OCP Global Summit
https://www.opencompute.org/

Whether you're a startup seeking to lay a strong tech foundation, a mid-size company aiming to optimize infrastructure, or a large corporation lacking a comprehensive technology strategy, my expertise can help. With broad experience as a technology executive, specializing in sustainable digital infrastructure and scaling out operations, Hume Consulting offers advisory services to design technology strategies, diagnose and resolve operational inefficiencies, and use a data-driven approach to add value across your business.

Let's collaborate to bring focus and strategic depth to your technology endeavors.


I was super thankful to attend the Open Compute Project's (OCP) Global Summit in San Jose last year and planned to head down again this year. Unfortunately, a client engagement took priority. However, the great team at OCP has started publishing this year's content online, starting with the kick-off keynote, and there's plenty to be excited about.

I plan to cover topics such as immersion cooling, data center, HPC, and networking, as more workshops, presentations and sessions are released online.


What is OCP?

Open Compute Project is exactly what it says on the tin.

It's a collaboration across the tech community, focused on redesigning technology to support the changing demands on infrastructure. Its roots are at Facebook circa 2009, where a team challenged with supporting an exploding social media platform designed at-scale infrastructure that was more efficient to build and run. This innovation was integral to the founding of the OCP Foundation in 2011.

In a historically proprietary industry, OCP has brought together the heavyweights from across the globe, encouraging openness in collaboration, community and standards. The board has members from Microsoft, Intel, Meta and Google.


2023 Global Summit review

I've embedded the full keynote below; however, there are a few key things that interested me during the presentations.

#INTEL

"AI Training models are 10x YoY" - Zane Ball (Intel)

  • Global DC power usage expected to 2x in 5 years! (2022 to 2027)
  • Intel's Sierra Forest processor, due in 1H 2024, has 288 cores (importantly, efficiency cores, not performance cores) and could power a rack with up to 11,000 cores

My rough math: that is 1 processor per RU (about 38 processors for 11,000 cores) at 400W or more per processor, built on the Intel 3 fabrication process, so about 15kW for just the CPUs in the rack. Adding what is currently standard for air cooling, 1 GPU per RU, brings another 26kW, for a total of more than 40kW.
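
The rough math above can be sketched in a few lines of Python. This is only a back-of-the-envelope check, not anything Intel presented: the core counts come from the keynote, the 400W per processor is my own estimate, and the 700W per GPU is a hypothetical figure chosen because it reproduces the ~26kW GPU number.

```python
# Rack power estimate. Keynote figures: 288 cores per CPU, ~11,000 cores per rack.
# Assumptions (mine, not Intel's): 400 W per CPU, one GPU alongside each CPU,
# and a hypothetical 700 W per GPU, chosen to match the ~26kW figure.
CORES_PER_RACK = 11_000
CORES_PER_CPU = 288
WATTS_PER_CPU = 400
WATTS_PER_GPU = 700


def rack_power_watts(cores_per_rack, cores_per_cpu, watts_per_cpu, watts_per_gpu):
    """Return (cpu_count, cpu_watts, gpu_watts) for one CPU and one GPU per RU."""
    cpu_count = cores_per_rack // cores_per_cpu  # whole processors only
    cpu_watts = cpu_count * watts_per_cpu
    gpu_watts = cpu_count * watts_per_gpu  # one GPU alongside each CPU
    return cpu_count, cpu_watts, gpu_watts


cpus, cpu_w, gpu_w = rack_power_watts(
    CORES_PER_RACK, CORES_PER_CPU, WATTS_PER_CPU, WATTS_PER_GPU
)
print(f"{cpus} CPUs: {cpu_w / 1000:.1f} kW CPU + {gpu_w / 1000:.1f} kW GPU "
      f"= {(cpu_w + gpu_w) / 1000:.1f} kW")
# prints "38 CPUs: 15.2 kW CPU + 26.6 kW GPU = 41.8 kW"
```

Raising either wattage assumption only pushes the rack further past 40kW, which is the point: this is well beyond what most air-cooled racks are designed for.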

I appreciate Zane's (and Intel's) openness when talking about alternate cooling technologies like liquid and immersion. Importantly, in 2021 Intel partnered with Lubrizol to warranty their chips in immersion cooling.

Zane also covers an important topic: the practical implementation of smaller, more targeted AI models ("expert models"), for example Meta's LLaMA 2, to get similar outcomes using a fraction of the compute (and power).

#MARVELL

"Network is the new bottleneck" - Loi Nguyen (Marvell)

  • DCs today are ~32MW | New Builds are 1000MW (1GW)
  • This is the capacity of a typical nuclear power plant (100,000 homes)
  • Campus Regions today are ~1GW | New Builds are multi-GW!
  • AI accelerates bandwidth 6x in 3 years, at least!? 1.6T interconnects are coming.
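
To put the bullets above in perspective, here is a small Python sketch of the scale involved. It assumes bandwidth growth compounds at a constant yearly rate (my simplification, not something stated in the keynote) and treats the new-build figure as 1GW, i.e. 1000MW.

```python
def annual_factor(total_multiple, years):
    """Constant yearly growth factor that compounds to total_multiple over years."""
    return total_multiple ** (1 / years)


# 6x bandwidth in 3 years, assuming steady compound growth (my assumption).
bandwidth_per_year = annual_factor(6, 3)

# ~32MW data centers today versus 1GW (1000MW) new builds.
build_multiple = 1000 / 32

print(f"Bandwidth: ~{bandwidth_per_year:.2f}x per year")
print(f"New build vs today's DC: {build_multiple:.2f}x the power")
```

Under those assumptions, bandwidth needs to grow roughly 1.8x every year, while a single new build draws about 31x the power of a typical data center today.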

Even to produce a 32,000 GPU cluster, there is a minimum of 7:1 oversubscription as there isn't a big enough switch to connect without blocking.

One of my favorite presentations!

#BROADCOM

"The network is the compute" - Ram Velaga (Broadcom)

  • Ethernet is an open standard by default, with a large ecosystem
  • RDMA vs Ultra Ethernet: big improvements like better scaling, selective retransmit, and ease of configuration
  • Basically, every limitation of InfiniBand will be a feature/fix in Ultra Ethernet

Ethernet continues to evolve. Even NVIDIA, who push InfiniBand, are creating their own high-speed (51.2T) Ethernet switch, given customer need.

I'm very excited for Ultra Ethernet - you can find out more from the Ultra Ethernet Consortium.


#PROMERSION

(the immersion project is the) "largest project in Open Compute...is no longer hypothetical" - Rolf Brink (Promersion)

  • Door HX, Cold Plate and Immersion are the three streams of next-gen cooling
  • Even storage is pushing power limits, current OCP Storage chassis at 2kW
  • Liquid, in any form, is here, TODAY
  • Cold plate is becoming commodity, immersion is still emerging

Rolf gives a great overview of the cooling industry and presents a realistic position: no one cooling solution will be the only solution, and all datacenters will be dealing with some form of liquid cooling in the next 5 years.

#META

"...we are very, very far from having a single solution that could work for all of these different kinds of workloads" - Dan Rabinovitsj (Meta)

  • AI is pushing every infrastructure boundary
  • Optimizing infrastructure for 1-2 workloads means compromising others

This is by far my favorite chart during the keynote.

Whilst every chart was a hockey stick, showing growth of spend, power, GPUs, heat, etc., Dan visualized that a one-size-fits-all strategy, at least for AI, is incredibly hard, expensive, and ultimately inefficient.


Wrap-up

OCP is a significant initiative that has reshaped the tech community's approach to infrastructure. Its collaboration and emphasis on open standards have drawn in major industry players, fostering innovation and cooperation.

Intel's focus on AI training models and the growth in global data center power usage are noteworthy trends. Marvell's insights into the challenges of network capacity in data centers are also crucial, and Broadcom's evolution of Ethernet standards is an interesting move. The immersion cooling project's growth and the practicality of different cooling solutions, as presented by Promersion, are important in the context of energy-efficient data centers. Lastly, Meta's emphasis on the complexity of infrastructure optimization for AI workloads is a compelling perspective.


Have questions about digital infrastructure, future trends or AI? I'd love to connect and help.

Rolf Brink

Driving the global growth and adoption of liquid cooling technologies for data centers

1 year ago

Hi Nick Hume, great summary! What I liked a lot was the synergy in all the keynotes and technical presentations. For many years we have been talking about liquid cooling being part of the future. This is the year in which we are discussing it in terms of "NOW". The challenges are real, the work is being done and the solutions are out there being deployed. Not tomorrow, but today! This is how all the keynotes and technical sessions were beautifully tied together this summit. It shows how we are at a paradigm shift and stepping into a whole new era for the datacenter industry. Another thing I really loved was the liquid bar which was pulled together at the last minute by Allison Boen... It was great to catch up with so many on Tuesday evening while enjoying the effects of liquid cooling and some snacks! I'm looking forward to your future coverage on immersion cooling, HPC, and networking as the rest of the summit's content is released. It's professionals like you who bring depth and breadth to these discussions, making them accessible and engaging for the rest of us.

Allison Boen

Immersion Cooling Advisor & Influencer | 30 Years Infrastructure Power & Cooling @ Alcatex | Shell Immersion Cooling Fluid Brand Ambassador | SVP @ DatanovaX | OCP Immersion Community Outreach Lead

1 year ago

Nick Hume I completely agree with Rob Coyle that there is so much to unpack and I personally am still on my "Immersion High" from the content. Starting with the Keynotes - Rolf Brink really brought the excitement for Liquid cooling content, but he followed an equally impressive Keynote by Intel Corporation who mentioned their collaboration with Shell and their Immersion Cooling Fluid! Now that was EPIC in my book. My personal favorite was the Liquid Bazaar event I mentioned in my post and the open mic opportunity to ask questions was standing room only - speaks to the interest around that subject. There is more to glow about... but I agree... An "Immersed" with Allison Boen is coming up on this topic! I love the community around OCP and I was so grateful to meet so many in person that I have literally only met on a video call!

Brian Kinkade

Keeping Computing Cool - Talk to me about Immersion Cooling

1 year ago

Nick, I brought my boss so he could get up-to-speed. Seeing OCP Global Summit through his eyes was awesome. We have a great team of people moving the ball forward....scrum might apply to software but it also applies to rugby too. Glad to be on Team Immersion.

Rob Coyle

Breaking down barriers to sustainability, accessibility, and cost-efficiency... an open approach to data center infrastructure is essential for achieving a more sustainable future.

1 year ago

Thanks Nick! So much to unpack from the event. I am still coming down from the excitement! Not only was this the biggest event from an attendance perspective, but we also had some great announcements that you covered in your article. We also had more hands-on demos in our Experience Center than ever before. It's hard to pick just one or two things to highlight, but we saw lots of work in accelerators, supporting hardware and multiple cooling strategies related to the demand of AI. Any specific news from our event that you are most curious about?
