Notable Trends in Next-Generation AI Data Centers

Highlights:

  • The data center colocation sector has attracted remarkable investment in recent years, including in green data centers, and a new breed of companies offering GPU cloud computing is emerging at breathtaking speed and raising substantial funding.
  • Data center availability is at a multi-year low. The price charged by colocation providers rose by an average of 35 percent between 2020 and 2023.
  • Next-generation AI data centers have new requirements for locations, infrastructure designs and operations. Space, power, cooling and connectivity are all major constraints for growing data centers.
  • Accelerated interest in modular, containerized data centers is widespread.
  • A centralized digital infrastructure for AI will give way to decentralized AI infrastructure everywhere. Hence, the network is the computer.

Here, we highlight the most notable market trends in data centers as of October 2024. Starting from Tracxn’s taxonomy of data center infrastructure, let’s discuss what is changing in the landscape.


Credit: Tracxn

Looking at equity funding alone, these are the countries, cities and companies that stand out.

Credit: Tracxn

In the past year, the data center colocation sector has seen incredible funding growth and is the top-funded business model in the data center industry.

Credit: Tracxn

Big Investors and Big Deals

“Hyperscale operators continue to aggressively expand their operations, while both enterprise and consumer-oriented cloud markets keep on growing rapidly. This is driving an ever-increasing need for data center capacity,” said John Dinsdale, a chief analyst at Synergy Research Group. “The level of data center investment required is too much for even the biggest data center operators, causing an influx of new money from external investors. In quick succession ownership of four of the top six US data center operators has changed hands, while the two biggest names in the industry – Equinix and Digital Realty – are increasingly turning to joint ventures to help fund their growth. Over the last 18 months there has been a very notable shift in buyers with private equity investors becoming a lot more active than data center operators.” (Source)

Blackstone and Canada Pension Plan Investment Board (CPP Investments) acquired AirTrunk for $16.11 billion (not yet completed; the transaction is subject to approval from the Australian Foreign Investment Review Board), and Blackstone also acquired QTS for $10 billion. KKR and Global Infrastructure Partners acquired CyrusOne for $15 billion. DigitalBridge and IFM acquired Switch Inc. for $11 billion. Digital Realty acquired Interxion for $8.4 billion and DuPont Fabros Technology for $7.6 billion. Brookfield and Ontario Teachers’ Pension Plan acquired Compass Datacenters for $5.5 billion, and Brookfield also acquired Data4 for $3.8 billion. Equinix acquired Telecity for $3.8 billion and Verizon's data centers for $3.6 billion. EQT Infrastructure acquired EdgeConneX for $2.5 billion. These are just some of the top PE investors making moves in recent years. Data Center Dynamics keeps an updated list here.

Cloud service providers (CSPs) are rapidly constructing state-of-the-art facilities. Because of supply constraints, they are also partnering with colocation providers (known as “colos”) that are similarly expanding their infrastructures.

Beyond the traditional colocation sector, a new breed of companies is emerging at breathtaking speed (see their funding below). They offer high-performance computing (HPC) as a service, or GPU cloud. Some work closely with Nvidia to operate data centers primarily powered by the latest GPUs. CoreWeave is an example. We currently track more than 100 companies in this group, each with a different value proposition.


Credit: Tracxn

The number of data centers in the U.S. has doubled in the last three years. But as McKinsey reported: “Tight supply is already apparent in the market. Prices charged by colocation providers for available data center capacity in the United States fell steadily from 2014 to 2020 in most primary markets but then rose by an average of 35 percent between 2020 and 2023.”

McKinsey’s scenario analysis estimates that “global demand for data center capacity could rise at an annual rate of between 19 and 22 percent from 2023 to 2030 to reach an annual demand of 171 to 219 gigawatts (GW).” Opportunities abound for owners and operators of data centers, companies in data center construction, equipment suppliers, and energy and power supply value chains. But there are new location, design and operational requirements to address.

Credit: McKinsey
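To get a feel for what the McKinsey range quoted above implies, here is a back-of-the-envelope sketch of the compounding involved. It assumes seven compounding periods between 2023 and 2030 and simple fixed-rate growth; the function names are ours, and the implied 2023 base is a derived figure for illustration, not a number taken from McKinsey.

```python
def project_demand(base_gw: float, annual_growth: float, years: int) -> float:
    """Compound a starting demand forward at a fixed annual growth rate."""
    return base_gw * (1.0 + annual_growth) ** years


def implied_base(target_gw: float, annual_growth: float, years: int) -> float:
    """Back out the starting demand implied by a target and a growth rate."""
    return target_gw / (1.0 + annual_growth) ** years


YEARS = 2030 - 2023  # assumption: seven year-over-year compounding periods

# Figures quoted from McKinsey above: (annual growth rate, 2030 demand in GW).
scenarios = {"low": (0.19, 171.0), "high": (0.22, 219.0)}

for name, (rate, target) in scenarios.items():
    multiple = (1.0 + rate) ** YEARS
    base = implied_base(target, rate, YEARS)
    print(f"{name}: demand grows x{multiple:.2f}, implied 2023 base ~{base:.0f} GW")
```

Under these assumptions, demand multiplies roughly 3.4 to 4 times over the period, which is why capacity, power and construction timelines dominate the discussion that follows.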

Data Center Infrastructure Needs to be Re-Designed to Support GenAI

Big-money moves might support the changes needed for next-generation data centers.

The nature of modern advanced AI workloads is transforming where and how data centers are designed and operated. We selected highlights from a whitepaper by Zayo, which shared findings from interviews with its clients.

“Generative AI (GenAI) architectures are fundamentally different and cannot be deployed in traditional data centers. These architectures are divided into two types. They typically reside (today) in the same data center, but they are logically and physically distinct (and might reside in different data centers in the future).

  • Training clusters: built to ingest massive amounts of data and train models, these are akin to an HPC architecture, and are designed for the highest performance. These clusters are essentially the back end of any generative AI model.
  • Inference clusters: these are built to run live data through a trained AI model to make a prediction or solve a task. The architecture of these is more cloud-like but still focused on overall performance. These support the interface users use to ask questions, process data, etc.

Training clusters are causing the most significant shift in data center design because they are just different, and much more intensive. They run a distributed synchronous job, or to put it another way, a single workload spread across every server node in the cluster. Every node in the cluster passes large amounts of data to and from other nodes, and each node crunches data as quickly as possible.

The scale of these clusters is enormous. GPT-3 was trained on a massive cluster with 285,000 CPU cores and 10,000 GPUs. GPT-4 is even larger. Designing an AI training cluster, even for an enterprise or university, requires cramming hundreds or thousands of servers into a space. And these servers are not only new servers, they’re highly specialized with demanding designs. The design of these clusters comes with substantial physical constraints that are already forcing data center designers to rethink how a data center should be built.

The first constraint is size. Data center availability is at a multi-year low. Many data centers don’t have enough physical space to deploy AI infrastructure. To cope with this problem, we’re already seeing accelerated interest in modular, containerized designs that can be easily deployed without the extensive costs and timelines needed for traditional data center construction. Overall, here is a good summary of all current constraints, or how data centers should be upgraded to support GenAI computing.”

Credit: Zayo
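The "distributed synchronous job" in the Zayo excerpt above can be illustrated with a toy simulation. This is a minimal sketch of synchronous data-parallel training in plain Python, not how any real cluster is implemented: the one-parameter model, the data, the learning rate and the `all_reduce_mean` helper are all our own illustrative assumptions, with the helper standing in for the backend network.

```python
import random


def local_gradient(weight: float, shard: list[tuple[float, float]]) -> float:
    """Mean-squared-error gradient for the model y = weight * x on one node's shard."""
    return sum(2.0 * (weight * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values: list[float]) -> list[float]:
    """Stand-in for the backend network: every node receives the same average."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


def train(num_nodes: int = 4, steps: int = 200, lr: float = 0.1) -> list[float]:
    rng = random.Random(0)
    # Toy data for y = 3x, split into one equally sized shard per node.
    data = [(x, 3.0 * x) for x in (rng.uniform(-1.0, 1.0) for _ in range(64))]
    shards = [data[i::num_nodes] for i in range(num_nodes)]
    weights = [0.0] * num_nodes  # each node holds its own replica of the model

    for _ in range(steps):
        # 1. Every node computes a gradient on its own shard.
        grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
        # 2. All nodes exchange gradients and wait for each other (the synchronous part).
        synced = all_reduce_mean(grads)
        # 3. Every node applies the same update, so the replicas never diverge.
        weights = [w - lr * g for w, g in zip(weights, synced)]
    return weights


if __name__ == "__main__":
    print(train())
```

Step 2 is where the network matters: no node can start the next step until the exchange completes, so the slowest link sets the pace for the whole cluster.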

Changes in Data Center Deployments

Where are these AI-capable data centers being deployed? They are less dependent on being near population centers and consume up to 300% more power compared to traditional data centers. From Zayo’s paper:

“Tier 1 markets already are struggling to maintain rates of growth, simply because power isn’t available. There’s a natural shift toward secondary markets that have substantial power generation and distribution capacity.

Based on the interest in modular AI data center containers, colocation companies or enterprises are looking to deploy them near existing data centers, at network PoPs, or in owned commercial space.

Our customers are coming to us with conversations about placing AI infrastructure in and around industrial sites. It might not seem apparent, but both IDC and Amazon think that generative AI will have the most impact across the manufacturing sector.

Generative AI infrastructure will be integrated into smart homes and cities, streaming services, facial recognition technology, and autonomous vehicles. Any generative AI use case that requires low latency to process and react to sensor data in real time will reside at the edge.

Many of these trends are converging into a new, emerging paradigm called distributed AI. As generative AI is beginning to pervade thousands of use cases across millions of users, it’s becoming clear that a huge, centralized digital infrastructure for AI will give way to AI infrastructure everywhere. AI clusters will be close to users, will interact with each other, and will be able to serve all the emerging use cases by being massively parallel, resilient, and built for long-lasting value.”

Change in Data Center Connectivity

“Backend network demands are a new limitation. To meet the need for speed and data sharing at scale, the servers in AI training clusters are connected to a massive backend network. Most AI training clusters use InfiniBand for their backend networks because of its very high throughput (up to 400 Gbps) and low latency. This requirement for a performant back-end network, connecting each server, means that cabling is another constraint.”

Beyond cabling, performant connectivity among the components and computing assets within a server (scale-up) and between servers (scale-out) is crucial, since time spent in networking is a major factor in the total time required for AI workloads. Broadcom illustrates scale-up and scale-out in the picture here (more on that in a future article).


Credit: Broadcom
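To show why networking time weighs so heavily on total job time, here is a rough estimate of one gradient exchange. It is a sketch under stated assumptions: it uses the textbook bandwidth-only cost model for a ring all-reduce (an algorithm the sources above do not name), ignores latency, congestion and any overlap of compute with communication, and uses a hypothetical 10 GB payload and 8 nodes. Only the 400 Gbps link speed comes from the text above.

```python
def ring_all_reduce_seconds(payload_bytes: float, num_nodes: int, link_gbps: float) -> float:
    """Bandwidth-only estimate for one ring all-reduce; ignores latency and congestion."""
    if num_nodes < 2:
        return 0.0
    link_bytes_per_s = link_gbps * 1e9 / 8.0
    # In a ring all-reduce each node sends (and receives) 2 * (N - 1) / N of the payload.
    traffic_per_node = 2.0 * (num_nodes - 1) / num_nodes * payload_bytes
    return traffic_per_node / link_bytes_per_s


def network_share(compute_s: float, comm_s: float) -> float:
    """Fraction of a step spent on the network when compute and communication do not overlap."""
    return comm_s / (compute_s + comm_s)


if __name__ == "__main__":
    # Hypothetical workload: 10 GB of gradients exchanged across 8 nodes per step.
    comm = ring_all_reduce_seconds(10e9, num_nodes=8, link_gbps=400.0)
    print(f"communication per step: {comm:.2f} s")
    print(f"network share with 0.65 s of compute: {network_share(0.65, comm):.0%}")
```

Halving the link speed doubles the communication term in this model, which is one way to see why backend bandwidth is treated as a first-class design constraint.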

We can anticipate major shifts in data center connectivity as inference and training clusters become geographically dispersed, because the former prioritizes minimizing latency while the latter prioritizes power availability.

From Zayo’s paper: “A huge spike in demand for network infrastructure will emerge and we will begin to see more data center-to-data center, and more data center-to-cloud traffic as generative AI moves from pilot to production.”

CBRE also said: “AI usage and identification of new use cases grew tremendously in H1 2023.... It is revolutionizing network requirements, performance capabilities, and new enterprise use cases such as predictive analytics. This will sharply increase network infrastructure demand, which is critical for transporting high data volumes between locations and interconnection systems.”

As Broadcom said, “The network is the computer!” The many innovations in connectivity for AI computing deserve an article of their own, so stay tuned for that.

The data center sector is full of big players. The opportunity is huge and the stakes are very high, so where are the opportunities for startups? We’re tracking hundreds of companies to stay on top of that.

Let us know your investing interests, your criteria, and the data and signals you look for in companies. You are also welcome to recommend portfolio companies that should be on our radar for our investor network.

Apply to Join Global League Club - We collect and clean data/signals, so you focus on deal-making.

Our previous newsletters can be read here on LinkedIn.

