Breaking Big Tech's AI Stranglehold: The Case for Distributed Artificial Intelligence
Microsoft and BlackRock are raising a $30B fund just to build AI data centers. That's more than NASA's entire annual budget, just for buildings to house GPUs.
OpenAI went from a few thousand GPUs for GPT-3 to what analysts estimate is over 25,000 A100s for GPT-4.
Meta is upping the ante with plans to invest a staggering $40 billion in AI infrastructure in 2024 alone, including an $800 million AI-optimized data center in Alabama.
Tesla is taking a unique approach, spending $1 billion on AI infrastructure in Q1 2024 and planning a massive data center at its Giga Texas facility with 50,000 NVIDIA GPUs and 20,000 Tesla HW4 AI computers.
Google Cloud added $2.5B in AI revenue in one quarter.
Every modern AI cluster now demands more power than entire cities. The new standard isn't megawatts - it's gigawatts. Microsoft and OpenAI aren't asking regions about tax breaks anymore; they're asking "can you guarantee us 2-3GW of stable power?" That's enough electricity to power roughly two million American homes.
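A quick back-of-the-envelope check on that figure, assuming the commonly cited average US household consumption of about 10,700 kWh per year (roughly 1.2 kW of continuous draw):

```python
# Back-of-the-envelope: how many average US homes does a 2.5 GW campus displace?
# Assumption: ~10,700 kWh per household per year (the oft-cited US average).
HOUSEHOLD_KWH_PER_YEAR = 10_700
HOURS_PER_YEAR = 8_760

avg_household_draw_kw = HOUSEHOLD_KWH_PER_YEAR / HOURS_PER_YEAR  # ~1.2 kW

campus_gw = 2.5
campus_kw = campus_gw * 1_000_000

homes_equivalent = campus_kw / avg_household_draw_kw
print(f"{homes_equivalent:,.0f} homes")  # ~2 million
```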
NVIDIA controls 90% of AI chips and still can't keep up. The waitlist for H100s stretches several quarters into the future. The heat from these GPU clusters is so intense that companies are forced to build near water sources or in cold climates. Geography has become destiny in AI.
But here's what's not widely known: the public cloud providers own mere basis points of the world's total GPU compute capacity. For perspective, at its peak Ethereum had the equivalent compute power of 10-20 million high-end GPUs - far more than all AI companies combined (h/t Jared Quincy Davis from Foundry). Even today's iPhone 16 Pro has more compute power than some datacenter GPUs. The problem isn't a lack of compute power; it's how we organize it.
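For the curious, here is roughly how such an estimate gets reconstructed - a sketch that assumes a peak Ethereum hashrate of about 1 PH/s on Ethash and around 100 MH/s per high-end consumer GPU, both illustrative round numbers rather than precise measurements:

```python
# Rough reconstruction of the "Ethereum as a GPU fleet" comparison.
# Assumptions (illustrative, not measured): peak Ethash hashrate ~1.0 PH/s,
# and ~100 MH/s per high-end consumer GPU (an RTX 3080-class card).
PEAK_ETH_HASHRATE_MH = 1.0e9   # 1 PH/s expressed in MH/s
GPU_HASHRATE_MH = 100          # MH/s per high-end GPU

gpu_equivalents = PEAK_ETH_HASHRATE_MH / GPU_HASHRATE_MH
print(f"~{gpu_equivalents / 1e6:.0f} million GPU-equivalents")  # ~10 million
```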
The dirty secret of AI infrastructure is its inefficiency. Even the most sophisticated organizations running pre-training workloads achieve less than 80% GPU utilization, sometimes dropping below 50%. They're forced to keep 10-20% of their GPUs as a "healing buffer" because of frequent failures. Modern H100 systems contain over 35,000 components; they aren't just chips, they're entire data centers compressed into boxes, and they fail constantly.
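Putting those figures together shows how little of the purchased capacity is doing useful work at any moment. The buffer and utilization values below are assumptions within the ranges just described, not measurements from any particular cluster:

```python
# How much of a purchased GPU fleet actually does useful training work?
# Illustrative inputs only, chosen within the ranges described above.
total_gpus = 10_000
healing_buffer = 0.15        # 10-20% held back to replace failing nodes
utilization = 0.65           # often below 80%, sometimes below 50%

active_gpus = total_gpus * (1 - healing_buffer)
effective_gpus = active_gpus * utilization

print(f"Purchased: {total_gpus:,}")
print(f"Doing useful work at any moment: {effective_gpus:,.0f}")
# -> roughly 5,500 of 10,000 GPUs; nearly half the capital sits idle or on standby
```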
AI hardware infrastructure is being built the way we built data centers in the 1990s, not the way we build cloud services today. The current model is stuck in what industry experts call the "parking lot business" - forcing companies into rigid three-year GPU reservations instead of true cloud-like elasticity. This creates massive inefficiencies: capital tied up in idle hardware, geographic constraints driven by power requirements, and an inability to scale dynamically with demand.
The environmental cost of this AI arms race is staggering. The heat output is so intense that Microsoft is experimenting with underwater data centers. These mega-facilities aren't just consuming city-scale power - they're reshaping our planet's resources.
The internet's success wasn't built on mega-data centers - it was built on protocols that let millions of computers work together. The same revolution should happen in AI.
Distributed AI
The building blocks for distributed AI already exist. We don't need to invent new technologies; we just need to apply proven approaches in new ways. From privacy-preserving training methods to efficient computing architectures, the technical foundation is ready.
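As one concrete illustration, here is a minimal sketch of federated averaging - the core idea behind privacy-preserving training methods like federated learning, where each participant trains on its own local data and shares only model updates for a coordinator to average. Real systems add secure aggregation, compression, and fault handling; this is a toy version.

```python
import numpy as np

# Minimal federated averaging (FedAvg) sketch: each node fits a model on its
# own private data; only the learned weights ever leave the device.

def local_train(X, y, weights, lr=0.01, steps=100):
    """A few steps of gradient descent on one participant's private data."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_round(global_weights, local_datasets):
    """One round: every node trains locally, the coordinator averages the results."""
    local_weights = [local_train(X, y, global_weights) for X, y in local_datasets]
    return np.mean(local_weights, axis=0)

# Toy setup: three "devices", each with its own slice of data for y = 3x + noise.
rng = np.random.default_rng(0)
datasets = []
for _ in range(3):
    X = rng.normal(size=(50, 1))
    y = 3 * X[:, 0] + rng.normal(scale=0.1, size=50)
    datasets.append((X, y))

weights = np.zeros(1)
for _ in range(20):
    weights = federated_round(weights, datasets)

print(f"Learned weight: {weights[0]:.2f} (generating value: 3.00)")
```

The raw data never leaves each node; only the averaged weights travel across the network.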
The economic case for distributed AI isn't just about democratization - it's about fundamental efficiency gains that make sense even in purely business terms. By breaking free from centralized mega-facilities, we can unlock multiple layers of value.
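To make that concrete, compare the effective price of a useful GPU-hour under a long-term reservation with an elastic, distributed pool. The prices and utilization rates below are illustrative assumptions, not quotes from any provider:

```python
# Effective cost per *useful* GPU-hour = hourly price / utilization.
# All figures below are illustrative assumptions, not vendor pricing.

def cost_per_useful_hour(hourly_price, utilization):
    return hourly_price / utilization

reserved = cost_per_useful_hour(hourly_price=2.00, utilization=0.50)  # 3-year reservation, half idle
elastic = cost_per_useful_hour(hourly_price=2.50, utilization=0.90)   # pay-as-you-go distributed pool

print(f"Reserved cluster: ${reserved:.2f} per useful GPU-hour")   # $4.00
print(f"Distributed pool: ${elastic:.2f} per useful GPU-hour")    # $2.78
```

Even at a higher sticker price per hour, the elastic model comes out ahead once idle time is priced in.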
The Open Source Imperative
The open source movement gave us Linux, which now runs roughly 96% of the world's top million web servers. It gave us Python, which powers most AI development. Now we need open source to set AI infrastructure free.
As Mark Zuckerberg argues, the concentration of AI capability in a few hands may be at least as dangerous as widespread access. Open source helps ensure balanced development and faster security patching, and it eliminates single points of failure. When Meta released Llama 3, they showed that smaller models can rival much larger ones through better training and architecture - the new 8B parameter model performs on par with the previous generation's 70B model.
Three Pillars
Democratized AI requires progress across three fundamental areas:
Mesh networks provide the compute, community infrastructure coordinates it effectively, and open source models ensure everyone can participate in and benefit from AI advancement.
Conclusion
When you use ChatGPT today, you're dependent on OpenAI's servers, their decisions, their pricing, and their policies. But imagine if AI worked more like cryptocurrency networks - where you could choose from thousands of providers, run your own node if you wanted to, and have a voice in the system's governance. Your device could contribute processing power while you sleep, earning credits you use for AI services. Your gaming PC could help train medical AI models during idle times, improving healthcare while generating value for you.
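A toy sketch of what such a contributor node might look like - everything here is hypothetical: the function names, the credit accounting, and the network endpoint are invented for illustration and don't correspond to any real API:

```python
# Hypothetical contributor loop: donate idle cycles, earn credits.
# None of these function names or endpoints come from a real network.

def device_is_idle():
    """Placeholder check: e.g. screen locked, on charger, GPU load low."""
    return True

def fetch_work(network_url):
    """Placeholder: ask the network for a small, verifiable work unit."""
    return {"id": "unit-001", "payload": "..."}

def run_work_unit(unit):
    """Placeholder: run a training shard or inference batch locally."""
    return {"id": unit["id"], "result": "..."}

def submit_result(network_url, result):
    """Placeholder: return the result and receive credits in exchange."""
    return 0.05  # credits earned for this unit

NETWORK = "https://example-compute-network.invalid"  # hypothetical endpoint
credits = 0.0

for _ in range(3):  # process a few work units while the device stays idle
    if not device_is_idle():
        break
    unit = fetch_work(NETWORK)
    result = run_work_unit(unit)
    credits += submit_result(NETWORK, result)

print(f"Credits earned: {credits:.2f}")
```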
Now, consider what it takes to train a competitive language model today: tens of thousands of GPUs, gigawatt-scale power requirements, and hundreds of millions in infrastructure costs. Even well-funded university labs and research institutions can't compete. When a single training run costs more than most universities' annual research budgets, we've created a system where only tech giants can participate in foundational AI research.
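Rough arithmetic makes the barrier obvious. Every figure below is an assumption chosen for illustration; real frontier training runs vary widely:

```python
# Order-of-magnitude cost of a frontier training run.
# Illustrative assumptions: 25,000 GPUs for 90 days at $2 per GPU-hour all-in.
gpus = 25_000
days = 90
price_per_gpu_hour = 2.00  # USD, blended hardware + power + facility

gpu_hours = gpus * days * 24
compute_cost = gpu_hours * price_per_gpu_hour

print(f"{gpu_hours:,.0f} GPU-hours -> ${compute_cost / 1e6:,.0f}M")
# -> 54,000,000 GPU-hours, roughly $108M for a single run,
#    before failed runs, ablations, and the data pipeline around the model
```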
The next wave of AI breakthroughs won't come from building bigger clusters; they'll come from building smarter ones. As we've seen with Meta's Llama 3 and Google's distributed training approaches, the future lies in better architectures, not bigger hardware. The question isn't whether we have enough compute power. Ethereum proved we do.
And we have seen this democratization story before. Bitcoin and Ethereum showed us that millions of people will contribute their computing power to a shared network when given the right incentives. These networks aren't controlled by any single company - they're owned and operated by their communities. The same people who use them also build and maintain them. This radical idea transformed finance, creating a $3 trillion+ ecosystem that operates 24/7 without any central authority.
Ultimately, this isn't just about cheaper AI access. It's about who controls the future of intelligence. When AI systems that impact billions of lives are controlled by a handful of companies, we all become dependent on their judgment and goodwill.
The tools exist. The technology works. The economics make sense.
We just need to build it.