Vertical Hyper-“Scaling” into AI Dominance

Anyone reading the news on Artificial Intelligence over the last few months should notice a pair of patterns among the large hyperscaler providers (IBM, Microsoft, Amazon, Google, and Facebook):

Pattern 1 - Large Language Models: these players are both (1) building proprietary large language models, whether via partnership or their own research and development, and (2) open-sourcing and/or incorporating open-source models directly into their Platform as a Service.

AND

Pattern 2 - Chip Design: vertically integrating to tackle one of the most costly and scarce resources in the space to date, chips.

All of this serves to preserve lock-in and the continued dominance derived from economies of scale within the tech sector, by reducing the high costs of training AI models and of running ongoing inference on them as the APIs and SDKs tapping into those models are invoked by the plethora of solutions relying on them for value-driven features.

Certainly Nvidia is dominating the market with its AI-optimized A100 GPUs (NVIDIA A100 | NVIDIA), and as silicon and fab capacity become scarce, large organizations are taking matters into their own hands by developing proprietary chipsets.

Let's explore the value chain within Artificial Intelligence and why vertical integration is necessary.

AI Services & Models

Stating the obvious: AI services are commoditized, both Conversational Intelligence-era services (speech-to-text, text-to-speech, natural language understanding, etc.) and, eventually, Generative Intelligence-era capabilities such as text-to-image, summarization, and more. I delved into this strong assertion in a previous blog post (mentioned below).

In summary:

  • These systems compete on the merits of accuracy and performance, constantly pushing the boundaries and forcing a recalculation of which benchmarks are considered “table stakes.” Furthermore, widely available open-source packages offer similar capabilities.
  • Availability of these services is mere table stakes for Platforms as a Service (more on that later) offering AI capabilities, specifically those natural language or conversational in nature.

Under the economic model of commodities, customers are price sensitive and willing to move quickly to an equivalent substitute. While the former part of that statement is true, the latter is much harder due to (1) company-wide preferences and/or partnerships and their associated cultural propagation and use within an organization, (2) data egress fees for moving data out, and (3) overall platform lock-in from proprietary managed versions of specific services.

Furthermore, hyperscalers are tackling the open-source market for LLMs, including broad announcements to host LLaMA 2 on Azure and Google Cloud. Why would these companies host competitor models, whether proprietary or not, when the hosting company offers one of its own? Simple: two primary reasons, (1) cost amortization (more later in the piece) and (2) switching cost, since offering a popular model adds value for companies already on the platform and entices others. Why move when there is already a contract in place and/or many workloads already operate on that cloud? It's hard to switch.

Which leads us to…

Platform(s) as a Service (“PaaS”)

Applications working with Natural Language Processing capabilities, or Artificial Intelligence more broadly, require a tremendous amount of industry and proprietary data to fine-tune, refine, and iterate on the models deployed within applications/solutions. Within the AI realm, there exists an entire ecosystem of tools essential for Data Scientists, Conversational Designers, Engineers, and the like. These services include:

  • Machine Learning Operations (MLOps): an engineering discipline that aims to unify machine learning system development and deployment across the entire lifecycle, from model generation and orchestration to health diagnostics, governance, and business metrics.
  • Data Management: databases (vector, SQL, NoSQL, graph, etc.), notebooks, data lakes and warehouses, and the like, ranging from programmatic to visual. These services simplify data discovery, cleansing, training, and optimization for model creation and utilization.
  • Security and Networking: providing Distributed Denial of Service (DDoS) attack management, endpoint protection, code scanning/linting, health probes, and more.
  • Application Management: operating systems, container images, asynchronous functions, object storage, and more.

These platforms offer one-stop-shop destinations for the entire design, development, deployment, and ongoing monitoring of applications. As I argued in “The Commoditization of AI Systems”:

To differentiate here, the source of competitive advantage for this segment is derived from the user experience and tooling as part of vertical and horizontal integration within the space, not from the engines themselves.

The ability to easily remove friction, shorten time-to-market and/or time-to-“hello world,” and drive automation, specifically for complex capabilities, drives competitive advantage for hyperscaler clouds. This is why tooling has been an integral, native focus within these offerings. One piece of evidence is IBM Watsonx, which seeks to integrate many of the aforementioned capabilities, targeted at speech scientists, into a single platform experience.

All of these platforms and services rely on underlying hardware, the Infrastructure layer.

Infrastructure/Servers

Hyperscalers came into existence based on initial capital investment in servers. This acquisition of server capacity happened over time as large technology companies required more of it to develop software and handle the daily course of business needed to service customers. While virtualization, as a concept, had been around since the 1970s, it was not until Amazon launched AWS, with its core storage and compute services arriving in 2006, that Cloud Computing boomed and started catalyzing the market for hyperscalers. Soon after, the large technology organizations that had historically built this capacity realized the value of renting out unused servers, specifically on an as-needed basis.

Smaller organizations that could not commit enough financial capital to acquire a server, and/or did not require a server for a long period of time, could simply rent one and pay for the usage as needed. This was a novel idea at the time. Furthermore, software functionality helped automate scalability (dynamically increasing and decreasing capacity based on demand), failover and redundancy (ensuring that if a datacenter goes down and there is an outage, a backup is ready), and even the selection of the specific hardware required to get a particular job done.
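To make the scalability point concrete, below is a minimal sketch assuming the AWS SDK for Python (boto3) and an already-created Auto Scaling group; the group and policy names are hypothetical. It registers a target-tracking policy that adds or removes servers to hold average CPU utilization near a target, exactly the kind of automation that made rented capacity practical.

```python
# Hypothetical sketch: attach a target-tracking scaling policy to an existing
# Auto Scaling group so capacity grows and shrinks with demand automatically.
# "inference-fleet" and "hold-cpu-at-60" are illustrative names, not real resources.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="inference-fleet",
    PolicyName="hold-cpu-at-60",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 60.0,  # scale out above ~60% average CPU, scale in below it
    },
)
```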

Referring back to AI services and why hyperscalers would host non-proprietary models on their platforms, the first reason is amortization. Servers are fixed-cost assets with a depreciation schedule and life expectancy, and their cost is amortized across a set of offerings. This means companies can spread cost over time and aim for full utilization of a hardware investment rather than losing money to idle time. Therefore, if that compute power can be used to run inference jobs on open-source models, why wouldn't companies offer those models on their platform, especially when they can layer a managed service on top and charge additional money for service reliability and routine upgrades?
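A back-of-the-envelope sketch of that amortization logic follows; the dollar figure, lifespan, and utilization rates are assumptions chosen purely for illustration, not actual hyperscaler economics. The takeaway is that every additional hour of paid usage, open-source model or not, lowers the effective cost of the hardware.

```python
# Illustrative amortization: spread a fixed server cost over its useful life and
# see how utilization changes the cost of each billable hour. All figures are
# assumptions for illustration only.
SERVER_COST_USD = 200_000      # assumed purchase price of a GPU server
USEFUL_LIFE_YEARS = 4          # assumed depreciation / life expectancy
HOURS_PER_YEAR = 24 * 365

def amortized_cost_per_billable_hour(utilization: float) -> float:
    """Hardware cost attributed to each hour actually sold, at a given utilization."""
    total_hours = USEFUL_LIFE_YEARS * HOURS_PER_YEAR
    return SERVER_COST_USD / (total_hours * utilization)

for utilization in (0.30, 0.60, 0.90):
    cost = amortized_cost_per_billable_hour(utilization)
    print(f"{utilization:.0%} utilized -> ${cost:.2f} per billable hour")
```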

This leads to the final question: how can hyperscalers that already have all of this hardware continue to drive competitive advantage? The answer is cost advantage, and that is achieved through chips.

Chips

GPUs are specialized hardware devices that can perform parallel computations faster than general-purpose CPUs. They were originally designed for graphics rendering in video games and other visual applications, but they have been increasingly used for AI training and inference since the early 2000s. GPUs accelerate the operations, above all large matrix multiplications, that are essential to the deep learning algorithms underpinning modern Artificial Intelligence systems.
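As a minimal sketch of why that parallelism matters, assuming PyTorch is installed: a single large matrix multiplication, the workhorse operation of deep learning, is dispatched in one call and spread across thousands of GPU cores when one is available.

```python
# Minimal sketch: deep learning boils down to large matrix multiplies, which a GPU
# executes across thousands of cores in parallel. Falls back to the CPU if no GPU
# is present, so the script runs either way.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

c = a @ b  # roughly 1.4e11 floating-point operations in a single call
print(f"computed a result of shape {tuple(c.shape)} on: {device}")
```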

Some notable characteristics that differentiate AI-optimized chipsets from general-purpose processors include:

  • Smaller Transistor Size: AI chips incorporate a massive number of smaller transistors, which run faster and consume less energy than larger ones.
  • Compute: AI chips execute a large number of calculations in parallel rather than sequentially, as in CPUs. Additionally, they calculate numbers at low precision in a way that still implements AI algorithms successfully but reduces the number of transistors needed for the same calculation (see the short sketch after this list).
  • Memory Access: AI chips speed up memory access by, for example, storing an entire AI algorithm on a single chip and by using programming languages built specifically to translate AI code efficiently for execution on an AI chip.
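A minimal sketch of that low-precision idea, assuming PyTorch: casting weights from 32-bit to 16-bit floats halves their memory footprint at the cost of a tiny rounding error, which is the trade AI-oriented hardware exploits to do more math per cycle.

```python
# Low-precision sketch: the same weights stored in 32-bit vs 16-bit floats.
# Half precision uses half the memory and introduces only a small rounding error.
import torch

weights = torch.randn(1024, 1024)        # float32: 4 bytes per value
weights_fp16 = weights.half()            # float16: 2 bytes per value

print(f"{weights.element_size()} bytes vs {weights_fp16.element_size()} bytes per weight")
rounding_error = (weights - weights_fp16.float()).abs().max().item()
print(f"largest rounding error introduced by the cast: {rounding_error:.6f}")
```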

An AI chip a thousand times as efficient as a CPU provides an improvement equivalent to 26 years of Moore’s Law-driven CPU improvements. — AI Chips: What They Are and Why They Matter — Center for Security and Emerging Technology (georgetown.edu)

Specifically, AI-optimized chips are essential for reducing the cost of training massive models such as Large Language Models (“LLMs”).

Historically, only a few chip makers dominated the space: Nvidia, AMD, and Intel. Large hyperscalers such as Microsoft, IBM, Amazon, and Google were dependent on the technological advancement of these chip manufacturers and competed on time-to-market and cost in the “race” for generative AI capabilities. Furthermore, there is an inherent cost to scaling server capacity, all of which requires chipsets to function.

In a world where Artificial Intelligence is driving the largest compute demands and putting the most pressure on hyperscaler datacenters, cost is the largest obstacle to overcome. Currently, these datacenters are limited by the available chipsets and pass those costs on to the customer. The additional cost burden on the customer for further training and subsequent inference against a model makes it cost-prohibitive to enter and/or thrive in the market long term.

In reality, the company that innovates in the chip space and offers specialized chips optimized for AI will become more dominant in the hyperscaler wars, unless tooling and the removal of friction come to matter more than the cost of compute from general players like Nvidia. Dominant positions will first be realized by new entrants into the AI ecosystem, followed by those for whom the switching cost can be written off. Right now, Google, with its Tensor Processing Units and the TensorFlow ecosystem built around them, is leading that race, and if AI keeps booming the way 2023 has shaped the field, that will be a major factor in helping it compete with Azure and AWS.

This is an exciting space, and vertical integration makes for an intriguing thought exercise. Stay tuned for more developments in AI!
