Vertical Hyper-“Scaling” into AI Dominance
Anyone reading the news on Artificial Intelligence over the last few months has likely noticed a pattern among the large hyperscaler providers (IBM, Microsoft, Amazon, Google, and Meta/Facebook):
Pattern 1 — Large Language Models: these players are both (1) building proprietary large language models, whether through partnership or their own research and development, and (2) open sourcing and/or incorporating open-source models directly into their Platform as a Service offerings.
AND
Pattern 2 - Chip Design: vertically integrating to tackle one of the most costly and scarce resources in the space to date: chips.
All of this serves to preserve lock-in and the continued dominance derived from economies of scale within the tech sector. The lever is cost: reducing the high expense of training AI models and of running ongoing inference against them as the APIs and SDKs tapping into those models are invoked by a plethora of solutions relying on them for value-driven features.
Nvidia certainly dominates the market with its AI-optimized A100 GPUs (NVIDIA A100 | NVIDIA), and as both silicon and fab capacity become scarce, large organizations are taking matters into their own hands by developing proprietary chipsets.
Let's explore the Artificial Intelligence value chain and why vertical integration is necessary.
AI Services & Models
Stating the obvious: AI services are commoditized, both the Conversational Intelligence-era services (speech-to-text, text-to-speech, natural language understanding, etc.) and, eventually, Generative Intelligence-era capabilities such as text-to-image and summarization. I delved into this strong assertion in a previous blog post (mentioned below).
In summary:
Under the economic model of commodities, customers are price sensitive and willing to move quickly to an equivalent substitute. While the former is true, the latter is much harder in practice due to (1) company-wide preferences and/or partnerships, and how they propagate culturally within an organization, (2) data egress fees for moving data out, and (3) overall platform lock-in from proprietary managed versions of specific services.
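To make the egress point concrete, here is a rough back-of-the-envelope sketch; the per-GB rate and data volume are illustrative assumptions, not quoted prices from any provider.

```python
# Back-of-the-envelope estimate of what it costs just to move data off a cloud.
# The egress rate and dataset size below are illustrative assumptions.

EGRESS_RATE_PER_GB = 0.09   # assumed $/GB for internet data transfer out
DATASET_TB = 500            # hypothetical corpus a team wants to migrate

egress_cost = DATASET_TB * 1024 * EGRESS_RATE_PER_GB
print(f"Moving {DATASET_TB} TB off-platform: ~${egress_cost:,.0f} in transfer fees alone")
# ~$46,000 before re-ingest, re-validation, and re-training costs are even counted
```

Even before a single workload is rebuilt on the new cloud, the bill for simply leaving is material, which is exactly the point of the lock-in.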
Furthermore, hyperscalers are tackling the open-source market for LLMs, including broad announcements to host LLaMA 2 on Azure and Google Cloud. Why would these companies host competitor models, proprietary or not, when they offer one of their own? Two primary reasons: (1) cost amortization (more on this later in the piece) and (2) switching cost: offering a popular model adds value for companies already on the platform and entices others. Why move when there is already a contract in place and many workloads already run on that cloud? Switching is hard.
Which leads us to…
Platform(s) as a Service (“PaaS”)
Applications working with Natural Language Processing capabilities, or Artificial Intelligence more broadly, require a tremendous amount of industry and proprietary data to fine-tune, refine, and iterate on the models deployed within applications and solutions. Within the AI realm, there exists an entire ecosystem of tools essential for Data Scientists, Conversational Designers, Engineers, and the like, spanning everything from design and development through deployment and ongoing monitoring.
These platforms offer one-stop-shop destinations for the entire design, development, deployment, and ongoing monitoring of applications. As I argued in “The Commoditization of AI Systems”:
To differentiate here, the source of competitive advantage for this segment is derived from the user experience and tooling as part of vertical and horizontal integration within the space, not from the engines themselves.
The ability to remove friction, shorten time-to-market (or time-to-“hello world”), and drive automation, particularly for complex capabilities, is what drives competitive advantage for hyperscaler clouds. This is why tooling has been an integral, native focus within these offerings. One example is IBM watsonx, which seeks to integrate many of the aforementioned capabilities, targeted at speech scientists, into a single platform experience.
All of these platforms and services rely on underlying hardware, the Infrastructure layer.
Infrastructure/Servers
Hyperscalers came into existence through large initial capital investments in servers. That server capacity accumulated over time as large technology companies needed more of it to develop software and run the daily internal business of serving customers. While virtualization as a concept had been around since the 1970s, it was not until Amazon launched AWS (debuting in 2002, with S3 and EC2 following in 2006) that Cloud Computing took off and began catalyzing the hyperscaler market. Large technology organizations that had historically built this capacity soon realized the value of renting out unused servers on an as-needed basis.
Smaller organizations that could not commit enough capital to acquire a server, or did not need one for a long period of time, could simply rent and pay for usage as needed, a novel idea at the time. On top of that, software came to automate scalability (dynamically increasing and decreasing capacity based on demand), failover and redundancy (ensuring a backup is ready if a datacenter suffers an outage), and even the selection of the specific hardware required to get a particular job done, making the rented model all the more attractive.
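As a minimal sketch of what that scalability automation looks like in principle (the thresholds and function names here are hypothetical, not any provider's actual API):

```python
# Toy autoscaling loop: grow or shrink capacity based on observed load.
# All names and thresholds are hypothetical, for illustration only.

def desired_instances(current: int, cpu_utilization: float,
                      scale_up_at: float = 0.75, scale_down_at: float = 0.25,
                      min_instances: int = 2, max_instances: int = 50) -> int:
    """Return how many instances should be running given current load."""
    if cpu_utilization > scale_up_at:
        target = current + 1          # add capacity under heavy load
    elif cpu_utilization < scale_down_at:
        target = current - 1          # shed capacity (and cost) when idle
    else:
        target = current              # within the comfortable band
    return max(min_instances, min(max_instances, target))

print(desired_instances(current=4, cpu_utilization=0.82))  # -> 5
print(desired_instances(current=4, cpu_utilization=0.10))  # -> 3
```

The renter pays only for what the loop keeps running, while the hyperscaler absorbs the hardware risk, which is precisely why the model took off.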
Referring back to AI services and why hyperscalers would host non-proprietary models on their platforms, the first reason is amortization. Servers are fixed-cost assets with a depreciation schedule and life expectancy, and their cost is amortized across a set of offerings; companies want to spread that cost over time and keep the hardware fully utilized rather than lose money to idle time. So if that compute can be used to run inference jobs on open-source models, why wouldn't companies offer those models on their platform, especially when a managed service layered on top earns additional revenue for reliability and routine upgrades?
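A rough sketch of the amortization math makes the incentive obvious; every figure below is an illustrative assumption, not a real price point.

```python
# Why idle hardware hurts: spread a fixed server cost over its useful life
# and see how the effective hourly cost changes with utilization.
# All numbers are illustrative assumptions.

SERVER_COST = 200_000        # assumed upfront cost of a GPU server
LIFESPAN_YEARS = 4           # assumed depreciation / useful life
HOURS = LIFESPAN_YEARS * 365 * 24

for utilization in (0.25, 0.50, 0.90):
    billable_hours = HOURS * utilization
    print(f"{utilization:.0%} utilized -> ${SERVER_COST / billable_hours:,.2f} per billable hour")

# 25% utilized -> $22.83 per billable hour
# 50% utilized -> $11.42 per billable hour
# 90% utilized -> $6.34 per billable hour
```

Hosting popular open-source models is simply one more way to keep that utilization, and therefore the margin on every billable hour, high.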
This leads to the final question: how can hyperscalers, who already have all of this hardware, continue to drive competitive advantage? The answer is cost advantage, and that is achieved through chips.
Chips
GPUs are specialized hardware devices that can perform parallel computations far faster than general-purpose CPUs. Originally designed for graphics rendering in video games and other visual applications, they have been increasingly used for AI training and inference since the early 2000s. GPUs accelerate the operations and calculations, above all large matrix multiplications, that are essential to the deep learning algorithms underpinning modern Artificial Intelligence systems.
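A minimal sketch of why that parallelism matters, assuming PyTorch is installed; the speedup you actually observe depends entirely on the hardware available.

```python
# Time the same large matrix multiplication (the core operation of deep
# learning) on the CPU and, if one is available, on a GPU.
import time
import torch

def time_matmul(device: str, n: int = 4096) -> float:
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    if device == "cuda":
        torch.cuda.synchronize()       # make sure setup is finished
    start = time.perf_counter()
    result = a @ b                     # the actual work
    if device == "cuda":
        torch.cuda.synchronize()       # wait for the GPU to actually finish
    return time.perf_counter() - start

print(f"CPU: {time_matmul('cpu'):.3f}s")
if torch.cuda.is_available():
    print(f"GPU: {time_matmul('cuda'):.3f}s")
```

The gap between those two numbers, multiplied across the trillions of such operations in a training run, is the entire economic argument for specialized silicon.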
Some notable characteristics that differentiate AI-optimized chipsets from traditional GPUs include lower-precision arithmetic, dedicated matrix (tensor) compute units, and memory and interconnects designed around model workloads. The efficiency payoff is substantial:
An AI chip a thousand times as efficient as a CPU provides an improvement equivalent to 26 years of Moore’s Law-driven CPU improvements. — AI Chips: What They Are and Why They Matter — Center for Security and Emerging Technology (georgetown.edu)
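The arithmetic behind a figure like that is straightforward: a 1,000x efficiency gain is roughly ten doublings (2^10 ≈ 1,024), so the equivalent number of years depends on how quickly you assume CPU performance doubles. The cadences below are assumptions purely to show the sensitivity.

```python
import math

gain = 1_000
doublings = math.log2(gain)             # ~9.97 doublings
for period_years in (1.5, 2.0, 2.6):    # assumed CPU performance-doubling cadences
    print(f"doubling every {period_years} years -> ~{doublings * period_years:.0f} years of progress")
# 1.5 -> ~15 years, 2.0 -> ~20 years, 2.6 -> ~26 years (close to CSET's figure)
```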
Specifically, AI-optimized chips are essential for reducing the cost of training massive models such as Large Language Models (“LLMs”).
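As a rough sketch of why training cost dominates, a commonly cited rule of thumb estimates training compute at roughly 6 FLOPs per parameter per training token; the model size, token count, throughput, and hourly price below are all illustrative assumptions.

```python
# Ballpark the compute bill for training a large model, using the common
# ~6 * parameters * tokens FLOPs rule of thumb. Every number is an assumption.

params = 70e9               # a 70B-parameter model
tokens = 2e12               # 2 trillion training tokens
flops = 6 * params * tokens

chip_flops_per_sec = 300e12 # assumed ~300 TFLOP/s sustained per accelerator
chip_hour_cost = 2.50       # assumed $ per accelerator-hour

chip_hours = flops / chip_flops_per_sec / 3600
print(f"~{flops:.2e} FLOPs -> ~{chip_hours:,.0f} accelerator-hours -> ~${chip_hours * chip_hour_cost:,.0f}")
# Roughly 778,000 accelerator-hours and ~$1.9M under these assumptions
```

Cheaper or more efficient chips move every term in that multiplication in the hyperscaler's favor, which is why they are willing to design their own.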
Historically, a handful of chip makers dominated: Nvidia, AMD, and Intel. Large hyperscalers such as Microsoft, IBM, Amazon, and Google were dependent on the technological advancement of these manufacturers and competed on time-to-market and cost in the “race” for generative AI capabilities. Furthermore, there is an inherent cost to scaling server capacity, all of which requires chipsets to function.
In a world where Artificial Intelligence demands the largest compute resources and puts the most pressure on hyperscaler datacenters, cost is the largest obstacle to overcome. Today these datacenters are limited by the available chipsets and pass those costs on to the customer. That added burden, for additional training and for every subsequent inference against a model, makes it cost-prohibitive for many to enter and/or thrive in the market long term.
In reality, the company that innovates in the chip space and offers specialized, AI-optimized chips will gain the upper hand in the hyperscaler wars, unless tooling and the removal of friction come to matter more than the cost of compute from general players like Nvidia. Dominant positions will first be realized with new entrants into the AI ecosystem, followed by customers for whom the switching cost can be written off. Right now, Google, with its in-house TPUs and the TensorFlow ecosystem, is leading that race, and if AI keeps booming the way 2023 has shaped it to, that will be a major factor in competing with Azure and AWS.
This is an exciting space, and vertical integration makes for an intriguing thought exercise. Stay tuned for more developments in AI!