Vertical Hyper-“Scaling” into AI Dominance

Anyone reading the news on Artificial Intelligence over the last few months should notice a pair of patterns among the large hyperscaler providers (IBM, Microsoft, Amazon, Google, and Facebook):

Pattern 1 - Large Language Models: these players are both (1) building proprietary large language models, whether via partnership or their own research and development, and (2) open-sourcing and/or incorporating open-source models directly into their Platform as a Service.

AND

Pattern 2 - Chip Design: vertically integrating to tackle one of the most costly and scarce resources in the space to date, chips.

All of this serves to preserve lock-in and the continued dominance derived from economies of scale within the tech sector, by reducing the high costs of training AI models and of running ongoing inference on them as the APIs and SDKs tapping into those models are invoked by the plethora of solutions relying on them for value-driven features.

Certainly Nvidia is dominating the market with its AI-optimized A100 GPUs (NVIDIA A100 | NVIDIA), and as silicon and fab capacity become scarce, large organizations are taking matters into their own hands by developing proprietary chipsets.

Let's explore the value chain within Artificial Intelligence and why vertical integration is necessary.

AI Services & Models

Stating the obvious: AI services are commoditized, both Conversational Intelligence-era services (speech-to-text, text-to-speech, natural language understanding, etc.) and, eventually, Generative Intelligence-era capabilities such as text-to-image, summarization, and more. I delved into this strong assertion in a previous blog post (mentioned below).

In summary:

  • These systems compete on the merits of accuracy and performance, constantly pushing the boundaries and forcing a recalculation of which benchmarks are considered “table stakes.” Furthermore, widely available open-source packages offer similar capabilities.
  • Availability of these services is mere table stakes for Platforms as a Service (more on that later) offering AI capabilities, specifically those natural language or conversational in nature.

Under the economic model of commodities, customers are price sensitive and willing to move quickly to an equivalent substitute. While the former part of that statement is true, the latter is much harder due to (1) company-wide preferences and/or partnerships and their associated cultural propagation and use within an organization, (2) data egress fees for moving data out, and (3) overall platform lock-in from proprietary managed versions of specific services.

Furthermore, hyperscalers are tackling the open-source market for LLMs, including broad announcements to host LLaMA 2 on Azure and Google Cloud. Why would these companies host competitor models, whether proprietary or not, when the hosting company offers one of its own? Simple: two primary reasons, (1) cost amortization (more later in the piece) and (2) switching cost, since offering a popular model adds value for companies already on the platform and entices others. Why move when there is already a contract in place and/or many workloads already operate on that cloud? It's hard to switch.

Which leads us to…

Platform(s) as a Service (“PaaS”)

Applications working with Natural Language Processing capabilities, or Artificial Intelligence more broadly, require a tremendous amount of industry and proprietary data to fine-tune, refine, and iterate on the models deployed within applications/solutions. Within the AI realm, there exists an entire ecosystem of tools essential for Data Scientists, Conversational Designers, Engineers, and the like. These services include:

  • Machine Learning Operations (MLOps): an engineering discipline that aims to unify machine learning system development and deployment across the entire lifecycle, from model generation and orchestration to health diagnostics, governance, and business metrics.
  • Data Management: databases (vector, SQL, NoSQL, graph, etc.), notebooks, data lakes and warehouses, and the like, ranging from programmatic to visual. These services simplify data discovery, cleansing, training, and optimization for model creation and utilization.
  • Security and Networking: providing Distributed Denial of Service (DDoS) attack management, endpoint protection, code scanning/linting, health probes, and more.
  • Application Management: operating systems, container images, asynchronous functions, object storage, and more.

These platforms offer one-stop-shop destinations for the entire design, development, deployment, and ongoing monitoring of applications. As I argued in “The Commoditization of AI Systems”:

To differentiate here, the source of competitive advantage for this segment is derived from the user experience and tooling as part of vertical and horizontal integration within the space, not from the engines themselves.

The ability to easily remove friction, shorten time-to-market and/or time-to-“hello world,” and drive automation, specifically for complex capabilities, drives competitive advantage for hyperscaler clouds. This is why tooling has been an integral, native focus within these offerings. One piece of evidence is IBM Watsonx, which seeks to integrate many of the aforementioned capabilities, targeted at speech scientists, into a single platform experience.

All of these platforms and services rely on underlying hardware, the Infrastructure layer.

Infrastructure/Servers

Hyperscalers came into existence based on initial capital investment in servers. This acquisition of server capacity happened over time as large technology companies required more of it to develop software and handle the daily course of business needed to service customers. While virtualization, as a concept, had been around since the 1970s, it was not until Amazon launched AWS, with its core storage and compute services arriving in 2006, that Cloud Computing boomed and started catalyzing the market for hyperscalers. Soon after, the large technology organizations that had historically built this capacity realized the value of renting out unused servers, specifically on an as-needed basis.

Smaller organizations that could not commit enough financial capital to acquire a server, and/or did not require a server for a long period of time, could simply rent one and pay for the usage as needed. This was a novel idea at the time. Furthermore, software functionality helped automate scalability (dynamically increasing and decreasing capacity based on demand), failover and redundancy (ensuring that if a datacenter goes down and there is an outage, a backup is ready), and even the selection of the specific hardware required to get a particular job done.
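To make the scalability point concrete, below is a minimal sketch assuming the AWS SDK for Python (boto3) and an already-created Auto Scaling group; the group and policy names are hypothetical. It registers a target-tracking policy that adds or removes servers to hold average CPU utilization near a target, exactly the kind of automation that made rented capacity practical.

```python
# Hypothetical sketch: attach a target-tracking scaling policy to an existing
# Auto Scaling group so capacity grows and shrinks with demand automatically.
# "inference-fleet" and "hold-cpu-at-60" are illustrative names, not real resources.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="inference-fleet",
    PolicyName="hold-cpu-at-60",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 60.0,  # scale out above ~60% average CPU, scale in below it
    },
)
```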

Referring back to AI services and why hyperscalers would host non-proprietary models on their platforms, the first reason is amortization. Servers are fixed-cost assets with a depreciation schedule and life expectancy, and their cost is amortized across a set of offerings. This means companies can spread cost over time and aim for full utilization of a hardware investment rather than losing money to idle time. Therefore, if that compute power can be used to run inference jobs on open-source models, why wouldn't companies offer those models on their platform, especially when they can layer a managed service on top and charge additional money for service reliability and routine upgrades?
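A back-of-the-envelope sketch of that amortization logic follows; the dollar figure, lifespan, and utilization rates are assumptions chosen purely for illustration, not actual hyperscaler economics. The takeaway is that every additional hour of paid usage, open-source model or not, lowers the effective cost of the hardware.

```python
# Illustrative amortization: spread a fixed server cost over its useful life and
# see how utilization changes the cost of each billable hour. All figures are
# assumptions for illustration only.
SERVER_COST_USD = 200_000      # assumed purchase price of a GPU server
USEFUL_LIFE_YEARS = 4          # assumed depreciation / life expectancy
HOURS_PER_YEAR = 24 * 365

def amortized_cost_per_billable_hour(utilization: float) -> float:
    """Hardware cost attributed to each hour actually sold, at a given utilization."""
    total_hours = USEFUL_LIFE_YEARS * HOURS_PER_YEAR
    return SERVER_COST_USD / (total_hours * utilization)

for utilization in (0.30, 0.60, 0.90):
    cost = amortized_cost_per_billable_hour(utilization)
    print(f"{utilization:.0%} utilized -> ${cost:.2f} per billable hour")
```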

This leads to the final question: how can hyperscalers that already have all of this hardware continue to drive competitive advantage? The answer is cost advantage, and that is achieved through chips.

Chips

GPUs are specialized hardware devices that can perform parallel computations faster than general-purpose CPUs. They were originally designed for graphics rendering in video games and other visual applications, but they have been increasingly used for AI training and inference since the early 2000s. GPUs accelerate the operations, above all large matrix multiplications, that are essential to the deep learning algorithms underpinning modern Artificial Intelligence systems.
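As a minimal sketch of why that parallelism matters, assuming PyTorch is installed: a single large matrix multiplication, the workhorse operation of deep learning, is dispatched in one call and spread across thousands of GPU cores when one is available.

```python
# Minimal sketch: deep learning boils down to large matrix multiplies, which a GPU
# executes across thousands of cores in parallel. Falls back to the CPU if no GPU
# is present, so the script runs either way.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

c = a @ b  # roughly 1.4e11 floating-point operations in a single call
print(f"computed a result of shape {tuple(c.shape)} on: {device}")
```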

Some notable characteristics that differentiate AI-optimized chipsets from general-purpose processors include:

  • Smaller Transistor Size: AI chips incorporate a massive number of smaller transistors, which run faster and consume less energy than larger ones.
  • Compute: AI chips execute a large number of calculations in parallel rather than sequentially, as in CPUs. Additionally, they calculate numbers at low precision in a way that still implements AI algorithms successfully but reduces the number of transistors needed for the same calculation (see the short sketch after this list).
  • Memory Access: AI chips speed up memory access by, for example, storing an entire AI algorithm on a single chip and by using programming languages built specifically to translate AI code efficiently for execution on an AI chip.
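A minimal sketch of that low-precision idea, assuming PyTorch: casting weights from 32-bit to 16-bit floats halves their memory footprint at the cost of a tiny rounding error, which is the trade AI-oriented hardware exploits to do more math per cycle.

```python
# Low-precision sketch: the same weights stored in 32-bit vs 16-bit floats.
# Half precision uses half the memory and introduces only a small rounding error.
import torch

weights = torch.randn(1024, 1024)        # float32: 4 bytes per value
weights_fp16 = weights.half()            # float16: 2 bytes per value

print(f"{weights.element_size()} bytes vs {weights_fp16.element_size()} bytes per weight")
rounding_error = (weights - weights_fp16.float()).abs().max().item()
print(f"largest rounding error introduced by the cast: {rounding_error:.6f}")
```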

An AI chip a thousand times as efficient as a CPU provides an improvement equivalent to 26 years of Moore’s Law-driven CPU improvements. — AI Chips: What They Are and Why They Matter — Center for Security and Emerging Technology (georgetown.edu)

Specifically, AI-optimized chips are essential for reducing the cost of training massive models such as Large Language Models (“LLMs”).

Historically, only a few chip makers dominated the space: Nvidia, AMD, and Intel. Large hyperscalers such as Microsoft, IBM, Amazon, and Google were dependent on the technological advancement of these chip manufacturers and competed on time-to-market and cost in the “race” for generative AI capabilities. Furthermore, there is an inherent cost to scaling server capacity, all of which requires chipsets to function.

In a world where Artificial Intelligence is driving the largest compute demands and putting the most pressure on hyperscaler datacenters, cost is the largest obstacle to overcome. Currently, these datacenters are limited by the available chipsets and pass those costs on to the customer. The additional cost burden on the customer for further training and subsequent inference against a model makes it cost-prohibitive to enter and/or thrive in the market long term.

In reality, the company that innovates in the chip space and offers specialized chips optimized for AI will become more dominant in the hyperscaler wars, unless tooling and the removal of friction come to matter more than the cost of compute from general players like Nvidia. Dominant positions will first be realized by new entrants into the AI ecosystem, followed by those for whom the switching cost can be written off. Right now, Google, with its Tensor Processing Units and the TensorFlow ecosystem built around them, is leading that race, and if AI keeps booming the way 2023 has shaped the field, that will be a major factor in helping it compete with Azure and AWS.

This is an exciting space, and vertical integration makes for an intriguing thought exercise. Stay tuned for more developments in AI!
