Scale to Zero for AI Workloads

AI workloads are expensive regardless of how they are implemented. Third-party APIs with significant markups. Cloud-prem SaaS companies with hefty management fees. And the total cost of ownership for either renting GPUs from AWS or buying directly.

In the short term, there’s a significant arbitrage with a great DevOps team — namely, how do you scale expensive workloads to zero when they aren’t in use? Or right-size them accordingly as the load increases or decreases? Doing this can flip unprofitable unit economics (or provide more efficiency to the money a startup has raised).

An obvious objection: we already have serverless environments that scale to zero — Google Cloud Run, AWS Lambda, WebAssembly runtimes, and more. The problem is that these runtimes are explicitly tuned for generic workloads and aren’t made for specialized hardware (read: GPUs).

There are two elements to “scale to zero” for GPU-bound workloads.

First, the actual machines. On AWS, this would be autoscaling groups (ASGs). As CPU load increases (or whatever metric you’re measuring), the ASG scales up instances (virtual machines). But ASGs on their own are rarely sufficient to scale to zero. You are also probably bin-packing multiple workloads onto expensive GPU-powered machines — maybe running different models at different times or rolling out new versions of models. For this, you probably want to deploy with a different primitive than raw machine images, something like a container. And for that, you need Kubernetes.
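As a rough sketch of the machine-level half, here’s the kind of desired-capacity calculation an ASG scaling policy automates for you. The function, metric, and thresholds are illustrative assumptions, not AWS APIs — the point is that with a minimum of zero, an idle fleet collapses entirely:

```python
import math

def desired_instances(requests_in_flight: int, capacity_per_instance: int,
                      min_instances: int = 0, max_instances: int = 8) -> int:
    """Compute how many GPU instances the fleet should run for the current load.

    With min_instances=0, the fleet scales all the way down when idle --
    the property a scale-to-zero ASG policy gives you.
    """
    if requests_in_flight == 0:
        return min_instances  # idle: scale to zero
    # Otherwise, right-size: enough instances to cover in-flight requests,
    # clamped to the configured floor and ceiling.
    needed = math.ceil(requests_in_flight / capacity_per_instance)
    return max(min_instances, min(needed, max_instances))

print(desired_instances(0, 4))    # → 0 (idle fleet scales to zero)
print(desired_instances(10, 4))   # → 3 (right-sized to the load)
```

In practice the “requests in flight” signal would come from a metric like queue depth or GPU utilization rather than a function argument.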

The second scale-to-zero mechanism is scaling the actual workload (the pods, deployments, etc.). There’s not really a great way to do this today. Most organizations have built their own hacks. Knative provides the machinery but can be challenging to deploy and manage and comes with its own heavyweight dependencies (like Istio). The high-level workflow is this: queue up the requests and launch a new deployment if an endpoint is unavailable.
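That queue-and-activate workflow can be sketched in a few lines. This is a simplified in-process model — the `scale_up` and `is_ready` callables stand in for whatever your proxy actually does against Kubernetes (e.g., patching a Deployment’s replica count and polling a readiness probe); it is not Knative’s implementation:

```python
from collections import deque

class ScaleToZeroProxy:
    """Buffer requests while the backend is scaled to zero; flush once it's up."""

    def __init__(self, scale_up, is_ready):
        self.scale_up = scale_up  # stand-in: e.g., set Deployment replicas to 1
        self.is_ready = is_ready  # stand-in: e.g., poll the endpoint's readiness
        self.queue = deque()

    def handle(self, request):
        if self.is_ready():
            return f"served:{request}"
        # Endpoint unavailable: queue the request and trigger a scale-up.
        self.queue.append(request)
        self.scale_up()
        return "queued"

    def drain(self):
        """Once the backend reports ready, replay the buffered requests."""
        served = []
        while self.queue and self.is_ready():
            served.append(f"served:{self.queue.popleft()}")
        return served

# Simulated backend state: zero replicas until the proxy scales it up.
state = {"replicas": 0}
proxy = ScaleToZeroProxy(
    scale_up=lambda: state.update(replicas=1),
    is_ready=lambda: state["replicas"] > 0,
)

print(proxy.handle("req-1"))  # → queued (backend was at zero; scale-up fired)
print(proxy.drain())          # → ['served:req-1']
print(proxy.handle("req-2"))  # → served:req-2 (backend now warm)
```

The hard parts this sketch elides — request timeouts while the model cold-starts, deciding when to scale back down, and doing all of this per-revision — are exactly what Knative’s activator and autoscaler handle, at the cost of its heavier footprint.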

Scale-to-zero will probably be necessary in the near term as organizations need to deploy either (1) on-prem models for data security or (2) custom models or infrastructure to serve a particular use case.


Originally posted on https://matt-rickard.com/scale-to-zero-for-ai-workloads
