PART 2: Why should all companies begin adopting Advanced Computing Architectures?

This is Part 2 of a series on Advanced Computing Architectures. See also: PART 1: What is an Advanced Computing Architecture?

Part 1 presented an overview of Advanced Computing Architectures – i.e. a mix of specialized hardware (GPUs, FPGAs, ASICs, Domain-Specific Architectures) and specialized software for solving complex computational problems.

In this part we'll dive deeper into the ongoing transition that is making Advanced Computing Architectures relevant to every company. Furthermore, we'll see how this transition will be accelerated by the rise and adoption of Generative AI.

1. The rise of complex computational problems

A key reason why a company might need Advanced Computing in the first place is that it has a “complex computational problem” to solve.

For simplicity, let us assume then that computational problems fall into one of two categories:

(i) complex ones, requiring Advanced Computing; and

(ii) non-complex (regular) ones that can be addressed with regular computing tools.

Until recently, “complex problems” were rather niche: modelling of aerodynamic flows for airplanes, weather prediction, drug discovery, new materials discovery, etc. These were problems of interest only to researchers in HPC labs – labs operated only by those who could afford them: universities, governments, and some companies (oil & gas, aerospace, pharma).

This is rapidly changing, as many regular computational problems are becoming more complex, due to several factors:

  • companies and consumers generate increasingly more data
  • new tech developments (latest AI models; new optimization tools etc.) unlock new capabilities which companies want to incorporate in their tech stack
  • problems are becoming multi-faceted, i.e. a comprehensive application for a given problem can require a mix of tools for combinatorial optimization, Monte Carlo simulations, AI and more

Example: To illustrate this change, consider an airline reservation system. Twenty years ago it just needed to handle and manage passenger reservations. Today, an airline reservation system handles much more data and needs robust ML analytics for dynamic pricing and fraud detection. Soon, it will also need an AI agent to handle all client conversations and manage customer service automatically. It should also include detailed simulations of future demand scenarios, manage schedules after disruptive events and, better yet, predict such events (e.g. will a hurricane hit the East Coast or not?).

If 20 years ago a solution that merely managed reservations was sufficient, building today's system is a far more elaborate endeavour. Building such complex systems on the wrong architecture can seriously impact a business's margins.
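The demand-scenario simulations mentioned above can be sketched as a simple Monte Carlo loop. This is a toy illustration only – the normal demand distribution and the capacity and demand figures are made-up assumptions, not airline data:

```python
import random

def simulate_overbooking(n_scenarios=10_000, capacity=180,
                         mean_demand=170, std_demand=15, seed=42):
    """Estimate the probability that demand for a flight exceeds seat capacity.

    Toy model: daily demand is drawn from a normal distribution
    (all parameters here are illustrative assumptions).
    """
    rng = random.Random(seed)
    exceeded = sum(
        1 for _ in range(n_scenarios)
        if rng.gauss(mean_demand, std_demand) > capacity
    )
    return exceeded / n_scenarios

prob = simulate_overbooking()
print(f"P(demand > capacity) ≈ {prob:.2%}")
```

Scaling such a loop from one flight to an entire network, over millions of scenarios, is exactly the kind of workload that pushes a regular system into "complex problem" territory.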

What this ultimately means is that we’ll see more headlines about companies re-thinking their computing infrastructure, with some even moving away from the cloud. A recent example is X/Twitter, which claimed a 60% reduction in costs after building its own on-prem computing architecture.

2. Generative AI is a catalyst for Advanced Computing Architectures

GenAI – Large Language Models in particular – is expensive. Training them from scratch is expensive, and so are fine-tuning and inference. Those costs will drive companies to explore ways to build efficient infrastructures for deploying AI.

Let’s look at some numbers:

  • Training a model: Hugging Face reported costs of approx. $10m to train a new version of their Bloom model. Meta’s LLaMA was trained for 21 days on 2,048 Nvidia A100 GPUs with 80 GB of RAM each. At current Google prices ($3.93 per A100-hour) this works out to: 2,048 GPUs x $3.93 per GPU-hour x 24 hours x 21 days ≈ $4.06m. Bloomberg’s BloombergGPT, meanwhile, cost approx. $2.7m to train.
  • Deploying models (a.k.a. inference) isn’t cheap either: Permutable AI calculated that it costs approx. $1m/year to process 2m articles per day using OpenAI’s models.
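The training-cost estimate above is easy to reproduce as a back-of-the-envelope calculation (the A100 price is the Google Cloud figure quoted in the text; actual cloud prices vary by region and commitment):

```python
# Back-of-the-envelope GPU training cost, using the figures quoted above.
gpus = 2048                # Nvidia A100 80 GB GPUs
price_per_gpu_hour = 3.93  # USD, Google Cloud price quoted in the text
hours_per_day = 24
days = 21

total_cost = gpus * price_per_gpu_hour * hours_per_day * days
print(f"${total_cost:,.0f}")  # → $4,056,515
```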

Costs are not the only reason why companies will look at building custom AI architectures. Companies might also want to:

  • leverage their own proprietary data
  • deploy specialized use cases: some use cases might require low latency, highly technical knowledge or edge deployment that goes beyond the capabilities of out-of-the-box models
  • preserve privacy by keeping their AI on-prem

So what can an Advanced Computing approach offer in the deployment of AI?

Change the underlying hardware infrastructure by leveraging one of several options:

#1 Specialized cloud providers, with services dedicated specifically to AI – offering higher performance & lower costs for training, fine-tuning or deploying models

#2 GPUs: whether in the cloud, on-prem or at the edge, companies can build their stack directly on GPUs

#3 Domain-Specific Chips for AI: chips built specifically for AI – whether for inference only, for training/fine-tuning only, or for both – can offer higher performance and lower prices for deploying models

#4 Low-latency chips: for use cases requiring high-speed performance (say, generating 200+ tokens/second), companies can turn to AI chips designed for low latency

Invest in the right software stack:

#1 Open-source models: with new open-source models released almost every week, companies have a large pool to choose from. Open-source models can help reduce costs and offer more customization on things like bias, prompt engineering, specialized knowledge and privacy.

#2 Fine-tuned models: for industry-specific use cases, companies will want to leverage fine-tuned models that can be trained on proprietary data and be tailored specifically to their clients’ needs.

#3 Multi-model approach: like humans, LLMs tend to be good at specific tasks. Instead of one large model, a more efficient approach might be leveraging several smaller models, each customized for a specific task. A routing solution can divide workloads between those models.
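A routing layer can start as simply as keyword-based dispatch (real routers often use a small classifier or embeddings instead). A minimal sketch – the model names and routing rules here are hypothetical:

```python
# Hypothetical model names and keyword rules, for illustration only.
ROUTES = {
    "code": "small-code-model",
    "sql": "small-code-model",
    "summarize": "fast-summarizer",
    "translate": "multilingual-model",
}
DEFAULT_MODEL = "general-purpose-model"

def route(prompt: str) -> str:
    """Pick a specialized model based on keywords in the prompt."""
    lowered = prompt.lower()
    for keyword, model in ROUTES.items():
        if keyword in lowered:
            return model
    return DEFAULT_MODEL

print(route("Summarize this earnings report"))   # → fast-summarizer
print(route("What is the capital of France?"))   # → general-purpose-model
```

Each route can point to a cheaper, smaller model, reserving the large general-purpose model for requests nothing else handles.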

#4 High-speed programming languages: building your stack using a high-speed programming language like Mojo or Rust can help improve the performance of your application.

All in all, an Advanced Computing Architecture for LLMs means:

#1 lower costs for heavy AI usage

#2 higher performance for complex AI problems

#3 more accuracy for specialized domain deployments (healthcare, finance)

#4 privacy and control

Is such an approach warranted for every LLM deployment? Certainly not. Most deployments will be just fine served via API calls to commercial model providers.

Advanced Computing Architectures for AI will primarily make sense in two scenarios:

#1 Heavy AI usage:

  • large-scale production deployments of complex models
  • heavy AI model usage (requiring many API calls)
  • continuous deployments (i.e. a model that runs 24/7 and performs a lot of tasks)

#2 Specialized use cases:

  • need for high-speed inference
  • specialized domain requiring specific knowledge
  • multimodal deployments (mixing text, image, audio, reasoning)
  • requirement for on-prem/edge deployments


Thoughts and opinions my own.

