AI: Essential Considerations for Hosting Your Own Models

Artificial Intelligence (AI) has become a pervasive buzzword in the industry, evoking a variety of reactions, including an amusing supercut of Sundar Pichai at Google's 2023 I/O...

AI demands high-powered compute, which necessitates effective cooling, with immersion cooling being a popular choice. However, this article will focus on other aspects of AI.

Given the immense hype surrounding AI, it's essential to clarify that AI covers a broad spectrum. It is sometimes used interchangeably with Machine Learning and Visual Computing, which are their own distinct areas. Machine Learning encompasses applications like Amazon Alexa, self-driving cars, and email spam filtering, while Visual Computing includes Metaverse applications and Digital Twins like NVIDIA's Earth 2.


When people mention AI, they are usually referring to Large Language Models (LLMs), with ChatGPT and LLaMA being two prominent examples. I won't delve into their pros and cons or the open/closed source distinction, but will simply use them as reference points.


At a high level, people often seek answers to questions like:

  • How do I host my own PrivateGPT?
  • What's the difference between training and inferencing?
  • Do I need to upgrade my network?
  • Can I use my existing CPUs?

Let's delve into each of these areas.


How do I host my own PrivateGPT?

Many individuals are interested in creating their own ChatGPT with private access and data sovereignty. Let's take the example of the largest GPT-3.5 model, gpt-3.5-turbo, which was trained on approximately 17TB of text data and contains around 175 billion parameters. However, running such a model can be expensive, as estimated by Tom Goldstein in December 2022.

Although model optimizations and faster hardware (such as the H100) have arrived since then, it's essential to consider the cost of running large models. Additionally, newer models like GPT-4, reportedly with roughly 10x more parameters (~1.7 trillion) and more training data, are continuously emerging.



For those building their own AI capabilities, using a pre-trained, open-source model like Meta's LLaMA 2 could be a cost-effective option. Although not an exact comparison, applying Tom's estimation logic, a model like LLaMA 2 at roughly 40% of ChatGPT's size, serving 1% of its daily queries (e.g., 100k), might cost around $400 per day on Azure, assuming the same $3 per GPU-hour rate from December 2022. A rough version of that arithmetic is sketched below.
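
As a minimal back-of-envelope sketch of that estimate (the per-query GPU time and prices here are illustrative assumptions, not measured benchmarks):

```python
# Back-of-envelope daily inference cost, in the spirit of Tom Goldstein's estimate.
# All inputs are illustrative assumptions, not benchmarks.

QUERIES_PER_DAY = 100_000        # ~1% of ChatGPT-scale traffic
GPU_SECONDS_PER_QUERY = 4.8      # assumed GPU time to generate one response
PRICE_PER_GPU_HOUR = 3.00        # Azure A100 rate used in the Dec 2022 estimate

gpu_hours_per_day = QUERIES_PER_DAY * GPU_SECONDS_PER_QUERY / 3600
daily_cost = gpu_hours_per_day * PRICE_PER_GPU_HOUR

print(f"{gpu_hours_per_day:.0f} GPU-hours/day -> ${daily_cost:,.0f}/day")
# -> 133 GPU-hours/day -> $400/day
```

Change any one of those assumptions (tokens generated per query, model size, GPU price) and the daily cost moves with it, which is why these estimates are only ever ballpark figures.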


What's the difference between training and inferencing?

AI comprises two main aspects: training and inferencing.

Training involves using diverse data to create a model, while inferencing entails using the model to generate predictions or responses.

ChatGPT (GPT-3.5) is reported to have used around 10,000 GPUs (A100s) and several months to train. For small companies, using off-the-shelf, pre-trained models and running inferencing is the more cost-effective option. However, medium and large companies may want to train their own models and can consider renting compute resources from hyperscalers or GPU cloud companies.

Training typically requires significant GPU memory, while inferencing can be accomplished on more modest hardware, including CPUs. Larger or more real-time inferencing use cases might require higher-end or specialized hardware like GPUs.
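
To make the distinction concrete, here is a minimal, illustrative PyTorch sketch using a toy model (not an LLM): training runs a forward pass, a backward pass, and a weight update, while inference is a single forward pass with gradients disabled, which is why it needs far less memory.

```python
import torch
import torch.nn as nn

# Toy model standing in for a much larger network.
model = nn.Linear(16, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# --- Training: forward pass, loss, backward pass, weight update ---
x = torch.randn(8, 16)            # a batch of training examples
y = torch.randint(0, 2, (8,))     # labels
model.train()
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()                   # gradients need extra memory and compute
optimizer.step()

# --- Inference: forward pass only, no gradients, much lighter ---
model.eval()
with torch.no_grad():
    prediction = model(torch.randn(1, 16)).argmax(dim=1)
print(prediction)
```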


Do I need to upgrade my network?

NVIDIA promotes InfiniBand for networking, especially with A100 and H100 chips, which are designed to work optimally with it. Paired with NVLink/NVSwitch, which pools GPUs within a node, InfiniBand provides the fabric between nodes, but it also has some downsides, such as being proprietary and expensive.

When evaluating networking options, consider your use case and the way GPUs work. For example, NVIDIA GPUs can be bought with PCIe or NVLink connectivity, and NVSwitch allows for interconnecting nodes.

To keep things simple, networking choices can range from:

  • 10-100G for 1-16 GPUs using PCIe
  • 100G/200G+ for 16-60 GPUs using NVLink
  • 200/400G+ for 100+ GPUs requiring the fastest speed and lowest latency.


Note: Be sure to consider the availability of networking gear before deciding on a path for upgrading or augmenting your existing network, as the networking supply chain is still very lumpy for high-demand components.
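
If it helps to keep those rules of thumb in one place, here is a tiny, illustrative helper that maps a planned GPU count onto the tiers above (the thresholds simply encode the ranges listed in this article, not a formal sizing guide):

```python
def suggested_network_speed(gpu_count: int) -> str:
    """Rough rule-of-thumb mapping from GPU count to network tier,
    using the ranges discussed above (not a formal sizing guide)."""
    if gpu_count <= 16:
        return "10-100G (PCIe-connected GPUs)"
    if gpu_count <= 60:
        return "100G/200G+ (NVLink-connected GPUs)"
    return "200/400G+ (large clusters needing the lowest latency)"

for n in (8, 32, 128):
    print(n, "GPUs ->", suggested_network_speed(n))
```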


While InfiniBand has its keen followers and vocal opponents, there are ongoing efforts by companies like Arista, Cisco, Broadcom, AMD, and Intel to compete with InfiniBand through the Ultra Ethernet Consortium (UEC).

Expect products based on UEC standards to be available in 2024.


Can I use my existing CPUs?

While AI infrastructure primarily focuses on specialized hardware like GPUs, FPGAs, and ASICs, don't overlook the role of general-purpose CPUs. Existing CPUs can be utilized to support AI infrastructure, performing tasks such as dataset aggregation, querying, hosting value-added AI services, or running some inferencing.

Although CPUs are less efficient than GPUs, FPGAs, and ASICs in terms of performance per watt, they remain important components in an AI setup.
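
As a rough illustration (assuming the Hugging Face transformers library and a small off-the-shelf model such as GPT-2; larger models would need quantization or more capable hardware), CPU-only inferencing can be as simple as:

```python
from transformers import pipeline

# device=-1 forces CPU; small models like GPT-2 run comfortably without a GPU.
generator = pipeline("text-generation", model="gpt2", device=-1)

result = generator("Hosting your own AI model means", max_new_tokens=30)
print(result[0]["generated_text"])
```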


Summing it up, hosting your own AI models involves careful consideration of factors such as cost, hardware capabilities, networking, and the role of CPUs. New and updated models are released frequently, as are efforts to optimize them for less costly, more readily available hardware. Qualcomm has announced it is working with Meta to optimize LLaMA 2 for on-device AI, which will flow through to Android phones next year, and MLC-LLM can run natively on your iPhone today.

As AI technology progresses, staying informed and optimizing your approach will be crucial for a successful implementation.
