GPT Compute Power

We've been under an avalanche of ChatGPT and AI-related posts, comments and news ... AI everywhere.

So, to bring all the hype down to the brutally tangible world, I took the liberty of summarizing an estimate of what actually sits behind these impressive Large Language Models (LLMs).


I haven't found conclusive data on the actual hardware stack requirements and usage for ChatGPT, but we can infer some reference figures in the following lines:


The number of parameters a model can handle is commonly used as the benchmark of strength and power for current GPT models.


Here is the list of GPT versions and their parameter counts; a back-of-envelope memory estimate based on these figures follows the list:

  • GPT-1: 117 million parameters
  • GPT-2: 1.5 billion parameters
  • GPT-3: 175 billion parameters
  • GPT-4: 170 trillion parameters (rumored)
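
To make those figures more tangible, here is a minimal back-of-envelope sketch in Python (my own assumption: 2 bytes per parameter, i.e. fp16/bf16 weights only, ignoring activations, optimizer state and KV caches) of how much memory it would take just to hold each model's weights:

```python
# Back-of-envelope estimate: memory needed just to hold the model weights.
# Assumes 2 bytes per parameter (fp16/bf16); activations, optimizer state
# and KV caches would push the real requirement well beyond this.

BYTES_PER_PARAM = 2  # fp16/bf16 storage (assumption)

models = {
    "GPT-1": 117e6,
    "GPT-2": 1.5e9,
    "GPT-3": 175e9,
    "GPT-4 (rumored)": 170e12,
}

for name, params in models.items():
    gib = params * BYTES_PER_PARAM / 1024**3
    print(f"{name:>16}: {params:>15,.0f} params -> ~{gib:,.1f} GiB of weights")
```

Even under that generous assumption, GPT-3 alone already needs more memory than any single GPU provides.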



Now we do have precise information about the hardware stack for a specific finance LLM, built by Bloomberg. An arXiv publication from March describes BloombergGPT, a 50 billion parameter language model trained on a wide range of financial data. The authors used the Amazon SageMaker service provided by AWS, with the latest version available at the time of training, to train and evaluate BloombergGPT, training on a total of 64 p4d.24xlarge instances. Each p4d.24xlarge instance has 8 NVIDIA 40GB A100 GPUs with NVIDIA NVSwitch intra-node connections (600 GB/s) and NVIDIA GPUDirect using AWS Elastic Fabric Adapter (EFA) inter-node connections (400 Gb/s). This yields a total of 512 40GB A100 GPUs. For quick data access, they used Amazon FSx for Lustre, which supports up to 1000 MB/s read and write throughput per TiB storage unit.

As per AWS specs for p4d.24xlarge instances:

[Image: AWS p4d.24xlarge instance specification table]

So this 50 billion parameter BloombergGPT used 64 instances with over 1 TiB of system memory each (plus the GPUs' own memory).
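
As a quick sketch of what those 64 instances add up to, and why a 50 billion parameter model needs them: taking the 1152 GiB of RAM per p4d.24xlarge from the AWS spec sheet, and assuming the common rule of thumb of ~16 bytes per parameter of training state for mixed-precision training with Adam (not a number from the BloombergGPT paper):

```python
# Aggregate resources of the BloombergGPT training cluster vs. the memory
# footprint of a 50B-parameter model trained in mixed precision with Adam.
# The 16 bytes/param training-state figure (fp16 weights + fp16 grads +
# fp32 master weights + fp32 Adam moments) is an assumed rule of thumb.

instances = 64
gpus_per_instance = 8
gpu_mem_gb = 40              # NVIDIA A100 40GB
ram_gib_per_instance = 1152  # system memory of a p4d.24xlarge (AWS spec)

total_gpus = instances * gpus_per_instance        # 512 GPUs
total_gpu_mem_gb = total_gpus * gpu_mem_gb        # 20,480 GB of GPU memory
total_ram_gib = instances * ram_gib_per_instance  # 73,728 GiB of system RAM

params = 50e9
train_state_gb = params * 16 / 1e9                # ~800 GB of model/optimizer state

print(f"GPUs in cluster:          {total_gpus}")
print(f"Aggregate GPU memory:     {total_gpu_mem_gb:,} GB")
print(f"Aggregate system RAM:     {total_ram_gib:,} GiB")
print(f"50B-param training state: ~{train_state_gb:,.0f} GB (assumed 16 B/param)")
print(f"=> the model state alone is ~{train_state_gb / gpu_mem_gb:.0f}x one 40GB A100")
```

Under those assumptions the model and optimizer state alone span roughly 20 of the 40GB GPUs before a single activation or data batch is counted, which is why the training has to be sharded across the whole cluster.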

This is also in line with the VRAM estimates published at https://neuroflash.com/blog/gpt-4-parameters-rumors-and-forecasts/:

"GPT-3 currently requires 700 gigabytes of V-RAM, and if GPT-4 has a thousand times the number of parameters, it would require 700 terabytes of V-RAM"

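The arithmetic behind that quote is straightforward; here is a short sketch assuming the 4 bytes per parameter (fp32 weights) that the 700 gigabyte figure implies:

```python
# Reproducing the quoted estimate: 175B parameters at 4 bytes each (fp32)
# is ~700 GB of VRAM, and a model with 1000x the parameters would need ~700 TB.
# 4 bytes/param is the assumption implied by the quote; fp16 would halve it.

gpt3_params = 175e9
bytes_per_param = 4                                 # fp32 weights (implied assumption)

gpt3_vram_gb = gpt3_params * bytes_per_param / 1e9  # 700 GB
gpt4_rumored_vram_gb = gpt3_vram_gb * 1000          # 1000x the parameter count

print(f"GPT-3:                 ~{gpt3_vram_gb:,.0f} GB of VRAM for the weights")
print(f"GPT-4 (rumored 1000x): ~{gpt4_rumored_vram_gb / 1000:,.0f} TB of VRAM")
```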



Long story short: once again, you'll need a lot of computing resources to get a reasonable large language model up and running, and basically there is no other way to achieve this kind of innovation than by using cloud computing services.



PS: all images are credited to DALL-E
