GPT Compute Power

We've been under an avalanche of ChatGPT and AI-related posts, comments and news ... AI everywhere.

So, to bring all the hype down to the brutally tangible world, I took the liberty of summarizing an estimate of what actually sits behind these impressive Large Language Models (LLMs).


I haven't found conclusive data on the actual hardware stack requirements and usage for ChatGPT, but we can infer some reference figures in the following lines:


The number of parameters a model can handle is commonly used as the benchmark of strength and power for current GPT models.


Here is the list of GPT versions and their parameter counts; a back-of-envelope memory estimate based on these figures follows the list:

  • GPT-1: 117 million parameters
  • GPT-2: 1.5 billion parameters
  • GPT-3: 175 billion parameters
  • GPT-4: 170 trillion parameters (rumored)
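
To make those figures more tangible, here is a minimal back-of-envelope sketch in Python (my own assumption: 2 bytes per parameter, i.e. fp16/bf16 weights only, ignoring activations, optimizer state and KV caches) of how much memory it would take just to hold each model's weights:

```python
# Back-of-envelope estimate: memory needed just to hold the model weights.
# Assumes 2 bytes per parameter (fp16/bf16); activations, optimizer state
# and KV caches would push the real requirement well beyond this.

BYTES_PER_PARAM = 2  # fp16/bf16 storage (assumption)

models = {
    "GPT-1": 117e6,
    "GPT-2": 1.5e9,
    "GPT-3": 175e9,
    "GPT-4 (rumored)": 170e12,
}

for name, params in models.items():
    gib = params * BYTES_PER_PARAM / 1024**3
    print(f"{name:>16}: {params:>15,.0f} params -> ~{gib:,.1f} GiB of weights")
```

Even under that generous assumption, GPT-3 alone already needs more memory than any single GPU provides.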



Now we do have precise information about the hardware stack for a specific finance LLM, built by Bloomberg. An arXiv publication from March describes BloombergGPT, a 50 billion parameter language model trained on a wide range of financial data. The authors used the Amazon SageMaker service provided by AWS, with the latest version available at the time of training, to train and evaluate BloombergGPT, training on a total of 64 p4d.24xlarge instances. Each p4d.24xlarge instance has 8 NVIDIA 40GB A100 GPUs with NVIDIA NVSwitch intra-node connections (600 GB/s) and NVIDIA GPUDirect using AWS Elastic Fabric Adapter (EFA) inter-node connections (400 Gb/s). This yields a total of 512 40GB A100 GPUs. For quick data access, they used Amazon FSx for Lustre, which supports up to 1000 MB/s read and write throughput per TiB storage unit.

As per AWS specs for p4d.24xlarge instances:

[Image: AWS p4d.24xlarge instance specification table]

So this 50 billion parameter BloombergGPT used 64 instances with over 1 TiB of system memory each (plus the GPUs' own memory).
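
As a quick sketch of what those 64 instances add up to, and why a 50 billion parameter model needs them: taking the 1152 GiB of RAM per p4d.24xlarge from the AWS spec sheet, and assuming the common rule of thumb of ~16 bytes per parameter of training state for mixed-precision training with Adam (not a number from the BloombergGPT paper):

```python
# Aggregate resources of the BloombergGPT training cluster vs. the memory
# footprint of a 50B-parameter model trained in mixed precision with Adam.
# The 16 bytes/param training-state figure (fp16 weights + fp16 grads +
# fp32 master weights + fp32 Adam moments) is an assumed rule of thumb.

instances = 64
gpus_per_instance = 8
gpu_mem_gb = 40              # NVIDIA A100 40GB
ram_gib_per_instance = 1152  # system memory of a p4d.24xlarge (AWS spec)

total_gpus = instances * gpus_per_instance        # 512 GPUs
total_gpu_mem_gb = total_gpus * gpu_mem_gb        # 20,480 GB of GPU memory
total_ram_gib = instances * ram_gib_per_instance  # 73,728 GiB of system RAM

params = 50e9
train_state_gb = params * 16 / 1e9                # ~800 GB of model/optimizer state

print(f"GPUs in cluster:          {total_gpus}")
print(f"Aggregate GPU memory:     {total_gpu_mem_gb:,} GB")
print(f"Aggregate system RAM:     {total_ram_gib:,} GiB")
print(f"50B-param training state: ~{train_state_gb:,.0f} GB (assumed 16 B/param)")
print(f"=> the model state alone is ~{train_state_gb / gpu_mem_gb:.0f}x one 40GB A100")
```

Under those assumptions the model and optimizer state alone span roughly 20 of the 40GB GPUs before a single activation or data batch is counted, which is why the training has to be sharded across the whole cluster.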

This is also in line with the VRAM estimates published at https://neuroflash.com/blog/gpt-4-parameters-rumors-and-forecasts/:

"GPT-3 currently requires 700 gigabytes of V-RAM, and if GPT-4 has a thousand times the number of parameters, it would require 700 terabytes of V-RAM"

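The arithmetic behind that quote is straightforward; here is a short sketch assuming the 4 bytes per parameter (fp32 weights) that the 700 gigabyte figure implies:

```python
# Reproducing the quoted estimate: 175B parameters at 4 bytes each (fp32)
# is ~700 GB of VRAM, and a model with 1000x the parameters would need ~700 TB.
# 4 bytes/param is the assumption implied by the quote; fp16 would halve it.

gpt3_params = 175e9
bytes_per_param = 4                                 # fp32 weights (implied assumption)

gpt3_vram_gb = gpt3_params * bytes_per_param / 1e9  # 700 GB
gpt4_rumored_vram_gb = gpt3_vram_gb * 1000          # 1000x the parameter count

print(f"GPT-3:                 ~{gpt3_vram_gb:,.0f} GB of VRAM for the weights")
print(f"GPT-4 (rumored 1000x): ~{gpt4_rumored_vram_gb / 1000:,.0f} TB of VRAM")
```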



Long story short: once again, you'll need a lot of computing resources to get a reasonable large language model up and running, and basically there is no other way to achieve this kind of innovation than by using cloud computing services.



PS: all images are credited to DALL-E
