Is ChatGPT only a Big Model? (1)

StartPoint

ChatGPT has gained a lot of attention. But most NLP practitioners think that reinforcement learning (RL) is only for big language models like GPT-3.

RL for text generation has long been a method in text summarization. Before 2019, most people thought that reinforcement learning for text summarization was useless in industry and good only for writing papers.

Now ChatGPT has proved that RL has a higher upper bound. Lacking a big model or enough GPUs, most engineers do not want to give it a try, and many do not even read the paper [1].

If we look carefully at the paper, there is a sentence:

"On our test set, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3."

It means reinforcement learning can work on a model small enough to be trained on a single 48 GB GPU.
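
As a quick sanity check (my own back-of-the-envelope estimate, not a figure from the paper), the training memory for a 1.3B model fits comfortably in 48 GB:

params = 1.3e9
# Assumption: mixed-precision Adam, i.e. fp16 weights and gradients
# plus fp32 master weights and two fp32 optimizer moments.
bytes_per_param = 2 + 2 + 4 + 4 + 4
print(params * bytes_per_param / 1e9)  # ~20.8 GB for weights + optimizer state

Activations take the remaining memory, which gradient checkpointing can keep small.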

And if we read the paper more deeply, we can find that the cost of data and compute is also small, small enough to train on as few as 4 GPUs.

Let's have a look.

"The cost of collecting our data and the compute for training runs, including experimental runs is a fraction of what was spent to train GPT-3: training our 175B SFT model requires 4.9 petaflops/s-days and training our 175B PPO-ptx model requires 60 petaflops/s-days, compared to 3,640 petaflops/s-days for GPT-3 (Brown et al., 2020). "

Referring to the Wikipedia GPU list [2] and the OpenAI blog [3], we can convert petaflop/s-days into A6000 GPU cost. To train a 1.3B SFT model, we would need only 4 GPUs for about 20 days, roughly a thousand times less compute than was spent pretraining GPT-3.

One A6000 running for one day delivers about 0.0379 petaflop/s-days. If we assume that training compute scales roughly with model parameters, a 1.3B model needs on the order of 3 petaflop/s-days, which 4 GPUs can accumulate in 20 days (4 × 20 × 0.0379 ≈ 3).
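
A minimal sketch of that conversion (the 0.0379 petaflop/s figure is an assumption taken from the A6000's ~38 TFLOPS fp32 peak; sustained throughput would be lower):

A6000_PFLOPS = 0.0379                # assumed peak petaflop/s per A6000
gpus, days = 4, 20
budget = gpus * days * A6000_PFLOPS  # petaflop/s-days accumulated
print(budget)                        # ~3.03, enough for the 1.3B run
print(3640 / budget)                 # ~1200x less than GPT-3's 3,640 petaflop/s-days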

That means that even if we do not have a big model, we can use RL to make up the difference.


[1] Training language models to follow instructions with human feedback (Ouyang et al., 2022)

[2] https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units

[3] https://openai.com/research/ai-and-compute
