Is ChatGPT Only a Big Model? (1)
Starting Point
ChatGPT has gained a lot of attention, but most NLP practitioners think that reinforcement learning (RL) only works for big language models like GPT-3.
RL for text generation has long been used in text summarization. Before 2019, most people thought that reinforcement learning for summarization was useless in industry and only good for writing papers.
Now ChatGPT has proved that RL has a much higher upper bound. Yet, lacking a big model or enough GPUs, most engineers do not want to try it, and many do not even read the paper [1].
If we look carefully at the paper, there is a sentence:
"On our test set, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3."
This means reinforcement learning can work for a model small enough to train on a single 48 GB GPU.
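As a rough back-of-envelope check (my own estimate, not a number from the paper): training with Adam in full precision needs about 16 bytes per parameter for weights, gradients, and the two optimizer moments, so a 1.3B-parameter model fits comfortably within 48 GB before activations.

```python
# Rough memory estimate for training a 1.3B-parameter model with Adam.
# Assumptions (mine, not from the InstructGPT paper): fp32 weights (4 B),
# fp32 gradients (4 B), and Adam moments m and v (4 B each) = 16 B/param.
def training_memory_gb(num_params: float, bytes_per_param: int = 16) -> float:
    return num_params * bytes_per_param / 1024**3

print(f"{training_memory_gb(1.3e9):.1f} GB")  # ~19.4 GB, leaving room for
                                              # activations on a 48 GB A6000
```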
And if we read the paper more deeply, we find that the cost of data and compute is also small enough that training is possible on as few as 4 GPUs.
Let's have a look.
"The cost of collecting our data and the compute for training runs, including experimental runs is a fraction of what was spent to train GPT-3: training our 175B SFT model requires 4.9 petaflops/s-days and training our 175B PPO-ptx model requires 60 petaflops/s-days, compared to 3,640 petaflops/s-days for GPT-3 (Brown et al., 2020). "
Referring to the Wikipedia GPU list [2] and the OpenAI blog [3], we can convert petaflop/s-days into A6000 GPU cost. To train a 1.3B SFT model, we would only need 4 GPUs for about 20 days, roughly a thousand times less compute than training GPT-3.
One A6000 delivers about 0.0379 petaflop/s, so one GPU running for a day contributes 0.0379 petaflop/s-days. If we assume a rough scaling relationship between training compute and model parameters, a 1.3B model needs on the order of 3 petaflop/s-days, which works out to 4 GPUs for about 20 days (4 × 20 × 0.0379 ≈ 3).
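Here is a minimal sketch of that conversion. The 0.0379 petaflop/s per A6000 is the article's figure (derived from [2]), and the 3 petaflop/s-days target for a 1.3B model is the article's own scaling estimate:

```python
# Convert a training budget in petaflop/s-days into days on N A6000 GPUs.
# Assumption (from the article / [2]): one A6000 sustains ~0.0379 petaflop/s,
# so one GPU running for one day contributes 0.0379 petaflop/s-days.
A6000_PETAFLOP_S = 0.0379

def days_needed(budget_pfs_days: float, n_gpus: int) -> float:
    """Days needed on n_gpus to accumulate the given petaflop/s-days budget."""
    return budget_pfs_days / (A6000_PETAFLOP_S * n_gpus)

print(f"{days_needed(3.0, 4):.1f} days on 4 A6000s")          # ~19.8 days for the 1.3B estimate
print(f"{days_needed(3640.0, 4):.0f} days for GPT-3's budget")  # ~24,000 days: out of reach
```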
That means even if we do not have a big model, we can use RL to make up for it.
[1] Training language models to follow instructions with human feedback (Ouyang et al., 2022)
[2] https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units
[3] https://openai.com/research/ai-and-compute