How enot.ai optimized a 6-billion-parameter NLP model: 7x less VRAM and 2.3x faster!

Large language models are widely used across all sorts of Natural Language Processing applications. They generate text of such high quality that it can be difficult to tell whether it was written by an AI or a human, and they achieve top accuracy on tasks such as Intent Classification and Named Entity Recognition.

Let’s consider the GPT-J model, which has 6 billion parameters. Out of the box, it requires 22 GB of video memory, and generating a single sequence of text takes 3.4 seconds.
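
Baseline figures like these can be reproduced with a simple measurement harness. Below is a sketch, not ENOT's actual benchmark code: the timing helper is plain Python, while the GPT-J part assumes the `transformers` and `torch` libraries and is kept inside a function (loading the ~24 GB checkpoint is only practical on a suitable GPU).

```python
# Sketch of a latency / peak-VRAM measurement harness for a causal LM.
# All names and parameters here are illustrative assumptions, except the
# model id "EleutherAI/gpt-j-6B", which comes from the article.
import time
from statistics import mean

def time_calls(fn, n_warmup=2, n_runs=5):
    """Call fn() several times and return the mean wall-clock seconds."""
    for _ in range(n_warmup):  # warm-up runs are excluded from timing
        fn()
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return mean(times)

def benchmark_gptj_generation():
    """Assumed usage with transformers; not executed here."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
    model = model.cuda().eval()
    inputs = tok("The movie was", return_tensors="pt").to("cuda")

    torch.cuda.reset_peak_memory_stats()
    with torch.no_grad():
        latency_s = time_calls(
            lambda: model.generate(**inputs, max_new_tokens=64)
        )
    peak_vram_gb = torch.cuda.max_memory_allocated() / 1024**3
    return latency_s, peak_vram_gb
```

Exact numbers depend on the generated length, batch size, GPU model, and precision, so results will vary across setups.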

Using the ENOT technology, we significantly reduced video-memory consumption (by 7x) and sped up computation on a text-classification task by more than 2x.

GPT-J with ENOT: optimization results

Main technical parameters:

Measurements were carried out on the IMDB review-classification task, with:

  • Sequence length: 512
  • GPU: NVIDIA A6000
  • Baseline model: GPT-J 6B (https://huggingface.co/EleutherAI/gpt-j-6B)
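
The setup above can be sketched in code. This is an assumed evaluation loop using `transformers` and `datasets`, not ENOT's internal harness; a GPT-J checkpoint would also need a fine-tuned classification head, which the article does not publish. The heavy part stays inside an uncalled function; `accuracy_drop` is a trivial helper for reporting the loss in percentage points.

```python
# Assumed IMDB evaluation sketch (sequence length 512, binary labels).
def evaluate_imdb(model_name="EleutherAI/gpt-j-6B",
                  max_length=512, n_samples=100):
    """Illustrative only: loads the model on GPU and scores n_samples
    IMDB test reviews, returning accuracy in [0, 1]."""
    import torch
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer)

    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=2
    ).cuda().eval()

    data = load_dataset("imdb", split=f"test[:{n_samples}]")
    correct = 0
    for example in data:
        enc = tok(example["text"], truncation=True,
                  max_length=max_length, return_tensors="pt").to("cuda")
        with torch.no_grad():
            pred = model(**enc).logits.argmax(-1).item()
        correct += int(pred == example["label"])
    return correct / n_samples

def accuracy_drop(baseline_acc, optimized_acc):
    """Accuracy loss in percentage points between two runs."""
    return round((baseline_acc - optimized_acc) * 100, 1)
```

Comparing `evaluate_imdb` on the baseline and the optimized model, together with the VRAM and latency harness, gives all three figures the article reports.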

Results:

Using the ENOT tool, we reduced video-memory consumption by 7x and sped up computation by 2.3x, with an accuracy loss of just 0.8%.
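
As a quick sanity check on the arithmetic: only the baseline VRAM (22 GB) and the reduction factor (7x) are stated in the article, so the optimized figure below is implied, not separately published.

```python
# Implied optimized memory footprint from the reported figures.
baseline_vram_gb = 22.0   # stated baseline
vram_reduction = 7        # stated reduction factor
speedup = 2.3             # stated classification speedup
accuracy_loss_pct = 0.8   # stated accuracy loss

optimized_vram_gb = baseline_vram_gb / vram_reduction
print(f"optimized VRAM ≈ {optimized_vram_gb:.1f} GB")  # ≈ 3.1 GB
```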
