How enot.ai optimized a 6-billion-parameter NLP model: 7x lower VRAM requirements and 2.3x faster inference
Large language models are widely adopted across Natural Language Processing applications. They generate text of such high quality that it can be difficult to tell whether it was written by an AI or a human, and they achieve state-of-the-art accuracy on tasks such as Intent Classification and Named Entity Recognition.
Let’s consider the GPT-J model, which has 6 billion parameters. At baseline it requires 22 GB of video memory and takes 3.4 seconds to generate a single sequence of text.
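As a sanity check on the baseline figure, 6 billion parameters stored as 32-bit floats occupy roughly 22 GiB for the weights alone. The sketch below is an illustration of how numeric precision drives the weight footprint, not a description of ENOT's actual method; activations, the KV cache, and framework overhead add to these numbers in practice.

```python
# Rough weight-memory footprint of a 6B-parameter model at different
# numeric precisions (weights only; activations and caches add more).
PARAMS = 6_000_000_000

def weight_gib(bytes_per_param: float) -> float:
    """Gibibytes needed to store all parameters at the given width."""
    return PARAMS * bytes_per_param / 2**30

print(f"fp32:  {weight_gib(4):.1f} GiB")    # ~22.4 GiB, matching the baseline
print(f"fp16:  {weight_gib(2):.1f} GiB")
print(f"int8:  {weight_gib(1):.1f} GiB")
print(f"4-bit: {weight_gib(0.5):.1f} GiB")  # ~2.8 GiB, in the 7x-reduction range
```

Note that a roughly 7x reduction from the 22 GB baseline lands near the 4-bit row, which is one plausible way such savings are achieved, though the article does not state the specific technique.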
Using the ENOT technology, we significantly reduced video memory consumption (by 7x) and more than doubled the speed of computation on a text classification task.
Main technical parameters:
Estimations were carried out on the IMDB reviews classification task, where:
Results:
Using the ENOT tool, we reduced video memory consumption by 7x and sped up computation by 2.3x, with an accuracy loss of just 0.8%.
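Putting the reported numbers together, the baseline figures below come from the article, and the optimized values follow by simple arithmetic:

```python
# Reported baseline figures for GPT-J (from the article).
baseline_latency_s = 3.4   # seconds to generate one sequence
baseline_vram_gb = 22.0    # video memory required

# Reported improvement factors after optimization with ENOT.
speedup = 2.3
vram_reduction = 7.0

optimized_latency_s = baseline_latency_s / speedup
optimized_vram_gb = baseline_vram_gb / vram_reduction

print(f"optimized latency: {optimized_latency_s:.2f} s")  # ~1.48 s
print(f"optimized VRAM:    {optimized_vram_gb:.1f} GB")   # ~3.1 GB
```

So the optimized model should fit comfortably on a single consumer GPU, whereas the 22 GB baseline requires a high-memory data-center card.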