Self-Lengthen: a method for longer LLM responses

The Qwen Team at Alibaba Group proposes the Self-Lengthen approach to help LLMs create longer, coherent responses.

It relies only on the LLM's built-in abilities and two key components:

- a Generator to create an initial response

- an Extender to split that response and expand it in more detail

Here are the details:

Image credit: Original paper

The Self-Lengthen method expands an LLM's output gradually, step by step.

First, using the self-instruct technique, the researchers create a varied set of instructions that guide the model toward generating longer text (a rough sketch follows).
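A minimal self-instruct sketch in Python, assuming `chat` is any callable that sends a prompt to the base model and returns its text reply; the seed tasks and prompt wording are illustrative, not the paper's exact templates.

```python
import random

# A few hand-written seed tasks that require long answers (illustrative only).
SEED_TASKS = [
    "Write a detailed travel guide for a week in Kyoto.",
    "Draft a long-form essay on the history of container shipping.",
    "Explain transformer attention to a curious high-school student, in depth.",
]

def self_instruct(chat, num_instructions: int = 100) -> list[str]:
    """Bootstrap a pool of long-writing instructions from a few seed tasks."""
    pool = list(SEED_TASKS)
    while len(pool) < num_instructions:
        # Show the model a few existing instructions and ask for a new one.
        examples = "\n".join(f"- {t}" for t in random.sample(pool, min(3, len(pool))))
        prompt = (
            "Here are examples of instructions that call for long, detailed answers:\n"
            f"{examples}\n"
            "Write one new instruction of the same kind, on a different topic."
        )
        candidate = chat(prompt).strip()
        if candidate and candidate not in pool:
            pool.append(candidate)
    return pool
```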

  • Initial response generation:

The Generator creates an initial long response for a given instruction. It is trained over several rounds, gradually increasing its output length each time.

  • Response extension:

The Extender expands the first half of the response, then uses that expanded half as a guide to lengthen the second half, producing a response roughly twice as long as the original. This longer version is then used to fine-tune both the Generator and the Extender (see the sketch after this list).

Image credit: Original paper

  • Iterative process:

By repeating this process, the Generator and Extender teach the model to handle increasingly long outputs, until it can produce long responses from scratch without relying on external data sources or stronger models. A rough sketch of one round follows.
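A sketch of one Self-Lengthen round under the same assumptions as above: `generator(prompt)` and `extender(prompt)` are callables wrapping the two fine-tuned copies of the base model, and `finetune(name, pairs)` stands in for whatever supervised fine-tuning routine is used. Names, prompts, and the naive half-split are illustrative assumptions, not the released implementation.

```python
def split_in_half(text: str) -> tuple[str, str]:
    """Split a response into two roughly equal halves (naively, by word count)."""
    words = text.split()
    mid = len(words) // 2
    return " ".join(words[:mid]), " ".join(words[mid:])

def extend_response(instruction: str, response: str, extender) -> str:
    """Roughly double a response: expand its first half, then use that expanded
    half as a guide while rewriting and lengthening the second half."""
    first, second = split_in_half(response)
    longer_first = extender(
        f"Instruction: {instruction}\n"
        f"First part of an answer:\n{first}\n"
        "Rewrite this part with much more detail, keeping it coherent."
    )
    longer_second = extender(
        f"Instruction: {instruction}\n"
        f"Answer so far (already expanded):\n{longer_first}\n"
        f"Original continuation:\n{second}\n"
        "Continue the answer, expanding the continuation to the same level of detail."
    )
    return longer_first + "\n\n" + longer_second

def self_lengthen_round(instructions, generator, extender, finetune):
    """One round: generate, extend, then retrain both roles on the longer outputs."""
    gen_pairs, ext_pairs = [], []
    for inst in instructions:
        short = generator(inst)                         # initial (shorter) response
        longer = extend_response(inst, short, extender)  # roughly 2x longer response
        gen_pairs.append((inst, longer))                 # Generator: answer long directly
        ext_pairs.append((f"{inst}\n\n{short}", longer)) # Extender: lengthen a given answer
    finetune("generator", gen_pairs)  # the next round starts from these updated models,
    finetune("extender", ext_pairs)   # so output length grows iteratively
```

Repeated rounds of this loop are what let the Generator eventually produce long responses directly, without the Extender at inference time.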


Results:

Tests show that Self-Lengthen improves the quality of long responses in LLMs such as Qwen2 and LLaMA3 without hurting performance on general tasks.

For example, with the latest Qwen2.5 model, output length increased from about 1,000 words to as many as 8,000 words.

Image credit: Original paper

Paper: https://arxiv.org/pdf/2410.23933

Code: https://github.com/QwenLM/Self-Lengthen

Paul Peters

Social Engineer TREE(3).0

1 month ago

It's so much fun to see more traditional optimization methods translated to LLM-type AI. This is like a buffer overflow... If this is the new trend, I wouldn't be surprised by a performance jump of at least an order of magnitude by year-end. I'm actually flabbergasted that most of the optimizations are basically direct translations of those used by systems engineers, like the ones at the foundations of virtual machines, application servers, or experimental operating systems. And then there's the fact that Python isn't designed to be a high-performance programming language that can deal with massive parallelism. It's not as bad as writing an LLM in JavaScript, and I understand Python was the choice because it became a de facto standard through all the post-academic toy models that were incorporated, but a better combination of programming environments (or perhaps metaprogramming on top of an AI code generator) might very well boost performance by an additional order of magnitude.
