Self-Lengthen method for longer LLM responses
TuringPost
Newsletter about AI and ML.
The Qwen Team at Alibaba Group proposes the Self-Lengthen approach to help LLMs create longer, coherent responses.
It uses only the LLM's built-in abilities and two key components:
- a Generator to create an initial response
- an Extender to split and detail it further
Here are the details:
The Self-Lengthen method gradually expands the LLM's output, step by step.
First, using the self-instruct technique, the researchers create a variety of instructions to guide the model in generating longer text.
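As a rough illustration, here is a minimal sketch of what such a self-instruct-style instruction bootstrap could look like. It assumes a hypothetical `chat(prompt)` helper that queries the base model; the seed tasks and prompt wording are illustrative, not the paper's exact setup.

```python
import random

# Illustrative seed tasks; the paper uses its own pool of writing instructions.
SEED_INSTRUCTIONS = [
    "Write a short story about a lighthouse keeper.",
    "Explain how photosynthesis works.",
    "Draft a product announcement for a new phone.",
]

def chat(prompt: str) -> str:
    """Placeholder for a call to the base LLM (e.g. via any chat-completion API)."""
    raise NotImplementedError

def generate_long_writing_instructions(n: int) -> list[str]:
    """Bootstrap varied instructions that explicitly ask for long, detailed outputs."""
    instructions = []
    for _ in range(n):
        examples = "\n".join(random.sample(SEED_INSTRUCTIONS, k=2))
        prompt = (
            "Here are some writing tasks:\n"
            f"{examples}\n\n"
            "Write one new, different writing task that requires a long, "
            "detailed response. Return only the task."
        )
        instructions.append(chat(prompt).strip())
    return instructions
```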
The Generator component creates an initial long response for a given instruction. This base model is trained over several rounds, with its output length gradually increasing each time.
The Extender then expands the first part of the answer and uses that expansion as a guide to lengthen the second part, leaving the response roughly twice as long. This longer version is then used to further train both the Generator and the Extender.
By repeating this process, the Generator and Extender teach the model to handle increasingly lengthy outputs. Eventually it can produce long responses from scratch, without external data sources or more advanced models.
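Here is a minimal sketch of that iterative loop, assuming hypothetical `generate`, `extend`, and `finetune` callables supplied by the caller; the paper's actual training recipe is more involved.

```python
def split_in_half(text: str) -> tuple[str, str]:
    """Naively split a response into two halves (a stand-in for a proper segmenter)."""
    mid = len(text) // 2
    return text[:mid], text[mid:]

def self_lengthen(base_model, instructions, generate, extend, finetune, rounds=3):
    """Sketch of the Self-Lengthen loop.

    Hypothetical callables (not from the paper's code):
      generate(model, instruction)               -> initial response text
      extend(model, instruction, part, guide)    -> longer rewrite of `part`
      finetune(model, pairs)                     -> model tuned on (instruction, response) pairs
    """
    generator = base_model   # drafts an initial long response
    extender = base_model    # rewrites a response into a longer one

    for _ in range(rounds):
        pairs = []
        for instruction in instructions:
            response = generate(generator, instruction)

            # Expand the first half, then use it as a guide to lengthen
            # the second half, roughly doubling the response.
            first, second = split_in_half(response)
            longer_first = extend(extender, instruction, first, guide=None)
            longer_second = extend(extender, instruction, second, guide=longer_first)
            pairs.append((instruction, longer_first + longer_second))

        # Fine-tune both roles on the lengthened outputs so the next round
        # starts from a model that already writes longer text.
        generator = finetune(generator, pairs)
        extender = finetune(extender, pairs)

    return generator
```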
Results:
Tests reveal Self-Lengthen boosts long response quality in LLMs like Qwen2 and LLaMA3 without affecting general task performance.
For example, in the latest Qwen2.5 model, output length increased from 1,000 words to up to 8,000 words.
Social Engineer TREE(3).0
1 month ago
It's so much fun to see more traditional optimization methods translated to LLM-type AI. This is like a buffer overflow... If this is the new trend, I wouldn't be surprised by a performance jump of at least an order of magnitude by year-end.

I'm actually flabbergasted that most of the optimizations are basically translations of those used by systems engineers, like at the foundations of virtual machines, application servers, or experimental operating systems. And then comes the fact that Python isn't designed to be a high-performance programming language that can deal with massive parallelism. It's not as bad as writing an LLM in JavaScript, and I understand Python was the choice because it became a de-facto standard with all the post-academic toy models that were incorporated, but choosing a better combination of programming environments (or perhaps doing metaprogramming on top of an AI code generator) might very well boost performance by an additional order of magnitude.