Meet LIMA: A New 65B Parameter LLaMa Model Fine-Tuned On 1000 Carefully Curated Prompts And Responses
SWARNODIP NAG
MCA @CU '26 | AI/ML Researcher | Data Scientist | Computer Vision | Python (Django) Developer | FSD Enthusiast | Graphic Designer | VITMEE '24 GMR: 23 | WBJECA '24 GMR: 55
By being pretrained to predict the next token at an astoundingly large scale, language models learn general-purpose representations that can transfer to nearly any language-understanding or generation task. To enable this transfer, a variety of language model alignment strategies have been proposed, chiefly instruction tuning over large datasets with millions of examples and, more recently, Reinforcement Learning from Human Feedback (RLHF) gathered over millions of interactions with human annotators. However, existing alignment techniques demand large compute and specialized data resources to reach ChatGPT-level performance.
The authors of a new paper show, however, that given a strong pretrained language model, very good performance can be obtained by fine-tuning on just 1,000 carefully chosen training examples. Their hypothesis is that alignment can be a quick, lightweight process in which the model learns the format or style of interacting with users, exposing the skills and knowledge it already acquired during pretraining. To test this idea, they collect 1,000 examples that resemble authentic user prompts paired with high-quality responses. They select 750 of the best questions and answers from online communities such as Stack Exchange and wikiHow, screening them for quality and diversity.
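As a rough illustration of what that kind of quality-and-diversity screening might look like, here is a minimal Python sketch. The field names (score, tags, title, accepted_answer), thresholds, and per-topic cap are illustrative assumptions, not the authors' actual pipeline.

```python
# A minimal sketch of quality/diversity filtering over community Q&A posts.
# Field names and thresholds are assumptions for illustration only.
def select_examples(posts, per_topic_limit=2, min_score=50):
    """Keep highly upvoted Q&A pairs, capping each topic to preserve diversity."""
    selected, per_topic = [], {}
    for post in sorted(posts, key=lambda p: p["score"], reverse=True):
        topic = post["tags"][0] if post["tags"] else "untagged"
        if post["score"] < min_score or per_topic.get(topic, 0) >= per_topic_limit:
            continue
        selected.append({"prompt": post["title"],
                         "response": post["accepted_answer"]})
        per_topic[topic] = per_topic.get(topic, 0) + 1
    return selected

# Illustrative placeholder posts; a real run would read a Stack Exchange dump.
posts = [
    {"score": 120, "tags": ["python"], "title": "How do I reverse a list?",
     "accepted_answer": "Use reversed() or slicing: my_list[::-1]."},
    {"score": 30, "tags": ["python"], "title": "What is a decorator?",
     "accepted_answer": "A callable that wraps another callable."},
]
print(select_examples(posts))  # keeps only the post with score >= 50
```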
They also manually compose 250 question-and-answer examples, emphasizing a consistent response style in the vein of an AI assistant and optimizing for task diversity. Researchers from Meta AI, Carnegie Mellon University, the University of Southern California, and Tel Aviv University train LIMA, a pretrained 65B-parameter LLaMa model fine-tuned on this collection of 1,000 examples, and compare it against contemporary language models and products on 300 challenging test prompts.
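For a concrete picture of the fine-tuning step, below is a minimal sketch using the Hugging Face transformers and datasets libraries. The checkpoint name, hyperparameters, and the curated_examples placeholder are assumptions; the paper fine-tunes the 65B LLaMa with its own training setup.

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Placeholder for the ~1,000 curated {"prompt", "response"} pairs described above.
curated_examples = [
    {"prompt": "How do I reverse a list in Python?",
     "response": "Use reversed() or slicing: my_list[::-1]."},
]

model_name = "huggyllama/llama-7b"  # stand-in; LIMA fine-tunes the 65B model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # LLaMa tokenizers ship without one
model = AutoModelForCausalLM.from_pretrained(model_name)

# Concatenate prompt and response into a single training sequence.
ds = (Dataset.from_list(curated_examples)
      .map(lambda ex: {"text": ex["prompt"] + "\n\n" + ex["response"]})
      .map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=2048),
           remove_columns=["prompt", "response", "text"]))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lima-ft", num_train_epochs=15,
                           per_device_train_batch_size=1, learning_rate=1e-5),
    train_dataset=ds,
    # mlm=False gives standard next-token (causal LM) labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```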
In a study of human preference, LIMA surpasses OpenAI's RLHF-trained DaVinci003 as well as a 65B-parameter reproduction of Alpaca trained on 52,000 examples.
Although human annotators often prefer GPT-4, Claude, and Bard responses over LIMA's, this is far from universal: LIMA produces equivalent or preferable responses in 43%, 46%, and 58% of cases, respectively. Repeating the preference annotations with GPT-4 as the annotator confirms these findings. Evaluated on an absolute scale, 88% of LIMA's responses meet the prompt's requirements and 50% are rated excellent. Ablation experiments show significant gains from improving data quality but sharply diminishing returns from increasing data quantity without also increasing prompt diversity.
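To make those preference numbers concrete, here is a tiny sketch of how pairwise comparisons could be tallied into the "equivalent or preferable" rates quoted above. The judgments list is hypothetical placeholder data, not the authors' annotations.

```python
from collections import Counter

def equivalent_or_preferred(judgments, system="LIMA"):
    """Fraction of comparisons where `system` tied or won outright."""
    counts = Counter(judgments)  # e.g. Counter({"GPT-4": 57, "LIMA": 28, "tie": 15})
    return (counts[system] + counts["tie"]) / sum(counts.values())

# Hypothetical per-prompt verdicts from annotators comparing LIMA vs. GPT-4.
judgments = ["GPT-4"] * 57 + ["LIMA"] * 28 + ["tie"] * 15
print(f"{equivalent_or_preferred(judgments):.0%}")  # -> 43%
```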
Furthermore, they find that LIMA can carry on coherent multi-turn dialogue despite having seen no dialogue examples during fine-tuning, and adding just 30 hand-crafted dialogue chains to the training set improves this ability further. Overall, these striking results demonstrate the power of pretraining and its importance relative to large-scale instruction tuning and reinforcement learning approaches: a strong pretrained language model can be tuned to produce outstanding, competitive responses across a wide range of prompts using only 1,000 well-chosen examples.
There are, however, drawbacks to this strategy. First, the mental effort required to craft such examples is substantial and difficult to scale up. Second, while LIMA typically produces strong responses, an unlucky sample during decoding or an adversarial prompt can still yield a weak one; LIMA is less robust than product-grade models.
Nevertheless, the evidence presented in this work shows that the difficult problem of alignment can be addressed in a remarkably simple way.
Thanks for reading!