OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Credit: https://arxiv.org/pdf/2404.14619

Today’s paper introduces OpenELM, a new open-source language model family that achieves state-of-the-art performance for its size. It is trained on publicly available datasets and uses layer-wise scaling to allocate parameters efficiently within each transformer layer, leading to better accuracy than existing open language models of similar size.

Method Overview

OpenELM adopts a decoder-only transformer architecture and incorporates several recent advancements, such as RMSNorm, rotary positional embeddings, grouped query attention, SwiGLU feed-forward networks, and flash attention. Its main distinguishing feature is layer-wise scaling: the number of attention heads and the feed-forward network dimension are scaled differently for each transformer layer, so more parameters are allocated to the later layers closer to the output and the parameter budget is used more efficiently.
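
To make the idea concrete, here is a minimal sketch of layer-wise scaling in Python, assuming the scaling factors for attention heads and FFN width are interpolated linearly from the first to the last layer; the bounds and dimensions below are illustrative, not the paper's exact configuration.

```python
# Minimal sketch of layer-wise scaling: each transformer layer i gets its own
# number of attention heads and FFN width by linearly interpolating scaling
# factors across depth. The bounds below are illustrative assumptions.

def layer_wise_config(num_layers: int, d_model: int, head_dim: int,
                      alpha_min: float = 0.5, alpha_max: float = 1.0,
                      beta_min: float = 0.5, beta_max: float = 4.0):
    configs = []
    for i in range(num_layers):
        t = i / max(1, num_layers - 1)                  # 0.0 at first layer, 1.0 at last
        alpha = alpha_min + (alpha_max - alpha_min) * t  # scales attention heads
        beta = beta_min + (beta_max - beta_min) * t      # scales FFN hidden size
        num_heads = max(1, round(alpha * d_model / head_dim))
        ffn_dim = round(beta * d_model)
        configs.append({"layer": i, "num_heads": num_heads, "ffn_dim": ffn_dim})
    return configs

if __name__ == "__main__":
    # Later layers end up with more heads and wider FFNs than earlier ones.
    for cfg in layer_wise_config(num_layers=8, d_model=1024, head_dim=64):
        print(cfg)
```

Running the sketch shows the intended effect: shallow layers stay narrow while deeper layers, closer to the output, receive a larger share of the parameter budget.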

For pre-training, OpenELM uses a mixture of publicly available datasets totaling approximately 1.5 trillion tokens. The pre-training data is filtered and tokenized on the fly during training. OpenELM variants ranging from 270 million to 3 billion parameters are trained for 350,000 iterations with the AdamW optimizer.
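
As a rough illustration of what on-the-fly filtering and tokenization can look like inside a streaming data pipeline, here is a hypothetical sketch; the length thresholds and the toy tokenizer are assumptions, not the paper's exact preprocessing.

```python
# Hypothetical sketch of on-the-fly filtering and tokenization in the data
# pipeline. Thresholds and the toy tokenizer are illustrative assumptions.

from typing import Callable, Iterable, Iterator, List

def stream_token_sequences(docs: Iterable[str],
                           tokenize: Callable[[str], List[str]],
                           min_chars: int = 200,
                           min_tokens: int = 256) -> Iterator[List[str]]:
    """Yield tokenized documents, dropping short ones as the corpus streams by."""
    for doc in docs:
        text = doc.strip()
        if len(text) < min_chars:      # filter out very short documents
            continue
        tokens = tokenize(text)
        if len(tokens) < min_tokens:   # filter out documents with too few tokens
            continue
        yield tokens

if __name__ == "__main__":
    corpus = ["too short", "some repeated text " * 300]
    for tokens in stream_token_sequences(corpus, tokenize=str.split):
        print(len(tokens), "tokens kept")
```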

Results

Evaluation is performed on a comprehensive suite of zero-shot and few-shot tasks spanning reasoning, knowledge understanding, and bias & misinformation. This includes standard zero-shot benchmarks, the OpenLLM leaderboard, and the LLM360 leaderboard. OpenELM achieves state-of-the-art performance among open language models trained on public datasets. For example, the 1.1-billion-parameter OpenELM outperforms the similarly sized OLMo model by 2.36% on the OpenLLM leaderboard while using 2x fewer pre-training tokens.

Instruction tuning using the UltraFeedback dataset further improves OpenELM's performance by 1-2% on average. OpenELM is also shown to work well with parameter-efficient fine-tuning methods like LoRA and DoRA.
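
For readers who want to try parameter-efficient fine-tuning themselves, here is a minimal LoRA sketch using the Hugging Face peft library; the checkpoint id and target module names are assumptions and may need adjusting to the released OpenELM models.

```python
# Minimal LoRA sketch with Hugging Face transformers + peft.
# The checkpoint id and target_modules are assumptions: verify them against
# the actual OpenELM release on the Hugging Face hub.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-1_1B",         # assumed checkpoint id
    trust_remote_code=True,       # OpenELM ships custom modeling code
)

lora = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["qkv_proj"],  # assumed attention projection name; check the model's modules
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only the low-rank adapters are trainable
```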

Conclusion

OpenELM pushes the state of the art for open language models trained on public data. By open-sourcing the full training and evaluation framework, including code, weights, logs, and configurations, the authors aim to empower open research into large language models. For more information, please consult the full paper.

Code: https://github.com/apple/corenet

Models: https://huggingface.co/apple/OpenELM

Congrats to the authors for their work!

Mehta, Sachin, et al. "OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework." arXiv preprint arXiv:2404.14619 (2024).
