OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Credit: https://arxiv.org/pdf/2404.14619

Today’s paper introduces OpenELM, a new open-source language model family that achieves state-of-the-art performance for its size. It is trained on publicly available datasets and uses layer-wise scaling to allocate parameters efficiently within each transformer layer, leading to better accuracy than existing open language models of similar size.

Method Overview

OpenELM adopts a decoder-only transformer architecture and incorporates several recent advancements, such as RMSNorm, rotary positional embeddings, grouped query attention, SwiGLU feed-forward networks, and flash attention. Its main distinguishing feature is layer-wise scaling: the number of attention heads and the feed-forward network dimension are scaled differently for each transformer layer, so more parameters are allocated to the later layers closer to the output and the parameter budget is used more efficiently.
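
To make the idea concrete, here is a minimal sketch of layer-wise scaling in Python, assuming the scaling factors for attention heads and FFN width are interpolated linearly from the first to the last layer; the bounds and dimensions below are illustrative, not the paper's exact configuration.

```python
# Minimal sketch of layer-wise scaling: each transformer layer i gets its own
# number of attention heads and FFN width by linearly interpolating scaling
# factors across depth. The bounds below are illustrative assumptions.

def layer_wise_config(num_layers: int, d_model: int, head_dim: int,
                      alpha_min: float = 0.5, alpha_max: float = 1.0,
                      beta_min: float = 0.5, beta_max: float = 4.0):
    configs = []
    for i in range(num_layers):
        t = i / max(1, num_layers - 1)                  # 0.0 at first layer, 1.0 at last
        alpha = alpha_min + (alpha_max - alpha_min) * t  # scales attention heads
        beta = beta_min + (beta_max - beta_min) * t      # scales FFN hidden size
        num_heads = max(1, round(alpha * d_model / head_dim))
        ffn_dim = round(beta * d_model)
        configs.append({"layer": i, "num_heads": num_heads, "ffn_dim": ffn_dim})
    return configs

if __name__ == "__main__":
    # Later layers end up with more heads and wider FFNs than earlier ones.
    for cfg in layer_wise_config(num_layers=8, d_model=1024, head_dim=64):
        print(cfg)
```

Running the sketch shows the intended effect: shallow layers stay narrow while deeper layers, closer to the output, receive a larger share of the parameter budget.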

For pre-training, OpenELM uses a mixture of publicly available datasets totaling approximately 1.5 trillion tokens. The pre-training data is filtered and tokenized on the fly during training. OpenELM variants ranging from 270 million to 3 billion parameters are trained for 350,000 iterations with the AdamW optimizer.
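
As a rough illustration of what on-the-fly filtering and tokenization can look like inside a streaming data pipeline, here is a hypothetical sketch; the length thresholds and the toy tokenizer are assumptions, not the paper's exact preprocessing.

```python
# Hypothetical sketch of on-the-fly filtering and tokenization in the data
# pipeline. Thresholds and the toy tokenizer are illustrative assumptions.

from typing import Callable, Iterable, Iterator, List

def stream_token_sequences(docs: Iterable[str],
                           tokenize: Callable[[str], List[str]],
                           min_chars: int = 200,
                           min_tokens: int = 256) -> Iterator[List[str]]:
    """Yield tokenized documents, dropping short ones as the corpus streams by."""
    for doc in docs:
        text = doc.strip()
        if len(text) < min_chars:      # filter out very short documents
            continue
        tokens = tokenize(text)
        if len(tokens) < min_tokens:   # filter out documents with too few tokens
            continue
        yield tokens

if __name__ == "__main__":
    corpus = ["too short", "some repeated text " * 300]
    for tokens in stream_token_sequences(corpus, tokenize=str.split):
        print(len(tokens), "tokens kept")
```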

Results

Evaluation is performed on a comprehensive suite of zero-shot and few-shot tasks spanning reasoning, knowledge understanding, and bias & misinformation. This includes standard zero-shot benchmarks, the OpenLLM leaderboard, and the LLM360 leaderboard. OpenELM achieves state-of-the-art performance among open language models trained on public datasets. For example, the 1.1-billion-parameter OpenELM outperforms the similarly sized OLMo model by 2.36% on the OpenLLM leaderboard while using 2x fewer pre-training tokens.

Instruction tuning using the UltraFeedback dataset further improves OpenELM's performance by 1-2% on average. OpenELM is also shown to work well with parameter-efficient fine-tuning methods like LoRA and DoRA.
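
For readers who want to try parameter-efficient fine-tuning themselves, here is a minimal LoRA sketch using the Hugging Face peft library; the checkpoint id and target module names are assumptions and may need adjusting to the released OpenELM models.

```python
# Minimal LoRA sketch with Hugging Face transformers + peft.
# The checkpoint id and target_modules are assumptions: verify them against
# the actual OpenELM release on the Hugging Face hub.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-1_1B",         # assumed checkpoint id
    trust_remote_code=True,       # OpenELM ships custom modeling code
)

lora = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["qkv_proj"],  # assumed attention projection name; check the model's modules
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only the low-rank adapters are trainable
```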

Conclusion

OpenELM pushes the state of the art for open language models trained on public data. By open-sourcing the full training and evaluation framework, including code, weights, logs, and configurations, the authors aim to empower open research into large language models. For more information, please consult the full paper.

Code: https://github.com/apple/corenet

Models: https://huggingface.co/apple/OpenELM

Congrats to the authors for their work!

Mehta, Sachin, et al. "OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework." arXiv preprint arXiv:2404.14619 (2024).
