DBRX: A New State-of-the-Art Open LLM


DBRX is a new open large language model (LLM) created by Databricks that sets a new state of the art on several benchmarks. Across standard benchmarks like MMLU, HumanEval, and GSM8K, DBRX Instruct outperforms established open models such as LLaMA2, Grok-1, and Mixtral. DBRX Instruct also surpasses or matches leading closed models like GPT-3.5, Gemini, and Mistral Medium on many tasks.

  • DBRX uses a fine-grained mixture-of-experts (MoE) architecture with 132B total parameters, of which only 36B are active for any given input, giving efficiency gains in both training and inference (see the routing sketch after this list).
  • The DBRX model and checkpoints are available on Hugging Face under an open license for the community to use and build upon.
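
To make the first point concrete, here is a minimal sketch of top-k expert routing, the mechanism behind "fine-grained" MoE layers: each token is sent to only a few small expert MLPs, so per-token compute tracks the active parameters rather than the total. The layer sizes, expert count, and top-k value below are illustrative placeholders, not DBRX's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal top-k mixture-of-experts feed-forward layer (illustrative only)."""

    def __init__(self, d_model=512, d_hidden=1024, n_experts=16, top_k=4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.SiLU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, n_experts)
        top_w, top_idx = scores.topk(self.top_k, dim=-1)
        top_w = F.softmax(top_w, dim=-1)        # weights over the chosen experts only
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e    # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += top_w[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Each token passes through only top_k of n_experts expert MLPs, so per-token
# compute tracks the "active" parameters rather than the total parameter count.
moe = TopKMoE()
print(moe(torch.randn(8, 512)).shape)  # torch.Size([8, 512])
```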

Training Efficiency and Compute Savings

  • Databricks found that training MoE models like DBRX is about 2x more FLOP-efficient than training dense models to reach the same quality.
  • Compared to their previous MPT models, Databricks' overall recipe for DBRX can match their quality with nearly 4x less compute, thanks to architectural improvements, better optimization, and higher-quality pretraining data (a back-of-the-envelope illustration follows this list).
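
One hedged way to see where MoE training savings come from: under the widely used ≈6 × parameters × tokens approximation, training compute scales with the parameters each token actually activates. The token budget and dense comparison point below are assumptions for illustration, not Databricks' published accounting.

```python
# Rough training-FLOP estimate using the common ~6 * params * tokens rule of thumb.
# The token budget and dense comparison point are illustrative assumptions.
def train_flops(active_params, tokens):
    return 6 * active_params * tokens

TOKENS = 12e12                 # assumed pretraining token budget
dense_params = 70e9            # a hypothetical dense model of comparable quality
moe_active_params = 36e9       # DBRX activates roughly 36B parameters per token

dense = train_flops(dense_params, TOKENS)
moe = train_flops(moe_active_params, TOKENS)
print(f"dense ~{dense:.2e} FLOPs, MoE ~{moe:.2e} FLOPs, ratio ~{dense / moe:.1f}x")
```

Under these assumed numbers the ratio comes out near 2x, in line with the FLOP-efficiency claim above; the full 4x recipe-level saving also reflects data and optimization improvements, not architecture alone.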

Performance on Long-Context and Retrieval Tasks

DBRX Instruct was trained with a context window of up to 32K tokens. Databricks compared its performance to Mixtral Instruct and the latest GPT-3.5 Turbo and GPT-4 Turbo APIs on long-context and retrieval-augmented generation (RAG) benchmarks.

  • DBRX Instruct performs competitively with GPT-3.5 Turbo and GPT-4 Turbo on long-context benchmarks like KV-Pairs and HotpotQAXL.
  • On retrieval-augmented generation (RAG) tasks using a Wikipedia corpus, DBRX Instruct is competitive with other open and closed models.

  • On the KV-Pairs and HotpotQAXL benchmarks, GPT-4 Turbo generally performs the best.
  • However, with one exception, DBRX Instruct performs better than GPT-3.5 Turbo at all context lengths and at all parts of the sequence.
  • The overall performance of DBRX Instruct and Mixtral Instruct is similar on these long-context tasks.

*Averages for GPT-3.5 Turbo include only contexts up to 16K, since that is its maximum supported context length.
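
As background on what a RAG evaluation involves, the sketch below shows the basic loop: embed a corpus, retrieve the passages nearest to a question, and prepend them to the prompt sent to the model. The embedding model, toy corpus, and prompt template are assumptions for illustration, not the benchmark's actual setup.

```python
# Minimal RAG sketch: embed passages, retrieve the closest ones for a question,
# and build a prompt that a model such as DBRX Instruct could answer from.
# The embedding model and prompt template are illustrative assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

passages = [
    "Mount Everest is Earth's highest mountain above sea level.",
    "The Pacific Ocean is the largest and deepest ocean on Earth.",
    "The Amazon rainforest spans much of the Amazon basin in South America.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
passage_vecs = embedder.encode(passages, normalize_embeddings=True)

def retrieve(question, k=2):
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    scores = passage_vecs @ q_vec              # cosine similarity (vectors are normalized)
    top = np.argsort(-scores)[:k]
    return [passages[i] for i in top]

question = "Which ocean is the deepest?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"
print(prompt)  # this prompt would then be sent to DBRX Instruct (or another LLM)
```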


Databricks also benchmarked the inference efficiency of DBRX and similar models using NVIDIA TensorRT-LLM on optimized serving infrastructure at 16-bit precision. The benchmark aims to simulate real-world usage, with multiple simultaneous users hitting the same inference server; each user request contains an approximately 2,000-token prompt, and each response comprises 256 tokens.

  • MoE models like DBRX are faster at inference than their total parameter counts would suggest due to using relatively few parameters per input.
  • DBRX's inference throughput is 2-3x higher than that of a 132B non-MoE model, showcasing its efficiency at inference time (a rough per-token FLOP comparison follows this list).
  • The trade-off between model quality and inference efficiency is highlighted, where larger models typically achieve higher quality but smaller models are more efficient for inference.
  • DBRX, with its MoE architecture, balances model quality and inference efficiency, outperforming LLaMA2-70B in quality while delivering up to 2x faster inference throughput.
  • Mixtral represents another point on the improved Pareto frontier of MoE models, being smaller than DBRX but offering higher inference throughput at the cost of slightly lower quality.
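
To quantify the first bullet in a hedged way: per-token inference FLOPs are roughly 2 × active parameters under the standard approximation, so DBRX's ~36B active parameters imply far fewer FLOPs per token than a dense model of the same total size. Real throughput also depends on memory bandwidth, batching, and serving overheads, which is why the observed speedup is 2-3x rather than the raw FLOP ratio.

```python
# Per-token inference FLOPs under the common ~2 * active_params approximation.
# This ignores memory bandwidth and batching, which also shape real throughput.
moe_active = 36e9      # DBRX: ~36B parameters active per token
dense_total = 132e9    # a dense model with DBRX's total parameter count

moe_flops = 2 * moe_active
dense_flops = 2 * dense_total
print(f"FLOPs/token: MoE ~{moe_flops:.1e}, dense ~{dense_flops:.1e}, "
      f"ratio ~{dense_flops / moe_flops:.1f}x")   # roughly 3.7x fewer FLOPs per token
```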

Users of Databricks Foundation Model APIs can expect up to 150 tokens per second for DBRX on the optimized model serving platform with 8-bit quantization.
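
For readers who want to try that path, Databricks model serving endpoints can be called through an OpenAI-compatible client; the endpoint URL, token handling, and model name below are placeholders and assumptions shown only as a sketch, not documented values.

```python
# Hypothetical call against an OpenAI-compatible serving endpoint; the base_url,
# API key handling, and model name are assumptions, not documented values.
from openai import OpenAI

client = OpenAI(
    api_key="<DATABRICKS_TOKEN>",                            # placeholder
    base_url="https://<workspace-host>/serving-endpoints",   # placeholder
)

response = client.chat.completions.create(
    model="databricks-dbrx-instruct",   # assumed endpoint name
    messages=[{"role": "user", "content": "Summarize what a mixture-of-experts model is."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```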

In summary, DBRX Instruct performs competitively with leading models on long-context and retrieval-augmented tasks, demonstrating its strong capabilities across various benchmarks.

DBRX represents a significant advance in open large language models, delivering state-of-the-art performance across a range of benchmarks while being more efficient to train and serve than previous models.
