DBRX: A New State-of-the-Art Open LLM


DBRX is a new open large language model (LLM) created by Databricks that sets a new state of the art on several benchmarks. Across standard benchmarks like MMLU, HumanEval, and GSM8K, DBRX Instruct outperforms established open models such as LLaMA2, Grok-1, and Mixtral. DBRX Instruct also surpasses or matches leading closed models like GPT-3.5, Gemini, and Mistral Medium on many tasks.

  • DBRX uses a fine-grained mixture-of-experts (MoE) architecture with 132B total parameters, of which only 36B are active for any given input, giving efficiency gains in both training and inference (see the routing sketch after this list).
  • The DBRX model and checkpoints are available on Hugging Face under an open license for the community to use and build upon.
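
To make the first point concrete, here is a minimal sketch of top-k expert routing, the mechanism behind "fine-grained" MoE layers: each token is sent to only a few small expert MLPs, so per-token compute tracks the active parameters rather than the total. The layer sizes, expert count, and top-k value below are illustrative placeholders, not DBRX's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal top-k mixture-of-experts feed-forward layer (illustrative only)."""

    def __init__(self, d_model=512, d_hidden=1024, n_experts=16, top_k=4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.SiLU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, n_experts)
        top_w, top_idx = scores.topk(self.top_k, dim=-1)
        top_w = F.softmax(top_w, dim=-1)        # weights over the chosen experts only
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e    # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += top_w[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Each token passes through only top_k of n_experts expert MLPs, so per-token
# compute tracks the "active" parameters rather than the total parameter count.
moe = TopKMoE()
print(moe(torch.randn(8, 512)).shape)  # torch.Size([8, 512])
```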

Training Efficiency and Compute Savings

  • Databricks found that training MoE models like DBRX is about 2x more FLOP-efficient than training dense models to reach the same quality.
  • Compared to their previous MPT models, Databricks' overall recipe for DBRX can match their quality with nearly 4x less compute, thanks to architectural improvements, better optimization, and higher-quality pretraining data (a back-of-the-envelope illustration follows this list).
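
One hedged way to see where MoE training savings come from: under the widely used ≈6 × parameters × tokens approximation, training compute scales with the parameters each token actually activates. The token budget and dense comparison point below are assumptions for illustration, not Databricks' published accounting.

```python
# Rough training-FLOP estimate using the common ~6 * params * tokens rule of thumb.
# The token budget and dense comparison point are illustrative assumptions.
def train_flops(active_params, tokens):
    return 6 * active_params * tokens

TOKENS = 12e12                 # assumed pretraining token budget
dense_params = 70e9            # a hypothetical dense model of comparable quality
moe_active_params = 36e9       # DBRX activates roughly 36B parameters per token

dense = train_flops(dense_params, TOKENS)
moe = train_flops(moe_active_params, TOKENS)
print(f"dense ~{dense:.2e} FLOPs, MoE ~{moe:.2e} FLOPs, ratio ~{dense / moe:.1f}x")
```

Under these assumed numbers the ratio comes out near 2x, in line with the FLOP-efficiency claim above; the full 4x recipe-level saving also reflects data and optimization improvements, not architecture alone.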

Performance on Long-Context and Retrieval Tasks

DBRX Instruct was trained with a context window of up to 32K tokens. Databricks compared its performance to Mixtral Instruct and the latest GPT-3.5 Turbo and GPT-4 Turbo APIs on long-context and retrieval-augmented generation (RAG) benchmarks.

  • DBRX Instruct performs competitively with GPT-3.5 Turbo and GPT-4 Turbo on long-context benchmarks like KV-Pairs and HotpotQAXL.
  • On retrieval-augmented generation (RAG) tasks using a Wikipedia corpus, DBRX Instruct is competitive with other open and closed models.

  • On the KV-Pairs and HotpotQAXL benchmarks, GPT-4 Turbo generally performs the best.
  • However, with one exception, DBRX Instruct performs better than GPT-3.5 Turbo at all context lengths and at all parts of the sequence.
  • The overall performance of DBRX Instruct and Mixtral Instruct is similar on these long-context tasks.

*Averages for GPT-3.5 Turbo include only contexts up to 16K, since that is its maximum supported context length.
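
As background on what a RAG evaluation involves, the sketch below shows the basic loop: embed a corpus, retrieve the passages nearest to a question, and prepend them to the prompt sent to the model. The embedding model, toy corpus, and prompt template are assumptions for illustration, not the benchmark's actual setup.

```python
# Minimal RAG sketch: embed passages, retrieve the closest ones for a question,
# and build a prompt that a model such as DBRX Instruct could answer from.
# The embedding model and prompt template are illustrative assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

passages = [
    "Mount Everest is Earth's highest mountain above sea level.",
    "The Pacific Ocean is the largest and deepest ocean on Earth.",
    "The Amazon rainforest spans much of the Amazon basin in South America.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
passage_vecs = embedder.encode(passages, normalize_embeddings=True)

def retrieve(question, k=2):
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    scores = passage_vecs @ q_vec              # cosine similarity (vectors are normalized)
    top = np.argsort(-scores)[:k]
    return [passages[i] for i in top]

question = "Which ocean is the deepest?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"
print(prompt)  # this prompt would then be sent to DBRX Instruct (or another LLM)
```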


Databricks also benchmarked the inference efficiency of DBRX and similar models using NVIDIA TensorRT-LLM on optimized serving infrastructure at 16-bit precision. The benchmark aims to simulate real-world usage, with multiple simultaneous users hitting the same inference server; each user request contains an approximately 2,000-token prompt, and each response comprises 256 tokens.

  • MoE models like DBRX are faster at inference than their total parameter counts would suggest due to using relatively few parameters per input.
  • DBRX's inference throughput is 2-3x higher than that of a 132B non-MoE model, showcasing its efficiency at inference time (a rough per-token FLOP comparison follows this list).
  • The trade-off between model quality and inference efficiency is highlighted, where larger models typically achieve higher quality but smaller models are more efficient for inference.
  • DBRX, with its MoE architecture, balances model quality and inference efficiency, outperforming LLaMA2-70B in quality while delivering up to 2x faster inference throughput.
  • Mixtral represents another point on the improved Pareto frontier of MoE models, being smaller than DBRX but offering higher inference throughput at the cost of slightly lower quality.
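
To quantify the first bullet in a hedged way: per-token inference FLOPs are roughly 2 × active parameters under the standard approximation, so DBRX's ~36B active parameters imply far fewer FLOPs per token than a dense model of the same total size. Real throughput also depends on memory bandwidth, batching, and serving overheads, which is why the observed speedup is 2-3x rather than the raw FLOP ratio.

```python
# Per-token inference FLOPs under the common ~2 * active_params approximation.
# This ignores memory bandwidth and batching, which also shape real throughput.
moe_active = 36e9      # DBRX: ~36B parameters active per token
dense_total = 132e9    # a dense model with DBRX's total parameter count

moe_flops = 2 * moe_active
dense_flops = 2 * dense_total
print(f"FLOPs/token: MoE ~{moe_flops:.1e}, dense ~{dense_flops:.1e}, "
      f"ratio ~{dense_flops / moe_flops:.1f}x")   # roughly 3.7x fewer FLOPs per token
```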

Users of Databricks Foundation Model APIs can expect up to 150 tokens per second for DBRX on the optimized model serving platform with 8-bit quantization.
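
For readers who want to try that path, Databricks model serving endpoints can be called through an OpenAI-compatible client; the endpoint URL, token handling, and model name below are placeholders and assumptions shown only as a sketch, not documented values.

```python
# Hypothetical call against an OpenAI-compatible serving endpoint; the base_url,
# API key handling, and model name are assumptions, not documented values.
from openai import OpenAI

client = OpenAI(
    api_key="<DATABRICKS_TOKEN>",                            # placeholder
    base_url="https://<workspace-host>/serving-endpoints",   # placeholder
)

response = client.chat.completions.create(
    model="databricks-dbrx-instruct",   # assumed endpoint name
    messages=[{"role": "user", "content": "Summarize what a mixture-of-experts model is."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```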

In summary, DBRX Instruct performs competitively with leading models on long-context and retrieval-augmented tasks, demonstrating its strong capabilities across various benchmarks.

DBRX represents a significant advance in open large language models, delivering state-of-the-art performance across a range of benchmarks while being more efficient to train and serve than previous models.
