Thoughts on DBRX

DBRX was announced last week and created some buzz. Is it good? What is it good for? And how does it really fare against the alternatives from a practical enterprise-usage perspective? Let us dive into this “open source” LLM.

Model overview

  • It took about $10M and two months to train, on 12T tokens
  • 132B parameters in a sparse Mixture of Experts (MoE) architecture (16 experts, of which 4 are active per token, for about 36B active parameters per inference; see the routing sketch after this list). That is roughly 3x the scale of Mixtral 8x7B, the prior (or current?) open SOTA
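
To make the “active parameters” idea concrete, here is a minimal, hypothetical sketch of top-k expert routing in a sparse MoE layer. The 4-of-16 split mirrors DBRX’s reported configuration, but everything else (the toy dimensions, the tiny expert MLPs, the function names) is illustrative, not Databricks’ actual implementation.

```python
import numpy as np

# Toy sparse MoE layer: only k of n_experts run per token, so the
# "active" parameter count is far below the total parameter count.
# (Illustrative only; DBRX reports 16 experts with 4 active per token.)
rng = np.random.default_rng(0)
d_model, n_experts, k = 64, 16, 4

# Each "expert" here is a tiny 2-layer MLP, just to count parameters.
experts = [
    (rng.standard_normal((d_model, 4 * d_model)),
     rng.standard_normal((4 * d_model, d_model)))
    for _ in range(n_experts)
]
router = rng.standard_normal((d_model, n_experts))  # token -> expert scores

def moe_forward(x):
    """Route a single token vector x through its top-k experts."""
    scores = x @ router
    topk = np.argsort(scores)[-k:]              # pick k highest-scoring experts
    weights = np.exp(scores[topk])
    weights /= weights.sum()                    # softmax over the chosen k only
    out = np.zeros_like(x)
    for w, i in zip(weights, topk):
        w1, w2 = experts[i]
        out += w * (np.maximum(x @ w1, 0) @ w2) # weighted sum of expert outputs
    return out

total = sum(w1.size + w2.size for w1, w2 in experts)
active = k * (experts[0][0].size + experts[0][1].size)
print(f"total expert params: {total:,}, active per token: {active:,}")  # 4/16 = 25%
y = moe_forward(rng.standard_normal(d_model))
```

The point the sketch makes: compute per token scales with the active parameters (here 25% of the total), but memory still has to hold all experts, which is why serving a 132B-parameter MoE demands multiple H100s even though only ~36B parameters fire per inference.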

Tokenomics

  • It requires 4+ NVIDIA H100 GPUs. For reference, on AWS, H100s are available via the p5.48xlarge instance (8x H100) at an on-demand price of $98.32/hr
  • With this instance, we can expect at most 4 inferences/sec => 14,400 inferences/hr
  • Assuming a user would post at least 10 inference requests/hr at peak load, this would be 1,440 users/hr (concurrency is only 4/sec, so we would need queueing, etc., which would perhaps reduce throughput further, but for now let’s keep it simple)
  • For 1-mo, the instance cost would be 30 days x 24 hrs x $98.32 => about $70k
  • To support about 15k users, this would be 10x => $700k/mo (see the cost sketch after this list)
  • For reference, with OpenAI this would cost about $50k/mo
  • Keep in mind that GPT-4 is significantly better than any open source model
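
As a sanity check on the arithmetic above, here is a back-of-the-envelope cost sketch. The throughput (4 inferences/sec), pricing ($98.32/hr), and per-user load (10 requests/hr) figures are the assumptions stated in the bullets, not measured numbers.

```python
# Back-of-the-envelope serving cost, using the assumptions from the bullets.
HOURLY_RATE = 98.32          # AWS p5.48xlarge (8x H100) on-demand, $/hr
INFERENCES_PER_SEC = 4       # assumed peak throughput for one instance
REQS_PER_USER_PER_HR = 10    # assumed per-user load at peak

inferences_per_hr = INFERENCES_PER_SEC * 3600                   # 14,400
users_per_instance = inferences_per_hr // REQS_PER_USER_PER_HR  # 1,440
monthly_cost = HOURLY_RATE * 24 * 30                            # ~$70.8k/instance

target_users = 14_400        # the "about 15k" round figure above
instances = -(-target_users // users_per_instance)              # ceil -> 10
print(f"per instance: {users_per_instance:,} users, ${monthly_cost:,.0f}/mo")
print(f"{target_users:,} users -> {instances} instances, "
      f"${instances * monthly_cost:,.0f}/mo")                   # ~$708k/mo
```

Running this reproduces the bullets: ~$70k/mo per instance, ~$700k/mo for ~15k users, versus the ~$50k/mo quoted for OpenAI at the same load.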

Evals

  • The evals are dubious. We don’t have to single out Databricks for this; all evals have a marketing flavor to them. Ground truth is for us to figure out ;-)
  • DBRX is compared against Mixtral (an 8x7B MoE model with about 13B parameters active per inference). So DBRX is roughly 3x the size of Mixtral but offers only a wee bit of performance gain, and that gain is not good enough to justify the incremental cost
  • There is a particularly large gap on HumanEval (a code-generation benchmark), but Mixtral was not trained with a focus on code generation, so this is expected
  • DBRX is the first open model trained on 12T tokens, and a good share of those tokens might be code-related, which would help explain the gap
  • As is our experience even with GPT models, real-world applications require a lot more than just the evals

Training data purity & legal coverage

  • The training data details are not revealed. In fact, questions about the training data were evaded when asked (see the TechCrunch article in the references)
  • There is no legal indemnity cover, which OpenAI, Microsoft, and Google currently provide, i.e., if anybody sues a user for data privacy/security infringement, users are on their own
  • Several folks in the open source community have complained about the openness of the license terms, which restrict using DBRX to improve other models, certain commercial uses, etc.

Bottom line

While this came out as an open source model, it is more of an open-weight model. Even so, the benefits to open source users/developers would be minimal, because the model is too large, and therefore too expensive, to build derivatives from, unlike Llama-2 and Mistral.

To me, this is more of a signal to the market that Databricks can be an option for building custom LLMs using their stack* for about $10M (*Spark, Unity Catalog, and Lilac AI).

That sounds ok-ish, but there are counterpoints. OpenAI announced back in Nov ’23 that it would enable custom models for enterprises for around $2-3M. I guess the catch is that the data has to go to Azure, and maybe the trained model stays there as well. But is it worth 5x the cost? I doubt it. Large clients have already set up Azure OpenAI services and have been using them, so unless the cost is matched, I don’t see them switching. DBRX is still way inferior to GPT-4, and could be even further behind a custom model.

On the other hand, it is a bit concerning that more and more “open” models are edging towards larger sizes and MoE architectures rather than figuring out how to make models parameter-efficient by leveraging scaling laws (as Mistral did). It looks like the MoE architecture is here to stay, but I am hoping we will go back to making smaller models stronger. That is where the open source community can thrive.

References:

  • Databricks announcement: https://www.databricks.com/blog/announcing-dbrx-new-standard-efficient-open-source-customizable-llms
  • TechCrunch coverage: https://techcrunch.com/2024/03/27/databricks-spent-10m-on-a-generative-ai-model-that-still-cant-beat-gpt-4/
  • Reddit thread on OpenAI custom-model pricing: https://www.reddit.com/r/OpenAI/comments/17pyuut/openai_is_charging_23_million_for_custom_models/

Note: These are my own views and do not reflect my employer’s.
