Thoughts on DBRX

DBRX was announced last week and created some buzz. Is it good? What is it good for? And how does it really fare against the alternatives from a practical enterprise-usage perspective? Let us dive into this “open source” LLM.

Model overview

  • It took about $10M and two months to train, on 12T tokens
  • 132B parameters in a sparse Mixture of Experts (MoE) architecture (16 experts, of which 4 are active per token, for about 36B active parameters per inference; see the routing sketch after this list). That is roughly 3x the scale of Mixtral 8x7B, the prior (or current?) open SOTA
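
To make the “active parameters” idea concrete, here is a minimal, hypothetical sketch of top-k expert routing in a sparse MoE layer. The 4-of-16 split mirrors DBRX’s reported configuration, but everything else (the toy dimensions, the tiny expert MLPs, the function names) is illustrative, not Databricks’ actual implementation.

```python
import numpy as np

# Toy sparse MoE layer: only k of n_experts run per token, so the
# "active" parameter count is far below the total parameter count.
# (Illustrative only; DBRX reports 16 experts with 4 active per token.)
rng = np.random.default_rng(0)
d_model, n_experts, k = 64, 16, 4

# Each "expert" here is a tiny 2-layer MLP, just to count parameters.
experts = [
    (rng.standard_normal((d_model, 4 * d_model)),
     rng.standard_normal((4 * d_model, d_model)))
    for _ in range(n_experts)
]
router = rng.standard_normal((d_model, n_experts))  # token -> expert scores

def moe_forward(x):
    """Route a single token vector x through its top-k experts."""
    scores = x @ router
    topk = np.argsort(scores)[-k:]              # pick k highest-scoring experts
    weights = np.exp(scores[topk])
    weights /= weights.sum()                    # softmax over the chosen k only
    out = np.zeros_like(x)
    for w, i in zip(weights, topk):
        w1, w2 = experts[i]
        out += w * (np.maximum(x @ w1, 0) @ w2) # weighted sum of expert outputs
    return out

total = sum(w1.size + w2.size for w1, w2 in experts)
active = k * (experts[0][0].size + experts[0][1].size)
print(f"total expert params: {total:,}, active per token: {active:,}")  # 4/16 = 25%
y = moe_forward(rng.standard_normal(d_model))
```

The point the sketch makes: compute per token scales with the active parameters (here 25% of the total), but memory still has to hold all experts, which is why serving a 132B-parameter MoE demands multiple H100s even though only ~36B parameters fire per inference.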

Tokenomics

  • It requires 4+ NVIDIA H100 GPUs. For reference, on AWS, H100s are available via the p5.48xlarge instance (8x H100) at an on-demand price of $98.32/hr
  • With this instance, we can expect at most 4 inferences/sec => 14,400 inferences/hr
  • Assuming a user would post at least 10 inference requests/hr at peak load, this would be 1,440 users/hr (concurrency is only 4/sec, so we would need queueing, etc., which would perhaps reduce throughput further, but for now let’s keep it simple)
  • For 1-mo, the instance cost would be 30 days x 24 hrs x $98.32 => about $70k
  • To support about 15k users, this would be 10x => $700k/mo (see the cost sketch after this list)
  • For reference, with OpenAI this would cost about $50k/mo
  • Keep in mind that GPT-4 is significantly better than any open source model
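
As a sanity check on the arithmetic above, here is a back-of-the-envelope cost sketch. The throughput (4 inferences/sec), pricing ($98.32/hr), and per-user load (10 requests/hr) figures are the assumptions stated in the bullets, not measured numbers.

```python
# Back-of-the-envelope serving cost, using the assumptions from the bullets.
HOURLY_RATE = 98.32          # AWS p5.48xlarge (8x H100) on-demand, $/hr
INFERENCES_PER_SEC = 4       # assumed peak throughput for one instance
REQS_PER_USER_PER_HR = 10    # assumed per-user load at peak

inferences_per_hr = INFERENCES_PER_SEC * 3600                   # 14,400
users_per_instance = inferences_per_hr // REQS_PER_USER_PER_HR  # 1,440
monthly_cost = HOURLY_RATE * 24 * 30                            # ~$70.8k/instance

target_users = 14_400        # the "about 15k" round figure above
instances = -(-target_users // users_per_instance)              # ceil -> 10
print(f"per instance: {users_per_instance:,} users, ${monthly_cost:,.0f}/mo")
print(f"{target_users:,} users -> {instances} instances, "
      f"${instances * monthly_cost:,.0f}/mo")                   # ~$708k/mo
```

Running this reproduces the bullets: ~$70k/mo per instance, ~$700k/mo for ~15k users, versus the ~$50k/mo quoted for OpenAI at the same load.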

Evals

  • The evals are dubious. We don’t have to single out Databricks for this; all evals have a marketing flavor to them. Ground truth is for us to figure out ;-)
  • DBRX is compared against Mixtral (an 8x7B MoE model with about 13B parameters active per inference). So DBRX is roughly 3x the size of Mixtral but offers only a wee bit of performance gain, and that gain is not good enough to justify the incremental cost
  • There is a particularly large gap on HumanEval (a code-generation benchmark), but Mixtral was not trained with a focus on code generation, so this is expected
  • DBRX is the first open model trained on 12T tokens, and a good share of those tokens might be code-related, which would help explain the gap
  • As is our experience even with GPT models, real-world applications require a lot more than just the evals

Training data purity & legal coverage

  • The training data details are not revealed. In fact, questions about the training data were evaded when asked (see the TechCrunch article in the references)
  • There is no legal indemnity cover, which OpenAI, Microsoft, and Google currently provide, i.e., if anybody sues a user for data privacy/security infringement, users are on their own
  • Several folks in the open source community have complained about the openness of the license terms, which restrict using DBRX to improve other models, certain commercial uses, etc.

Bottom line

While this came out as an open source model, it is more of an open-weight model. Even so, the benefits to open source users/developers would be minimal, because the model is too large, and therefore too expensive, to build derivatives from, unlike Llama-2 and Mistral.

To me, this is more of a signal to the market that Databricks can be an option for building custom LLMs using their stack* for about $10M (*Spark, Unity Catalog, and Lilac AI).

That sounds ok-ish, but there are counterpoints. OpenAI announced back in Nov ’23 that it would enable custom models for enterprises for around $2-3M. I guess the catch is that the data has to go to Azure, and maybe the trained model stays there as well. But is it worth 5x the cost? I doubt it. Large clients have already set up Azure OpenAI services and have been using them, so unless the cost is matched, I don’t see them switching. DBRX is still way inferior to GPT-4, and could be even further behind a custom model.

On the other hand, it is a bit concerning that more and more “open” models are edging towards larger sizes and MoE architectures rather than figuring out how to make models parameter-efficient by leveraging scaling laws (as Mistral did). It looks like the MoE architecture is here to stay, but I am hoping we will go back to making smaller models stronger. That is where the open source community can thrive.

References:

  • Databricks announcement: https://www.databricks.com/blog/announcing-dbrx-new-standard-efficient-open-source-customizable-llms
  • TechCrunch coverage: https://techcrunch.com/2024/03/27/databricks-spent-10m-on-a-generative-ai-model-that-still-cant-beat-gpt-4/
  • Reddit thread on OpenAI custom-model pricing: https://www.reddit.com/r/OpenAI/comments/17pyuut/openai_is_charging_23_million_for_custom_models/

Note: These are my own views and do not reflect my employer’s.
