Thoughts on DBRX
DBRX was announced last week and created some buzz. Is it good? What is it good for? And how does it really fare against the alternatives from a practical enterprise-usage perspective? Let us dive into this “open source” LLM.
Model overview
Tokenomics
Evals
Training data purity & legal coverage
Bottom line
While this came out billed as an open source model, it is really an open-weight model: the weights are public, but the training data and code are not. Either way, the benefit to open source users/developers would be minimal, because the model is too large, and therefore too expensive, to build derivatives from, unlike Llama-2 and Mistral.
To me, this is more of a signal to the market that Databricks can be an option for building custom LLMs, using their stack* for about $10M (*Spark, Unity Catalog and Lilac AI).
That sounds OK-ish, but there are counterpoints. OpenAI announced back in Nov ’23 that it would build custom models for enterprises for around $2-3M. I guess the catch is that the data has to go to Azure, and maybe the trained model stays there too. But is avoiding that worth a roughly 5x cost premium? I doubt it. Large clients have already set up Azure OpenAI services and have been using them, so unless the cost is matched, I don’t see them switching. DBRX is still well behind GPT-4, and would likely fall further behind a custom model.
On the other hand, it is a bit concerning that more and more “open” models are edging toward larger sizes and MoE architectures rather than figuring out how to make models parameter-efficient by leveraging scaling laws (as Mistral did). MoE architectures look like they are here to stay, but I am hoping we go back to making smaller models stronger. That is where the open source community can thrive.
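For readers less familiar with why MoE decouples total size from per-token cost: per the announcement, DBRX has 132B total parameters but only 36B active per token, because a gate routes each token to 4 of 16 experts. Below is a minimal, stdlib-only sketch of top-k gating, the core MoE routing idea; the function names, toy experts, and gate weights are all hypothetical, and a real implementation operates on batched tensors with load-balancing losses.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, gate_weights, experts, k=2):
    """Route input x to the top-k experts by gate score and combine
    their outputs, weighted by renormalized gate probabilities.
    Only k experts run per token, so active parameters stay small
    even as the total parameter count grows."""
    # gate scores: one linear projection per expert (toy version)
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)
    topk = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in topk)
    out = [0.0] * len(x)
    for i in topk:
        y = experts[i](x)           # only the selected experts execute
        weight = probs[i] / norm
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out

# toy experts: each just scales the input by a different factor
experts = [lambda x, f=f: [f * xi for xi in x] for f in (0.5, 1.0, 2.0, 4.0)]
gate_weights = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4], [0.0, 0.1]]
y = moe_forward([1.0, 2.0], gate_weights, experts, k=2)
```

The design point this illustrates: compute per token scales with `k`, not with the number of experts, which is exactly why "open" MoE checkpoints can be huge to store and fine-tune even when they are cheap-ish to serve.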
References:
https://www.databricks.com/blog/announcing-dbrx-new-standard-efficient-open-source-customizable-llms
Note: These are my own views and do not reflect my employer's