DeepSeek: When Innovation Shines
Note: This section is part of a longer blog post on Model Size and its impact on performance and inference (to be published next week)
It would be remiss of me not to mention why DeepSeek is causing such a stir and why people and financial markets are losing their minds over it.
Training LLMs is expensive: companies like OpenAI and Anthropic reportedly spend between $70M and $150M on compute per model. Let that sink in - that's just for compute. They need massive data centres packed with tens of thousands of GPUs to make this happen. Everyone assumed that better AI models needed more compute, which meant hundreds of millions in investment. Until now.
DeepSeek flips the script by building a model that matches or even beats GPT-4 and Claude on many tasks - and they did it for just under $6M in compute (see footnote 1). That's like getting a Ferrari's performance for the price of a Toyota. They pulled this off with several clever engineering innovations.
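To see where a sub-$6M figure can come from, it helps to do the arithmetic: training cost is essentially GPU-hours multiplied by an hourly rental rate. The sketch below uses DeepSeek's reported ~2.79M H800 GPU-hours at an assumed $2/GPU-hour, plus a purely hypothetical large-cluster run for contrast - treat all the numbers as illustrative, not as quotes from this post.

```python
# Back-of-envelope training cost: GPU-hours x hourly rental rate.
# Figures follow DeepSeek's reported V3 numbers (~2.79M H800 GPU-hours
# at an assumed $2/GPU-hour); treat them as illustrative only.

gpu_hours = 2_788_000          # reported H800 GPU-hours for the full run
rate_per_gpu_hour = 2.00       # assumed rental price in USD

total_cost = gpu_hours * rate_per_gpu_hour
print(f"Estimated compute cost: ${total_cost / 1e6:.2f}M")   # ~ $5.58M

# Contrast with a hypothetical frontier-lab run: 20,000 GPUs
# kept busy for 90 days at $3/GPU-hour.
frontier_cost = 20_000 * 90 * 24 * 3.00
print(f"Hypothetical large-cluster run: ${frontier_cost / 1e6:.0f}M")  # ~ $130M
```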
The cherry on top: it's open source, released under a very generous MIT license that allows unrestricted commercial use. This could unleash a wave of innovation from developers, researchers, and creators who were previously priced out of the AI race. Sometimes the biggest breakthroughs come not from throwing more resources at a problem, but from fundamentally rethinking how we solve it.
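Because the weights are published openly (on Hugging Face, among other places), trying a model from this family is within reach of a single developer. Here is a minimal sketch using the transformers library and one of the smaller distilled checkpoints - the model id and settings are my assumptions rather than anything stated in the post, and you'd still want a decent GPU for reasonable speed.

```python
# Minimal local inference sketch with Hugging Face transformers.
# Assumes the distilled checkpoint "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B";
# the full-size models need far more serious hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain why mixture-of-experts models are cheaper to train."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```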
Driving Innovative Agentic AI Solutions with Global System Integrators and Top Consulting Partners @ Salesforce across US and Canada
1 month ago: The distillation of OpenAI's output quality into their own efficient training data is definitely a cool headline here, Anup Jadhav. The other thing that sticks out for me is the RL that produced its super-powered approach to CoT. That is going to be the creative spark from RL that all others incorporate to power reasoning. With it, the agentic era is really going to take off! I believe they also didn't even need to do RLHF!
Salesforce CTA - Slalom
1 month ago: If I understand it correctly, we are now in a world where LLMs are training LLMs, and what DeepSeek have achieved is a highly optimised architecture for doing that. I believe the data they trained the model on is in fact closed source, but it is known to include a mixture of outputs from other open-source and commercial models. The irony is that OpenAI can hardly complain that their model was used to train DeepSeek, given that OpenAI used copyrighted material for its own training.
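For readers unfamiliar with the idea of "LLMs training LLMs": in its simplest form, distillation means a strong teacher model generates answers, and a smaller student model is then fine-tuned on those prompt/answer pairs as ordinary supervised data. The toy sketch below only illustrates that loop - the function names and placeholder calls are my own assumptions, not DeepSeek's actual pipeline.

```python
# Toy response-distillation loop: a teacher LLM generates answers, and the
# (prompt, answer) pairs become supervised fine-tuning data for a smaller
# student. Purely illustrative; not DeepSeek's pipeline.
from dataclasses import dataclass


@dataclass
class Example:
    prompt: str
    answer: str


def generate_with_teacher(prompt: str) -> str:
    # Placeholder for a call to a strong "teacher" model
    # (e.g. an API call or a large local model).
    return f"<reasoning>...</reasoning> Final answer for: {prompt}"


def build_distillation_set(prompts: list[str]) -> list[Example]:
    """Collect teacher outputs as a supervised fine-tuning dataset."""
    return [Example(p, generate_with_teacher(p)) for p in prompts]


def finetune_student(dataset: list[Example]) -> None:
    # Placeholder: in practice this would be a standard SFT run
    # on the generated pairs.
    print(f"Fine-tuning student on {len(dataset)} distilled examples")


if __name__ == "__main__":
    prompts = ["What is 17 * 24?", "Summarise the MIT license in one line."]
    finetune_student(build_distillation_set(prompts))
```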
Thank you, Anup, for this summary. Attempting an analogy: are they offering something similar to the benefits columnar databases brought over relational ones, when columnar made massive strides in performance and cost - as long as we knew how to handle the nulls in the database?