DeepSeek: When Innovation Shines

Note: This section is part of a longer blog post on Model Size and its impact on performance and inference (to be published next week)

It would be remiss of me not to mention why DeepSeek is causing such a stir and why people and financial markets are losing their minds over it.

Training LLMs is expensive, and companies like OpenAI and Anthropic spend between $70M and $150M (rumoured) on compute per model. Let that sink in - that's just for compute. They need massive data centres packed with tens of thousands of GPUs to make this happen. Everyone assumed that better AI models needed more compute, which meant hundreds of millions in investment. Until now.

DeepSeek flips this script by building a model that matches or even beats GPT-4 and Claude on many tasks - and they did it for just under $6M (see footnote 1). That's like getting a Ferrari's performance for the price of a Toyota. They pulled this off with several clever innovations:

  • Think of traditional AI training as storing every number with 32 bits of precision - like writing your bank balance out to a long string of decimal places. DeepSeek found a way to do much of the arithmetic accurately with just 8 bits (FP8) instead, which means they need far less memory to get the same results (a toy sketch of the idea follows this list).
  • They took the mixture-of-experts idea - sparse models built from specialised experts - to another level: instead of one big model trying to know everything, only the experts a given token actually needs wake up and do any work. It's very similar in spirit to Mistral's Mixtral models (the routing is sketched below).
  • Their multi-token prediction is like reading multiple words at once instead of one at a time - imagine taking in "New York City" as one chunk rather than three separate words. The extra tokens it predicts are accepted roughly 90% of the time, which makes generation about 2x faster - a huge difference when you're processing billions of tokens (there's a toy sketch of this after the list too).
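
To make the low-precision point concrete, here's a minimal sketch in plain NumPy. It is not DeepSeek's FP8 pipeline - the real training uses hardware FP8 formats with much finer-grained scaling - but a simple symmetric 8-bit scheme shows why 8 bits can be enough for typical weight values:

```python
import numpy as np

# Toy stand-in for low-precision training: store weights in 8 bits instead
# of 32, using one scale factor per tensor (symmetric quantisation).
# DeepSeek's real pipeline uses hardware FP8 with per-block scaling; this
# only illustrates why 8 bits is often "good enough".

def quantize_int8(weights: np.ndarray):
    """Map float32 weights onto int8 values plus a single scale factor."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights from the 8-bit representation."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(1024, 1024)).astype(np.float32)  # typical weight scale

q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)

print("memory: 32-bit =", w.nbytes // 1024, "KiB, 8-bit =", q.nbytes // 1024, "KiB")
print("mean absolute error:", float(np.abs(w - w_hat).mean()))
```

A quarter of the memory (and much cheaper matrix maths on hardware with native 8-bit support) for an error that is tiny relative to the weights themselves.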

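The mixture-of-experts idea is perhaps easiest to see in code. The sketch below is a toy router - made-up sizes, a plain softmax gate, NumPy only, nothing like DeepSeek's actual implementation - but it shows the key trick: for any one token, only 2 of the 8 experts do any work:

```python
import numpy as np

rng = np.random.default_rng(1)

NUM_EXPERTS, TOP_K, D_MODEL, D_HIDDEN = 8, 2, 64, 256  # toy sizes

# Each "expert" is its own small feed-forward network.
experts = [
    (rng.normal(0, 0.02, (D_MODEL, D_HIDDEN)), rng.normal(0, 0.02, (D_HIDDEN, D_MODEL)))
    for _ in range(NUM_EXPERTS)
]
gate_w = rng.normal(0, 0.02, (D_MODEL, NUM_EXPERTS))  # router weights

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token vector to its top-k experts and mix their outputs."""
    logits = x @ gate_w
    top = np.argsort(logits)[-TOP_K:]                           # chosen experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()   # softmax over chosen only
    out = np.zeros_like(x)
    for w_mix, idx in zip(weights, top):
        w1, w2 = experts[idx]
        out += w_mix * (np.maximum(x @ w1, 0.0) @ w2)           # only 2 of 8 experts run
    return out

token = rng.normal(size=D_MODEL)
print(moe_layer(token).shape)  # (64,) -- same output shape, a quarter of the expert compute
```

The total parameter count can keep growing with the number of experts, while the compute per token stays pinned to the few experts the router picks - which is how a very large model can still be cheap to run.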
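The multi-token point can be sketched as a tiny speculative loop: something cheap proposes a few tokens at once, a verifier keeps the longest prefix it agrees with, and decoding advances several words per step. The token lists and "models" below are made up - this is a generic illustration of the acceptance idea, not DeepSeek's multi-token head:

```python
# Toy speculative decoding: a draft guesses several tokens per step, a
# verifier keeps the longest prefix it agrees with. Tokens and models are
# invented; the point is the step count, not the text.

def draft(context, k=3):
    """Pretend draft model: guesses the next k tokens (sometimes wrong)."""
    guesses = {"New": ["York", "City", "is"], "is": ["a", "big", "apple"]}
    return guesses.get(context[-1], ["the"] * k)

def verify(context, proposed):
    """Pretend main model: accepts a prefix of the proposal, fixes the rest."""
    truth = {"New": ["York", "City", "is"], "is": ["a", "big", "city"]}
    correct = truth.get(context[-1], ["the"])
    accepted = []
    for p, t in zip(proposed, correct):
        if p != t:
            accepted.append(t)      # first mismatch: take the verifier's token
            break
        accepted.append(p)
    return accepted

context, steps = ["New"], 0
while len(context) < 7 and steps < 10:
    context += verify(context, draft(context))
    steps += 1

print(" ".join(context), f"| decoding steps: {steps}")
```

Six new tokens arrive in two verification passes instead of six; the higher the acceptance rate, the closer generation gets to that ideal speed-up.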
The cherry on top: it's open-source with a very permissive MIT license, which allows unrestricted commercial use. This could unleash a wave of innovation from developers, researchers, and creators who were previously priced out of the AI race. Sometimes the biggest breakthroughs come not from throwing more resources at a problem, but from fundamentally rethinking how we solve it.


  1. There is some contention around the reported $6 million training cost for DeepSeek-R1; it's likely being confused with DeepSeek-V3, released last December.

Sita Kunz SKR

Technology Consulting | People Connector | Salesforce Enthusiast

1 month

Thanks for sharing and summarising the long post for the readers. I like "That's like getting a Ferrari's performance for the price of a Toyota."

Ian Douglas

Driving Innovative Agentic AI Solutions with Global System Integrators and Top Consulting Partners @ Salesforce across US and Canada

1 month

The distillation of OpenAI's data quality into their own efficient data for learning is definitely a cool headline here, Anup Jadhav. The other thing that sticks out for me is the RL that produced its super-powered approach to CoT. That is going to be the creative spark from RL that all others incorporate to power reasoning. With it, the agentic era is really going to take off! I believe they also didn't even need to do RLHF!?!

Mark Wraith

Salesforce CTA - Slalom

1 month

If I understand it correctly, we are now in a world where LLMs are training LLMs, and what DeepSeek have achieved is a highly optimised architecture for doing that. I believe the data they trained the model on is in fact closed source, but it is known to include a mixture of outputs from other open-source and commercial models. The irony is OpenAI can hardly complain that their model was used to train DeepSeek, given OpenAI used copyrighted material for training.

Michael Gill

Chief Technical Officer | AWS Cloud Architect & Engineer | Salesforce Architect | MVP Hall of Fame

1 month

Thanks for this Anup Jadhav, perfect.

Thank you Anup for this summary. Attempting to use an analogy: are they offering something similar to columnar database benefits over relational, when columnar made massive strides in performance and cost - as long as we knew how to handle the nulls in the database?
