[Figure: DeepSeek-R1 Benchmark Performance]

DeepSeek Progression

The massive stock drop certainly prompted me to learn more about DeepSeek, so I figured I would share what I found here.

DeepSeek has released the following models in relatively rapid succession, each building on the previous one:

  • DeepSeek-LLM (October 2023): 67B and 7B
  • DeepSeek Coder (November 2023): 1.3B to 33B
  • DeepSeek-V3 (Late 2024): Mixture-of-Experts (MoE), 671B with 37B activated per token
  • DeepSeek-R1 (January 20, 2025): Advanced reasoning and problem solving, trained with Reinforcement Learning (RL) on top of DeepSeek-V3-Base, with distilled variants from 1.5B to 70B. Self-verification, step-by-step reasoning, 128K context length
  • Janus Pro (January 27, 2025): Multimodal AI models for image analysis and creation, 1B to 7B

The training flow of DeepSeek-R1 consists of a four-phase pipeline:


DeepSeek-R1 Training Phases

  1. Cold Start (Phase 1): The pre-trained DeepSeek-V3-Base model undergoes supervised fine-tuning on a small dataset of thousands of long Chain-of-Thought (CoT) examples, including high-quality, readable samples collected from DeepSeek-R1-Zero. This cold-start data mitigates the readability issues seen in R1-Zero and gives the subsequent RL stage a stronger starting point.
  2. Reasoning Reinforcement Learning (Phase 2): Large-scale reinforcement learning, specifically Group Relative Policy Optimization (GRPO), is applied to enhance the model's reasoning capabilities, particularly in coding, math, science, and logical reasoning tasks. GRPO samples a group of outputs from the old policy and iteratively optimizes the policy model by maximizing a group-relative objective; an accuracy reward then evaluates whether each response is correct (see the sketch after this list). Self-evolution is a particularly interesting behavior here: the model's capabilities improve autonomously during RL, leading to behaviors such as reflection, where the model revisits and reevaluates its previous steps and explores alternative approaches.
  3. Rejection Sampling and Supervised Fine-Tuning (Phase 3): The model generates numerous samples, and only correct, readable ones are retained through rejection sampling. A generative reward model, DeepSeek-V3, aids in sample selection. About 600k reasoning samples were collected for this purpose. The model is then fine-tuned on this dataset, which includes both reasoning-oriented questions and broader domain knowledge.
  4. Diverse Reinforcement Learning (Phase 4): The final phase incorporates diverse tasks. Rule-based rewards are used for tasks that allow it, such as math problems. For other tasks, a language model provides feedback to align the model with human preferences.
  5. Distillation: The ~800k samples curated with DeepSeek-R1 were used to fine-tune (SFT) other open-source models such as Qwen and Llama, demonstrating significant enhancements in their reasoning abilities.
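To make the GRPO step in Phase 2 more concrete, here is a minimal, hypothetical Python sketch of the group-relative advantage and clipped policy objective described above. It is an illustration of the idea only, not DeepSeek's actual training code; the real objective also includes a KL penalty against a reference policy and works on per-token log-probabilities.

import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # Group-relative advantage: normalize each sampled output's reward
    # against the mean/std of its own group (outputs for the same prompt).
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def grpo_loss(logp_new: torch.Tensor, logp_old: torch.Tensor,
              advantages: torch.Tensor, clip_eps: float = 0.2) -> torch.Tensor:
    # PPO-style clipped surrogate, applied per sampled output in the group.
    ratio = torch.exp(logp_new - logp_old)            # importance ratio pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()      # maximize objective = minimize its negative

# Toy example: 4 sampled answers to one prompt, scored 1/0 by a rule-based accuracy check.
rewards = torch.tensor([1.0, 0.0, 1.0, 0.0])
adv = grpo_advantages(rewards)
loss = grpo_loss(torch.tensor([-1.0, -2.0, -1.5, -2.5]),   # new policy log-probs (made up)
                 torch.tensor([-1.1, -1.9, -1.6, -2.4]),   # old policy log-probs (made up)
                 adv)
print(adv, loss)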

Inference flow of DeepSeek-R1

The inference flow for DeepSeek-R1 involves several key steps:

  1. Input processing: The model receives a user query or prompt.
  2. Reasoning generation: DeepSeek-R1 employs its "DeepThink" feature to generate a step-by-step reasoning process.
  3. Visible reasoning: Unlike some models that conceal their reasoning, DeepSeek-R1 displays its reasoning steps to users, enhancing transparency and interpretability.
  4. Final response: After the reasoning process, the model produces its final answer or output.
  5. API interaction: When using the DeepSeek-R1 API, developers can access both the reasoning content and the final response separately.
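As a concrete illustration of step 5, below is a minimal sketch of calling DeepSeek-R1 through its OpenAI-compatible API. The base URL, model name (deepseek-reasoner), and the reasoning_content field follow DeepSeek's published documentation at the time of writing, but treat them as assumptions and verify against the current docs.

from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY",
                base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "How many primes are there below 30?"}],
)

msg = resp.choices[0].message
print("Reasoning steps:\n", msg.reasoning_content)   # the visible chain of thought
print("Final answer:\n", msg.content)                # the final response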

Is the $5.6 million training cost reasonable?

I do not believe the $5.6 million figure is reasonable, even assuming pre-training alone required 2.664 million H800 GPU hours. The H800 (Hopper architecture) is an adaptation of the H100 GPU for the Chinese market with chip-to-chip interconnect speed reduced by about 50%, per US export regulations. The assumed $2 per H800 GPU-hour seems low to me and could be as high as $7, and the figure also does not include the other phases of training. In any case, it is a fraction of Llama and GPT-5 training costs and is impressive.
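A quick back-of-the-envelope check of that sensitivity (the $2 and $7 hourly rates are the assumptions discussed above, not DeepSeek figures):

gpu_hours = 2_664_000               # reported H800 GPU hours for pre-training alone
for rate in (2.0, 7.0):             # assumed $ per GPU-hour
    print(f"${rate:.0f}/hr -> ${gpu_hours * rate / 1e6:.1f}M")
# $2/hr -> $5.3M   (roughly the published figure)
# $7/hr -> $18.6M  (if compute is priced closer to market rental rates)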

Comparing DeepSeek R1 versus OpenAI o1 in terms of costs

Input Costs: DeepSeek R1 $0.55 per million tokens; OpenAI o1 $15 per million tokens

Output Costs: DeepSeek R1 $2.19 per million tokens; OpenAI o1 $60 per million tokens

With context caching, DeepSeek R1 offers up to 90% cost savings for repeated queries (I am not sure how useful this is in practice).

The insight here is that the roughly 30x reduction in cost will create tremendous pricing pressure for other players to match (even though it is highly unlikely that US companies will use the DeepSeek API, given that it is hosted in China).
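A quick worked example of what those list prices mean for a hypothetical workload (illustrative arithmetic only; real bills depend on caching, token mix, and any pricing changes):

PRICES = {                              # $ per million tokens: (input, output)
    "DeepSeek R1": (0.55, 2.19),
    "OpenAI o1":   (15.00, 60.00),
}
in_tok, out_tok = 2_000_000, 500_000    # hypothetical monthly usage
for model, (p_in, p_out) in PRICES.items():
    cost = in_tok / 1e6 * p_in + out_tok / 1e6 * p_out
    print(f"{model}: ${cost:.2f}")
# DeepSeek R1: $2.20 vs OpenAI o1: $60.00 -- roughly a 27x gap, consistent with the ~30x above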

Last Words: Implications of Distillation Techniques

Knowledge distillation is a well-known method in AI for creating smaller models from larger ones while retaining much of their performance. DeepSeek's use of distillation aligns with this trend but is executed exceptionally well, producing highly efficient models. What this means to me is that domain experts can create highly optimized, specialized models with modest compute budgets, leading to a proliferation of domain-specific AI models tailored to particular industries or tasks. Beyond that, these "small language models" will fit nicely onto portable devices, leveraging the compute power we already carry in our hands and enabling AI agents that help us in our daily lives.
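To make the distillation point concrete, here is a minimal sketch of reasoning distillation as plain supervised fine-tuning: a small student model is trained on prompts paired with teacher-generated reasoning traces and answers. The checkpoint name and the single toy sample are placeholders, not DeepSeek's actual recipe or data.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

student_name = "Qwen/Qwen2.5-1.5B"          # hypothetical student checkpoint
tok = AutoTokenizer.from_pretrained(student_name)
student = AutoModelForCausalLM.from_pretrained(student_name)
optim = torch.optim.AdamW(student.parameters(), lr=1e-5)

# Each record pairs a prompt with a curated teacher reasoning trace plus final answer.
samples = [{"prompt": "What is 17 * 24?",
            "target": "<think>17*24 = 17*20 + 17*4 = 340 + 68 = 408</think> 408"}]

for ex in samples:                           # single step shown; a real run loops over epochs
    text = ex["prompt"] + "\n" + ex["target"] + tok.eos_token
    batch = tok(text, return_tensors="pt")
    out = student(**batch, labels=batch["input_ids"])   # standard causal-LM cross-entropy
    out.loss.backward()
    optim.step()
    optim.zero_grad()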
