[Figure: DeepSeek-R1 Benchmark Performance]

DeepSeek Progression

The massive stock drop certainly prompted me to learn more about DeepSeek, so I figured I would share what I found here.

DeepSeek has released the following models in relatively rapid succession, each building on the previous one:

  • DeepSeek-LLM (October 2023): 67B and 7B
  • DeepSeek Coder (November 2023): 1.3B to 33B
  • DeepSeek-V3 (Late 2024): Mixture-of-Experts (MoE), 671B with 37B activated per token
  • DeepSeek-R1 (January 20, 2025): Advanced reasoning and problem solving, trained with Reinforcement Learning (RL) on top of DeepSeek-V3-Base, with distilled variants from 1.5B to 70B. Self-verification, step-by-step reasoning, 128K context length
  • Janus Pro (January 27, 2025): Multimodal AI models for image analysis and creation, 1B to 7B

The training flow of DeepSeek-R1 consists of a four-phase pipeline:


DeepSeek-R1 Training Phases

  1. Cold Start (Phase 1): The pre-trained DeepSeek-V3-Base model undergoes supervised fine-tuning on a small dataset of thousands of long Chain-of-Thought (CoT) examples, including high-quality, readable samples collected from DeepSeek-R1-Zero. This cold-start data mitigates the readability issues seen in R1-Zero and gives the subsequent RL stage a stronger starting point.
  2. Reasoning Reinforcement Learning (Phase 2): Large-scale reinforcement learning, specifically Group Relative Policy Optimization (GRPO), is applied to enhance the model's reasoning capabilities, particularly in coding, math, science, and logical reasoning tasks. GRPO samples a group of outputs from the old policy and iteratively optimizes the policy model by maximizing a group-relative objective; an accuracy reward then evaluates whether each response is correct (see the sketch after this list). Self-evolution is a particularly interesting behavior here: the model's capabilities improve autonomously during RL, leading to behaviors such as reflection, where the model revisits and reevaluates its previous steps and explores alternative approaches.
  3. Rejection Sampling and Supervised Fine-Tuning (Phase 3): The model generates numerous samples, and only correct, readable ones are retained through rejection sampling. A generative reward model, DeepSeek-V3, aids in sample selection. About 600k reasoning samples were collected for this purpose. The model is then fine-tuned on this dataset, which includes both reasoning-oriented questions and broader domain knowledge.
  4. Diverse Reinforcement Learning (Phase 4): The final phase incorporates diverse tasks. Rule-based rewards are used for tasks that allow it, such as math problems. For other tasks, a language model provides feedback to align the model with human preferences.
  5. Distillation: The ~800k samples curated with DeepSeek-R1 were used to fine-tune (SFT) other open-source models such as Qwen and Llama, demonstrating significant enhancements in their reasoning abilities.
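To make the GRPO step in Phase 2 more concrete, here is a minimal, hypothetical Python sketch of the group-relative advantage and clipped policy objective described above. It is an illustration of the idea only, not DeepSeek's actual training code; the real objective also includes a KL penalty against a reference policy and works on per-token log-probabilities.

import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # Group-relative advantage: normalize each sampled output's reward
    # against the mean/std of its own group (outputs for the same prompt).
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def grpo_loss(logp_new: torch.Tensor, logp_old: torch.Tensor,
              advantages: torch.Tensor, clip_eps: float = 0.2) -> torch.Tensor:
    # PPO-style clipped surrogate, applied per sampled output in the group.
    ratio = torch.exp(logp_new - logp_old)            # importance ratio pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()      # maximize objective = minimize its negative

# Toy example: 4 sampled answers to one prompt, scored 1/0 by a rule-based accuracy check.
rewards = torch.tensor([1.0, 0.0, 1.0, 0.0])
adv = grpo_advantages(rewards)
loss = grpo_loss(torch.tensor([-1.0, -2.0, -1.5, -2.5]),   # new policy log-probs (made up)
                 torch.tensor([-1.1, -1.9, -1.6, -2.4]),   # old policy log-probs (made up)
                 adv)
print(adv, loss)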

Inference flow of DeepSeek-R1

The inference flow for DeepSeek-R1 involves several key steps:

  1. Input processing: The model receives a user query or prompt.
  2. Reasoning generation: DeepSeek-R1 employs its "DeepThink" feature to generate a step-by-step reasoning process.
  3. Visible reasoning: Unlike some models that conceal their reasoning, DeepSeek-R1 displays its reasoning steps to users, enhancing transparency and interpretability.
  4. Final response: After the reasoning process, the model produces its final answer or output.
  5. API interaction: When using the DeepSeek-R1 API, developers can access both the reasoning content and the final response separately.
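As a concrete illustration of step 5, below is a minimal sketch of calling DeepSeek-R1 through its OpenAI-compatible API. The base URL, model name (deepseek-reasoner), and the reasoning_content field follow DeepSeek's published documentation at the time of writing, but treat them as assumptions and verify against the current docs.

from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY",
                base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "How many primes are there below 30?"}],
)

msg = resp.choices[0].message
print("Reasoning steps:\n", msg.reasoning_content)   # the visible chain of thought
print("Final answer:\n", msg.content)                # the final response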

Is the $5.6 million training cost reasonable?

I do not believe the $5.6 million figure is reasonable, even assuming pre-training alone required 2.664 million H800 GPU hours. The H800 (Hopper architecture) is an adaptation of the H100 GPU for the Chinese market with chip-to-chip interconnect speed reduced by about 50%, per US export regulations. The assumed $2 per H800 GPU-hour seems low to me and could be as high as $7, and the figure also does not include the other phases of training. In any case, it is a fraction of Llama and GPT-5 training costs and is impressive.
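A quick back-of-the-envelope check of that sensitivity (the $2 and $7 hourly rates are the assumptions discussed above, not DeepSeek figures):

gpu_hours = 2_664_000               # reported H800 GPU hours for pre-training alone
for rate in (2.0, 7.0):             # assumed $ per GPU-hour
    print(f"${rate:.0f}/hr -> ${gpu_hours * rate / 1e6:.1f}M")
# $2/hr -> $5.3M   (roughly the published figure)
# $7/hr -> $18.6M  (if compute is priced closer to market rental rates)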

Comparing DeepSeek R1 versus OpenAI o1 in terms of costs

Input Costs: DeepSeek R1 $0.55 per million tokens; OpenAI o1 $15 per million tokens

Output Costs: DeepSeek R1 $2.19 per million tokens; OpenAI o1 $60 per million tokens

With context caching, DeepSeek R1 offers up to 90% cost savings for repeated queries (I am not sure how useful this is in practice).

The insight here is that the roughly 30x reduction in cost will create tremendous pricing pressure for other players to match (even though it is highly unlikely that US companies will use the DeepSeek API, given that it is hosted in China).
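A quick worked example of what those list prices mean for a hypothetical workload (illustrative arithmetic only; real bills depend on caching, token mix, and any pricing changes):

PRICES = {                              # $ per million tokens: (input, output)
    "DeepSeek R1": (0.55, 2.19),
    "OpenAI o1":   (15.00, 60.00),
}
in_tok, out_tok = 2_000_000, 500_000    # hypothetical monthly usage
for model, (p_in, p_out) in PRICES.items():
    cost = in_tok / 1e6 * p_in + out_tok / 1e6 * p_out
    print(f"{model}: ${cost:.2f}")
# DeepSeek R1: $2.20 vs OpenAI o1: $60.00 -- roughly a 27x gap, consistent with the ~30x above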

Last Words: Implications of Distillation Techniques

Knowledge distillation is a well-known method in AI for creating smaller models from larger ones while retaining much of their performance. DeepSeek's use of distillation aligns with this trend but is executed exceptionally well, producing highly efficient models. What this means to me is that domain experts can create highly optimized, specialized models with modest compute budgets, leading to a proliferation of domain-specific AI models tailored to particular industries or tasks. Beyond that, these "small language models" will fit nicely onto portable devices, leveraging the compute power we already carry in our hands and enabling AI agents that help us in our daily lives.
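To make the distillation point concrete, here is a minimal sketch of reasoning distillation as plain supervised fine-tuning: a small student model is trained on prompts paired with teacher-generated reasoning traces and answers. The checkpoint name and the single toy sample are placeholders, not DeepSeek's actual recipe or data.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

student_name = "Qwen/Qwen2.5-1.5B"          # hypothetical student checkpoint
tok = AutoTokenizer.from_pretrained(student_name)
student = AutoModelForCausalLM.from_pretrained(student_name)
optim = torch.optim.AdamW(student.parameters(), lr=1e-5)

# Each record pairs a prompt with a curated teacher reasoning trace plus final answer.
samples = [{"prompt": "What is 17 * 24?",
            "target": "<think>17*24 = 17*20 + 17*4 = 340 + 68 = 408</think> 408"}]

for ex in samples:                           # single step shown; a real run loops over epochs
    text = ex["prompt"] + "\n" + ex["target"] + tok.eos_token
    batch = tok(text, return_tensors="pt")
    out = student(**batch, labels=batch["input_ids"])   # standard causal-LM cross-entropy
    out.loss.backward()
    optim.step()
    optim.zero_grad()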
