What's Deep about DeepSeek?
Arun Krishnan
DeepSeek has taken the LLM world by storm, achieving parity with the latest models from OpenAI at a fraction of the stated cost and with much smaller models. I am sure folks are wondering how they did it! I delved into their paper, given here.
Here is what I learnt!
The most significant part of the DeepSeek approach is the use of Reinforcement Learning -- a reward system -- to help the model understand which reasoning paths are better than others.
The biggest difference is in their reinforcement learning algorithm. They call it Group Relative Policy Optimization, or GRPO, as opposed to the Proximal Policy Optimization (PPO) that is typically used, shown below and taken from their earlier paper here.
In PPO, along with the reference and reward models, you have a value model that is comparable in size to the policy model, which adds a substantial computational and memory burden. Moreover, this value model provides the baseline used in the advantage calculation.
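For reference, the standard PPO clipped surrogate objective, written here as my own transcription roughly in the paper's notation, maximizes:

```latex
\mathcal{J}_{PPO}(\theta) =
  \mathbb{E}\!\left[q \sim P(Q),\, o \sim \pi_{\theta_{old}}(O\mid q)\right]
  \frac{1}{|o|}\sum_{t=1}^{|o|}
  \min\!\left[
    \frac{\pi_\theta(o_t \mid q, o_{<t})}{\pi_{\theta_{old}}(o_t \mid q, o_{<t})}\, A_t,\;
    \mathrm{clip}\!\left(\frac{\pi_\theta(o_t \mid q, o_{<t})}{\pi_{\theta_{old}}(o_t \mid q, o_{<t})},\, 1-\varepsilon,\, 1+\varepsilon\right) A_t
  \right]
```

Here the advantage A_t is estimated with the help of the value model, which is exactly the piece GRPO does away with.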
In GRPO, on the other hand, the value model is removed entirely, and the baseline is instead "the average reward of multiple sampled outputs, produced in response to the same question."
"More specifically, for each question ??, GRPO samples a group of outputs {??1, ??2, · · · , ????} from the old policy ?????????? and then optimizes the policy model by maximizing the following objective:"
Don't be too scared by the function. The first part is an expectation: questions q are drawn from the data, and for each question a group of outputs is sampled from the old policy. The second part is a ratio of the token probabilities assigned by the current and the old policy models, weighted by A_i,t, an advantage computed from the relative rewards of the outputs within each group. Think of it as nudging the policy toward the outputs that scored better than their group's average, and away from those that scored worse.
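Concretely, for outcome-level rewards, as I read the paper, the advantage assigned to every token of output i is simply the group-normalized reward:

```latex
\hat{A}_{i,t} = \frac{r_i - \mathrm{mean}\left(\{r_1, \dots, r_G\}\right)}{\mathrm{std}\left(\{r_1, \dots, r_G\}\right)}
```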
The last part, D_KL, is the Kullback-Leibler divergence, a penalty that measures how far the current policy model has drifted from the reference model and keeps it from straying too far.
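To make the pieces concrete, here is a minimal PyTorch-style sketch of a GRPO-style loss for a single question. The function name, tensor shapes and default hyperparameters (clip_eps, kl_beta) are my own illustrative choices, not DeepSeek's code; only the overall structure (group-normalized advantages, clipped ratio, KL penalty against the reference model) follows the objective above.

```python
import torch

def grpo_loss(new_logps, old_logps, ref_logps, rewards, mask,
              clip_eps=0.2, kl_beta=0.04):
    """Illustrative GRPO-style loss for one question with G sampled outputs.

    new_logps, old_logps, ref_logps: (G, T) per-token log-probs of the G
        sampled outputs under the current, old and reference policies.
    rewards: (G,) scalar reward for each sampled output (G should be > 1).
    mask:    (G, T) 1.0 for real tokens, 0.0 for padding.
    """
    # Group-relative advantage: each output's reward normalized against its
    # own group, replacing the learned value-model baseline used in PPO.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)   # (G,)
    adv = adv.unsqueeze(1)                                      # broadcast over tokens

    # Probability ratio between current and old policy, clipped as in PPO.
    ratio = torch.exp(new_logps - old_logps)                    # (G, T)
    surrogate = torch.min(ratio * adv,
                          torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv)

    # Simple per-token estimate of D_KL(pi_theta || pi_ref):
    # exp(ref - new) - (ref - new) - 1, which is always >= 0.
    log_diff = ref_logps - new_logps
    kl = torch.exp(log_diff) - log_diff - 1.0                   # (G, T)

    per_token = surrogate - kl_beta * kl
    # Average over each output's tokens, then over the group; negate to minimize.
    per_output = (per_token * mask).sum(dim=1) / mask.sum(dim=1)
    return -per_output.mean()
```

In an actual training loop, new_logps would be computed with gradients enabled while old_logps and ref_logps come from frozen forward passes, and the rewards would come from a rule-based or learned reward model scoring each sampled answer.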
And this is what gives the model the ability to rapidly discover new reasoning pathways through its own chain-of-thought, without the huge amounts of labelled training data that supervised fine-tuning would otherwise require.
And THIS is what has made DeepSeek so powerful: with far less training data than OpenAI's o1, they are still able to meet the same benchmark standards.
In a way, this makes sense. DeepSeek is training the model the way a human being learns: by adapting, making mistakes, and learning from the feedback on those errors.
Paraphrasing an old Chinese saying, 'we do live in interesting times!'
Comments

IT professional (1 mo): Insightful

Technology Risk | Information Security | Business Continuity | Enterprise Software | Products (1 mo): Good one Arun Krishnan. So their approach is "hey LLM, get trained in a group setting" versus the old value-model way, where it is 1-on-1 tutoring...? Their cost savings is on the training side, right? Not on the inference or running-the-model side? Pray, do educate us non-math folks some more, kindly.

Applied data science (1 mo): Hi Arun, thanks for sharing this paper and your summary. For PPO, softmax assigns the probabilities, so how can a multi-group reward model create better results? Fascinating, and I look forward to experimentation.

(1 mo): Thanks for explaining in "simple" words.

Chief Architect | Senior Vice President - Data & Analytics | Microsoft at iLink Digital (1 mo): Nice summary. Great progress for a two-year-old startup. On a lighter note, "China products are comparatively cheap while offering unique features". Still, real-world use cases will be the true test.