Why Is DeepSeek More Efficient than ChatGPT? The Library Analogy
Diego Vallarino, PhD (he/him)
Immigrant | Global AI & Data Strategy Leader | Quantitative Finance Analyst | Risk & Fraud ML-AI Specialist | Ex-Executive at Coface, Scotiabank & Equifax | Board Member | PhD, MSc, MBA | EB1A Green Card Holder
This week, we hosted an amazing webinar with over 300 attendees, discussing "What comes after DeepSeek?". Thanks to CDO LATAM and Pacífico Business School for the invitation.
During the session, I didn’t get the chance to fully explain why I believe DeepSeek—and this new way of building LLMs—is a real game changer.
So here’s my take:
Imagine a huge university library. ChatGPT is like an assistant who, when asked a question, runs through every single aisle, checking every book until it finds an answer. DeepSeek, on the other hand, doesn’t just search randomly—it knows exactly which aisle to go to, which shelf to check, and even which pages to read.
Those of us who have always focused more on math and statistics than just raw computational power (crazy, I know) see something different happening. The kinds of innovations that used to only exist in academic journals are now making their way into industry, shaping how AI is built and deployed.
Why Is DeepSeek More Efficient than ChatGPT? The Library Analogy
Imagine that ChatGPT and DeepSeek are two assistants in a gigantic university library (like Harvard's), whose mission is to answer questions as quickly and efficiently as possible.
1. ChatGPT (Monolithic, No MoE)
2. DeepSeek (Optimized with MoE and MLA)
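The efficiency gap between the two assistants can be made concrete by counting how many parameters are actually exercised per token. The sketch below uses purely illustrative numbers (the parameter counts, expert sizes, and expert counts are assumptions for the example, not the real model configurations):

```python
# Toy comparison of active parameters per token: a dense (monolithic) model
# activates all of its weights for every token, while an MoE model activates
# only the experts its router selects. All numbers are illustrative.

def active_params_dense(total_params: float) -> float:
    """A dense model uses every parameter on every token."""
    return total_params

def active_params_moe(shared_params: float, expert_params: float,
                      n_experts: int, top_k: int) -> float:
    """An MoE model uses the shared layers plus only top_k of n_experts."""
    return shared_params + expert_params * top_k  # the other experts stay idle

dense = active_params_dense(670e9)           # hypothetical 670B dense model
moe = active_params_moe(shared_params=30e9,  # hypothetical shared trunk
                        expert_params=2.5e9, # hypothetical size per expert
                        n_experts=256, top_k=8)
print(f"dense: {dense/1e9:.0f}B active, moe: {moe/1e9:.0f}B active")
```

Under these assumed sizes, the MoE model touches roughly 50B parameters per token even though it stores far more in total, which is the library assistant walking to one aisle instead of all of them.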
Step 1: Understanding the Question and Categorizing the Topic (Tokens and Parameters)
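Step 1 can be caricatured in a few lines: the question is split into tokens, and those tokens give the model a rough sense of the topic. Real models use learned subword tokenizers and embeddings, not a keyword table; the `TOPIC_KEYWORDS` lookup below is a hypothetical stand-in used only to make the routing idea tangible:

```python
# Toy sketch of step 1: tokenize the question, then use the tokens to guess
# the topic. This keyword overlap is an illustration, not how LLMs do it.

TOPIC_KEYWORDS = {
    "microeconomics": {"preference", "curve", "utility", "consumer"},
    "macroeconomics": {"inflation", "gdp", "monetary", "unemployment"},
}

def tokenize(question: str) -> list[str]:
    """Naive whitespace tokenizer; real models use learned subword units."""
    return question.lower().strip("?").split()

def guess_topic(question: str) -> str:
    """Pick the topic whose keyword set overlaps the question the most."""
    tokens = set(tokenize(question))
    scores = {topic: len(tokens & kws) for topic, kws in TOPIC_KEYWORDS.items()}
    return max(scores, key=scores.get)

print(guess_topic("What is the preference curve?"))  # → microeconomics
```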
Step 2: Using MoE (Mixture of Experts) Instead of Searching the Entire Library
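The core of step 2 is a gating network that scores every expert and runs only the top few. Here is a minimal single-token sketch of top-k expert routing with NumPy; the dimensions, expert count, and linear experts are simplifying assumptions (real MoE layers sit inside transformer blocks and use feed-forward experts):

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_forward(x, gate_w, experts, top_k=2):
    """Route input x to the top_k experts chosen by a learned gate.

    x: (d,) token representation; gate_w: (d, n_experts) gating weights;
    experts: list of (d, d) weight matrices, one per expert.
    """
    logits = x @ gate_w                # gate score for each expert
    top = np.argsort(logits)[-top_k:]  # indices of the best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()           # softmax over the chosen experts only
    # Only the selected experts do any work; the rest are never evaluated.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

d, n_experts = 16, 8
x = rng.normal(size=d)
gate_w = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
y = moe_forward(x, gate_w, experts, top_k=2)
print(y.shape)  # (16,)
```

Note that compute scales with `top_k`, not with `n_experts`: adding more experts grows the library without making any single visit slower.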
Step 3: Optimization with MLA (Multi-Head Latent Attention)
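The idea behind MLA is low-rank compression of the attention cache: instead of storing full keys and values for every past token, the model caches a small latent vector per token and reconstructs keys and values from it. The single-head NumPy sketch below captures only that compression idea; the real MLA design (multiple heads, rotary-position handling, exact projection layout) is more involved:

```python
import numpy as np

rng = np.random.default_rng(1)

def mla_attention(x, w_dq, w_uq, w_dkv, w_uk, w_uv):
    """Simplified single-head sketch of latent attention.

    Tokens are projected down to a small latent c_kv (the only thing that
    would be cached), and keys/values are rebuilt from it on the fly.
    """
    c_kv = x @ w_dkv        # (T, r): compressed latent, the whole "KV cache"
    k = c_kv @ w_uk         # (T, d): keys reconstructed from the latent
    v = c_kv @ w_uv         # (T, d): values reconstructed from the latent
    q = (x @ w_dq) @ w_uq   # queries also pass through a low-rank bottleneck
    scores = q @ k.T / np.sqrt(k.shape[-1])
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)   # row-wise softmax
    return attn @ v

T, d, r = 5, 16, 4          # 5 tokens, model dim 16, latent rank 4
x = rng.normal(size=(T, d))
out = mla_attention(x,
                    w_dq=rng.normal(size=(d, r)), w_uq=rng.normal(size=(r, d)),
                    w_dkv=rng.normal(size=(d, r)),
                    w_uk=rng.normal(size=(r, d)), w_uv=rng.normal(size=(r, d)))
print(out.shape)  # (5, 16)
# Cache cost per token: r latent numbers instead of 2*d for full keys+values.
```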
Step 4: Selecting Only the Most Relevant Pages
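In the library analogy, "selecting only the most relevant pages" is a top-k relevance ranking: score every candidate against the question and keep the best few. A minimal embedding-based sketch, with random vectors standing in for real page embeddings:

```python
import numpy as np

def select_pages(query, pages, k=2):
    """Score every candidate 'page' against the query and keep the best k.

    query: (d,) embedding of the question; pages: (n, d) page embeddings.
    Returns the indices of the k highest-scoring pages, best first.
    """
    scores = pages @ query               # dot-product relevance per page
    return np.argsort(scores)[::-1][:k]  # indices sorted by descending score

rng = np.random.default_rng(2)
query = rng.normal(size=8)
pages = rng.normal(size=(100, 8))        # 100 candidate pages as embeddings
best = select_pages(query, pages, k=3)
print(best)                              # indices of the 3 best pages
```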
So...
An LLM (Large Language Model) like DeepSeek or ChatGPT trains parameters, not tokens or the answers themselves.
For example, if someone asks, "What is the preference curve?" and the model initially answers with macroeconomics information instead of microeconomics, it will adjust its MoE load balancing so that the correct experts are activated on the next pass.
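One classic mechanism behind that adjustment is an auxiliary load-balancing loss that penalizes the gate when a few experts absorb most of the traffic. The formulation below is one common variant from the MoE literature, shown as an illustration; it is not necessarily the exact scheme DeepSeek uses (newer designs also balance load via learned bias terms instead of an auxiliary loss):

```python
import numpy as np

def load_balance_loss(gate_probs, top_k_mask):
    """One common auxiliary load-balancing loss for MoE layers.

    gate_probs: (T, E) softmax gate probabilities per token.
    top_k_mask: (T, E) 1.0 where an expert was actually selected for a token.
    The loss grows when traffic collapses onto a few experts, pushing the
    gate to spread tokens more evenly on later updates.
    """
    n_experts = gate_probs.shape[1]
    frac_tokens = top_k_mask.mean(axis=0)  # fraction of tokens per expert
    mean_prob = gate_probs.mean(axis=0)    # average gate mass per expert
    return n_experts * float(frac_tokens @ mean_prob)

# Skewed routing: every token goes to expert 0 with high gate probability.
probs_skewed = np.full((4, 2), [0.9, 0.1])
mask_skewed = np.tile([1.0, 0.0], (4, 1))
# Even routing: tokens alternate between the two experts.
probs_even = np.full((4, 2), 0.5)
mask_even = np.tile([[1.0, 0.0], [0.0, 1.0]], (2, 1))

print(load_balance_loss(probs_skewed, mask_skewed))  # higher: collapsed traffic
print(load_balance_loss(probs_even, mask_even))      # lower: balanced traffic
```

Minimizing this term alongside the main training loss is what nudges the router toward the microeconomics expert when it had been wrongly favoring the macroeconomics one.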