Why Is DeepSeek More Efficient than ChatGPT? The Library Analogy

This week, we had an amazing webinar with over 300 attendees connected, discussing "What comes after DeepSeek?". Thanks to CDO LATAM and Pacífico Business School for the invitation.

During the session, I didn’t get the chance to fully explain why I believe DeepSeek—and this new way of building LLMs—is a real game changer.

So here’s my take:

Imagine a huge university library. ChatGPT is like an assistant who, when asked a question, runs through every single aisle, checking every book until it finds an answer. DeepSeek, on the other hand, doesn’t just search randomly—it knows exactly which aisle to go to, which shelf to check, and even which pages to read.

Those of us who have always focused more on math and statistics than on raw computational power (crazy, I know) see something different happening. The kinds of innovations that used to exist only in academic journals are now making their way into industry, shaping how AI is built and deployed.

Why Is DeepSeek More Efficient than ChatGPT? The Library Analogy

Imagine that ChatGPT and DeepSeek are two assistants in a gigantic university library (like Harvard's), whose mission is to answer questions as quickly and efficiently as possible.

1. ChatGPT (Monolithic, No MoE)

  • When you ask: "What is the preference curve?", the assistant (ChatGPT) walks through every aisle of the library, checking every book until it finds the best answer.
  • Disadvantage: Although it responds accurately, it wastes time and resources exploring irrelevant sections, such as macroeconomics or economic history.
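To make the contrast concrete, here is a minimal sketch (PyTorch, with made-up layer sizes rather than any real model's dimensions) of a dense feed-forward block: every weight participates in every token's computation, whether the question needs it or not.

```python
# Minimal sketch: in a dense, "monolithic" feed-forward block, every
# weight is used for every token -- the assistant walking every aisle.
# Layer sizes are illustrative, not any real model's dimensions.
import torch
import torch.nn as nn

d_model, d_ff = 512, 2048

dense_ffn = nn.Sequential(
    nn.Linear(d_model, d_ff),   # all d_model * d_ff weights fire...
    nn.GELU(),
    nn.Linear(d_ff, d_model),   # ...for every single token
)

x = torch.randn(1, 10, d_model)  # a batch of 10 token vectors
y = dense_ffn(x)                 # full compute cost, relevant or not
```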

2. DeepSeek (Optimized with MoE and MLA)

Step 1: Understanding the Question and Categorizing the Topic (Tokens and Parameters)

  • Tokens: Each word in the question is processed and converted into a numerical representation (vector).
  • Parameters: These are the internal rules that determine how to interpret the question. Example: “Preference curve” → Microeconomics.
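A minimal sketch of this step, using a toy vocabulary and an illustrative embedding size (nothing here reflects DeepSeek's actual tokenizer or dimensions):

```python
# Minimal sketch: the question becomes token IDs, then vectors.
# Toy vocabulary and embedding size -- not DeepSeek's real tokenizer.
import torch
import torch.nn as nn

vocab = {"what": 0, "is": 1, "the": 2, "preference": 3, "curve": 4, "?": 5}
words = ["what", "is", "the", "preference", "curve", "?"]
token_ids = torch.tensor([[vocab[w] for w in words]])

embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=512)
vectors = embedding(token_ids)   # each token is now a 512-dim vector
print(vectors.shape)             # torch.Size([1, 6, 512])
```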

Step 2: Using MoE (Mixture of Experts) Instead of Searching the Entire Library

  • DeepSeek determines that the question belongs to Microeconomics and only consults the "experts" in this field.
  • Load Balancing: It activates only the most relevant experts, avoiding wasting resources on irrelevant sections. It only takes the "Introduction to Microeconomics" books instead of searching everything.
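Here is a minimal sketch of the top-k routing idea. The expert count, k, and layer sizes are assumptions for illustration, and real DeepSeek routing is considerably more elaborate:

```python
# Minimal sketch of top-k MoE routing: a gating network scores all experts,
# but only the top-k actually run per token. Expert count, k, and sizes
# are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)   # the "librarian"
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                           # x: (tokens, d_model)
        scores = self.gate(x)                       # score every expert
        weights, idx = scores.topk(self.k, dim=-1)  # keep only the top-k
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for t in range(x.size(0)):                  # route token by token
            for j in range(self.k):
                expert = self.experts[idx[t, j].item()]
                out[t] += weights[t, j] * expert(x[t])
        return out

moe = SimpleMoE()
tokens = torch.randn(6, 512)   # the six question tokens as vectors
y = moe(tokens)                # only 2 of 8 experts run per token
```

Per token, only 2 of the 8 experts actually run, and that sparsity is where the compute savings come from.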

Step 3: Optimization with MLA (Multi-Head Latent Attention)

  • Once DeepSeek has located the correct aisle, it does not go through all the books one by one.
  • MLA acts as an internal compass, quickly identifying the most relevant shelves within the aisle.
  • It's like DeepSeek has a mental map telling it exactly where to search.
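The core trick, roughly, is low-rank compression of the attention cache. A simplified sketch of that idea (sizes are illustrative, and this is the general compression concept, not DeepSeek's exact MLA implementation):

```python
# Simplified sketch of the idea behind MLA: instead of caching full
# per-head keys and values, compress them into one small latent vector
# per token and re-expand on demand. Sizes are illustrative.
import torch
import torch.nn as nn

d_model, d_latent, n_heads, d_head = 512, 64, 8, 64

down = nn.Linear(d_model, d_latent)            # compress: 512 -> 64
up_k = nn.Linear(d_latent, n_heads * d_head)   # re-expand latent to keys
up_v = nn.Linear(d_latent, n_heads * d_head)   # re-expand latent to values

x = torch.randn(1, 6, d_model)                 # six question tokens
latent = down(x)                               # this is all the cache keeps
k = up_k(latent).view(1, 6, n_heads, d_head)
v = up_v(latent).view(1, 6, n_heads, d_head)
# Cache per token: 64 floats instead of 2 * 8 * 64 = 1024 floats.
```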

Step 4: Selecting Only the Most Relevant Pages

  • Finally, DeepSeek accesses only the specific pages that contain the correct answer, rather than reading entire books.
  • It then generates a response based on this information.
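A minimal sketch of that final read, with random toy values: attention scores collapse (via softmax) into a weighted read dominated by the best-matching positions, the "pages" of the analogy.

```python
# Minimal sketch: attention as "choosing which pages to read".
# Random toy values, purely illustrative.
import torch
import torch.nn.functional as F

q = torch.randn(1, 64)            # query for the current token
keys = torch.randn(6, 64)         # one key per "page"
values = torch.randn(6, 64)       # the content of each "page"

scores = q @ keys.T / 64 ** 0.5   # how relevant is each page?
attn = F.softmax(scores, dim=-1)  # most weight on the best matches
answer = attn @ values            # read mostly from those pages
```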

So...

An LLM (Large Language Model) like DeepSeek or ChatGPT trains parameters, not tokens or answers themselves.

For example, if someone asks, "What is the preference curve?" and the model initially answers with macroeconomics information instead of microeconomics, training will adjust its MoE load balancing so that the correct experts are activated in the next round.
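A minimal training-loop sketch of that adjustment, reusing the hypothetical SimpleMoE from the Step 2 sketch with a toy target: the loss gradient also flows into the gating network, so routing improves on the next round.

```python
# Minimal training-loop sketch: parameters (including the MoE gate) are
# what gets updated. A wrong answer means a large loss, and the gradient
# reaches the gate, nudging routing toward better experts.
import torch
import torch.nn.functional as F

moe = SimpleMoE()                # the sketch class from Step 2
optimizer = torch.optim.Adam(moe.parameters(), lr=1e-3)

tokens = torch.randn(6, 512)     # the question, as vectors
target = torch.randn(6, 512)     # stand-in for the "correct" output

optimizer.zero_grad()
pred = moe(tokens)
loss = F.mse_loss(pred, target)  # wrong answer -> large loss
loss.backward()                  # gradients reach the gate too
optimizer.step()                 # routing improves for next time
```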
