Why Is DeepSeek More Efficient than ChatGPT? The Library Analogy
Diego Vallarino, PhD (he/him)
Immigrant | Global AI & Data Strategy Leader | Quantitative Finance Analyst | Risk & Fraud ML-AI Specialist | Ex-Executive at Coface, Scotiabank & Equifax | Board Member | PhD, MSc, MBA | EB1A Green Card Holder
This week, we hosted an amazing webinar with over 300 attendees, discussing "What comes after DeepSeek?". Thanks to CDO LATAM and Pacífico Business School for the invitation.
During the session, I didn’t get the chance to fully explain why I believe DeepSeek—and this new way of building LLMs—is a real game changer.
So here’s my take:
Imagine a huge university library. ChatGPT is like an assistant who, when asked a question, runs through every single aisle, checking every book until it finds an answer. DeepSeek, on the other hand, doesn’t just search randomly—it knows exactly which aisle to go to, which shelf to check, and even which pages to read.
Those of us who have always focused more on math and statistics than just raw computational power (crazy, I know) see something different happening. The kinds of innovations that used to only exist in academic journals are now making their way into industry, shaping how AI is built and deployed.
Why Is DeepSeek More Efficient than ChatGPT? The Library Analogy
Imagine that ChatGPT and DeepSeek are two assistants in a gigantic university library (like Harvard's), whose mission is to answer questions as quickly and efficiently as possible.
1. ChatGPT (Monolithic, No MoE)
2. DeepSeek (Optimized with MoE and MLA)
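The efficiency gap between the two assistants can be made concrete by counting how many parameters are actually exercised per token. The sketch below uses purely illustrative numbers (the parameter counts, expert sizes, and expert counts are assumptions for the example, not the real model configurations):

```python
# Toy comparison of active parameters per token: a dense (monolithic) model
# activates all of its weights for every token, while an MoE model activates
# only the experts its router selects. All numbers are illustrative.

def active_params_dense(total_params: float) -> float:
    """A dense model uses every parameter on every token."""
    return total_params

def active_params_moe(shared_params: float, expert_params: float,
                      n_experts: int, top_k: int) -> float:
    """An MoE model uses the shared layers plus only top_k of n_experts."""
    return shared_params + expert_params * top_k  # the other experts stay idle

dense = active_params_dense(670e9)           # hypothetical 670B dense model
moe = active_params_moe(shared_params=30e9,  # hypothetical shared trunk
                        expert_params=2.5e9, # hypothetical size per expert
                        n_experts=256, top_k=8)
print(f"dense: {dense/1e9:.0f}B active, moe: {moe/1e9:.0f}B active")
```

Under these assumed sizes, the MoE model touches roughly 50B parameters per token even though it stores far more in total, which is the library assistant walking to one aisle instead of all of them.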
Step 1: Understanding the Question and Categorizing the Topic (Tokens and Parameters)
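Step 1 can be caricatured in a few lines: the question is split into tokens, and those tokens give the model a rough sense of the topic. Real models use learned subword tokenizers and embeddings, not a keyword table; the `TOPIC_KEYWORDS` lookup below is a hypothetical stand-in used only to make the routing idea tangible:

```python
# Toy sketch of step 1: tokenize the question, then use the tokens to guess
# the topic. This keyword overlap is an illustration, not how LLMs do it.

TOPIC_KEYWORDS = {
    "microeconomics": {"preference", "curve", "utility", "consumer"},
    "macroeconomics": {"inflation", "gdp", "monetary", "unemployment"},
}

def tokenize(question: str) -> list[str]:
    """Naive whitespace tokenizer; real models use learned subword units."""
    return question.lower().strip("?").split()

def guess_topic(question: str) -> str:
    """Pick the topic whose keyword set overlaps the question the most."""
    tokens = set(tokenize(question))
    scores = {topic: len(tokens & kws) for topic, kws in TOPIC_KEYWORDS.items()}
    return max(scores, key=scores.get)

print(guess_topic("What is the preference curve?"))  # → microeconomics
```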
Step 2: Using MoE (Mixture of Experts) Instead of Searching the Entire Library
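The core of step 2 is a gating network that scores every expert and runs only the top few. Here is a minimal single-token sketch of top-k expert routing with NumPy; the dimensions, expert count, and linear experts are simplifying assumptions (real MoE layers sit inside transformer blocks and use feed-forward experts):

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_forward(x, gate_w, experts, top_k=2):
    """Route input x to the top_k experts chosen by a learned gate.

    x: (d,) token representation; gate_w: (d, n_experts) gating weights;
    experts: list of (d, d) weight matrices, one per expert.
    """
    logits = x @ gate_w                # gate score for each expert
    top = np.argsort(logits)[-top_k:]  # indices of the best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()           # softmax over the chosen experts only
    # Only the selected experts do any work; the rest are never evaluated.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

d, n_experts = 16, 8
x = rng.normal(size=d)
gate_w = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
y = moe_forward(x, gate_w, experts, top_k=2)
print(y.shape)  # (16,)
```

Note that compute scales with `top_k`, not with `n_experts`: adding more experts grows the library without making any single visit slower.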
Step 3: Optimization with MLA (Multi-Head Latent Attention)
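The idea behind MLA is low-rank compression of the attention cache: instead of storing full keys and values for every past token, the model caches a small latent vector per token and reconstructs keys and values from it. The single-head NumPy sketch below captures only that compression idea; the real MLA design (multiple heads, rotary-position handling, exact projection layout) is more involved:

```python
import numpy as np

rng = np.random.default_rng(1)

def mla_attention(x, w_dq, w_uq, w_dkv, w_uk, w_uv):
    """Simplified single-head sketch of latent attention.

    Tokens are projected down to a small latent c_kv (the only thing that
    would be cached), and keys/values are rebuilt from it on the fly.
    """
    c_kv = x @ w_dkv        # (T, r): compressed latent, the whole "KV cache"
    k = c_kv @ w_uk         # (T, d): keys reconstructed from the latent
    v = c_kv @ w_uv         # (T, d): values reconstructed from the latent
    q = (x @ w_dq) @ w_uq   # queries also pass through a low-rank bottleneck
    scores = q @ k.T / np.sqrt(k.shape[-1])
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)   # row-wise softmax
    return attn @ v

T, d, r = 5, 16, 4          # 5 tokens, model dim 16, latent rank 4
x = rng.normal(size=(T, d))
out = mla_attention(x,
                    w_dq=rng.normal(size=(d, r)), w_uq=rng.normal(size=(r, d)),
                    w_dkv=rng.normal(size=(d, r)),
                    w_uk=rng.normal(size=(r, d)), w_uv=rng.normal(size=(r, d)))
print(out.shape)  # (5, 16)
# Cache cost per token: r latent numbers instead of 2*d for full keys+values.
```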
Step 4: Selecting Only the Most Relevant Pages
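In the library analogy, "selecting only the most relevant pages" is a top-k relevance ranking: score every candidate against the question and keep the best few. A minimal embedding-based sketch, with random vectors standing in for real page embeddings:

```python
import numpy as np

def select_pages(query, pages, k=2):
    """Score every candidate 'page' against the query and keep the best k.

    query: (d,) embedding of the question; pages: (n, d) page embeddings.
    Returns the indices of the k highest-scoring pages, best first.
    """
    scores = pages @ query               # dot-product relevance per page
    return np.argsort(scores)[::-1][:k]  # indices sorted by descending score

rng = np.random.default_rng(2)
query = rng.normal(size=8)
pages = rng.normal(size=(100, 8))        # 100 candidate pages as embeddings
best = select_pages(query, pages, k=3)
print(best)                              # indices of the 3 best pages
```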
So...
An LLM (Large Language Model) like DeepSeek or ChatGPT trains parameters, not tokens or the answers themselves.
For example, if someone asks, "What is the preference curve?" and the model initially answers with macroeconomics information instead of microeconomics, it will adjust its MoE load balancing so that the correct experts are activated on the next pass.
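One classic mechanism behind that adjustment is an auxiliary load-balancing loss that penalizes the gate when a few experts absorb most of the traffic. The formulation below is one common variant from the MoE literature, shown as an illustration; it is not necessarily the exact scheme DeepSeek uses (newer designs also balance load via learned bias terms instead of an auxiliary loss):

```python
import numpy as np

def load_balance_loss(gate_probs, top_k_mask):
    """One common auxiliary load-balancing loss for MoE layers.

    gate_probs: (T, E) softmax gate probabilities per token.
    top_k_mask: (T, E) 1.0 where an expert was actually selected for a token.
    The loss grows when traffic collapses onto a few experts, pushing the
    gate to spread tokens more evenly on later updates.
    """
    n_experts = gate_probs.shape[1]
    frac_tokens = top_k_mask.mean(axis=0)  # fraction of tokens per expert
    mean_prob = gate_probs.mean(axis=0)    # average gate mass per expert
    return n_experts * float(frac_tokens @ mean_prob)

# Skewed routing: every token goes to expert 0 with high gate probability.
probs_skewed = np.full((4, 2), [0.9, 0.1])
mask_skewed = np.tile([1.0, 0.0], (4, 1))
# Even routing: tokens alternate between the two experts.
probs_even = np.full((4, 2), 0.5)
mask_even = np.tile([[1.0, 0.0], [0.0, 1.0]], (2, 1))

print(load_balance_loss(probs_skewed, mask_skewed))  # higher: collapsed traffic
print(load_balance_loss(probs_even, mask_even))      # lower: balanced traffic
```

Minimizing this term alongside the main training loss is what nudges the router toward the microeconomics expert when it had been wrongly favoring the macroeconomics one.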