DeepSeek: Why Necessity Triumphed

Biplab Pal / [email protected] / [email protected]

Early Beginnings Under a Cloud of Constraints

In February 2016, AI enthusiast Liang Wenfeng co-founded High-Flyer, a hedge fund focused on technology-driven trading. Before the fund’s launch, Liang had been trading since the 2007–2008 financial crisis while attending Zhejiang University. By 2019, High-Flyer had started using AI-driven strategies in its trading operations, and within two years it relied on AI for essentially all of its day-to-day trading decisions.

What sounds like a straightforward growth path hides a deeper story of navigating constraints. U.S. sanctions on high-performance Nvidia chips were looming, threatening to cut off China’s access to the cutting-edge processors needed to train and run advanced AI. High-Flyer’s pivot from conventional trading to heavily AI-driven processes set the stage for an even more ambitious venture: creating entirely new AI models under intense resource limitations.


Founding DeepSeek: A Hedge Fund’s AI Lab Goes Solo

By April 2023, High-Flyer recognized that truly innovative AI required full-time research untethered from trading demands. The firm spun off an artificial general intelligence (AGI) lab dedicated to building large language models. In May 2023, this lab became an independent company—Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd., or simply DeepSeek.

DeepSeek’s birth came at a fraught moment. The U.S. government had introduced strict export restrictions to limit China’s access to advanced Nvidia chips—essential hardware for training massive AI models. Conventional wisdom said: without ready access to top-tier GPUs, you cannot hope to match American AI giants. DeepSeek had other plans.

Stockpiling GPUs in Uncertain Times

Before the U.S. chip sanctions went into effect, Liang Wenfeng had quietly amassed large quantities of Nvidia A100 GPUs—some sources say over 10,000, others claim as many as 50,000. In the world of AI, GPU availability can be the difference between breakthroughs and stagnation, and this strategic stockpile became the hedge fund’s—and later DeepSeek’s—lifeline. Still, it was nowhere near the scale of a Microsoft or Google data center, and thus DeepSeek remained under-resourced compared to the likes of OpenAI or Meta.


The Underdog Approach: Innovation Born of Necessity

Faced with fewer funds and restricted GPU supply, DeepSeek had no choice but to experiment aggressively with cost-saving AI architectures. The team focused on Mixture-of-Experts, a technique that activates only certain “experts” or parts of a model for each query—reducing the hardware load and slashing inference costs. They also pioneered ways to train models with minimal supervised fine-tuning (SFT), sometimes opting for reinforcement learning from scratch.
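
To make the routing idea concrete, below is a minimal PyTorch sketch of top-k mixture-of-experts routing. This is an illustration, not DeepSeek’s implementation: the layer sizes, the expert count, and k = 2 are invented for the example, and the per-token loop is written for readability rather than speed.

```python
# Minimal sketch of mixture-of-experts (MoE) top-k routing: only k of
# the n experts run for each token, so compute per token scales with k
# rather than with the total expert count. All sizes are illustrative.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # scores each expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.router(x)                # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)      # mix only the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):             # run just the selected experts
            for t in range(x.size(0)):
                e = idx[t, slot].item()
                out[t] += weights[t, slot] * self.experts[e](x[t])
        return out

tokens = torch.randn(4, 64)                    # 4 tokens, d_model = 64
print(TinyMoE()(tokens).shape)                 # torch.Size([4, 64])
```

Because only two of the eight expert MLPs run for any given token, compute per token scales with k rather than with the total number of experts, which is exactly the property that lets MoE models grow total parameter count without a matching growth in FLOPs.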

This was not about showy R&D—it was a matter of survival. Traditional large models from big tech can cost tens of millions to train. DeepSeek, by contrast, had to innovate or risk irrelevance. Over time, these necessity-driven experiments led to DeepSeek-V2 and DeepSeek-V3, which achieved performance on par with or beyond GPT-4-level benchmarks but with far lower computational overhead.


Rising to Global Prominence

DeepSeek-V2: A Catalyst for China’s AI Model Price War

In May 2024, DeepSeek launched its V2 model, quickly dubbed the “Pinduoduo of AI” for offering a strong product at a fraction of the usual price. It cost only 2 RMB per million output tokens—a stark contrast to the more expensive services from American tech giants. The move triggered a price war among major Chinese tech companies like ByteDance, Tencent, Baidu, and Alibaba, who scrambled to cut their own AI model prices to stay competitive.

Despite the low fees, DeepSeek still managed to turn a profit. In a sense, it proved that more efficient architectures and resource usage could be just as—if not more—lucrative than the entrenched “bigger is better” approach of well-funded competitors.

DeepSeek-V3: Shaking the Foundations of AI

Then in December 2024, DeepSeek-V3 arrived with a jaw-dropping claim: a 671-billion-parameter mixture-of-experts model, trained in around 55 days at a total cost of roughly $5.58 million, reportedly about a tenth of what Meta spent training its latest Llama model. This was possible thanks to both deep algorithmic optimizations (mixture-of-experts routing combined with multi-head latent attention, which shrinks the attention key-value cache) and careful hardware utilization. Instead of the sprawling 16,000-GPU clusters used by leading American labs, DeepSeek trained on roughly 2,000 Nvidia H800 GPUs, the export-compliant, bandwidth-limited variant of the H100. Critics called it impossible, until benchmarks showed performance rivaling or exceeding the best from Meta and even OpenAI.
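
For context on that headline number: DeepSeek’s V3 technical report prices the run at about 2.788 million H800 GPU-hours at an assumed rental rate of $2 per GPU-hour, and the arithmetic lines up with a roughly 2,000-GPU cluster running for just under two months. A quick back-of-envelope check:

```python
# Back-of-envelope check of the reported DeepSeek-V3 training cost.
# The GPU-hour count and the $2/hour rental rate are the figures cited
# in the V3 technical report; the cluster size is the ~2,048 H800s
# reported for the training run.
gpu_hours = 2.788e6                    # H800 GPU-hours
cost = gpu_hours * 2.0                 # ≈ $5.58M at $2 per GPU-hour
days = gpu_hours / (2048 * 24)         # ≈ 57 days of wall-clock time
print(f"cost ≈ ${cost / 1e6:.2f}M, wall-clock ≈ {days:.0f} days")
```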

By January 2025, DeepSeek’s free chatbot, powered by V3, soared to become the most-downloaded free app in the iOS App Store in the United States, surpassing the reigning ChatGPT. Word quickly spread that DeepSeek could solve logic problems, write reliable code, and handle nuanced questions just as well as any large-scale American AI model.

In tandem, Nvidia’s share price dropped by nearly 17% in a single trading day; financial markets recognized that if advanced AI could be trained with fewer chips at lower cost, the era of GPU arms races might be coming to an end.


Open Source Stance and Talent Strategy

DeepSeek’s preference for open source set it apart from most big AI labs. The company published the model weights (under its own license), the accompanying code, and detailed technical reports, effectively inviting a global community of developers to refine or build upon its technology. This openness stands in contrast to other AI giants that keep their best-performing models locked behind closed APIs.
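
As a concrete illustration of what that openness enables, here is a minimal sketch of loading one of the published checkpoints with the Hugging Face transformers library. The repo name is an assumed example of one of the small distilled checkpoints under the deepseek-ai organization; the full 671B-parameter weights require a multi-GPU cluster and would not load like this on a single machine.

```python
# Minimal sketch: loading an openly released DeepSeek checkpoint via
# Hugging Face transformers. The model ID below is an assumed example
# of a small distilled checkpoint; the full V3/R1 weights are far too
# large to load this way on a single machine.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

prompt = "Explain mixture-of-experts in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```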

Simultaneously, hiring at DeepSeek focused on talent over credentials. Recent university graduates, part-time developers, even non-computer-science specialists: anyone with demonstrable skill or unique domain knowledge could join the team. This broad mix of perspectives enhanced the models’ ability to tackle a wide array of questions, from advanced math to Chinese poetry.


Why Necessity Triumphed

While American and other global players fought for GPU supremacy, DeepSeek navigated around these resource barriers through groundbreaking research. By avoiding heavy reliance on SFT and applying reinforcement learning strategies like group relative policy optimization (GRPO), it cut training costs dramatically. Each design choice was born of a single truth: the team simply did not have the money or the hardware supply to brute-force its way to success.
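
The core trick in GRPO is worth spelling out: instead of training a separate value network (the critic that PPO requires) to estimate baselines, the baseline comes from a group of completions sampled for the same prompt. A minimal sketch of that group-relative advantage computation, with invented reward values:

```python
# Minimal sketch of the advantage computation at the heart of group
# relative policy optimization (GRPO): each completion's reward is
# normalized against the mean and standard deviation of its own group,
# so no learned critic network is needed. Reward values are invented.
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its group's mean and std."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Example: rewards for four completions sampled from one prompt,
# e.g. 1.0 if the final answer was correct and 0.0 otherwise.
rewards = [1.0, 0.0, 1.0, 1.0]
print(group_relative_advantages(rewards))  # ≈ [0.58, -1.73, 0.58, 0.58]
```

Completions scoring above their group’s average receive positive advantages and are reinforced; those below are pushed down, all without the memory and compute cost of a critic model.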

That strategic constraint became an advantage. Instead of blindly scaling parameter counts, DeepSeek made more efficient use of the parameters it had. Lower compute requirements also enabled quicker iteration cycles, letting the team refine their models without the time delays or budget blowouts that often hobble larger competitors.


Global Impact and the “Sputnik Moment”

By early 2025, many analysts began calling DeepSeek a “Sputnik moment” for American AI, echoing the shock the Soviet satellite launch caused in the West at the start of the space race. The company’s rise underscored the potential limits of U.S. export restrictions and forced the entire AI ecosystem, both East and West, to reexamine what it means to build cutting-edge technology at scale.

As the DeepSeek-R1 and DeepSeek-R1-Zero models rolled out (both built on the 671B-parameter base, with R1-Zero trained by reinforcement learning alone and R1 adding only a small cold-start SFT stage), global investors, regulators, and technologists all realized that AI leadership is no longer just about who has the biggest GPU cluster. Efficiency, algorithmic innovation, and a willingness to challenge established norms can be far more disruptive in the long run.


How Necessity Became DeepSeek’s Superpower

DeepSeek’s success story revolves around the classic principle: necessity is the mother of invention. Faced with limited budgets, constrained compute, and geopolitical headwinds from U.S. sanctions, the startup had no choice but to rethink the AI model development process from first principles.

  • They minimized compute usage, slashing training costs and hardware needs.
  • They released open-source models to crowdsource innovation and cultivate a vibrant developer community.
  • They hired a diverse range of talent, from junior coders to non-technical experts, to enrich the AI’s understanding.

The result? An underdog startup that rattled global giants, depressed GPU-manufacturer stock prices, and delivered state-of-the-art AI at a fraction of the cost.

By turning their limitations into a structural advantage, DeepSeek has proven that scarce resources can spark the most transformative ideas. Whether one views it as an indictment of brute-force approaches or a lesson in innovation against the odds, DeepSeek’s journey shows that when conventional pathways are closed, truly original—and sometimes revolutionary—paths open up.
