From AlphaGo to ChatGPT to DeepSeek: The Three Defining Moments of Modern AI.

It is fascinating to watch how DeepSeek, the AI subsidiary of a private hedge fund based in Hangzhou, China, created so much news in late January 2025.

After reading news articles and tech journals, combing through message boards, listening to podcasts, and getting geeky with DeepSeek's algorithms throughout this week, I would characterize the overall response to DeepSeek's open-source release as chaotic, confused, biased, and in a state of collective shock.

As someone who came from a software programming background in AI and stumbled into a career in data center hardware sales for the last 20+ years, I find it worthwhile to internalize current events for myself, share my thoughts with other tech enthusiasts, and hopefully help AI novices understand the progression of AI development.

My own journey in AI began in graduate school with GNU Go, a worldwide open-source AI project, where I was able to improve its algorithm and beat the original program by 85% on HPC clusters. Since then, I have formed my own little garage-based firm, worked for large corporations like HP, Microsoft, and Hitachi, and mainly focused on data center architectures and sales management. This career path has given me a holistic experience spanning low-level machine code programming, data center hardware platform design, server/storage/networking implementation, cloud architecture, and internal and external data center infrastructure. While my passion is in sales management, customer interactions, and finance, having absorbed all of the technical knowledge above has given me a better perspective on the technology drivers and trends in each role I've held. I suppose the collective sum of my previous roles truly represents an MoE (Mixture of Experts) model.

As a technologist who has been closely following AI development in both the US and China, I consider DeepSeek one of the three major defining moments in recent AI history. What's remarkable is that each new milestone has arrived more quickly than the one before.

The first defining moment, which transformed the AI field from "AI winter" to "AI spring," was DeepMind's AlphaGo back in 2016. At the time, the ancient Chinese board game of Go was considered by many AI experts to be 'the last refuge of human intelligence,' as it was believed to be impossible to program effectively. From my experience with GNU Go: even though it was a Computer Olympiad winner, it could not beat an entry-level human Go player. This was due to the sheer complexity of the game: the number of possible board variations is so vast that brute-force methods, calculating every possible move, are computationally infeasible. To put it into perspective, Go has roughly 10^170 legal board positions, far more than the estimated 10^80 protons in the known universe.

Remarkably, the human brain can grasp the basics of Go in 10 to 15 minutes, while brute-forcing the game would not be possible for computers even if all of Earth's computing resources were combined into a single supercomputer. It was the consensus in the tech world that if computers could ever reach the level of an 8-dan or 9-dan Go player, they would have reached a level of consciousness, since playing Go well requires human intuition rather than just brute-force calculation.

Sound familiar?

Yes, it would require AGI (Artificial General Intelligence) to master the board game of Go. Or so we thought. Imagine the shock of the AI and tech community when AlphaGo, a deep learning model developed by DeepMind, defeated South Korean pro Go player Lee Sedol (then ranked 11th in the world) in 2016 and Chinese pro Go player Ke Jie (ranked #1 in the world) in 2017. It was such a complete defeat of human players who had studied Go their entire lives that playing against AlphaGo was described by Go professionals as a 'God-like' experience. Even more astonishing was the subsequent release of AlphaGo Zero and AlphaZero, which achieved a 100% win rate against their predecessors.

Yet the AGI moment did not arrive, despite this unprecedented AI achievement. However, DeepMind's contributions have undoubtedly propelled AI forward by leaps and bounds, driving the integration of machine learning into many fields such as biological research, robotics, game theory, and complex problem solving, and firmly securing its place in history.


The second defining moment of AI advancement was the release of the conversational AI ChatGPT in 2022. OpenAI built ChatGPT on its GPT (Generative Pre-trained Transformer) models, which in turn build on Google's transformer architecture research. ChatGPT's release spurred a wave of competing products, including Gemini, Claude, Llama, Ernie, Grok, and Qwen. This is when investment firms started aggressively pouring big money into AI startups. On the hardware side, the biggest winners were NVIDIA, TSMC, and other AI chip producers. On the software side, Microsoft and OpenAI emerged as key players. However, a dark twist emerged: OpenAI, originally founded as a non-profit with a mission to benefit humanity through open research, saw its CEO transform it into a de facto for-profit entity. This shift prioritized the interests of its executives and led to the closure of its codebase. The model proved so commercially successful that the Trump administration committed to a $500 billion investment goal, with OpenAI, SoftBank, and Oracle leading the effort, known as the Stargate project. Despite the controversial privatization of OpenAI, ChatGPT remains one of the most significant milestones in AI advancement since AlphaGo six years prior.


The third and latest defining moment is, of course, the open-source release of DeepSeek, built by a small Chinese hedge fund firm called High-Flyer through its AI subsidiary of the same name. DeepSeek released its AI coder back in November 2023, a powerful V3 version aimed at solving more complex tasks in December 2024, and a lean, mean, energy-efficient reasoning model, R1, on January 20, 2025. DeepSeek has been releasing its models on GitHub and Hugging Face under the MIT open-source license since November 2023, including a small version that can run on a beefy laptop.

What shocked the tech world was the V3 model's reported $5.6M USD training cost, versus the roughly $100M USD typically spent by most AI companies, with training completed over only about two months.

It's noteworthy that High-Flyer has only about 100 employees, and DeepSeek began as a side project to squeeze more computational bandwidth out of the less capable NVIDIA GPUs available to it. Due to the US sanctions on NVIDIA GPU exports, the DeepSeek team was only able to use A100 and H800 chips, which have slower interconnects. Think of this company as an early-stage stock trading firm, similar to the one portrayed in the TV series Billions, but with Bobby Axelrod replaced by a team of math nerds. They figured out that the best way to improve efficiency on a fleet of slower GPUs was not to rely solely on NVIDIA's default CUDA programming, but to drop down to lower-level PTX (Parallel Thread Execution) code. In essence, DeepSeek has demonstrated a groundbreaking approach to achieving strong results with less hardware in a short amount of time, offering the world a new perspective on computational and energy efficiency.

Without getting into deep technical details, other notable techniques and innovations in DeepSeek's models include these elements (a small illustrative sketch of the first one follows this list):

  • MoE (Mixture of Experts)
  • Reinforcement learning
  • Multi-head Latent Attention
  • Multi-Token Prediction
  • DualPipe
  • FP8 Mixed-Precision Training
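
To make the first item on that list a bit more concrete, below is a minimal, hypothetical sketch of top-k expert routing, the core idea behind a Mixture-of-Experts layer: a small gating network selects a few "expert" sub-networks per token, so only a fraction of the model's parameters do work for any given input. This is an illustrative toy in PyTorch under my own simplifying assumptions, not DeepSeek's actual implementation (which adds shared experts, load balancing, and many other refinements).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Toy Mixture-of-Experts layer: each token is routed to its top-k experts."""

    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each "expert" here is just a small feed-forward block.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])
        # The gate scores every expert for every token.
        self.gate = nn.Linear(d_model, n_experts)

    def forward(self, x):                         # x: (n_tokens, d_model)
        scores = self.gate(x)                     # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)      # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e          # tokens whose slot-th choice is expert e
                if mask.any():
                    w = weights[mask, slot].unsqueeze(-1)
                    out[mask] += w * expert(x[mask])
        return out

# Usage: 10 tokens of width 64; only 2 of the 8 experts run for each token.
layer = ToyMoELayer()
print(layer(torch.randn(10, 64)).shape)           # torch.Size([10, 64])
```

The appeal of this design is sparsity: the parameter count grows with the number of experts, but the compute per token stays roughly constant because only the top-k experts are evaluated.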

The distillation learning method, layered on top of all of DeepSeek's improvements, is not a new practice in the AI industry. Since the introduction of the distillation training method in 2015, companies like Microsoft, Meta, DeepMind, and many others have used it extensively to train AI models. In this approach, the student model (in this case, DeepSeek) generates its own answers from its dataset and then consults a teacher model, such as Qwen, Llama, ChatGPT, or another AI model. It compares the responses and fine-tunes its results by adopting the better answer. However, this learning method is not perfect. At times it can produce hallucinations or incorrect outputs, and in some cases it may inadvertently embed responses from the parent AI model.
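
For readers who want a feel for what distillation means in code, here is a minimal sketch of the classic logit-distillation recipe from the 2015 work mentioned above: a student model is trained to match the teacher's softened output distribution, blended with the ordinary loss on ground-truth labels. This is a generic, simplified illustration with made-up tensor shapes, not any particular company's training pipeline.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft loss (imitate the teacher) with a hard loss (match the labels)."""
    # Soft targets: compare softened probability distributions with KL divergence.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)        # standard scaling so gradient magnitudes stay comparable
    # Hard targets: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Usage with dummy data: a batch of 4 examples and a 10-way output.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)               # in practice: frozen teacher outputs
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
print(round(loss.item(), 4))
```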

In the world of engineering and software programming, DeepSeek's approach is as original as it gets. This was evident in the earth-shattering market response on January 27, when a sharp selloff in AI-related stocks forced a reassessment of many assumptions about data center spending. The closed nature of OpenAI's model has also come into question.

Since DeepSeek's open-source release of R1, numerous controversies have emerged in the US regarding the authenticity of DeepSeek's code, the validity of its efficiency claims, and even widespread conspiracy theories circulating across various US media outlets. For the general public, for experienced technologists who are not well versed in AI, and even for AI experts who have not researched this topic, the situation is highly confusing because of the sheer amount of disinformation surrounding DeepSeek. Some of it stems from deliberate attacks driven by self-interest, aimed at justifying excessive spending and sustaining continued investment, such as the efforts by Alexandr Wang at Scale AI and by Microsoft to discredit DeepSeek. Other responses take the form of "technical criticism" that lacks a solid foundation but is presented as credible. Meanwhile, many reactions are outright hate speech, fueled by racial prejudice against Chinese people, advocating increased US sanctions and calling for technological warfare.

The most important point I want to bring up is not the technical advancement DeepSeek has brought to the world, but rather how we should respond when advancement happens in a disruptive manner.

It is true that we live in a competitive world, where we compete for education, jobs, and promotions at the individual level, and nations compete at the industry level in energy, science, engineering, healthcare, manufacturing, and more. But at what point do we start to collaborate with one another? In DeepSeek's case, openly releasing methods that provide a massive reduction in computational and energy costs is one of the most altruistic postures I can think of. It is a Nobel Prize-worthy gesture that benefits the whole of humanity. Yet instead of thanking DeepSeek profusely for its profound contributions, the masses in the US are shouting "burn it." What does this say about America's psyche today?

However insignificant we may be as individuals, what each of us says or does has an effect. Will our actions encourage more AI companies to release their models as open source? Or does the current reaction to DeepSeek's open release discourage further collaboration in the tech community? Do we want AI for the good of all people, or as a controlling tool for the 1%? We teach our children to broaden their horizons and learn from others' perspectives; can we do the same? If you think you have already made up your mind, perhaps ponder it a bit longer.


References:

DeepSeek

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

