A short reflection on DeepSeek
The morning the articles about DeepSeek R1 were released, I read an article detailing the story of The Köln Concert. If you're not familiar, the Köln Concert is considered one of the greatest jazz concerts/jazz albums in history. It was performed by Keith Jarrett, one of the greatest jazz pianists. However, that day, the requested Bösendorfer 290 Imperial (a piano valued at over $500,000) could not be found; instead, they had a baby grand piano typically used for teaching lessons, and it was in rather poor condition. To make matters worse, the concert began late at 11 PM, and Jarrett was tired, hungry, and in pain. It seemed destined to be a disaster, yet that concert became the greatest jazz concert in history. Jarrett played skillfully to mask the limitations of the inadequate instrument, using its narrow range to create a symphony that lures and hypnotizes.
Later that day, I learned about DeepSeek R1. The situation is similar: constrained by not having the best GPUs available, they made the most of what they had and created something exciting. Once again, talent prevailed over expensive tools. I don't mean to suggest that the data scientists at OpenAI lack talent; there's nothing wrong with using the best tools available. But human talent has once again outshined the mere availability of the most expensive hardware.

"But, but, but," I can hear you saying. "They stole data from OpenAI and distilled o1." Maybe, and OpenAI didn't discover it until much later. Yet the distilled model performs better than the original? "They had access to more expensive GPUs than they declared." Perhaps; we'll likely find out as people attempt to reproduce the results.

To clarify a few points: do I believe that R1 cost only $6 million to train? No. The $6 million figure refers to V3, the base model for R1; the paper makes no claims about the cost of R1 itself, and I wouldn't be surprised if there were a little fudge factor in the numbers. This isn't to downplay the fact that they trained a model for a fraction of what the major players spent. In the V3 paper, they explain that they employed several well-known open-source techniques. I don't wish to minimize their work, but this is a classic case of "standing on the shoulders of giants."
R1 is more intriguing. The secret ingredient of modern LLMs has been RLHF, Reinforcement Learning from Human Feedback. In both machine and human learning, two different mechanisms are at play: imitation learning, such as reading books and completing quizzes, and reinforcement learning, such as writing a complete essay that is then graded. RLHF takes the second approach but relies on human judges to evaluate the LLM's work. Until now, no one had managed to eliminate the human feedback and run pure RL on LLMs (disclosure: I'm an advisor for NeoDynamics, a startup aiming to simplify RL). That is DeepSeek's contribution with the R1 model, and I'm excited to see what this evolution will unlock. As a sneak peek of where pure RL can lead: AlphaGo, the software that defeated Go master Lee Sedol, used reinforcement learning to refine its Go skills.
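To make "RL without a human judge" concrete, here is a minimal sketch of a rule-based reward function, in the spirit of the verifiable accuracy and format rewards the R1 paper describes. The tag format, the scoring values, and the `reward` function itself are my own illustrative assumptions, not DeepSeek's actual code:

```python
import re

def reward(completion: str, reference_answer: str) -> float:
    """Score a model completion with purely automatic rules -- no human judge."""
    score = 0.0
    # Format reward: reasoning wrapped in <think> tags, answer in <answer> tags.
    if re.search(r"<think>.*?</think>\s*<answer>.*?</answer>", completion, re.DOTALL):
        score += 0.5
    # Accuracy reward: the final answer must match a verifiable reference.
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match and match.group(1).strip() == reference_answer.strip():
        score += 1.0
    return score

# A correct, well-formatted completion earns the full reward:
print(reward("<think>2 + 2 = 4</think><answer>4</answer>", "4"))  # 1.5
# A correct answer with no reasoning trace earns only the accuracy part:
print(reward("<answer>4</answer>", "4"))  # 1.0
```

Scores like these can feed straight into a reinforcement-learning loop: the grading is programmatic and scalable, which is precisely what removes the human bottleneck from RLHF.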
If there's one lesson to draw from the entire DeepSeek saga, it's that LLMs are still a young technology, one where open research proves more effective than closed development and human talent remains the primary ingredient of innovation.