Hide and DeepSeek
Image generated by Amazon Nova

How quickly the world changes again! On my first week back from a six-week summer sabbatical where my mind was mostly occupied with whether it was a beach or a pool day, I return to work in the midst of the DeepSeek frenzy.

DeepSeek, or more precisely its R1 model, is of course the latest large language model to be unleashed onto the world, this one from a Chinese firm, and it appears to offer remarkable reasoning capabilities with results comparable to OpenAI's o1 models. Most of the public commentary over the past few weeks has swirled around the reported tiny $6M (USD) cost to train it, the tech sell-off on Wall Street, this being a "Sputnik moment" for the US, and where the hell the Chinese got all those GPUs anyway when there is supposed to be an embargo. In this hot take, I'm going to cover none of that. Instead I'll offer some of my initial experience and what I learned playing with it, but more importantly my perspective on why this is yet another important moment in the development of this disruptive branch of Artificial Intelligence, and how it will most likely thrust us forward even faster.

Firstly, I'm going to come right out and say that the emergence of DeepSeek R1 is genuinely a great thing for everyone. Putting aside any nationalist or political interests, whether the cost to train it was in fact as reported, and the AI boogie man, the release of this very capable reasoning model introduces several advances in large language models that everyone should understand. I'm going to attempt to break them down in some very specific areas and hopefully not get overly techy.

How open is open?

Open source software has, by and large, been good for users and companies. Its principles, grounded in being not only freely available but also freely distributable and modifiable, have significantly accelerated the development of features, integrations and overall capabilities of the tools our society uses today. Think about all the things we wouldn't have today if Linus Torvalds had never open sourced the Linux operating system.

We are at the earliest phases of Generative AI technologies, and with LLMs showing so much promise we must rapidly evolve them in order to reach broadscale adoption. That means getting them into the hands of as many clever developers, scientists, engineers and builders as possible, so open source seems like a no-brainer. Unfortunately, open source has been having a bit of a "moment", with several high-profile companies walking away from standard licenses, defining their own versions and muddying the waters. And in the world of AI, open source is much more nuanced. OpenAI, who as the name implies started out with noble intentions of being "open" with their technology, actually turned out to be quite closed about almost everything. For this development to occur, and for trust to be gained, transparency is needed at every layer, which means not just the source code of how it works, but the training data used and the weights applied. The recent announcement of the Open Weight Definition (OWD) is a sign that we are heading in this direction, but it ultimately comes down to the companies developing models adopting these standards.

Now with DeepSeek R1, they have taken a decisive step ahead of other model providers by open sourcing the weights used for fine-tuning along with the training code, although the underlying training data is still unavailable and largely unknown. While still not fully transparent, these assets together with R1's whitepaper will give others the ability to inspect, learn, evolve and even adapt its efficiencies to other models. But let me be clear: it's not "open source". It is, however, the most open we've seen yet, and it will no doubt prompt other model creators to rethink their level of transparency.

A new approach to training

Most LLMs available today from companies like OpenAI, Anthropic, Mistral, Cohere and others use a supervised training approach with instruction-based fine-tuning. This means that to get better results you need to increase the number of parameters (now in the tens of billions), which leads to more complex number crunching, which ultimately translates to needing a lot more GPU horsepower to throw at it. This is what is making them so expensive to train.

DeepSeek's R1 is different: it uses a reinforcement learning training approach, which leverages an efficient rule-based reward system rather than a complex neural reward system. All of this means it requires far fewer accelerators, or can be trained with less capable GPUs. The architecture they use is called a Mixture of Experts (MoE), and with it they are able to exploit a phenomenon known as "sparsity". I won't go into all the details here (but read here if you are interested), but it can effectively "switch off" large parts of the model's neural network weights, making it more efficient to train while not materially affecting the model's output. They also pioneered a clever way around the use of tokens called Multi-Head Latent Attention (again, read here for more details) for long-context inference. And they've found some neat ways to do model distillation, or compression, down to as small as 1.5B parameters, making knowledge transfer more efficient.
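To make the "sparsity" idea concrete, here is a toy sketch of Mixture-of-Experts routing: a small gating network scores every expert, but only the top-k experts actually run for a given input, so most of the network's weights sit idle. This is purely illustrative; DeepSeek R1's real architecture is far more elaborate, and all the names and sizes below are invented for the example.

```python
# Toy Mixture-of-Experts forward pass: score all experts, run only the top-k.
import numpy as np

def moe_forward(x, expert_weights, gate_weights, k=2):
    """Route input x through the top-k of the available experts."""
    scores = x @ gate_weights                 # one gating score per expert
    top_k = np.argsort(scores)[-k:]           # indices of the k best experts
    # Softmax over only the selected experts' scores.
    sel = np.exp(scores[top_k] - scores[top_k].max())
    sel /= sel.sum()
    # Only k expert matmuls execute; the rest are "switched off" (sparsity).
    out = sum(w * (x @ expert_weights[i]) for w, i in zip(sel, top_k))
    return out, top_k

rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
gate = rng.standard_normal((d, n_experts))
y, used = moe_forward(rng.standard_normal(d), experts, gate, k=2)
print(f"ran {len(used)} of {len(experts)} experts")  # ran 2 of 4 experts
```

The compute saving scales with the ratio k/n: here only half the expert weights are touched per input, and in production MoE models that ratio is far smaller.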

Now I know that was all a bit heavy to pack in, and whether DeepSeek spent only $6M or more will remain debated; however, my experience with it was mostly impressive. Not only were the answers well structured and contextual, but its reasoning and chain of thought were very logical. What is more important is that we now have new techniques, and potentially new lines of research to explore, for the most expensive part of building frontier models: the training. And if more models can be trained this way, we can dramatically reduce the cost of developing a wide range of models, which would encourage us to apply them in significantly more use cases.

Are we back to Jevons Paradox?

Such a dramatic enhancement to efficiency has seemingly broken the cardinal rule that training frontier models is uber expensive, requiring hundreds of millions of dollars of investment. It has also suggested that the "great data center build out", predominantly led by the large cloud hyperscalers, will not be needed after all, or at least not to the same extent. Industry pundits and investors have at least signalled as much: they sent Nvidia stock tumbling on Wall Street immediately after the news of DeepSeek, and even continued to dog the recent earnings reports of Amazon, Google and Microsoft as they stridently defended their infrastructure plans to the tune of tens of billions of dollars.

But these fears and concerns are largely unfounded, as we know from many generations of technological advancement, like the cloud. When such efficiencies are unlocked and the cost barrier is reduced, people find more ways to use the technology, not fewer. Several people online have referenced the Jevons Paradox, where better unit costs fuel greater and faster consumption, as a way to foresee what is most likely going to play out.

DeepSeek's economics for running prompts are tantalisingly attractive: they charge just $0.55 per million input tokens and $2.19 per million output tokens, which is far more affordable when compared to OpenAI's API pricing of $15 and $60 for the same. The fact is AI economics matter not only to consumers, but to companies committing to their AI projects and laying down their budgets. In 2025 many companies who have experimented with and validated Generative AI will be looking to transition it into production, so this is going to be a very critical factor.
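To see what that price gap means for a budget, here is a back-of-envelope comparison using the per-million-token prices quoted above. The workload figures (100M input and 20M output tokens per month) are a hypothetical example, not from any real deployment.

```python
# Monthly API cost for a workload measured in millions of tokens.
def monthly_cost(in_tokens_m, out_tokens_m, in_price, out_price):
    """Cost in USD given per-million-token input and output prices."""
    return in_tokens_m * in_price + out_tokens_m * out_price

# Hypothetical workload: 100M input tokens, 20M output tokens per month.
deepseek = monthly_cost(100, 20, 0.55, 2.19)    # $0.55 in / $2.19 out
openai   = monthly_cost(100, 20, 15.00, 60.00)  # $15 in / $60 out
print(f"DeepSeek: ${deepseek:,.2f}  OpenAI: ${openai:,.2f}")
# DeepSeek: $98.80  OpenAI: $2,700.00
```

At those quoted rates the same workload differs by well over an order of magnitude, which is exactly the kind of unit economics that decides whether a pilot ever reaches production.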

So can we trust it?

When it comes to data privacy, security and responsibility in Generative AI, there is no alternative to maintaining a very high bar. The nature of how we interact with LLMs through prompts means they have to be very good at blocking harmful behaviours, not contributing to illegal activities or cybercrimes, and avoiding misinformation, just to name a few. Increasingly we are seeing models incorporate reasoning transparency as part of their output to help users. The consequences of wavering on any of these areas are simply too high to calculate when you think of use cases where they are being used to give people credible advice.

Unfortunately, this is an area where DeepSeek is already failing. We have seen a dreadful failure of the HarmBench tests with a 100% attack success rate, and cloud security provider Wiz.io's research exposed a publicly accessible database with sensitive information through which they were able to gain full control of operations. There is also well-founded evidence that information is being sent back to Chinese servers. And concerns about its Chinese origins have already led several countries around the world to call for restrictions on its use, or to block it outright, at least within government.

All of this is quite worrying, and DeepSeek really needs to get a handle on it. However, this is where my guidance for experimenting with any model comes in. Whether you are playing around, experimenting, piloting or putting into production, you should always take measures to protect yourself. Since we can't audit the training data, you should rigorously test model outputs for bias, reliability and security vulnerabilities yourself. Implementing guardrails from model hosters to filter, monitor and validate responses before they reach users is also a must.
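A minimal sketch of what such a guardrail looks like: before a model's answer reaches a user, check it against a topic blocklist and basic PII patterns, then pass, redact or block it. Real deployments would use a managed guardrail service from the model hoster; the categories and patterns below are invented placeholders for illustration.

```python
# Toy response guardrail: block disallowed topics, redact obvious PII.
import re

BLOCKED_TOPICS = ("make a weapon", "bypass authentication")
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "card":  re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def apply_guardrail(response: str) -> tuple[str, str]:
    """Return (verdict, text): 'blocked', 'redacted', or 'pass'."""
    lowered = response.lower()
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        return "blocked", "Sorry, I can't help with that."
    redacted = response
    for label, pattern in PII_PATTERNS.items():
        redacted = pattern.sub(f"[{label} removed]", redacted)
    verdict = "redacted" if redacted != response else "pass"
    return verdict, redacted

verdict, text = apply_guardrail("Contact me at alice@example.com for details.")
print(verdict)  # redacted
```

The same hook is where you would also log responses for the bias and reliability testing mentioned above, so every output is both filtered and auditable.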

In summary

Despite several shortcomings in this release of the DeepSeek R1 model, it deserves its place in the ever-growing line-up of frontier models. It also shows that those working in this brave new world of AI should seek to be open and transparent in the true pursuit of democratising access for everyone on the planet, and avoid closing off access on the basis of demographics, means or sovereignty. As I stated at the beginning, I remain overwhelmingly optimistic about how this will push development further and faster into the future. My team and I will surely be experimenting with and evaluating it in our customer prototyping engagements this year, where I look forward to putting it to the test against real customer use cases. I'm also keen to hear other people's perspectives as comments on this blog.

But what I have learned over my many decades in tech is that these are moments to embrace with vigour and curiosity, but not without well founded knowledge and caution.

So go forward bravely and experiment; you can get your hands on it here on AWS.
