Are bigger AI models better?

In AI, “the 2010s were the age of scaling, now we're back in the age of wonder and discovery once again.” This important insight comes from[1] the deep-learning pioneer Ilya Sutskever. So, when the co-developer of AlexNet (the first widely successful deep-learning AI model), the co-founder of OpenAI, and most recently the co-founder of the AI lab Safe Superintelligence (SSI) calls time on the recent trend of simply scaling up LLMs to make AI better and better, we all need to pay attention.

In the so-called Chinchilla[2] paper from 2022, researchers showed that for optimal training of large language models the model size and the number of training tokens need to be scaled equally: for every doubling in the size of the model, the number of training tokens also needs to double. Combined with an exponential increase in compute, this scaling approach has produced stunning results. But recently AI developers have started to see a slowdown from simply scaling current methods. It has been reported that the improvements in the next version of Google's Gemini are falling short, Anthropic has delayed its next-generation Claude model, and over at OpenAI we hear[3] that their next-generation Orion model is seeing far smaller gains from model scaling than were previously seen between GPT-3 and GPT-4.
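
To make this relationship concrete, here is a minimal sketch of a compute-optimal sizing calculation. It assumes the commonly quoted approximations drawn from the Chinchilla results (training compute C ≈ 6 × N × D FLOPs, and roughly 20 training tokens per parameter); the paper fits these constants empirically, so treat the numbers as illustrative only.

```python
# Chinchilla-style compute-optimal sizing (illustrative sketch only).
# Assumes the commonly quoted approximations: training compute C ~= 6 * N * D
# FLOPs, and a compute-optimal ratio of roughly 20 training tokens per
# parameter. The paper fits these constants empirically.
import math

TOKENS_PER_PARAM = 20          # approximate compute-optimal tokens-per-parameter ratio
FLOPS_PER_PARAM_TOKEN = 6      # C ~= 6 * N * D

def compute_optimal(compute_flops: float) -> tuple[float, float]:
    """Return (parameters, training_tokens) for a given compute budget."""
    # C = 6 * N * D with D = 20 * N  =>  N = sqrt(C / 120)
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens

for budget in (1e21, 1e23, 1e25):   # compute budgets in FLOPs
    n, d = compute_optimal(budget)
    print(f"{budget:.0e} FLOPs -> ~{n / 1e9:.1f}B params, ~{d / 1e12:.2f}T tokens")
```

Under these assumptions a compute-optimal model only gets twice as big when its training data doubles too, and every step along that curve costs far more compute, which is exactly the treadmill that now appears to be slowing down.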

What we need to remember is that language is just an encoding scheme that humans use to share information and knowledge. As an example, engineers and lawyers use special encodings (technical language) to describe complex concepts, which improves the efficiency and accuracy of their information exchange. But now we have large language models that allow computers to ‘decode’ this human encoding scheme and to learn from our human information. This includes the roughly 500 trillion tokens (or word sequences) that are available on the indexed World Wide Web. However, trying to embed all of this information into a single model is extremely expensive and appears already to be reaching its limits.
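
As a toy illustration of this ‘encoding’ idea, the sketch below breaks a sentence into tokens by greedily matching pieces from a tiny hand-picked vocabulary. Real LLM tokenizers (byte-pair encoding and its relatives) learn vocabularies of tens of thousands of sub-word pieces from data; nothing here reflects any particular model's tokenizer.

```python
# Toy greedy tokenizer: repeatedly take the longest vocabulary piece that
# matches at the current position. The vocabulary is hand-picked purely for
# illustration; real tokenizers learn theirs from large text corpora.
VOCAB = ["engineer", "law", "yer", "ing", "s", " ", "use", "special", "encod"]

def tokenize(text: str) -> list[str]:
    tokens, i = [], 0
    while i < len(text):
        # Greedily pick the longest matching piece, falling back to one character.
        match = max((v for v in VOCAB if text.startswith(v, i)),
                    key=len, default=text[i])
        tokens.append(match)
        i += len(match)
    return tokens

print(tokenize("engineers use special encodings"))
# ['engineer', 's', ' ', 'use', ' ', 'special', ' ', 'encod', 'ing', 's']
```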

But humans use other techniques too. We follow a ‘train of thought’ as we ‘reason’ over the information that we have available, and we are starting to see this type of approach being used in AI as well. Noam Brown, a researcher at OpenAI, said[4] of their recent ‘o1’ release that "20 seconds of thinking time" during inference achieves an improvement that would otherwise have needed a "100,000x increase in model scale." Another stunning example is the breakthrough achieved by the Chinese research company DeepSeek-AI, which has used a reasoning approach that leverages reinforcement learning, combined with a highly diverse mixture-of-experts model, to achieve results that go beyond what has previously been possible with a single large model, and to do so in a far more efficient way. Whereas most AI researchers have just been pushing hard on the scaling lever, this Chinese team, perhaps held back by recent export restrictions that limit access to GPUs, is showing that necessity can become the mother of invention.
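
OpenAI has not published how o1 actually spends its ‘thinking time’, so the sketch below only illustrates the general idea of trading inference compute for answer quality: sample several independent chains of thought and keep the answer they agree on most (often called self-consistency). The generate_chain_of_thought function is a hypothetical stand-in for whatever model call you have available.

```python
# Illustrative "spend more compute at inference time" loop: sample several
# reasoning chains and return the most common final answer (self-consistency).
# This is NOT how o1 works internally (that is unpublished); it is a generic
# sketch of the idea. `generate_chain_of_thought` is a hypothetical stand-in.
from collections import Counter

def generate_chain_of_thought(question: str, seed: int) -> tuple[str, str]:
    """Hypothetical model call: returns (reasoning_text, final_answer)."""
    raise NotImplementedError("plug in your own LLM call here")

def answer_with_more_thinking(question: str, n_samples: int = 16) -> str:
    """More samples means more inference compute and (often) better answers."""
    answers = []
    for seed in range(n_samples):
        _reasoning, final_answer = generate_chain_of_thought(question, seed)
        answers.append(final_answer)
    # Majority vote across the independent reasoning chains.
    return Counter(answers).most_common(1)[0][0]
```

The point is that n_samples is a dial you can turn at inference time: more ‘thinking’ costs more compute per question, but it needs no larger model.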

I find it very exciting (but also not a surprise) that we are about to see a new wave of innovation in artificial intelligence. When asked about AI development, I always point out that the AI you are using today is the worst AI you will ever use. We are just at the start of an incredible journey. And here are just a few simple ideas to consider:

• Today all AI models are single-task, whereas in all other software we take advantage of multitasking, using memory management units to protect data and manage data locality. Rather than following a single train of thought, most of us would consider multiple ideas and then backtrack to pick up different threads, apply a different part of our expertise, and then combine these threads together to find a solution. We will see these concepts emerge in next-generation AI models (a minimal sketch of this kind of backtracking search follows after this list).

• Uncertainty Quantification (UQ) is another little-used concept that has been very successful in applying AI to improve complex real-world simulation problems. UQ helps to direct the simulation to look at the areas where the uncertainty is highest and to understand how this uncertainty will change based on certain events. This approach has been especially successful in the most complex time-based simulations, such as trying to model how the plasma in a nuclear fusion tokamak reactor will behave over time and how you can control this complex reaction. Innovative young AI companies like digiLab, a world leader in UQ, are using this same approach to improve the efficiency, accuracy, and trustworthiness of new agentic AI systems (see the uncertainty-guided sampling sketch below).
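
As referenced in the first bullet, here is a minimal sketch of the ‘multiple trains of thought with backtracking’ idea: a best-first search over partial reasoning paths, where weak branches sink and more promising threads are picked up again. The propose_next_steps, score, and is_solution functions are hypothetical stand-ins for model calls; today's production LLMs mostly decode a single stream rather than searching like this.

```python
# Sketch of exploring several trains of thought with backtracking: best-first
# search over partial reasoning paths. The callback functions are hypothetical
# stand-ins for model calls; this illustrates the concept, not any product.
import heapq

def search_thoughts(question,
                    propose_next_steps,   # (question, path) -> list of candidate next steps
                    score,                # (question, path) -> float, higher is more promising
                    is_solution,          # (question, path) -> bool
                    max_expansions=100):
    # Priority queue of (negative score, path); higher-scoring paths pop first.
    frontier = [(-score(question, []), [])]
    for _ in range(max_expansions):
        if not frontier:
            break
        _neg_score, path = heapq.heappop(frontier)       # most promising partial thought
        if is_solution(question, path):
            return path
        for step in propose_next_steps(question, path):  # branch into several ideas
            new_path = path + [step]
            heapq.heappush(frontier, (-score(question, new_path), new_path))
        # Weak branches simply sink in the queue; popping an older, higher-scoring
        # path later is the "backtracking" described above.
    return None
```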
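
And for the second bullet, the UQ idea can be shown with a tiny active-learning loop: fit a surrogate model to the simulation runs completed so far, then spend the next expensive run wherever the surrogate's predictive uncertainty is highest. This uses scikit-learn's Gaussian process regressor purely as an illustrative surrogate, with a cheap analytic function standing in for a real solver; it does not describe digiLab's actual tooling.

```python
# Tiny uncertainty-driven sampling loop: run the expensive simulation where a
# surrogate model is least certain. The Gaussian process surrogate and the
# placeholder "simulation" below are illustrative assumptions only.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def expensive_simulation(x: np.ndarray) -> np.ndarray:
    return np.sin(3 * x) + 0.1 * x**2            # placeholder for a real solver

candidates = np.linspace(0.0, 5.0, 200).reshape(-1, 1)   # possible input settings
X = np.array([[0.5], [2.5], [4.5]])                       # initial simulation runs
y = expensive_simulation(X).ravel()

for step in range(10):
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    _mean, std = gp.predict(candidates, return_std=True)
    x_next = candidates[np.argmax(std)]           # where predictive uncertainty is highest
    X = np.vstack([X, x_next.reshape(1, -1)])
    y = np.append(y, expensive_simulation(x_next)[0])
    print(f"step {step}: sampled x={x_next[0]:.2f}, max std={std.max():.3f}")
```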

Yann LeCun believes[5] (as I do) that AI development should be open-sourced so that we can drive progress forward more rapidly. Open-sourcing and sharing ideas allows researchers to quickly build on the ideas of others, and also provides a much higher level of transparency that will allow us to make AI safer. As LeCun points out, humans “have the capacity to understand the world, understand the physical world, the ability to remember and retrieve things, persistent memory, the ability to reason, and the ability to plan.” As an example, when I say the word ‘apple’ you instantly know its rough size, its shape, its weight, that it grows on trees, the sound it makes when you bite into it, and the taste that will hit your tongue – oh, and it is also linked to Isaac Newton and the concept of gravity, and it might even be referring to a computer or smartphone! You have so much real-world context about this simple word, which adds much more information and fires off different neurons and synapses in your brain when you are confronted by this simplest of human encodings. By contrast, an LLM just knows it is a word of five letters, one of which is repeated, and has some understanding of when it might be useful to add this word into a sentence as part of its very advanced text prediction.

I do not believe that LLMs on their own will ever deliver Artificial General Intelligence (AGI). AGI is a false path - instead we need to focus on building Artificial Expert Intelligence[6]. The good news is “we're back in the age of wonder and discovery.” Welcome to the start of AI deep-learning 2.0, a world of new discoveries and the next generation of breakthrough AI companies.


[1] https://www.reuters.com/technology/artificial-intelligence/openai-rivals-seek-new-path-smarter-ai-current-methods-hit-limitations-2024-11-11/

[2] https://arxiv.org/abs/2203.15556

[3] https://www.deeplearning.ai/the-batch/issue-276/

[4] https://venturebeat.com/ai/openai-noam-brown-stuns-ted-ai-conference-20-seconds-of-thinking-worth-100000x-more-data/

[5] https://lexfridman.com/yann-lecun-3-transcript/

[6] https://www.dhirubhai.net/posts/nigeltoon_artificialexpertintelligence-fii8-artificialexpertintelligence-activity-7257835748307922944-CXQw

Tim Atherton

Head of Research

2 months

Nice article! The steps forward on the algorithmic side run in parallel, but asynchronously, with advances on the hardware side. The software binds them. Together they are all making solid and steady progress. Kurzweil was right.

Peter Dixon

Managing Director at Flagchess LTD

3 months

Hi Nigel, I have followed Graphcore for years and would be delighted to have an interest/shares pre-IPO. When will SoftBank go to the market? $100,000 worth minimum.

Daniel Wilkinson

Technology Leader, Inventor and Architect : Semiconductors and AI

3 months

Varying the computation per output token is an obvious next step in improving LLM capabilities but it’ll make batched inference next to impossible in the form folks have been assuming. Should have interesting implications for memory hierarchy, TCO/cost per token and hopefully drive interest in efficiency versus scaling. Finally - we can always hope!

Ranjit Gopi

Multi-physics Virtual Twins | Product Development | AI

3 months

Insightful

Gandhi Karuna

Semiconductor GTM | Innovation | Strategy

3 months

Intelligence is not just connecting words by probability gradient... rather, it is connecting dots across different trains of thought that arise from learning and/or reasoning. Hence it becomes obvious that AGI cannot be attained with current LLMs, and as rightly said by Nigel Toon, a lot more needs to be done than just scaling.
