Reasoning and Simulations.
Jody Gajic
Building the Future of Learning, Assessment, and Work, with Emerging Technologies and Applied Research | Director @ Pearson Labs
I’m going to be working offsite for a few days next week and then I’ll be wrapping up for Christmas the following week, so chances are this is going to be the final newsletter of the year. Given the articles that have caught my eye over the last week, I’ll take this opportunity to look ahead at something I think we’ll need to be considering in 2025 - reasoning models. There have also been some interesting developments in generative worlds…
Reasoning
You might recall a report of OpenAI’s 5 stages of AI development a few months ago. Though still not officially confirmed as far as I am aware, this is still a useful framework:
For the last 2 years we’ve been operating at level 1, where the scaling laws - throwing more compute and data at training increases model intelligence - have held up well… and Nvidia have reaped the rewards. But scaling alone is seemingly getting trickier; most of us expected GPT-5 this year and it hasn’t happened. What did happen was the preview version of the first reasoning model: OpenAI’s o1.
To expand a little on the table above, level 2 of OpenAI's AGI roadmap introduces "Reasoners": AI systems capable of solving complex problems with the proficiency of human experts. These systems are expected to perform problem-solving tasks at PhD level and could represent a significant leap from mimicking human behaviour to demonstrating genuine intellectual prowess. OpenAI believes it is approaching this level and is looking to trademark its reasoning models. Perhaps in part because Google are chasing them!
In practical terms, assuming reasoning AI lives up to the hype, perhaps it will be useful in assessing higher-order skills that are traditionally hard to measure at scale, like critical thinking or decision-making. It could also be an unlock for personalisation; you could imagine a system that is capable of reasoning being adaptive to the learner’s needs.
I’m probably sounding like a stuck record at this point but it’s still worth repeating; in my view, as consumer behaviour changes, the new UI/UX and meeting the learner where they are (i.e. how content distribution changes) will become increasingly significant as these technologies continue to advance.
But it’s not just Google and OpenAI working on reasoning.
Qwen with Questions
Alibaba have released “Qwen with Questions”, an open-source competitor to OpenAI’s o1-preview, which on most benchmarks appears to outperform it:
Notably, QwQ-32B-Preview shows that open-source models continue to rival the capabilities of closed-source models, and that Chinese AI models continue to rival those of U.S. labs.
DeepSeek-R1
The perhaps less familiar Chinese AI lab DeepSeek has released a new R1 family, which includes its reasoning model DeepSeek-R1-Lite-Preview. The benchmarks are similarly impressive versus o1-preview:
They announced that it would also be open source, but no licensing terms were included.
Like o1-preview, most of its performance gains come from an approach known as test-time compute, which has the LLM think at length in response to a prompt, spending more compute at inference to generate deeper answers. Unlike o1-preview, which hides its reasoning, DeepSeek-R1-Lite-Preview’s reasoning steps are visible at inference.
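To make the idea concrete, here is a toy sketch of one common test-time-compute strategy, best-of-N sampling: sample several reasoning chains and keep the best-scoring one, so spending more inference compute buys a better answer. This is my own illustration of the general approach, not DeepSeek’s or OpenAI’s actual method; the generator and scorer below are trivial stand-ins for an LLM and a verifier.

```python
import random

def generate_chain(prompt, rng):
    # Stand-in for an LLM sampling one chain of thought;
    # here just five random "reasoning steps" and a final answer.
    steps = [rng.random() for _ in range(5)]
    answer = sum(steps) / len(steps)
    return steps, answer

def score(answer, target=0.5):
    # Stand-in verifier: the closer the answer to the target, the better.
    return -abs(answer - target)

def best_of_n(prompt, n, seed=0):
    # Test-time compute: spend more inference on sampling n chains
    # and keep the highest-scoring one, instead of retraining the model.
    rng = random.Random(seed)
    chains = [generate_chain(prompt, rng) for _ in range(n)]
    return max(chains, key=lambda chain: score(chain[1]))

steps, answer = best_of_n("hard question", n=16)
```

With the same seed, the n=1 run is simply the first of the 16 candidates, so sampling more chains can never do worse than sampling one; that monotonicity is what makes "more compute, deeper answers" pay off.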
Go Deeper
There have been several research papers on reasoning over the past week or so. I’ve taken 3 that caught my eye and turned them into an AI-generated podcast:
This AI-podcast was made with NotebookLM and features:
Incidentally, the team that created NotebookLM have announced they are leaving Google to create a startup. It’ll be interesting to see what happens to the app… there are reports the podcast-creating tool might be making its way to the Gemini phone app.
Non-Linear Exploration
RunwayML, known for their AI video generation tools, have published a mind-bending article about their latest prototype, which explores a new way of working with video editing that they call “non-linear exploration”. Here is how they describe it in the article:
Traditionally, creative software has served primarily in the final stages of refinement and production. One reason for this is language: we had to translate our creative intent into tedious sequences of low-level, machine-readable parameters such as pixel coordinates and hex codes. Generative models have changed this. Instead of manipulating these low-level parameters, we can now express intent naturally, across various modalities—"what would this picture look like at evening time?" or "make this video match the style of these images." This shift enables software to move beyond production tools to become instruments of creative exploration.
As I understand it, the process a user would follow is to create an AI-generated image as a starting point, then generate a future frame, again using AI. Think of this as a storyboard. You then connect the two frames and AI fills in the frames in between. But where it gets interesting is creating alternate timelines, branching off to “create a separate thread of experimentation”:
In this prototype, I think we can see the significance of new interfaces that allow users to explore new creative possibilities.
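One way to picture the branching structure is as a tree of key frames: each branch is its own timeline, and the generative model fills in the frames between any connected pair. Here is a minimal sketch of that data structure (my own illustration, not Runway's implementation):

```python
class Frame:
    """One key frame in a non-linear editing tree."""

    def __init__(self, description, parent=None):
        self.description = description
        self.parent = parent
        self.children = []
        if parent is not None:
            # Connecting a new frame to its predecessor is where the
            # generative model would fill in the in-between frames.
            parent.children.append(self)

    def branch(self, description):
        # Start a "separate thread of experimentation" from this frame.
        return Frame(description, parent=self)

    def timeline(self):
        # Walk back to the root to recover this branch's full sequence.
        node, frames = self, []
        while node is not None:
            frames.append(node.description)
            node = node.parent
        return list(reversed(frames))

start = Frame("city street at dawn")
noon = start.branch("same street at noon")
rain = start.branch("same street in the rain")  # alternate timeline
```

Because every frame keeps a pointer to its parent, any number of alternate timelines can hang off the same starting image without disturbing each other.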
Spatial Intelligence
World Labs is a startup co-founded by AI legend Fei-Fei Li, which recently came out of stealth with $230M in funding, to build what they call “large world models”.
I’d not heard of the term “Spatial Intelligence” before; here is how they describe it:
But perhaps more fundamental is spatial intelligence, allowing us to understand and interact with the world around us. Spatial intelligence also helps us create, and bring forth pictures in our mind's eye into the physical world.
For a deeper dive, Fei-Fei also has an interesting TED talk on the subject:
The team at World Labs have subsequently published an article on the work they are doing, with a few interactive demos of what’s possible. They’ve also uploaded a teaser video to YouTube which is well worth a look!
It’s always tricky to directly connect new technologies to traditional learning use cases, as we just don’t know enough about the possibilities yet; that requires experimentation. But, as we talked about earlier with reasoning, perhaps there is an unlock here for higher-order skills like creativity and imagination?
Infinite-Length and Real-Time Video Generation
Maybe we’re on the timeline where the simulation begins? The Matrix is a new AI model developed by Alibaba, the University of Hong Kong, and the University of Waterloo for generating infinite-length, high-quality video simulations with interactivity in real-time. The model uses advanced diffusion techniques and learning from both game and real-world data.
It’s another one that is hard to wrap your head around, but essentially it is an infinite space, generated in real time, with high-precision interaction and high-quality visuals.
The target industry here is clearly gaming, but who knows where it leads. Generating worlds to explore could well have some learning use case potential I’d imagine.
For me, the thing that makes the last three sections so interesting is the speed at which the technology has developed. In a few short years we’ve gone from having our minds blown by avocado armchairs generated by DALL-E to infinite interactive worlds.
If you're celebrating, I hope you have a great Christmas and New Year... and see you in '25.
As always, thanks for reading.
Jody
News I Didn't Use (now with AI summaries)
A Revolution in How Robots Learn (also with Audio)
The article discusses advancements in robotics, particularly how robots are learning to perform complex tasks through imitation learning and reinforcement learning. It highlights the parallels between human development and robotic learning, emphasising the importance of physical interaction and sensory feedback. The piece explores various projects, including Google's ALOHA system and a Ping-Pong-playing robot, showcasing how robots are beginning to exhibit dexterity and adaptability. It also raises ethical concerns about the implications of autonomous robots in society and the potential for job displacement, urging a thoughtful approach to integrating robots into everyday life.
The document discusses the evolution of computing paradigms from mainframes to wearables, highlighting the role of generative AI as a bridge to future technology. It outlines how each computing paradigm has built upon the previous one, emphasising the importance of the application layer in facilitating transitions. The author predicts that generative AI will enable on-demand user interfaces, enhancing the functionality of wearables and leading to new computing paradigms, while also expressing optimism about future developments in AI technology.
Microsoft's Ignite 2024 event introduced significant upgrades to Copilot Studio, focusing on "agentic AI" to enhance productivity through low-code development tools. The new features aim to empower business users to create automated workflows that integrate with Microsoft 365 and Azure AI Foundry, facilitating collaboration between developers and business teams. The Microsoft 365 Agent SDK allows for seamless integration of Copilot applications across various platforms, promoting flexibility in building AI-powered workflows that can manage complex business processes.
The document discusses the profound changes AI is causing in higher education, highlighting shifts in student cognition and behaviour. It warns of potential issues like "digital dependency disorder," where students become anxious without AI tools, and the illusion of mastery through AI comprehension undermining deep learning. The shift towards AI over collaborative learning threatens social skills and critical thinking, while students may begin to doubt human expertise in favour of AI outputs. Educators face the challenge of adapting to these changes without compromising essential learning aspects, as the decisions made today will shape future generations' cognitive development and societal intellectual capacity.
The UK has enacted the Digital Markets, Competition and Consumers Act (DMCC), granting regulators extensive powers to oversee American tech companies, including the ability to halt acquisitions and impose tailored regulations. This ex-ante framework assumes companies are in potential breach of competition laws, shifting the regulatory landscape significantly. The DMCC targets major firms based on revenue thresholds, allowing for aggressive legal interventions and creating a complex compliance environment that could deter American companies from engaging in the UK market. The law reflects broader global trends of increasing government control over private enterprise, particularly in the tech sector.
ElevenLabs has launched a new feature called GenFM, allowing users to create multispeaker podcasts by uploading various content types, similar to Google's NotebookLM. The feature supports 32 languages and automatically selects voices to generate podcasts, incorporating natural dialogue fillers for a more human-like conversation. Future plans include enhanced customisation and the ability to add multiple sources for generative AI podcasts. Additionally, ElevenLabs is expanding its operations in Poland and India to enhance its AI capabilities.
Large language models (LLMs) have demonstrated superior predictive abilities compared to human experts in neuroscience, as shown by the BrainBench benchmark. This benchmark evaluates the ability to predict study outcomes based on altered abstracts from neuroscience articles. LLMs achieved an average accuracy of 81.4%, significantly outperforming human experts who averaged 63.4%. The study suggests that LLMs can effectively integrate information across abstracts, enhancing their predictive capabilities, and highlights the potential for LLMs to assist in scientific discovery by keeping up with rapidly expanding literature.
The paper introduces WEB-DREAMER, a new framework that uses large language models (LLMs) to enhance web agents' planning abilities by simulating potential outcomes of actions. This approach allows agents to evaluate different actions before executing them, improving their effectiveness in navigating complex web environments. The authors highlight opportunities for future research in optimising LLMs for world modelling and developing better planning algorithms.
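The core loop described above - simulate each candidate action with an LLM world model, score the imagined outcome, then execute the most promising action - can be sketched as follows. This is a simplified illustration of the paper's idea, not its implementation; the simulator and evaluator are trivial stand-ins for what would be LLM calls.

```python
def simulate(state, action):
    # Stand-in for an LLM world model: predict the next page state
    # that taking this action would produce.
    return state + [action]

def evaluate(state, goal):
    # Stand-in for an LLM judge: score how many goal steps the
    # simulated state has covered so far.
    return sum(1 for step in goal if step in state)

def plan(state, actions, goal):
    # "Dream" through each candidate action and pick the one whose
    # simulated outcome scores best, before acting for real.
    return max(actions, key=lambda action: evaluate(simulate(state, action), goal))

goal = ["search", "open result", "checkout"]
next_action = plan(["search"], ["go back", "open result", "scroll"], goal)
```

The point of simulating before acting is that mistakes are cheap in the imagined rollout; the agent only commits to the action whose predicted outcome best advances the goal.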