TTT: A New Breakthrough Learning Technique in Generative AI

Since the release of the Transformer architecture in the groundbreaking paper “Attention Is All You Need” (June 2017), the race to build a capable Large Language Model (LLM) has been on. That race took the world by storm in November 2022, when OpenAI released ChatGPT and showcased the true potential of generative AI.

Since then, numerous training and fine-tuning techniques have been introduced to push the boundaries of LLM capabilities, bringing us closer to the elusive goal of Artificial General Intelligence (AGI). Notable techniques include LoRA (Low-Rank Adaptation), Chain-of-Thought prompting, Retrieval-Augmented Generation (RAG), and RLHF (Reinforcement Learning from Human Feedback), each contributing unique advancements.

Last week, on November 11, researchers at MIT took a major step forward by releasing a breakthrough paper titled “The Surprising Effectiveness of Test-Time Training for Abstract Reasoning.” The paper introduces Test-Time Training (TTT), a novel approach that temporarily updates model parameters during inference, enabling LLMs to achieve 61.9% accuracy on the ARC (Abstraction and Reasoning Corpus) benchmark behind the ARC Prize. To put this into perspective, the previous top score was 42%, while the average human score is 60.2% and the best human score reaches 97.8%. In other words, it surpasses the average human score!


In this post, I will summarize what this paper is about and explain why I believe this technique could be the next big thing in AI, especially given its ability to push models beyond existing limits in complex reasoning tasks.

The Challenge

LLMs have made remarkable progress in recent years, yet they still face a fundamental challenge: generalization. While these models excel at solving problems closely related to their training data, they often struggle with novel tasks requiring abstract reasoning. A notable example is the ARC (Abstraction and Reasoning Corpus) Prize, a benchmark designed to evaluate generalization capabilities in artificial intelligence. Despite their sophistication, previous state-of-the-art models could only achieve a score of 42%, far behind the best human score of 97.8% and even trailing the average human score of 60.2%. This gap underscores the limitations of current architectures and techniques in handling truly novel and complex reasoning tasks.

The Idea

Test-Time Training (TTT) is an innovative approach that enables models to dynamically adapt their parameters during inference by leveraging the test data itself. This process allows the model to update its predictions based on the specific problem it encounters, creating a more flexible and adaptive system compared to traditional static models.

The core concept is straightforward: when presented with a new problem, the model generates training data on the fly by applying transformations and augmentations to the test input. These variations—such as geometric transformations or masking—allow the model to fine-tune itself temporarily, optimizing for the specific task at hand. Using lightweight techniques like LoRA (Low-Rank Adaptation), TTT performs efficient parameter updates, minimizing a loss function for the given instance. Importantly, these updates are transient; once the task is completed, the model reverts to its original parameters, maintaining efficiency for subsequent tasks.

This dynamic process bridges the gap between training and inference, allowing the model to improve predictions in real-time. By creating tailored training data and fine-tuning itself on the fly, TTT empowers models to tackle novel and complex reasoning tasks with unprecedented precision.
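To make the loop concrete, here is a minimal sketch of what per-task adaptation could look like in practice, assuming a Hugging Face causal LM and the peft library for LoRA. The helpers augment_task and format_example are hypothetical stand-ins for the paper’s geometric/masking augmentations and prompt formatting; treat this as an illustration of the idea, not the authors’ actual implementation.

```python
# Minimal TTT sketch: per-task LoRA fine-tuning at inference time.
# Assumptions: a Hugging Face causal LM, the `peft` library, and
# hypothetical helpers `augment_task` / `format_example`.
import copy
import torch
from peft import LoraConfig, get_peft_model

def test_time_train_and_predict(base_model, tokenizer, task, steps=20, lr=1e-4):
    # 1) Build a tiny, task-specific training set from the test input itself
    #    (e.g. rotations, reflections, colour permutations of the ARC grids).
    synthetic_examples = augment_task(task)  # hypothetical helper

    # 2) Attach lightweight LoRA adapters so only a small number of
    #    parameters are updated; the base weights stay frozen.
    lora_cfg = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
    model = get_peft_model(copy.deepcopy(base_model), lora_cfg)
    model.train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    # 3) A few gradient steps on the augmented examples (standard LM loss).
    for _ in range(steps):
        for example in synthetic_examples:
            batch = tokenizer(format_example(example), return_tensors="pt")  # hypothetical helper
            loss = model(**batch, labels=batch["input_ids"]).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

    # 4) Predict with the temporarily adapted model.
    model.eval()
    with torch.no_grad():
        prompt = tokenizer(format_example(task), return_tensors="pt")
        output = model.generate(**prompt, max_new_tokens=256)

    # 5) Discard the adapter: the caller keeps using `base_model`,
    #    so the update is transient and the next task starts fresh.
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

The key point is step 5: only a small LoRA adapter is trained and then thrown away, so each task gets its own temporary specialist while the shared base model never changes.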

Findings and Conclusion

Using an 8-billion-parameter model from the Llama-3 family, the implementation of Test-Time Training (TTT) achieved remarkable results. Applied to the ARC Prize benchmark, TTT delivered a significant breakthrough, reaching 61.9% accuracy, well above the previous best score of 42% and surpassing the average human score of 60.2%. This achievement demonstrates the power of TTT in enabling models to dynamically adapt during inference, improving generalization and reasoning capabilities without requiring larger models or extensive pretraining data.

TTT’s ability to leverage task-specific insights during inference marks a critical shift in AI development. By pushing the limits of what LLMs can achieve, this approach offers a promising pathway toward bridging the gap between current AI systems and the vision of AGI. As research progresses, TTT could serve as a foundational technique for creating more adaptive, efficient, and intelligent systems.


Really exciting stuff!! What do you think? I'd love to hear your feedback!

Thanks to Matthew Berman, as I found out about TTT through his insightful channel, which I highly recommend following.

