Scalable thermodynamic second-order optimization
Normal Computing
We build AI systems that natively reason about the real world #semis #mfg #energy.
New preprint from Normal Computing! Our team (Kaelan Donatella, Sam Duffield, Denis Melanson, Maxwell Aifer, Phoebe Klett, Rajath Salegame, Zachary Belateche, Gavin Crooks, Antonio Martinez, and Patrick Coles) has posted "Scalable thermodynamic second-order optimization", introducing a novel approach to accelerating AI training using physics-based hardware. This work was supported by the Advanced Research + Invention Agency (ARIA)'s Scaling Compute Programme, which aims to drastically reduce the hardware and energy costs of training AI models by rethinking current computing paradigms.
While second-order methods like K-FAC can train neural networks more efficiently per iteration than first-order methods like SGD, they are held back by their computational overhead. In the figure below we break down the time contributions to the K-FAC update on a multi-layer perceptron (MLP) and a transformer (GPT). Matrix inversion dominates the update time, followed by the other dense matrix operations.
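For a concrete picture of where that time goes, here is a minimal sketch of a single-layer K-FAC update using the standard Kronecker-factored natural-gradient step. The dimensions, damping value, and timing harness are illustrative choices, not taken from the paper.

```python
# Illustrative single-layer K-FAC update, timed in three stages.
import time
import numpy as np

rng = np.random.default_rng(0)
batch, d_in, d_out = 1024, 4096, 4096

a = rng.standard_normal((batch, d_in))    # layer inputs (activations)
g = rng.standard_normal((batch, d_out))   # backpropagated output gradients
grad_W = rng.standard_normal((d_out, d_in))

t0 = time.perf_counter()
A = a.T @ a / batch   # input Kronecker factor,  A ~ E[a a^T]
G = g.T @ g / batch   # output Kronecker factor, G ~ E[g g^T]
t1 = time.perf_counter()

damping = 1e-3
A_inv = np.linalg.inv(A + damping * np.eye(d_in))    # cubic-cost inversions:
G_inv = np.linalg.inv(G + damping * np.eye(d_out))   # the expensive stage
t2 = time.perf_counter()

# Natural-gradient step: vec(dW) = (A kron G)^-1 vec(grad_W),
# which factorizes as dW = G^-1 grad_W A^-1
delta_W = G_inv @ grad_W @ A_inv
t3 = time.perf_counter()

print(f"factor construction: {t1 - t0:.3f} s")
print(f"inversion:           {t2 - t1:.3f} s")
print(f"update matmuls:      {t3 - t2:.3f} s")
```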
Our algorithm accelerates the matrix operations at the heart of the K-FAC optimizer. We first construct the Kronecker factors (which approximate the curvature matrix of the loss landscape) and then send them to a thermodynamic solver, which computes the weight updates.
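The preprint targets dedicated analog hardware, but the principle can be emulated digitally. The sketch below estimates a linear solve x = B^-1 b by time-averaging overdamped Langevin dynamics whose stationary mean is the solution, in the spirit of thermodynamic linear algebra; the function name, step size, noise scale, and averaging window are all illustrative assumptions rather than the paper's protocol.

```python
# Digital emulation of a thermodynamic linear solve: simulate
# dx = -(Bx - b) dt + sqrt(2 dt / beta) * noise and time-average x.
import numpy as np

def thermo_solve(B, b, dt=1e-3, n_steps=20_000, burn_in=5_000, beta=1e4, seed=0):
    """Estimate B^-1 b as the time average of overdamped Langevin dynamics."""
    rng = np.random.default_rng(seed)
    x = np.zeros_like(b)
    mean = np.zeros_like(b)
    for step in range(n_steps):
        noise = np.sqrt(2.0 * dt / beta) * rng.standard_normal(b.shape[0])
        x = x - dt * (B @ x - b) + noise   # Euler-Maruyama step
        if step >= burn_in:                # discard the transient
            mean += x
    return mean / (n_steps - burn_in)

# Toy check on a damped, symmetric positive-definite system,
# like a regularized Kronecker factor.
rng = np.random.default_rng(1)
M = rng.standard_normal((50, 50))
B = M @ M.T / 50 + np.eye(50)
b = rng.standard_normal(50)
x_est = thermo_solve(B, b)
print(np.linalg.norm(x_est - np.linalg.solve(B, b)))  # small residual
```

On hardware, the analogous dynamics run in physical time as a circuit relaxes to equilibrium, which is where the speedup over a digital cubic-cost inversion is expected to come from.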
This reduces the computational overhead of K-FAC, bringing it close to the per-iteration cost of first-order methods like SGD and adaptive variants such as Adam, as shown in the table below.
Naturally, thermodynamic hardware is inherently lower-precision than standard digital hardware. We show that, with proper quantization of the matrices involved, the benefits of K-FAC over Adam can largely be preserved. The figure below shows this for both input and output quantization of the K-FAC update.
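As a rough illustration of input quantization, the sketch below uniformly quantizes a damped Kronecker factor to a given bit width, solves the system, and compares against the full-precision solution. The quantization scheme and bit widths are illustrative assumptions, not the paper's exact protocol.

```python
# Effect of quantizing a K-FAC factor before the linear solve.
import numpy as np

def quantize_uniform(M, bits):
    """Symmetric uniform quantization of a matrix to `bits` bits."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(M).max() / levels
    return np.round(M / scale) * scale

rng = np.random.default_rng(2)
a = rng.standard_normal((1024, 256))
A = a.T @ a / 1024 + 1e-3 * np.eye(256)   # damped input factor
v = rng.standard_normal(256)              # one column of the gradient

x_full = np.linalg.solve(A, v)
for bits in (8, 6, 4):
    x_q = np.linalg.solve(quantize_uniform(A, bits), v)
    rel_err = np.linalg.norm(x_q - x_full) / np.linalg.norm(x_full)
    print(f"{bits}-bit factor: relative error {rel_err:.3e}")
```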
We also benchmarked the thermodynamic K-FAC optimizer on AlgoPerf, which includes workloads such as training a vision transformer on ImageNet and a graph neural network on ogbg-molpcba. We estimate a substantial advantage over standard K-FAC, as well as over first-order baselines, in terms of validation metrics per unit of wall-clock time.
To sum up, our approach uses thermodynamic computing to accelerate the matrix operations at K-FAC's core, making it competitive with first-order methods in wall-clock time while preserving its convergence benefits. Our contributions are:
- a thermodynamic algorithm that offloads the dominant matrix operations of the K-FAC update to physics-based hardware;
- an analysis showing that, with appropriate quantization, K-FAC's benefits over Adam survive the lower precision of the hardware; and
- estimated benchmarks on AlgoPerf workloads showing an advantage over standard K-FAC and first-order baselines in validation metrics per wall-clock time.
Follow Normal Computing to stay informed on our thermodynamic computing research!
arXiv: https://arxiv.org/abs/2502.08603