Scalable thermodynamic second-order optimization

New preprint from Normal Computing! Our team (Kaelan Donatella, Sam Duffield, Denis Melanson, Maxwell Aifer, Phoebe Klett, Rajath Salegame, Zachary Belateche, Gavin Crooks, Antonio Martinez, and Patrick Coles) has posted "Scalable thermodynamic second-order optimization", introducing a novel approach to accelerating AI training using physics-based hardware. This work was supported by the Advanced Research + Invention Agency (ARIA)'s Scaling Compute Programme, which aims to drastically reduce the hardware and energy costs of training AI models by rethinking current computing paradigms.

arXiv: https://arxiv.org/abs/2502.08603

While second-order methods like K-FAC can train neural networks more efficiently per iteration than first-order methods like SGD, they are held back by computational overhead. In the figure below we show the different time contributions to the K-FAC update on a multi-layer perceptron (MLP) and a transformer (GPT). We see that matrix inversion dominates, along with other matrix operations.
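
To make the bottleneck concrete, here is a minimal NumPy sketch of the standard K-FAC preconditioning step for a single dense layer (the layer setup and damping value are illustrative, not taken from the paper); the two explicit matrix inversions are the cubic-cost operations that dominate the timings.

    import numpy as np

    def kfac_update(grad_W, A, G, damping=1e-3):
        # One fully digital K-FAC preconditioning step for a dense layer.
        #   grad_W: (d_out, d_in)  gradient of the loss w.r.t. the weights
        #   A:      (d_in, d_in)   activation covariance Kronecker factor
        #   G:      (d_out, d_out) pre-activation-gradient covariance Kronecker factor
        d_in, d_out = A.shape[0], G.shape[0]
        # The two explicit inversions below (plus the surrounding matrix products)
        # are the O(n^3) operations that dominate the K-FAC step time.
        A_inv = np.linalg.inv(A + damping * np.eye(d_in))
        G_inv = np.linalg.inv(G + damping * np.eye(d_out))
        # Standard K-FAC preconditioned gradient: G^{-1} grad_W A^{-1}
        return G_inv @ grad_W @ A_inv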


Our algorithm accelerates the matrix operations in the K-FAC optimizer. We compute the weight updates by first constructing the Kronecker factors (which approximate the curvature matrix of the loss landscape) and then sending them to a thermodynamic solver that computes the weight updates.
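
On the hardware the solve happens by physical relaxation, but the underlying principle can be illustrated digitally. The toy sketch below (not the paper's implementation; step size, temperature, and iteration count are made up for illustration) integrates an overdamped Langevin process whose equilibrium mean solves a linear system, which is the role the thermodynamic solver plays in place of an explicit inversion.

    import numpy as np

    def thermodynamic_solve(M, b, n_steps=20000, dt=1e-3, temperature=1e-4, seed=0):
        # Toy digital emulation of a thermodynamic linear solver: integrate the
        # overdamped Langevin dynamics
        #     dx = -(M x - b) dt + sqrt(2 * temperature * dt) * noise,
        # whose stationary mean is x* = M^{-1} b for symmetric positive-definite M.
        # On the analog hardware this relaxation happens physically; here we
        # simulate it only to illustrate the principle.
        rng = np.random.default_rng(seed)
        x = np.zeros_like(b, dtype=float)
        samples = []
        for step in range(n_steps):
            noise = rng.standard_normal(b.shape)
            x = x - dt * (M @ x - b) + np.sqrt(2.0 * temperature * dt) * noise
            if step > n_steps // 2:  # discard burn-in, average equilibrium samples
                samples.append(x)
        return np.mean(samples, axis=0)

With a solver of this kind, the explicit np.linalg.inv calls in the earlier single-layer sketch would be replaced by solves against the damped Kronecker factors.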


This reduces the computational overhead of K-FAC, approaching the requirements of a first-order method like SGD (and its variants such as Adam), as shown in the table below.


Naturally, thermodynamic hardware is inherently lower-precision than standard digital hardware. We show that with proper quantization of the matrices involved, the benefits of K-FAC over Adam can largely be preserved. The figure below shows this for both input and output quantization of the K-FAC update.
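
As a rough picture of what quantizing the matrices involved means here, the sketch below shows a generic symmetric uniform quantizer; input quantization corresponds to quantizing the matrices sent to the solver, output quantization to quantizing the update it returns. The exact scheme used in the experiments is the one described in the preprint.

    import numpy as np

    def quantize(M, bits=8):
        # Symmetric uniform quantization of a matrix to the given bit width.
        # Illustrative only; the preprint details the scheme actually used.
        levels = 2 ** (bits - 1) - 1          # e.g. 127 magnitude levels at 8 bits
        scale = np.max(np.abs(M)) / levels
        if scale == 0.0:
            return M.copy()
        return np.clip(np.round(M / scale), -levels, levels) * scale

    # "Input quantization"  = quantize the Kronecker factors (and gradient) sent to the solver.
    # "Output quantization" = quantize the weight update the solver returns.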


We also benchmarked the thermodynamic K-FAC optimizer on AlgoPerf, which includes workloads such as training a vision transformer on ImageNet and a graph neural network on ogbg-molpcba. This leads to a substantial estimated advantage over standard K-FAC, as well as over first-order baselines, in terms of validation metrics per unit of wall-clock time.


To sum up, our approach uses thermodynamic computing to accelerate the matrix operations at K-FAC's core, making it competitive with first-order methods in wall-clock time while preserving its convergence benefits. Our contributions are:

  • Developed a scalable algorithm for accelerating K-FAC (Kronecker-factored approximate curvature) using thermodynamic computing
  • Demonstrated how matrix operations in K-FAC can be mapped to physical systems of coupled harmonic oscillators
  • Achieved an asymptotic runtime improvement from O(n³) to O(n²κ²) for neural networks of width n
  • Experimentally validated robustness to both input and output quantization in ResNet training, with 8-bit precision maintaining competitive performance despite output quantization having a stronger impact than input quantization
  • Demonstrated potential real-world impact through estimated speedups on practical workloads, with matrix inversions accounting for 11% of computation time for ViT training on ImageNet and 27% for GNN training on ogbg-molpcba

Follow Normal Computing to stay informed on our thermodynamic computing research!
