Progress, but no major breakthrough on matrix multiplication
Image source: https://www.nature.com/articles/s41586-022-05172-4


On Wednesday, an article titled "Discovering faster matrix multiplication algorithms with reinforcement learning" appeared in the journal Nature; since then, this work has attracted considerable attention.

A few days earlier, I had been asked by a German science journalism organisation to look at the work and to give an expert opinion on it. Why me? Well, as it happens, I am an AI researcher who's internationally known as an expert in automated algorithm design. I also have a keen interest in matrix multiplication (although I am not an expert on that topic). Seeing that there appear to be quite a few misunderstandings about this work, I believe it is useful to share and explain my perspective on it.

Firstly, matrix multiplication is indisputably a fundamental and very important operation that plays a key role in many areas of computer science and its applications. There can be no doubt that substantial speedups of matrix multiplication in prominent real-world applications would have major impact, e.g., on demands for computational resources (and hence on energy footprint), and therefore be of considerable scientific and practical interest.

Secondly, to the best of my knowledge, the use of deep reinforcement learning in this context is new and interesting.

That said, it is tempting to overestimate the significance of this particular piece of work, for three reasons:

1. The reported speedups are relatively small and based on experiments conducted on synthetic data.

2. Speed is not all that matters for practical uses of matrix multiplication.

3. Automated algorithm design is not a new idea, and substantially more impressive results have been achieved and published in the past.

Before explaining these three points in more detail, I'd like to state very clearly that I respect the work presented in the Nature article by Fawzi et al., and I believe it to be of considerable interest for experts on automated algorithm design, deep reinforcement learning and matrix multiplication. However, I do not see it as a breakthrough in automated algorithm design, nor as one in matrix multiplication, as explained in the following.

1. The reported speedups are relatively small and based on experiments conducted on synthetic data.

The key results in terms of speedups can be seen in Figure 5 of the article by Fawzi et al. Closer inspection shows that for matrices of size 8192, the speedup relative to a known method (referred to as "Strassen-square" by the authors) is just over 4%, and for size 20480 on a standard GPU, it's just over 2%. Speedups on Google's proprietary TPU are slightly larger, but still below 6% for all reported matrix sizes. From the supplementary material it is clear that considerable care was taken to control for the limited accuracy of performance measurements, but it is still safe to say that these numbers are medians, and that therefore, in a substantial number of instances, the actual speedups are even smaller (albeit possibly only slightly so).
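For readers who want to see how such a percentage relates to raw running times, here is a minimal sketch; the timing values are invented purely for illustration and are not taken from the article.

```python
import numpy as np

# Hypothetical runtimes in milliseconds for repeated runs (illustrative values only).
baseline_ms  = np.array([104.1, 103.8, 105.0, 104.5, 103.9])  # e.g., "Strassen-square"
candidate_ms = np.array([100.2,  99.8, 100.5, 100.1,  99.9])  # e.g., a newly discovered algorithm

# Median-based speedup, expressed as a percentage reduction in running time.
speedup_percent = 100.0 * (1.0 - np.median(candidate_ms) / np.median(baseline_ms))
print(f"median speedup: {speedup_percent:.1f}%")  # roughly 4% for these made-up numbers
```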

These are not impressive numbers, especially compared to the speedups still obtained every year by improvements in hardware. Of course, the two types of improvement do compound, but the latter would almost certainly substantially dominate the former in any meaningful comparison, and even their combined effect over one cycle of hardware improvement would hardly be a game changer for any application.

Furthermore, although the article could be clearer on this point, it very much appears that all the results reported in Figure 5 and elsewhere have been measured on synthetic data - specifically, square matrices, presumably generated using a randomised process. Matrices encountered in real-world applications, however, typically exhibit structure and are not always square. Both factors can have an impact on the efficiency of matrix multiplication algorithms, and there is no reason to believe that the newly discovered algorithms would fare particularly well in that regard - say, compared to Strassen's algorithm or "Strassen-square".

2. Speed is not all that matters for practical uses of matrix multiplication.

To be fair, Fawzi et al. acknowledge this point, at least in their supplementary material (Section B3). In many real-world uses of matrix multiplication, numerical stability is of considerable importance; roughly speaking, this concerns the compounding of errors that results from using limited-precision representations of real numbers. It is known that Strassen's algorithm for matrix multiplication is numerically less stable than the naïve algorithm, which has cubic time complexity, compared to Strassen's O(n^2.807), and for that reason, the naïve algorithm is used in quite a few real-world applications. I suspect that "Strassen-square" is numerically less stable than Strassen's algorithm, and according to Fawzi et al., most of their automatically discovered algorithms have numerical error bounds considerably higher than "Strassen-square". (This interpretation is consistent with the remark made in Footnote 3 of the 2016 article by Huang et al. that introduces "Strassen-square", which states that they limited themselves to two levels of Strassen partly because of numerical stability issues. The 2015 article by Benson & Ballard, on which Fawzi et al. build, also acknowledges the importance of numerical stability for "widespread use" of fast matrix multiplication algorithms and highlights the need for extensive empirical testing, but does not report any results regarding numerical stability.) This raises the question of how these algorithms compare to Strassen's algorithm and to the naïve algorithm.
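To make the stability issue concrete, here is a minimal sketch (not from the article; the matrix size, data distribution and the single level of recursion are arbitrary illustrative choices) that compares the rounding error of the naïve product and of one level of Strassen's recursion in single precision, using a double-precision product as the reference.

```python
import numpy as np

def naive_multiply(A, B):
    """Textbook cubic matrix multiplication (delegated to BLAS via numpy)."""
    return A @ B

def strassen_one_level(A, B):
    """One level of Strassen's recursion: 7 block multiplications instead of 8.
    Assumes square matrices with even dimension."""
    n = A.shape[0] // 2
    A11, A12, A21, A22 = A[:n, :n], A[:n, n:], A[n:, :n], A[n:, n:]
    B11, B12, B21, B22 = B[:n, :n], B[:n, n:], B[n:, :n], B[n:, n:]

    M1 = (A11 + A22) @ (B11 + B22)
    M2 = (A21 + A22) @ B11
    M3 = A11 @ (B12 - B22)
    M4 = A22 @ (B21 - B11)
    M5 = (A11 + A12) @ B22
    M6 = (A21 - A11) @ (B11 + B12)
    M7 = (A12 - A22) @ (B21 + B22)

    C = np.empty_like(A)
    C[:n, :n] = M1 + M4 - M5 + M7
    C[:n, n:] = M3 + M5
    C[n:, :n] = M2 + M4
    C[n:, n:] = M1 - M2 + M3 + M6
    return C

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((1024, 1024)).astype(np.float32)
    B = rng.standard_normal((1024, 1024)).astype(np.float32)

    # High-precision reference product for measuring rounding error.
    reference = A.astype(np.float64) @ B.astype(np.float64)
    err_naive = np.max(np.abs(naive_multiply(A, B) - reference))
    err_strassen = np.max(np.abs(strassen_one_level(A, B) - reference))
    print(f"max abs error, naive:    {err_naive:.3e}")
    print(f"max abs error, Strassen: {err_strassen:.3e}")
```

On typical runs of such a sketch, the Strassen-style result shows a somewhat larger maximum error than the naïve product; the additional levels of recursion used by "Strassen-square" and by the newly discovered algorithms can be expected to amplify this effect further.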

Of course, Fawzi et al. caution that they did not optimise for numerical stability, and perhaps if they had done so, they might have obtained faster algorithms that are numerically at least as stable as Strassen's - but until demonstrated conclusively, this remains speculation. Based on what we know from their results, using the newly discovered algorithms for any application where numerical stability matters would seem to be problematic.

3. Automated algorithm design is not a new idea, and substantially more impressive results have been achieved and published in the past.

Already in the abstract of their article, Fawzi et al. state that "The automatic discovery of algorithms using machine learning offers the prospect of reaching beyond human intuition and outperforming the current best human-designed algorithms." As an expert in automatic algorithm design (in fact, in 2020, I was made a Fellow of the ACM for my work in that area), I very much agree with this sentiment, although I would emphasise that in my experience, it takes clever combinations of machine learning and optimisation techniques to achieve the best results, and that far more impressive results than those of Fawzi et al. were already published years ago.

To be fair, Fawzi et al. do not claim that they are the first to demonstrate the potential of automated algorithm design, but reactions to their work appear to make that assumption. In reality, work on automated algorithm design can be traced back at least to the 1970s (to the work of John Rice on automated algorithm selection, to be precise) and has shown increasing and very substantial impact over the past 15 years. I know this, because - in close collaboration with many other AI experts - I had the privilege of being a driving force in this research area. By now, the number of highly cited publications in top-tier venues is vast, and I will not even attempt to give an overview here. I will, however, briefly mention results on two problems that have been studied intensely and prominently for decades and that are of very high practical importance: the satisfiability problem in propositional logic (SAT) and mixed-integer programming (MIP), and give examples of work I have been closely involved in and therefore know very well.

SAT is one of the most prominent problems in all of computer science. It was the first problem proven to be NP-complete, and algorithms for solving it are routinely used to ensure the correct operation of computer hardware and software, to accomplish complex scheduling and product configuration tasks, and in many other real-world applications. Using automated algorithm discovery techniques based on machine learning and optimisation, speedups over previous state-of-the-art SAT algorithms by factors of more than 50 (i.e., more than 98%) have been reported on structured SAT instances (see, e.g., KhudaBukhsh et al., 2016).

MIP is one of the most prominent problems in operations research, and commercial MIP solvers are used broadly and routinely in industry and academia. MIP is also NP-hard, and automated algorithm design techniques (to be precise: automated algorithm configuration procedures) have been demonstrated to achieve speedup factors between just under 2 and over 50 (i.e., between ~50% and ~98%) on various sets of widely studied, highly structured benchmark instances (see, e.g., Hutter et al., 2010).
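Since speedup factors and percentages are easy to confuse (see also the first note below), here is a small, hypothetical helper that makes the conversion used in this article explicit: a factor s means the improved algorithm needs 1/s of the baseline's running time.

```python
def factor_to_percent(speedup_factor: float) -> float:
    """Convert a speedup factor (baseline running time / improved running time)
    into the corresponding percentage reduction in running time."""
    return 100.0 * (1.0 - 1.0 / speedup_factor)

print(factor_to_percent(2))   # 50.0  -> a 2-fold speedup is a 50% speedup
print(factor_to_percent(50))  # 98.0  -> a 50-fold speedup is a 98% speedup
```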

This shows that automated algorithm design techniques are very powerful and very useful; indeed, they have been demonstrated to achieve performance improvements far beyond those reported by Fawzi et al. for matrix multiplication. I have no doubt that such techniques, perhaps even based on the deep reinforcement learning approach proposed by Fawzi et al., can ultimately achieve far more impressive results for matrix multiplication. From my perspective, this would require demonstrating speedups in excess of 25% over "Strassen-square" for sets of matrices that are taken or derived from real-world applications, or that at least capture the salient structure of such matrices. Since this would be easy to do in the case of matrix multiplication, I'd also like to see results on speedups in actual applications, e.g., in machine learning or scientific computing. Even then, I'd be hesitant to call it a breakthrough in automated algorithm design, considering the truly impressive successes that have been achieved in that area over the past decade.


Notes:

  • An earlier version of this article used a conversion of speed-up factors for SAT and MIP algorithms into percentages that was likely inconsistent with that underlying Figure 5 in the article by Fawzi et al. This has now been changed to the intuitive definition, under which a 2-fold speedup corresponds to a 50% speedup. In most publications on automated algorithm design dealing with running time minimisation, speedup is reported as a factor (i.e., the ratio of the running time required by the baseline algorithm to that required by the improved algorithm) rather than as a percentage, since factors are easier to understand intuitively and their use leaves less room for misinterpretation.
  • The discussion of numerical stability has been slightly amended and now takes into account details from the articles describing "Strassen-square" (references 31 and 32 in the work by Fawzi et al.).

Frank Hutter

Founder, Professor, Tabular Foundation Models and AutoML

2y

Thanks for writing this, Holger! It is important to have a down-to-Earth expert take on papers that receive as much public attention as this one.

Morten Irgens

Vice Dean of Innovation and Impact, Copenhagen Business School; Strategic advisor, Kristiania; Director of CLAIRE, NORA, and Adra

2y

Thanks, Holger, this was a well-written review, setting the work into context.

Daan van den Berg

Solving Unsolvable Problems

2y

Yeah, this is awesome. Just think about the consequences it could have for graph coloring, Kolmogorov complexity or even P vs NP.

Michael Spencer

A.I. Writer, researcher and curator - full-time Newsletter publication manager.

2y

Sadly, I'm beginning to feel like a lot of DeepMind's efforts fall in the area of public relations and not science.
