Compiler Limitations #3/3

Some examples, before the main point

  1. As discussed in the 1st post in this series, clang isn't able to properly express C++ constness and as a result is overly cautious at times. On Oct 9, 2015, Larisse Voufo submitted the first patch in a series that was supposed to address it. The patches were discussed for 2 months; the last comment that I can see is from Dec 2, 2015. In the 7.5 years since then, the work has gone silent.
  2. Clang's type-based alias analysis often misses negative (no-alias) results (this doesn't cause bad code generation, just missed optimizations). A year after I reported some toy cases, I learnt that 6 years ago Ivan Kosarev worked to address these cases and others, but to this day the work remains hidden behind the undocumented switch `-Xclang -new-struct-path-tbaa`.
  3. As discussed in the 2nd post in this series, clang leaves a lot of potential memory-prefetching wins on the table. The academic work I linked to was done entirely out of tree, and when asked, Sam Ainsworth clarified that he has no intention of trying to submit it upstream.
  4. A while back I came across this work from 2019 about a new approach to cross-procedural optimizations, dubbed HTO ("header-time-optimization"). The authors did intend to submit it upstream, but that never happened. I thought the work was brilliant and tried to nudge them into submitting it, with no success.

This list can go on, but hopefully the pattern is clear.

The main point, after some examples

I wholeheartedly recommend this brilliant 2021 keynote by Chris Lattner, the mastermind behind LLVM: The Golden Age of Compilers. He makes a point of repeating the sentence:

Larger center of gravity concentrated scarce compiler engineering effort. Enables innovations in languages, frontends and backends.

(In fact it appears 4 times; check the slides.) It is a noble goal and might have been true for a while, but in my experience this definitely isn't the case today. Of all the compilers around today, the LLVM suite is by far the most friendly and approachable, and yet, as the examples show, even in LLVM it is extremely hard to push innovations forward. Not just for me, but also for people whose day job is LLVM.

Why is that? I don't know, obviously, but here are some thoughts.

For a while I believed the main issue is that of scale. Concentrating so many engineers around a project risks getting it to the 'mythical man-month' point: not only does it not 'Enable innovations in languages, frontends and backends', it hinders them.

I no longer think that. Today I feel the main factor limiting LLVM progress (and maybe GCC's too) is that of ownership, specifically in the mid-end. 'Mid-end' is a term for the passes that optimize IR into better IR, sometimes also called the 'optimizer'. Note that the examples above mostly land there.

Backends have obvious owners (Intel owns the Intel backend, etc.). Somehow front-ends have strong ownership too: Google gives the clang front-end strong backing, Apple backs Swift, flang seems to be headed by representatives from ARM/AMD/Huawei, etc.

Not so for the mid-end. As far as I can see, only a handful of people really know their way around the mid-end, and they're spread around academia and US national laboratories. Not only do mid-end patches struggle to get timely reviews, reported issues are largely undiscussed and unassigned (there are 20K+ as of this writing). The few mavens who are holding the mid-end fort are just overwhelmed. In one conversation I had with a key figure in this domain about a missed optimization I was trying to draw attention to, he outright told me: fixing this would not result in an academic paper, so he can't assign anyone to it.


I honestly don't understand this.

How is it that Google (for example) has entire teams working on the clang front-end and is heavily involved in C++ language design, but has relatively little investment in the actual optimizing parts of the compiler? I would think investment in optimization would have a larger, across-the-board impact on their developer productivity, infrastructure utilization and customer experience.

Can you see something I'm missing? Any thoughts are welcome.

Kip Hamiltons

Developer Support Engineer @ EngFlow | Bazel Bandit, C++ Connoisseur

1 year ago

This is a brilliant write-up. Thank you for your insights. Very thought-provoking. I think you've nailed some larger issues with large, long-standing open source projects generally.