LLM Compiler: Foundation models for code optimization
Chris Cummins, Volker Seeker, Dejan Grubisic, Baptiste Rozière, Jonas Gehring, Gabriel Synnaeve, Hugh Leather
Introduction
We are excited to announce the release of LLM Compiler, a model targeted at code and compiler optimization tasks. LLM Compiler is built on top of our code-specialized large language model, Code Llama, adding capabilities to better understand compiler intermediate representations, assembly language, and optimization. We demonstrate LLM Compiler on two difficult tasks: optimizing for code size and decompiling from assembly to the compiler's intermediate representation. We release these foundation models to accelerate the application of LLMs to code optimization tasks and to enhance developer experience.
We are releasing LLM Compiler under the LLM Compiler License Agreement for research and commercial use, which incorporates the Acceptable Use Policy for Llama Materials.
How LLM Compiler works
LLM Compiler is a specialization of Code Llama. It has been pre-trained on a vast amount of LLVM assembly (IR) and x86_64, ARM, and CUDA assembly code. Given a piece of LLVM assembly and a sequence of optimization passes for opt, the LLVM optimizer, LLM Compiler can predict what the change in code size will be and what the output code will look like after those optimizations are applied. It has 'understood' the behavior of the optimizing compiler to such a degree that in many cases it can perfectly replicate its output. These capabilities make it ideally suited to compiler optimization tasks.
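To make this concrete, here is a minimal sketch of the ground truth the model is trained to emulate. It assumes an LLVM installation with opt on the PATH and the new pass manager's -passes syntax (LLVM 15 or later); the model receives the same inputs and is asked to predict opt's output.

```python
import subprocess

def run_opt(input_ir: str, passes: list[str]) -> str:
    """Apply a sequence of optimization passes to textual LLVM IR via opt."""
    result = subprocess.run(
        ["opt", "-S", f"-passes={','.join(passes)}", "-"],  # "-" reads stdin
        input=input_ir, capture_output=True, text=True, check=True,
    )
    return result.stdout

unoptimized_ir = """
define i32 @square(i32 %x) {
entry:
  %x.addr = alloca i32
  store i32 %x, ptr %x.addr
  %0 = load i32, ptr %x.addr
  %1 = load i32, ptr %x.addr
  %mul = mul nsw i32 %0, %1
  ret i32 %mul
}
"""

# LLM Compiler is trained to predict this output given the IR and pass list.
print(run_opt(unoptimized_ir, ["mem2reg", "instcombine"]))
```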
In addition to this core functionality, and to demonstrate its ability to solve complex compiler optimization problems, we are also releasing LLM Compiler FTD, a set of models that have been further fine-tuned for two specific downstream tasks (a sketch of driving the first task end to end follows the list):
1. Predicting the best optimization passes for opt to use in order to minimize code size, given a piece of LLVM assembly code.
2. Generating LLVM IR from a piece of x86_64 or ARM assembly code.
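The sketch below illustrates how a pass list for the first task could be applied and its effect verified. Here, suggest_passes is a hypothetical stand-in for a call to the FTD model, and the size metric (the .text section bytes reported by llvm-size) is one reasonable choice, not necessarily the one used in our paper. It assumes opt, llc, and llvm-size are on the PATH.

```python
import subprocess

def suggest_passes(ir: str) -> str:
    """Hypothetical stand-in: in practice, prompt the LLM Compiler FTD model
    with the IR and parse the suggested pass list out of its response."""
    return "default<Oz>"

def text_size(ir: str, passes: str) -> int:
    """Optimize the IR with the given -passes string, compile it to an object
    file, and return the size of its .text section in bytes."""
    bitcode = subprocess.run(
        ["opt", f"-passes={passes}", "-o", "-", "-"],
        input=ir.encode(), capture_output=True, check=True,
    ).stdout
    subprocess.run(
        ["llc", "-filetype=obj", "-o", "/tmp/out.o", "-"],
        input=bitcode, capture_output=True, check=True,
    )
    size_report = subprocess.run(
        ["llvm-size", "/tmp/out.o"], capture_output=True, text=True, check=True,
    ).stdout
    # llvm-size prints a header row, then: text data bss dec hex filename
    return int(size_report.splitlines()[1].split()[0])

ir = open("input.ll").read()
baseline = text_size(ir, "default<Oz>")        # the compiler's -Oz pipeline
candidate = text_size(ir, suggest_passes(ir))  # the model's suggested passes
print(f"size vs -Oz: {100 * (candidate - baseline) / baseline:+.2f}%")
```

Because the suggested passes are always re-run through the real compiler, a bad suggestion costs only compile time; the measured size, not the model's claim, decides which pass list wins.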
We are releasing LLM Compiler and LLM Compiler FTD models in two sizes: 7B and 13B parameters. The models have been trained with a context window of 16,000 tokens.
The two sizes address different serving and latency requirements. The 7B model, for example, can be served on a single GPU and is better suited to tasks that require low latency, such as fine-grained optimization, while the 13B model returns the best results.
LLM Compiler performance
We believe that for LLMs to fully understand code, they must understand the mechanics of code optimization. We evaluate LLM Compiler on the task of emulating compiler optimizations: we give it unoptimized IR and a randomly generated list of optimization passes to apply, then ask the model to generate the corresponding IR after those optimizations have been applied. LLM Compiler responds with compilable code most of the time and can even perfectly replicate the behavior of the compiler 20% of the time.
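A sketch of what such an exact-match check might look like is below. The normalization shown (dropping comments and blank lines) is an assumption for illustration; the paper defines its own matching criteria.

```python
def normalize_ir(ir: str) -> str:
    """Strip comments and blank lines so cosmetic differences do not count
    against the model. (This normalization is an assumption; the paper
    defines its own matching criteria.)"""
    lines = []
    for line in ir.splitlines():
        line = line.split(";", 1)[0].rstrip()  # ';' starts a comment in LLVM IR
        if line:
            lines.append(line)
    return "\n".join(lines)

def exact_match(predicted_ir: str, reference_ir: str) -> bool:
    """True if the model's output is identical to opt's, after normalization."""
    return normalize_ir(predicted_ir) == normalize_ir(reference_ir)
```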
LLM Compiler FTD excels on the two difficult downstream tasks of code size optimization and disassembly. The optimization pass sequences generated by LLM Compiler FTD yield a 5.26% reduction in code size over the compiler's built-in -Oz pass pipeline. On the disassembly task, LLM Compiler FTD far outperforms comparable LLMs.
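One natural sanity check for the disassembly task is a round trip: compile the model's lifted IR back down to assembly and compare it against the original. The sketch below assumes llc is on the PATH; note that a close match is only a syntactic signal, and semantic equivalence requires stronger tools such as differential testing or translation validation.

```python
import subprocess

def ir_to_asm(ir: str, triple: str = "x86_64-unknown-linux-gnu") -> str:
    """Compile textual LLVM IR back down to assembly with llc."""
    return subprocess.run(
        ["llc", f"-mtriple={triple}", "-o", "-", "-"],  # "-" reads stdin
        input=ir, capture_output=True, text=True, check=True,
    ).stdout

# lifted_ir would come from LLM Compiler FTD given some original assembly.
# If ir_to_asm(lifted_ir) closely matches that original assembly, the lift
# is at least syntactically plausible.
```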
For the full details of our experimental evaluations please see our research paper.
Releasing LLM Compiler
LLMs are already being used to make programming easier; now they are beginning to be used to make programs more efficient as well.
At Meta, our conviction is that AI models, especially those designed for coding, thrive best with an open strategy, fostering both innovation and security. Models that are accessible to the public can expedite the creation of novel compiler optimization technologies. In turn, this will allow programs to be more efficient and smaller, enhancing the quality of life for all. By making models such as LLM Compiler available, the whole community can explore their potential, pinpoint problems, and rectify any vulnerabilities.
The model weights are available on Hugging Face.
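A minimal sketch of loading the released weights with the Hugging Face transformers library follows. The repository name shown follows the hub's naming convention; check the model page for the exact repository names and the prompt format each variant expects.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# 13B and the fine-tuned "-ftd" variants follow the same naming pattern.
model_id = "facebook/llm-compiler-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "<LLVM IR and pass list here>"  # see the model card for the format
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```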
The future of generative AI for optimization
LLM Compiler is designed to support compiler researchers and engineers, but there remain many more use cases than our models can yet serve. We hope that LLM Compiler will inspire others to leverage LLMs to create innovative new tools for research and commercial products.