SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Credit: https://arxiv.org/pdf/2502.18449

Today's paper introduces SWE-RL, an approach that uses reinforcement learning to enhance large language models' reasoning capabilities for software engineering tasks. The method leverages software evolution data (like GitHub pull requests) and rule-based rewards to train LLMs to solve real-world software issues. SWE-RL enables open-source models to achieve competitive performance on software engineering benchmarks.

Method Overview

SWE-RL is a reinforcement learning framework that trains LLMs to solve software engineering tasks using real-world software evolution data. The process begins with curating a comprehensive dataset of GitHub pull requests (PRs), which includes issue descriptions, code contexts, and the corresponding patches that fixed those issues. This data serves as the foundation for the reinforcement learning process.

SWE-RL trains a policy LLM to generate code changes through reasoning. For each issue in the training data, the model is presented with the issue description and relevant code context. It then attempts to solve the issue by generating a patch. The quality of this patch is evaluated using a simple rule-based reward function: if the format is incorrect, the model receives a negative reward; otherwise, it receives a reward based on the similarity between the predicted patch and the oracle (ground truth) patch.
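To make this concrete, here is a minimal sketch of such a rule-based reward in Python. It assumes a fixed penalty of -1 for malformed output and uses difflib.SequenceMatcher as the similarity metric; the exact penalty value and metric used in the paper may differ.

```python
import difflib

def swe_rl_reward(pred_patch: str | None, oracle_patch: str) -> float:
    """Rule-based reward: penalize malformed output, otherwise score the
    predicted patch by its textual similarity to the oracle patch."""
    if pred_patch is None:  # rollout could not be parsed into a valid patch format
        return -1.0         # assumed penalty for a format violation
    # Continuous similarity in [0, 1] between predicted and ground-truth patches.
    return difflib.SequenceMatcher(None, pred_patch, oracle_patch).ratio()
```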

What makes SWE-RL unique is its approach to providing context. The model is given the complete content of each file in the input prompt, which implicitly teaches it to reason about precise fault locations before suggesting repair edits. This forces the model to develop both bug diagnosis and repair generation capabilities. The training uses Group Relative Policy Optimization (GRPO), where multiple rollouts are generated for each problem, and the policy is updated based on the normalized rewards within each group.
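A minimal sketch of the group-relative idea follows: several rollouts are sampled for the same issue, their rewards are standardized within the group, and the standardized values serve as advantages for the policy update. The epsilon term and the omission of any KL regularization are simplifications for illustration, not the paper's exact objective.

```python
import statistics

def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style normalization: each rollout's advantage is its reward
    standardized against the mean and std of its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four rollouts for one issue, scored by the rule-based reward above.
print(group_advantages([0.92, 0.40, -1.0, 0.75]))
```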

Importantly, SWE-RL only requires the model to generate repair edits during training, yet the resulting model can generalize to other related tasks like file-level fault localization and test generation. This emergent capability demonstrates how reinforcement learning can help models develop broader reasoning skills beyond the specific training objective.

Results

The paper's main result is Llama3-SWE-RL-70B, a model trained with SWE-RL on top of Llama-3.3-70B-Instruct. This model achieves a 41.0% solve rate on SWE-bench Verified, a human-verified collection of real-world GitHub issues. This performance represents the best result among medium-sized language models (<100B parameters) and is comparable to leading proprietary models like GPT-4o.

When compared to a supervised fine-tuning (SFT) baseline trained on the same data, Llama3-SWE-RL-70B demonstrates superior performance not only on SWE-bench but also on out-of-domain tasks. The model shows improved results on five different categories: function coding, library use, code reasoning, mathematics, and general language understanding. This indicates that SWE-RL helps the model develop generalized reasoning skills that transfer across domains, whereas the SFT approach tends to overfit to specific task distributions.

The paper also demonstrates that using a continuous reward function (based on sequence similarity) outperforms a discrete reward function (exact match only) in SWE-RL. The continuous reward better captures partial correctness and allows for more nuanced learning of repair strategies.
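A hypothetical comparison illustrates the difference: an exact-match reward gives no signal for a near-correct patch, while a similarity-based reward still provides graded credit. The example patches below are invented for illustration.

```python
import difflib

def discrete_reward(pred: str, oracle: str) -> float:
    # Exact-match reward: 1 only if the oracle patch is reproduced verbatim.
    return 1.0 if pred == oracle else 0.0

def continuous_reward(pred: str, oracle: str) -> float:
    # Similarity-based reward: partial credit for near-miss patches.
    return difflib.SequenceMatcher(None, pred, oracle).ratio()

pred = "-    return a - b\n+    return a + b"
oracle = "-    return a - b\n+    return a + b  # fix sign"
print(discrete_reward(pred, oracle))    # 0.0 — no learning signal despite being close
print(continuous_reward(pred, oracle))  # ~0.85 — graded signal the policy can learn from
```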

Conclusion

SWE-RL leverages reinforcement learning on software evolution data, enabling models to develop strong reasoning capabilities for solving real-world software issues without relying on proprietary models. The resulting Llama3-SWE-RL-70B model achieves state-of-the-art performance among medium-sized models on SWE-bench. For more information, please consult the full paper.

Congrats to the authors for their work!

Wei, Yuxiang, et al. "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution." arXiv preprint arXiv:2502.18449 (2025).
