SAIL: Self-Improving Efficient Online Alignment of Large Language Models

Most existing #LLM #alignment and #RLHF methods rely on offline data or an oracle teacher model, and are therefore constrained by the quality of that data and the limits of that model. This often results in suboptimal performance on new, real-world data.

The responses to a prompt and the preference labels come either from an offline dataset or from an oracle teacher/reference model.

How do we go beyond the limitations of static data? How can we achieve better-quality responses? Can the model achieve self-improvement via self-selection of preferences, eliminating the costly human feedback bottleneck?

Online generation of responses using the model itself is the key!


The responses are generated by the model itself. The model also self-critiques.
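A minimal sketch of what one such online self-improvement step could look like (illustrative only; `policy.generate` and `policy.self_critique_score` are hypothetical placeholders, not SAIL's actual API):

```python
# Sketch of one online self-improvement step: the model generates its own
# responses and labels its own preferences, then a DPO-style update is applied.
# The method names on `policy` are hypothetical, for illustration only.

def online_preference_step(policy, prompts, dpo_update):
    preference_pairs = []
    for x in prompts:
        # 1. The model itself generates candidate responses (online data).
        y1 = policy.generate(x)
        y2 = policy.generate(x)

        # 2. The model also self-critiques: it scores its own responses,
        #    so no human annotator or oracle teacher is needed for labels.
        s1 = policy.self_critique_score(x, y1)
        s2 = policy.self_critique_score(x, y2)
        y_win, y_lose = (y1, y2) if s1 >= s2 else (y2, y1)
        preference_pairs.append((x, y_win, y_lose))

    # 3. Update the model on the self-generated, self-labelled pairs.
    dpo_update(policy, preference_pairs)
    return policy
```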

However, existing online RLHF overlooks the interdependence between the data and the model: the responses used to (implicitly) fit the reward that guides model updates are themselves generated by the model being updated.



In a prior paper, PARL, we model this interdependence correctly using bilevel optimization: an upper-level reward-learning problem that relies on the optimal policy π*, which is itself the solution of a lower-level RL problem.
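Schematically, the bilevel structure looks something like the following (a rough sketch in my own notation of the KL-regularized RLHF bilevel problem, not the exact objective from the papers):

```latex
% Schematic bilevel RLHF objective (requires amsmath; notation illustrative):
% upper level: fit the (implicit) reward on preferences over responses drawn
% from the optimal lower-level policy; lower level: KL-regularized RL.
\[
\begin{aligned}
\min_{\theta} \quad & \mathbb{E}_{x \sim \mathcal{D},\; (y_w, y_l) \sim \pi^{*}_{\theta}(\cdot \mid x)}
  \big[ -\log \sigma\big( r_{\theta}(x, y_w) - r_{\theta}(x, y_l) \big) \big] \\
\text{s.t.} \quad & \pi^{*}_{\theta} = \arg\max_{\pi}\;
  \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi(\cdot \mid x)}\big[ r_{\theta}(x, y) \big]
  \;-\; \beta\, \mathrm{D}_{\mathrm{KL}}\big( \pi \,\|\, \pi_{\mathrm{ref}} \big).
\end{aligned}
\]
```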

The bilevel formulation avoids using suboptimal data generated in previous rounds for (implicit) reward learning.

However, while bilevel optimization is a principled approach to online RLHF, it suffers from computational tractability issues and requires estimating hyper-gradients.

Introducing SAIL, which transforms the bilevel problem into a single-level optimization. It turns out that, compared to the DPO gradient update, SAIL's update has an additional term that induces exploration.
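For reference, here is the standard DPO loss that this update extends, as a minimal PyTorch-style sketch; SAIL's extra exploration-inducing term is not reproduced here, so see the paper for its exact form:

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss on a batch of (winner, loser) response pairs.

    logp_*     : summed log-probs of the responses under the current policy
    ref_logp_* : summed log-probs under the frozen reference policy
    SAIL's single-level objective adds an exploration-inducing term on top
    of this gradient (not shown here).
    """
    # Implicit rewards are the (scaled) log-prob ratios against the reference.
    chosen_rewards = beta * (logp_w - ref_logp_w)
    rejected_rewards = beta * (logp_l - ref_logp_l)
    # Maximize the margin between chosen and rejected implicit rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```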

SAIL comes as a unified framework offering user-defined online adaptability: you can select static or dynamic responses, and static or dynamic preferences.
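One way to picture this 2×2 design space (the variant names and flags below are mine, purely illustrative, not SAIL's actual configuration options):

```python
# Illustrative 2x2 design space for SAIL-style online adaptability.
# Names and flags are hypothetical, chosen only to convey the choices.
SAIL_VARIANTS = {
    # responses from a fixed offline dataset, preferences from a fixed labeler
    "offline":            {"dynamic_responses": False, "dynamic_preferences": False},
    # responses regenerated by the current model, labels from a fixed labeler
    "online-responses":   {"dynamic_responses": True,  "dynamic_preferences": False},
    # offline responses, but preferences re-labelled by the current model
    "online-preferences": {"dynamic_responses": False, "dynamic_preferences": True},
    # fully online: the model generates and labels its own preference data
    "fully-online":       {"dynamic_responses": True,  "dynamic_preferences": True},
}
```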

SAIL dramatically improves alignment with reduced computational demands.


This work paves the way for more resilient and adaptable language models that better reflect evolving human preferences. We're excited to see where this leads us! Read more about SAIL and its implications here: https://arxiv.org/abs/2406.15567 .


SAIL is joint work with my awesome collaborators Mucong Ding, Souradip Chakraborty, Vibhu Agrawal, Zora Che, Alec Koppel, Mengdi Wang, and Amrit Singh Bedi.

