Integrating Behavioral Economics into AI

The evolution of language model alignment methods, from Reinforcement Learning from Human Feedback (RLHF) to Direct Preference Optimization (DPO), and now to Kahneman-Tversky Optimization (KTO), represents a shift towards incorporating insights from cognitive psychology and decision-making theory into AI development. This transition is particularly notable with the introduction of KTO, a methodology named in honor of Daniel Kahneman and Amos Tversky for their influential work in behavioral economics, popularized in Kahneman's book 'Thinking, Fast and Slow'.

KTO is grounded in the principles established by Kahneman and Tversky, particularly prospect theory in its 1992 formulation, which examines how people assess probabilities, weigh gains against losses, and make choices under uncertainty, along with the cognitive biases that shape those judgments. KTO adapts this theory into the optimization objective itself, with the aim of aligning language models with human feedback more effectively.
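
For reference, the value function Tversky and Kahneman estimated in their 1992 paper has the form below; its defining feature, the asymmetry between the gain and loss branches, is precisely what KTO later adapts:

    \[
    v(z) =
    \begin{cases}
    z^{\alpha} & \text{if } z \ge 0 \\
    -\lambda (-z)^{\beta} & \text{if } z < 0
    \end{cases}
    \qquad \alpha \approx \beta \approx 0.88, \quad \lambda \approx 2.25
    \]

Here z is an outcome measured relative to a reference point, and the loss-aversion coefficient λ > 1 means losses loom larger than gains of equal size.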

The research on Human-Centered Loss Functions (HALOs) by Douwe Kiela, Dan Jurafsky, and colleagues at Contextual AI is a key development in this direction. HALOs, as implemented in KTO, aim to align AI decision-making more closely with human values and preferences by modeling utility functions that reflect how humans perceive gains and losses, making the resulting systems' behavior more intuitive.

The progression from RLHF to DPO, and now to KTO, marks a clear trend towards integrating psychological insights into AI development, with a growing emphasis on creating AI systems that are not only effective but also ethically aligned, understandable, and user-friendly. Such a human-centric approach promises to improve the relatability, ethical grounding, and overall effectiveness of AI systems.


Direct Preference Optimization (DPO):

  • Concept and Advantages: DPO is introduced as a stable, performant, and computationally lightweight alternative to traditional RLHF methods. It eschews explicit reward modeling and reinforcement learning, simplifying the alignment process. DPO increases the relative log probability of preferred responses over dispreferred ones, using a dynamic importance weight to prevent the model degeneration seen with naive probability-ratio objectives (a minimal sketch of this objective appears below).
  • Effectiveness: DPO is at least as effective as existing methods, including PPO-based RLHF, for learning from preferences in tasks like sentiment modulation, summarization, and dialogue. This effectiveness extends to language models with up to 6B parameters. Notably, DPO surpasses PPO-based RLHF in controlling the sentiment of generations and matches or improves response quality in summarization and single-turn dialogue, while being substantially simpler to implement and train.
  • Performance in Summarization and Dialogue: In summarization and single-turn dialogue tasks, DPO demonstrates a win rate of approximately 61% at a temperature of 0.0, outperforming PPO at its optimal sampling temperature. DPO shows robustness to sampling temperature and a higher maximum win rate compared to other methods, indicating its potential for broader application.

DPO optimizes for human preferences while avoiding reinforcement learning.
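
To make the objective concrete, here is a minimal sketch of the DPO loss in PyTorch. The function and argument names are illustrative rather than taken from the authors' reference implementation, and it assumes the per-sequence log-probabilities under the policy and a frozen reference model have already been computed:

    # Minimal sketch of the DPO objective (assumes PyTorch; names are illustrative).
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        # Each argument is a tensor of per-example sequence log-probabilities.
        # beta scales the implicit reward and controls how far the policy
        # may drift from the frozen reference model.
        chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
        rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
        # -log(sigmoid(margin)) is minimized when the preferred response
        # outscores the dispreferred one under the implicit reward.
        return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

The (policy minus reference) log-probability gap plays the role of an implicit reward, which is why no separate reward model or reinforcement-learning loop is needed.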


Kahneman-Tversky Optimization (KTO):

  • Foundation and Approach: KTO, developed by researchers including Douwe Kiela and Dan Jurafsky at Contextual AI, is based on Kahneman and Tversky's prospect theory. The theory holds that humans perceive random outcomes in a biased but predictable way, being more sensitive to losses than to gains of the same magnitude. KTO models these distortions as human-centered loss functions (HALOs), directly maximizing the utility of LLM generations instead of the log-likelihood of preferences.
  • Key Features of KTO: One significant advantage of KTO is that it does not rely on paired preference data; it requires only a binary signal of whether an output is desirable or undesirable. This makes KTO much easier to deploy in real-world settings, where such unpaired feedback is far more abundant. KTO-aligned models have been shown to perform as well as or better than DPO-aligned models at scales from 1B to 30B parameters.
  • Validation and Scalability: To validate KTO and assess how it scales with model size, the Archangel suite of 56 models was developed. These models, spanning 1B to 30B parameters and aligned with different methods, were evaluated on a mixture of datasets under nearly identical training settings.
  • Adapting Prospect Theory for LLMs: KTO adapts the Kahneman-Tversky human value function for LLMs. The original function's exponent, which made optimization difficult, is replaced with a logistic function that is concave in gains and convex in losses. The adaptation also omits the loss-aversion coefficient, under the hypothesis that humans care equally about gains and losses in the context of text; a simplified sketch of the resulting loss is shown below.

LLM alignment involves supervised finetuning followed by optimizing a human-centered loss (HALO). However, the paired preferences that existing approaches need are hard to get. Kahneman-Tversky Optimization (KTO) uses a far more abundant kind of data, making it much easier to use in the real world.
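
As a rough illustration of the bullet points above, the sketch below shows a simplified KTO-style loss in PyTorch. It makes simplifying assumptions: the reference point (a KL estimate between the policy and the reference model) is passed in as a precomputed scalar, and the weights on desirable and undesirable examples default to 1, so it should be read as the shape of the objective rather than the authors' implementation (see the HALOs report linked below for the full formulation):

    # Simplified sketch of a KTO-style loss (assumes PyTorch; names are hypothetical).
    import torch

    def kto_loss(policy_logps, ref_logps, is_desirable, kl_ref,
                 beta=0.1, w_desirable=1.0, w_undesirable=1.0):
        # policy_logps / ref_logps: per-example sequence log-probabilities.
        # is_desirable: boolean tensor marking outputs labeled as desirable.
        # kl_ref: scalar estimate of KL(policy || reference), used as the reference
        #         point; in practice it would be estimated per batch, not fixed.
        logratio = policy_logps - ref_logps  # implied reward, as in DPO
        # Logistic value function: saturating (concave) in gains and convex in
        # losses, measured relative to the reference point.
        value_if_desirable = torch.sigmoid(beta * (logratio - kl_ref))
        value_if_undesirable = torch.sigmoid(beta * (kl_ref - logratio))
        losses = torch.where(is_desirable,
                             w_desirable * (1.0 - value_if_desirable),
                             w_undesirable * (1.0 - value_if_undesirable))
        return losses.mean()

Because the loss needs only a per-example desirable/undesirable flag, it can be computed from unpaired feedback, which is the practical advantage highlighted above.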

In short, both DPO and KTO offer groundbreaking methodologies for aligning LLMs with human preferences. DPO simplifies the training process by directly optimizing language models based on human preferences, while KTO leverages insights from behavioral economics to model human decision-making distortions, enabling more nuanced alignment of LLMs. The development and success of these methods mark a notable evolution in the field of LLM alignment, showcasing potential for effective and human-centric optimization of language models.


Direct Preference Optimization: Your Language Model is Secretly a Reward Model:

https://arxiv.org/abs/2305.18290


Human-Centered Loss Functions (HALOs):

https://github.com/ContextualAI/HALOs/blob/main/assets/report.pdf

Brian Kemp

Transformation Finance Manager at Mars

6 months

This was super interesting, thanks for writing it up. I haven't touched AI since 2019, and at the time I was perplexed by the dominance of ReLU activation functions over ones that incorporate negative values (e.g. tanh), since the importance of negative signals was so obvious in my human thinking experience, made especially tangible after reading TFAS. Interesting to see such applications at the fine-tuning stage; log discounting of gains and exponential weighting of losses certainly seems more human, and may even arrive at more human-compatible outcomes, but the big question is: does it more effectively point toward the truth? I'd say Kahneman & Tversky's work largely pointed toward "No". Perhaps such applications are best suited for models fine-tuned for aesthetic interaction (images, music, "companion" chat bots) and not those where facts matter (research amalgamation, Google replacement, etc.).

Pedro Correa

* Chief Professional Speaker at PedroSpeaks, LLC! * Better Choice and Decision-Making = Optimized Outcomes*

8 months

Excellent read! As research points to better decisions (satisfactory to the decision-maker) being reached when a hybrid approach is used, combining associative, heuristic AND rational, attribute-based methods, "humanizing" these models seems like tangible progress to me: the more human factors are integrated, the better the AI answers.

Matouš Eibich

LLMs @ Datera

9 months

Great read! Before I started with AI and LLMs, I was very interested in behavioral economics, critical thinking, and evidence-based approaches, so this is amazing for me. :) I'm not actually sure what I think about this method, though: wouldn't it be better to make the model as close to "perfectly rational" as possible, rather than incorporating human biases?

Petr Kazar

CTO / Chief Architect at ABIS Czech. Interested in AI research

9 months

Stefan Wendin, in this context I'd also recommend the Blended ensemble "trick", including my comment here. It's so simple but surprisingly effective: https://www.dhirubhai.net/posts/pramodith_the-blending-is-all-you-need-paper-isn-activity-7151165892758839297-aEq0
