Integrating Behavioral Economics into AI
Stefan Wendin
Driving transformation, innovation & business growth by bridging the gap between technology and business; combining system & design thinking with cutting-edge technologies: Graphs, AI, GenAI, LLM, ML
The evolution of language model alignment methods, from Reinforcement Learning from Human Feedback (RLHF) to Direct Preference Optimization (DPO), and now to Kahneman-Tversky Optimization (KTO), represents a shift towards incorporating insights from cognitive psychology and decision-making theory into AI development. The transition is particularly notable with the introduction of KTO, a method named in honor of Daniel Kahneman and Amos Tversky for their influential work in behavioral economics, popularized in Kahneman's book 'Thinking, Fast and Slow'.
KTO is grounded in the principles established by Kahneman and Tversky, particularly prospect theory (1979) and its 1992 refinement, cumulative prospect theory, which describe how people make decisions, and which cognitive biases they exhibit, under uncertainty. KTO adapts this theory to the optimization of AI systems, with the goal of aligning language models with human feedback more effectively. Kahneman and Tversky's findings on how people assess probabilities and weigh gains against losses under uncertainty provide the foundational insights for KTO's approach to alignment.
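To make the borrowed idea concrete, here is a small sketch of the value function Tversky and Kahneman estimated in their 1992 paper on cumulative prospect theory: gains are discounted with diminishing returns, and losses of the same size loom roughly twice as large. The parameter values below are the medians they reported; the function name and script are purely illustrative.

import numpy as np

def kt_value(x, alpha=0.88, beta=0.88, lam=2.25):
    # Kahneman-Tversky value function from cumulative prospect theory (1992):
    # concave for gains, convex and ~2.25x steeper for losses (loss aversion).
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, np.abs(x) ** alpha, -lam * np.abs(x) ** beta)

print(kt_value(100))   # a $100 gain "feels" like roughly +57.5
print(kt_value(-100))  # a $100 loss "feels" like roughly -129.5

It is this asymmetry between gains and losses, rather than the exact parameter values, that KTO carries over into its loss function.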
The research on Human-Centered Loss Functions (HALOs) by Douwe Kiela, Dan Jurafsky, and others at Contextual AI is a key development in this direction. HALOs, as implemented in KTO, aim to bring AI decision-making closer to human values and preferences by modeling utility functions that reflect how humans actually perceive gains and losses, making the resulting systems' behavior more intuitive from a human point of view.
The progression from RLHF to DPO, and now to KTO, marks a broader trend towards integrating psychological insights into AI development: a growing emphasis on creating AI systems that are not only effective but also ethically aligned, understandable, and user-friendly. Such a human-centric approach promises to improve the relatability and overall usefulness of AI systems.
Direct Preference Optimization (DPO):
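The idea, following the Rafailov et al. paper referenced below: given a prompt with a preferred and a rejected completion, the policy is trained to widen the log-probability margin of the preferred completion relative to a frozen reference model, with no separately trained reward model and no RL loop. A minimal sketch in PyTorch; the function and argument names are illustrative rather than taken from any particular library.

import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Each input is the summed log-probability of a whole completion, shape (batch,).
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Larger margin in favour of the preferred completion -> lower loss.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()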
Kahneman-Tversky Optimization (KTO):
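KTO, as described in the HALOs report referenced below, replaces paired preferences with a per-example "desirable / undesirable" label and scores each completion against a reference point, rewarding desirable outputs with diminishing returns and penalising undesirable ones more sharply, in the spirit of loss aversion. A simplified sketch under those assumptions; it omits practical details such as how the reference point (an estimate of the KL divergence between the policy and the reference model) is computed and how the class weights are tuned for imbalanced feedback.

import torch

def kto_loss(policy_logps, ref_logps, is_desirable, kl_ref_point,
             beta=0.1, lambda_d=1.0, lambda_u=1.0):
    # policy_logps / ref_logps: completion log-probabilities, shape (batch,).
    # is_desirable: boolean tensor; kl_ref_point: detached scalar reference point.
    log_ratio = policy_logps - ref_logps
    value = torch.where(
        is_desirable,
        lambda_d * torch.sigmoid(beta * (log_ratio - kl_ref_point)),  # gains saturate
        lambda_u * torch.sigmoid(beta * (kl_ref_point - log_ratio)),  # losses weighted separately
    )
    lam = torch.where(is_desirable,
                      torch.full_like(value, lambda_d),
                      torch.full_like(value, lambda_u))
    return (lam - value).mean()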
In short, both DPO and KTO offer groundbreaking methodologies for aligning LLMs with human preferences. DPO simplifies training by optimizing the language model directly on human preference pairs, with no separately trained reward model, while KTO leverages insights from behavioral economics to model the distortions in human decision-making, enabling a more nuanced alignment of LLMs from simpler feedback. The development and success of these methods mark a notable evolution in LLM alignment, showing the potential of effective, human-centric optimization of language models.
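One practical consequence worth spelling out: DPO needs paired preferences (a preferred and a rejected completion for the same prompt), whereas KTO only needs an unpaired thumbs-up / thumbs-down signal per completion, which is usually much cheaper to collect. The example texts and field names below are illustrative only.

# DPO training example: paired preferences for one prompt.
dpo_example = {
    "prompt": "Explain loss aversion in one sentence.",
    "chosen": "People feel losses roughly twice as strongly as equivalent gains.",
    "rejected": "Loss aversion means people always avoid risk.",
}

# KTO training example: a single completion with a binary desirability label.
kto_example = {
    "prompt": "Explain loss aversion in one sentence.",
    "completion": "People feel losses roughly twice as strongly as equivalent gains.",
    "label": True,  # True = desirable, False = undesirable
}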
Direct Preference Optimization: Your Language Model is Secretly a Reward Model:
Human-Centered Loss Functions (HALOs)
Transformation Finance Manager at Mars
6 months ago
This was super interesting, thanks for writing it up. I haven't touched AI since 2019, and at the time I was perplexed by the dominance of ReLU activation functions over ones that incorporate negative values (e.g. tanh), since the importance of negative signals was so obvious from my human thinking experience, made especially tangible after reading TFAS. Interesting to see such applications at the fine-tuning stage; log-discounting gains and exponentially weighting losses certainly seems more human, and may even arrive at more human-compatible outcomes, but the big question is: does it more effectively point toward the truth? I'd say Kahneman & Tversky's work was largely pointed toward "No". Perhaps such applications are best suited for models fine-tuned for aesthetic interaction (images, music, "companion" chat bots) and not those where facts matter (research amalgamation, Google replacement, etc.).
* Chief Professional Speaker at PedroSpeaks, LLC! * Better Choice and Decision-Making = Optimized Outcomes*
8 months ago
Excellent read! Since research suggests that better decisions (ones satisfactory to the decision-maker) are reached when a hybrid approach is used, combining associative, heuristic methods AND rational, attribute-based ones, "humanizing" these models seems like tangible progress to me: the more human factors are integrated, the better the AI's answers.
LLMs @ Datera
9 months ago
Great read! Before I started with AI and LLMs, I was very interested in behavioral economics, critical thinking and evidence-based approaches, so this is amazing for me. :) I'm not actually sure what I think about this method, though; wouldn't it be better to make the model as close to "perfectly rational" as possible, rather than incorporating human biases?
CTO / Chief Architect at ABIS Czech. Interested in AI research
9 months ago
Stefan Wendin, in this context I'd also recommend the Blended ensemble "trick", including my comment here. It's so simple but surprisingly effective: https://www.dhirubhai.net/posts/pramodith_the-blending-is-all-you-need-paper-isn-activity-7151165892758839297-aEq0