Integrating Behavioral Economics into AI

The evolution of language model alignment methods, from Reinforcement Learning from Human Feedback (RLHF) to Direct Preference Optimization (DPO), and now to Kahneman-Tversky Optimization (KTO), represents a shift towards incorporating insights from cognitive psychology and decision-making theory into AI development. This transition is particularly notable with the introduction of KTO, a methodology named in honor of Daniel Kahneman and Amos Tversky for their influential work in behavioral economics, popularized in Kahneman's book 'Thinking, Fast and Slow'.

KTO is grounded in the principles established by Kahneman and Tversky, particularly prospect theory in its 1992 formulation, which examines how people assess probabilities, weigh gains against losses, and make choices under uncertainty, along with the cognitive biases that shape those judgments. KTO adapts this theory into the optimization objective itself, with the aim of aligning language models with human feedback more effectively.
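
For reference, the value function Tversky and Kahneman estimated in their 1992 paper has the form below; its defining feature, the asymmetry between the gain and loss branches, is precisely what KTO later adapts:

    \[
    v(z) =
    \begin{cases}
    z^{\alpha} & \text{if } z \ge 0 \\
    -\lambda (-z)^{\beta} & \text{if } z < 0
    \end{cases}
    \qquad \alpha \approx \beta \approx 0.88, \quad \lambda \approx 2.25
    \]

Here z is an outcome measured relative to a reference point, and the loss-aversion coefficient λ > 1 means losses loom larger than gains of equal size.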

The research on Human-Centered Loss Functions (HALOs) by Douwe Kiela, Dan Jurafsky, and colleagues at Contextual AI is a key development in this direction. HALOs, as implemented in KTO, aim to align AI decision-making more closely with human values and preferences by modeling utility functions that reflect how humans perceive gains and losses, making the resulting systems' behavior more intuitive.

The progression from RLHF to DPO, and now to KTO, marks a clear trend towards integrating psychological insights into AI development, with a growing emphasis on creating AI systems that are not only effective but also ethically aligned, understandable, and user-friendly. Such a human-centric approach promises to improve the relatability, ethical grounding, and overall effectiveness of AI systems.


Direct Preference Optimization (DPO):

  • Concept and Advantages: DPO is introduced as a stable, performant, and computationally lightweight alternative to traditional RLHF methods. It eschews explicit reward modeling and reinforcement learning, simplifying the alignment process. DPO increases the relative log probability of preferred responses over dispreferred ones, using a dynamic importance weight to prevent the model degeneration seen with naive probability-ratio objectives (a minimal sketch of this objective appears below).
  • Effectiveness: DPO is at least as effective as existing methods, including PPO-based RLHF, for learning from preferences in tasks like sentiment modulation, summarization, and dialogue. This effectiveness extends to language models with up to 6B parameters. Notably, DPO surpasses PPO-based RLHF in controlling the sentiment of generations and matches or improves response quality in summarization and single-turn dialogue, while being substantially simpler to implement and train.
  • Performance in Summarization and Dialogue: In summarization and single-turn dialogue tasks, DPO demonstrates a win rate of approximately 61% at a temperature of 0.0, outperforming PPO at its optimal sampling temperature. DPO shows robustness to sampling temperature and a higher maximum win rate compared to other methods, indicating its potential for broader application.

DPO optimizes for human preferences while avoiding reinforcement learning.
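
To make the objective concrete, here is a minimal sketch of the DPO loss in PyTorch. The function and argument names are illustrative rather than taken from the authors' reference implementation, and it assumes the per-sequence log-probabilities under the policy and a frozen reference model have already been computed:

    # Minimal sketch of the DPO objective (assumes PyTorch; names are illustrative).
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        # Each argument is a tensor of per-example sequence log-probabilities.
        # beta scales the implicit reward and controls how far the policy
        # may drift from the frozen reference model.
        chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
        rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
        # -log(sigmoid(margin)) is minimized when the preferred response
        # outscores the dispreferred one under the implicit reward.
        return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

The (policy minus reference) log-probability gap plays the role of an implicit reward, which is why no separate reward model or reinforcement-learning loop is needed.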


Kahneman-Tversky Optimization (KTO):

  • Foundation and Approach: KTO, developed by researchers including Douwe Kiela and Dan Jurafsky at Contextual AI, is based on Kahneman and Tversky's prospect theory. The theory holds that humans perceive random outcomes in a biased but predictable way, being more sensitive to losses than to gains of the same magnitude. KTO models these distortions as human-centered loss functions (HALOs), directly maximizing the utility of LLM generations instead of the log-likelihood of preferences.
  • Key Features of KTO: One significant advantage of KTO is that it does not rely on paired preference data; it requires only a binary signal of whether an output is desirable or undesirable. This makes KTO much easier to deploy in real-world settings, where such unpaired feedback is far more abundant. KTO-aligned models have been shown to perform as well as or better than DPO-aligned models at scales from 1B to 30B parameters.
  • Validation and Scalability: To validate KTO and assess how it scales with model size, the Archangel suite of 56 models was developed. These models, spanning 1B to 30B parameters and aligned with different methods, were evaluated on a mixture of datasets under nearly identical training settings.
  • Adapting Prospect Theory for LLMs: KTO adapts the Kahneman-Tversky human value function for LLMs. The original function's exponent, which made optimization difficult, is replaced with a logistic function that is concave in gains and convex in losses. The adaptation also omits the loss-aversion coefficient, under the hypothesis that humans care equally about gains and losses in the context of text; a simplified sketch of the resulting loss is shown below.

LLM alignment involves supervised finetuning followed by optimizing a human-centered loss (HALO). However, the paired preferences that existing approaches need are hard to get. Kahneman-Tversky Optimization (KTO) uses a far more abundant kind of data, making it much easier to use in the real world.
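
As a rough illustration of the bullet points above, the sketch below shows a simplified KTO-style loss in PyTorch. It makes simplifying assumptions: the reference point (a KL estimate between the policy and the reference model) is passed in as a precomputed scalar, and the weights on desirable and undesirable examples default to 1, so it should be read as the shape of the objective rather than the authors' implementation (see the HALOs report linked below for the full formulation):

    # Simplified sketch of a KTO-style loss (assumes PyTorch; names are hypothetical).
    import torch

    def kto_loss(policy_logps, ref_logps, is_desirable, kl_ref,
                 beta=0.1, w_desirable=1.0, w_undesirable=1.0):
        # policy_logps / ref_logps: per-example sequence log-probabilities.
        # is_desirable: boolean tensor marking outputs labeled as desirable.
        # kl_ref: scalar estimate of KL(policy || reference), used as the reference
        #         point; in practice it would be estimated per batch, not fixed.
        logratio = policy_logps - ref_logps  # implied reward, as in DPO
        # Logistic value function: saturating (concave) in gains and convex in
        # losses, measured relative to the reference point.
        value_if_desirable = torch.sigmoid(beta * (logratio - kl_ref))
        value_if_undesirable = torch.sigmoid(beta * (kl_ref - logratio))
        losses = torch.where(is_desirable,
                             w_desirable * (1.0 - value_if_desirable),
                             w_undesirable * (1.0 - value_if_undesirable))
        return losses.mean()

Because the loss needs only a per-example desirable/undesirable flag, it can be computed from unpaired feedback, which is the practical advantage highlighted above.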

In short, both DPO and KTO offer groundbreaking methodologies for aligning LLMs with human preferences. DPO simplifies the training process by directly optimizing language models based on human preferences, while KTO leverages insights from behavioral economics to model human decision-making distortions, enabling more nuanced alignment of LLMs. The development and success of these methods mark a notable evolution in the field of LLM alignment, showcasing potential for effective and human-centric optimization of language models.


Direct Preference Optimization: Your Language Model is Secretly a Reward Model:

https://arxiv.org/abs/2305.18290


Human-Centered Loss Functions (HALOs):

https://github.com/ContextualAI/HALOs/blob/main/assets/report.pdf

Brian Kemp

Transformation Finance Manager at Mars

6 months

This was super interesting, thanks for writing it up. I haven't touched AI since 2019, and at the time I was perplexed by the dominance of ReLU activation functions over ones that incorporate negative values (e.g. tanh), since the importance of negative signals was so obvious in my human thinking experience, made especially tangible after reading TFAS. Interesting to see such applications at the fine-tuning stage; log discounting of gains and exponential weighting of losses certainly seems more human, and may even arrive at more human-compatible outcomes, but the big question is: does it more effectively point toward the truth? I'd say Kahneman & Tversky's work largely pointed toward "No". Perhaps such applications are best suited for models fine-tuned for aesthetic interaction (images, music, "companion" chat bots) and not those where facts matter (research amalgamation, Google replacement, etc.).

Pedro Correa

* Chief Professional Speaker at PedroSpeaks, LLC! * Better Choice and Decision-Making = Optimized Outcomes*

8 months

Excellent read! As research points to better decisions (satisfactory to the decision-maker) being reached when a hybrid approach is used, combining associative, heuristic AND rational, attribute-based methods, "humanizing" these models seems like tangible progress to me: the more human factors are integrated, the better the AI answers.

Matouš Eibich

LLMs @ Datera

9 months

Great read! Before I started with AI and LLMs, I was very interested in behavioral economics, critical thinking, and evidence-based approaches, so this is amazing for me. :) I'm not actually sure what I think about this method, though: wouldn't it be better to make the model as close to "perfectly rational" as possible, rather than incorporating human biases?

Petr Kazar

CTO / Chief Architect at ABIS Czech. Interested in AI research

9 months

Stefan Wendin, in this context I'd also recommend the Blended ensemble "trick", including my comment here. It's so simple but surprisingly effective: https://www.dhirubhai.net/posts/pramodith_the-blending-is-all-you-need-paper-isn-activity-7151165892758839297-aEq0
