LLM Personalization: User Persona based Personalization of LLM generated Responses


Introduction

ChatGPT, and the underlying Large Language Models (LLMs) of today, are able to generate contextualized responses given a prompt.

As a next step in the LLM evolution, we expect responses to become increasingly personalized with respect to the persona, interaction history, current conversation context, and sentiment of the end-users.

The key benefits of LLM personalization include:

  • Personalized responses: The Gen AI solution adapts its language, tone, and complexity based on the user it is interacting with. This ensures that the conversation is more aligned with the user’s expectations and communication style.
  • Conversation context: The Gen AI solution is aware of the user’s typical use cases, preferences, and history, allowing it to provide more contextually relevant and personalized responses.
  • Content customization: The Gen AI solution can prioritize or highlight different features, or types of content, based on the user’s needs, making the interaction more efficient and user-friendly.
  • Proactive assistance: The Gen AI solution anticipates the needs of different users and offers proactive suggestions, resources, or reminders tailored to their specific profiles or tasks.

In my previous article [1], I wrote about designing a use-case based evaluation strategy for LLMs. In a way, when we talk about applying Generative AI (Gen AI) to solve a specific use-case today, we are basically personalizing a pre-trained (foundational) LLM such that it provides responses specific to that use-case. Use-case contextualization [2] today primarily entails applying fine-tuning / RAG on pre-trained LLMs with use-case specific enterprise data.

Fig: Use-case based LLM contextualization
In this article, we discuss how the same techniques can be applied to user data, both user profile and conversation data, to personalize the LLM responses.

So we basically need to see how best to apply:

  • Fine-tuning,
  • Retrieval-Augmented-Generation (RAG),
  • Reinforcement Learning with Human Feedback (RLHF)

on user data (a minimal sketch of the corresponding data structures follows the list):

  • user profile, persona
  • conversation history
  • current conversation context and sentiment.
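As a simple illustration, the user data listed above can be organized along the following lines. This is a minimal sketch; the class and field names are hypothetical, not taken from any specific library.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class UserProfile:
    """Aggregated profile / persona attributes of an end-user (illustrative fields)."""
    user_id: str
    persona: str                       # e.g. "knowledge_worker", "field_worker"
    preferences: Dict[str, str] = field(default_factory=dict)

@dataclass
class ConversationTurn:
    """A single query-response pair from the interaction history."""
    query: str
    response: str
    sentiment: float                   # e.g. in [-1.0, 1.0]

@dataclass
class UserContext:
    """Everything the personalization layer needs to know about one user."""
    profile: UserProfile
    history: List[ConversationTurn]
    current_query: str
    current_sentiment: float
```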

User Persona based LLM Fine-tuning

Users today expect a seamless and personalized experience with customized answers to their specific queries. However, user-specific personalization remains difficult due to scale, performance, and privacy concerns.

Persona based personalization [3] aims to overcome these challenges by segmenting the end-users of a service into a manageable set of user categories, which represent the demographics and preferences of the majority of users. For example, the typical personas in a Gen AI enabled IT Service Desk scenario (one of the areas with the highest Gen AI adoption today) include:

  • Leadership: Senior individuals (e.g., VPs, Directors) who require priority support with secure access to sensitive data, and assistance with high-level presentations and video conferencing.
  • Knowledge workers: Employees who rely heavily on technology to perform their daily tasks (e.g., analysts, engineers, designers).
  • Field workers: Employees who work primarily outside the office (e.g., sales representatives, service technicians). As such, their requirements are mostly focused on remote access to corporate systems, reliable VPNs, and support with offline work capabilities.
  • Administrative / HR: Support staff responsible for various administrative tasks (e.g., HR, Finance) with primary requirements around assistance with MS Office software, access to specific business applications, and quick resolution of routine IT issues.
  • New employees / Interns: Individuals who are new to the organization and may not be fully familiar with the company’s IT systems. As such, their queries mostly relate to onboarding.

Given this, it makes sense to perform Persona based fine-tuning of LLMs to create Persona specific Small Language Models (SLMs). A Model Router then performs prompt segmentation (scoring) and routes each prompt to the most relevant Persona SLM.
Fig: User persona based LLM fine-tuning
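A minimal sketch of such a Model Router is shown below. The persona labels and the scoring logic are illustrative assumptions (in practice the router would use a lightweight classifier or embedding similarity to score prompts against personas), not a specific product API.

```python
from typing import Callable, Dict

# Hypothetical registry of persona-specific SLM endpoints.
PERSONA_SLMS: Dict[str, Callable[[str], str]] = {
    "leadership": lambda p: f"[leadership-slm] {p}",
    "knowledge_worker": lambda p: f"[knowledge-worker-slm] {p}",
    "field_worker": lambda p: f"[field-worker-slm] {p}",
    "admin_hr": lambda p: f"[admin-hr-slm] {p}",
    "new_employee": lambda p: f"[new-employee-slm] {p}",
}

def score_personas(prompt: str, user_persona: str) -> Dict[str, float]:
    """Score the prompt against each persona.

    Placeholder logic: the user's declared persona gets full weight; a real
    router would combine a prompt classifier with the user's profile.
    """
    return {p: (1.0 if p == user_persona else 0.0) for p in PERSONA_SLMS}

def route(prompt: str, user_persona: str) -> str:
    """Forward the prompt to the highest-scoring persona SLM."""
    scores = score_personas(prompt, user_persona)
    best_persona = max(scores, key=scores.get)
    return PERSONA_SLMS[best_persona](prompt)

print(route("How do I reset my VPN token while on a client site?", "field_worker"))
```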

The fine-tuning process consists of first parameterizing (aggregated) persona data and conversation history and storing it as memory in the LLM via adapters [4], followed by fine-tuning the LLM for personalized response generation. For example, refer to [5, 6] for details of Persona based LLM fine-tuning in educational and medical contexts, respectively; a minimal fine-tuning sketch follows the list below.

  • [5] considers pre-training models on an educational corpus to establish a foundational knowledge base, and subsequently fine-tuning them on personalized tasks, e.g., essay assessment.
  • [6] combines parameter-efficient fine-tuning (PEFT) with a memory retrieval module to generate personalized medical responses.
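As an illustration of the fine-tuning step, a parameter-efficient (LoRA) adapter can be attached to a pre-trained model and trained on the aggregated persona conversations, yielding one lightweight adapter per persona. The sketch below uses the Hugging Face transformers and peft libraries; the base model name is a placeholder, and the snippet stands in for, rather than reproduces, the memory-injection setup of [4].

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "mistralai/Mistral-7B-v0.1"   # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# LoRA adapter: only a small set of low-rank matrices is trained,
# so a separate adapter can be kept per persona.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# The (aggregated) persona conversation data would then be fed to a
# standard causal-LM fine-tuning loop (e.g., transformers Trainer / SFT)
# to produce the persona-specific SLM.
```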

LLM User Embeddings

In this section, we focus on generating User Conversation Embeddings, which are a prerequisite for both fine-tuning and real-time RAG prompt context augmentation.

Fine-tuning LLMs on raw user data is often too complex, even if it is at the (aggregated) persona level.

  • Conversation data usually spans multiple journeys with sparse data points, various interaction types (multimodal), and potential noise or inconsistencies, e.g., incomplete query-response pairs.
  • Moreover, effective personalization often requires a deep understanding of the latent intent / sentiment behind user actions, which can pose difficulties for generic (pre-trained) LLMs.
  • Finally, fine-tuning is computationally intensive. User conversation data can be very lengthy. Processing and modeling such long sequences (e.g., multiple years' worth of conversational history) with LLMs can be practically infeasible.

A good reference solution to overcome the above issues is Google’s work on USER-LLM [7].

USER-LLM distills compressed representations from diverse and noisy user conversations, effectively capturing the essence of a user’s behavioral patterns and preferences across various interaction modalities.?

This approach empowers LLMs with a deeper understanding of users’ latent intent (including sentiment) and historical patterns (e.g., the temporal evolution of user queries and responses), enabling LLMs to tailor responses and generate personalized outcomes. The solution architecture is illustrated in the figure below.


Fig: User embeddings based LLM Personalization
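A heavily simplified sketch of the embedding step is shown below. USER-LLM [7] trains a dedicated user encoder and fuses the resulting embeddings into the LLM via cross-attention; here, purely for illustration, we approximate a user representation by mean-pooling off-the-shelf sentence embeddings of the conversation history, which could then be used for retrieval or prompt context augmentation.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Off-the-shelf encoder used as a stand-in for a learned user encoder.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def user_embedding(conversation_history: list[str]) -> np.ndarray:
    """Compress a (possibly long, noisy) conversation history into a single
    fixed-size vector by encoding each turn and mean-pooling."""
    turn_embeddings = encoder.encode(conversation_history)   # (num_turns, dim)
    return turn_embeddings.mean(axis=0)                      # (dim,)

history = [
    "How do I request access to the finance dashboard?",
    "The dashboard export to Excel keeps timing out.",
    "Can you remind me how to schedule a recurring report?",
]
print(user_embedding(history).shape)
```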

Reinforcement Learning based Personalization of LLMs

In this section, we show how LLM generated responses can be personalized using a reinforcement learning based recommendation engine.

Reinforcement Learning (RL) is a powerful technique that is able to achieve complex goals by maximizing a reward function in real time. The reward function works like a system of incentives and penalties: the algorithm is penalized when it takes a wrong decision and rewarded when it takes a right one; this is the reinforcement.

We outline below the high-level steps to enable a Reinforcement Learning based Recommendation Engine to personalize LLM generated responses (a code sketch of the reward computation follows the figure below).

1. The user conversation context and sentiment are gathered using available sensors to compute the ‘current’ user feedback,

2. which is then combined with user conversation history to quantify the user sentiment curve and discount any sudden changes in user sentiment;

3. leading to the aggregate reward value corresponding to the last LLM response provided to the user.

4. This reward value is then provided as feedback to the RL agent, to choose the next optimal LLM generated response to be provided to the user.

Fig: Reinforcement Learning based LLM Personalization
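A minimal sketch of steps 1-3 is shown below; the sentiment values and the smoothing factor are illustrative assumptions. The current sentiment is blended with the historical sentiment curve via exponential smoothing, so that a single abrupt swing does not dominate the reward attributed to the last response.

```python
def aggregate_reward(current_sentiment: float,
                     sentiment_history: list[float],
                     smoothing: float = 0.3) -> float:
    """Blend the 'current' user sentiment with the historical sentiment curve,
    discounting sudden changes, and return the aggregate reward for the last
    LLM response (illustrative; sentiment assumed to lie in [-1, 1])."""
    if not sentiment_history:
        return current_sentiment
    baseline = sum(sentiment_history) / len(sentiment_history)
    # Exponential smoothing: only a fraction of the sudden change passes
    # through to the reward signal.
    return baseline + smoothing * (current_sentiment - baseline)

# Example: a sharply negative turn after a mostly positive history.
print(aggregate_reward(-0.8, [0.6, 0.5, 0.7]))   # well above -0.8
```

This aggregate reward is what is fed back to the RL agent in step 4.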

More concretely, we can formulate the integration of an RL enabled Recommendation Engine [8] with an LLM based Chat App [9] as follows:

Action (a): An action a in this case corresponds to an LLM generated response delivered to the user in response to a user query / prompt, as part of an ongoing conversation.

Agent (A): is the one performing actions. In this case, the Agent is the Chat App delivering LLM responses to the users, where an action is selected based on its Policy (described below).

Environment: refers to the world with which the agent interacts, and which responds to the agent’s actions. In our case, the Environment corresponds to the User U interacting with the Chat App. U responds to A’s actions, by providing different types of feedback, both explicit (in the form of a chat response) and implicit (e.g., change in user sentiment).

Policy (π): is the strategy that the agent employs to select the next best action (NBA). Given a user profile Up, (current) sentiment Us, and query Uq, the Policy function computes the product of the response scores returned by the NLP and Recommendation Engines respectively, selecting the response with the highest score as the NBA:

  • The NLP Engine (NE) parses the query / prompt and outputs a ranked list of responses.
  • The Recommendation Engine (RE) provides a score for each response based on the reward function, and taking into account the user profile, preferences, conversation history / context and sentiment. The Policy function can be formalized as follows:
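One possible way to write this down, reconstructed from the description above rather than taken verbatim from [8, 9], is NBA = argmax over candidate responses a of NE(Uq, a) × RE(Up, Us, a), where both engines are assumed to return normalized scores. A minimal sketch:

```python
from typing import Callable, Dict

def next_best_action(user_profile: dict,
                     user_sentiment: float,
                     user_query: str,
                     ne_scores: Dict[str, float],
                     re_score: Callable[[dict, float, str, str], float]) -> str:
    """Policy: pick the response whose combined score (NLP Engine score
    multiplied by Recommendation Engine score) is highest.

    ne_scores: {candidate_response: score} returned by the NLP Engine.
    re_score:  scores a candidate response given profile, sentiment and query.
    Both engines are assumed (illustratively) to return scores in [0, 1].
    """
    combined = {
        response: ne * re_score(user_profile, user_sentiment, user_query, response)
        for response, ne in ne_scores.items()
    }
    return max(combined, key=combined.get)
```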

Reward (r): refers to the feedback by which we measure the success or failure of an agent’s recommended action (response). The feedback can, for example, refer to the amount of time that a user spends reading a recommended article, or the change in user sentiment on receiving a response. We consider a two-step reward function computation, where the feedback fa received with respect to a recommended action is first mapped to a sentiment score, which is then mapped to a reward:

r(a, fa) = s(fa)

where r and s refer to the reward and sentiment functions, respectively. The RL formulation is illustrated in the figure below:

Fig: LLM Personalization - RL formulation
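A minimal sketch of this two-step mapping is shown below. The keyword-based sentiment function is deliberately naive and purely illustrative; in practice s would be an NLP sentiment model, as in the self-improving chatbot of [9].

```python
def sentiment(feedback_text: str) -> float:
    """Map raw user feedback to a sentiment score in [-1, 1].
    Illustrative keyword matching only; a real system would call a
    sentiment-analysis model here."""
    text = feedback_text.lower()
    if any(w in text for w in ("thanks", "great", "perfect")):
        return 1.0
    if any(w in text for w in ("wrong", "useless", "not working")):
        return -1.0
    return 0.0

def reward(action: str, feedback_text: str) -> float:
    """Two-step reward computation: feedback -> sentiment -> reward,
    i.e., r(a, fa) = s(fa)."""
    return sentiment(feedback_text)

print(reward("resp-42", "Great, that fixed my VPN issue, thanks!"))   # 1.0
```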

Conclusion

In this article, we considered personalization of LLM generated responses based on user data. Personalization has the potential to significantly accelerate LLM adoption by improving user satisfaction rates. We proposed and detailed three LLM personalization techniques: (a) Persona based LLM fine-tuning, (b) generating User LLM embeddings for inferencing, and (c) Reinforcement Learning based LLM personalization. As future work, we are exploring a consolidated approach applying a mix of the personalization techniques based on use-case requirements and user data availability.

References

  1. D. Biswas. Enterprise Use Case-Based Evaluation of LLMs. https://towardsdatascience.com/enterprise-use-case-based-evaluation-of-llms-abcf8292889f
  2. D. Biswas. Contextualizing Large Language Models (LLMs) with Enterprise Data. https://medium.datadriveninvestor.com/contextualizing-large-language-models-llms-with-enterprise-data-419e252fcbb7
  3. N. Bhargava, et al. How User Persona-based Services can Transform your Business. https://www.wipro.com/consulting/how-user-persona-based-services-can-transform-your-organization/
  4. K. Zhang, et al. Personalized LLM Response Generation with Parameterized Memory Injection, 2024. https://arxiv.org/abs/2404.03565
  5. Y. Dan, et al. EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education, 2023. https://arxiv.org/abs/2308.0277
  6. K. Zhang, et al. LLM-based Medical Assistant Personalization with Short- and Long-Term Memory Coordination, 2024. https://arxiv.org/abs/2309.11696
  7. L. Liu, et al. USER-LLM: Efficient LLM Contextualization with User Embeddings, 2024. https://research.google/blog/user-llm-efficient-llm-contextualization-with-user-embeddings/
  8. D. Biswas. Delayed Rewards in the context of Reinforcement Learning based Recommender Systems. 24th European Conference on AI (ECAI) track on ‘Advances in AI for Healthcare’, 2020. https://ceur-ws.org/Vol-2820/AAI4H-10.pdf
  9. E. Ricciardelli, D. Biswas. Self-improving Chatbots based on Reinforcement Learning. 4th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM), 2019. https://towardsdatascience.com/self-improving-chatbots-based-on-reinforcement-learning-75cca62debce

Debmalya Biswas

AI/Analytics @ Wipro | x- Nokia, SAP, Oracle | 50+ patents | PhD - INRIA
