LLM Personalization: User Persona based Personalization of LLM generated Responses
Debmalya Biswas
AI/Analytics @ Wipro | x- Nokia, SAP, Oracle | 50+ patents | PhD - INRIA
Introduction
ChatGPT, and the underlying Large Language Models (LLMs), are today able to generate contextualized responses to a given prompt.
As a next step in the LLM evolution, we expect responses to become increasingly personalized with respect to the end-user's persona, interaction history, current conversation context, and sentiment.
The key benefits of LLM personalization include:
In my previous article [1], I wrote about designing a use-case based evaluation strategy for LLMs. In a way, when we talk about applying Generative AI (Gen AI) to solve a specific use-case today, we are basically personalizing a pre-trained (foundational) LLM such that it provides responses specific to that use-case. Use-case contextualization [2] today entails primarily applying fine-tuning / RAG on pre-trained LLMs with use-case specific enterprise data.
In this article, we discuss how the same techniques can be applied on user data, both user profile and conversation data, to personalize the LLM responses.
So we basically need to see how best to apply the following on user data:
User Persona based LLM Fine-tuning
Users today expect a seamless and personalized experience with customized answers to their specific queries. However, user-specific personalization remains difficult due to scale, performance, and privacy constraints.
Persona based personalization [3] aims to overcome these challenges by segmenting the end-users of a service into a manageable set of user categories, which represent the demographics and preferences of the majority of users. For example, the typical personas in a Gen AI enabled IT Service Desk (one of the areas with the highest Gen AI adoption today) scenario include:
Given this, it makes sense to perform Persona based fine-tuning of LLMs to create Persona specific Small Language Models (SLMs). The Model Router helps in performing prompt segmentation (scoring) and routing the prompts to the most relevant Persona SLM.
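As a minimal sketch of such a Model Router, assuming an off-the-shelf zero-shot classifier for prompt scoring and a hypothetical registry mapping personas to SLM endpoints (both the persona labels and the SLM identifiers below are illustrative assumptions):

```python
from transformers import pipeline

# Hypothetical persona catalogue for an IT Service Desk scenario; the labels and
# SLM identifiers are placeholders for this sketch.
PERSONA_SLM_REGISTRY = {
    "new employee being onboarded": "slm-onboarding-v1",
    "developer / power user": "slm-developer-v1",
    "field worker with limited IT access": "slm-fieldworker-v1",
}

# Zero-shot classifier used here as a simple prompt segmentation (scoring) step.
scorer = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def route_prompt(prompt: str) -> tuple[str, float]:
    """Score the prompt against the persona labels and return the best Persona SLM."""
    result = scorer(prompt, candidate_labels=list(PERSONA_SLM_REGISTRY.keys()))
    best_label, best_score = result["labels"][0], result["scores"][0]
    return PERSONA_SLM_REGISTRY[best_label], best_score

slm_id, confidence = route_prompt(
    "My VPN token expired while I'm on a client site, how do I reset it?")
print(slm_id, round(confidence, 3))
```

In practice the scoring model could itself be a small fine-tuned classifier, but the routing logic stays the same: score the prompt against the personas, then dispatch it to the highest-scoring Persona SLM.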
The fine-tuning process consists of first parameterizing (aggregated) persona data and conversation history and storing it as memory in the LLM via adapters [4], followed by fine-tuning the LLM for personalized response generation. For example, refer to [5, 6] for details of Persona based LLM fine-tuning in Educational and Medical contexts, respectively.
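A minimal sketch of this adapter-based setup is shown below, assuming the Hugging Face peft library and a placeholder base model; the LoRA hyper-parameters are illustrative, and the actual training loop over the aggregated persona conversations is only indicated in the comments.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE_MODEL = "meta-llama/Llama-2-7b-hf"  # placeholder base model for this sketch

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# One lightweight LoRA adapter per persona: only the adapter weights are trained,
# so a single base model can host the "memory" of several personas.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
persona_model = get_peft_model(model, lora_config)
persona_model.print_trainable_parameters()

# The aggregated persona data and conversation history would then be formatted
# into prompt/response pairs and passed to a standard supervised fine-tuning
# loop (e.g. the transformers Trainer) to produce the Persona SLM.
```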
LLM User Embeddings
In this section, we focus on generating the User Conversation Embeddings, which is a pre-requisite for both fine-tuning and/or real-time RAG prompt context augmentation.
Fine-tuning LLMs on raw user data is often too complex, even if it is at the (aggregated) persona level.
A good reference solution to overcome these issues is Google's work on USER-LLM [7].
USER-LLM distills compressed representations from diverse and noisy user conversations, effectively capturing the essence of a user's behavioral patterns and preferences across various interaction modalities.
This approach empowers LLMs with a deeper understanding of users' latent intent (incl. sentiment) and historical patterns (e.g., the temporal evolution of user queries and responses), enabling LLMs to tailor responses and generate personalized outcomes. The solution architecture is illustrated in the figure below.
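A simplified, retrieval-style sketch of the idea is given below: distill the user's conversation history into embeddings and use them to augment the prompt at inference time. The sentence-transformers encoder and cosine-similarity retrieval are assumptions of this sketch; they stand in for, but do not reproduce, the cross-attention based fusion described in USER-LLM [7].

```python
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative encoder choice

def build_user_memory(history: list[str]) -> np.ndarray:
    """Embed each past user turn; the resulting matrix acts as the user 'memory'."""
    return encoder.encode(history, normalize_embeddings=True)

def augment_prompt(query: str, history: list[str], memory: np.ndarray, k: int = 3) -> str:
    """Prepend the k past turns most similar to the current query (RAG-style)."""
    q_emb = encoder.encode([query], normalize_embeddings=True)[0]
    scores = memory @ q_emb  # cosine similarity, since embeddings are normalized
    top_turns = [history[i] for i in np.argsort(scores)[::-1][:k]]
    context = "\n".join(f"- {turn}" for turn in top_turns)
    return f"Relevant user history:\n{context}\n\nUser query: {query}"

history = ["Asked how to reset the SAP password", "Reported VPN issues while travelling"]
memory = build_user_memory(history)
print(augment_prompt("I still cannot log in to SAP", history, memory, k=1))
```

This covers the real-time RAG prompt augmentation path; for the fine-tuning path, the same user embeddings would instead be parameterized and stored as memory in the LLM via adapters, as discussed in the previous section.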
Reinforcement Learning based Personalization of LLMs
In this section, we show how LLM generated responses can be personalized based on a reinforcement learning based recommendation engine.
Reinforcement Learning (RL) is a powerful technique that is able to achieve complex goals by maximizing a reward function in real-time. The reward function works like a scheme of incentives and penalties: the algorithm is penalized when it takes a wrong decision and rewarded when it takes a right one; this is the reinforcement.
We outline below the high-level steps to enable a Reinforcement Learning based Recommendation Engine to personalize LLM generated responses (a sketch of the reward aggregation follows the list):
1. The user conversation context and sentiment are gathered using the available sensors to compute the 'current' user feedback.
2. This feedback is then combined with the user's conversation history to quantify the user sentiment curve and discount any sudden changes in user sentiment.
3. The result is the aggregate reward value corresponding to the last LLM response provided to the user.
4. This reward value is then provided as feedback to the RL agent, to choose the next optimal LLM generated response for the user.
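In the sketch below, which illustrates steps 1-3 above, the sentiment scale of [-1, 1] and the exponential smoothing factor are assumptions of the sketch, not prescribed by the approach.

```python
def aggregate_reward(current_sentiment: float,
                     sentiment_history: list[float],
                     smoothing: float = 0.7) -> float:
    """Blend the 'current' user feedback with an exponentially smoothed view of
    the user's sentiment curve, so that a single sudden swing is discounted."""
    if not sentiment_history:
        return current_sentiment
    baseline = sentiment_history[0]
    for s in sentiment_history[1:]:
        baseline = smoothing * baseline + (1 - smoothing) * s
    # Aggregate reward for the last LLM response: the current reading is pulled
    # towards the user's historical sentiment baseline.
    return smoothing * baseline + (1 - smoothing) * current_sentiment

# A mildly positive sentiment history with a sharply negative current reading
print(aggregate_reward(-0.8, [0.2, 0.4, 0.3]))
```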
More concretely, we can formulate the integration of an RL enabled Recommendation Engine [8] with LLM based Chat App [9] as follows:
Action (a): An action a in this case corresponds to an LLM generated response delivered to the user in response to a user query / prompt, as part of an ongoing conversation.
Agent (A): is the one performing actions. In this case, the Agent is the Chat App delivering LLM responses to the users, where an action is selected based on its Policy (described below).
Environment: refers to the world with which the agent interacts, and which responds to the agent’s actions. In our case, the Environment corresponds to the User U interacting with the Chat App. U responds to A’s actions, by providing different types of feedback, both explicit (in the form of a chat response) and implicit (e.g., change in user sentiment).
Policy (π): is the strategy that the agent employs to select the next best action (NBA). Given a user profile Up, (current) sentiment Us, and query Uq, the Policy function computes the product of the response scores returned by the NLP and Recommendation Engines respectively, selecting the response with the highest score as the NBA.
Reward (r): refers to the feedback by which we measure the success or failure of an agent's recommended action (response). The feedback can, for example, be the amount of time that a user spends reading a recommended article, or the change in user sentiment on receiving a response. We consider a 2-step reward function computation, where the feedback fa received with respect to a recommended action is first mapped to a sentiment score, which is then mapped to a reward:
r(a, fa) = s(fa)
where r and s refer to the reward and sentiment functions, respectively. The RL formulation is illustrated in the figure below, and a minimal sketch of the Policy and Reward follows.
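In this sketch, the candidate scores and the keyword-based sentiment function are placeholders; in practice the NLP score would come from the LLM and the recommendation score from the RL enabled Recommendation Engine [8].

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    response: str
    nlp_score: float  # relevance score from the NLP / LLM side
    rec_score: float  # personalization score from the Recommendation Engine

def policy(candidates: list[Candidate]) -> Candidate:
    """Pi: the next best action (NBA) is the argmax of the product of scores."""
    return max(candidates, key=lambda c: c.nlp_score * c.rec_score)

def sentiment(feedback: str) -> float:
    """Placeholder for s(fa) in [-1, 1]; a real system would run a sentiment
    model over the user's explicit and implicit feedback."""
    return 1.0 if "thanks" in feedback.lower() else -0.5

def reward(action: Candidate, feedback: str) -> float:
    """2-step reward: feedback -> sentiment score -> reward, i.e. r(a, fa) = s(fa)."""
    return sentiment(feedback)

nba = policy([
    Candidate("Reset your VPN token via the self-service portal", 0.9, 0.8),
    Candidate("Please raise a ticket with the service desk", 0.7, 0.6),
])
print(nba.response, reward(nba, "thanks, that worked"))
```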
Conclusion
In this article, we considered personalization of LLM generated responses based on user data. Personalization has the potential to significantly accelerate LLM adoption by improving user satisfaction rates. We proposed and detailed three LLM personalization techniques: (a) Persona based LLM fine-tuning, (b) generating User LLM embeddings for inferencing, and (c) Reinforcement Learning based LLM personalization. As future work, we are exploring a consolidated approach applying a mix of the personalization techniques based on use-case requirements and user data availability.
References