Contextual Thompson Sampling
Patrick Nicolas
Director Data Engineering @ aidéo technologies | software & data engineering, operations, and machine learning.
In this article, we delve into the notion of the multi-armed bandit, also known as the K-armed bandit, with a particular focus on Thompson sampling as applied to the contextual bandit.
What you will learn: How to add contextual information as a prior to the Thompson sampling algorithm.
Multi-armed bandit problem
The multi-armed bandit problem is a well-known reinforcement learning technique, widely used for its simplicity. Let's consider a slot machine with n arms (bandits), each arm having its own biased probability distribution of success. In its simplest form, pulling any one of the arms gives you a reward of either 1 for success or 0 for failure.
Our objective is to pull the arms one by one, in sequence, so that we maximize the total reward collected in the long run.
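As a minimal sketch of the setup just described, the environment below simulates an n-armed Bernoulli bandit; the class name BernoulliArms and the hidden success probabilities are illustrative, not taken from the original article.

```scala
import scala.util.Random

final case class BernoulliArms(successProbabilities: Vector[Double], rng: Random = new Random(42L)) {
  // Pulling an arm returns a reward of 1 with that arm's hidden success
  // probability, and 0 otherwise.
  def pull(arm: Int): Int =
    if (rng.nextDouble() < successProbabilities(arm)) 1 else 0
}

// Example: three arms with hidden success rates of 0.2, 0.5 and 0.7
// val slotMachine = BernoulliArms(Vector(0.2, 0.5, 0.7))
// val reward = slotMachine.pull(2)
```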
Data scientists have developed several solutions to tackle the multi-armed bandit problem, of which the three most common algorithms are:
Epsilon-greedy
The Epsilon-greedy algorithm stands as the most straightforward method for navigating the exploration-exploitation dilemma. In the exploitation phase, it consistently chooses the lever with the highest recorded payoff. Nevertheless, occasionally (with a frequency of ε, where ε is less than 0.1), it opts for a random lever to 'explore'; this allows for the investigation of levers whose payouts are not well known. Levers with established or confirmed high payouts are selected the majority of the time, specifically (1-ε) of the instances.
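A minimal sketch of this strategy for the 0/1 rewards of the Bernoulli bandit; the class and method names are illustrative.

```scala
import scala.util.Random

final class EpsilonGreedy(numArms: Int, epsilon: Double = 0.1, rng: Random = new Random(42L)) {
  private val counts = Array.fill(numArms)(0)     // number of pulls per arm
  private val values = Array.fill(numArms)(0.0)   // running mean reward per arm

  // With probability epsilon explore a random lever, otherwise exploit the
  // lever with the highest recorded mean payoff.
  def selectArm(): Int =
    if (rng.nextDouble() < epsilon) rng.nextInt(numArms)
    else values.zipWithIndex.maxBy(_._1)._2

  // Incremental update of the running mean reward of the pulled arm.
  def update(arm: Int, reward: Double): Unit = {
    counts(arm) += 1
    values(arm) += (reward - values(arm)) / counts(arm)
  }
}
```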
Upper Confidence Bound (UCB)
The Upper Confidence Bound (UCB) algorithm is arguably the most popular approach to solving the multi-armed bandit problem. It is occasionally described by the principle of 'Optimism in the Face of Uncertainty': It operates under the assumption that the uncertain average rewards of each option will be on the higher end, according to past outcomes.
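The sketch below illustrates the principle with the classical UCB1 variant: each arm's mean reward is inflated by an uncertainty bonus that shrinks as the arm is pulled more often. The names are illustrative and the bonus sqrt(2·ln t / n) is one common choice among several.

```scala
import scala.math.{log, sqrt}

final class Ucb1(numArms: Int) {
  private val counts = Array.fill(numArms)(0)     // pulls per arm
  private val values = Array.fill(numArms)(0.0)   // mean reward per arm
  private var totalPulls = 0

  // Play each arm once, then select the arm with the most optimistic estimate.
  def selectArm(): Int = {
    val unplayed = counts.indexOf(0)
    if (unplayed >= 0) unplayed
    else {
      val scores = Array.tabulate(numArms) { arm =>
        values(arm) + sqrt(2.0 * log(totalPulls.toDouble) / counts(arm))
      }
      scores.zipWithIndex.maxBy(_._1)._2
    }
  }

  // Update the pull counts and the running mean reward of the played arm.
  def update(arm: Int, reward: Double): Unit = {
    counts(arm) += 1
    totalPulls += 1
    values(arm) += (reward - values(arm)) / counts(arm)
  }
}
```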
Thompson sampling
The Thompson sampling algorithm is essentially a Bayesian optimization method. Its central tenet, known as the probability matching strategy, can be distilled to the concept of 'selecting a lever based on its likelihood of being the optimal choice.' This means the frequency of choosing a specific lever should correspond to its actual chances of being the best lever available.
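A sketch of probability matching for Bernoulli arms: each arm keeps a Beta(1, 1) prior over its success probability, one sample is drawn per arm, and the arm with the largest draw is played. The Breeze library (a recent release of breeze.stats.distributions) is assumed for Beta sampling, and the names are illustrative.

```scala
import breeze.stats.distributions.{Beta, RandBasis}

final class BernoulliThompsonSampler(numArms: Int) {
  implicit val randBasis: RandBasis = RandBasis.withSeed(42)
  private val successes = Array.fill(numArms)(1.0)  // alpha of the Beta posterior
  private val failures  = Array.fill(numArms)(1.0)  // beta of the Beta posterior

  // The frequency with which an arm wins this sampling step matches its
  // posterior probability of being the best arm.
  def selectArm(): Int = {
    val draws = Array.tabulate(numArms)(arm => Beta(successes(arm), failures(arm)).draw())
    draws.zipWithIndex.maxBy(_._1)._2
  }

  // Conjugate update of the Beta posterior with a 0/1 reward.
  def update(arm: Int, reward: Int): Unit =
    if (reward == 1) successes(arm) += 1.0 else failures(arm) += 1.0
}
```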
The following diagram illustrates the difference between the upper confidence bound and Thompson sampling approaches.
Context anyone?
The methods discussed previously do not presume any specifics about the environment or the agent that is choosing the options. There are two scenarios:
1. Context-free bandit: the agent selects an arm using only the history of arm pulls and rewards.
2. Contextual bandit: the agent also observes a context, a vector of features describing the current state (for example, a user profile), and the expected reward of each arm depends on that context.
All of the strategies suited to the multi-armed bandit problem can be adapted for use with or without context. The following sections will concentrate on the problem of the contextual multi-armed bandit.
Contextual Thompson sampling (CTS)
Let's dive into the key characteristics of contextual Thompson sampling.
The components of contextual Thompson sampling are:
1. Model of parameters w
2. A prior distribution p(w) of the model
3. History H consisting of a context x and reward r
4. Likelihood or probability p(r|x, w) of a reward given a context x and parameters w
5. Posterior distribution computed using naïve Bayes, as sketched below
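Putting components 2 to 5 together, the posterior over the model parameters combines the prior with the likelihood of the observed history. This is a sketch of the Bayesian update implied by the list above, where Ht denotes the history of contexts and rewards up to step t:

\[
p(w \mid H_t) \;\propto\; p(w)\,\prod_{\tau=1}^{t} p(r_\tau \mid x_\tau, w)
\]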
But how can we model a context?
A process that selects the most rewarding arm is, in effect, a predictor or learner: it takes the context, defined as a vector of features, and predicts which arm will generate the highest reward.
The predictor is implemented as a linear model (weights w) that estimates the reward from a context x, as shown below.
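The original formula is not reproduced here; under the linear model stated above, the expected reward is the dot product of the context features with the model weights:

\[
\mathbb{E}[\, r \mid x, w \,] \;=\; w^{T} x
\]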
The ultimate objective of any reinforcement learning model is to extract a policy that quantifies the behavior of the agent. Given a set X of contexts xi and a set A of arms, the policy π is defined by the selection of an arm given a context x, as formulated below.
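A plausible rendering of the policy definition, where xa denotes the feature vector associated with arm a under context x:

\[
\pi(x) \;=\; \underset{a \in A}{\arg\max}\; \mathbb{E}[\, r \mid x_a, w \,] \;=\; \underset{a \in A}{\arg\max}\; w^{T} x_a
\]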
The algorithm starts with the following initialization of its variables.
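A plausible reconstruction of this initialization, assuming the standard linear Thompson sampling setup with context dimension d (Id is the d×d identity matrix):

\[
V_0 = I_d, \qquad b_0 = \mathbf{0}, \qquad \hat{w}_0 = \mathbf{0}
\]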
At each iteration:
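The original listing is not reproduced here; the steps below are a plausible reconstruction, numbered to match the references in the next paragraph (v is an exploration parameter and x(a) the feature vector of arm a):
1. Sample the model parameters w̃ from the multivariate normal distribution N(ŵ, v²·V⁻¹)
2. Compute the estimated reward w̃ᵀ·x(a) for each arm a and retain the maximum
3. Play the arm a* associated with the maximum estimated reward
4. Observe the actual reward r returned by arm a*
5. Update V ← V + x(a*)·x(a*)ᵀ
6. Update b ← b + r·x(a*)
7. Update the weights of the context: ŵ ← V⁻¹·b
8. Update the actual cumulative reward Rt with r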
The sampling of the normal distribution (line 1) is described in detail in the post Multivariate Normal Sampling with Scala. The algorithm computes the maximum reward estimate by sampling the normal distribution (line 2) and plays the arm a* associated with it (line 3). The parameters V and b are updated with the estimated values (lines 5 and 6), and the actual reward Rt is computed (line 8) after the weights of the context w are updated (line 7).
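As a companion to the description above, here is a minimal sketch of the iteration in Scala, assuming the Breeze library (breeze.linalg and breeze.stats.distributions, recent release) for linear algebra and multivariate normal sampling. The class name ContextualThompsonSampler and the parameter vParam are illustrative, not from the original post.

```scala
import breeze.linalg.{DenseMatrix, DenseVector, inv}
import breeze.stats.distributions.{MultivariateGaussian, RandBasis}

final class ContextualThompsonSampler(dim: Int, vParam: Double = 1.0) {
  implicit val randBasis: RandBasis = RandBasis.withSeed(42)

  // V accumulates the outer products of the played contexts,
  // b accumulates the reward-weighted contexts.
  private val V: DenseMatrix[Double] = DenseMatrix.eye[Double](dim)
  private val b: DenseVector[Double] = DenseVector.zeros[Double](dim)

  // Steps 1 to 3: sample the weights, estimate the reward of each arm and
  // return the index of the arm with the maximum estimate.
  def selectArm(contexts: Seq[DenseVector[Double]]): Int = {
    val vInv = inv(V)
    val wHat = vInv * b                                                       // current weights
    val wTilde = MultivariateGaussian(wHat, vInv * (vParam * vParam)).draw()  // step 1
    val estimates = contexts.map(x => x dot wTilde)                           // step 2
    estimates.zipWithIndex.maxBy(_._1)._2                                     // step 3
  }

  // Steps 5 and 6: update V and b with the played context and observed reward;
  // the weights (step 7) are recomputed from V and b in selectArm.
  def update(playedContext: DenseVector[Double], reward: Double): Unit = {
    V += playedContext * playedContext.t   // step 5
    b += playedContext * reward            // step 6
  }
}
```

Typical usage: instantiate the sampler with the context dimension, call selectArm with one feature vector per arm, play that arm, then call update with the played context and the observed reward.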
Thank you for reading this article. For more information ...
---------------------------
Patrick Nicolas has over 25 years of experience in software and data engineering, architecture design and end-to-end deployment and support, with extensive knowledge in machine learning. He has been director of data engineering at Aideo Technologies since 2017 and is the author of "Scala for Machine Learning" (Packt Publishing, ISBN 978-1-78712-238-3).