Types of bandits
In my previous post, I tried to give an introduction to the formulation of a multi-armed bandit problem through a fictional story of a gambling-addicted statistician. These problems fall under the umbrella of 'stochastic scheduling', which deals with scheduling several jobs whose attributes, such as processing times or due dates, are stochastic (random) in nature. In the case of bandit machines, the rewards are random but are assumed to be drawn from a well-defined probability distribution, which makes this a stochastic bandit machine problem.
Also, the problem we discussed is a basic scenario with two primary assumptions:
1) we do not have prior information about the reward distributions for any of these machines.
2) rewards from each of these machines are independent and identically distributed (IID).
Independent in the sense that the reward from a machine in a particular round is not influenced by the rewards from the same machine in previous rounds or by the rewards dispensed by other machines. Identically distributed in the sense that, despite the random nature of a machine's rewards, they follow a probability distribution such as a Bernoulli or normal distribution, and the parameters of that distribution (the success rate for a Bernoulli distribution, the mean and standard deviation for a normal distribution) are assumed to remain constant for a machine throughout the experiment.
To understand the above assumptions with an example, let us assume that among the five bandit machines there is one with a 60% probability of dispensing a reward. The first assumption says that we have no clue about this probability, and the second assumption says that this machine will dispense rewards from the same probability distribution (in this case a Bernoulli distribution, since the outcome is either a reward or no reward) with a constant success rate of 60% for the entire duration of the experiment.
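A minimal sketch of this IID setting is shown below. The success rates are hypothetical (only the 0.6 arm mirrors the 60% machine from the example), and they stay fixed for the whole run, which is exactly what the identically-distributed assumption demands.

```python
import random

# Hidden, constant success rates of the five machines (unknown to the player).
TRUE_SUCCESS_RATES = [0.35, 0.45, 0.60, 0.25, 0.50]

def pull(machine: int) -> int:
    """Return 1 (reward) or 0 (no reward) from the chosen machine."""
    return 1 if random.random() < TRUE_SUCCESS_RATES[machine] else 0

# Pulling the same machine repeatedly draws from the same fixed distribution,
# so the empirical average converges toward its true success rate.
rewards = [pull(2) for _ in range(10_000)]
print(sum(rewards) / len(rewards))  # roughly 0.60
```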
Another implicit assumption of this problem is that once a machine is operated in any round of the experiment, we receive feedback only from that machine; the rewards we might have received had we chosen the other machines in that round remain unknown. Tweaking some of these assumptions leads to a few different variants of this basic multi-armed bandit problem. Let us take a brief look at them.
If we tweak the first assumption such that we have some information about the machine before we start experimenting on it, we get a 'contextual bandit' problem. One of the most commonly cited examples of this scenario is a news website that has to choose which of several news articles' headlines to display to a reader visiting the site. The reward in this scenario is the reader clicking on the displayed headline to read the full article. If the user is not a first-time reader, there is a very good chance that the website has some prior information about them, such as their location, interests, and previously viewed articles, collected through cookies from previous sessions.
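The sketch below illustrates the shape of this interaction loop under simple assumptions: the context is reduced to a single interest tag read from cookies, the article names are made up, and the policy is a plain epsilon-greedy rule kept separately per context rather than anything the post prescribes.

```python
import random
from collections import defaultdict

ARTICLES = ["politics", "sports", "tech"]   # hypothetical candidate headlines
clicks = defaultdict(int)                   # (context, article) -> click count
shows = defaultdict(int)                    # (context, article) -> display count

def choose_article(context: str, epsilon: float = 0.1) -> str:
    # Explore occasionally; otherwise exploit the article with the best
    # observed click rate for this particular context.
    if random.random() < epsilon:
        return random.choice(ARTICLES)

    def click_rate(article: str) -> float:
        shown = shows[(context, article)]
        return clicks[(context, article)] / shown if shown else 0.0

    return max(ARTICLES, key=click_rate)

def record_feedback(context: str, article: str, clicked: bool) -> None:
    shows[(context, article)] += 1
    clicks[(context, article)] += int(clicked)

# One round: observe the user's context, display a headline, record the outcome.
article = choose_article(context="sports_fan")
record_feedback("sports_fan", article, clicked=True)
```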
Next, we can modify the assumption that we only receive feedback from the machine we chose to operate in a particular round. This basic setting is termed 'bandit feedback', and two other variants, partial feedback and full feedback, build on it. To understand partial feedback, consider an eCommerce website that tries to sell the same product to different customers at different prices (admittedly, a totally absurd example). The different prices displayed to customers can be thought of as different machines, and the website wants to make each sale at the highest price possible to maximize its profit. Suppose the majority of users did not buy the product at the highest price, but a certain section of users bought it when it was pitched to them at a lower price; then the website has 'partial' feedback at the end of each round. It knows something about the higher prices that did not work, the lower prices that did, and the characteristics of the users who bought the product, but it does not know the exact price each customer was willing to pay in the first place. A customer might have happily bought the item at a price somewhat higher than the lower price that was eventually displayed to them, which makes this partial, but not full, feedback.
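One way to picture this partial feedback is sketched below. It rests on an assumption not spelled out in the example: a customer who buys at a given price would also have bought at any lower price, and one who rejects a price would also have rejected anything more expensive. The price grid is hypothetical.

```python
PRICES = [10, 20, 30, 40, 50]  # the "machines" in this pricing setting

def inferred_feedback(offered, sold):
    """Map each price to True/False if its outcome can be inferred from this
    round, or None if it remains unknown (hence only partial feedback)."""
    feedback = {}
    for p in PRICES:
        if sold and p <= offered:
            feedback[p] = True       # a buyer at `offered` would also buy cheaper
        elif not sold and p >= offered:
            feedback[p] = False      # a rejection at `offered` rules out dearer prices
        else:
            feedback[p] = None       # genuinely unobserved this round
    return feedback

print(inferred_feedback(offered=40, sold=False))
# {10: None, 20: None, 30: None, 40: False, 50: False}
```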
And next, as you might have guessed by now, a full feedback scenario occurs when we have complete information about the rewards dispensed by all the machines in each round of the experiment, even though we get to choose only one of them. An example would be a multi-armed bandit problem formulated in the financial sector for diversifying an investment portfolio to maximize returns. Suppose there is a set of stocks from which we must choose one to invest in each day as part of the experiment; then at the end of each round, i.e., at the end of each day, we have complete information about the closing price of every stock, even though we invested in only one of them.
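A minimal sketch of the full-feedback setting follows, with made-up stock tickers and randomly simulated daily returns: we play one arm, yet the round's feedback arrives for every arm.

```python
import random

STOCKS = ["AAA", "BBB", "CCC"]  # hypothetical tickers

def closing_returns() -> dict:
    """Daily return of every stock, visible at the close regardless of our choice."""
    return {s: random.gauss(0.0, 1.0) for s in STOCKS}

chosen = "BBB"                   # the single arm we actually invested in
all_returns = closing_returns()  # full feedback: rewards of all arms are revealed
our_reward = all_returns[chosen]
print(all_returns, our_reward)
```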
Three more variants of the multi-armed bandit problem can be obtained by adjusting the assumption of IID rewards. Under this assumption, the probability distribution of the rewards dispensed by a bandit machine is supposed to remain constant, but what if this distribution changes over time? Consider the scenario where the problem is formulated in the manufacturing sector to maximize the output of a set of machines. In practice, as we keep exploiting the best machine to maximize the reward, it may wear out and its output may slowly diminish, which is analogous to a bandit machine's success rate decreasing steadily after a certain number of rounds. This scenario is called a 'restless bandit' problem.
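A minimal sketch of such a restless arm is given below; the starting success rate and the decay factor are illustrative numbers chosen only to show the distribution drifting between rounds.

```python
import random

class RestlessMachine:
    """An arm whose success rate decays a little every time it is used."""

    def __init__(self, success_rate: float = 0.6, decay: float = 0.999):
        self.success_rate = success_rate
        self.decay = decay

    def pull(self) -> int:
        reward = 1 if random.random() < self.success_rate else 0
        self.success_rate *= self.decay  # the reward distribution is no longer fixed
        return reward

machine = RestlessMachine()
rewards = [machine.pull() for _ in range(5_000)]
print(machine.success_rate)  # noticeably below the initial 0.6 after heavy use
```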
Till now, we have looked at a machine's distribution fluctuating due to natural causes, but what if the casino owner, having realized that you are exploiting a machine, hides behind it and starts tampering with its distribution intentionally through some internal setting to ensure that you do not get a reward in that round? This is called an 'adversarial' bandit problem, where in addition to cracking the code of the bandit machines, we also need to deal with an adversary trying to thwart our attempts. If the casino owner feels that too much tinkering with the settings might make us suspicious and decides to limit his meddling to only a certain proportion of rounds, then we still have an adversary, but one who is constrained from plotting against us to the full extent; this type of problem is called a 'constrained-adversarial' bandit problem.
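The sketch below illustrates one simple reading of these two settings: an adversary who can zero out the reward of the chosen arm, and a constrained variant where that interference is limited by a budget. The success rates and the budget are made-up numbers for illustration.

```python
import random

TRUE_SUCCESS_RATES = [0.35, 0.45, 0.60, 0.25, 0.50]

def play_round(machine: int, tamper_budget_left: int):
    """Return (reward, remaining budget) after the adversary's intervention."""
    reward = 1 if random.random() < TRUE_SUCCESS_RATES[machine] else 0
    # The adversary spoils a winning round only while its budget lasts
    # (a constrained adversary); an unconstrained one has no such limit.
    if tamper_budget_left > 0 and reward == 1:
        return 0, tamper_budget_left - 1
    return reward, tamper_budget_left

budget = 100  # the adversary may interfere in at most 100 rounds
total = 0
for _ in range(1_000):
    reward, budget = play_round(machine=2, tamper_budget_left=budget)
    total += reward
print(total)
```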
All these alterations of the original bandit problem have led to applications in many diverse fields, which makes us appreciate the simple mathematical formulation of this seemingly crazy problem.