Authors
Miguel Martin, Antonio Jiménez-Martín, Alfonso Mateos
Journal Paper
https://doi.org/10.5220/0006186400750084
Publication date
February 2017
Different allocation strategies can be found in the literature to deal with the multi-armed bandit problem, from both frequentist and Bayesian perspectives. In this paper, we propose a novel allocation strategy, the possibilistic reward method. First, possibilistic reward distributions are used to model the uncertainty about the expected reward of each arm; these are then converted into probability distributions using a pignistic probability transformation. Finally, a simulation experiment is carried out to identify the arm with the highest expected reward, which is then pulled. A parametric probability transformation of the proposed possibilistic rewards is then introduced, together with a dynamic optimization, so that neither prior knowledge nor a simulation of the arm distributions is required. A numerical study shows that the proposed method outperforms other policies in the literature in five scenarios: Bernoulli distributions with very low success probabilities; Bernoulli distributions with success probabilities close to 0.5; Gaussian rewards; and Poisson and exponential distributions truncated in [0, 10].
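The pipeline in the abstract (possibility distribution per arm → pignistic probability transformation → sample each arm and pull the argmax) can be sketched in code. The pignistic transformation below follows the standard definition for a consonant (possibility) distribution; the triangular possibility distribution over a reward grid is a hypothetical illustration, not the paper's exact possibilistic reward construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def pignistic(pi):
    """Pignistic probability transformation of a discrete possibility
    distribution `pi` (normalised: max(pi) == 1).
    BetP(x_i) = sum over focal sets containing x_i of mass / set size,
    where the consonant focal sets are the top-j elements by possibility."""
    pi = np.asarray(pi, dtype=float)
    order = np.argsort(-pi)                      # decreasing possibility
    sorted_pi = pi[order]
    n = len(pi)
    masses = sorted_pi - np.append(sorted_pi[1:], 0.0)   # m_j = pi_(j) - pi_(j+1)
    # BetP for the element of rank i: sum_{j >= i} m_j / j
    betp_sorted = np.cumsum((masses / np.arange(1, n + 1))[::-1])[::-1]
    p = np.empty(n)
    p[order] = betp_sorted
    return p

# Hypothetical stand-in for the possibilistic reward distribution:
# triangular over a reward grid, centred on the empirical mean,
# narrowing as the arm accumulates pulls.
grid = np.linspace(0.0, 1.0, 51)

def possibility(mean, pulls):
    width = max(1.0 / np.sqrt(pulls + 1), 1e-3)
    pi = np.clip(1.0 - np.abs(grid - mean) / width, 0.0, 1.0)
    pi[np.argmin(np.abs(grid - mean))] = 1.0     # ensure normalisation
    return pi

def choose_arm(means, pulls):
    """Sample one reward value from each arm's pignistic distribution
    and pull the arm with the highest sampled value."""
    samples = [rng.choice(grid, p=pignistic(possibility(m, n)))
               for m, n in zip(means, pulls)]
    return int(np.argmax(samples))
```

Note that the sampled-argmax step plays the role of the simulation experiment described above: arms with wide possibility distributions (few pulls) keep a chance of being selected, which drives exploration.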
