AUTHOR=Kim Song-Ju, Takahashi Taiki

TITLE=Performance in Multi-Armed Bandit Tasks in Relation to Ambiguity-Preference Within a Learning Algorithm

JOURNAL=Frontiers in Applied Mathematics and Statistics

VOLUME=Volume 4 - 2018

YEAR=2018

URL=https://www.frontiersin.org/journals/applied-mathematics-and-statistics/articles/10.3389/fams.2018.00027

DOI=10.3389/fams.2018.00027

ISSN=2297-4687

ABSTRACT=The Ellsberg paradox in decision theory posits that people will inevitably choose a known probability of winning over an unknown probability of winning, even if the known probability is low. One of the prevailing theories that addresses the Ellsberg paradox is known as 'ambiguity-aversion'. In this study, we investigate the properties of ambiguity-aversion in four distinct types of reinforcement learning algorithms: ucb1-tuned, modified ucb1-tuned, softmax, and tug-of-war. As our test scenario, we consider two slot machines, each of which dispenses a coin according to a probability that is generated by its own probability density function (PDF). We then investigate the choices of a learning algorithm in such multi-armed bandit tasks. The algorithms react differently in these tasks, depending on their ambiguity-preference. Notably, we discovered a clear performance enhancement related to ambiguity-preference in a learning algorithm. Although this study does not directly address the issue of ambiguity-aversion theory highlighted in the Ellsberg paradox, the differences between the learning algorithms suggest that there is room for further study regarding the Ellsberg paradox and decision theory.