LP-duality Based Algorithms for Restless Bandit Problems
Kamesh Munagala, Duke University
February 15, 2008, 3:30PM, Wean 7220
Abstract:
In numerous areas of operations research and artificial intelligence,
we face a problem common to gamblers choosing slot machines (or
bandits) in a casino: whether to try new machines or keep playing the
one we know and hope for the best! More generally, we face the
trade-off between exploration and exploitation: between learning from
and adapting to a stochastic system and exploiting our current
best-knowledge. A model that captures this trade-off is the celebrated
Multi-arm Bandit Problem, which was formulated and solved many decades
ago, and has since then become a fundamental tool in decision theory.
However, in many stochastic applications, we cannot use the basic
multi-arm bandit model, but must consider a generalization called the
Restless Bandit Problem. Although this problem has been extensively
studied, little positive progress has been made. In fact, it has been
proven that in its ultimate generality, the restless bandit problem is
PSPACE-Hard to approximate to any non-trivial factor.
In this talk, we derive a 2-approximation algorithm for a general
subclass of the Restless Bandit Problem, in which the state-transition
probability for each bandit is "separable" into a constant probability
matrix and a vector of arbitrary monotone functions of time. The
monotonicity models increasing levels of uncertainty as
time-since-last-observation increases, which occurs in many
applications such as in wireless scheduling. The resulting algorithm
is simple and intuitive.
To derive the result, we modify a well-known LP relaxation, take its
dual, and add a "balance" condition which provides more structure to
the LP. From the dual variables, we construct an index policy, and
prove that it's a 2-approximation by using complementary slackness and
a potential function argument. Our techniques are fairly clean, yield
simple index policies with small approximation factors, and are
applicable to related stochastic scheduling problems.
Joint work with Sudipto Guha, UPenn, and Peng Shi, Duke.