ALADDIN
CENTER Carnegie Mellon UniversityCarnegie Mellon Computer Science DepartmentSchool of Computer Science
Abstracts
The Joint ALADDIN/Theory/Operations Research Seminar
Aladdin
About
News and Events
Calendar
People
PROBEs
Workshops
Papers
Education
Seminars
Courses
Related Activities
Corporate
Related Links
Intranet
Contact
 
Captcha
REUs
Outreach Roadshow

LP-duality Based Algorithms for Restless Bandit Problems
Kamesh Munagala, Duke University
February 15, 2008, 3:30PM, Wean 7220

Abstract:
In numerous areas of operations research and artificial intelligence, we face a problem common to gamblers choosing slot machines (or bandits) in a casino: whether to try new machines or keep playing the one we know and hope for the best! More generally, we face the trade-off between exploration and exploitation: between learning from and adapting to a stochastic system and exploiting our current best-knowledge. A model that captures this trade-off is the celebrated Multi-arm Bandit Problem, which was formulated and solved many decades ago, and has since then become a fundamental tool in decision theory. However, in many stochastic applications, we cannot use the basic multi-arm bandit model, but must consider a generalization called the Restless Bandit Problem. Although this problem has been extensively studied, little positive progress has been made. In fact, it has been proven that in its ultimate generality, the restless bandit problem is PSPACE-Hard to approximate to any non-trivial factor.

In this talk, we derive a 2-approximation algorithm for a general subclass of the Restless Bandit Problem, in which the state-transition probability for each bandit is "separable" into a constant probability matrix and a vector of arbitrary monotone functions of time. The monotonicity models increasing levels of uncertainty as time-since-last-observation increases, which occurs in many applications such as in wireless scheduling. The resulting algorithm is simple and intuitive.

To derive the result, we modify a well-known LP relaxation, take its dual, and add a "balance" condition which provides more structure to the LP. From the dual variables, we construct an index policy, and prove that it's a 2-approximation by using complementary slackness and a potential function argument. Our techniques are fairly clean, yield simple index policies with small approximation factors, and are applicable to related stochastic scheduling problems.

Joint work with Sudipto Guha, UPenn, and Peng Shi, Duke.