Online linear optimization and adaptive routing

作者:

Highlights:

摘要

This paper studies an online linear optimization problem generalizing the multi-armed bandit problem. Motivated primarily by the task of designing adaptive routing algorithms for overlay networks, we present two randomized online algorithms for selecting a sequence of routing paths in a network with unknown edge delays varying adversarially over time. In contrast with earlier work on this problem, we assume that the only feedback after choosing such a path is the total end-to-end delay of the selected path. We present two algorithms whose regret is sublinear in the number of trials and polynomial in the size of the network. The first of these algorithms generalizes to solve any online linear optimization problem, given an oracle for optimizing linear functions over the set of strategies; our work may thus be interpreted as a general-purpose reduction from offline to online linear optimization. A key element of this algorithm is the notion of a barycentric spanner, a special type of basis for the vector space of strategies which allows any feasible strategy to be expressed as a linear combination of basis vectors using bounded coefficients.We also present a second algorithm for the online shortest path problem, which solves the problem using a chain of online decision oracles, one at each node of the graph. This has several advantages over the online linear optimization approach. First, it is effective against an adaptive adversary, whereas our linear optimization algorithm assumes an oblivious adversary. Second, even in the case of an oblivious adversary, the second algorithm performs slightly better than the first, as measured by their additive regret.

论文关键词:Adaptive routing,Multi-armed bandit problems,Online learning,Online optimization

论文评审过程:Received 1 May 2005, Revised 1 July 2006, Available online 25 April 2007.

论文官网地址:https://doi.org/10.1016/j.jcss.2007.04.016