Distributing Frank–Wolfe via map-reduce

Authors: Armin Moharrer, Stratis Ioannidis

Abstract

Large-scale optimization problems abound in data mining and machine learning applications, and the computational challenges they pose are often addressed through parallelization. We identify structural properties under which a convex optimization problem can be massively parallelized via map-reduce operations using the Frank–Wolfe (FW) algorithm. The class of problems that can be tackled this way is quite broad and includes experimental design, AdaBoost, and projection to a convex hull. Implementing FW via map-reduce eases parallelization and deployment via commercial distributed computing frameworks. We demonstrate this by implementing FW over Spark, an engine for parallel data processing, and establish that parallelization through map-reduce yields significant performance improvements: We solve problems with 20 million variables using 350 cores in 79 min; the same operation takes 48 h when executed serially.
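To make the map-reduce idea concrete, here is a minimal sketch of Frank–Wolfe for one of the problems listed above: projecting a point y onto the convex hull of points p_1, ..., p_n. It minimizes f(x) = ||Σ_i x_i p_i − y||² over the probability simplex. This is an illustrative assumption, not the authors' implementation or their exact structural conditions. The function names (`frank_wolfe_hull_projection`, `add_vectors`, `dot`) are hypothetical. A plain Python list stands in for a distributed dataset such as a Spark RDD. The sketch shows that each iteration needs only maps over per-variable records plus a few reductions: one to aggregate the global vector z = Σ_i x_i p_i, one to find the simplex vertex with the smallest partial derivative.

```python
from functools import reduce


def add_vectors(u, v):
    """Element-wise sum of two equal-length vectors (used as the reduce step)."""
    return [a + b for a, b in zip(u, v)]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def frank_wolfe_hull_projection(points, y, iterations=200):
    """Hypothetical helper: project y onto conv(points) with Frank-Wolfe.

    Minimizes f(x) = ||sum_i x_i p_i - y||^2 subject to x >= 0, sum_i x_i = 1.
    """
    n = len(points)
    # One record per variable: (index i, weight x_i, point p_i). A plain list
    # stands in for a distributed dataset (an illustrative assumption).
    records = [(i, 1.0 / n, p) for i, p in enumerate(points)]
    for k in range(iterations):
        # Map: each record emits x_i * p_i. Reduce: sum into z = sum_i x_i p_i.
        z = reduce(add_vectors, map(lambda r: [r[1] * c for c in r[2]], records))
        residual = [a - b for a, b in zip(z, y)]
        # Map: given z, each partial derivative df/dx_i = 2 <p_i, z - y>
        # depends only on the record's own point.
        grads = map(lambda r: (r[0], 2.0 * dot(r[2], residual)), records)
        # Reduce: the linear minimization oracle over the simplex selects the
        # vertex e_{i*} with the smallest partial derivative (ties keep the first).
        i_star, _ = reduce(lambda a, b: a if a[1] <= b[1] else b, grads)
        gamma = 2.0 / (k + 2)  # standard diminishing Frank-Wolfe step size
        # Map: x <- (1 - gamma) x + gamma e_{i*}, applied record by record.
        records = [(i, (1 - gamma) * x + (gamma if i == i_star else 0.0), p)
                   for i, x, p in records]
    return [x for _, x, _ in records]


if __name__ == "__main__":
    triangle = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    # Projecting (1, 1) onto this triangle should give weights close to [0, 0.5, 0.5].
    print(frank_wolfe_hull_projection(triangle, [1.0, 1.0]))
```

The only quantity shared across records is z, whose size is the ambient dimension rather than the number of variables n. On Spark, one would presumably hold the records in an RDD, express the steps above as RDD `map` and `reduce` operations, and ship z to the workers as a broadcast variable.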

Keywords: Frank–Wolfe, Distributed algorithms, Convex optimization, Spark

Paper page: https://doi.org/10.1007/s10115-018-1294-7