Fast stochastic second-order method logarithmic in condition number

Authors:

Highlights:

• Optimization is an important issue in machine learning because many machine learning models are formulated as optimization problems. Many kinds of machine learning algorithms, such as deep learning, logistic regression, and support vector machines, mainly focus on minimizing their empirical loss. Because data is growing explosively, large-scale optimization problems are challenging to solve. We propose a fast and almost condition-number-free algorithm for training machine learning models. This algorithm has several advantages.

• In contrast to previous work, which directly approximates the Hessian matrix with a sketched or subsampled Hessian, we use the sketched or subsampled Hessian as a preconditioner and obtain the direction vector with the preconditioned conjugate gradient (PCG) method (see the sketch after this list). Accordingly, we propose a novel stochastic second-order algorithm called Preconditioned Newton Conjugate Gradient with Sketched Hessian (PNCG). This strategy makes our algorithm almost independent of the condition number.

• We numerically demonstrate the effectiveness and robustness of our algorithms on the widely used ridge logistic regression model over several real datasets with different condition numbers. Our algorithms perform well compared with existing second-order methods in all cases, especially when the condition number is large.
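
To make the preconditioning strategy concrete, below is a minimal Python sketch of PNCG-style Newton iterations for ridge logistic regression, the model used in the experiments. The details are illustrative assumptions rather than the paper's exact algorithm: uniform row subsampling for the preconditioner, a Cholesky factorization of the subsampled Hessian, a fixed inner-iteration budget, and a unit step size; all function names are ours.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def pcg(hvp, b, M_inv, tol=1e-8, max_iter=50):
    # Preconditioned conjugate gradient: approximately solve H x = b,
    # where H is touched only through Hessian-vector products hvp(v)
    # and M_inv applies the inverse of the preconditioner.
    x = np.zeros_like(b)
    r = b.copy()                      # residual b - H x for x = 0
    z = M_inv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Hp = hvp(p)
        alpha = rz / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = M_inv(r)
        rz_next = r @ z
        p = z + (rz_next / rz) * p
        rz = rz_next
    return x

def pncg_ridge_logreg(X, y, lam=1e-4, s=256, newton_iters=10, seed=0):
    # Newton's method for ridge logistic regression (labels in {-1, +1}):
    # the exact Newton system H d = -g is solved by PCG, preconditioned
    # with a Hessian built from only s subsampled rows of X.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(newton_iters):
        t = X @ w
        g = -(X.T @ (y * sigmoid(-y * t))) / n + lam * w   # full gradient
        D = sigmoid(t) * sigmoid(-t)                       # Hessian weights
        hvp = lambda v, D=D: X.T @ (D * (X @ v)) / n + lam * v
        idx = rng.choice(n, size=min(s, n), replace=False)
        H_s = X[idx].T @ (D[idx, None] * X[idx]) / len(idx) + lam * np.eye(d)
        chol = cho_factor(H_s)          # factor preconditioner once per step
        w += pcg(hvp, -g, lambda r: cho_solve(chol, r))    # unit Newton step
    return w

# Toy usage on synthetic data.
rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 50))
y = np.sign(X @ rng.standard_normal(50) + 0.1 * rng.standard_normal(2000))
w_hat = pncg_ridge_logreg(X, y)
```

The design point is that the subsampled Hessian is never used as the Newton matrix itself, only to precondition the exact system: when the preconditioner is accurate, the preconditioned system is well conditioned, so the inner PCG loop needs few iterations however ill-conditioned the original Hessian is. This is the near-independence of the condition number that the highlights describe.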

Keywords:

Article history: Received 21 November 2017, Revised 14 October 2018, Accepted 27 November 2018, Available online 30 November 2018, Version of Record 21 December 2018.

Paper link: https://doi.org/10.1016/j.patcog.2018.11.031