Revisiting transductive support vector machines with margin distribution embedding

Authors:

Highlights:

Abstract:

Transductive Support Vector Machine (TSVM) is one of the most successful classification methods for semi-supervised learning (SSL). One challenge for TSVMs is performance degeneration caused by unlabeled examples that are obscure or misleading for discovering the underlying distribution. To address this problem, we disclose the underlying data distribution and describe the margin distribution of TSVMs using the first-order (margin mean) and second-order (margin variance) statistics of examples. Since the optimization problems of TSVMs are not convex, we solve them with the concave-convex procedure and a variant of stochastic variance reduced gradient methods. In particular, we propose two algorithms that optimize the margin distribution of TSVM by simultaneously maximizing the margin mean and minimizing the margin variance, which improves generalization ability and yields robustness to outliers and noise. In addition, we derive a bound on the expected error based on the leave-one-out cross-validation estimate, which is an unbiased estimate of the probability of test error. Finally, to validate the effectiveness of the proposed method, we conduct extensive experiments on diverse datasets. The experimental results demonstrate that the proposed algorithms outperform existing TSVMs and other semi-supervised learning methods.
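The first- and second-order margin statistics mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation; it merely computes the margin mean and variance of a linear decision function over labeled examples, the two quantities the proposed algorithms respectively maximize and minimize (the function name `margin_statistics` and the toy data are illustrative assumptions):

```python
import numpy as np

def margin_statistics(y, scores):
    """Compute the first-order (mean) and second-order (variance)
    statistics of the margins y_i * f(x_i).

    y      : array of labels in {-1, +1}
    scores : array of decision values f(x_i) = w^T x_i + b
    """
    margins = y * scores          # margin of each example
    return margins.mean(), margins.var()

# Toy example: five examples with their decision values.
y = np.array([1, -1, 1, 1, -1])
scores = np.array([0.8, -0.6, 1.2, 0.4, -1.0])
mean, var = margin_statistics(y, scores)
```

A larger margin mean pushes examples away from the decision boundary on average, while a smaller margin variance makes the margins concentrate, which is what gives the method its robustness to outliers and noise.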

Keywords: Semi-supervised learning, Transductive support vector machine, Margin distribution, Classification

Article history: Received 10 June 2017, Revised 6 April 2018, Accepted 8 April 2018, Available online 12 April 2018, Version of Record 12 May 2018.

DOI: https://doi.org/10.1016/j.knosys.2018.04.017