Fast convex-hull vector machine for training on large-scale ncRNA data classification tasks

Authors:

Highlights:

Abstract

The support vector machine (SVM) has become a provably effective tool for non-coding RNA (ncRNA) data classification. However, as the number of species and the size of ncRNA sequence data grow rapidly, SVM training time becomes intolerable and even impractical for large-scale data. Although many fast SVM-based classification techniques have been developed, their applicability depends heavily on the specific SVM formulation involved, and in particular on how the computation of the corresponding kernel matrix is reduced. In this paper, building on a recent advance in fast two-dimensional convex hull approximation with asymptotically linear time complexity, a fast convex-hull vector machine (CHVM) is developed to overcome this applicability limitation of SVM-based classification techniques and to provide more choices for large-scale ncRNA data classification tasks. By projecting the dataset onto all of its two-dimensional projection combinations, CHVM first quickly extracts the boundary vectors of the whole training dataset in the kernel space, and then attempts to form the convex hull vectors of the whole kernelized training set by integrating all the boundary vectors obtained. Finally, the convex hull vectors are used as the inputs to an SVM classifier, regardless of the SVM formulation adopted. Experimental results on three large-scale ncRNA datasets indicate that CHVM outperforms five SVM-based classifiers, random forest (RF), and back-propagation neural networks (BP), especially in training time.
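
The projection-and-merge step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it works in the input space rather than the kernel space, and it swaps the paper's linear-time approximate hull for an exact two-dimensional hull (Andrew's monotone chain). The names `hull_2d` and `boundary_candidates` are hypothetical, chosen only for this illustration.

```python
from itertools import combinations


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_2d(pts):
    """Exact 2D convex hull vertices (Andrew's monotone chain).

    Assumption: stands in for the fast approximate hull used by CHVM.
    """
    pts = sorted(set(pts))  # drop duplicate projections
    if len(pts) <= 2:
        return pts

    def half(seq):
        chain = []
        for p in seq:
            # Pop points that would make a right turn or lie on a straight line.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]  # last point is the first point of the other half

    return half(pts) + half(reversed(pts))


def boundary_candidates(samples):
    """Indices of samples that are hull vertices in at least one 2D projection.

    Hypothetical helper: projects onto every pair of coordinates, takes the
    2D hull of each projection, and merges the selected sample indices.
    """
    dims = len(samples[0])
    selected = set()
    for i, j in combinations(range(dims), 2):
        projected = [(s[i], s[j]) for s in samples]
        vertices = set(hull_2d(projected))
        selected.update(k for k, p in enumerate(projected) if p in vertices)
    return sorted(selected)


if __name__ == "__main__":
    # Four corners of a tetrahedron plus one interior point.
    data = [(0, 0, 0), (4, 0, 0), (0, 4, 0), (0, 0, 4), (1, 1, 1)]
    print(boundary_candidates(data))
```

In this sketch the interior sample is never selected, so only the reduced set of candidates would be passed on to whichever SVM solver is chosen for the final training step.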

Keywords: Large-scale ncRNA data classification, Fast convex hull approximation, Support vector machines, Kernelization

Review history: Received 3 August 2017, Revised 17 March 2018, Accepted 21 March 2018, Available online 23 March 2018, Version of Record 11 May 2018.

Official paper URL: https://doi.org/10.1016/j.knosys.2018.03.029