A novel semantic smoothing kernel for text classification with class-based weighting

Abstract

In this study, we propose a novel methodology for building a semantic smoothing kernel for use with Support Vector Machines (SVM) in text classification. The suggested approach rests on two key concepts: class-based term weighting and changing the orthogonality of the vector space. A class-based term weighting methodology transforms documents from the original space into the feature space. This class-based weighting groups terms according to their importance for each class and consequently smooths the representation of documents. It does so by changing the orthogonality of the Vector Space Model (VSM), introducing class-based dependencies between terms. As a result, in the extreme case, two documents can be considered similar even if they share no terms, provided that their terms are similarly weighted for a particular class. The resulting semantic kernel uses class information directly to extract semantic relations between terms; therefore, it can be considered a supervised kernel. For our experimental evaluation, we analyze the performance of the suggested kernel through a large number of experiments on benchmark textual datasets and present results under varying experimental conditions. To the best of our knowledge, this is the first study to use class-based term weighting to build a supervised semantic kernel for SVM. We compare our results with kernels commonly used in SVM, such as the linear kernel, the polynomial kernel and the Radial Basis Function (RBF) kernel, and with several corpus-based semantic kernels. According to our experimental results, the proposed method improves on the linear kernel and several corpus-based semantic kernels in terms of both accuracy and speed.
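The abstract does not give the weighting formula. The following is therefore only a minimal sketch of the general idea, built on a simple weighting chosen for illustration. Each term gets a row of class-based weights in a term-by-class matrix W, and the kernel compares documents after projecting them through W, so that K(d1, d2) = d1 W Wᵀ d2ᵀ. The weighting used here is a hypothetical stand-in and may differ from the paper's method: a term's weight for a class is the share of its training occurrences that fall in that class. The function names (`class_weight_matrix`, `project`, `smoothing_kernel`) and the toy corpus are illustrative only.

```python
# Hypothetical sketch: a class-based semantic smoothing kernel for text.
# ASSUMPTION: the weight of term t for class c is the fraction of t's
# training occurrences found in documents of class c. The paper's actual
# class-based weighting scheme may differ.


def class_weight_matrix(docs, labels, vocab, classes):
    """Build W, a |vocab| x |classes| matrix of class-based term weights."""
    counts = {t: {c: 0 for c in classes} for t in vocab}
    for doc, label in zip(docs, labels):
        for term in doc:
            if term in counts:
                counts[term][label] += 1
    W = []
    for t in vocab:
        total = sum(counts[t].values())
        # Turn each term's per-class counts into a distribution over classes.
        W.append([counts[t][c] / total if total else 0.0 for c in classes])
    return W


def tf_vector(doc, vocab):
    """Raw term-frequency vector of a tokenized document."""
    return [doc.count(t) for t in vocab]


def project(d, W):
    """Compute d W: map a term vector into the class-weighted space."""
    return [sum(d[i] * W[i][k] for i in range(len(d))) for k in range(len(W[0]))]


def linear_kernel(d1, d2):
    """Plain VSM dot product: distinct terms are orthogonal."""
    return sum(x * y for x, y in zip(d1, d2))


def smoothing_kernel(d1, d2, W):
    """K(d1, d2) = d1 W W^T d2^T, i.e. the dot product of the projections."""
    return linear_kernel(project(d1, W), project(d2, W))


# Toy training corpus (illustrative data, not from the paper).
train_docs = [["goal", "match"], ["match", "score"],
              ["vote", "election"], ["party", "vote"]]
train_labels = ["sport", "sport", "politics", "politics"]
vocab = ["goal", "match", "score", "vote", "election", "party"]
classes = ["sport", "politics"]

W = class_weight_matrix(train_docs, train_labels, vocab, classes)
a = tf_vector(["goal"], vocab)
b = tf_vector(["score"], vocab)
c = tf_vector(["vote"], vocab)

print(linear_kernel(a, b))        # 0: the documents share no terms
print(smoothing_kernel(a, b, W))  # 1.0: both terms are weighted toward "sport"
print(smoothing_kernel(a, c, W))  # 0.0: terms weighted toward different classes
```

The implied term-term matrix S = W Wᵀ has nonzero off-diagonal entries (here S[goal][score] = 1), which is how the sketch breaks the orthogonality of the VSM. Because K is an inner product of the projected vectors d W, the Gram matrix it produces is positive semidefinite. It can therefore be fed to an SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.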

Keywords: Support vector machines, Text classification, Semantic kernel, Semantic smoothing kernel, Class-based term weighting

Article history: Received 24 December 2014, Revised 28 May 2015, Accepted 10 July 2015, Available online 17 July 2015, Version of Record 19 October 2015.

Official paper page: https://doi.org/10.1016/j.knosys.2015.07.008