SDRS: A new lossless dimensionality reduction for text corpora

Authors:

Highlights:

• The need to migrate from token-based representations to synset-based ones to achieve better spam-filtering performance.

• Review of current synset-based feature reduction schemes and representations.

• Introduction of the SDRS feature reduction process, based on the NSGA-II algorithm and semantic taxonomic relations between tokens.

• Design and execution of an experimental protocol to test the suitability of the SDRS dimensionality reduction method.


Keywords: Spam filtering, Token-based representation, Synset-based representation, Semantic-based feature reduction, Multi-objective evolutionary algorithms

Review history: Received 17 December 2019, Revised 10 February 2020, Accepted 14 March 2020, Available online 21 March 2020, Version of Record 21 March 2020.

Official URL: https://doi.org/10.1016/j.ipm.2020.102249