Extended pre-processing pipeline for text classification: On the role of meta-feature representations, sparsification and selective sampling

Authors:

Highlights:

• We propose and orchestrate new pre-processing steps for text classification pipelines.

• We explore meta-feature representations, sparsification and selective sampling.

• We provide thorough evaluations of the trade-offs between costs and effectiveness.

• Our final representations are more effective than word embeddings (up to 46%).

• Our processes induce large reductions in computational costs and memory consumption.

Keywords: Text classification pipelines, Pre-processing, Meta-features, Selective sampling, Sparsification, Experimental evaluation

Review history: Received 13 December 2019, Revised 20 March 2020, Accepted 4 April 2020, Available online 29 April 2020, Version of Record 29 April 2020.

Official paper URL: https://doi.org/10.1016/j.ipm.2020.102263