Extended pre-processing pipeline for text classification: On the role of meta-feature representations, sparsification and selective sampling
Authors:
Highlights:
• We propose and orchestrate new pre-processing steps for text classification pipelines.
• We explore meta-feature representations, sparsification and selective sampling.
• We provide thorough evaluations of the trade-offs between costs and effectiveness.
• Our final representations are more effective than word embeddings (up to 46%).
• Our processes induce large reductions in computational costs and memory consumption.
Keywords: Text classification pipelines, Pre-processing, Meta-features, Selective sampling, Sparsification, Experimental evaluation
Review history: Received 13 December 2019, Revised 20 March 2020, Accepted 4 April 2020, Available online 29 April 2020, Version of Record 29 April 2020.
Official paper URL: https://doi.org/10.1016/j.ipm.2020.102263