Website categorization: A formal approach and robustness analysis in the case of e-commerce detection

Authors:

Highlights:

• Robust formal approach to website categorization based on web mining and classification.

• Entirely automated procedure using a computationally viable pipeline.

• Application to an important case: the detection of e-commerce in corporate websites.

• Uses machine learning and dictionaries, hence applicable in other contexts or languages.

• Analysis of robustness with respect to misclassified training records.

Keywords: Classification, Machine learning, E-commerce, Feature engineering, Text mining, Surveys

Review history: Received 17 April 2019, Revised 9 September 2019, Accepted 1 October 2019, Available online 4 October 2019, Version of Record 18 October 2019.

Official paper URL: https://doi.org/10.1016/j.eswa.2019.113001