A comparative study of automated legal text classification using random forests and deep learning
Authors:
Highlights:
• We apply domain concepts to legal text classification using principal component analysis (PCA) and random forests (RFs), and demonstrate the effectiveness of this approach on legal text.
• We conduct a systematic comparative study on a legal area classification dataset, comparing domain concept-based machine learning algorithms with deep learning algorithms based on pre-trained word embeddings.
• We propose a framework that includes a strategy for selecting machine learning models according to four indicators: data, performance, computation, and interpretation.
Keywords: Legal text classification, Machine learning, Deep learning, Domain concept, Word embedding, Random forests
Review history: Received 1 March 2021, Revised 10 July 2021, Accepted 17 October 2021, Available online 17 November 2021, Version of Record 17 November 2021.
Official paper URL: https://doi.org/10.1016/j.ipm.2021.102798