A comparative study of Chinese named entity recognition with different segment representations

Authors: Jun Pan, Chaohua Zhang, Haijun Wang, Zongda Wu

Abstract

Named entity recognition (NER) is a fundamental task in natural language processing and has been widely studied. Nevertheless, little attention has been given to the segment representation (SR) schemes used to map multi-token entities into categories in Chinese NER. To address this gap, we explore and compare the impact of different SR schemes on Chinese NER. Our experiments are conducted on four benchmark Chinese NER datasets whose labels are extended to cover seven well-known SR schemes: IO, IOB2, IOE2, IOBES, BI, IE, and BIES. All seven SR schemes are investigated with two sets of classifiers: machine learning-based and neural network-based. The experimental results demonstrate that selecting the best SR scheme is a complicated problem that depends on various factors, such as corpus size, corpus distribution, and the chosen classifier. We also provide a comparative analysis of the time consumption of each classifier under the different SR schemes and discuss the impact of using different SR schemes on NER in Chinese and other languages.
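To make the seven schemes concrete, the following is a minimal sketch of how entity spans could be converted into tag sequences under each one. The abstract does not define the schemes, so the definitions here are an assumption based on how they are commonly described in the SR literature: IO, IOB2, IOE2, and IOBES give every non-entity token a plain "O", while BI, IE, and BIES also segment runs of non-entity tokens (B-O, I-O, and so on). The span format, the function names, and the example sentence are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end_exclusive, entity_type), character-level.
Span = Tuple[int, int, str]

# For each scheme: which boundary positions get a special prefix, and whether
# runs of non-entity tokens are segmented too (assumed definitions, see lead-in).
SCHEMES = {
    "IO":    ("inside", False),
    "IOB2":  ("begin", False),
    "IOE2":  ("end", False),
    "IOBES": ("both", False),
    "BI":    ("begin", True),
    "IE":    ("end", True),
    "BIES":  ("both", True),
}


def _prefix(rule: str, i: int, n: int) -> str:
    """Prefix for the i-th token of a segment of length n."""
    if rule == "both" and n == 1:
        return "S"  # single-token segment
    if rule in ("begin", "both") and i == 0:
        return "B"  # first token of the segment
    if rule in ("end", "both") and i == n - 1:
        return "E"  # last token of the segment
    return "I"


def encode(n_tokens: int, spans: List[Span], scheme: str) -> List[str]:
    """Tag a sentence of n_tokens tokens with non-overlapping entity spans."""
    rule, segment_outside = SCHEMES[scheme]

    # Split the whole sentence into entity segments and the gaps between them.
    segments: List[Span] = []
    pos = 0
    for start, end, etype in sorted(spans):
        if start > pos:
            segments.append((pos, start, "O"))
        segments.append((start, end, etype))
        pos = end
    if pos < n_tokens:
        segments.append((pos, n_tokens, "O"))

    tags: List[str] = []
    for start, end, etype in segments:
        n = end - start
        for i in range(n):
            if etype == "O" and not segment_outside:
                tags.append("O")
            else:
                tags.append(f"{_prefix(rule, i, n)}-{etype}")
    return tags


if __name__ == "__main__":
    # Illustrative 7-character sentence with two adjacent LOC entities
    # at characters 2-3 and 4-6 (made-up annotation, not from the datasets).
    spans = [(2, 4, "LOC"), (4, 7, "LOC")]
    for name in SCHEMES:
        print(f"{name:6s}", " ".join(encode(7, spans, name)))
```

In this example the IO output is `O O I-LOC I-LOC I-LOC I-LOC I-LOC`, so the boundary between the two adjacent entities is lost, whereas IOB2 yields `O O B-LOC I-LOC B-LOC I-LOC I-LOC` and keeps it. The richer schemes also enlarge the label set, which is one plausible reason the choice of scheme interacts with corpus size and classifier.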

Keywords: Named entity recognition, Segment representation, Machine learning, Neural network

Paper URL: https://doi.org/10.1007/s10489-022-03274-0