Reliable automatic word spacing using a space insertion and correction model based on neural networks in Korean

Authors:

Highlights:

Abstract

Automatic word spacing in Korean remains a significant task in natural language processing owing to the extremely complex word spacing rules involved. Most previous models remove all spaces from the input sentence and then insert new spaces into the resulting unspaced text. When an input sentence contains only a few spacing errors, these models often return a sentence with more spacing errors than the input, because they discard the correct spaces that the user typed intentionally. To mitigate this problem, we propose an automatic word spacing model based on a neural network that effectively uses the word spacing information in input sentences. The proposed model comprises a space insertion layer and a spacing-error correction layer. As in previous models, the space insertion layer inserts word spaces into input sentences from which all spaces have been removed. The spacing-error correction layer then post-corrects the spacing errors of the space insertion layer using the word spacing typed by users. Because the two layers are tightly connected in the proposed model, the backpropagation flows are not blocked. As a result, space insertion and error correction are performed simultaneously. In experiments, the proposed model outperformed all compared models on all measures on the same test data. In addition, it exhibited reliable performance (word-unit F1-measures of 94.17% to 97.87%) regardless of how many word spacing errors were present in the input sentences.
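The data flow described in the abstract can be illustrated with a minimal Python sketch. It shows only the preprocessing and scoring around the model, not the neural network itself: a sentence is stripped of spaces, the spaces the user typed are kept as per-character labels (the information the spacing-error correction layer is said to use), and a prediction is scored by word-unit F1. All function names are hypothetical, and the F1 definition (a predicted word counts as correct only if both of its boundaries match the gold word) is an assumption, since the abstract does not define the measure.

```python
from typing import List, Set, Tuple


def to_chars_and_labels(sentence: str) -> Tuple[List[str], List[int]]:
    """Strip spaces; label each character 1 if a space followed it, else 0."""
    chars: List[str] = []
    labels: List[int] = []
    for ch in sentence.strip():
        if ch == " ":
            if labels:
                labels[-1] = 1  # a space followed the previous character
        else:
            chars.append(ch)
            labels.append(0)
    return chars, labels


def apply_labels(chars: List[str], labels: List[int]) -> str:
    """Rebuild a spaced sentence from characters and spacing labels."""
    out: List[str] = []
    for ch, lab in zip(chars, labels):
        out.append(ch)
        if lab == 1:
            out.append(" ")
    return "".join(out).strip()


def word_spans(labels: List[int]) -> Set[Tuple[int, int]]:
    """Convert spacing labels into (start, end) character spans of words."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for i, lab in enumerate(labels):
        if lab == 1 or i == len(labels) - 1:
            spans.add((start, i + 1))
            start = i + 1
    return spans


def word_unit_f1(gold_labels: List[int], pred_labels: List[int]) -> float:
    """Assumed word-unit F1: a word is correct if both boundaries match."""
    gold, pred = word_spans(gold_labels), word_spans(pred_labels)
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Correctly spaced reference and a user input with one missing space.
gold_chars, gold_labels = to_chars_and_labels("나는 학교에 간다")
user_chars, user_labels = to_chars_and_labels("나는학교에 간다")
```

Here `user_chars` is what a space insertion layer would receive, while `user_labels` carries the user-typed spacing that a correction layer could take as an additional input. Calling `word_unit_f1(gold_labels, user_labels)` scores the uncorrected user input against the reference.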

Keywords: Automatic word spacing, Word spacing information typed by users, Space insertion model, Spacing-error correction model

Review history: Received 26 July 2018, Revised 14 February 2019, Accepted 15 February 2019, Available online 27 February 2019, Version of Record 27 February 2019.

Official paper URL: https://doi.org/10.1016/j.ipm.2019.02.015