A bio-inspired application of natural language processing: A case study in extracting multiword expression



Abstract

For multiword expression (MWE) extraction, this paper proposes multiple sequence alignment (MSA), motivated by its use in gene recognition: textual sequences resemble gene sequences from the standpoint of pattern analysis. The MSA technique is combined with error-driven rules and is more efficient than traditional methods, thereby safeguarding MWE recall. It uses dynamic programming to avoid a combinatorial explosion of candidates, and it provides a global solution for pattern extraction instead of producing redundant sub-patterns. Consequently, it yields accurate measures for flexible patterns. In the experiments, several advanced statistical measures are applied to rank the candidates, and in the comparison experiment the MSA approach achieved better results.
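The abstract does not specify the scoring scheme or how the alignment is extended to many sequences, so the following is only a minimal sketch under stated assumptions. It shows the dynamic-programming core that MSA builds on: a Needleman-Wunsch-style global alignment of two token sequences, after which tokens that align exactly are kept and everything else becomes a wildcard slot, giving one flexible pattern rather than many overlapping sub-patterns. The function names `align_tokens` and `shared_pattern`, the `*` wildcard, and the scores (`match=2`, `mismatch=-1`, `gap=-1`) are hypothetical choices, not the authors' method; the error-driven rules and statistical ranking are not modeled.

```python
from typing import List, Optional, Tuple

def align_tokens(a: List[str], b: List[str], match: int = 2, mismatch: int = -1,
                 gap: int = -1) -> Tuple[List[Optional[str]], List[Optional[str]]]:
    """Global (Needleman-Wunsch-style) alignment of two token lists.

    Hypothetical scoring; None marks a gap in the aligned output."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[Optional[str]] = []
    out_b: List[Optional[str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append(None); i -= 1
        else:
            out_a.append(None); out_b.append(b[j - 1]); j -= 1
    return out_a[::-1], out_b[::-1]

def shared_pattern(a: List[str], b: List[str]) -> List[str]:
    """Keep exactly aligned tokens; replace the rest with a '*' wildcard slot."""
    xa, xb = align_tokens(a, b)
    pattern: List[str] = []
    for x, y in zip(xa, xb):
        tok = x if x is not None and x == y else "*"
        if tok == "*" and pattern and pattern[-1] == "*":
            continue  # collapse consecutive wildcards into one flexible gap
        pattern.append(tok)
    return pattern

if __name__ == "__main__":
    print(shared_pattern("kick the bucket".split(), "kick the old bucket".split()))
```

For the two example phrases, the alignment inserts a gap opposite "old" and the sketch returns `['kick', 'the', '*', 'bucket']`, a single gapped pattern covering both variants. The table costs O(nm) time and space for sequences of lengths n and m, which is how dynamic programming avoids enumerating every candidate sub-pattern explicitly.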

Keywords: Text mining, Multiword expression, Multiple sequence alignment, Error-driven rule

Publication history: Available online 12 June 2008.

Official paper link: https://doi.org/10.1016/j.eswa.2008.05.046