Integrating statistical and lexical information for recognizing textual entailments in text
Abstract
Recognizing textual entailment is the task of inferring whether the meaning of a given hypothesis follows from a given text. To achieve better recognition capability, it is usually necessary to employ deep text-processing components such as syntactic parsers and semantic taggers. However, such resources are often unavailable for non-English languages. In this paper, we present a lightweight Chinese textual entailment recognition system that uses part-of-speech information only. We designed two different feature models from the training data and employed the well-known kernel method to learn a model for predicting the test data. One feature set abstracts generic statistics over the text pairs, while the other directly models lexical features based on the traditional bag-of-words model. The proposed feature models not only capture additional statistical information from the data but also enhance prediction capability. To validate this, we conducted experiments on the recent benchmark corpus NTCIR-RITE-2011. The empirical results demonstrate that our method outperforms the other competitors. In terms of accuracy, our method achieves 54.77% for the NTCIR RITE MC task.
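As a rough illustration of the two feature views described in the abstract (pair-level statistics and lexical overlap derived from a bag-of-words view), the sketch below builds a small feature vector for a text/hypothesis pair and trains a kernel-based SVM. This is a minimal sketch, not the authors' implementation; the function name pair_features, the specific feature choices, and the toy data are assumptions made for illustration only.

```python
# Minimal sketch (not the paper's system): statistical + lexical-overlap
# features for a (text, hypothesis) pair, classified with an RBF-kernel SVM.
import numpy as np
from sklearn.svm import SVC

def pair_features(text_tokens, hyp_tokens):
    """Build a small feature vector for one (text, hypothesis) pair."""
    text_set, hyp_set = set(text_tokens), set(hyp_tokens)
    overlap = len(text_set & hyp_set)
    return np.array([
        len(hyp_tokens) / max(len(text_tokens), 1),   # length ratio
        overlap / max(len(hyp_set), 1),               # hypothesis coverage
        overlap / max(len(text_set | hyp_set), 1),    # Jaccard similarity
    ])

# Toy training pairs (tokenized) with labels: 1 = entailment, 0 = none.
pairs = [
    (["the", "cat", "sat", "on", "the", "mat"], ["the", "cat", "sat"], 1),
    (["the", "dog", "barked", "loudly"], ["the", "cat", "slept"], 0),
]
X = np.vstack([pair_features(t, h) for t, h, _ in pairs])
y = np.array([label for _, _, label in pairs])

clf = SVC(kernel="rbf")   # a kernel method, in the spirit of the paper
clf.fit(X, y)
print(clf.predict(X))
```

In a fuller version, the lexical view could be expanded into full bag-of-words vectors (e.g., with a count vectorizer over the hypothesis and text) rather than the three scalar overlap statistics used here.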
Keywords: Textual entailment, Text mining, Natural language processing, Machine learning, Kernel methods
Article history: Received 23 March 2012, Revised 5 November 2012, Accepted 23 November 2012, Available online 10 December 2012.
DOI: https://doi.org/10.1016/j.knosys.2012.11.009