Corpus-based semantic role approach in information retrieval

作者：

Highlights：

•

摘要

In this paper, a method to determine the semantic role for the constituents of a sentence is presented. This method, named SemRol, is a corpus-based approach that uses two different statistical models, conditional Maximum Entropy (ME) Probability Models and the TiMBL program, a Memory-based Learning. It consists of three phases that make use of features using words, lemmas, PoS tags and shallow parsing information. Our method introduces a new phase in the Semantic Role Labeling task which has usually been approached as a two phase procedure consisting of recognition and labeling arguments. From our point of view, firstly the sense of the verbs in the sentences must be disambiguated. That is why depending on the sense of the verb a different set of roles must be considered. Regarding the labeling arguments phase, a tuning procedure is presented. As a result of this procedure one of the best sets of features for the labeling arguments task is detected. With this set, that is different for TiMBL and ME, precisions of 76.71% for TiMBL or 70.55% for ME, are obtained. Furthermore, the semantic role information provided by our SemRol method could be used as an extension of Information Retrieval or Question Answering systems. We propose using this semantic information as an extension of an Information Retrieval system in order to reduce the number of documents or passages retrieved by the system.

论文关键词：Semantic roles,Information Retrieval systems,Corpus-based methods,Feature Selection Procedure,Word sense disambiguation,Shallow parsing,PoS tag,Lemma

论文评审过程：Received 20 June 2006, Accepted 20 June 2006, Available online 17 July 2006.

论文官网地址：https://doi.org/10.1016/j.datak.2006.06.010