Inferring the semantic properties of sentences by mining syntactic parse trees

Authors:

Abstract

We extend the mechanism of logical generalization to syntactic parse trees in an attempt to detect semantic signals that are unobservable at the level of keywords. The generalization of syntactic parse trees, which serves as a measure of syntactic similarity, is defined by the resulting set of maximum common sub-trees and is performed at the level of paragraphs, sentences, phrases and individual words. We analyze the semantic features of this similarity measure and compare it with the semantics of traditional anti-unification of terms. Nearest-neighbor machine learning is then applied to relate a sentence to a semantic class. By using a similarity measure based on syntactic parse trees instead of bag-of-words and keyword-frequency approaches, we expect to detect subtle differences between semantic classes that are otherwise unobservable. The proposed approach is evaluated in three distinct domains in which a lack of semantic information makes the classification of sentences rather difficult. We conclude that implicit indications of semantic classes can be extracted from syntactic structures.
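The pipeline described in the abstract (a tree-based similarity followed by nearest-neighbor classification) can be illustrated with a minimal sketch. It is not the authors' algorithm: instead of computing the full set of maximum common sub-trees, it scores two trees by the size of the largest common top-down fragment, with children aligned by position. The nested-tuple tree encoding, the function names, the example sentences and the class labels `request` and `opinion` are all assumptions made for illustration.

```python
# Simplified stand-in for parse-tree generalization (illustrative only).
# A tree is a nested tuple (label, child, child, ...); a leaf is a word string.

def subtrees(tree):
    """Yield every node of the tree, including word leaves."""
    yield tree
    if isinstance(tree, tuple):
        for child in tree[1:]:
            yield from subtrees(child)

def common_size(a, b):
    """Count matching nodes in the common fragment rooted at a and b.

    Children are aligned by position, which is a simplifying assumption.
    """
    if isinstance(a, str) or isinstance(b, str):
        return 1 if a == b else 0
    if a[0] != b[0]:
        return 0
    return 1 + sum(common_size(x, y) for x, y in zip(a[1:], b[1:]))

def similarity(t1, t2):
    """Size of the largest common fragment over all pairs of nodes."""
    return max(common_size(a, b) for a in subtrees(t1) for b in subtrees(t2))

def classify(tree, labeled_trees):
    """1-nearest-neighbor: return the label of the most similar training tree."""
    return max(labeled_trees, key=lambda pair: similarity(tree, pair[0]))[1]

# Hypothetical hand-built parse trees and class labels.
t_request = ("S", ("NP", "I"), ("VP", ("V", "want"), ("NP", "refund")))
t_opinion = ("S", ("NP", "I"), ("VP", ("V", "like"), ("NP", "camera")))
query = ("S", ("NP", "I"), ("VP", ("V", "want"), ("NP", "replacement")))

training = [(t_request, "request"), (t_opinion, "opinion")]
print(classify(query, training))  # prints "request"
```

Here the query shares seven nodes with the first training tree and six with the second, so the nearest neighbor is the `request` example. A bag-of-words comparison would see only the shared word tokens and would ignore the phrase structure around them.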

Keywords: Machine learning, Constituency parse tree, Search re-ranking

Review history: Received 11 May 2010, Revised 22 July 2012, Accepted 22 July 2012, Available online 28 July 2012.

Official paper page: https://doi.org/10.1016/j.datak.2012.07.003