A co-training framework for searching XML documents

Abstract

In this paper, we study the use of XML-tagged keywords (or simply key-tags) to search for an XML fragment in a collection of XML documents. We present techniques that employ users’ evaluations as feedback to generate an adaptive ranked list of XML fragments as the search results. First, we extend the vector space model as a basis for searching XML fragments. The model examines the relevance between the imposed key-tags and the fragments identified in XML documents, and outputs a ranked result. Second, to deal with the diverse nature of XML documents, we present four XML Rankers (XRs), which have different strengths in terms of similarity, granularity, and ranking features. The XRs are specially tailored to diverse XML documents. We then evaluate the search effectiveness and quality of each tailored XR and propose a meta-XML ranker (MXR) comprising the four XRs. The MXR is trained via a machine learning scheme, which we term the ranking support vector machine (RSVM) in a co-training framework (RSCF). The RSCF takes as input two sets of labelled fragments and feature vectors and generates adaptive rankers as output in a learning process. We show empirically that, with only a small set of training XML fragments, the RSCF is able to improve after a few iterations of the learning process. Finally, we demonstrate that the RSCF-based MXR is able to bring out the strengths of the underlying XRs in order to adapt to the users’ perspectives on the returned search results. Using a set of key-tag queries on a variety of XML documents, we show that the RSCF-based MXR achieves effective precision in its results.
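As a rough illustration of the first step, the sketch below ranks fragments against a key-tag query by cosine similarity in a plain vector space model. It is not the paper's extended model. It assumes that a key-tag can be represented as a (tag, keyword) pair, that each fragment is a bag of such pairs, and that raw term frequency is the weight. The names `cosine`, `rank_fragments`, and the sample data are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_fragments(query, fragments):
    """Return (fragment name, score) pairs, highest score first.

    Assumption: query and fragment contents are lists of (tag, keyword)
    pairs; this representation is illustrative, not taken from the paper.
    """
    q = Counter(query)
    scored = [(name, cosine(q, Counter(terms))) for name, terms in fragments.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical fragments, each reduced to its (tag, keyword) pairs.
fragments = {
    "frag_a": [("title", "xml"), ("title", "search"), ("author", "smith")],
    "frag_b": [("title", "database"), ("author", "smith")],
    "frag_c": [("abstract", "xml")],
}
query = [("title", "xml"), ("author", "smith")]
ranking = rank_fragments(query, fragments)
```

Because the keyword is paired with its tag, "xml" inside an abstract element does not match a query for "xml" inside a title element, which is the basic intuition behind searching with key-tags rather than bare keywords.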

Keywords: XML search query, Key-tag searching, Ranking XML, Meta XML ranker, Co-training

Review history: Received 18 January 2005, Revised 8 September 2005, Accepted 23 January 2006, Available online 17 February 2006.

Official paper URL: https://doi.org/10.1016/j.is.2006.01.001