A co-training framework for searching XML documents

摘要

In this paper, we study the use of XML tagged keywords (or simply key-tags) to search an XML fragment in a collection of XML documents. We present techniques that are able to employ users’ evaluations as feedback and then to generate an adaptive ranked list of XML fragments as the search results. First, we extend the vector space model as a basis to search XML fragments. The model examines the relevance between the imposed key-tags and identified fragments in XML documents, and determines the ranked result as an output. Second, in order to deal with the diversified nature of XML documents, we present four XML Rankers (XRs), which have different strengths in terms of similarity, granularity, and ranking features. The XRs are specially tailored to diversified XML documents. We then evaluate the XML search effectiveness and quality for each tailored XR and propose a meta-XML ranker (MXR) comprising the four XRs. The MXR is trained via a machine learning training scheme, which we term the ranking support vector machine (RSVM) in a co-training framework (RSCF). The RSCF takes as input two sets of labelled fragments and feature vectors and then generates as output adaptive rankers in a learning process. We show empirically that, with only a small set of training XML fragments, the RSCF is able to improve after a few iterations in the learning process. Finally, we demonstrate that the RSCF-based MXR is able to bring out the strengths of the underlying XRs in order to adapt the users’ perspectives on the returned search results. By using a set of key-tag queries on a variety of XML documents, we show that the precision of the result of the RSCF-based MXR is effective.