A relevance model for a data warehouse contextualized with documents

Authors:

Abstract

This paper presents a relevance model for ranking the facts of a data warehouse that are described in a set of documents retrieved with an information retrieval (IR) query. The model is based on language modeling and relevance modeling techniques. We estimate the relevance of a fact as the probability of finding its dimension values and the query keywords in the documents that are relevant to the query. The model is the core of the so-called contextualized warehouse, a new kind of decision support system that combines structured data sources and document collections. The paper evaluates the relevance model with the Wall Street Journal (WSJ) TREC test subcollection and a self-constructed fact database.
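The abstract does not give the model's formulas, so the following is only a minimal sketch of the general idea, not the paper's actual estimator. It assumes a unigram language model per document with Jelinek-Mercer smoothing, treats each dimension value as a single token, and scores a fact by summing, over the retrieved documents, the probability of its dimension values times the probability of the query keywords. The names `term_prob`, `rank_facts`, and the parameter `lam` are hypothetical.

```python
from collections import Counter


def term_prob(term, doc_counts, doc_len, coll_counts, coll_len, lam=0.5):
    """Smoothed P(term | document); Jelinek-Mercer smoothing is an assumption."""
    p_doc = doc_counts[term] / doc_len if doc_len else 0.0
    p_coll = coll_counts[term] / coll_len if coll_len else 0.0
    return lam * p_doc + (1 - lam) * p_coll


def rank_facts(facts, docs, query, lam=0.5):
    """Rank facts by sum over docs of P(dimension values | d) * P(query | d).

    facts: dict mapping a fact id to a list of single-token dimension values.
    docs:  list of document strings (the documents retrieved for the query).
    query: list of query keywords.
    Returns (fact id, normalized score) pairs, highest score first.
    """
    tokenized = [d.lower().split() for d in docs]
    coll_counts = Counter(t for toks in tokenized for t in toks)
    coll_len = sum(coll_counts.values())

    scores = {fact_id: 0.0 for fact_id in facts}
    for toks in tokenized:
        counts = Counter(toks)
        doc_len = len(toks)

        # Probability of generating all query keywords from this document.
        p_query = 1.0
        for keyword in query:
            p_query *= term_prob(keyword.lower(), counts, doc_len,
                                 coll_counts, coll_len, lam)

        # Probability of generating the fact's dimension values from it.
        for fact_id, dim_values in facts.items():
            p_fact = 1.0
            for value in dim_values:
                p_fact *= term_prob(value.lower(), counts, doc_len,
                                    coll_counts, coll_len, lam)
            scores[fact_id] += p_fact * p_query

    # Normalize so the scores form a distribution over the candidate facts.
    total = sum(scores.values())
    if total > 0:
        scores = {fact_id: s / total for fact_id, s in scores.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Under these assumptions, a fact whose dimension values co-occur with the query keywords in the same documents receives a higher score than a fact whose values appear only in documents that match the query poorly.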

Keywords: Relevance-based language model, Data warehouse, Text-rich document collection

Review history: Received 24 July 2008, Revised 17 October 2008, Accepted 9 November 2008, Available online 9 January 2009.

Paper URL: https://doi.org/10.1016/j.ipm.2008.11.001