Multidimensional analysis model for a document warehouse that includes textual measures

Authors:

Highlights:

• A new multidimensional model that integrates text through three textual measures.

• The granularity of the proposed model is at the document level.

• The model retrieves topics according to the dimensions involved in the query.

• The model retrieves documents according to the dimensions involved in the query.

• The model retrieves the words or terms that characterize each topic.
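The highlights describe three kinds of queries over a document-level model. The abstract does not define the three textual measures, so what follows is only an illustrative sketch under assumptions. Each hypothetical fact row is one document with two dimension attributes (`year`, `venue`) and a single stand-in measure: the document's topic weights, read as P(topic | document). `TOPIC_WORDS` holds hypothetical P(word | topic) values. `topics_by_dimension`, `documents_for_topic` and `top_words` are illustrative helpers, not the authors' implementation, and all data values are invented.

```python
from collections import defaultdict

# Hypothetical fact rows at document granularity: dimension attributes plus a
# stand-in textual measure, the topic weights P(topic | document).
FACTS = [
    {"doc": "d1", "year": 2013, "venue": "DSS", "topics": {"warehousing": 0.7, "text mining": 0.3}},
    {"doc": "d2", "year": 2013, "venue": "KDD", "topics": {"warehousing": 0.2, "text mining": 0.8}},
    {"doc": "d3", "year": 2014, "venue": "DSS", "topics": {"warehousing": 0.5, "text mining": 0.5}},
]

# Hypothetical topic-word weights, read as P(word | topic).
TOPIC_WORDS = {
    "warehousing": {"olap": 0.4, "cube": 0.35, "dimension": 0.25},
    "text mining": {"term": 0.5, "corpus": 0.3, "olap": 0.2},
}


def topics_by_dimension(facts, dim):
    """Roll up the topic measure: average topic weight per value of `dim`."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in facts:
        key = row[dim]
        counts[key] += 1
        for topic, weight in row["topics"].items():
            sums[key][topic] += weight
    return {key: {t: w / counts[key] for t, w in topic_sums.items()}
            for key, topic_sums in sums.items()}


def documents_for_topic(facts, topic, min_weight=0.5, **filters):
    """Slice on dimension values, then keep documents strongly tied to `topic`."""
    return [row["doc"] for row in facts
            if all(row[d] == v for d, v in filters.items())
            and row["topics"].get(topic, 0.0) >= min_weight]


def top_words(topic, k=2):
    """Return the k highest-weighted words of a topic."""
    words = TOPIC_WORDS[topic]
    return sorted(words, key=words.get, reverse=True)[:k]


if __name__ == "__main__":
    print(topics_by_dimension(FACTS, "venue"))
    print(documents_for_topic(FACTS, "warehousing", venue="DSS"))
    print(top_words("warehousing"))
```

Averaging is only one plausible roll-up for a probability-valued measure. It keeps each aggregated topic profile summing to 1, whereas a plain sum would grow with the number of documents in each cell.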

Abstract

Data warehouses and On-Line Analytical Processing (OLAP) tools together permit multidimensional analysis of structured data. However, as business systems are increasingly required to handle substantial quantities of unstructured textual information, a similarly effective means of analysis is needed. To manage unstructured text data stored in data warehouses, a new multidimensional analysis model is proposed that includes textual measures as well as a topic hierarchy. In this model, the textual measures that associate topics with text documents are generated by Probabilistic Latent Semantic Analysis (PLSA), while the hierarchy is created automatically using a clustering algorithm. Documents can then be queried using OLAP tools. The model was evaluated from two viewpoints: query execution time and user satisfaction. Execution time was evaluated on a collection of scientific articles using two query types; user satisfaction (with query time and ease of use) was evaluated using statistical frequency and multivariate analyses. Encouragingly, as the number of documents increases, query time grows linearly rather than exponentially. In addition, user acceptance of the model increased with use, and its visualization was also well received by users.
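The abstract names two automatic steps. PLSA produces the measures linking topics to documents, and a clustering algorithm builds the topic hierarchy. What follows is a minimal pure-Python sketch under assumptions. `plsa` fits the standard asymmetric PLSA model, P(w | d) = Σ_z P(z | d) P(w | z), by expectation-maximization on a document-term count matrix. `topic_hierarchy` uses naive average-linkage agglomerative clustering of the topic-word rows as a stand-in, because the abstract does not say which clustering algorithm or distance the authors used. Both function names are hypothetical.

```python
import math
import random


def normalize(row):
    """Scale a non-negative list to sum to 1 (left unchanged if it sums to 0)."""
    total = sum(row)
    return [x / total for x in row] if total > 0 else list(row)


def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit P(z | d) and P(w | z) by EM on a document-term count matrix (list of lists)."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [normalize([rng.random() for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() for _ in range(n_words)]) for _ in range(n_topics)]
    for _ in range(n_iter):
        exp_w_z = [[0.0] * n_words for _ in range(n_topics)]
        exp_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z | d, w) is proportional to P(z | d) * P(w | z)
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                if total == 0:
                    continue
                for z in range(n_topics):
                    # Expected count of (d, w) attributed to topic z
                    r = n * joint[z] / total
                    exp_w_z[z][w] += r
                    exp_z_d[d][z] += r
        # M-step: renormalize the expected counts into probability distributions
        p_w_z = [normalize(row) for row in exp_w_z]
        p_z_d = [normalize(row) for row in exp_z_d]
    return p_z_d, p_w_z


def topic_hierarchy(p_w_z):
    """Naive average-linkage agglomerative clustering of topics by their word distributions.

    Returns the merge tree as nested tuples of topic indices.
    """
    clusters = {i: [i] for i in range(len(p_w_z))}  # cluster id -> member topics
    trees = {i: i for i in range(len(p_w_z))}  # cluster id -> subtree
    next_id = len(p_w_z)
    while len(clusters) > 1:
        ids = list(clusters)
        best = None
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                # Average Euclidean distance over all cross-cluster topic pairs
                pairs = [(x, y) for x in clusters[a] for y in clusters[b]]
                dist = sum(math.dist(p_w_z[x], p_w_z[y]) for x, y in pairs) / len(pairs)
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        trees[next_id] = (trees.pop(a), trees.pop(b))
        next_id += 1
    return trees[next_id - 1]
```

For example, `p_z_d, p_w_z = plsa(counts, n_topics=2)` returns one topic distribution per document (a candidate document-level measure) and one word distribution per topic. Passing `p_w_z` to `topic_hierarchy` returns nested tuples of topic indices, with the earliest merges nested deepest. Euclidean distance is used only for simplicity; a divergence designed for probability distributions would be a natural alternative.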

Keywords: Document warehouse, OLAP, Textual measures, Text warehouse

Article history: Received 15 October 2012, Revised 6 February 2015, Accepted 8 February 2015, Available online 16 February 2015.

Official article page: https://doi.org/10.1016/j.dss.2015.02.008