A characterization of hierarchical computable distance functions for data warehouse systems

Authors:

Highlights:

• A characterization of hierarchical and categorical attributes

• A characterization of hierarchical computable distances

• A probabilistic model that defines the discriminant capabilities of the distance functions

• A set of experimental results that provide an empirical evaluation of the proposed model

Abstract

A data warehouse is a large multidimensional repository used for the statistical analysis of historical data. In a data warehouse, events are modeled as multidimensional cubes in which cells store numerical indicators, while dimensions describe the events from different points of view. Dimensions are typically described at different levels of detail through hierarchies of concepts. Computing the distance (or similarity) between two cells has several applications in this domain. In this context, distance is typically based on the least common ancestor of the attribute values, but the effectiveness of such distance functions varies with the structure and the number of the hierarchies involved. In this paper we propose a characterization of hierarchy types based on their structure and expressiveness, provide a characterization of the different types of distance functions, and verify their effectiveness on different types of hierarchies in terms of their intrinsic discriminant capacity.
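
To make the idea of a distance based on the least common ancestor concrete, here is a minimal sketch in Python. It is not the formulation used in the paper: the hierarchies, the `PARENT` map, and the functions `path_to_root`, `lca_distance`, and `cell_distance` are all hypothetical names chosen for illustration. The sketch assumes balanced hierarchies (all leaves at the same depth) and simply counts how many levels must be climbed before two attribute values share an ancestor.

```python
# Hypothetical child -> parent map covering two toy dimensions
# (geography and product). Not taken from the paper.
PARENT = {
    "Bologna": "Italy",
    "Rome": "Italy",
    "Paris": "France",
    "Italy": "Europe",
    "France": "Europe",
    "Milk": "Dairy",
    "Cheese": "Dairy",
    "Dairy": "Food",
}


def path_to_root(value):
    """Return the list [value, parent, grandparent, ...] up to the root."""
    path = [value]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lca_distance(a, b):
    """Levels climbed from `a` to reach the least common ancestor of a and b.

    Assumes a balanced hierarchy, so the result is the same when
    measured from `b`.
    """
    ancestors_b = set(path_to_root(b))
    for steps, node in enumerate(path_to_root(a)):
        if node in ancestors_b:
            return steps
    raise ValueError(f"{a!r} and {b!r} share no ancestor")


def cell_distance(cell_a, cell_b):
    """Sum the per-dimension LCA distances of two cells (tuples of values)."""
    return sum(lca_distance(a, b) for a, b in zip(cell_a, cell_b))


print(lca_distance("Bologna", "Rome"))    # siblings under Italy
print(lca_distance("Bologna", "Paris"))   # meet only at Europe
print(cell_distance(("Bologna", "Milk"), ("Paris", "Cheese")))
```

Summing per-dimension distances is only one possible way to combine hierarchies. As the abstract notes, how well such a function discriminates between cells depends on the structure and number of the hierarchies involved.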

Keywords: Testing, Similarity measures, Hierarchical data, Categorical data

Article history: Received 22 August 2013, Revised 10 March 2014, Accepted 30 March 2014, Available online 12 April 2014.

Official paper URL: https://doi.org/10.1016/j.dss.2014.03.011