HierarchyScan: A Hierarchical Algorithm for Similarity Search in Databases Consisting of Long Sequences

作者:Chung-Sheng Li, Philip S. Yu, Vittorio Castelli

摘要

In this paper, a hierarchical algorithm, HierarchyScan, is proposed to efficiently locate one-dimensional subsequences within a collection of sequences with arbitrary length. The proposed algorithm performs correlation between the stored sequences and the template pattern in the transformed domain to identify subsequences in a scale- and phase-independent fashion. This is in contrast to those approaches based on the computation of Euclidean distance in the transformed domain. In the proposed hierarchical algorithm, the transformed domain representation of each original sequence is divided into multiple groups of coefficients. The matching is performed hierarchically from the group with the greatest filtering capability to the group with the lowest filtering capability. Only those subsequences whose maximum correlation value is higher than a predefined threshold will be selected for additional screening. This approach is compared to the sequential scanning and an order-of-magnitude speedup is observed.

论文关键词:Similarity search, template matching, time series, correlation, correlation coefficient, content-based retrieval, hierarchical search

论文评审过程:

论文官网地址:https://doi.org/10.1007/BF03325099