NetDPO: (delta, gamma)-approximate pattern matching with gap constraints under one-off condition

作者:Yan Li, Lei Yu, Jing Liu, Lei Guo, Youxi Wu, Xindong Wu

摘要

Approximate pattern matching not only is more general than exact pattern matching, but also allows some data noise. Most of them adopt the Hamming distance to measure similarity, which indicates the number of different characters in two sequences, but it cannot reflect the approximation between two characters. This paper addresses the approximate pattern matching with a local distance no larger than δ and a global distance no larger than γ, which is named Delta and gamma Pattern matching with gap constraints under One-off condition (DPO). First, we show that the problem is an NP-Hard problem. Therefore, we construct a heuristic algorithm named approximate Nettree for DPO (NetDPO), which transforms the problem into an approximate Nettree based on δ distance which is a specially designed data structure. Then, NetDPO calculates the number of paths that reach the roots within γ distance. To find the maximal occurrences, we employ the rightmost parent strategy and the optimal parent strategy to select the better occurrence which can minimize the influence after removing the occurrence. Iterate this process until there are no occurrences. Finally, we analyze the time and space complexities of NetDPO. Extensive experimental results verify the superiority of the proposed algorithm.

论文关键词:Pattern matching, Pattern discovery, Approximate pattern matching, Gap constraint, Nettree structure

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10489-021-03000-2