DFSP: a Depth-First SPelling algorithm for sequential pattern mining of biological sequences

Authors: Vance Chiang-Chi Liao, Ming-Syan Chen

Abstract

Scientific progress in recent years has led to the generation of huge amounts of biological data, most of which remains unanalyzed. Mining these data may provide insights into various realms of biology, for example by finding co-occurring biosequences, which are essential for biological data mining and analysis. Data mining techniques such as sequential pattern mining may reveal implicit, meaningful patterns among DNA or protein sequences. If biologists hope to unlock the potential of sequential pattern mining in their field, it is necessary to move away from traditional sequential pattern mining algorithms, because they have difficulty handling the small number of distinct items and the long sequences found in biological data such as gene and protein sequences. To address this problem, we propose the Depth-First SPelling (DFSP) algorithm for mining sequential patterns in biological sequences. The algorithm is faster than PrefixSpan, its leading competitor, and it is superior to other sequential pattern mining algorithms for biological sequences.
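
To make the task concrete, the following is a minimal sketch of generic depth-first sequential pattern mining over DNA-like strings. It is not the DFSP algorithm, whose details are not given in this abstract; it is a simple prefix-growth search in the spirit of projection-based miners. The function name `mine_subsequences` and the parameters `min_support` and `max_len` are assumptions introduced for illustration only. A pattern's support is taken to be the number of input sequences that contain it as a (possibly gapped) subsequence.

```python
from collections import defaultdict


def mine_subsequences(sequences, min_support, max_len=3):
    """Illustrative depth-first pattern growth (NOT the authors' DFSP).

    Returns a dict mapping each frequent pattern to its support, i.e. the
    number of input sequences containing it as a gapped subsequence.
    """
    results = {}

    def grow(prefix, projected):
        # projected: list of (sequence index, position right after the
        # leftmost match of `prefix` in that sequence)
        if len(prefix) == max_len:
            return
        # For each candidate item, record the earliest position at which it
        # occurs after the current match, once per sequence.
        next_pos = defaultdict(dict)
        for sid, start in projected:
            seq = sequences[sid]
            for pos in range(start, len(seq)):
                item = seq[pos]
                if sid not in next_pos[item]:
                    next_pos[item][sid] = pos + 1
        # Extend the prefix by every frequent item and recurse depth-first.
        for item in sorted(next_pos):
            occurrences = next_pos[item]
            if len(occurrences) >= min_support:
                pattern = prefix + item
                results[pattern] = len(occurrences)
                grow(pattern, sorted(occurrences.items()))

    grow("", [(i, 0) for i in range(len(sequences))])
    return results
```

For the toy input `["ACGT", "ACT", "AGT"]` with `min_support=3`, this sketch returns `{"A": 3, "AT": 3, "T": 3}`. With only four distinct items and long sequences, almost every short pattern is frequent, so the search tree grows quickly; this is the setting the abstract identifies as difficult for traditional algorithms.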

Keywords: Sequential patterns, Pattern mining, Data mining, Bioinformatics

Paper URL: https://doi.org/10.1007/s10115-012-0602-x