Efficient colossal pattern mining in high dimensional datasets

Abstract

Frequent pattern mining is considered an important data mining problem and has been studied extensively over the last decade. Many algorithms have been developed for frequent pattern mining on traditional commercial datasets, which usually contain a huge number of transactions but only a small number of items per transaction. The advent of bioinformatics has given rise to a new kind of dataset, called high dimensional, which is characterized by a small number of transactions and a large number of items per transaction. The running time of traditional algorithms grows exponentially with the average transaction length, so these algorithms are unsuitable for high dimensional datasets. Moreover, mining algorithms applied to high dimensional datasets produce a very large output set that includes many small and mid-sized frequent patterns, which carry no useful information for scientists. Colossal pattern mining has been proposed as a way to reduce the size of this output set: only very large (colossal) patterns are extracted, and because the small and mid-sized patterns are not mined, colossal pattern mining algorithms also run faster. In this paper we present an efficient vertical, bottom-up method for mining frequent colossal patterns in high dimensional datasets. Our algorithm uses a bit matrix to compress the dataset and make it easy to process during mining. Our experimental results show that the algorithm achieves very good mining efficiency on various input datasets. Furthermore, our performance study shows that it substantially outperforms the best previous algorithms.
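The abstract names two ingredients: a bit matrix that compresses the dataset, and a vertical, bottom-up search. The sketch below is not the authors' algorithm, since the abstract does not give its details. It is a minimal illustration under our own assumptions. The dataset is packed into Python integers used as bit rows and bit columns. "Bottom-up" is read as enumerating sets of transactions (rows) rather than sets of items, which suits data with few rows and many items. A pattern counts as colossal simply when it has at least `min_size` items. The function names and the `min_size` threshold are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Illustrative sketch only, NOT the paper's algorithm. Assumption: a pattern is
# "colossal" when it has at least `min_size` items (hypothetical threshold).


def build_bit_matrix(transactions: Sequence[Sequence[str]],
                     items: List[str]) -> Tuple[List[int], List[int]]:
    """Pack a 0/1 transaction-by-item matrix into Python ints.

    rows[r] has bit j set iff transaction r contains items[j] (horizontal view);
    cols[j] has bit r set iff items[j] occurs in transaction r (vertical view).
    """
    index = {item: j for j, item in enumerate(items)}
    rows = [0] * len(transactions)
    cols = [0] * len(items)
    for r, transaction in enumerate(transactions):
        for item in transaction:
            j = index[item]
            rows[r] |= 1 << j
            cols[j] |= 1 << r
    return rows, cols


def popcount(x: int) -> int:
    return bin(x).count("1")


def support(pattern: int, cols: List[int], n_rows: int) -> int:
    """Count rows containing every item of `pattern` by ANDing item columns."""
    tids = (1 << n_rows) - 1  # start with all rows selected
    j = 0
    while pattern:
        if pattern & 1:
            tids &= cols[j]
        pattern >>= 1
        j += 1
    return popcount(tids)


def mine_colossal(transactions: Sequence[Sequence[str]],
                  min_sup: int, min_size: int) -> Dict[FrozenSet[str], int]:
    """Closed patterns with support >= min_sup and at least min_size items.

    Bottom-up row enumeration: rows are added one at a time and the running
    intersection of their item bitmasks is the candidate pattern. Because an
    intersection can only shrink as rows are added, a branch is pruned as soon
    as it drops below min_size items.
    """
    assert min_sup >= 1
    items = sorted({item for t in transactions for item in t})
    rows, cols = build_bit_matrix(transactions, items)
    n = len(rows)
    found: Dict[int, int] = {}

    def grow(start: int, common: int, depth: int) -> None:
        # `common` = items shared by the `depth` rows chosen on this path.
        if depth >= min_sup and common not in found:
            found[common] = support(common, cols, n)  # exact support via columns
        for r in range(start, n):
            nxt = common & rows[r]
            if popcount(nxt) >= min_size:  # prune: adding rows never grows it
                grow(r + 1, nxt, depth + 1)

    grow(0, (1 << len(items)) - 1, 0)
    return {frozenset(items[j] for j in range(len(items)) if p >> j & 1): s
            for p, s in found.items()}
```

For example, with the transactions `abcde`, `abcdf`, `abcef` and `gh`, `min_sup=2` and `min_size=3`, the search returns `{a,b,c}` with support 3 and `{a,b,c,d}`, `{a,b,c,e}`, `{a,b,c,f}` with support 2. The row `gh` is pruned immediately because it has fewer than three items. Enumerating rows makes the search space depend on the number of transactions rather than on the much larger number of items, which is the situation the abstract describes for high dimensional data.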

Keywords: Frequent patterns mining, Colossal patterns, Bottom-up mining, Bit matrix, High dimensional dataset

Article history: Received 18 June 2011, Revised 4 March 2012, Accepted 4 March 2012, Available online 15 March 2012.

Official page: https://doi.org/10.1016/j.knosys.2012.03.003