Finding key attribute subset in dataset for outlier detection

Authors:

Abstract

Detecting outliers in high-dimensional datasets has important applications in many fields, yet its high time cost is likely to hinder practical use. It therefore makes sense to build an efficient method for finding meaningful outliers and analyzing their intensional knowledge. In this paper, we use rough set concepts to construct a method for outlying reduction, built on an outlier detection and analysis system. By defining outlying partition similarity, we can mine outliers on a key attribute subset rather than on the full attribute set of the dataset, provided that the outlying partitions produced on the two are sufficiently similar. To this end, we propose a novel method for finding the key attribute subset of a dataset: it first finds all outliers on the full attribute set, then searches through all outlying attribute subsets for these points, and finally determines the key attribute subset according to the similarity between outlying partitions. Experiments show that our method finds the key attribute subset more efficiently than previous methods, thereby improving the feasibility of outlier detection.
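
The general idea of reducing to a key attribute subset can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the rough-set definitions, so the sketch substitutes a simple k-nearest-neighbour distance score for the outlier detector, uses Jaccard similarity between outlier sets as a stand-in for outlying partition similarity, and searches attribute subsets by brute force. The names `knn_outliers`, `jaccard`, and `key_attribute_subset`, the parameters `k`, `top_n`, and `threshold`, and the sample data are all hypothetical.

```python
from itertools import combinations
from math import dist


def knn_outliers(rows, attrs, k=2, top_n=2):
    """Indices of the top_n rows with the largest distance to their
    k-th nearest neighbour, measured only on the attributes in attrs.
    (Assumed stand-in detector, not the one used in the paper.)"""
    proj = [[row[a] for a in attrs] for row in rows]
    scores = []
    for i, p in enumerate(proj):
        d = sorted(dist(p, q) for j, q in enumerate(proj) if j != i)
        scores.append((d[k - 1], i))
    scores.sort(reverse=True)
    return {i for _, i in scores[:top_n]}


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed proxy for the paper's
    outlying partition similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def key_attribute_subset(rows, n_attrs, threshold=1.0, **detector_args):
    """Smallest attribute subset whose outlier set is at least
    `threshold`-similar to the outlier set on the full attribute set.
    Brute-force search, for illustration only."""
    full = tuple(range(n_attrs))
    reference = knn_outliers(rows, full, **detector_args)
    for size in range(1, n_attrs):
        for attrs in combinations(full, size):
            candidate = knn_outliers(rows, attrs, **detector_args)
            if jaccard(candidate, reference) >= threshold:
                return attrs
    return full  # no smaller subset reproduces the outliers


# Hypothetical data: only attribute 0 separates the last two rows.
rows = [
    (1.0, 5.0, 0.1),
    (1.1, 5.1, 0.2),
    (0.9, 4.9, 0.1),
    (1.0, 5.0, 0.2),
    (1.2, 5.1, 0.1),
    (9.0, 5.0, 0.2),
    (-7.0, 4.9, 0.1),
]
print(key_attribute_subset(rows, 3, k=2, top_n=2))  # (0,)
```

In this toy dataset, the outliers found on attribute 0 alone match those found on all three attributes, so later detection runs could use that single attribute. The brute-force loop is exponential in the number of attributes; according to the abstract, the paper instead searches only the outlying attribute subsets of the outliers already found.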

Keywords: Outlier detection, Key attribute subset, Outlying reduction, Data mining, High-dimensional dataset

Article history: Received 22 March 2010, Revised 10 September 2010, Accepted 10 September 2010, Available online 19 September 2010.

Official paper URL: https://doi.org/10.1016/j.knosys.2010.09.003