Some issues about outlier detection in rough set theory

Authors:

Highlights:

Abstract

“One person’s noise is another person’s signal” (Knorr, E., Ng, R. (1998). Algorithms for mining distance-based outliers in large datasets. In Proceedings of the 24th VLDB conference, New York (pp. 392–403)). In recent years, much attention has been given to the problem of outlier detection, whose aim is to find outliers: objects that behave in an unexpected way or have abnormal properties. Detecting such outliers matters in many applications, such as uncovering criminal activities in electronic commerce, computer intrusion attacks, terrorist threats, and agricultural pest infestations, and it is critically important in an information-based society. In this paper, we discuss some issues about outlier detection in rough set theory, which emerged about 20 years ago and is now a rapidly developing branch of artificial intelligence and soft computing. First, we propose a novel definition of outliers in the information systems of rough set theory, namely sequence-based outliers, and give an algorithm to find such outliers. The effectiveness of the sequence-based method for outlier detection is demonstrated on two publicly available databases. Second, we introduce traditional distance-based outlier detection to rough set theory and discuss how distance metrics should be defined for distance-based outlier detection in this setting.
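As an illustration of the distance-based notion mentioned above, the following is a minimal sketch of the DB(p, D) outlier definition from the cited Knorr and Ng paper (an object is an outlier if at least a fraction p of the objects lie farther than D from it), applied to a table of categorical attribute values such as a rough-set information system. The distance used here, the count of attributes on which two objects differ (an overlap metric), is an assumption made for illustration and is not necessarily the metric defined in the paper; the names `overlap_distance` and `db_outliers` and the toy table are likewise hypothetical.

```python
from typing import Hashable, List, Sequence

Row = Sequence[Hashable]


def overlap_distance(x: Row, y: Row) -> int:
    """Count the attributes on which two objects disagree (assumed metric)."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return sum(a != b for a, b in zip(x, y))


def db_outliers(table: Sequence[Row], p: float, d: float) -> List[int]:
    """Return indices of DB(p, d) outliers by brute-force pairwise comparison.

    An object is reported when at least a fraction p of the objects in
    the table lie at distance greater than d from it.
    """
    n = len(table)
    outliers = []
    for i, x in enumerate(table):
        # Number of other objects that are farther than d from x.
        far = sum(
            1
            for j, y in enumerate(table)
            if j != i and overlap_distance(x, y) > d
        )
        if far >= p * n:
            outliers.append(i)
    return outliers


# Toy information system: each row is an object, each column an attribute.
toy_table = [
    ("a", "x", "1"),
    ("a", "x", "1"),
    ("a", "x", "2"),
    ("a", "y", "1"),
    ("b", "z", "3"),
]

if __name__ == "__main__":
    print(db_outliers(toy_table, p=0.75, d=1))  # prints [4]
```

The brute-force loop is quadratic in the number of objects, which is acceptable for a sketch; the last row is flagged because it differs from every other object on all three attributes.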

Keywords: Outlier detection, Rough sets, Distance metric, KDD

Review history: Available online 14 June 2008.

Official paper URL: https://doi.org/10.1016/j.eswa.2008.06.019