Capabilities of outlier detection schemes in large datasets, framework and methodologies

作者:Jian Tang, Zhixiang Chen, Ada Waichee Fu, David W. Cheung

摘要

Outlier detection is concerned with discovering exceptional behaviors of objects. Its theoretical principle and practical implementation lay a foundation for some important applications such as credit card fraud detection, discovering criminal behaviors in e-commerce, discovering computer intrusion, etc. In this paper, we first present a unified model for several existing outlier detection schemes, and propose a compatibility theory, which establishes a framework for describing the capabilities for various outlier formulation schemes in terms of matching users'intuitions. Under this framework, we show that the density-based scheme is more powerful than the distance-based scheme when a dataset contains patterns with diverse characteristics. The density-based scheme, however, is less effective when the patterns are of comparable densities with the outliers. We then introduce a connectivity-based scheme that improves the effectiveness of the density-based scheme when a pattern itself is of similar density as an outlier. We compare density-based and connectivity-based schemes in terms of their strengths and weaknesses, and demonstrate applications with different features where each of them is more effective than the other. Finally, connectivity-based and density-based schemes are comparatively evaluated on both real-life and synthetic datasets in terms of recall, precision, rank power and implementation-free metrics.

论文关键词:Outlier detection, Scheme capability, Distance-based outliers, Density-based outliers, Connectivity-based outliers, Performance metrics

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10115-005-0233-6