Estimation of incomplete values in heterogeneous attribute large datasets using discretized Bayesian max–min ant colony optimization

Authors: Sivaraj Rajappan, DeviPriya Rangasamy

Abstract

Datasets keep growing in size, and missing values in them pose a serious threat to data analysis. Although researchers have developed various techniques for handling missing values in different kinds of datasets, little effort has gone into handling missing values in mixed (heterogeneous) attributes in large datasets. This paper proposes novel strategies for this problem. The significant attributes (covariates) required for imputation are first selected using the gain ratio measure, which reduces computational complexity. Because analyzing continuous attributes during imputation is complex, they are first discretized using a novel method called Bayesian classifier-based discretization. The missing values are then imputed using the Bayesian max–min ant colony optimization algorithm, which hybridizes ant colony optimization (ACO) with Bayesian principles. A local search technique is also introduced into the ACO implementation to improve its exploitative capability. The proposed methodology is applied to real datasets with missing rates ranging from 5 to 50%, and the experimental results show that the proposed discretization and imputation algorithms produce better results than existing methods.
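The covariate-selection step mentioned in the abstract relies on the gain ratio measure. As a rough illustration only, the sketch below computes the textbook gain ratio (information gain divided by split information) for discrete attributes. It is not the authors' implementation: the function names `entropy` and `gain_ratio`, the toy data, and the assumption that only fully observed rows are passed in are all hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(attribute, target):
    """Textbook gain ratio of a discrete attribute with respect to a target.

    Assumption: both sequences have equal length and contain no missing
    entries (e.g. only complete rows are passed in).
    """
    total = len(target)
    # Group the target values by the attribute value they co-occur with.
    groups = {}
    for a, t in zip(attribute, target):
        groups.setdefault(a, []).append(t)
    # Expected entropy of the target after splitting on the attribute.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(target) - conditional
    split_info = entropy(attribute)
    # A constant attribute has zero split information; treat it as useless.
    return info_gain / split_info if split_info > 0 else 0.0


# Toy example: rank two hypothetical candidate covariates for one target.
target = [0, 0, 1, 1]
candidates = {
    "informative": ["a", "a", "b", "b"],
    "uninformative": ["a", "b", "a", "b"],
}
ranking = sorted(candidates,
                 key=lambda name: gain_ratio(candidates[name], target),
                 reverse=True)
```

In a setup like this, attributes with the highest gain ratio with respect to the attribute being imputed would be kept as covariates. How the paper thresholds or ranks them is not stated in the abstract.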

Keywords: Ant colony optimization, Bayesian principles, Ignorable missingness, Non-ignorable missingness, Discretization

Paper URL: https://doi.org/10.1007/s10115-017-1123-4