Optimal bandwidth selection for re-substitution entropy estimation

Authors:

Highlights:

Abstract

This study presents a new fusion approach for selecting an optimal bandwidth for the re-substitution entropy estimator (RE). Approximating continuous entropy through density estimation produces two kinds of error: entropy estimation error (type-I error) and density estimation error (type-II error). Both depend strongly on the undetermined bandwidth. First, experiments on 24 typical probability distributions show that the optimal bandwidths associated with these two errors are inconsistent with each other. Second, separate error measures are derived for the type-I and type-II errors. The proposed method, called REI+II, rests on a trade-off between the two errors: they are fused, and the optimal bandwidth for REI+II is solved from the fused criterion. Finally, experimental comparisons verify the estimation performance of the proposed strategy. Discretization has traditionally been regarded as a necessary preprocessing step for computing continuous entropy, so the nine most commonly used unsupervised discretization methods are included and their computational performance is compared with that of REI+II. Five popular entropy estimators are also included in the comparison: the splitting data estimator (SDE), cross-validation estimator (CVE), m-spacing estimator (mSE), mn-spacing estimator (mnSE), and nearest neighbor distance estimator (NNDE). Simulation studies on 24 typical density distributions show that REI+II achieves better estimation performance than the other methods considered. The comparative results also reveal how the different entropy estimation methods behave.
The empirical analysis demonstrates that REI+II is less sensitive to the data and generalizes better as a way of estimating continuous entropy. REI+II makes it possible to derive a practical optimal bandwidth from a given dataset.
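
To make the bandwidth dependence concrete, the following is a minimal sketch of a re-substitution entropy estimate, assuming a Gaussian kernel and one-dimensional data. The abstract does not give the REI+II criterion, so the sketch does not implement it. The function names (`kde_at`, `resubstitution_entropy`, `silverman_bandwidth`) are hypothetical, and Silverman's rule of thumb is used only as a familiar stand-in bandwidth for comparison with a deliberately small and a deliberately large one.

```python
import math
import random
import statistics


def kde_at(point, sample, h):
    """Gaussian kernel density estimate at `point` with bandwidth h."""
    norm = len(sample) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((point - xj) / h) ** 2) for xj in sample) / norm


def resubstitution_entropy(sample, h):
    """Re-substitution estimate: -(1/n) * sum_i log p_hat(x_i).

    The density estimate is evaluated at the same points it was built from.
    """
    return -sum(math.log(kde_at(xi, sample, h)) for xi in sample) / len(sample)


def silverman_bandwidth(sample):
    """Silverman's rule of thumb; a stand-in, NOT the paper's REI+II bandwidth."""
    q1, _, q3 = statistics.quantiles(sample, n=4)
    spread = min(statistics.stdev(sample), (q3 - q1) / 1.34)
    return 0.9 * spread * len(sample) ** (-0.2)


random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(500)]
true_entropy = 0.5 * math.log(2.0 * math.pi * math.e)  # entropy of N(0, 1)

# Compare an undersmoothing, a rule-of-thumb, and an oversmoothing bandwidth.
for h in (0.05, silverman_bandwidth(sample), 1.0):
    estimate = resubstitution_entropy(sample, h)
    print(f"h={h:.3f}  estimate={estimate:.4f}  error={estimate - true_entropy:+.4f}")
```

Running the loop prints the estimate and its signed error for each bandwidth, which illustrates why the choice of bandwidth matters for the entropy estimate and not only for the density estimate.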

Keywords: Information entropy, Re-substitution entropy estimator, Probability density estimation, Optimal bandwidth, Integrated mean square error, Discretization

Publication history: Available online 26 October 2012.

Paper URL: https://doi.org/10.1016/j.amc.2012.08.056