A systematic mapping study for ensemble classification methods in cardiovascular disease

Abstract

Ensemble methods overcome the limitations of single machine learning techniques by combining several techniques, and are employed to achieve a high level of accuracy. This approach has been investigated in various fields, including bioinformatics. One of the most frequent applications of ensemble techniques is research into cardiovascular diseases, which are considered the leading cause of death worldwide. The purpose of this work is to identify the papers that investigate ensemble classification techniques applied to cardiovascular diseases, and to analyse them according to nine aspects: their publication venues, the medical tasks tackled, the empirical and research types adopted, the types of ensembles proposed, the single techniques used to construct the ensembles, the validation frameworks adopted to evaluate the proposed ensembles, the tools used to build the ensembles, and the optimization methods employed for the single techniques. To this end, a systematic mapping study was carried out. An extensive automatic search of four digital libraries (IEEE Xplore, ACM Digital Library, PubMed, and Scopus), followed by a study selection process, identified 351 papers that were used to address our mapping questions. The selected papers had been published in a large number of different venues. The medical task addressed most frequently by the selected studies was diagnosis. In addition, the experiment-based empirical type and the evaluation-based research type were the dominant approaches adopted by the selected studies. Homogeneous ensembles were the ensemble type developed most often in the literature, while decision trees, artificial neural networks and Bayesian classifiers were the single techniques used most frequently to develop ensemble classification methods.
The weighted majority and majority voting rules were adopted to obtain the final decisions of the ensembles developed. With regard to evaluation frameworks, datasets obtained from the UCI and PhysioBank repositories were those used most often to evaluate the ensemble methods, while k-fold cross-validation was the most frequently employed validation technique. Several tools with which to build ensemble classifiers were identified, and open-source software was the type adopted most frequently. Finally, only a few researchers took into account the optimization of the parameter settings of either single or meta ensemble classifiers. This mapping study attempts to provide greater insight into the application of ensemble classification methods in cardiovascular diseases. The majority of the selected papers reported positive findings regarding the ability of ensemble methods to perform better than single methods. Further analysis is required to aggregate the evidence reported in the literature.