Evaluating the influence of parameter values on the performance of random subset feature selection algorithm on scientific data

Authors:

Highlights:

Abstract

Random Subset Feature Selection (RSFS) is a Feature Subset Selection (FSS) algorithm based on the random forest technique. This algorithm is useful for selecting relevant features from large datasets resulting from scientific experiments. The random selection process eliminates bias and offers superior performance compared to other feature selection algorithms. The performance of the RSFS algorithm, which is primarily used in data mining, depends on proper parameter selection. The RSFS algorithm parameters, namely dummy features, the stopping criterion (Delta), the maximum number of iterations, and the K nearest neighbor distance, are used for selecting the feature subset. The resulting subset, which is a reduced dataset, is subjected to further processing such as classification and detection. This study is based on the design of experiments approach and models the effects of parameter variation on RSFS algorithm performance. In particular, the influence of the algorithm parameters on classification accuracy is evaluated.

Keywords: RSFS, Optimization, Regression modeling, Scientific data

Review history: Received 30 August 2016, Revised 2 March 2018, Accepted 22 July 2018, Available online 6 August 2018, Version of Record 13 October 2018.

Official paper URL: https://doi.org/10.1016/j.datak.2018.07.008