Combining multiple feature selection methods for stock prediction: Union, intersection, and multi-intersection approaches

Authors:

Abstract

Effectively predicting stock prices for investors is an important research problem. In the literature, data mining techniques have been applied to stock (market) prediction. Feature selection, a pre-processing step in data mining, aims to filter unrepresentative variables out of a given dataset so that prediction becomes more effective. Because different feature selection methods select different features and therefore affect prediction performance, the purpose of this paper is to combine multiple feature selection methods to identify more representative variables for better prediction. In particular, three well-known feature selection methods are used: Principal Component Analysis (PCA), Genetic Algorithms (GA), and decision trees (CART). The combination methods for filtering out unrepresentative variables are based on union, intersection, and multi-intersection strategies. A back-propagation neural network serves as the prediction model. Experimental results show that the intersection of PCA and GA and the multi-intersection of PCA, GA, and CART perform best, achieving 79% and 78.98% accuracy, respectively. In addition, these two combined feature selection methods filter out roughly 80% of the 85 original variables as unrepresentative, leaving 14 and 17 important features, respectively. These variables are important factors for stock prediction and can be used for future investment decisions.
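The three combination strategies named in the abstract can be illustrated with plain set operations. The following is a minimal sketch, not the authors' implementation: it assumes that each feature selection method returns a set of feature names, that "intersection" means the features shared by two methods, and that "multi-intersection" means the features shared by all three. The function name `combine_features` and the feature names `f1` to `f6` are hypothetical and chosen only for illustration.

```python
def combine_features(selections, strategy):
    """Combine the feature sets chosen by several selection methods.

    selections: iterable of collections of feature names (one per method).
    strategy:   "union" keeps features chosen by at least one method;
                "intersection" keeps features chosen by every method given.
    """
    sets = [set(s) for s in selections]
    if not sets:
        raise ValueError("at least one selection is required")
    if strategy == "union":
        return set.union(*sets)
    if strategy == "intersection":
        return set.intersection(*sets)
    raise ValueError(f"unknown strategy: {strategy!r}")


# Hypothetical outputs of the three methods (illustrative only).
pca_features = {"f1", "f2", "f3", "f5"}
ga_features = {"f2", "f3", "f4", "f5"}
cart_features = {"f3", "f5", "f6"}

# Union: every feature selected by at least one method.
union_all = combine_features(
    [pca_features, ga_features, cart_features], "union"
)

# Intersection: features selected by both of two methods.
pca_and_ga = combine_features([pca_features, ga_features], "intersection")

# Multi-intersection: features selected by all three methods.
multi = combine_features(
    [pca_features, ga_features, cart_features], "intersection"
)

print(sorted(union_all))
print(sorted(pca_and_ga))
print(sorted(multi))
```

In this sketch the multi-intersection is simply the same intersection operation applied to three sets instead of two. Stricter strategies keep fewer features, which matches the abstract's report that the two best-performing intersection-based combinations retain only 14 and 17 of the 85 original variables.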

Keywords: Stock prediction, Feature selection, Data mining, Principal Component Analysis, Genetic algorithm, Decision trees

Article history: Received 10 May 2009, Revised 4 August 2010, Accepted 17 August 2010, Available online 21 August 2010.

Official article URL: https://doi.org/10.1016/j.dss.2010.08.028