Protein fold recognition based on sparse representation based classification

Abstract

In this paper, we propose an effective method for protein fold recognition. Sparse representation based classification (SRC) has been widely used in computer vision, face recognition, and related areas. In SRC, a test sample is expressed as a linear combination of the training samples of all classes. We apply SRC to the problem of protein fold recognition. The highlights of the paper are as follows:

• SRC is an effective method. Compared with widely used basic classifiers such as LIBSVM, KNN, random forest, and Naïve Bayes, SRC improves accuracy by 1%–14%. The results indicate that SRC is an efficient approach for protein fold classification.

• We combine several traditional features into a new feature, including ACC, bigram, amino acid composition, secondary structure, and some physicochemical features, and feed the new feature into SRC for classification. We call this new predictor MF-SRC. Evaluated on three widely used benchmark datasets (the DD, EDD, and TG datasets), MF-SRC outperforms all the other compared methods, indicating that SRC is suitable for fold recognition and that MF-SRC would be a useful tool for computational prediction of fold types.

• SVM classifiers achieve quite promising results in protein fold recognition. The kernel function in an SVM can be regarded as the link between prior knowledge of the source data and the discriminating features. Selecting the best kernel among the existing candidates plays the most critical role in applying an SVM. Unfortunately, finding a suitable kernel function is not an easy task. In contrast, MF-SRC uses a sparse representation over all training samples instead of learning a separate model or set of features for classification. As a result, MF-SRC provides a succinct representation method for classification.

In conclusion, the paper makes three main points, and the experimental results indicate that MF-SRC would be a useful tool for computational prediction of fold types.
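
The classification rule summarized above can be illustrated with a minimal sketch. It assumes the standard SRC formulation, in which a test sample is coded over the normalized training samples under an l1 penalty and is assigned to the class whose samples give the smallest reconstruction residual. The solver (ISTA), the penalty weight `lam`, the function names `ista_lasso` and `src_predict`, and the toy data are illustrative assumptions, not details taken from the paper. In MF-SRC, each row of the training matrix would be the combined feature vector of one protein.

```python
import numpy as np


def ista_lasso(A, y, lam=0.01, n_iter=500):
    """Approximately minimize 0.5*||A x - y||^2 + lam*||x||_1 with ISTA.

    ISTA is an assumed solver chosen for brevity; any l1 solver could be used.
    """
    # Step size 1/L, where L is the squared spectral norm of A.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        z = x - grad / L
        # Soft-thresholding drives small coefficients to exactly zero.
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return x


def src_predict(train_X, train_y, test_x, lam=0.01):
    """Return the label whose training samples best reconstruct test_x.

    train_X: (n_samples, n_features) array, one feature vector per row.
    train_y: (n_samples,) array of class labels.
    test_x:  (n_features,) feature vector of the sample to classify.
    """
    # Dictionary A: each column is one L2-normalized training sample.
    A = train_X.T / np.linalg.norm(train_X, axis=1)
    y = test_x / np.linalg.norm(test_x)

    # Sparse code of the test sample over the samples of all classes.
    coef = ista_lasso(A, y, lam)

    # Reconstruct y from each class's coefficients only and keep the residual.
    residuals = {}
    for label in np.unique(train_y):
        mask = train_y == label
        residuals[label] = np.linalg.norm(y - A[:, mask] @ coef[mask])
    return min(residuals, key=residuals.get)


# Toy data (hypothetical): two classes with three-dimensional features.
toy_X = np.array([[1.0, 0.0, 0.0],
                  [0.9, 0.1, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.9, 0.1]])
toy_y = np.array([0, 0, 1, 1])
print(src_predict(toy_X, toy_y, np.array([1.0, 0.05, 0.0])))
```

No model is trained in this sketch: the training samples themselves form the dictionary, which is the contrast with kernel selection for SVMs drawn in the third highlight.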