A unified classifiability analysis framework based on meta-learner and its application in spectroscopic profiling data
Authors: Yinsheng Zhang, Zhengyong Zhang, Haiyan Wang
Abstract
Spectroscopic profiling data (e.g., Raman spectroscopy and mass spectroscopy), combined with machine learning, have provided a data-driven approach for discriminative tasks. In these tasks, researchers often start with simple classification models. If one model does not work, they try more sophisticated models. If all models fail, the researchers deem the dataset “inseparable.” This “trial-and-error” practice raises a fundamental question: does the dataset possess the necessary statistical power for the current discriminative task? This “classifiability analysis” is an implicit and often neglected step in the data-driven pipeline. This paper aims to design a unified methodological framework for classifiability analysis. In this framework, a meta-learner model combines diversified atom metrics (e.g., Bayes error rate / irreducible error, classification accuracy, information gain / mutual information) into one unified metric (d). We have successfully used the proposed framework to analyze a spectroscopic profiling dataset to discriminate vintage liquors of different ages. The analysis shows a significant difference between 5-year and 16-year liquors (d = 1.447, where d > 0.8 indicates a significant difference).
Keywords: Classifiability analysis, Meta-learner, Bayes error rate, Information gain, Spectroscopic profiling, Vintage liquor
Official paper URL: https://doi.org/10.1007/s10489-021-02810-8
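
The idea stated in the abstract, a meta-learner that maps several atom metrics to one unified metric d, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the atom metrics are simplified one-dimensional stand-ins, the meta-learner is a plain linear least-squares fit, the training target d is taken to be the standardized mean difference between two simulated Gaussian classes, and the names `atom_metrics`, `simulate`, and `unified_d` are hypothetical.

```python
import numpy as np

def atom_metrics(x0, x1):
    """Three simplified atom metrics for two 1-D classes (illustrative stand-ins)."""
    x = np.concatenate([x0, x1])
    y = np.concatenate([np.zeros(len(x0)), np.ones(len(x1))])

    # 1) Classification accuracy of a midpoint-threshold (nearest-centroid) rule.
    m0, m1 = x0.mean(), x1.mean()
    thr = (m0 + m1) / 2.0
    pred = (x > thr) if m1 >= m0 else (x < thr)
    acc = np.mean(pred == y)

    # 2) Plug-in Bayes error estimate from shared histogram bins:
    #    the per-bin minimum of the two class counts, summed and normalized.
    bins = np.histogram_bin_edges(x, bins=20)
    h0, _ = np.histogram(x0, bins=bins)
    h1, _ = np.histogram(x1, bins=bins)
    ber = np.minimum(h0, h1).sum() / len(x)

    # 3) Mutual information (bits) between the binned feature and the label.
    joint = np.stack([h0, h1]) / len(x)
    px = joint.sum(axis=0, keepdims=True)
    py = joint.sum(axis=1, keepdims=True)
    nz = joint > 0
    mi = np.sum(joint[nz] * np.log2(joint[nz] / (py @ px)[nz]))

    return np.array([acc, ber, mi])

def simulate(d, n, rng):
    """Two unit-variance Gaussian classes whose means differ by d (assumed setup)."""
    return rng.normal(0.0, 1.0, n), rng.normal(d, 1.0, n)

# Train the stand-in meta-learner on simulated datasets with known d.
rng = np.random.default_rng(0)
train_d = rng.uniform(0.0, 3.0, 200)
features = np.array([atom_metrics(*simulate(d, 300, rng)) for d in train_d])
design = np.column_stack([features, np.ones(len(features))])  # add intercept
weights, *_ = np.linalg.lstsq(design, train_d, rcond=None)

def unified_d(x0, x1):
    """Combine the atom metrics of a new two-class dataset into one score."""
    return float(np.append(atom_metrics(x0, x1), 1.0) @ weights)

d_lo = unified_d(*simulate(0.2, 300, rng))  # heavily overlapping classes
d_hi = unified_d(*simulate(2.0, 300, rng))  # well-separated classes
print(f"overlapping: d = {d_lo:.2f}, separated: d = {d_hi:.2f}")
```

In this sketch, a dataset whose classes are well separated should receive a larger score than one whose classes overlap, which can then be compared with a threshold such as the d > 0.8 quoted in the abstract. Real spectroscopic data are high-dimensional, so a practical version would need multivariate atom metrics and a more capable meta-learner.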