An extended DEIM algorithm for subset selection and class identification

Authors: Emily P. Hendryx, Béatrice M. Rivière, Craig G. Rusin

Abstract

The discrete empirical interpolation method (DEIM) has been shown to be a viable index-selection technique for identifying representative subsets in data. DEIM has gained some popularity in reducing the dimensionality of physical models involving differential equations, but its use in subset- and pattern-identification tasks is not yet broadly known within the machine learning community. While it has much to offer as is, the DEIM algorithm is limited in that the number of selected indices cannot exceed the rank of the corresponding data matrix. Although this is not an issue for many data sets, there are cases in which the number of classes represented in a given data set is greater than the rank of the data matrix; in such cases, it is impossible for the standard DEIM algorithm to identify all classes. To overcome this issue, we present a novel extension of DEIM, called E-DEIM. Along with the proposed algorithm, we provide some theoretical results for using extensions of DEIM to form the CUR matrix factorization, identifying both rows and columns to approximate the original data matrix. Results from applying variations of E-DEIM to two different data sets indicate that the presented extension can indeed allow for the identification of additional classes along with those selected by standard DEIM. In addition, comparing these results to those of some more familiar methods demonstrates that the proposed deterministic E-DEIM approach including coherence performs comparably to or better than the other evaluated methods and should be considered in future class-identification tasks.
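
The rank limitation described above can be illustrated with a minimal sketch of standard DEIM row selection (not E-DEIM, which this sketch does not attempt to reproduce). It assumes NumPy and the usual greedy formulation applied to the leading left singular vectors of the data matrix; the function name `deim`, the toy prototype data, and the rank tolerance are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

def deim(U):
    """Greedy (standard) DEIM index selection from the k columns of U (n x k).

    Returns exactly k row indices, so the number of selected indices
    can never exceed the number of basis vectors supplied.
    """
    _, k = U.shape
    # First index: largest-magnitude entry of the leading singular vector
    idx = [int(np.argmax(np.abs(U[:, 0])))]
    for j in range(1, k):
        # Interpolate column j from the previous columns at the chosen indices
        c = np.linalg.solve(U[idx, :j], U[idx, j])
        residual = U[:, j] - U[:, :j] @ c
        # Next index: where the interpolation error is largest
        idx.append(int(np.argmax(np.abs(residual))))
    return idx

# Hypothetical toy data: 4 classes (row prototypes) spanning only a 2-D subspace
protos = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [1.0, -1.0, 0.0]])
labels = np.repeat(np.arange(4), 5)   # 5 rows per class
A = protos[labels]                    # 20 x 3 data matrix of rank 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > s[0] * 1e-10))     # numerical rank (tolerance is an assumption)
rows = deim(U[:, :r])
print(r, len(rows), len(set(labels[rows])))  # → 2 2 2
```

Although the toy matrix contains four classes, only r = 2 informative singular vectors exist, so standard DEIM returns two rows and can represent at most two of the four classes. This is the situation in which the abstract states that standard DEIM cannot identify all classes.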

Keywords: Subset selection, Class identification, Discrete empirical interpolation method, Low rank data

Paper URL: https://doi.org/10.1007/s10994-021-05954-3