Finding representative landmarks of data on manifolds

作者:

Highlights:

摘要

Data-driven non-parametric models, such as manifold learning algorithms, are promising data analysis tools. However, to fit an off-training-set data point in a learned model, one must first “locate” the point in the training set. This query has a time cost proportional to the problem size, which limits the model's scalability. In this paper, we address the problem of selecting a subset of data points as the landmarks helping locate the novel points on the data manifolds. We propose a new category of landmarks defined with the following property: the way the landmarks represent the data in the ambient Euclidean space should resemble the way they represent the data on the manifold. Given the data points and the subset of landmarks, we provide procedures to test whether the proposed property presents for the choice of landmarks. If the data points are organized with a neighbourhood graph, as it is often conducted in practice, we interpret the proposed property in terms of the graph topology. We also discuss the extent to which the topology is preserved for landmark set passing our test procedure. Another contribution of this work is to develop an optimization based scheme to adjust an existing landmark set, which can improve the reliability for representing the manifold data. Experiments on the synthetic data and the natural data have been done. The results support the proposed properties and algorithms.

论文关键词:Manifold learning,Data representation,Dimensionality reduction

论文评审过程:Received 30 August 2008, Revised 29 December 2008, Accepted 28 January 2009, Available online 11 February 2009.

论文官网地址:https://doi.org/10.1016/j.patcog.2009.01.032