Fast t-SNE algorithm with forest of balanced LSH trees and hybrid computation of repulsive forces

作者:

Highlights:

摘要

An acceleration of the well-known t-Stochastic Neighbor Embedding (t-SNE) (Hinton and Roweis, 2003; Maaten and Hinton, 2008) algorithm, probably the best (nonlinear) dimensionality reduction and visualization method, is proposed in this article.By using a specially-tuned forest of balanced trees constructed via locality sensitive hashing is improved significantly upon the results presented in Maaten (2014), achieving a complexity significantly closer to true O(nlogn), and vastly improving behavior for huge numbers of instances and attributes. Such acceleration removes the necessity to use PCA to reduce dimensionality before the start of t-SNE.Additionally, a fast hybrid method for repulsive forces computation (a part of the t-SNE algorithm), which is currently the fastest method known, is proposed.A parallelized version of our algorithm, characterized by a very good speedup factor, is proposed.

论文关键词:Visualization of high-dimensional data,Dimensionality reduction,Stochastic neighbor embedding,t-SNE,Forest of locality-sensitive hashing trees,Locality-sensitive hashing,Balanced trees

论文评审过程:Received 29 February 2020, Revised 21 July 2020, Accepted 24 July 2020, Available online 28 July 2020, Version of Record 4 August 2020.

论文官网地址:https://doi.org/10.1016/j.knosys.2020.106318