Fast approximate matching of binary codes with distinctive bits

Authors: Chenggang Clarence Yan, Hongtao Xie, Bing Zhang, Yanping Ma, Qiong Dai, Yizhi Liu

Abstract

Although the distance between binary codes can be computed quickly in Hamming space, linear search is not practical for large-scale datasets. Attention has therefore turned to efficient approximate nearest neighbor search, for which hierarchical clustering trees (HCT) are widely used. However, HCT select cluster centers randomly and build indexes with the entire binary code, which degrades search performance. In this paper, we first propose a new clustering algorithm that chooses cluster centers on the basis of relative distances and partitions the dataset more homogeneously than HCT when building the hierarchical clustering trees. Then, we present an algorithm that compresses binary codes by extracting distinctive bits according to the standard deviation of each bit. Building on these two components, we propose a new index that uses the compressed binary codes and is based on hierarchical decomposition of binary spaces. Experiments conducted on reference datasets and a dataset of one billion binary codes demonstrate the effectiveness and efficiency of our method.
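To illustrate the compression idea, here is a minimal Python sketch of selecting bits by standard deviation. The abstract does not give the exact selection rule, the number of bits kept, or how the compressed codes enter the index, so the function names (`select_distinctive_bits`, `compress`, `hamming`), the parameter `k`, and the tie-breaking rule are all assumptions made for illustration, not the authors' implementation. The sketch rests on the fact that a bit that is nearly always 0 or nearly always 1 has a standard deviation close to 0 and says little about which code is which, while a bit split evenly between 0 and 1 reaches the maximum of 0.5.

```python
from statistics import pstdev


def select_distinctive_bits(codes, k):
    """Return the indices of the k bit positions with the largest
    population standard deviation across the dataset (assumed criterion)."""
    n_bits = len(codes[0])
    # Standard deviation of each bit position over all codes.
    stds = [pstdev([code[i] for code in codes]) for i in range(n_bits)]
    # Rank positions by descending deviation; ties go to the lower index.
    ranked = sorted(range(n_bits), key=lambda i: (-stds[i], i))
    # Return the chosen positions in their original bit order.
    return sorted(ranked[:k])


def compress(code, bit_indices):
    """Keep only the selected bit positions of a binary code."""
    return [code[i] for i in bit_indices]


def hamming(a, b):
    """Hamming distance between two equal-length binary codes."""
    return sum(x != y for x, y in zip(a, b))


# Toy dataset: bits 0 and 1 are constant, bits 2 and 3 vary.
codes = [
    [0, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 1],
]

kept = select_distinctive_bits(codes, k=2)
compressed = [compress(code, kept) for code in codes]
distance = hamming(compressed[0], compressed[1])
```

In this toy dataset the two constant bits are discarded and only the two varying bits are kept, so distances computed on the compressed codes match those on the full codes while touching half as many bits. On real data the compressed distance is only an approximation of the full Hamming distance.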

Keywords: binary codes, approximate nearest neighbor search, hierarchical clustering index

Paper URL: https://doi.org/10.1007/s11704-015-4192-0