Two improved access methods on compact binary (CB) trees

作者:

Highlights:

摘要

In many applications, information retrieval is a very important research field. In several key strategies, the binary trie is famous as a fast access method able to retrieve keys in order, so that the binary trie is often used as the indices of key search strategies such as trie hashing. However, the greater the number of the registered keys, the larger storage is required, as a result, the binary trie can not be stored into the main memory. In order to solve this problem, Jonge et al. proposed the method to represent the binary trie by a compact data structure, which is called the compact binary (CB) tree. The storage required for the CB tree is very small, to be sure, but it is debatable whether time and space efficiencies for CB tree can be improved. As for the time efficiency, searching and updating a key takes a lot of time in large key sets. This paper proposes a new hierarchical CB tree structure able to avoid the increase of the time-cost. As regards the space efficiency, in order to compress the binary trie into the CB tree, his method adopts the imaginary leaf, which is called a dummy leaf, for the nodes which have only one leaf. At this point, the number of dummy leaves has a bad influence on the space efficiency. This paper extends his tree representation for the Patricia binary trie and proposes the method for composing the CB tree without the dummy leaf. The theoretical and experimental results, using 50,000 Japanese nouns and 50,000 English words, show that this method provides faster access and more compact storage than the traditional method. Retrieval is 18–20 times, the insertion is 11–13 times and the deletion is 4–6 times faster. This method, moreover, is able to generate 25–40% smaller CB tree than the traditional method.

论文关键词:Information retrieval,Access method,Data structure,Binary trie,CB tree

论文评审过程:Author links open overlay panelMasamiShishiboriPersonEnvelopeMasafumiKoyamaMakotoOkadaJun-ichiAoeEnvelope

论文官网地址:https://doi.org/10.1016/S0306-4573(99)00035-7