Interval-based approach to lexicographic representation and compression of numeric data

Abstract

This paper proposes a new method of encoding numbers as variable-length byte strings. The primary property of the encoding is that lexicographic comparison of the encoded numbers agrees with the order of the corresponding real numbers. The encoding is space-efficient. Further, unlike fixed-length representations of numbers (fixed-point, floating-point, etc.), the encoded numbers are not limited in magnitude or in the number of significant digits. The paper also elaborates on the application of the encoding method to the storage of numeric data in databases. The proposed application is a uniform format for all numbers in a database, regardless of their types and attributes (fields): every number is represented as a lexicographically comparable byte string. This form simplifies data management software (there is only one format to deal with at the physical database level) and hardware (when associative memory, storage devices, etc. are used); it makes applications more flexible by removing limits on the sizes of numbers; and it is space-efficient for all numbers, while being especially concise for the numbers used most frequently in databases.
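To make the central property concrete (byte-string order equals numeric order, with no bound on magnitude), the sketch below encodes integers of any size. It is not the paper's interval-based method, which covers real numbers and aims for compactness on frequent values. It is a common length-prefix technique chosen here only for illustration, and the function names `encode` and `decode` are hypothetical. A sign byte puts negatives first. A self-delimiting byte count then makes longer magnitudes sort after shorter ones. Finally, negative values are bitwise-complemented, which reverses the order of a prefix-free code.

```python
def _encode_length(k: int) -> bytes:
    """Order-preserving, self-delimiting encoding of a byte count k >= 0.

    k is written as (k // 255) bytes of 0xFF followed by one byte k % 255;
    that final byte is always < 0xFF, so it terminates the prefix.
    """
    q, r = divmod(k, 255)
    return b"\xff" * q + bytes([r])


def _encode_nonneg(n: int) -> bytes:
    # Minimal big-endian magnitude (no leading zero bytes); 0 has length 0.
    k = (n.bit_length() + 7) // 8
    return _encode_length(k) + n.to_bytes(k, "big")


def encode(n: int) -> bytes:
    """Hypothetical illustration: integer -> byte string whose order matches n."""
    if n >= 0:
        return b"\x01" + _encode_nonneg(n)
    # Sign byte 0x00 sorts every negative before every non-negative value.
    # Complementing a prefix-free code reverses its lexicographic order,
    # so larger magnitudes (more negative numbers) sort earlier.
    return b"\x00" + bytes(x ^ 0xFF for x in _encode_nonneg(-n))


def decode(data: bytes) -> int:
    """Inverse of encode (assumes well-formed input)."""
    negative = data[0] == 0x00
    body = bytes(x ^ 0xFF for x in data[1:]) if negative else data[1:]
    # Read the self-delimiting byte count.
    i = k = 0
    while body[i] == 0xFF:
        k += 255
        i += 1
    k += body[i]
    n = int.from_bytes(body[i + 1 : i + 1 + k], "big")
    return -n if negative else n


# Usage: sorting the encodings as raw bytes reproduces numeric order.
values = [10**40, -1, 255, -(10**40), 0, 256, -256]
assert [decode(e) for e in sorted(encode(v) for v in values)] == sorted(values)
```

Placing the length before the magnitude is what lets plain byte comparison work. Two magnitudes of different lengths are separated by their prefixes, and magnitudes of equal length compare byte by byte in big-endian order.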

Keywords: numeric data fields, number encoding, comparison operations, databases, file structures, compactness of data, data independence, formats, floating point, variable-length data fields, real numbers, data compression