Applying informetric characteristics of databases to IR system file design, Part I: Informetric models

Authors:

Abstract

This study examines how informetric characteristics of information retrieval (IR) system databases can help the systems designer decide which types of file structures would provide the best performance for a given type of information system environment. This first of two papers deals with the development of appropriate models describing database contents, to be used later in a simulation study. Database characteristics for which data were collected include the index term frequency distribution, the distribution of terms used per query, and the distribution of term frequency selections. A shifted generalized Waring distribution was found to provide the best fit for the index term distributions for the large data sets used. For the terms used per query, a shifted negative binomial was found to provide a reasonable fit. A complex relationship was observed for the term selection distribution data, for which the empirical distribution is used. Four other hypothetical term selection relationships are also presented. With this information, a simulation study examining system performance under different informetric environments can be undertaken.
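
To make the terms-per-query model concrete, the following is a minimal Python sketch of a shifted negative binomial, assuming the shift is 1 (every query uses at least one term) and the standard (r, p) parameterization. The abstract does not give the fitted parameters or the exact parameterization, so the function names and the values r = 2 and p = 0.5 are hypothetical and chosen only for illustration.

```python
import math


def shifted_nbinom_pmf(n, r, p, shift=1):
    """P(N = n) where N = shift + Y and Y ~ NegBin(r, p) on 0, 1, 2, ...

    r > 0 and 0 < p < 1 are illustrative parameters, not values from the paper.
    """
    k = n - shift
    if k < 0:
        return 0.0  # no probability mass below the shift
    # Work in log space so that large k does not overflow the gamma function.
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


def shifted_nbinom_mean(r, p, shift=1):
    """Mean of the shifted distribution: shift + r * (1 - p) / p."""
    return shift + r * (1.0 - p) / p


if __name__ == "__main__":
    r, p = 2.0, 0.5  # hypothetical parameters
    for n in range(1, 6):
        print(n, round(shifted_nbinom_pmf(n, r, p), 4))
    print("mean terms per query:", shifted_nbinom_mean(r, p))
```

In a simulation of the kind the abstract describes, a distribution like this could be sampled to decide how many terms each simulated query contains; the shift keeps zero-term queries out of the workload.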

Keywords:

Review history: Available online 17 July 2002.

Official URL: https://doi.org/10.1016/0306-4573(92)90098-K