Measures of searcher performance: A psychometric evaluation

Abstract

Several measures, such as recall, precision, term overlap, and efficiency, have been used to evaluate bibliographic searching. When applied to searches for specific facts in a full-text database, these measures seem less appropriate. For instance, recall is reduced to a binary measure reflecting the success or failure of the search to retrieve the desired fact, and lack of precision may simply reflect the searcher's unwillingness to expend further effort in narrowing a search. It is likely that new measures will need to be developed, and the applicability of known measures to factual searches in full-text databases needs to be evaluated. A study was conducted to evaluate 21 measures of performance on factual searches of a full-text database. The measures included two measures of recall, two measures of precision, seven measures of search term overlap, seven measures of improvement in search term overlap, and three measures of efficiency. Each of these measures was calculated for the searches performed by each of 26 first-year medical students on INQUIRER, a database of facts and concepts in microbiology. The reliability and construct validity of the measures were investigated. Their underlying structure consisted of three factors: Process/Outcome (precision, recall, and term overlap), Improvement in Term Overlap, and Efficiency. One scale for each factor was constructed after eliminating 7 of the original 21 variables. Each of these scales demonstrated adequate reliability for research purposes. The utility of these measures in future research on online searching is discussed.