Ambiguous author query detection using crowdsourced digital library annotations

Abstract

The name ambiguity problem is especially challenging for bibliographic digital libraries, and it is amplified when names are collected from heterogeneous sources. This is the case in the Scholarometer system, which performs bibliometric analysis by cross-correlating author names in user queries with those retrieved from digital libraries. The uncontrolled nature of user-generated annotations makes them very valuable, but it also creates the need to detect ambiguous names. Our goal is to detect ambiguous names at query time by mining digital library annotation data, thereby reducing noise in the bibliometric analysis. We explore three kinds of heuristic features, based on citations, metadata, and crowdsourced topics, in a supervised learning framework. The proposed approach achieves almost 80% accuracy. Finally, we compare the performance of ambiguous author detection in Scholarometer using Google Scholar against a baseline based on Microsoft Academic Search.
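The abstract names three feature families (citations, metadata, crowdsourced topics) feeding a supervised classifier, but does not specify the individual features or the learning algorithm. The following is a minimal, hypothetical sketch of that general setup, not the authors' method: the `Paper` record, the four features (citation-count spread, venue diversity, publication-year span, number of distinct discipline tags), the choice of plain logistic regression, and the toy training queries are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Paper:
    """Hypothetical record for one publication retrieved for a queried name."""
    citations: int
    venue: str
    year: int
    disciplines: frozenset  # crowdsourced discipline tags (hypothetical field)

def features(papers):
    """Map the papers retrieved for one name query to an illustrative feature vector.

    Made-up features, one or two per family named in the abstract:
    citation-based, metadata-based, and crowdsourced-topic-based.
    """
    if not papers:
        raise ValueError("need at least one paper")
    n = len(papers)
    cites = [p.citations for p in papers]
    mean = sum(cites) / n
    std = math.sqrt(sum((c - mean) ** 2 for c in cites) / n)
    cite_cv = std / mean if mean > 0 else 0.0          # citations: spread of counts
    venue_ratio = len({p.venue for p in papers}) / n   # metadata: venue diversity
    years = [p.year for p in papers]
    year_span = (max(years) - min(years)) / 50.0       # metadata: scaled year span
    tags = set().union(*(p.disciplines for p in papers))
    tag_count = len(tags) / 10.0                       # topics: discipline diversity
    return [cite_cv, venue_ratio, year_span, tag_count]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(X, y, lr=0.5, epochs=2000):
    """Plain logistic regression fitted by stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def is_ambiguous(model, papers):
    """Classify a name query as ambiguous if the predicted probability is >= 0.5."""
    w, b = model
    x = features(papers)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5

def make_paper(citations, venue, year, *tags):
    return Paper(citations, venue, year, frozenset(tags))

# Toy, made-up training queries (label 1 = ambiguous name, 0 = single author).
queries = [
    ([make_paper(10, "JASIST", 2008, "cs"), make_paper(12, "JASIST", 2009, "cs"),
      make_paper(8, "JASIST", 2010, "cs")], 0),
    ([make_paper(20, "IPM", 2005, "cs", "info"), make_paper(25, "IPM", 2006, "cs")], 0),
    ([make_paper(1, "Phys Rev", 1975, "physics"), make_paper(300, "CHI", 2010, "cs"),
      make_paper(5, "Lancet", 1995, "medicine")], 1),
    ([make_paper(2, "Nature", 1980, "biology"), make_paper(40, "SIGIR", 2011, "cs"),
      make_paper(3, "ACL", 2000, "linguistics"), make_paper(7, "BMJ", 1990, "medicine")], 1),
]
model = train([features(ps) for ps, _ in queries], [label for _, label in queries])
print([is_ambiguous(model, ps) for ps, _ in queries])
```

The intuition encoded here, also an assumption, is that a name shared by several people tends to return papers scattered across venues, decades, and disciplines, whereas a single author's record is more homogeneous. Any real system would need features and labels validated against actual digital library data.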

Keywords: Ambiguous name detection, Data mining, Citation analysis, Scholarly data, Discipline annotations

Article history: Received 9 February 2012, Revised 30 August 2012, Accepted 11 September 2012, Available online 15 October 2012.

Article URL: https://doi.org/10.1016/j.ipm.2012.09.001