Reference metadata extraction using a hierarchical knowledge representation framework

Authors:

Abstract

The integration of bibliographical information on scholarly publications available on the Internet is an important task in the academic community. Accurate reference metadata extraction from such publications is essential for the integration of metadata from heterogeneous reference sources. In this paper, we propose a hierarchical template-based reference metadata extraction method for scholarly publications. We adopt a hierarchical knowledge representation framework called INFOMAP, which automatically extracts metadata. The experimental results show that, by using INFOMAP, we can extract author, title, journal, volume, number (issue), year, and page information from different kinds of reference styles with a high degree of precision. The overall average accuracy is 92.39% for the six major reference styles compared in this study.
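As a rough illustration of what template-based reference metadata extraction involves, the following minimal sketch matches a reference string against a list of style templates and returns the named fields of the first template that fits. It is not INFOMAP and does not reproduce its hierarchical knowledge representation: the regular-expression template `APA_LIKE`, the `TEMPLATES` list, the function `extract_reference`, and the sample reference are all hypothetical, and only one simplified style is covered.

```python
import re

# Hypothetical flat template for one simplified APA-like style.
# This is an assumption for illustration, not INFOMAP's template format.
APA_LIKE = re.compile(
    r"^(?P<author>.+?)\s+\((?P<year>\d{4})\)\.\s+"   # authors, then (year).
    r"(?P<title>.+?)\.\s+"                            # title ending in a period
    r"(?P<journal>.+?),\s+(?P<volume>\d+)"            # journal name, volume
    r"(?:\((?P<number>\d+)\))?,\s+"                   # optional (issue)
    r"(?P<pages>\d+-\d+)\.?$"                         # page range
)

# Templates are tried in order; further styles would be appended here.
TEMPLATES = [("apa_like", APA_LIKE)]


def extract_reference(ref):
    """Return (style, fields) for the first matching template, or None."""
    for style, pattern in TEMPLATES:
        match = pattern.match(ref.strip())
        if match:
            return style, match.groupdict()
    return None


# Made-up reference string used only to exercise the template.
example = (
    "Smith, J., & Lee, K. (2004). A made-up title. "
    "Journal of Examples, 12(3), 45-67."
)
result = extract_reference(example)
```

A flat pattern like this breaks as soon as a title contains a period followed by a space or a style reorders its fields, which is the kind of variation across reference styles that the abstract's hierarchical templates are meant to handle.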

Keywords: Reference extraction, Metadata extraction, Knowledge representation framework, INFOMAP

Article history: Received 13 January 2005, Revised 31 July 2006, Accepted 10 August 2006, Available online 28 September 2006.

Official URL: https://doi.org/10.1016/j.dss.2006.08.006