Recommending scientific papers via heterogeneous knowledge embedding-based attentive recurrent neural networks

Abstract

The tremendous volume of academic information supports scientific research but also causes serious information overload. Scientific paper and citation recommendation systems have been developed to relieve this problem, acting as filters that deliver only relevant papers to researchers. Although previous studies have made considerable progress, the problem remains challenging because current paper recommendation systems rely on heterogeneous, multi-source features and therefore require a unified learned representation that covers different types and modalities of information. Additionally, previous studies have not adequately considered the implicit influence of a scholar’s past writing and citing preferences on his or her new manuscript. To address these two issues, this paper proposes a heterogeneous knowledge embedding-based attentive RNN model for recommending scientific paper citations. First, feature preparation consists of two parts: (1) building a unified learned representation of structural entities and relations for citation recommendation; and (2) defining and constructing a bibliographic network comprising five types of entities and five relations. The bibliographic network allows a unified representation to be learned in which all graph entities and relations are vectorized using TransD. For textual representations, the PV-DM model is used to generate numeric features from the title of each paper. Second, combining the structural and textual representations and focusing on the “author-text query” scenario, an attentive bidirectional RNN is constructed that recommends papers and citations from a user’s identity and a length-limited query, capturing the scholar’s previous writing and citing preferences and thereby reducing recommendation error.
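TransD, used above to vectorize the entities and relations of the bibliographic network, scores a triple (h, r, t) by first projecting the head and tail into the relation space with a dynamic matrix M_rh = r_p h_p^T + I^{m×n} (and likewise M_rt for the tail), then measuring the distance ||h_⊥ + r − t_⊥||². The sketch below follows this standard TransD formulation rather than the paper's own code; the function names, the margin loss, and the toy "author writes paper" vectors are illustrative assumptions, and the paper's actual dimensions, negative sampling, and margin are not stated here.

```python
from typing import List

Vector = List[float]


def transd_project(e: Vector, e_p: Vector, r_p: Vector) -> Vector:
    """Project entity vector e (dim n) into relation space (dim m).

    Computes (r_p e_p^T + I^{m x n}) e without materialising the matrix:
    r_p scaled by (e_p . e), plus the first min(m, n) entries of e, zero-padded to m.
    """
    m, n = len(r_p), len(e)
    scale = sum(a * b for a, b in zip(e_p, e))  # scalar e_p^T e
    return [r_p[i] * scale + (e[i] if i < n else 0.0) for i in range(m)]


def transd_distance(h: Vector, h_p: Vector, r: Vector, r_p: Vector,
                    t: Vector, t_p: Vector) -> float:
    """Squared L2 distance ||h_perp + r - t_perp||^2; lower means more plausible."""
    h_perp = transd_project(h, h_p, r_p)
    t_perp = transd_project(t, t_p, r_p)
    return sum((a + b - c) ** 2 for a, b, c in zip(h_perp, r, t_perp))


def margin_loss(d_pos: float, d_neg: float, margin: float = 1.0) -> float:
    """Hinge loss: the corrupted triple should be at least `margin` farther than the true one."""
    return max(0.0, margin + d_pos - d_neg)


if __name__ == "__main__":
    # Hypothetical 2-d embeddings for the triple (author, writes, paper)
    author, author_p = [1.0, 0.0], [0.1, 0.2]
    writes, writes_p = [0.0, 1.0], [0.3, -0.1]
    paper, paper_p = [1.0, 1.0], [0.2, 0.0]
    other, other_p = [-1.0, 0.5], [0.0, 0.4]  # corrupted tail for a negative sample

    d_pos = transd_distance(author, author_p, writes, writes_p, paper, paper_p)
    d_neg = transd_distance(author, author_p, writes, writes_p, other, other_p)
    print(d_pos, d_neg, margin_loss(d_pos, d_neg))
```

Because the projection matrix is built from an outer product plus an identity, the projection can be computed as a dot product and a vector addition, which is the main efficiency argument for TransD over methods that store a full matrix per relation.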
Experimental results on the DBLP dataset demonstrate the feasibility and effectiveness of our method, in terms of both the number and the quality of the top-ranked recommended items. Specifically, compared with existing models, our model improves MRR and NDCG by approximately 4.8% and 2.4%, respectively.
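MRR and NDCG are the two ranking metrics reported above. The sketch below uses their common definitions with binary relevance, assuming a recommended paper counts as relevant if it appears among the query paper's actual citations; the cutoff k, the paper identifiers, and the function names are hypothetical, since the abstract does not specify how the metrics were computed.

```python
import math
from typing import List, Sequence, Set


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1/rank of the first relevant item, or 0 if no relevant item is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings: List[Sequence[str]],
                         relevant_sets: List[Set[str]]) -> float:
    """Average reciprocal rank over all queries."""
    scores = [reciprocal_rank(r, s) for r, s in zip(rankings, relevant_sets)]
    return sum(scores) / len(scores)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG@k: DCG of the list divided by DCG of an ideal list."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)  # an ideal list puts every relevant item first
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical queries: ranked paper IDs and the corresponding ground-truth citations
    rankings = [["p3", "p1", "p7"], ["p2", "p5", "p9"]]
    relevant = [{"p1"}, {"p2", "p9"}]
    print(mean_reciprocal_rank(rankings, relevant))  # 0.75
    print([round(ndcg_at_k(r, s, 3), 4) for r, s in zip(rankings, relevant)])
```

MRR rewards only the position of the first correct citation, whereas NDCG credits every correct citation in the top k with a logarithmic position discount, which is why the two metrics can improve by different amounts.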