A statistical approach for modeling inter-document semantic relationships in digital libraries

Authors: Jeyavaishnavi Muralikumar, Sri Ananda Seelan, Narendranath Vijayakumar, Vidhya Balasubramanian

Abstract

E-learning repositories and digital libraries are fast becoming important sources of information and learning material. Such systems must therefore provide services that support the learning needs of their users. When a retrieval system shows how its documents relate to each other semantically, users are free to choose among different materials and to direct their study in a focused manner. This calls for a model that identifies types of document relationships that address different aspects of learning. This article defines three such types and presents a unique statistical model that can automatically identify them in technical and scientific documents. The model defines measures that quantify the degree of relatedness based on distinct statistical patterns exhibited by the terms common to a pair of documents. The approach does not strictly require a knowledge base or hypertext to identify the characteristic relationship between two documents. Such a statistical model can be extended to cover further relatedness types and can be used alongside various other techniques in digital library recommendation engines. Our experiments over a large number of technical documents show that our techniques effectively extract the different types of relationships between documents.

Keywords: Relatedness, Information retrieval, Digital libraries, Statistical modeling

Paper URL: https://doi.org/10.1007/s10844-016-0423-6