ISKE: An unsupervised automatic keyphrase extraction approach using the iterated sentences based on graph method

Authors:

Highlights:

Abstract

Automatic extraction of key information is necessary for knowledge discovery in an era of rapid knowledge growth. It also helps researchers quickly obtain the information they want without reading through every potentially relevant document. Recently, researchers have shifted their attention from words to sentences, because sentences convey semantics better and reduce computational complexity. We present a novel, lightweight automatic keyphrase extraction algorithm that does not depend on any external resources, such as an external dictionary or corpus. Unlike traditional graph-based algorithms, which iterate over words to generate keyphrase lists, our approach iterates over sentences to rank words and generate keyphrase lists, because the semantic information in a sentence is more complete than that in a single word. We initialize the values of words with weighted information and compute a score for each sentence from these values. We then integrate the sentences to update their scores, and the word values are in turn updated with the sentence information. We iterate this process until the values of the sentences and words converge. The proposed method measures the relations between sentences and evaluates the flow of these relations in an easily understood manner. These relations rest on the hypothesis that the causality between adjacent sentences is semantically stronger than the causality between words. The method not only increases extraction accuracy but also reduces the number of iterations the algorithm requires. We compare the proposed method with five strong, popular baseline algorithms on four datasets. The results show that it outperforms the other algorithms on three evaluation metrics.
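The word-to-sentence-to-word loop described in the abstract can be illustrated with a small sketch. This is not the ISKE algorithm itself: the abstract does not give the weighting scheme, the sentence-integration rule, or the convergence test, so everything below is an assumption made for illustration. Word values start as normalized term frequencies, a sentence score is the mean of its word values, each sentence is blended with its adjacent sentences through a hypothetical `damping` parameter, and a word's new value is the sum of the scores of the sentences that contain it. The names `split_sentences` and `rank_words` are also hypothetical.

```python
import re
from collections import Counter


def split_sentences(text):
    """Split text on sentence-ending punctuation and lowercase the words."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [re.findall(r"[a-z]+", p.lower()) for p in parts if p.strip()]


def rank_words(text, damping=0.5, tol=1e-6, max_iter=100):
    """Iterate word -> sentence -> word scores until they stop changing.

    Returns (word_scores, iterations_used). All update rules here are
    illustrative assumptions, not the rules from the paper.
    """
    sentences = [s for s in split_sentences(text) if s]
    if not sentences:
        return {}, 0

    # Assumed initialization: normalized term frequency.
    counts = Counter(w for s in sentences for w in s)
    total = sum(counts.values())
    word_score = {w: c / total for w, c in counts.items()}

    n = len(sentences)
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Sentence score: mean value of the words it contains.
        sent = [sum(word_score[w] for w in s) / len(s) for s in sentences]

        # Integrate each sentence with its adjacent sentences (assumed rule).
        mixed = []
        for i in range(n):
            neighbours = [sent[j] for j in (i - 1, i + 1) if 0 <= j < n]
            nb_avg = sum(neighbours) / len(neighbours) if neighbours else sent[i]
            mixed.append((1 - damping) * sent[i] + damping * nb_avg)

        # Word value: sum of the scores of the sentences containing the word.
        new_score = {
            w: sum(mixed[i] for i, s in enumerate(sentences) if w in s)
            for w in counts
        }
        norm = sum(new_score.values())
        new_score = {w: v / norm for w, v in new_score.items()}

        # Stop once the largest change in any word value is below tol.
        delta = max(abs(new_score[w] - word_score[w]) for w in new_score)
        word_score = new_score
        if delta < tol:
            break

    return word_score, iteration
```

Calling `rank_words(document_text)` returns a dictionary of normalized word values and the number of iterations used; sorting the dictionary by value gives a ranked word list. The sketch ranks single words only and leaves out the step that assembles them into keyphrases.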

Keywords: Keyphrase extraction, Graph-based, Unsupervised learning, Iterated sentence

Review history: Received 10 September 2020, Revised 28 March 2021, Accepted 30 March 2021, Available online 20 April 2021, Version of Record 20 April 2021.

Official paper URL: https://doi.org/10.1016/j.knosys.2021.107014