Efficiently mining frequent itemsets applied for textual aggregation

Authors: Mustapha Bouakkaz, Youcef Ouinten, Sabine Loudcher, Philippe Fournier-Viger

Abstract

Text mining approaches are commonly used to discover relevant information and relationships in large volumes of text data. Data mining refers to methods for analyzing data in order to find patterns that capture its main properties. Combining data mining approaches with online analytical processing (OLAP) tools makes it possible to refine the techniques used for textual aggregation. In this paper, we propose a novel aggregation function for textual data based on discovering frequent closed patterns in a generated document/keyword matrix. Our contribution uses a data mining technique, specifically a closed pattern mining algorithm, to aggregate keywords. An experimental study on a real corpus of more than 700 scientific papers collected from Microsoft Academic Search shows that the proposed algorithm substantially outperforms four state-of-the-art textual aggregation methods in terms of recall, precision, F-measure, and runtime.
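To make the idea of aggregating keywords through frequent closed patterns more concrete, here is a minimal sketch, assuming a binary document/keyword matrix (one row per document, one column per keyword). It is not the authors' algorithm: it swaps in a naive approach that enumerates all frequent itemsets Apriori-style and then keeps the closed ones (those with no proper superset of equal support), whereas dedicated closed pattern miners avoid enumerating every frequent itemset. The function names (`matrix_to_transactions`, `frequent_itemsets`, `closed_itemsets`, `aggregate_keywords`), the ranking rule (support, then itemset size), and the toy matrix are all hypothetical.

```python
from itertools import combinations


def matrix_to_transactions(matrix, keywords):
    """Turn a binary document/keyword matrix (one row per document)
    into one keyword set per document."""
    return [frozenset(kw for kw, v in zip(keywords, row) if v) for row in matrix]


def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search: return {itemset: support} for every
    keyword set contained in at least `min_support` documents."""
    items = sorted({kw for t in transactions for kw in t})
    level = [frozenset([kw]) for kw in items]
    result = {}
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        frequent = {c: s for c, s in counts.items() if s >= min_support}
        result.update(frequent)
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            # Join two k-itemsets that differ in one keyword, then prune
            # candidates having an infrequent k-subset (Apriori property).
            if len(union) == len(a) + 1 and all(
                frozenset(sub) in frequent for sub in combinations(union, len(a))
            ):
                candidates.add(union)
        level = list(candidates)
    return result


def closed_itemsets(freq):
    """Keep itemsets that have no proper superset with the same support."""
    return {
        x: s
        for x, s in freq.items()
        if not any(x < y and s == t for y, t in freq.items())
    }


def aggregate_keywords(matrix, keywords, min_support, k):
    """Hypothetical aggregate: the k closed keyword sets ranked by support,
    then by size (this ranking rule is an assumption, not the paper's)."""
    transactions = matrix_to_transactions(matrix, keywords)
    closed = closed_itemsets(frequent_itemsets(transactions, min_support))
    ranked = sorted(closed.items(), key=lambda p: (-p[1], -len(p[0]), sorted(p[0])))
    return [(sorted(x), s) for x, s in ranked[:k]]


# Toy document/keyword matrix (invented for illustration).
keywords = ["olap", "mining", "text", "cube"]
matrix = [
    [1, 1, 1, 0],  # d1: olap, mining, text
    [1, 1, 0, 1],  # d2: olap, mining, cube
    [1, 1, 1, 0],  # d3: olap, mining, text
    [0, 0, 1, 0],  # d4: text
]
print(aggregate_keywords(matrix, keywords, min_support=2, k=2))
# [(['mining', 'olap'], 3), (['text'], 3)]
```

With a minimum support of 2 documents, `cube` is dropped as infrequent, and `{olap}` is not closed because `{olap, mining}` occurs in the same three documents; the closed sets therefore act as compact, non-redundant keyword aggregates.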

Keywords: Data mining, Closed keywords, Textual aggregation, OLAP


Paper URL: https://doi.org/10.1007/s10489-017-1050-9