HINMINE: heterogeneous information network mining with information retrieval heuristics

Authors: Jan Kralj, Marko Robnik-Šikonja, Nada Lavrač

Abstract

The paper presents an approach to mining heterogeneous information networks by decomposing them into homogeneous networks. The proposed HINMINE methodology is based on previous work that classifies nodes in a heterogeneous network in two steps. In the first step the heterogeneous network is decomposed into one or more homogeneous networks using different connecting nodes. We improve this step by using new methods inspired by weighting of bag-of-words vectors mostly used in information retrieval. The methods assign larger weights to nodes which are more informative and characteristic for a specific class of nodes. In the second step, the resulting homogeneous networks are used to classify data either by network propositionalization or label propagation. We propose an adaptation of the label propagation algorithm to handle imbalanced data and test several classification algorithms in propositionalization. The new methodology is tested on three data sets with different properties. For each data set, we perform a series of experiments and compare different heuristics used in the first step of the methodology. We also use different classifiers which can be used in the second step of the methodology when performing network propositionalization. Our results show that HINMINE, using different network decomposition methods, can significantly improve the performance of the resulting classifiers, and also that using a modified label propagation algorithm is beneficial when the data set is imbalanced.
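The first step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the weighting formulas, so the inverse-document-frequency style weight below is an assumed stand-in for the kind of bag-of-words heuristic it mentions. The function names `decompose` and `idf_weights` and the toy paper-author data are hypothetical.

```python
import numpy as np

def decompose(B, weights=None):
    """Build a homogeneous network over base nodes from a 0/1 incidence
    matrix B of shape (n_base, n_connecting). Two base nodes are linked
    with strength equal to the summed weight of the connecting nodes
    they share. With weights=None every connecting node counts as 1."""
    B = np.asarray(B, dtype=float)
    if weights is None:
        weights = np.ones(B.shape[1])
    A = (B * weights) @ B.T      # scale each connecting node, then project
    np.fill_diagonal(A, 0.0)     # drop self-links
    return A

def idf_weights(B):
    """Assumed idf-like heuristic: connecting nodes attached to many base
    nodes are less informative and receive a smaller weight."""
    B = np.asarray(B, dtype=float)
    n_base = B.shape[0]
    df = B.sum(axis=0)           # base nodes attached to each connecting node
    return np.where(df > 0, np.log(n_base / np.maximum(df, 1.0)), 0.0)

# Toy data (hypothetical): 4 papers (rows) connected through 3 authors (columns).
# Author 0 wrote every paper, so sharing that author says little.
B = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 1],
    [1, 0, 1],
])

A_plain = decompose(B)                     # counts shared authors
A_weighted = decompose(B, idf_weights(B))  # down-weights the ubiquitous author
```

In this toy case the unweighted decomposition links every pair of papers through the shared author 0, while the weighted version keeps only the link between papers 2 and 3, which also share the rarer author 2. The resulting matrix is what a second-step method such as label propagation or propositionalization would consume.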

Keywords: Network analysis, Heterogeneous information networks, Network decomposition, Personalized PageRank, Information retrieval, Text mining heuristics, Centroid classifier, SVM, Label propagation, Imbalanced data

Paper URL: https://doi.org/10.1007/s10844-017-0444-9