A flexible ILP formulation for hierarchical clustering

作者:

摘要

Hierarchical clustering is a popular approach in a number of fields with many well known algorithms. However, all existing work to our knowledge implements a greedy heuristic algorithm with no explicit objective function. In this work we formalize hierarchical clustering as an integer linear programming (ILP) problem with a natural objective function and the dendrogram properties enforced as linear constraints. Our experimental work shows that even for small data sets finding the global optimum produces more accurate results. Formalizing hierarchical clustering as an ILP with constraints has several advantages beyond finding the global optima. Relaxing the dendrogram constraints such as transitivity can produce novel problem variations such as finding hierarchies with overlapping clusterings. It is also possible to add constraints to encode guidance such as must–link, cannot–link, must–link–before etc. Finally, though exact solvers exist for ILP we show that a simple randomized algorithm and a linear programming (LP) relaxation can be used to provide approximate solutions faster.

论文关键词:Hierarchical clustering,Constraints,Integer linear programming

论文评审过程:Revised 10 May 2015, Accepted 23 May 2015, Available online 4 June 2015, Version of Record 9 February 2017.

论文官网地址:https://doi.org/10.1016/j.artint.2015.05.009