Sequence partitioning for process mining with unlabeled event logs

Abstract

Finding the case id in unlabeled event logs is arguably one of the hardest challenges in process mining research. The problem has been addressed with greedy approaches, but these usually converge to sub-optimal solutions. In this work, we describe an approach that performs a complete search of the search space. We formulate the problem as finding the minimal set of patterns contained in a sequence, where patterns may be interleaved but contain no repeated symbols. This is a new problem that has not previously been addressed in the literature; some of its variants are NP-hard, and it is conjectured to be NP-complete. We solve it stepwise, by generating and verifying a list of candidate solutions. The techniques introduced for the various subtasks can also be applied independently to more specific problems. The approach has been implemented and applied in a case study with real data from a business process supported by a software application.
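To make the problem formulation concrete, here is a minimal brute-force sketch. It assumes one reading of the problem: partition the positions of the sequence into interleaved subsequences, each free of repeated symbols, so that the number of *distinct* subsequence strings (the pattern set) is as small as possible. The function name `minimal_pattern_set` is hypothetical. The sketch is a naive exhaustive enumeration, exponential in the sequence length, and it is not the stepwise candidate-generation-and-verification technique described in the paper.

```python
from typing import FrozenSet, List, Optional, Tuple


def minimal_pattern_set(seq: str) -> Tuple[int, FrozenSet[str]]:
    """Exhaustively search all ways to split `seq` into interleaved
    subsequences without repeated symbols; return the smallest set of
    distinct subsequences (patterns) found. Hypothetical illustration only."""
    best: List[Optional[FrozenSet[str]]] = [None]

    def search(i: int, parts: List[str]) -> None:
        if i == len(seq):
            patterns = frozenset(parts)
            # Keep the first solution with the fewest distinct patterns.
            if best[0] is None or len(patterns) < len(best[0]):
                best[0] = patterns
            return
        sym = seq[i]
        # Option 1: append the symbol to an existing subsequence that
        # does not already contain it (patterns have no repeated symbols).
        for k, p in enumerate(parts):
            if sym not in p:
                parts[k] = p + sym
                search(i + 1, parts)
                parts[k] = p
        # Option 2: start a new subsequence (a new pattern occurrence).
        parts.append(sym)
        search(i + 1, parts)
        parts.pop()

    search(0, [])
    assert best[0] is not None
    return len(best[0]), best[0]


if __name__ == "__main__":
    print(minimal_pattern_set("abab"))  # prints (1, frozenset({'ab'}))
```

For "abab" a single pattern "ab" occurring twice (interleaved) explains the whole sequence, whereas "abba" needs at least two patterns. This illustrates why the search is over sets of patterns rather than over individual cases.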

Keywords: Process mining, Sequential pattern mining, Sequence partitioning, Combinatorics on words

Article history: Received 9 June 2010, Revised 10 May 2011, Accepted 10 May 2011, Available online 20 May 2011.

DOI: https://doi.org/10.1016/j.datak.2011.05.003