Mining cost-effective patterns in event logs

Authors:

Highlights:

Abstract

High utility pattern mining is a popular data analysis task that consists of discovering patterns of high importance in databases. A common application is identifying high utility (profitable) patterns in customer transaction data. Though such analysis can help in understanding data, it does not consider the cost (e.g., effort, resources, money or time) required to obtain the utility (benefits). In this paper, we argue that to discover interesting patterns in event sequences, it is useful to consider both a utility model and a cost model. For example, to identify cost-effective ways of treating patients from medical pathways data, it is desirable to consider not only the ability of treatments to inhibit symptoms or cure a disease (utility) but also the resources consumed and the time spent (cost) to provide these treatments. Based on this perspective, this paper defines a novel task of discovering cost-effective event sequences in event logs. In this task, cost is modeled as numeric values, while utility is represented as either binary or numeric values. Measures are proposed to evaluate the trade-off and correlation between the cost and utility of patterns, in order to identify cost-effective patterns (patterns that have a low cost but provide a high utility). Three efficient algorithms, called CEPB, corCEPB and CEPN, are designed to extract these patterns. They rely on a tight lower bound on the cost and a memory buffering technique to find patterns efficiently. Experiments show that the proposed algorithms achieve high efficiency, that the proposed optimizations improve efficiency, and that insightful cost-effective patterns are found in real-life e-learning data.
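To make the idea of a cost/utility trade-off concrete, here is a minimal Python sketch. It is not the paper's CEPB, corCEPB or CEPN algorithm, and the abstract does not give the exact measure definitions. The sketch assumes that each sequence carries one numeric utility, that a pattern's cost in a sequence is the summed cost of its leftmost occurrence, and that the trade-off is average cost divided by average utility (lower is better). The names `match_cost` and `trade_off` and the e-learning activities are hypothetical.

```python
from typing import List, Optional, Tuple

Event = Tuple[str, float]                  # (activity name, cost of that event)
LoggedSequence = Tuple[List[Event], float]  # (events, utility of the whole sequence)


def match_cost(pattern: Tuple[str, ...], events: List[Event]) -> Optional[float]:
    """Return the summed cost of the leftmost occurrence of `pattern`
    as a subsequence of `events`, or None if the pattern does not occur.
    (Assumption: the leftmost occurrence defines the pattern's cost.)"""
    total = 0.0
    matched = 0
    for name, cost in events:
        if matched < len(pattern) and name == pattern[matched]:
            total += cost
            matched += 1
    return total if matched == len(pattern) else None


def trade_off(pattern: Tuple[str, ...], log: List[LoggedSequence]) -> Optional[float]:
    """Assumed trade-off measure: average cost / average utility over the
    sequences that contain the pattern. Returns None when undefined."""
    costs, utilities = [], []
    for events, utility in log:
        cost = match_cost(pattern, events)
        if cost is not None:
            costs.append(cost)
            utilities.append(utility)
    if not costs:
        return None  # pattern never occurs
    avg_utility = sum(utilities) / len(utilities)
    if avg_utility == 0:
        return None  # ratio undefined without any utility
    return (sum(costs) / len(costs)) / avg_utility


# Hypothetical e-learning log: cost is time spent, utility is a final score.
log = [
    ([("video", 10.0), ("quiz", 5.0)], 80.0),
    ([("video", 20.0), ("forum", 3.0), ("quiz", 5.0)], 60.0),
    ([("forum", 4.0)], 30.0),
]

ratio = trade_off(("video", "quiz"), log)
```

In this toy log the pattern `("video", "quiz")` occurs in the first two sequences, so `ratio` is the mean cost of those occurrences divided by their mean score. A miner would rank or filter patterns by such a value, and a lower bound on cost would be needed to prune the search space, which this sketch does not attempt.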

Keywords: Event logs, Sequences, Pattern mining, Sequential patterns, Cost-effective patterns, Utility, Cost

Review history: Received 11 March 2019, Revised 8 November 2019, Accepted 16 November 2019, Available online 20 November 2019, Version of Record 8 February 2020.

Official paper URL: https://doi.org/10.1016/j.knosys.2019.105241