Con2Vec: Learning embedding representations for contrast sets

Abstract

Contrast sets are used in many knowledge-based systems to capture data patterns relevant to a target variable. Although they offer advantages such as high interpretability, they come with neither a similarity measure nor feature vectors for downstream tasks such as regression or classification. To address these limitations, we propose Con2Vec (Contrast set to Vector), a method that embeds contrast sets into a low-dimensional continuous vector space. Con2Vec defines two novel contexts for a contrast set, a similarity context and a co-occurrence context, and then leverages a neural embedding model to learn low-dimensional continuous vectors (i.e., embeddings) for contrast sets. We further use the contrast set embeddings to construct feature vectors for transactional data. We extensively evaluate Con2Vec on four real-world datasets against state-of-the-art embedding and non-embedding methods; the results demonstrate clear advantages for our method.
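
To make the last step concrete, here is a minimal sketch of how contrast set embeddings could be turned into a feature vector for a transaction. It is an illustration only, not the paper's procedure: the abstract does not say how embeddings are aggregated, so the matching rule (a contrast set applies when all of its items occur in the transaction) and the mean aggregation are assumptions, as are the name `transaction_features` and the toy embedding values.

```python
from typing import Dict, FrozenSet, Iterable, List

# A contrast set is modelled here as a frozenset of items (an assumption).
Pattern = FrozenSet[str]


def transaction_features(
    transaction: Iterable[str],
    embeddings: Dict[Pattern, List[float]],
    dim: int,
) -> List[float]:
    """Average the embeddings of all contrast sets contained in the transaction.

    Hypothetical aggregation: the source does not specify how the feature
    vector is built, so a plain mean is used for illustration.
    """
    items = set(transaction)
    # Keep the embedding of every contrast set that is a subset of the transaction.
    matched = [vec for pattern, vec in embeddings.items() if pattern <= items]
    if not matched:
        # No contrast set applies: fall back to the zero vector.
        return [0.0] * dim
    # Component-wise mean over the matched embeddings.
    return [sum(column) / len(matched) for column in zip(*matched)]


# Toy, hand-written embeddings; in practice a trained model would supply them.
toy_embeddings: Dict[Pattern, List[float]] = {
    frozenset({"a", "b"}): [1.0, 0.0],
    frozenset({"c"}): [0.0, 1.0],
}

features = transaction_features(["a", "b", "c"], toy_embeddings, dim=2)
no_match = transaction_features(["d"], toy_embeddings, dim=2)
```

The resulting fixed-length vectors could then be passed to any standard classifier or regressor, which is the downstream use the abstract describes.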

Keywords: Contrast set, Contrast set embedding, Pattern embedding, Representation learning, Classification

Review history: Received 25 May 2021, Revised 13 July 2021, Accepted 9 August 2021, Available online 12 August 2021, Version of Record 19 August 2021.

Paper URL: https://doi.org/10.1016/j.knosys.2021.107382