An approach for temporal analysis of email data based on segmentation

Abstract

Email data carries many kinds of information, such as the content being exchanged, the time of exchange, and the IDs of the users participating in the exchange. Analyzing email data can reveal valuable information about the social networks of a single user or of multiple users, the topics being discussed, and so on. In this paper, we describe a novel approach for temporally analyzing the communication patterns embedded in email data based on time series segmentation. The approach computes egocentric communication patterns of a single user, as well as sociocentric communication patterns involving multiple users. Time series segmentation is used to uncover patterns that may span multiple time points and to study how these patterns change over time. To find egocentric patterns, the email communication of a user is represented as an item-set time series. An optimal segmentation of the item-set time series is constructed, from which patterns are extracted. To find sociocentric patterns, the email data is represented as an item-setgroup time series. Patterns involving multiple users are then extracted from an optimal segmentation of the item-setgroup time series. The proposed approach is applied to the Enron email data set and produces very promising results.
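To make the idea of an optimal segmentation of an item-set time series concrete, the following is a minimal Python sketch. The abstract does not define how a segment is summarized or how segment difference is measured, so both choices here are assumptions made purely for illustration: a segment is represented by the union of its item sets, and its difference is the number of items each time point lacks relative to that union. The function names `segment_difference` and `optimal_segmentation`, the dynamic program, and the example data are likewise hypothetical and are not taken from the paper.

```python
def segment_difference(series, start, end):
    """Assumed measure (not the paper's definition): summarize
    series[start:end] by the union of its item sets, then count how many
    items of that union each time point is missing."""
    window = series[start:end]
    union = set().union(*window)
    return sum(len(union - items) for items in window)


def optimal_segmentation(series, k):
    """Split `series` into k contiguous segments with minimum total
    segment difference, using a standard dynamic program.

    Returns (total_cost, [(start, end), ...]) with half-open ranges."""
    n = len(series)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(series)")
    inf = float("inf")
    # best[j][i]: minimum cost of splitting series[:i] into j segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    # back[j][i]: start index of the last segment in that best split.
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for s in range(j - 1, i):
                if best[j - 1][s] == inf:
                    continue
                cost = best[j - 1][s] + segment_difference(series, s, i)
                if cost < best[j][i]:
                    best[j][i] = cost
                    back[j][i] = s
    # Walk the back-pointers from the end to recover segment boundaries.
    bounds = []
    i = n
    for j in range(k, 0, -1):
        s = back[j][i]
        bounds.append((s, i))
        i = s
    bounds.reverse()
    return best[k][n], bounds


# Hypothetical egocentric series: the set of correspondents one user
# emailed at each of six consecutive time points.
example = [
    {"alice", "bob"}, {"alice", "bob"}, {"alice"},
    {"carol"}, {"carol", "dave"}, {"carol", "dave"},
]
cost, segments = optimal_segmentation(example, 2)
print(cost, segments)
```

On this toy series, the two-segment split separates the time points dominated by one pair of correspondents from those dominated by the other, which is the kind of change over time that segmentation is meant to expose. A different summary (for example, the intersection of the item sets) or a different difference measure would plug into the same dynamic program without changing its structure.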

Keywords: Item-set time series, Segment difference, Segmentation difference, Optimal segmentation, Egocentric patterns, Clique pattern

Article history: Received 13 May 2008, Revised 30 April 2009, Accepted 30 April 2009, Available online 19 May 2009.

Official paper URL: https://doi.org/10.1016/j.datak.2009.04.011