Data summarization: a survey

Author: Mohiuddin Ahmed

Abstract

Summarization has proven to be a useful and effective technique for supporting the analysis of large amounts of data. Knowledge discovery from data (KDD) is time-consuming, and summarization is an important step that expedites KDD tasks by intelligently reducing the size of the data to be processed. In this paper, different summarization techniques for structured and unstructured data are discussed. The key finding of this survey is that not all summarization techniques create a summary suitable for further analysis. It is highlighted that sampling techniques are a viable way of creating a summary for further knowledge discovery, such as anomaly detection from the summary. Different summary evaluation metrics are also discussed.

Keywords: Summarization, Structured data, Unstructured data, Machine learning, Statistics, Semantics, Natural language processing, Cyber security

Paper URL: https://doi.org/10.1007/s10115-018-1183-0