Managing irrelevant knowledge in CBR models for unsolicited e-mail classification

Authors:

Highlights:

Abstract

The problem of unsolicited e-mail has been increasing in recent years. Fortunately, some advanced technologies have been successfully applied to spam filtering, achieving promising results. Recently, we introduced SpamHunting, a successful spam filter able to address the concept drift problem by combining a relevant term identification technique with an evolving sliding window strategy. Several successful spam filtering techniques use continuous learning strategies to achieve better adaptation capabilities and address concept drift issues. Nevertheless, due to concept drift and hidden changes in the environment, the presence of obsolete and irrelevant knowledge becomes a serious drawback. Soon after the launch of the filter, many decisions are made based on irrelevant and/or obsolete knowledge. Therefore, in such a situation, the use of forgetting strategies is as important as the implementation of continuous learning approaches. In this paper we introduce a novel technique designed for identifying and removing the obsolete and irrelevant knowledge that has accumulated over time. We have carried out several experiments to test the suitability of our proposal, and we show the results obtained and its applicability.

Keywords: Anti-spam filtering, Irrelevant knowledge, Concept drift, EIRN viewer, CBR system

Review history: Available online 8 December 2007.

Official paper URL: https://doi.org/10.1016/j.eswa.2007.11.037