New Techniques for Data Reduction in a Database System for Knowledge Discovery Applications

作者:Akhil Kumar

摘要

Databases store large amounts of information about consumer transactions and other kinds of transactions. This information can be used to deduce rules about consumer behavior, and the rules can in turn be used to determine company policies, for instance with regards to production, marketing and in several other areas. Since databases typically store millions of records, and each record could have up to 100 or more attributes, as an initial step it is necessary to reduce the size of the database by eliminating attributes that do not influence the decision at all or do so very minimally. In this paper we present techniques that can be employed effectively for exact and approximate reduction in a database system. These techniques can be implemented efficiently in a database system using SQL (structured query language) commands. We tested their performance on a real data set and validated them. The results showed that the classification performance actually improved with a reduced set of attributes as compared to the case when all the attributes were present. We also discuss how our techniques differ from statistical methods and other data reduction methods such as rough sets.

论文关键词:Semantic information preserving reduction, relational databases, selection, projection, classification, reduced information systems

论文评审过程:

论文官网地址:https://doi.org/10.1023/A:1008633406999