Discovery of optimal factors in binary data via a novel method of matrix decomposition

Authors:

Abstract

We present a novel method of decomposing an n × m binary matrix I into a Boolean product A ∘ B of an n × k binary matrix A and a k × m binary matrix B with k as small as possible. Attempts to solve this problem are known from Boolean factor analysis, where I is interpreted as an object–attribute matrix, A and B are interpreted as object–factor and factor–attribute matrices, and the aim is to find a decomposition with a small number k of factors. The method presented here is based on a theorem proved in this paper. The theorem states that optimal decompositions, i.e. those with the smallest possible number of factors, are those in which the factors are formal concepts in the sense of formal concept analysis. Finding an optimal decomposition is an NP-hard problem. However, we present an approximation algorithm for finding optimal decompositions which is based on the insight provided by the theorem. The algorithm avoids the need to compute all formal concepts and significantly outperforms a greedy approximation algorithm for the set covering problem, to which the problem of matrix decomposition is easily shown to be reducible. We present results of several experiments with various data sets, including those from the CIA World Factbook and the UCI Machine Learning Repository. In addition, we present further geometric insight, including a description of transformations between the space of attributes and the space of factors.
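To make the notions in the abstract concrete, the following minimal Python sketch illustrates the Boolean product and the use of formal concepts as factors on a small matrix. It is not the paper's approximation algorithm: the function names (`boolean_product`, `concept_from_attributes`, `factors_to_matrices`), the example matrix, and the hand-picked seed attributes are all assumptions made for illustration only.

```python
def boolean_product(A, B):
    """Boolean matrix product: entry (i, j) is the OR over l of A[i][l] AND B[l][j]."""
    k = len(B)
    m = len(B[0]) if B else 0
    return [[int(any(row[l] and B[l][j] for l in range(k))) for j in range(m)]
            for row in A]


def concept_from_attributes(I, attrs):
    """Close a set of attributes into a formal concept (extent, intent) of I.

    extent: all objects (rows) that have every attribute in attrs.
    intent: all attributes (columns) shared by every object in the extent.
    """
    n, m = len(I), len(I[0])
    extent = {i for i in range(n) if all(I[i][j] for j in attrs)}
    intent = {j for j in range(m) if all(I[i][j] for i in extent)}
    return extent, intent


def factors_to_matrices(concepts, n, m):
    """Build the object-factor matrix A and the factor-attribute matrix B."""
    A = [[int(i in extent) for extent, _ in concepts] for i in range(n)]
    B = [[int(j in intent) for j in range(m)] for _, intent in concepts]
    return A, B


# Hypothetical 4 x 4 object-attribute matrix (illustrative data only).
I = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 1],
]

# Hand-picked seed attributes; each seed is closed into a formal concept.
concepts = [concept_from_attributes(I, {j}) for j in (0, 2, 3)]
A, B = factors_to_matrices(concepts, len(I), len(I[0]))

# Each concept is a maximal rectangle of 1s in I; together they cover all 1s.
assert boolean_product(A, B) == I
```

Here the three concepts yield a decomposition with k = 3. Each factor corresponds to one column of A (the concept's extent) and one row of B (its intent), which is the sense in which formal concepts serve as factors.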

Keywords: Binary matrix decomposition, Factor analysis, Binary data, Formal concept analysis, Concept lattice

Article history: Received 15 December 2007, Revised 30 July 2008, Available online 18 May 2009.

Official URL: https://doi.org/10.1016/j.jcss.2009.05.002