Deep compression of probabilistic graphical networks

Authors:

Highlights:

• Deep compression of deterministic deep models has been proposed in the last few years to reduce the number of connections and nodes while maintaining the classification accuracy of these models. This paper is the first attempt to combine deep probabilistic graphical networks (PGNs) with deep compression techniques to derive sparse versions of deep probabilistic models.

• The developed pruning approach is a layer-by-layer, pre-pruning one: the final PGNs grow in a layer-by-layer and compressed manner. It is very efficient at reducing redundancy. The whole pruning takes only a few seconds, while other, more complex methods, such as regularization and low-rank factorization approaches, take much more time to select sparse architectures or perform model compression, owing to the complexity of regularization and factorization.

• The correctness and efficiency of the proposed compression approach are evaluated on a number of PGNs with different datasets. The approach can compress trained PGNs into lightweight networks that fit in the on-chip memory of small mobile devices. It is also easy to implement and generalizes to other deep PGNs.
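The layer-by-layer pre-pruning idea in the highlights can be illustrated with a minimal sketch. This is not the paper's exact algorithm: a simple magnitude threshold is assumed as the pruning criterion, and `keep_ratio` and the layer sizes are hypothetical. Each layer is pruned as soon as it is created, so the network grows in an already-compressed form.

```python
import numpy as np

def prune_layer(weights, keep_ratio=0.3):
    """Magnitude-based pruning of one layer's weight matrix:
    keep only the largest-magnitude fraction of connections."""
    flat = np.abs(weights).ravel()
    k = max(1, int(keep_ratio * flat.size))
    threshold = np.partition(flat, -k)[-k]  # k-th largest magnitude
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

# Grow a toy two-layer network, pruning each new layer before the
# next one is stacked on top (layer-by-layer pre-pruning).
rng = np.random.default_rng(0)
layer_sizes = [(784, 256), (256, 64)]   # hypothetical example sizes
pruned_layers = []
for shape in layer_sizes:
    w = rng.standard_normal(shape)
    w_sparse, mask = prune_layer(w, keep_ratio=0.3)
    pruned_layers.append(w_sparse)
    print(f"layer {shape}: sparsity {1.0 - mask.mean():.2f}")
```

Because pruning is a single threshold pass per layer, its cost is linear in the number of weights, which is consistent with the claim that the whole procedure takes only seconds compared with regularization- or factorization-based compression.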

Keywords: Deep compression, Probabilistic graphical models, Probabilistic graphical networks, Deep learning

Article history: Received 10 November 2018, Revised 24 May 2019, Accepted 19 July 2019, Available online 20 July 2019, Version of Record 24 July 2019.

DOI: https://doi.org/10.1016/j.patcog.2019.106979