Methodologies of Compressing a Stable Performance Convolutional Neural Networks in Image Classification

Authors: Mo’taz Al-Hami, Marcin Pietron, Raul Casas, Maciej Wielgosz

Abstract

Deep learning has brought a real revolution to the embedded computing environment, and convolutional neural networks (CNNs) have proven a reliable fit for many emerging problems. The next step is to strengthen the role of CNNs on embedded devices, covering both implementation details and performance. Storage and computational resources on such devices are limited and constrained, which raises key issues that must be addressed. Compressing (i.e., quantizing) the CNN is a valuable solution. In this paper, our main goals are memory compression and complexity reduction (in both operations and cycles) of CNNs, using methods, including quantization and pruning, that do not require retraining and can therefore be applied directly on mobile systems or robots. We also explore further quantization techniques for additional complexity reduction. To achieve these goals, we compress the layers of a CNN model (i.e., parameters and outputs) into suitable precision formats using several quantization methodologies. First, we describe a pruning approach that reduces the required storage and computation cycles on embedded devices; this can drastically reduce power consumption and resource usage. Second, we present a hybrid quantization approach with automatic tuning for network compression. Third, we present a K-means quantization approach. With only minor degradation relative to the floating-point baseline, the presented pruning and quantization methods produce fixed-point reduced networks with stable performance. Precise fixed-point calculations for coefficients, input/output signals, and accumulators are considered in the quantization process.
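The abstract names three ingredients: magnitude pruning, fixed-point quantization, and K-means weight clustering. The sketch below is a minimal NumPy illustration of those general ideas, not the authors' implementation; the bit width, codebook size, sparsity level, and all function names are assumptions made for the example.

```python
import numpy as np


def prune_by_magnitude(w, sparsity=0.5):
    """Zero out the smallest-magnitude weights (no retraining assumed)."""
    thresh = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < thresh, 0.0, w)


def to_fixed_point(w, total_bits=8):
    """Quantize a weight tensor to a signed fixed-point grid.

    Minimal sketch: integer bits are chosen from the dynamic range of `w`,
    the remaining bits (minus one sign bit) hold the fraction.
    """
    int_bits = max(0, int(np.ceil(np.log2(np.max(np.abs(w)) + 1e-12))))
    frac_bits = total_bits - 1 - int_bits
    scale = 2.0 ** frac_bits
    q = np.clip(np.round(w * scale),
                -2 ** (total_bits - 1), 2 ** (total_bits - 1) - 1)
    return q / scale  # de-quantized view of the fixed-point values


def kmeans_codebook(w, k=16, iters=20):
    """Cluster weights into k shared values (weight-sharing codebook)."""
    flat = w.ravel()
    centers = np.linspace(flat.min(), flat.max(), k)  # linear init
    for _ in range(iters):
        idx = np.argmin(np.abs(flat[:, None] - centers[None, :]), axis=1)
        for c in range(k):
            if np.any(idx == c):
                centers[c] = flat[idx == c].mean()
    return centers[idx].reshape(w.shape), centers


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=(64, 3, 3, 3))  # mock conv-layer weights
    w_pruned = prune_by_magnitude(w, sparsity=0.5)
    w_fx = to_fixed_point(w, total_bits=8)
    w_km, codebook = kmeans_codebook(w, k=16)
    print("fixed-point MSE:", np.mean((w - w_fx) ** 2))
    print("k-means MSE:   ", np.mean((w - w_km) ** 2))
```

In practice, such per-layer transforms would be followed by an accuracy check on a validation set to pick the bit width or codebook size per layer; the automatic tuning described in the paper serves that role, while this sketch only shows the basic quantization and clustering operations.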

Keywords: Convolutional neural network (CNN), Fixed-point, Quantization, Pruning, Clustering, K-means, Hybrid quantization, Incremental pruning, Partial quantization, Histogram


Paper URL: https://doi.org/10.1007/s11063-019-10076-y