An Efficient Dataflow Mapping Method for Convolutional Neural Networks

Authors: Zhuangzhuang Liu, Huaxi Gu, Bowen Zhang, Canran Shi

Abstract

Convolutional neural networks (CNNs) have been widely used in speech recognition, object detection and image recognition. During inference, data access operations consume more energy than computation. Therefore, optimizing the dataflow from external storage to the on-chip processing units is an effective way to reduce power consumption. The state-of-the-art row stationary dataflow maximizes data reuse to reduce the number of data movements and thus lower the power consumption of the system. However, it suffers from low processing unit utilization and poor scalability. In this letter, we propose an enhanced row stationary (ERS) dataflow. By changing the data mapping, ERS maps the data of each channel of the three-dimensional filter and input feature map (ifmap) onto a column of processing units. The processing array operates on multiple filters in parallel, which effectively improves the utilization of computing resources. Within each processing unit, one row of filter data and one row of ifmap data are processed at a time, and both rows are reused across multiple convolution operations. In addition, a configurable sliding window model is proposed to handle a case that existing dataflows cannot: a computing array whose width is smaller than the filter width. Simulation results show that for AlexNet and VGG16, the execution speed of the ERS dataflow is improved by about 60% compared with the row stationary dataflow, and hardware resource utilization is improved by about 30%. For MobileNet V1, the execution speed of the ERS dataflow is improved by about 4% compared with the row stationary dataflow.
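To make the row-wise mapping concrete, the sketch below is a minimal, hypothetical Python model (not the authors' implementation) of the per-PE primitive in row-stationary-style dataflows: each processing element keeps one filter row resident and slides it over one ifmap row, and the rows of one channel are assigned to the PEs of one column, whose partial sums would then be accumulated vertically. Function names, shapes, and the omitted accumulation step are illustrative assumptions.

```python
def pe_row_conv(filter_row, ifmap_row, stride=1):
    """1D convolution of one resident filter row over one ifmap row.

    The same filter_row is reused at every output position, and each
    ifmap value is reused by up to len(filter_row) output positions --
    the reuse that row-stationary-style dataflows exploit per PE.
    """
    S, W = len(filter_row), len(ifmap_row)
    return [
        sum(filter_row[i] * ifmap_row[x + i] for i in range(S))
        for x in range(0, W - S + 1, stride)
    ]


def column_of_pes(filter_rows, ifmap_rows):
    """Toy mapping of one channel's rows onto one column of PEs.

    PE r in the column convolves filter row r with the matching ifmap
    row; the per-PE partial sums would then be accumulated vertically
    to form one output row (accumulation omitted in this sketch).
    """
    return [pe_row_conv(f_row, i_row)
            for f_row, i_row in zip(filter_rows, ifmap_rows)]


if __name__ == "__main__":
    # One channel of a 3x3 filter against three rows of a width-5 ifmap slice.
    f = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
    x = [[3, 1, 4, 1, 5], [9, 2, 6, 5, 3], [5, 8, 9, 7, 9]]
    print(column_of_pes(f, x))  # per-PE partial sums before accumulation
```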

Keywords: Convolutional neural networks, Dataflow, Hardware resources, Scalability


Paper URL: https://doi.org/10.1007/s11063-021-10670-z