Evaluating performance of AI operators using roofline model

Authors: Zhengbo Chen, Fang Zheng, Qi Yu, Rujun Sun, Feng Guo, Zuoning Chen

Abstract

Artificial intelligence (AI) algorithms have shown performance advantages in a wide range of application domains. However, their growing demand for hardware resources poses challenges for AI accelerator design. To address this issue, the research community has produced a large body of work, and evaluating the performance of AI algorithms on accelerators is one of its active topics. Such evaluations, however, usually require complex experimental setups and may involve repetitive tests. Rather than repeating the experiments of prior research, this paper presents a comprehensive, easy-to-operate evaluation of AI operators instead of whole AI algorithms. We first examine, through an in-depth analysis, the operators common to a variety of AI algorithms and identify six representative operator categories. We then analyze their performance using the roofline model. To verify this analysis, we conduct simple evaluation experiments in which several AI operators are run on two NVIDIA GPUs. The results show that AI operators benefit from low-precision arithmetic, large on-chip caches, high-bandwidth off-chip memory, and sparsity processing. Based on these observations, we propose three optimization opportunities for AI accelerator design: multiple-precision support, an efficient memory system, and sparsity processing.
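The roofline model bounds an operator's attainable throughput by min(peak compute, memory bandwidth × arithmetic intensity), where arithmetic intensity is the number of floating-point operations per byte of off-chip memory traffic. The Python sketch below illustrates that bound; it is not the authors' evaluation code. The peak and bandwidth figures are hypothetical, the helper names are our own, and the traffic model assumes each operand crosses off-chip memory exactly once (ideal on-chip reuse).

```python
# Minimal roofline-model sketch. Hardware numbers are HYPOTHETICAL and chosen
# for illustration; they are not the specs of either GPU used in the paper.


def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Roofline bound: performance is capped by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def gemm_intensity(m: int, n: int, k: int, bytes_per_elem: int) -> float:
    """FLOPs per byte for C = A @ B, assuming A, B and C each move through DRAM once."""
    flops = 2 * m * n * k  # one multiply and one add per (i, j, p) triple
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


def elementwise_add_intensity(n: int, bytes_per_elem: int) -> float:
    """FLOPs per byte for z = x + y over n elements (two reads, one write)."""
    return n / (3 * n * bytes_per_elem)


PEAK_GFLOPS = 100_000.0  # hypothetical 100 TFLOP/s compute roof
BANDWIDTH_GBS = 1_000.0  # hypothetical 1 TB/s off-chip bandwidth
ridge = PEAK_GFLOPS / BANDWIDTH_GBS  # intensity where the two roofs meet

cases = [
    ("GEMM 4096^3 fp32", gemm_intensity(4096, 4096, 4096, 4)),
    ("GEMM 4096^3 fp16", gemm_intensity(4096, 4096, 4096, 2)),
    ("add 1M fp32", elementwise_add_intensity(1 << 20, 4)),
]

for name, ai in cases:
    perf = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, ai)
    bound = "compute" if ai >= ridge else "memory"
    print(f"{name}: {ai:.2f} FLOP/byte -> {perf:.0f} GFLOP/s ({bound}-bound)")
```

Under these assumptions the large GEMM lands far to the right of the ridge point and is compute-bound, while the element-wise add sits near 0.08 FLOP/byte and is limited by bandwidth. Halving the element size doubles arithmetic intensity, which is one reason low precision helps memory-bound operators; for simplicity the sketch keeps a single compute peak for both precisions.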

Keywords: AI algorithms, Accelerators, Convolutional neural network, AI operators, Sparsity processing

Paper URL: https://doi.org/10.1007/s10489-021-02794-5