Semantics-to-Signal Scalable Image Compression with Learned Revertible Representations

作者：Kang Liu, Dong Liu, Li Li, Ning Yan, Houqiang Li

摘要

Image/video compression and communication need to serve both human vision and machine vision. To address this need, we propose a scalable image compression solution. We assume that machine vision needs less information that is related to semantics, whereas human vision needs more information that is to reconstruct signal. We then propose semantics-to-signal scalable compression, where partial bitstream is decodeable for machine vision and the entire bitstream is decodeable for human vision. Our method is inspired by the scalable image coding standard, JPEG2000, and similarly adopts subband-wise representations. We first design a trainable and revertible transform based on the lifting structure, which converts an image into a pyramid of multiple subbands; the transform is trained to make the partial representations useful for multiple machine vision tasks. We then design an end-to-end optimized encoding/decoding network for compressing the multiple subbands, to jointly optimize compression ratio, semantic analysis accuracy, and signal reconstruction quality. We experiment with two datasets: CUB200-2011 and FGVC-Aircraft, taking coarse-to-fine image classification tasks as an example. Experimental results demonstrate that our proposed method achieves semantics-to-signal scalable compression, and outperforms JPEG2000 in compression efficiency. The proposed method sheds light on a generic approach for image/video coding for human and machines.

论文关键词：Deep learning, Image compression, Lifting structure, Machine vision, Scalable coding

论文评审过程：

论文官网地址：https://doi.org/10.1007/s11263-021-01491-7