Self-supervised Monocular Trained Depth Estimation Using Triplet Attention and Funnel Activation

Authors: Xuezhi Xiang, Xiangdong Kong, Yujian Qiu, Kaixu Zhang, Ning Lv

Abstract

Dense depth estimation from a single image is a fundamental problem in computer vision with applications in many robotic tasks. Fully supervised methods require large, accurate ground-truth datasets, which are often complex and expensive to acquire. Self-supervised learning has emerged as a promising alternative for monocular depth estimation because it does not require ground-truth depth data. In this paper, we propose a novel self-supervised joint learning framework for depth estimation using consecutive frames from monocular and stereo videos. Our architecture builds on two ideas: (1) triplet attention and (2) funnel activation (FReLU). Triplet attention, added to the depth and pose networks, captures the importance of features across the dimensions of a tensor without any information bottleneck, making optimisation of the learning framework more reliable. FReLU is used at the non-linear activation layer to capture local context in images adaptively, rather than relying on more complex convolutions in the convolution layer. Through the pixel-wise modelling capacity provided by its spatial condition, FReLU extracts the spatial structure of objects, which enriches the details of complex images. Experimental results show that the proposed method is comparable with state-of-the-art self-supervised monocular depth estimation methods.
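
As a rough illustration of the funnel activation idea mentioned in the abstract, the sketch below assumes the commonly cited formulation y = max(x, T(x)), where T(x) is a per-channel 3x3 spatial condition. It is not the authors' implementation: it handles a single channel in pure Python, omits the batch normalisation usually applied to T(x), and the names `funnel_condition`, `frelu`, and `kernel` are hypothetical. In a real network the kernel weights would be learned.

```python
from typing import List

Grid = List[List[float]]


def funnel_condition(x: Grid, kernel: Grid) -> Grid:
    """Spatial condition T(x): a 3x3 weighted window over one channel.

    Positions outside the image are treated as zero (zero padding).
    """
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += kernel[di + 1][dj + 1] * x[ii][jj]
            out[i][j] = acc
    return out


def frelu(x: Grid, kernel: Grid) -> Grid:
    """Funnel activation sketch: element-wise max(x, T(x))."""
    t = funnel_condition(x, kernel)
    return [[max(a, b) for a, b in zip(row_x, row_t)]
            for row_x, row_t in zip(x, t)]


# Assumed toy input and kernels, for illustration only.
feature_map = [[-1.0, 2.0], [3.0, -4.0]]
zero_kernel = [[0.0] * 3 for _ in range(3)]
identity_kernel = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]

relu_like = frelu(feature_map, zero_kernel)       # T(x) = 0, so this is ReLU
unchanged = frelu(feature_map, identity_kernel)   # T(x) = x, so output is x
```

With an all-zero kernel the condition is zero everywhere and the activation reduces to ReLU; with any other kernel, the threshold at each pixel depends on its 3x3 neighbourhood, which is the sense in which the activation is spatially conditioned.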

Keywords: Self-supervised, Depth estimation, Triplet attention, Funnel activation

Paper URL: https://doi.org/10.1007/s11063-021-10608-5