Progressively diffused networks for semantic visual parsing

作者:

Highlights:

摘要

Recent deep models advance the task of semantic visual parsing by increasing the depth of networks and the resolution (size) of the predicted labelmaps. However, the contextual information within each layer and between layers is not fully explored. Long Short Term Memory Networks(LSTM) that learn to propagate information is well-suited to model pixels dependencies with respect to spacial locations within layers and depths across layers. Unlike previous LSTM-based methods that tend to enhance representation of each pixel only by involving the information from adjacent area. This work proposes Progressively Diffused Networks (PDNs) to deal with complex semantic parsing tasks. It can explore spatial dependencies in a larger field that represents the rich contextual information among pixels. The proposed model has three appealing properties. First, it enables information to be progressively broadcast across feature maps by stacking multiple diffusion layers. Second, in each layer, multiple convolutional LSTMs are adopted to generate a series of feature maps with different ranges of contexts. Third, in each LSTM unit, a special type of atrous filters are designed to capture the short range and long range dependencies from various neighbors. Extensive experiments demonstrate the effectiveness of PDNs to substantially improve the performances of existing LSTM-based models.

论文关键词:Visual understanding,Image segmentation,Recurrent neural networks,Representation learning

论文评审过程:Received 28 February 2018, Revised 20 October 2018, Accepted 7 January 2019, Available online 15 January 2019, Version of Record 22 January 2019.

论文官网地址:https://doi.org/10.1016/j.patcog.2019.01.011