Multimodal fusion for indoor sound source localization

Authors:

Highlights:

• We propose a novel solution that fuses visual and acoustic models to accurately localize indoor sound sources.

• We develop an HMM-based method for separating the acoustic transfer function (ATF) to describe clean speech.

• We propose a new Fourier-domain method for fast implementation of the HOG-type polar feature descriptor.

• The proposed method is rotation-invariant and preserves the discriminative power of the extracted features.

Keywords: Sound source localization, Acoustic transfer function, HMM, Polar HOG, SVM

Article history: Received 6 July 2020, Revised 8 December 2020, Accepted 17 February 2021, Available online 23 February 2021, Version of Record 27 February 2021.

Paper URL: https://doi.org/10.1016/j.patcog.2021.107906