A comparison of models for fusion of the auditory and visual sensors in speech perception

Authors: Jordi Robert-Ribes, Jean-Luc Schwartz, Pierre Escudier

Abstract

Though a large amount of psychological and physiological evidence of audio-visual integration in speech has been collected in the last 20 years, there is no agreement about the nature of the fusion process. We present the main experimental data, and describe the various models proposed in the literature, together with a number of studies in the field of automatic audio-visual speech recognition. We discuss these models in relation to general proposals arising from psychology in the field of intersensory interaction, or from vision and robotics in the field of sensor fusion. Then we examine the characteristics of four main models, in the light of psychological data and formal properties, and we present the results of a modelling study on audio-visual recognition of French vowels in noise. We conclude in favor of the relative superiority of a model in which the auditory and visual inputs are projected and fused in a common representation space related to motor properties of speech objects, the fused representation being further classified for lexical access.

Keywords: audiovisual speech perception, sensor fusion, noisy speech recognition, intersensory interactions, vowel processing


Paper URL: https://doi.org/10.1007/BF00849043