Visual units and confusion modelling for automatic lip-reading

Authors:

Highlights:

• A novel technique for automatic lip-reading is proposed.

• A weighted finite state transducer cascade incorporating a confusion model is used.

• Performance was slightly better than that of a standard HMM system.

• The issue of suitable units for automatic lip-reading was also studied.

• It was found that visemes are sub-optimal because of reduced contextual modelling.

Keywords: Lip-reading, Speech recognition, Visemes, Weighted finite state transducers, Confusion matrices, Confusion modelling

Review history: Received 26 June 2015, Revised 20 January 2016, Accepted 3 March 2016, Available online 1 April 2016, Version of Record 17 April 2016.

Official paper URL: https://doi.org/10.1016/j.imavis.2016.03.003