Towards better subtitles: A multilingual approach for punctuation restoration of speech transcripts

Authors:

Highlights:

• A state-of-the-art approach for multilingual punctuation prediction.

• Knowledge about punctuation drawn from pre-trained transformer-based encoder models.

• Monolingual models tested on both human-edited and automatic transcripts.

• Single multilingual model predicts punctuation in multiple languages.

• Integration within an existing multilingual video subtitling pipeline.
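Punctuation restoration is commonly framed as tagging each word of a transcript with the punctuation mark that should follow it. The minimal Python sketch below assumes that framing and a hypothetical four-label set (`O`, `COMMA`, `PERIOD`, `QUESTION`). It does not reproduce the paper's model or its exact label inventory. It only shows how per-word predictions could be turned back into punctuated, sentence-cased text for subtitles. The function name `restore` and the capitalization rule are illustrative assumptions.

```python
# Hypothetical label set: an assumption for illustration, not the paper's inventory.
PUNCT = {"O": "", "COMMA": ",", "PERIOD": ".", "QUESTION": "?"}
# Labels after which the next word starts a new sentence.
SENTENCE_END = {"PERIOD", "QUESTION"}


def restore(words, labels):
    """Rebuild punctuated, sentence-cased text from per-word labels.

    Each label names the mark (if any) that follows its word.
    Capitalizing sentence starts is an assumed post-processing step.
    """
    if len(words) != len(labels):
        raise ValueError("need exactly one label per word")
    out = []
    capitalize = True  # the first word of the transcript starts a sentence
    for word, label in zip(words, labels):
        if capitalize and word:
            word = word[0].upper() + word[1:]
        out.append(word + PUNCT[label])
        # Capitalize the next word only if this one ends a sentence.
        capitalize = label in SENTENCE_END
    return " ".join(out)


if __name__ == "__main__":
    words = ["hello", "how", "are", "you", "i", "am", "fine"]
    labels = ["COMMA", "O", "O", "QUESTION", "O", "O", "PERIOD"]
    print(restore(words, labels))  # Hello, how are you? I am fine.
```

Keeping the model's output as one label per word makes the decoding step independent of the underlying encoder, so the same routine could serve monolingual or multilingual predictors alike.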


Keywords: Punctuation marks, Intelligent subtitles, Pre-trained embeddings, Speech transcripts, Sentence boundaries, Multilingual embeddings

Article history: Received 12 May 2020, Revised 6 August 2021, Accepted 6 August 2021, Available online 14 August 2021, Version of Record 18 August 2021.

Official paper page: https://doi.org/10.1016/j.eswa.2021.115740