A survey of tagging techniques for music, speech and environmental sound

Authors: Shufei Duan, Jinglan Zhang, Paul Roe, Michael Towsey

Abstract

Sound tagging has been studied for years. Among all sound types, music, speech, and environmental sound are the three most active research areas. This survey aims to provide an overview of state-of-the-art developments in these areas. We begin by discussing what tagging means in each of these sound areas. We then introduce examples of sound tagging applications to illustrate the significance of this research. Typical tagging techniques include manual, automatic, and semi-automatic approaches. After reviewing work on music, speech, and environmental sound tagging, we compare the three areas and describe the research progress to date. We identify research gaps in each area, along with the features the three areas share and the ways in which they differ. We also list published datasets, tools used by researchers, and evaluation measures frequently applied in the analysis. Finally, we summarise the worldwide distribution of countries that have been engaged in sound tagging research over the years.

Keywords: Sound tagging, Music tagging, Speech recognition, Environmental sound tagging, Manual tagging, Automatic tagging, Semi-automatic tagging

Paper URL: https://doi.org/10.1007/s10462-012-9362-y