Nearest-Neighbor Automatic Sound Annotation with a WordNet Taxonomy

Authors: Pedro Cano, Markus Koppenberger, Sylvain Le Groux, Julien Ricard, Nicolas Wack, Perfecto Herrera

Abstract

Sound engineers need access to vast collections of sound effects for their film and video productions. Sound effects providers rely on text-retrieval techniques to give access to their collections. Currently, audio content is annotated manually, which is an arduous task. Automatic annotation methods, which are normally fine-tuned to restricted domains such as musical instruments or limited sound effects taxonomies, are not mature enough to label any possible sound in great detail. A general sound recognition tool would require, first, a taxonomy that represents the world and, second, thousands of classifiers, each specialized in distinguishing fine details. We report experimental results on a general sound annotator. To tackle the taxonomy definition problem, we use WordNet, a semantic network that organizes real-world knowledge. To avoid the need for a huge number of classifiers to distinguish many different sound classes, we use a nearest-neighbor classifier with a database of isolated sounds unambiguously linked to WordNet concepts. A concept prediction accuracy of 30% is achieved on a database of over 50,000 sounds and over 1,600 concepts.
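
The nearest-neighbor idea described above can be illustrated with a minimal sketch. It assumes each sound has already been reduced to a fixed-length feature vector and that each database entry carries the concept labels it is linked to. The feature values, the labels, the Euclidean distance, and the names `DATABASE` and `annotate` are all hypothetical choices made for illustration; the abstract does not specify the features or the distance measure actually used.

```python
import math

# Hypothetical toy database: each entry pairs a feature vector with the
# concept labels its sound is linked to. Values are invented for illustration.
DATABASE = [
    ((0.9, 0.1, 0.3), ("dog", "bark")),
    ((0.8, 0.2, 0.4), ("dog", "growl")),
    ((0.1, 0.9, 0.7), ("car", "engine")),
    ((0.2, 0.8, 0.9), ("car", "horn")),
]


def annotate(query, database=DATABASE):
    """Return the concept labels of the database sound closest to `query`.

    Euclidean distance is an assumption of this sketch, not a detail
    taken from the abstract.
    """
    _, concepts = min(database, key=lambda entry: math.dist(query, entry[0]))
    return concepts


if __name__ == "__main__":
    # An unlabeled sound inherits the concepts of its nearest neighbor.
    print(annotate((0.85, 0.15, 0.3)))
```

A single lookup replaces the thousands of specialized classifiers mentioned above: adding a new sound class only requires adding labeled examples to the database, not training another classifier.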

Keywords: audio identification, WordNet, nearest-neighbor, everyday sound, knowledge management

Paper URL: https://doi.org/10.1007/s10844-005-0318-4