Learning from crowdsourced labeled data: a survey

Authors: Jing Zhang, Xindong Wu, Victor S. Sheng

Abstract

With the rapid growth of crowdsourcing systems, many applications based on the supervised learning paradigm can easily obtain massive amounts of labeled data at relatively low cost. However, because the reliability of crowdsourced labelers is variable and uncertain, learning procedures face great challenges. Improving the quality of labels and of learning models therefore plays a key role in learning from crowdsourced labeled data. In this survey, we first introduce the basic concepts of label quality and learning model quality. Then, by reviewing recently proposed models and algorithms for ground truth inference and learning models, we analyze the connections and distinctions among these techniques and clarify the state of progress of related research. To facilitate studies in this field, we also introduce openly accessible real-world data sets collected from crowdsourcing systems, as well as open source libraries and tools. Finally, some potential issues for future studies are discussed.

Keywords: Crowdsourcing, Learning from crowds, Multiple noisy labeling, Label quality, Learning model quality, Ground truth inference

Paper URL: https://doi.org/10.1007/s10462-016-9491-9