Techniques for Quality Control in Applications that Use Crowdsourced Input
Crowdsourcing is the use of human workers, usually through the Internet, to obtain useful services and to perform computational tasks for which automated computation is inefficient or inapplicable. Crowdsourced input usually has a monetary cost and is obtained through a crowdsourcing marketplace.
Applications with crowdsourced input have to address several challenges: obtaining worker input has high latency, workers may disagree on the same task, and some workers may deliberately provide wrong input. We introduce novel methods that provide state-of-the-art solutions to the quality control problem of crowdsourcing in the context of different applications.
Some of the most basic and common applications of crowdsourcing concern ranking and labeling problems. In ranking problems, the input of the crowd is used to sort elements in ranked order, whereas in labeling problems it is used to assign each element a label. We study two of the most common instances of these problems, the top-k ranking problem and the Boolean labeling problem, and provide algorithms that advance the state of the art for each.
For crowdsourced top-k lists the goal is, for some number k, to obtain the top k items out of a larger itemset, using human workers to perform comparisons among items. An example application of the top-k problem is short-listing a large set of college applications using advanced students as workers. We evaluate our proposed techniques for the top-k problem against prior art using simulations as well as real crowds on Amazon Mechanical Turk. A randomized variant of the proposed algorithms achieves significant budget savings, especially for very large itemsets and large top-k lists, with negligible risk of lowering the quality of the output.
The Boolean labeling problem consists of reliably aggregating Boolean (yes/no) answers from the crowd; such tasks are widely used, for instance, to label input as spam or to judge the appropriateness of web content. We introduce two unsupervised algorithms for this problem: one derived heuristically, which is simple to implement, and one based on iterated Bayesian parameter estimation of user reputation models.
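To make the Boolean labeling setting concrete, the following is a minimal sketch of a generic reputation-weighted vote, not either of the algorithms introduced in this work. The function name, the input format, and the weighting rule (twice the agreement rate minus one) are assumptions chosen for illustration only.

```python
from collections import defaultdict


def aggregate_boolean_labels(answers, iterations=10):
    """Illustrative reputation-weighted vote (hypothetical helper).

    answers: list of (worker, item, answer) triples, with answer a bool.
    Returns (labels, weights): a consensus label per item and a
    weight in [-1, 1] per worker.
    """
    workers = {w for w, _, _ in answers}
    weights = {w: 1.0 for w in workers}  # start as a plain majority vote
    labels = {}
    for _ in range(iterations):
        # Step 1: weighted vote per item (+weight for yes, -weight for no).
        scores = defaultdict(float)
        for w, item, ans in answers:
            scores[item] += weights[w] if ans else -weights[w]
        # Ties are broken toward True; this is an arbitrary choice.
        labels = {item: s >= 0 for item, s in scores.items()}
        # Step 2: re-estimate each worker's weight from how often the
        # worker agrees with the current consensus labels.
        agree = defaultdict(int)
        total = defaultdict(int)
        for w, item, ans in answers:
            total[w] += 1
            agree[w] += int(ans == labels[item])
        weights = {w: 2 * agree[w] / total[w] - 1 for w in workers}
    return labels, weights
```

Under this assumed rule, a worker who consistently contradicts the consensus ends up with a negative weight, so that worker's answers are effectively inverted rather than discarded.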
We provide mathematical insight into the benefits of the proposed algorithms over existing approaches, and we validate these insights by showing that both algorithms offer improved performance in several cases, on both synthetic datasets and real-world datasets obtained via Amazon Mechanical Turk.
Our third contribution consists of techniques for enhancing the precision of any crowdsourcing task. At the core of each crowdsourcing task is the consideration of user input on items of work: fundamentally, this constitutes a bipartite graph between users and items, on which some inference must be performed. In some applications, there may be access to a set of informative features on workers, on their tasks, and on how workers complete tasks (e.g., the time a worker took to complete a task). We introduce a machine-learning approach that works directly on the graph of workers, tasks, and the associated features, using a multi-level architecture of Long Short-Term Memory (LSTM) neural networks, without resorting to a priori models of user behavior or item dynamics. We show that when such informative features are present, the machine-learning approach can provide enhanced quality compared to model-based inference approaches.
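As an illustration of the bipartite worker-item structure described above, the sketch below stores each answer as an edge carrying optional features and extracts, per item, a sequence of numeric vectors. The class names, the feature name "seconds", and the vector layout are hypothetical; how the LSTM architecture actually consumes the graph is not specified here, and this is only one plausible way to prepare such input.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Edge:
    """One answer: a worker's Boolean input on an item, plus features."""
    worker: str
    item: str
    answer: bool
    features: dict = field(default_factory=dict)  # e.g. {"seconds": 12}


class CrowdGraph:
    """Bipartite graph of workers and items, indexed from both sides."""

    def __init__(self):
        self.by_worker = defaultdict(list)
        self.by_item = defaultdict(list)

    def add(self, edge):
        self.by_worker[edge.worker].append(edge)
        self.by_item[edge.item].append(edge)

    def item_sequence(self, item, feature_names):
        """Return one numeric vector per answer on `item`, in insertion order.

        Each vector is [answer as 0/1, feature_1, feature_2, ...];
        missing features default to 0.0 (an assumption of this sketch).
        """
        return [
            [1.0 if e.answer else 0.0]
            + [float(e.features.get(name, 0.0)) for name in feature_names]
            for e in self.by_item[item]
        ]
```

A sequence built this way could be passed to any sequence model; indexing the same edges by worker gives the complementary view of the graph.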