Improving Reliability of Crowdsourced Results by Detecting Crowd Workers with Multiple Identities

Quality control in crowdsourcing marketplaces plays a vital role in ensuring useful outcomes. In this paper, we focus on tackling the issue of crowd workers participating in tasks multiple times using different worker-ids to maximize their earnings. Workers attempting to complete the same task repeatedly may not be harmful in cases where the aim of a requester is to gather data or annotations, wherein more contributions from a single worker are fruitful. However, in several cases where the outcomes are subjective, requesters prefer the participation of distinct crowd workers. We show that traditional means of identifying unique crowd workers, such as worker-ids and IP addresses, are not sufficient. To overcome this problem, we propose the use of browser fingerprinting to ascertain the unique identities of crowd workers in paid crowdsourcing microtasks. By using browser fingerprinting across 8 different crowdsourced tasks with varying task difficulty, we found that 6.18% of crowd workers participate in the same task more than once, using different worker-ids to avoid detection. Moreover, nearly 95% of such workers in our experiments pass gold-standard questions and are deemed to be trustworthy, significantly biasing the results thus produced.

Authors: Ujwal Gadiraju and Ricardo Kawase.
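
The core idea in the abstract above, flagging worker-ids that share a browser fingerprint, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the (worker_id, fingerprint) pair format, and the sample values are all hypothetical, and it assumes a fingerprint string has already been computed for each submission.

```python
from collections import defaultdict


def find_multi_identity_groups(submissions):
    """Return fingerprints that appear under more than one worker-id.

    `submissions` is an iterable of (worker_id, fingerprint) pairs.
    Both the pair format and the fingerprint strings are assumptions
    made for this illustration.
    """
    workers_by_fingerprint = defaultdict(set)
    for worker_id, fingerprint in submissions:
        workers_by_fingerprint[fingerprint].add(worker_id)
    # Keep only fingerprints shared by at least two distinct worker-ids.
    return {
        fingerprint: sorted(worker_ids)
        for fingerprint, worker_ids in workers_by_fingerprint.items()
        if len(worker_ids) > 1
    }


# Hypothetical submissions for a single task.
submissions = [
    ("w1", "fp-a"),
    ("w2", "fp-b"),
    ("w3", "fp-a"),  # same browser fingerprint as w1
    ("w4", "fp-c"),
]
groups = find_multi_identity_groups(submissions)
print(groups)
```

In this toy data, worker-ids w1 and w3 share one fingerprint and would be flagged for review. A shared fingerprint is only a signal, so a requester would probably want to combine it with other evidence before rejecting work.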

A Layered Approach to Revisitation Prediction

Session chair Daniel Schwabe and presenter Ricardo Kawase @ICWE2011

Web browser users return to Web pages for various reasons. Apart from pages visited due to backtracking, they typically have a number of favorite/important pages that they monitor or tasks that reoccur on an infrequent basis. In this paper, we introduce the architecture of a system that facilitates revisitations through the effective prediction of the next page request. It consists of three layers, each dealing with a specific aspect of revisitation patterns: the first one estimates the value of each page by balancing the recency and the frequency of its requests; the second one captures the contextual regularities in users’ navigational activity in order to promote related pages, and the third one dynamically adapts the page associations of the second layer to the constant drift in the interests of users. For each layer, we introduce several methods, and evaluate them over a large, real-world dataset. The outcomes of our experimental evaluation suggest a significant improvement over other methods typically used in this context.

Venue: ICWE2011

Authors: George Papadakis, Ricardo Kawase, Eelco Herder and Claudia Niederée

PDF: kawase-icwe2011
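
The first layer described in the abstract above estimates the value of a page by balancing the recency and the frequency of its requests. One simple way to combine the two, shown here only as an illustrative sketch, is to sum an exponentially decayed weight over every past visit. This scoring rule is an assumption chosen for the example and is not taken from the paper; the function names, the half-life parameter, and the visit log are hypothetical.

```python
import math


def page_scores(visits, now, half_life=10.0):
    """Score each page by summing a decayed weight over its visits.

    `visits` is a list of (page, timestamp) pairs. A visit made
    `half_life` time units ago counts half as much as one made now,
    so frequent pages and recent pages both score highly.
    """
    decay = math.log(2) / half_life
    scores = {}
    for page, timestamp in visits:
        weight = math.exp(-decay * (now - timestamp))
        scores[page] = scores.get(page, 0.0) + weight
    return scores


def predict_next(visits, now, k=3, half_life=10.0):
    """Return the k highest-scoring pages as revisit candidates."""
    scores = page_scores(visits, now, half_life)
    # Sort by descending score, breaking ties by page name.
    return sorted(scores, key=lambda page: (-scores[page], page))[:k]


# Hypothetical visit log: "news" is frequent, "docs" is the most recent.
visits = [("news", 1), ("news", 2), ("news", 3), ("mail", 9), ("docs", 10)]
top_pages = predict_next(visits, now=10, k=2)
print(top_pages)
```

With this log, the frequently visited page outranks the single most recent one, and the half-life controls how quickly that advantage fades. The contextual and drift-adaptation layers mentioned in the abstract are not covered by this sketch.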