Crowd Anatomy Beyond the Good and Bad: Behavioral Traces for Crowd Worker Modeling and Pre-selection

The suitability of crowdsourcing for solving a variety of problems has been widely investigated. Yet there is still a lack of understanding about the distinct behavior and performance of workers within microtasks. In this paper, we first introduce a fine-grained, data-driven worker typology that is based on different dimensions and derived from the behavioral traces of workers. Next, we propose and evaluate novel models of crowd worker behavior and show the benefits of behavior-based worker pre-selection using machine learning models. We also study the effect of task complexity on worker behavior. Finally, we evaluate our novel typology-based worker pre-selection method in image transcription and information finding tasks, with crowd workers completing 1,800 HITs. Our proposed pre-selection method yields higher-quality results than the standard practice of using qualification or pre-screening tests. For image transcription tasks, our method increased accuracy by nearly 7% over the baseline, and for information finding tasks by almost 10%, without a significant difference in task completion time. Our findings have important implications for crowdsourcing systems where a worker's behavioral type is unknown prior to participation in a task. We highlight the potential of leveraging worker types to identify and aid those workers who require further training to improve their performance. Having proposed a powerful automated mechanism to detect worker types, we reflect on promoting fairness, trust and transparency in microtask crowdsourcing platforms.

Authors: Ujwal Gadiraju, Gianluca Demartini, Ricardo Kawase, Stefan Dietze

Methods for web revisitation prediction: survey and experimentation

More than 45% of the pages that we visit on the Web are pages that we have visited before. Browsers support revisits with various tools, including bookmarks, history views and URL auto-completion. However, these tools only support revisits to a small number of frequently and recently visited pages. Several browser plugins and extensions have been proposed to better support the long tail of less frequently visited pages, using recommendation and prediction techniques. In this article, we present a systematic overview of revisitation prediction techniques, classifying them into two main types and several subtypes. We also explain how the individual prediction techniques can be combined into comprehensive revisitation workflows that achieve higher accuracy. We investigate the performance of the most important workflows and provide a statistical analysis of the factors that affect their predictive accuracy. Further, we provide an upper bound for the accuracy of revisitation prediction using an 'oracle' that discards non-revisited pages.

Authors: George Papadakis, Ricardo Kawase, Eelco Herder and Wolfgang Nejdl

Understanding Malicious Behavior in Crowdsourcing Platforms: The Case of Online Surveys

Crowdsourcing is increasingly being used as a means to tackle problems requiring human intelligence. With an ever-growing worker base that aims to complete microtasks on crowdsourcing platforms in exchange for financial gain, there is a need for stringent mechanisms to prevent the exploitation of deployed tasks. Quality control mechanisms need to accommodate a diverse pool of workers exhibiting a wide range of behavior. A pivotal step towards fraud-proof task design is understanding the behavioral patterns of microtask workers. In this paper, we analyze the prevalent malicious activity on crowdsourcing platforms and study the behavior exhibited by trustworthy and untrustworthy workers, particularly on crowdsourced surveys. Based on our analysis of typical malicious activity, we define and identify different types of workers in the crowd, propose a method to measure malicious activity, and finally present guidelines for the efficient design of crowdsourced surveys.

Authors: Ujwal Gadiraju, Ricardo Kawase, Stefan Dietze, Gianluca Demartini

A Taxonomy of Microtasks on the Web

Nowadays, a substantial number of people are turning to crowdsourcing to solve tasks that require human intervention. Despite a considerable amount of research in the field of crowdsourcing, existing works fall short when it comes to classifying typically crowdsourced tasks. Understanding the dynamics of the tasks that are crowdsourced and the behavior of workers plays a vital role in efficient task design. In this paper, we propose a two-level categorization scheme for tasks, based on an extensive study of 1000 workers on CrowdFlower. In addition, we present insights into certain aspects of crowd behavior: the task affinity of workers, the effort exerted by workers to complete tasks of various types, and their satisfaction with the monetary incentives.

Authors: Ujwal Gadiraju, Ricardo Kawase and Stefan Dietze

PDF: gadiraju-ht2014


Predicting User Locations and Trajectories

Location-based services usually recommend new locations based on the user's current location or a given destination. However, human mobility largely consists of routine behavior and revisits to previously visited locations. In this paper, we show how daily and weekly routines can be modeled with basic prediction techniques. We compare these methods in terms of their performance, entropy and correlation measures. Further, we discuss how location prediction for everyday activities can be used for personalization techniques, such as timely or delayed recommendations.

Authors: Eelco Herder, Patrick Siehndel and Ricardo Kawase

PDF: herder-umap2014

Classification of user interest patterns using a virtual folksonomy

User interest in topics and resources is known to be recurrent and to follow specific patterns, depending on the type of topic or resource. Traditional methods for predicting recurring patterns are based on ranking and associative models. In this paper, we identify several 'canonical' patterns by clustering keywords related to visited resources, making use of a large repository of Web usage data. The keywords are derived from a 'virtual' folksonomy of tags assigned to these resources using a collaborative bookmarking system.

Venue: JCDL 2011

Authors: Ricardo Kawase and Eelco Herder

PDF: kawase-jcdl2011a