Those were the days: learning to rank social media posts for reminiscence

Social media posts are a great source for life summaries aggregating activities, events, interactions and thoughts of the last months or years. They can be used for personal reminiscence as well as for keeping track of developments in the lives of not-so-close friends. One of the core challenges of automatically creating such summaries is to decide which posts are memorable, i.e., should be considered for retention, and which ones to forget. To address this challenge, we design and conduct user evaluation studies and construct a corpus that captures human expectations towards content retention. We analyze this corpus to identify a small set of seed features that are most likely to characterize memorable posts. Next, we compile a broader set of features that are leveraged to build general and personalized machine-learning models to rank posts for retention. By applying feature selection, we identify a compact yet effective subset of these features. The models trained with the presented feature sets outperform baseline models that exploit an intuitive set of temporal and social features.

Authors: Kaweh Djafari Naini, Ricardo Kawase, Nattiya Kanhabua, Claudia Niederée, Ismail Sengor Altingovde
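
To make the ranking setup concrete, here is a minimal Python sketch of the kind of pipeline the abstract describes: a handful of post features feed a feature-selection step and a learned scorer, and posts are ranked by predicted retention score. The features, toy data and learner here are illustrative assumptions, not the paper's actual corpus or models.

```python
# A minimal sketch of pointwise ranking for retention with feature
# selection; all features and scores below are hypothetical.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import Pipeline

# Toy training data: one row per post with illustrative features
# (number of likes, number of comments, post age in days, has photo).
X = np.array([
    [120, 14, 400, 1],
    [  3,  0,  12, 0],
    [ 45,  9, 900, 1],
    [  8,  1,  30, 0],
])
y = np.array([0.9, 0.1, 0.8, 0.2])  # crowd-assigned retention scores

# Feature selection + regressor, mirroring the "compact yet effective
# feature subset" idea; posts are then ranked by predicted score.
model = Pipeline([
    ("select", SelectKBest(f_regression, k=2)),
    ("score", GradientBoostingRegressor(n_estimators=50)),
]).fit(X, y)

ranking = np.argsort(-model.predict(X))  # post indices, best first
print(ranking)
```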

Crowd Anatomy Beyond the Good and Bad: Behavioral Traces for Crowd Worker Modeling and Pre-selection

The suitability of crowdsourcing to solve a variety of problems has been investigated widely. Yet, there is still a lack of understanding about the distinct behavior and performance of workers within microtasks. In this paper, we first introduce a fine-grained, data-driven worker typology based on different dimensions and derived from behavioral traces of workers. Next, we propose and evaluate novel models of crowd worker behavior and show the benefits of behavior-based worker pre-selection using machine learning models. We also study the effect of task complexity on worker behavior. Finally, we evaluate our novel typology-based worker pre-selection method in image transcription and information finding tasks involving crowd workers completing 1,800 HITs. Our proposed method for worker pre-selection leads to higher-quality results when compared to the standard practice of using qualification or pre-screening tests: an accuracy increase of nearly 7% over the baseline in image transcription tasks and of almost 10% in information finding tasks, without a significant difference in task completion time. Our findings have important implications for crowdsourcing systems where a worker’s behavioral type is unknown prior to participation in a task. We highlight the potential of leveraging worker types to identify and aid those workers who require further training to improve their performance. Having proposed a powerful automated mechanism to detect worker types, we reflect on promoting fairness, trust and transparency in microtask crowdsourcing platforms.

Authors: Ujwal Gadiraju, Gianluca Demartini, Ricardo Kawase, Stefan Dietze
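
The behavior-based pre-selection idea can be illustrated with a toy classifier over behavioral traces. The trace features and labels below are hypothetical stand-ins; the paper's typology draws on richer behavioral dimensions.

```python
# A toy sketch of behavior-based worker pre-selection; all traces,
# labels and candidates are made up for illustration.
from sklearn.ensemble import RandomForestClassifier

# One row per worker: [avg. seconds per HIT, fraction of HITs
# abandoned, avg. key presses per HIT] -- illustrative traces only.
traces = [
    [35.0, 0.05, 80],
    [ 4.0, 0.60,  3],
    [50.0, 0.02, 95],
    [ 6.0, 0.45,  5],
]
labels = [1, 0, 1, 0]  # 1 = delivered high-quality work in past tasks

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(traces, labels)

# Pre-select only workers whose traces predict high-quality work.
candidates = [[40.0, 0.03, 70], [5.0, 0.50, 4]]
admitted = [w for w, ok in zip(candidates, clf.predict(candidates)) if ok == 1]
print(admitted)
```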

Using Worker Self-Assessments for Competence-Based Pre-Selection in Crowdsourcing Microtasks

Paid crowdsourcing platforms have evolved into remarkable marketplaces where requesters can tap into human intelligence to serve a multitude of purposes, and the workforce can benefit through monetary returns for investing their efforts. In this work, we focus on individual crowd worker competencies. By drawing from self-assessment theories in psychology, we show that crowd workers often lack awareness about their true level of competence. Due to this, although workers intend to maintain a high reputation, they tend to participate in tasks that are beyond their competence. We reveal the diversity of individual worker competencies, and make a case for competence-based pre-selection in crowdsourcing marketplaces. We show the implications of flawed self-assessments on real-world microtasks, and propose a novel worker pre-selection method that considers the accuracy of worker self-assessments. We evaluated our method in a sentiment analysis task and observed an improvement in accuracy of over 15% compared to traditional performance-based worker pre-selection. Similarly, our proposed method resulted in an accuracy improvement of nearly 6% in an image validation task. Our results show that requesters in crowdsourcing platforms can benefit by considering worker self-assessments in addition to their performance for pre-selection.

Authors: Ujwal Gadiraju, Besnik Fetahu, Ricardo Kawase, Patrick Siehndel, Stefan Dietze
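
The gist of self-assessment-aware pre-selection can be sketched in a few lines: workers whose self-estimated accuracy diverges too far from their measured accuracy are filtered out. The records and threshold below are assumptions for illustration, not the paper's exact procedure.

```python
# A minimal illustration of pre-selection by self-assessment accuracy;
# worker records and the tolerated gap are hypothetical.
workers = [
    {"id": "w1", "self_assessed": 0.90, "measured": 0.85},
    {"id": "w2", "self_assessed": 0.95, "measured": 0.55},  # overconfident
    {"id": "w3", "self_assessed": 0.60, "measured": 0.62},
]

MAX_GAP = 0.15  # tolerated |self-assessment - performance| gap (assumed)

selected = [w["id"] for w in workers
            if abs(w["self_assessed"] - w["measured"]) <= MAX_GAP]
print(selected)  # ['w1', 'w3']
```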

Improving Reliability of Crowdsourced Results by Detecting Crowd Workers with Multiple Identities

Quality control in crowdsourcing marketplaces plays a vital role in ensuring useful outcomes. In this paper, we focus on tackling the issue of crowd workers participating in tasks multiple times using different worker IDs to maximize their earnings. Workers attempting to complete the same task repeatedly may not be harmful in cases where the aim of a requester is to gather data or annotations, wherein more contributions from a single worker are fruitful. However, in several cases where the outcomes are subjective, requesters prefer the participation of distinct crowd workers. We show that traditional means to identify unique crowd workers, such as worker IDs and IP addresses, are not sufficient. To overcome this problem, we propose the use of browser fingerprinting to ascertain the unique identities of crowd workers in paid crowdsourcing microtasks. By using browser fingerprinting across 8 different crowdsourced tasks with varying task difficulty, we found that 6.18% of crowd workers participate in the same task more than once, using different worker IDs to avoid detection. Moreover, nearly 95% of such workers in our experiments pass gold-standard questions and are deemed to be trustworthy, significantly biasing the results thus produced.

Authors: Ujwal Gadiraju, Ricardo Kawase
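
The core detection step can be sketched briefly in Python: hash a canonical string of browser attributes (collected client-side, e.g., via JavaScript) and flag distinct worker IDs that share a fingerprint. The attribute set below is a small illustrative subset; fingerprinting in practice combines many more signals.

```python
# A simplified sketch of browser fingerprinting for duplicate-worker
# detection; the attributes and submissions are hypothetical.
import hashlib
from collections import defaultdict

def fingerprint(attrs: dict) -> str:
    canonical = "|".join(f"{k}={attrs[k]}" for k in sorted(attrs))
    return hashlib.sha256(canonical.encode()).hexdigest()

submissions = [
    ("worker_A", {"user_agent": "Mozilla/5.0 ...", "screen": "1920x1080",
                  "timezone": "UTC+1", "fonts": "Arial,Calibri"}),
    ("worker_B", {"user_agent": "Mozilla/5.0 ...", "screen": "1920x1080",
                  "timezone": "UTC+1", "fonts": "Arial,Calibri"}),  # same browser
]

seen = defaultdict(set)
for worker_id, attrs in submissions:
    seen[fingerprint(attrs)].add(worker_id)

# Fingerprints shared by several worker IDs suggest one person behind them.
duplicates = {fp: ids for fp, ids in seen.items() if len(ids) > 1}
print(duplicates)
```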

Human Beyond the Machine: Challenges and Opportunities of Microtask Crowdsourcing

In the 21st century, where automated systems and artificial intelligence are replacing arduous manual labor by supporting data-intensive tasks, many problems still require human intelligence. Over the last decade, by tapping into human intelligence through microtasks, crowdsourcing has found remarkable applications in a wide range of domains. In this article, the authors discuss the growth of crowdsourcing systems since the term was coined by columnist Jeff Howe in 2006. They shed light on the evolution of crowdsourced microtasks in recent times. Next, they discuss a main challenge that hinders the quality of crowdsourced results: the prevalence of malicious behavior. They reflect on crowdsourcing’s advantages and disadvantages. Finally, they leave the reader with interesting avenues for future research.

Authors: Ujwal Gadiraju, Gianluca Demartini, Ricardo Kawase, Stefan Dietze

Analyzing and Predicting Privacy Settings in the Social Web

Social networks provide a platform for people to connect and share information and moments of their lives. With the increasing engagement of users in such platforms, the volume of personal information that is exposed online grows accordingly. Due to carelessness, unawareness or difficulties in defining adequate privacy settings, private or sensitive information may be exposed to a wider audience than intended or advisable, potentially causing serious problems in a user’s private and professional life. Although these cases usually receive public attention only when they involve senior company staff, athletes, politicians or artists, the general public is also subject to these issues. To address this problem, we envision a mechanism that suggests to users the appropriate privacy setting for their posts, taking into account their profiles. In this paper, we present a thorough analysis of privacy settings in Facebook posts and evaluate prediction models that anticipate the desired privacy settings with high accuracy, making use of users’ previous posts and preferences.

Authors: Kaweh Djafari Naini, Ismail Sengor Altingovde, Ricardo Kawase, Eelco Herder, Claudia Niederée

PDF: naini-umap2015.pdf
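
As a flavor of what such a prediction model could look like, here is a toy bag-of-words classifier that learns a user's privacy choices from past posts and suggests a setting for a new one. The posts, labels and model choice are illustrative assumptions; the paper's models draw on a broader set of profile and preference features.

```python
# A toy sketch of privacy-setting prediction from past posts;
# all data below is made up for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

past_posts = [
    "Family dinner photos from the weekend",
    "My new paper got accepted!",
    "Feeling unwell, staying home today",
    "Check out our project demo",
]
settings = ["friends", "public", "friends", "public"]  # user's past choices

model = make_pipeline(CountVectorizer(), MultinomialNB()).fit(past_posts, settings)
print(model.predict(["Photos from our family trip"]))  # likely 'friends'
```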

Methods for web revisitation prediction: survey and experimentation

More than 45% of the pages that we visit on the Web are pages that we have visited before. Browsers support revisits with various tools, including bookmarks, history views and URL auto-completion. However, these tools only support revisits to a small number of frequently and recently visited pages. Several browser plugins and extensions have been proposed to better support the long tail of less frequently visited pages, using recommendation and prediction techniques. In this article, we present a systematic overview of revisitation prediction techniques, distinguishing two main types and several subtypes. We also explain how the individual prediction techniques can be combined into comprehensive revisitation workflows that achieve higher accuracy. We investigate the performance of the most important workflows and provide a statistical analysis of the factors that affect their predictive accuracy. Further, we provide an upper bound for the accuracy of revisitation prediction using an ‘oracle’ that discards non-revisited pages.

Authors: George Papadakis, Ricardo Kawase, Eelco Herder, Wolfgang Nejdl
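
To give a flavor of the simplest technique family in this space, here is a toy predictor that scores URLs by a weighted mix of visit frequency and recency, two classic revisitation signals. The weighting scheme and decay are assumptions for illustration, not a workflow from the survey.

```python
# A minimal frequency/recency revisitation predictor; the scoring
# formula and weights are illustrative assumptions.
import time

def revisit_scores(history, now=None, alpha=0.7):
    """Score each URL by weighted visit frequency plus a recency bonus."""
    now = now if now is not None else time.time()
    stats = {}
    for url, ts in history:
        freq, last = stats.get(url, (0, 0.0))
        stats[url] = (freq + 1, max(last, ts))
    # Recency bonus decays with hours since the last visit.
    return {
        url: alpha * freq + (1 - alpha) / (1 + (now - last) / 3600)
        for url, (freq, last) in stats.items()
    }

history = [("a.com", 1000.0), ("b.com", 2000.0), ("a.com", 3000.0)]
scores = revisit_scores(history, now=4000.0)
print(max(scores, key=scores.get))  # 'a.com': more frequent and more recent
```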

Understanding Malicious Behavior in Crowdsourcing Platforms: The Case of Online Surveys

Crowdsourcing is increasingly being used as a means to tackle problems requiring human intelligence. With the ever-growing worker base that aims to complete microtasks on crowdsourcing platforms in exchange for financial gains, there is a need for stringent mechanisms to prevent exploitation of deployed tasks. Quality control mechanisms need to accommodate a diverse pool of workers, exhibiting a wide range of behavior. A pivotal step towards fraud-proof task design is understanding the behavioral patterns of microtask workers. In this paper, we analyze the prevalent malicious activity on crowdsourcing platforms and study the behavior exhibited by trustworthy and untrustworthy workers, particularly on crowdsourced surveys. Based on our analysis of the typical malicious activity, we define and identify different types of workers in the crowd, propose a method to measure malicious activity, and finally present guidelines for the efficient design of crowdsourced surveys.

Authors: Ujwal Gadiraju, Ricardo Kawase, Stefan Dietze, Gianluca Demartini
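
A measure of malicious activity could, in its simplest form, flag responses by behavioral cues such as implausibly fast completion or failed attention checks. The sketch below illustrates that idea with hypothetical records and thresholds; it is not the paper's actual measure.

```python
# An illustrative flagging rule for potentially untrustworthy survey
# responses; records and thresholds are assumed for this sketch.
responses = [
    {"worker": "w1", "seconds": 310, "attention_checks_passed": 2},
    {"worker": "w2", "seconds":  25, "attention_checks_passed": 0},
    {"worker": "w3", "seconds": 280, "attention_checks_passed": 1},
]

MIN_SECONDS = 60        # assumed minimum plausible completion time
MIN_CHECKS_PASSED = 1   # assumed attention-check threshold

flagged = [r["worker"] for r in responses
           if r["seconds"] < MIN_SECONDS
           or r["attention_checks_passed"] < MIN_CHECKS_PASSED]
print(flagged)  # ['w2']
```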

Breaking Bad: Understanding Behavior of Crowd Workers in Categorization Microtasks

Crowdsourcing systems are being widely used to overcome several challenges that require human intervention. While the adoption of the crowdsourcing paradigm as a solution is increasing, there are no established guidelines or tangible recommendations for task design with respect to key parameters such as task length, monetary incentive and time required for task completion. In this paper, we propose the tuning of these parameters based on our findings from extensive experiments and analysis of categorization tasks. We delve into the behavior of workers who complete categorization tasks to determine measures that can make task design more effective.

Authors: Ujwal Gadiraju, Patrick Siehndel, Besnik Fetahu, Ricardo Kawase