Towards a Semantically Enriched Online Newspaper

The Internet plays a major role as a source of news. Many publishers offer online versions of their newspapers to paying customers. Online newspapers bear more similarity to traditional print papers than to regular news sites. In close collaboration with Mediengruppe Madsack, a publisher of newspapers in several German federal states, we aim to provide a semantically enriched online newspaper. News articles are annotated with relevant entities: places, persons and organizations. These annotations form the basis for an entity-based ‘Theme Radar’, a dashboard for monitoring articles related to the users’ explicitly indicated and inferred interests.

Authors: Ricardo Kawase, Eelco Herder, Patrick Siehndel

PDF: kawase-iswc2014

Extracting architectural patterns from Web data

Knowledge about the reception of architectural structures is crucial for architects or urban planners. Yet obtaining such information has been a challenging and costly activity. With the advent of the Web, a vast amount of structured and unstructured data describing architectural structures has become available publicly. This includes information about the perception and use of buildings (for instance, through social media), and structured information about the building’s features and characteristics (for instance, through public Linked Data). In this paper, we present the first step towards the exploitation of structured data available in the Linked Open Data cloud, in order to determine well-perceived architectural patterns.

Authors: Ujwal Gadiraju, Ricardo Kawase, Stefan Dietze

PDF: gadiraju-iswc2014

Identifying Topic-Related Hyperlinks in the Crowd

The microblogging service Twitter has become one of the most popular sources of real-time information. Every second, hundreds of URLs are posted on Twitter. Due to the maximum tweet length of 140 characters, these URLs are in most cases shortened versions of the original URLs. In contrast to the original URLs, which usually provide some hints about the destination Web site and the specific page, shortened links do not tell users what to expect behind them. These links might contain relevant information or news regarding a certain topic of interest, but they might just as well be completely irrelevant, or even lead to a malicious or harmful website. In this paper, we present our work towards identifying credible Twitter users for given topics. We achieve this by characterizing the content of the posted URLs and relating it to the expertise of Twitter users.

Authors: Patrick Siehndel, Ricardo Kawase, Eelco Herder, Thomas Risse

PDF: siehndel-iswc2014

 

Exploiting the Wisdom of the Crowds for Characterizing and Connecting Heterogeneous Resources

Ricardo Kawase presenting @HT2014 in Santiago, Chile (picture taken by Christoph Trattner)

Heterogeneous content is an inherent problem for cross-system search, recommendation and personalization. In this paper we investigate differences in topic coverage and the impact of topics in different kinds of Web services. We use entity extraction and categorization to create fingerprints that allow for meaningful comparison. As a base taxonomy, we use the 23 main categories of the Wikipedia Category Graph, which has been assembled over the years by the wisdom of the crowds. Following a proof of concept of our approach, we analyze differences in topic coverage and topic impact. The results show many differences between Web services like Twitter, Flickr and Delicious, which reflect users’ behavior and the usage of each system. The paper concludes with a user study that demonstrates the benefits of fingerprints over traditional textual methods for recommendations of heterogeneous resources.

Authors: Ricardo Kawase, Patrick Siehndel, Bernardo Pereira Nunes, Eelco Herder and Wolfgang Nejdl

PDF: kawase-ht2014

Online Prototype: http://twikime.l3s.uni-hannover.de
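The fingerprint idea from the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the category list is a hypothetical four-item stand-in for the 23 main Wikipedia categories (which the abstract does not enumerate), the entity-to-category lookup is a hypothetical replacement for real entity extraction and categorization, and cosine similarity is an assumed choice of comparison measure.

```python
import math
from collections import Counter

# Hypothetical stand-in for the 23 main Wikipedia categories.
CATEGORIES = ["Arts", "Science", "Sports", "Technology"]

# Hypothetical lookup replacing entity extraction and categorization.
ENTITY_CATEGORIES = {
    "Berlin Philharmonic": ["Arts"],
    "FIFA World Cup": ["Sports"],
    "Linked Data": ["Technology", "Science"],
    "Python": ["Technology"],
}


def fingerprint(entities):
    """Return a normalized category-weight vector for a list of entities."""
    counts = Counter()
    for entity in entities:
        for category in ENTITY_CATEGORIES.get(entity, []):
            counts[category] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CATEGORIES)
    return [counts[c] / total for c in CATEGORIES]


def cosine_similarity(a, b):
    """Compare two fingerprints; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Resources from different systems become comparable once both are
# reduced to vectors over the same category taxonomy.
tweet_fp = fingerprint(["Python", "Linked Data"])
photo_fp = fingerprint(["FIFA World Cup"])
bookmark_fp = fingerprint(["Linked Data"])
```

Because every resource is mapped onto the same fixed set of categories, a tweet, a photo and a bookmark can be compared without sharing any vocabulary.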

 

A Taxonomy of Microtasks on the Web

Nowadays, a substantial number of people are turning to crowdsourcing in order to solve tasks that require human intervention. Despite a considerable amount of research in the field of crowdsourcing, existing works fall short when it comes to classifying typically crowdsourced tasks. Understanding the dynamics of the tasks that are crowdsourced and the behaviour of workers plays a vital role in efficient task design. In this paper, we propose a two-level categorization scheme for tasks, based on an extensive study of 1000 workers on CrowdFlower. In addition, we present insights into certain aspects of crowd behaviour: the task affinity of workers, the effort exerted by workers to complete tasks of various types, and their satisfaction with the monetary incentives.

Authors: Ujwal Gadiraju, Ricardo Kawase and Stefan Dietze

PDF: gadiraju-ht2014

 

Predicting User Locations and Trajectories

Location-based services usually recommend new locations based on the user’s current location or a given destination. However, human mobility involves to a large extent routine behavior and visits to already visited locations. In this paper, we show how daily and weekly routines can be modeled with basic prediction techniques. We compare the methods based on their performance, entropy and correlation measures. Further, we discuss how location prediction for everyday activities can be used for personalization techniques, such as timely or delayed recommendations.

Authors: Eelco Herder, Patrick Siehndel and Ricardo Kawase

PDF: herder-umap2014
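As an illustration of what a basic routine-based predictor and an entropy measure might look like, here is a minimal sketch. It assumes a most-frequent-location-per-time-slot model and Shannon entropy over visit frequencies; the abstract does not specify which techniques the paper compares, and the function names and sample visits are hypothetical.

```python
import math
from collections import Counter, defaultdict
from datetime import datetime


def build_routine_model(visits):
    """Count visited locations per (weekday, hour) slot."""
    model = defaultdict(Counter)
    for timestamp, location in visits:
        model[(timestamp.weekday(), timestamp.hour)][location] += 1
    return model


def predict(model, timestamp):
    """Return the most frequent location for the slot, or None if unseen."""
    slot = model.get((timestamp.weekday(), timestamp.hour))
    if not slot:
        return None
    return slot.most_common(1)[0][0]


def entropy(visits):
    """Shannon entropy (in bits) of the user's location distribution."""
    counts = Counter(location for _, location in visits)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical visit log: three visits at 09:00 on the same weekday in
# consecutive weeks, plus one evening visit.
visits = [
    (datetime(2014, 7, 7, 9), "office"),
    (datetime(2014, 7, 14, 9), "office"),
    (datetime(2014, 7, 21, 9), "cafe"),
    (datetime(2014, 7, 7, 19), "gym"),
]

model = build_routine_model(visits)
next_week = predict(model, datetime(2014, 7, 28, 9))
other_day = predict(model, datetime(2014, 7, 29, 9))
```

A low entropy value indicates a user whose visits concentrate on few locations, which is the situation in which such a slot-based predictor can be expected to do well.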

To the Point: A Shortcut to Essential Learning

The volume of information on the Web is constantly growing. Consequently, finding specific pieces of information becomes a harder task. Wikipedia, the largest online reference Website, is beginning to witness this phenomenon. Learners often turn to Wikipedia in order to learn facts regarding different subjects. However, as time passes, Wikipedia articles get larger and specific information becomes more difficult to locate. In this work, we propose an automatic annotation method that is able to precisely assign categories to any textual resource. Our approach relies on semantically enhanced annotations and Wikipedia’s categorization schema. The results of a user study show that our proposed method provides solid results for classifying text and useful support for locating information. As an implication, our research will help future learners to easily identify desired learning topics in large textual resources.

Authors: Ricardo Kawase, Patrick Siehndel and Bernardo Pereira Nunes

PDF: kawase-icalt2014

A Topic Extraction Process for Online Forums

Ricardo Kawase presenting @ICALT2014 (picture taken by Mikhail Fominykh)

Forums play a key role in the process of knowledge creation, providing means for users to exchange ideas and to collaborate. However, educational forums, along with several other online educational environments, often suffer from topic disruption. Since the contents are mainly produced by participants (in our case learners), one or a few individuals might change the course of the discussions. Thus, realigning the discussed topics of a forum thread is a task often conducted by a tutor or moderator. To support learners and tutors in aligning forum discussions with a given lecture or course, we present in this paper a method that combines semantic technologies and a statistical method to find and expose relevant topics to be discussed in online discussion forums.

Authors: Bernardo Pereira Nunes, Alexander Arturo Mera Caraballo, Ricardo Kawase, Besnik Fetahu, Marco Antonio Casanova and Gilda Helena Bernardino De Campos

PDF: nunes-icalt2014

DBLPXplorer: Interactive Graphical Interfaces for the Computer Science Bibliography

Ricardo Kawase presenting @ESWC2014 LinkedUp – Vidi Challenge

Every year thousands of new research works are indexed and published online. Scientific publications involve mainly two sets of actors, namely authors and articles. Consequently, a huge tangle of relations emerges, in which authors collaborate with several other authors and articles reference past literature. Due to this complex network, keeping up to date with the latest research in a particular field is often a time-consuming task. Currently available tools to explore such information are solely text-based. The information seeker has to search, browse and navigate page by page in order to find relevant research, yet cannot obtain an overview of the underlying networks and connections. At the same time, the Linked Open Data cloud holds an abundance of information, in the form of nearly disjoint datasets, relevant to research and the actors involved. To facilitate the exploration of authors, scientific research and relations, we propose a visual exploratory interface for the DBLP Computer Science Bibliography. To further enrich the data, we extract authors’ keywords from the articles and additionally annotate each article with identified DBpedia entities. The presentation layer consists of several user-friendly exploratory interfaces that utilize the state-of-the-art JavaScript library D3 (Data-Driven Documents). Our interfaces include overviews of particular venues, authors’ profiles, scientific articles, relations, and a knowledge base of keywords and semantic annotations. To complete our work, we expose all the enriched data as linked data. See more at: http://linkedup-challenge.org/vidi/#DBLPXplorer

Authors: Ricardo Kawase, Ujwal Gadiraju and Patrick Siehndel

Demo: http://www.l3s.de/~kawase/DBLPXplorer/
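The author-to-author relations described in the abstract above can be pictured as a weighted co-authorship graph. The sketch below is an assumption about how such a graph could be derived from article author lists, not a description of how DBLPXplorer is built. The toy input reuses three author lists from this page, and the nodes/links output shape is an assumed layout of the kind commonly passed to graph visualizations.

```python
from collections import Counter
from itertools import combinations

# Toy input: surnames from three author lists on this page.
# The dictionary keys are hypothetical identifiers.
articles = {
    "online-newspaper": ["Kawase", "Herder", "Siehndel"],
    "architectural-patterns": ["Gadiraju", "Kawase", "Dietze"],
    "dblpxplorer": ["Kawase", "Gadiraju", "Siehndel"],
}


def coauthor_edges(articles):
    """Count joint articles for every unordered pair of authors."""
    edges = Counter()
    for authors in articles.values():
        # Sorting makes (a, b) and (b, a) map to the same key.
        for pair in combinations(sorted(set(authors)), 2):
            edges[pair] += 1
    return edges


def to_graph(edges):
    """Convert edge counts to an assumed nodes/links structure."""
    names = sorted({name for pair in edges for name in pair})
    return {
        "nodes": [{"id": name} for name in names],
        "links": [
            {"source": a, "target": b, "value": weight}
            for (a, b), weight in sorted(edges.items())
        ],
    }


edges = coauthor_edges(articles)
graph = to_graph(edges)
```

Edge weights count joint articles, so frequent collaborators can be drawn with thicker or shorter links in a visual interface.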

Haters gonna hate: job-related offenses in Twitter

In this paper, we aim at finding out which users are likely to publicly demonstrate frustration towards their jobs on the microblogging platform Twitter – we will call these users haters. We show that the profiles of haters have specific characteristics in terms of vocabulary and connections. The implications of these findings may be used for the development of an early alert system that can help users to think twice before they post potentially self-harming content.

Authors: Ricardo Kawase, Patrick Siehndel, and Eelco Herder

PDF: kawase-www2014

FireMe! Website: http://fireme.l3s.uni-hannover.de/