Exploiting the Wisdom of the Crowds for Characterizing and Connecting Heterogeneous Resources

Ricardo Kawase presenting @HT2014 in Santiago, Chile (picture taken by Christoph Trattner)

Heterogeneous content is an inherent problem for cross-system search, recommendation and personalization. In this paper we investigate differences in topic coverage and the impact of topics in different kinds of Web services. We use entity extraction and categorization to create fingerprints that allow for meaningful comparison. As a base taxonomy, we use the 23 main categories of the Wikipedia Category Graph, which has been assembled over the years by the wisdom of the crowds. Following a proof of concept of our approach, we analyze differences in topic coverage and topic impact. The results show many differences between Web services such as Twitter, Flickr and Delicious, which reflect users’ behavior and the usage of each system. The paper concludes with a user study that demonstrates the benefits of fingerprints over traditional textual methods for recommendations of heterogeneous resources.
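To make the idea of a fingerprint more concrete, here is a minimal sketch and not the implementation from the paper. It assumes that entity extraction has already produced a list of category labels for each resource, it uses four hypothetical category names in place of the 23 main Wikipedia categories, and it compares fingerprints by cosine similarity, which is an assumption made only for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical labels for illustration; the paper uses the 23 main
# categories of the Wikipedia Category Graph.
CATEGORIES = ["Arts", "Science", "Sports", "Technology"]

def fingerprint(entity_categories):
    """Return the share of each category among a resource's entity categories."""
    counts = Counter(c for c in entity_categories if c in CATEGORIES)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CATEGORIES)
    return [counts[c] / total for c in CATEGORIES]

def cosine(a, b):
    """Cosine similarity of two fingerprints (0.0 if either one is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Assumed outputs of an entity extraction step for three kinds of resources.
tweet = fingerprint(["Sports", "Sports", "Technology"])
photo = fingerprint(["Sports", "Arts"])
bookmark = fingerprint(["Science", "Technology"])

print(cosine(tweet, photo), cosine(tweet, bookmark))
```

Because every resource is reduced to a vector over the same categories, a tweet, a photo and a bookmark can be compared directly even when they share no words.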

Authors: Ricardo Kawase, Patrick Siehndel, Bernardo Pereira Nunes, Eelco Herder and Wolfgang Nejdl

PDF: kawase-ht2014

Online Prototype: http://twikime.l3s.uni-hannover.de

 

To the Point: A Shortcut to Essential Learning

The volume of information on the Web is constantly growing. Consequently, finding specific pieces of information becomes a harder task. Wikipedia, the largest online reference Website, is beginning to witness this phenomenon. Learners often turn to Wikipedia in order to learn facts regarding different subjects. However, as time passes, Wikipedia articles get larger and specific information becomes more difficult to locate. In this work, we propose an automatic annotation method that is able to precisely assign categories to any textual resource. Our approach relies on semantically enhanced annotations and Wikipedia’s categorization schema. The results of a user study show that our proposed method provides solid results for classifying text and useful support for locating information. As an implication, our research will help future learners to easily identify learning topics of interest in large textual resources.

Authors: Ricardo Kawase, Patrick Siehndel and Bernardo Pereira Nunes

PDF: kawase-icalt2014

Automatic classification of documents in cold-start

Document classification is key to ensuring the quality of any digital library. However, classifying documents is a very time-consuming task. In addition, few or none of the documents in a newly created repository are classified. The lack of classification not only prevents users from finding information but also hinders the system’s ability to recommend relevant items. Moreover, the lack of classified documents prevents any kind of machine learning algorithm from automatically annotating these items. In this work, we propose a novel approach to automatically classifying documents that differs from previous works in that it exploits the wisdom of the crowds available on the Web. Our proposed strategy adapts an automatic tagging approach combined with a straightforward matching algorithm to classify documents in a given domain classification. To validate our findings, we compared our methods against existing ones and performed a user evaluation with 61 participants to estimate the quality of the classifications. Results show that, in 72% of the cases, the automatic classification is relevant and well accepted by participants. In conclusion, automatic classification can facilitate access to relevant documents.
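The combination of tagging and matching can be illustrated with a small sketch. It is not the algorithm from the paper: the abstract does not specify the matching step, so the word-overlap (Jaccard) score below is an assumption, as are the function names, the example tags and the class labels. The tags are assumed to come from an automatic tagger that needs no previously classified documents.

```python
def tokens(text):
    """Lowercase a label or tag and split it into a set of words."""
    return set(text.lower().split())

def classify(doc_tags, class_labels):
    """Return the class label whose words overlap most with the tags.

    The score is the Jaccard overlap between the words of all tags and
    the words of a label. Returns (None, 0.0) when nothing overlaps.
    """
    tag_tokens = set()
    for tag in doc_tags:
        tag_tokens |= tokens(tag)
    best, best_score = None, 0.0
    for label in class_labels:
        label_tokens = tokens(label)
        union = tag_tokens | label_tokens
        score = len(tag_tokens & label_tokens) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = label, score
    return best, best_score

# Hypothetical domain classification and tagger output.
classes = ["Project Management", "Human Resources", "Marketing"]
tags = ["project management", "risk", "budget"]

label, score = classify(tags, classes)
print(label, score)
```

Since the matching relies only on tags and class labels, it needs no training data, which is what makes this kind of approach usable in a cold-start repository.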

Authors: Ricardo Kawase, Marco Fisichella, Bernardo Pereira Nunes, Kyung-Hun Ha and Markus Bick

PDF: kawase-wims2013

TwikiMe! User profiles that make sense.

The use of social media has been rapidly increasing in recent years. Social media, such as Twitter, has become an important source of information for a variety of people. The public availability of data describing some of these social networks has led to a great deal of research in this area. Link prediction, user classification and community detection are some of the main research areas related to social networks. In this paper, we present a user modeling framework that uses Wikipedia to model user interests inside a social network. Our model of user interests reflects the areas a user is interested in, as well as the level of expertise a user has in a certain field.

Authors: Patrick Siehndel, Ricardo Kawase

PDF: siehndel-iswc2012

Online Prototype: http://twikime.l3s.uni-hannover.de/twikime.php

Classification of user interest patterns using a virtual folksonomy

User interest in topics and resources is known to be recurrent and to follow specific patterns, depending on the type of topic or resource. Traditional methods for predicting reoccurring patterns are based on ranking and associative models. In this paper we identify several ‘canonical’ patterns by clustering keywords related to visited resources, making use of a large repository of Web usage data. The keywords are derived from a ‘virtual’ folksonomy of tags assigned to these resources using a collaborative bookmarking system.

Venue: JCDL2011

Authors: Ricardo Kawase and Eelco Herder

PDF: kawase-jcdl2011a