Automatic classification of documents in cold-start

Document classification is key to ensuring the quality of any digital library. However, classifying documents is a very time-consuming task. In addition, few or none of the documents in a newly created repository are classified. Unclassified documents not only prevent users from finding information but also hinder the system’s ability to recommend relevant items. Moreover, the lack of classified documents prevents any kind of machine learning algorithm from automatically annotating these items. In this work, we propose a novel approach to automatically classifying documents that differs from previous work in that it exploits the wisdom of the crowd available on the Web. Our proposed strategy adapts an automatic tagging approach, combined with a straightforward matching algorithm, to classify documents according to a given domain classification. To validate our findings, we compared our methods against existing approaches and performed a user evaluation with 61 participants to estimate the quality of the classifications. Results show that, in 72% of the cases, the automatic classification is relevant and well accepted by participants. In conclusion, automatic classification can facilitate access to relevant documents.

Venue: WIMS2013

Authors: Ricardo Kawase, Marco Fisichella, Bernardo Pereira Nunes, Kyung-Hun Ha and Markus Bick

PDF: kawase-wims2013

Client- and Server-side Revisitation Prediction with SUPRA

Users of collaborative applications, as well as individual users in their private environment, return to previously visited Web pages for various reasons: apart from pages revisited through backtracking, they typically have a number of favorite or important pages that they monitor, or tasks that recur infrequently. In this paper, we introduce a library of methods that facilitate revisitation through the effective prediction of the next page request. It is based on a generic framework that inherently incorporates contextual information and handles server-side and client-side applications uniformly. Unlike existing approaches, the methods it encompasses operate in real time, since they rely on neither training data nor machine learning algorithms. We evaluate them on two large, real-world datasets, and the outcomes suggest a significant improvement over the methods typically used in this context. We have also made our implementation and data publicly available, encouraging other researchers to use them as a benchmark and to extend them with new techniques for supporting users’ navigational activity.

Venue: WIMS2012

Authors: George Papadakis, Ricardo Kawase, and Eelco Herder

PDF: papadakis-wims12

User Profile Based Activities in Flexible Processes

The COOPER platform is a collaborative, open environment built on the idea of flexible, user-centric process support. It allows cooperating team members to define collaborative processes and flexibly modify process activities even during process execution. In this paper, we describe how incorporating decentralized user data through mashups allows the COOPER platform to support the definition and execution of so-called user profile based activities, i.e., process activities that are adapted to the preferences of the process actors. We define two basic types of user profile based activities, namely user adapted activities and user conditional activities. The former are modeled according to the user profile data, while the latter employ the same user data to enable automatic workflow decisions.

Venue: WIMS2012

Authors: Marco Fisichella, Ricardo Kawase, Juri Luca De Coi and Maristella Matera

PDF: fisichella-wims12