ECIR 2011 in Dublin

ECIR 2011, hosted in Dublin, Ireland, is over. It was a great week with lots of opportunities for scientific exchange, discussion and the occasional pint of fresh Guinness.

While a large number of interesting papers were presented at the conference, here is a short list of my personal favourite talks:

  • Stephen Robertson – On the Contribution of Topics to System Evaluation
    An analysis of how useful individual topics are for IR system evaluation. The conclusion is that there does not seem to be a general property that makes a topic more or less discriminative in terms of system evaluation. An interesting example of a “negative results” paper that builds a strong case and was accepted for publication.
  • Jaime Arguello et al. – A Methodology for Evaluating Aggregated Search Results (Best Student Paper)
    A method, based on pairwise comparisons, for evaluating search interfaces that aggregate results from multiple verticals. User studies show a high correlation between the proposed method and holistic, page-wide human judgements.
  • Jagadeesh Jagarlamudi et al. – Fractional Similarity: Cross-Lingual Feature Selection for Search
    A method for improving search result ranking in non-English languages by gathering hints from equivalent rankings in more frequently observed languages.
  • Kamran Massoudi et al. – Incorporating Query Expansion and Quality Indicators in Searching Microblog Posts
    The authors use a range of features (general textual vs. microblog-specific ones) to identify credible posts, from which query expansion terms are subsequently extracted.
  • Gabriel Dulac-Arnold et al. – Text Classification: A Sequential Reading Approach
    The authors frame text classification as a sequential reading problem. They propose a probabilistic model that decides at which point reading further lines of the document no longer significantly improves the classification.
  • Jinyoung Kim et al. – An Analysis of Time-Instability in Web Search Results
    A nice survey of the frequency at which top-ranked result lists change for major search engines.
  • Wolfgang Gatterbauer – Rules of Thumb for Information Acquisition from Large and Redundant Data
    An overview of clever sampling heuristics to approximate exhaustive reads of textual collections.
  • Elena Smirnova et al. – A User-Oriented Model for Expert Finding (Best Paper)
    The authors make expert finding searcher-aware. They discuss an expert’s proximity to the searcher (with respect to a social network, an institution’s organizational chart or a building’s office layout) as well as the expert’s expertise relative to that of the searcher.
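To make the idea of time-instability more concrete, here is a minimal Python (3.9+) sketch of one way to quantify how much a top-k result list changes between snapshots. It assumes each snapshot is a ranked list of result URLs and uses the Jaccard overlap of the top-k sets; the function names `topk_overlap` and `change_rate` are hypothetical, and this is not necessarily the measure used by Kim et al.

```python
def topk_overlap(before: list[str], after: list[str], k: int = 10) -> float:
    """Jaccard overlap of the top-k result sets of two snapshots (1.0 = same set)."""
    a, b = set(before[:k]), set(after[:k])
    if not a and not b:
        return 1.0  # two empty result lists are treated as identical
    return len(a & b) / len(a | b)


def change_rate(snapshots: list[list[str]], k: int = 10) -> float:
    """Fraction of consecutive snapshot pairs whose top-k result set changed at all."""
    pairs = list(zip(snapshots, snapshots[1:]))
    if not pairs:
        return 0.0  # fewer than two snapshots: nothing to compare
    changed = sum(1 for a, b in pairs if topk_overlap(a, b, k) < 1.0)
    return changed / len(pairs)
```

For example, `change_rate([["a", "b", "c"], ["a", "b", "c"], ["a", "c", "d"]], k=3)` gives 0.5: the first pair of snapshots is identical, while the second pair shares only two of four distinct results. Because it compares sets, this measure ignores pure reorderings within the top k; a rank-aware measure would be needed to capture those.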

PuppyIR presented their query assistance demonstrator for children. The open-source code base of the demo is available from SourceForge.


Language and Speech Colloquium at Radboud University Nijmegen

This month’s Language and Speech Colloquium at Radboud University Nijmegen focused on “Internet Data for L&S research”.

Carsten Eickhoff – An Introduction to Crowdsourcing

Crowdsourcing is a topic of growing interest for many fields of research and industry. It makes it possible to outsource short, simple tasks that require human cognitive abilities to a large pool of potential workers. Traditional examples of such human intelligence tasks (HITs) are image tagging, text annotation or the creation of quality judgements.

A commonly observed problem in crowdsourcing environments is the presence of workers who strive to maximize their financial efficiency by completing as many HITs as possible in a given period of time. This approach leads to very low overall result quality and often renders the HIT useless for the requester. In most cases it is easy to identify workers who repeatedly produce quick, low-quality results to increase their financial gain. However, this identification step requires additional resources such as time or money. In our work we demonstrate how innovative and non-repetitive HIT design can a priori discourage sloppy, gain-driven workers from taking up our HITs, without having to invest further resources in identification and rejection.

Claudia Peersman (University of Antwerp) – Age and gender prediction on Netlog posts

In recent years social networking sites and chat applications have redefined how people communicate. Millions of people use social networking sites to support their personal and professional communications by creating digital communities. A common characteristic of communication on these online social networks and in chat rooms is that it happens via very short entries using non-standard language variations, which makes this a challenging text genre for natural language processing.

In this presentation we will discuss some of the main characteristics of our corpus of chat texts, which we collected from the Belgian social networking site Netlog, and present the results of an exploratory study in which we apply an automatic text categorization approach to predict age and gender in this Netlog corpus. Despite the challenging characteristics of this type of data, our results show that useful performance can be reached for both age and gender prediction.
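As an illustration of automatic text categorization on short, non-standard posts, the following is a minimal Python (3.9+) sketch of a multinomial naive Bayes classifier over character trigrams. The classifier, the features and the names `char_ngrams` and `NaiveBayes` are assumptions chosen for illustration; the abstract does not say which approach the Netlog study used. Character n-grams are a common choice for chat text because they still match partially when words are abbreviated or misspelled.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Character n-grams of a lower-cased, space-padded post."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing over n-gram counts."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)          # documents per class (for priors)
        self.ngram_counts = defaultdict(Counter)     # n-gram frequencies per class
        for text, label in zip(texts, labels):
            self.ngram_counts[label].update(char_ngrams(text))
        self.vocab = {g for counts in self.ngram_counts.values() for g in counts}
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.ngram_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior plus smoothed log likelihood of every n-gram in the post
            score = math.log(doc_count / total_docs)
            for gram in char_ngrams(text):
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Trained on labelled posts, e.g. `NaiveBayes().fit(posts, labels).predict("lol omg haha")`, it returns the label whose n-gram distribution best explains the new post. A real study would of course need a large labelled corpus and a proper held-out evaluation.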